Tags:
ironpython python dynamic dlr
2.7.11-candidate1 | Updated: 15 Sep 2020
Downloads:
37,327
Downloads of v 2.7.11-candidate1:
144
Maintainer(s):
Software Author(s):
- IronPython Contributors
- Microsoft
IronPython 2.7.11-candidate1
This is a prerelease version of IronPython.
Legal Disclaimer: Neither this package nor Chocolatey Software, Inc. is affiliated with or endorsed by IronPython Contributors or Microsoft. The inclusion of IronPython Contributors or Microsoft trademark(s), if any, upon this webpage is solely to identify IronPython Contributors or Microsoft goods or services and not for commercial purposes.
All Checks are Passing
3 Passing Tests
Deployment Method: Individual Install, Upgrade, & Uninstall
To install IronPython, run the following command from the command line or from PowerShell:
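choco install ironpython --version=2.7.11-candidate1 --pre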
To upgrade IronPython, run the following command from the command line or from PowerShell:
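choco upgrade ironpython --version=2.7.11-candidate1 --pre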
To uninstall IronPython, run the following command from the command line or from PowerShell:
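choco uninstall ironpython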
Deployment Method:
This applies to both open source and commercial editions of Chocolatey.
1. Enter Your Internal Repository Url
(this should look similar to https://community.chocolatey.org/api/v2/)
2. Setup Your Environment
1. Ensure you are set for organizational deployment
Please see the organizational deployment guide
2. Get the package into your environment
Option 1: Cached Package (Unreliable, Requires Internet - Same As Community)
Open Source or Commercial:
- Proxy Repository - Create a proxy nuget repository on Nexus, Artifactory Pro, or a proxy Chocolatey repository on ProGet. Point your upstream to https://community.chocolatey.org/api/v2/. Packages cache on first access automatically. Make sure your choco clients are using your proxy repository as a source and NOT the default community repository. See source command for more information.
- You can also just download the package and push it to a repository.
Option 2: Internalized Package (Reliable, Scalable)
Open Source:
- Download the package and follow the manual internalization instructions.
Package Internalizer (C4B):
- Run (additional options):
choco download ironpython --internalize --version=2.7.11-candidate1 --pre --source=https://community.chocolatey.org/api/v2/
- For package and dependencies run:
choco push --source="'INTERNAL REPO URL'"
- Automate package internalization
3. Copy Your Script
choco upgrade ironpython -y --source="'INTERNAL REPO URL'" --version="'2.7.11-candidate1'" --prerelease [other options]
See options you can pass to upgrade.
See best practices for scripting.
Add this to a PowerShell script or use a Batch script with tools and in places where you are calling directly to Chocolatey. If you are integrating, keep in mind enhanced exit codes.
If you do use a PowerShell script, use the following to ensure bad exit codes are shown as failures:
choco upgrade ironpython -y --source="'INTERNAL REPO URL'" --version="'2.7.11-candidate1'" --prerelease
$exitCode = $LASTEXITCODE
Write-Verbose "Exit code was $exitCode"
$validExitCodes = @(0, 1605, 1614, 1641, 3010)
if ($validExitCodes -contains $exitCode) {
Exit 0
}
Exit $exitCode
- name: Install ironpython
win_chocolatey:
name: ironpython
version: '2.7.11-candidate1'
source: INTERNAL REPO URL
state: present
allow_prerelease: yes
See docs at https://docs.ansible.com/ansible/latest/modules/win_chocolatey_module.html.
chocolatey_package 'ironpython' do
action :install
source 'INTERNAL REPO URL'
version '2.7.11-candidate1'
options '--prerelease'
end
See docs at https://docs.chef.io/resource_chocolatey_package.html.
cChocoPackageInstaller ironpython
{
Name = "ironpython"
Version = "2.7.11-candidate1"
Source = "INTERNAL REPO URL"
chocoParams = "--prerelease"
}
Requires cChoco DSC Resource. See docs at https://github.com/chocolatey/cChoco.
package { 'ironpython':
ensure => '2.7.11-candidate1',
install_options => ['--prerelease'],
provider => 'chocolatey',
source => 'INTERNAL REPO URL',
}
Requires Puppet Chocolatey Provider module. See docs at https://forge.puppet.com/puppetlabs/chocolatey.
4. If applicable - Chocolatey configuration/installation
See infrastructure management matrix for Chocolatey configuration elements and examples.
This package was approved as a trusted package on 15 Sep 2020.
IronPython is an open-source implementation of the Python programming language which is tightly integrated with the .NET Framework. IronPython can use the .NET Framework and Python libraries, and other .NET languages can use Python code just as easily.
md5: 2E07419EED278287B82FB47849712F6E | sha1: 8B7D6D374C189FB8EE7887B27E840C3D56348CFA | sha256: D430CFF64770AE090CFA93002C9A12CC06A3E63CEEA8D9AB37D1F552001AC269 | sha512: 3943A910EBB2CF7157C9CEBA783E06348835ACF5500A46CDDE421851B8A1FC212DCEB50C5D5562D20AD9DC3424BBB7DD1EFFBDA03B10FB3DB4755828321AF38A
md5: C2FDEC41C9D6F21B9749133954D9A148 | sha1: DCD3AFB5A55EF9D72CFA4C7EF3DEF2D0EF2A8776 | sha256: 4F2D443DC901CF7085BA38CE14EDD6C1C3A2684B6EE0D4B3F5AD875D587DA8E6 | sha512: A4458C7B2F5424FEA48A9A9A122BBD566562A894B5247BE31C9918BD3F658F0283BAE411FD08687F4E37D7E0D7510DE6CD81C348A088C55A6FD833B3DFF741EB
md5: 2EC8B6F478E732D34F6458B47CC83016 | sha1: 9D833125C1B4FCAE8B49C8DEB13D681D0BBC63F7 | sha256: 0718C32F2184FE2B177A9F1DFADFCFFCC2C1755DE84F659730BA237E13339AF7 | sha512: 6AD6A512471702967E67B76075EF7E458500A7EC6A3DA08841D8108ED287B1F9A20144420303EF95D12910F5318B3A7E8051E35F947732F3910CD281D2E366AA
md5: DED5731235827097AE14603BE10DC267 | sha1: B3B48B0303692F6ACD554D97C425B9CCFEE51F02 | sha256: 152FF4A241FD2F9B7F27391A38E8348026EE40CBDDE753DD4A22D8EC325A011F | sha512: 952EE380D19B778C0A7148E286821175C657BCCD97F9D3ACB6C073CC367F9C1A961D4312B5747E086E8D732F8EA679FBAE5F9F67D182B748E2146B83404A6132
md5: 1116E618155F1378324AC49639D5A330 | sha1: 0BB32EF77B004CB904A3C24B727EA4649808E0CF | sha256: 921270D008E74114E4CCA7AFD7085A82348897294CE7D045BF6045FF81F3FD4A | sha512: D5825AA384D51229FE27B7B4E23486A0B637735F7907068B72AF6914E251AE4F13FE0F8466E4709DA20B9D666AA44836A5D039D475B6C45E48F937F9DB93BFC7
md5: 2DF518822D5A79195364A481CDE57631 | sha1: B7584B0A2344AA4E8FE1A31C4B8D177AC183D236 | sha256: 727D0963DA91B94BFA6FB1FA9D25409D276653BF6ACA1E1F5F00A0B5ABE59B69 | sha512: 141208E4B9E889C441774B0B1621A213E275B02E5C530BE057D182529D643AF7BB5BEEAD1C04DC82E26DF817D9912564075F46219D79BDBF4DDB35674B4D96D2
md5: 808164C7F2C6C4E140B212D9B2F6F4EE | sha1: 82B543291A14DD6D9D30231A4DD0F5570088B426 | sha256: A1788F9ABC1F22C2176F9B4EC90333B50DE57FEEB86FE87A0E9D114C96CFD55A | sha512: AB2E3D32A16040BED92A7E0A34DE627D0ED744EF78E4F5B798A7D2224BB726E8F5C85E7A955D5E67BC1AA3203D5B19C17552C0D9F045968881E1F63C41B98D4B
md5: 3653BEA9DA0B6522815776299859BBC8 | sha1: 0E7F3FFC13638F2313C154BA19B79E17809A8F32 | sha256: CFB23F53BE700E3E0072DA07F3226A41122609BC3678563A8C2898D98A53B245 | sha512: 424520A0E3FD6C76FE3CC044A37DA2970E0A36AF2545FBA59EC97B5F8F6EACBD18271E4F57EE59B31854938AC8C1BEAADB9202E597B123A14FC7FEE524ED5F64
md5: 325A1E760B29BD88F62FCAEE12F08CA0 | sha1: C5F2F3631E2AB4862D02145A4E77DC3739C67199 | sha256: 9986BE2D54D5A5669C26F31A59A0398B4EBCF965B19D768607C123DF2F4CA37D | sha512: C1AE1E8B62F2141EBAA224F4F90F1C37C5279F0B4676435BCE28CB4B35D1AD87CD8B014FD5AC0F7098C8110836348D4FA60BFF4E412E71D2BD6A7FDC7D0D5A7E
md5: 1F8BC2B82EB7F4CA58E72C59AF85ED75 | sha1: 0DF7F692B8A8F94B05DA733D74B129C7D27A560D | sha256: 1E993D278213A141E92314C5759F21DC59DF6882254DE7EE07A09DFF7A7F3735 | sha512: E6ACC080621D7F87D35606329235AFE8EB00BCFE0FD7FCEC6F36927B2FD1C9CFCF8BDC018FB237AD578D437AA50A402B322E98CAD3B9053066092772C4782414
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) according to PEP 3119."""
import types
from _weakrefset import WeakSet
# Instance of old-style class
class _C: pass
_InstanceType = type(_C())
def abstractmethod(funcobj):
"""A decorator indicating abstract methods.
Requires that the metaclass is ABCMeta or derived from it. A
class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract methods are overridden.
The abstract methods can be called using any of the normal
'super' call mechanisms.
Usage:
class C:
__metaclass__ = ABCMeta
@abstractmethod
def my_abstract_method(self, ...):
...
"""
funcobj.__isabstractmethod__ = True
return funcobj
class abstractproperty(property):
"""A decorator indicating abstract properties.
Requires that the metaclass is ABCMeta or derived from it. A
class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract properties are overridden.
The abstract properties can be called using any of the normal
'super' call mechanisms.
Usage:
class C:
__metaclass__ = ABCMeta
@abstractproperty
def my_abstract_property(self):
...
This defines a read-only property; you can also define a read-write
abstract property using the 'long' form of property declaration:
class C:
__metaclass__ = ABCMeta
def getx(self): ...
def setx(self, value): ...
x = abstractproperty(getx, setx)
"""
__isabstractmethod__ = True
class ABCMeta(type):
"""Metaclass for defining Abstract Base Classes (ABCs).
Use this metaclass to create an ABC. An ABC can be subclassed
directly, and then acts as a mix-in class. You can also register
unrelated concrete classes (even built-in classes) and unrelated
ABCs as 'virtual subclasses' -- these and their descendants will
be considered subclasses of the registering ABC by the built-in
issubclass() function, but the registering ABC won't show up in
their MRO (Method Resolution Order) nor will method
implementations defined by the registering ABC be callable (not
even via super()).
"""
# A global counter that is incremented each time a class is
# registered as a virtual subclass of anything. It forces the
# negative cache to be cleared before its next use.
_abc_invalidation_counter = 0
def __new__(mcls, name, bases, namespace):
cls = super(ABCMeta, mcls).__new__(mcls, name, bases, namespace)
# Compute set of abstract method names
abstracts = set(name
for name, value in namespace.items()
if getattr(value, "__isabstractmethod__", False))
for base in bases:
for name in getattr(base, "__abstractmethods__", set()):
value = getattr(cls, name, None)
if getattr(value, "__isabstractmethod__", False):
abstracts.add(name)
cls.__abstractmethods__ = frozenset(abstracts)
# Set up inheritance registry
cls._abc_registry = WeakSet()
cls._abc_cache = WeakSet()
cls._abc_negative_cache = WeakSet()
cls._abc_negative_cache_version = ABCMeta._abc_invalidation_counter
return cls
def register(cls, subclass):
"""Register a virtual subclass of an ABC."""
if not isinstance(subclass, (type, types.ClassType)):
raise TypeError("Can only register classes")
if issubclass(subclass, cls):
return # Already a subclass
# Subtle: test for cycles *after* testing for "already a subclass";
# this means we allow X.register(X) and interpret it as a no-op.
if issubclass(cls, subclass):
# This would create a cycle, which is bad for the algorithm below
raise RuntimeError("Refusing to create an inheritance cycle")
cls._abc_registry.add(subclass)
ABCMeta._abc_invalidation_counter += 1 # Invalidate negative cache
def _dump_registry(cls, file=None):
"""Debug helper to print the ABC registry."""
print >> file, "Class: %s.%s" % (cls.__module__, cls.__name__)
print >> file, "Inv.counter: %s" % ABCMeta._abc_invalidation_counter
for name in sorted(cls.__dict__.keys()):
if name.startswith("_abc_"):
value = getattr(cls, name)
print >> file, "%s: %r" % (name, value)
def __instancecheck__(cls, instance):
"""Override for isinstance(instance, cls)."""
# Inline the cache checking when it's simple.
subclass = getattr(instance, '__class__', None)
if subclass is not None and subclass in cls._abc_cache:
return True
subtype = type(instance)
# Old-style instances
if subtype is _InstanceType:
subtype = subclass
if subtype is subclass or subclass is None:
if (cls._abc_negative_cache_version ==
ABCMeta._abc_invalidation_counter and
subtype in cls._abc_negative_cache):
return False
# Fall back to the subclass check.
return cls.__subclasscheck__(subtype)
return (cls.__subclasscheck__(subclass) or
cls.__subclasscheck__(subtype))
def __subclasscheck__(cls, subclass):
"""Override for issubclass(subclass, cls)."""
# Check cache
if subclass in cls._abc_cache:
return True
# Check negative cache; may have to invalidate
if cls._abc_negative_cache_version < ABCMeta._abc_invalidation_counter:
# Invalidate the negative cache
cls._abc_negative_cache = WeakSet()
cls._abc_negative_cache_version = ABCMeta._abc_invalidation_counter
elif subclass in cls._abc_negative_cache:
return False
# Check the subclass hook
ok = cls.__subclasshook__(subclass)
if ok is not NotImplemented:
assert isinstance(ok, bool)
if ok:
cls._abc_cache.add(subclass)
else:
cls._abc_negative_cache.add(subclass)
return ok
# Check if it's a direct subclass
if cls in getattr(subclass, '__mro__', ()):
cls._abc_cache.add(subclass)
return True
# Check if it's a subclass of a registered class (recursive)
for rcls in cls._abc_registry:
if issubclass(subclass, rcls):
cls._abc_cache.add(subclass)
return True
# Check if it's a subclass of a subclass (recursive)
for scls in cls.__subclasses__():
if issubclass(subclass, scls):
cls._abc_cache.add(subclass)
return True
# No dice; update negative cache
cls._abc_negative_cache.add(subclass)
return False
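A minimal usage sketch (not part of the package source; Sized is an illustrative name) showing how ABCMeta.register() creates a virtual subclass that isinstance() and issubclass() accept even though the ABC never appears in the registered class's MRO:

from abc import ABCMeta, abstractmethod

class Sized:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __len__(self):
        pass

Sized.register(tuple)                 # tuple becomes a virtual subclass
print issubclass(tuple, Sized)        # True, via ABCMeta.__subclasscheck__
print isinstance((1, 2), Sized)       # True, via ABCMeta.__instancecheck__
print Sized in type((1, 2)).__mro__   # False: register() does not touch the MRO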
"""Stuff to parse AIFF-C and AIFF files.
Unless explicitly stated otherwise, the description below is true
both for AIFF-C files and AIFF files.
An AIFF-C file has the following structure.
+-----------------+
| FORM |
+-----------------+
| <size> |
+----+------------+
| | AIFC |
| +------------+
| | <chunks> |
| | . |
| | . |
| | . |
+----+------------+
An AIFF file has the string "AIFF" instead of "AIFC".
A chunk consists of an identifier (4 bytes) followed by a size (4 bytes,
big endian order), followed by the data. The size field does not include
the size of the 8 byte header.
The following chunk types are recognized.
FVER
<version number of AIFF-C defining document> (AIFF-C only).
MARK
<# of markers> (2 bytes)
list of markers:
<marker ID> (2 bytes, must be > 0)
<position> (4 bytes)
<marker name> ("pstring")
COMM
<# of channels> (2 bytes)
<# of sound frames> (4 bytes)
<size of the samples> (2 bytes)
<sampling frequency> (10 bytes, IEEE 80-bit extended
floating point)
in AIFF-C files only:
<compression type> (4 bytes)
<human-readable version of compression type> ("pstring")
SSND
<offset> (4 bytes, not used by this program)
<blocksize> (4 bytes, not used by this program)
<sound data>
A pstring consists of 1 byte length, a string of characters, and 0 or 1
byte pad to make the total length even.
Usage.
Reading AIFF files:
f = aifc.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
In some types of audio files, if the setpos() method is not used,
the seek() method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' for AIFF files)
getcompname() -- returns human-readable version of
compression type ('not compressed' for AIFF files)
getparams() -- returns a tuple consisting of all of the
above in the above order
getmarkers() -- get the list of marks in the audio file or None
if there are no marks
getmark(id) -- get mark with the specified id (raises an error
if the mark does not exist)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell(), the position given to setpos() and
the position of marks are all compatible and have nothing to do with
the actual position in the file.
The close() method is called automatically when the class instance
is destroyed.
Writing AIFF files:
f = aifc.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
aiff() -- create an AIFF file (AIFF-C default)
aifc() -- create an AIFF-C file
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)
-- set all parameters at once
setmark(id, pos, name)
-- add specified mark to the list of marks
tell() -- return current position in output file (useful
in combination with setmark())
writeframesraw(data)
                -- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, in particular the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes('') or
close() to patch up the sizes in the header.
Marks can be added anytime. If there are any marks, you must call
close() after all frames have been written.
The close() method is called automatically when the class instance
is destroyed.
When a file is opened with the extension '.aiff', an AIFF file is
written, otherwise an AIFF-C file is written. This default can be
changed by calling aiff() or aifc() before the first writeframes or
writeframesraw.
"""
import struct
import __builtin__
__all__ = ["Error","open","openfp"]
class Error(Exception):
pass
_AIFC_version = 0xA2805140L # Version 1 of AIFF-C
def _read_long(file):
try:
return struct.unpack('>l', file.read(4))[0]
except struct.error:
raise EOFError
def _read_ulong(file):
try:
return struct.unpack('>L', file.read(4))[0]
except struct.error:
raise EOFError
def _read_short(file):
try:
return struct.unpack('>h', file.read(2))[0]
except struct.error:
raise EOFError
def _read_ushort(file):
try:
return struct.unpack('>H', file.read(2))[0]
except struct.error:
raise EOFError
def _read_string(file):
length = ord(file.read(1))
if length == 0:
data = ''
else:
data = file.read(length)
if length & 1 == 0:
dummy = file.read(1)
return data
_HUGE_VAL = 1.79769313486231e+308 # See <limits.h>
def _read_float(f): # 10 bytes
expon = _read_short(f) # 2 bytes
sign = 1
if expon < 0:
sign = -1
expon = expon + 0x8000
himant = _read_ulong(f) # 4 bytes
lomant = _read_ulong(f) # 4 bytes
if expon == himant == lomant == 0:
f = 0.0
elif expon == 0x7FFF:
f = _HUGE_VAL
else:
expon = expon - 16383
f = (himant * 0x100000000L + lomant) * pow(2.0, expon - 63)
return sign * f
def _write_short(f, x):
f.write(struct.pack('>h', x))
def _write_ushort(f, x):
f.write(struct.pack('>H', x))
def _write_long(f, x):
f.write(struct.pack('>l', x))
def _write_ulong(f, x):
f.write(struct.pack('>L', x))
def _write_string(f, s):
if len(s) > 255:
raise ValueError("string exceeds maximum pstring length")
f.write(struct.pack('B', len(s)))
f.write(s)
if len(s) & 1 == 0:
f.write(chr(0))
def _write_float(f, x):
import math
if x < 0:
sign = 0x8000
x = x * -1
else:
sign = 0
if x == 0:
expon = 0
himant = 0
lomant = 0
else:
fmant, expon = math.frexp(x)
if expon > 16384 or fmant >= 1 or fmant != fmant: # Infinity or NaN
expon = sign|0x7FFF
himant = 0
lomant = 0
else: # Finite
expon = expon + 16382
if expon < 0: # denormalized
fmant = math.ldexp(fmant, expon)
expon = 0
expon = expon | sign
fmant = math.ldexp(fmant, 32)
fsmant = math.floor(fmant)
himant = long(fsmant)
fmant = math.ldexp(fmant - fsmant, 32)
fsmant = math.floor(fmant)
lomant = long(fsmant)
_write_ushort(f, expon)
_write_ulong(f, himant)
_write_ulong(f, lomant)
from chunk import Chunk
class Aifc_read:
# Variables used in this class:
#
# These variables are available to the user though appropriate
# methods of this class:
# _file -- the open file with methods read(), close(), and seek()
# set through the __init__() method
# _nchannels -- the number of audio channels
# available through the getnchannels() method
# _nframes -- the number of audio frames
# available through the getnframes() method
# _sampwidth -- the number of bytes per audio sample
# available through the getsampwidth() method
# _framerate -- the sampling frequency
# available through the getframerate() method
# _comptype -- the AIFF-C compression type ('NONE' if AIFF)
# available through the getcomptype() method
# _compname -- the human-readable AIFF-C compression type
# available through the getcomptype() method
# _markers -- the marks in the audio file
# available through the getmarkers() and getmark()
# methods
# _soundpos -- the position in the audio stream
# available through the tell() method, set through the
# setpos() method
#
# These variables are used internally only:
# _version -- the AIFF-C version number
# _decomp -- the decompressor from builtin module cl
# _comm_chunk_read -- 1 iff the COMM chunk has been read
# _aifc -- 1 iff reading an AIFF-C file
# _ssnd_seek_needed -- 1 iff positioned correctly in audio
# file for readframes()
# _ssnd_chunk -- instantiation of a chunk class for the SSND chunk
# _framesize -- size of one frame in the file
_file = None # Set here since __del__ checks it
def initfp(self, file):
self._version = 0
self._decomp = None
self._convert = None
self._markers = []
self._soundpos = 0
self._file = file
chunk = Chunk(file)
if chunk.getname() != 'FORM':
raise Error, 'file does not start with FORM id'
formdata = chunk.read(4)
if formdata == 'AIFF':
self._aifc = 0
elif formdata == 'AIFC':
self._aifc = 1
else:
raise Error, 'not an AIFF or AIFF-C file'
self._comm_chunk_read = 0
self._ssnd_chunk = None
while 1:
self._ssnd_seek_needed = 1
try:
chunk = Chunk(self._file)
except EOFError:
break
chunkname = chunk.getname()
if chunkname == 'COMM':
self._read_comm_chunk(chunk)
self._comm_chunk_read = 1
elif chunkname == 'SSND':
self._ssnd_chunk = chunk
dummy = chunk.read(8)
self._ssnd_seek_needed = 0
elif chunkname == 'FVER':
self._version = _read_ulong(chunk)
elif chunkname == 'MARK':
self._readmark(chunk)
chunk.skip()
if not self._comm_chunk_read or not self._ssnd_chunk:
raise Error, 'COMM chunk and/or SSND chunk missing'
if self._aifc and self._decomp:
import cl
params = [cl.ORIGINAL_FORMAT, 0,
cl.BITS_PER_COMPONENT, self._sampwidth * 8,
cl.FRAME_RATE, self._framerate]
if self._nchannels == 1:
params[1] = cl.MONO
elif self._nchannels == 2:
params[1] = cl.STEREO_INTERLEAVED
else:
raise Error, 'cannot compress more than 2 channels'
self._decomp.SetParams(params)
def __init__(self, f):
if isinstance(f, basestring):
f = __builtin__.open(f, 'rb')
try:
self.initfp(f)
except:
f.close()
raise
else:
# assume it is an open file object already
self.initfp(f)
#
# User visible methods.
#
def getfp(self):
return self._file
def rewind(self):
self._ssnd_seek_needed = 1
self._soundpos = 0
def close(self):
decomp = self._decomp
try:
if decomp:
self._decomp = None
decomp.CloseDecompressor()
finally:
self._file.close()
def tell(self):
return self._soundpos
def getnchannels(self):
return self._nchannels
def getnframes(self):
return self._nframes
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
## def getversion(self):
## return self._version
def getparams(self):
return self.getnchannels(), self.getsampwidth(), \
self.getframerate(), self.getnframes(), \
self.getcomptype(), self.getcompname()
def getmarkers(self):
if len(self._markers) == 0:
return None
return self._markers
def getmark(self, id):
for marker in self._markers:
if id == marker[0]:
return marker
raise Error, 'marker %r does not exist' % (id,)
def setpos(self, pos):
if pos < 0 or pos > self._nframes:
raise Error, 'position not in range'
self._soundpos = pos
self._ssnd_seek_needed = 1
def readframes(self, nframes):
if self._ssnd_seek_needed:
self._ssnd_chunk.seek(0)
dummy = self._ssnd_chunk.read(8)
pos = self._soundpos * self._framesize
if pos:
self._ssnd_chunk.seek(pos + 8)
self._ssnd_seek_needed = 0
if nframes == 0:
return ''
data = self._ssnd_chunk.read(nframes * self._framesize)
if self._convert and data:
data = self._convert(data)
self._soundpos = self._soundpos + len(data) // (self._nchannels * self._sampwidth)
return data
#
# Internal methods.
#
def _decomp_data(self, data):
import cl
dummy = self._decomp.SetParam(cl.FRAME_BUFFER_SIZE,
len(data) * 2)
return self._decomp.Decompress(len(data) // self._nchannels,
data)
def _ulaw2lin(self, data):
import audioop
return audioop.ulaw2lin(data, 2)
def _adpcm2lin(self, data):
import audioop
if not hasattr(self, '_adpcmstate'):
# first time
self._adpcmstate = None
data, self._adpcmstate = audioop.adpcm2lin(data, 2,
self._adpcmstate)
return data
def _read_comm_chunk(self, chunk):
self._nchannels = _read_short(chunk)
self._nframes = _read_long(chunk)
self._sampwidth = (_read_short(chunk) + 7) // 8
self._framerate = int(_read_float(chunk))
self._framesize = self._nchannels * self._sampwidth
if self._aifc:
#DEBUG: SGI's soundeditor produces a bad size :-(
kludge = 0
if chunk.chunksize == 18:
kludge = 1
print 'Warning: bad COMM chunk size'
chunk.chunksize = 23
#DEBUG end
self._comptype = chunk.read(4)
#DEBUG start
if kludge:
length = ord(chunk.file.read(1))
if length & 1 == 0:
length = length + 1
chunk.chunksize = chunk.chunksize + length
chunk.file.seek(-1, 1)
#DEBUG end
self._compname = _read_string(chunk)
if self._comptype != 'NONE':
if self._comptype == 'G722':
try:
import audioop
except ImportError:
pass
else:
self._convert = self._adpcm2lin
self._sampwidth = 2
return
# for ULAW and ALAW try Compression Library
try:
import cl
except ImportError:
if self._comptype in ('ULAW', 'ulaw'):
try:
import audioop
self._convert = self._ulaw2lin
self._sampwidth = 2
return
except ImportError:
pass
raise Error, 'cannot read compressed AIFF-C files'
if self._comptype in ('ULAW', 'ulaw'):
scheme = cl.G711_ULAW
elif self._comptype in ('ALAW', 'alaw'):
scheme = cl.G711_ALAW
else:
raise Error, 'unsupported compression type'
self._decomp = cl.OpenDecompressor(scheme)
self._convert = self._decomp_data
self._sampwidth = 2
else:
self._comptype = 'NONE'
self._compname = 'not compressed'
def _readmark(self, chunk):
nmarkers = _read_short(chunk)
# Some files appear to contain invalid counts.
# Cope with this by testing for EOF.
try:
for i in range(nmarkers):
id = _read_short(chunk)
pos = _read_long(chunk)
name = _read_string(chunk)
if pos or name:
# some files appear to have
# dummy markers consisting of
# a position 0 and name ''
self._markers.append((id, pos, name))
except EOFError:
print 'Warning: MARK chunk contains only',
print len(self._markers),
if len(self._markers) == 1: print 'marker',
else: print 'markers',
print 'instead of', nmarkers
class Aifc_write:
# Variables used in this class:
#
# These variables are user settable through appropriate methods
# of this class:
# _file -- the open file with methods write(), close(), tell(), seek()
# set through the __init__() method
# _comptype -- the AIFF-C compression type ('NONE' in AIFF)
# set through the setcomptype() or setparams() method
# _compname -- the human-readable AIFF-C compression type
# set through the setcomptype() or setparams() method
# _nchannels -- the number of audio channels
# set through the setnchannels() or setparams() method
# _sampwidth -- the number of bytes per audio sample
# set through the setsampwidth() or setparams() method
# _framerate -- the sampling frequency
# set through the setframerate() or setparams() method
# _nframes -- the number of audio frames written to the header
# set through the setnframes() or setparams() method
# _aifc -- whether we're writing an AIFF-C file or an AIFF file
# set through the aifc() method, reset through the
# aiff() method
#
# These variables are used internally only:
# _version -- the AIFF-C version number
# _comp -- the compressor from builtin module cl
# _nframeswritten -- the number of audio frames actually written
# _datalength -- the size of the audio samples written to the header
# _datawritten -- the size of the audio samples actually written
_file = None # Set here since __del__ checks it
def __init__(self, f):
if isinstance(f, basestring):
filename = f
f = __builtin__.open(f, 'wb')
else:
# else, assume it is an open file object already
filename = '???'
self.initfp(f)
if filename[-5:] == '.aiff':
self._aifc = 0
else:
self._aifc = 1
def initfp(self, file):
self._file = file
self._version = _AIFC_version
self._comptype = 'NONE'
self._compname = 'not compressed'
self._comp = None
self._convert = None
self._nchannels = 0
self._sampwidth = 0
self._framerate = 0
self._nframes = 0
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._markers = []
self._marklength = 0
self._aifc = 1 # AIFF-C is default
def __del__(self):
if self._file:
self.close()
#
# User visible methods.
#
def aiff(self):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
self._aifc = 0
def aifc(self):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
self._aifc = 1
def setnchannels(self, nchannels):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if nchannels < 1:
raise Error, 'bad # of channels'
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error, 'number of channels not set'
return self._nchannels
def setsampwidth(self, sampwidth):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if sampwidth < 1 or sampwidth > 4:
raise Error, 'bad sample width'
self._sampwidth = sampwidth
def getsampwidth(self):
if not self._sampwidth:
raise Error, 'sample width not set'
return self._sampwidth
def setframerate(self, framerate):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if framerate <= 0:
raise Error, 'bad frame rate'
self._framerate = framerate
def getframerate(self):
if not self._framerate:
raise Error, 'frame rate not set'
return self._framerate
def setnframes(self, nframes):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, comptype, compname):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if comptype not in ('NONE', 'ULAW', 'ulaw', 'ALAW', 'alaw', 'G722'):
raise Error, 'unsupported compression type'
self._comptype = comptype
self._compname = compname
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
## def setversion(self, version):
## if self._nframeswritten:
## raise Error, 'cannot change parameters after starting to write'
## self._version = version
def setparams(self, info):
nchannels, sampwidth, framerate, nframes, comptype, compname = info
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if comptype not in ('NONE', 'ULAW', 'ulaw', 'ALAW', 'alaw', 'G722'):
raise Error, 'unsupported compression type'
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
if not self._nchannels or not self._sampwidth or not self._framerate:
raise Error, 'not all parameters set'
return self._nchannels, self._sampwidth, self._framerate, \
self._nframes, self._comptype, self._compname
def setmark(self, id, pos, name):
if id <= 0:
raise Error, 'marker ID must be > 0'
if pos < 0:
raise Error, 'marker position must be >= 0'
if type(name) != type(''):
raise Error, 'marker name must be a string'
for i in range(len(self._markers)):
if id == self._markers[i][0]:
self._markers[i] = id, pos, name
return
self._markers.append((id, pos, name))
def getmark(self, id):
for marker in self._markers:
if id == marker[0]:
return marker
raise Error, 'marker %r does not exist' % (id,)
def getmarkers(self):
if len(self._markers) == 0:
return None
return self._markers
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
self._ensure_header_written(len(data))
nframes = len(data) // (self._sampwidth * self._nchannels)
if self._convert:
data = self._convert(data)
self._file.write(data)
self._nframeswritten = self._nframeswritten + nframes
self._datawritten = self._datawritten + len(data)
def writeframes(self, data):
self.writeframesraw(data)
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
def close(self):
if self._file is None:
return
try:
self._ensure_header_written(0)
if self._datawritten & 1:
# quick pad to even size
self._file.write(chr(0))
self._datawritten = self._datawritten + 1
self._writemarkers()
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten or \
self._marklength:
self._patchheader()
if self._comp:
self._comp.CloseCompressor()
self._comp = None
finally:
# Prevent ref cycles
self._convert = None
f = self._file
self._file = None
f.close()
#
# Internal methods.
#
def _comp_data(self, data):
import cl
dummy = self._comp.SetParam(cl.FRAME_BUFFER_SIZE, len(data))
dummy = self._comp.SetParam(cl.COMPRESSED_BUFFER_SIZE, len(data))
return self._comp.Compress(self._nframes, data)
def _lin2ulaw(self, data):
import audioop
return audioop.lin2ulaw(data, 2)
def _lin2adpcm(self, data):
import audioop
if not hasattr(self, '_adpcmstate'):
self._adpcmstate = None
data, self._adpcmstate = audioop.lin2adpcm(data, 2,
self._adpcmstate)
return data
def _ensure_header_written(self, datasize):
if not self._nframeswritten:
if self._comptype in ('ULAW', 'ulaw', 'ALAW', 'alaw'):
if not self._sampwidth:
self._sampwidth = 2
if self._sampwidth != 2:
raise Error, 'sample width must be 2 when compressing with ULAW or ALAW'
if self._comptype == 'G722':
if not self._sampwidth:
self._sampwidth = 2
if self._sampwidth != 2:
                    raise Error, 'sample width must be 2 when compressing with G.722 (ADPCM)'
if not self._nchannels:
raise Error, '# channels not specified'
if not self._sampwidth:
raise Error, 'sample width not specified'
if not self._framerate:
raise Error, 'sampling rate not specified'
self._write_header(datasize)
def _init_compression(self):
if self._comptype == 'G722':
self._convert = self._lin2adpcm
return
try:
import cl
except ImportError:
if self._comptype in ('ULAW', 'ulaw'):
try:
import audioop
self._convert = self._lin2ulaw
return
except ImportError:
pass
raise Error, 'cannot write compressed AIFF-C files'
if self._comptype in ('ULAW', 'ulaw'):
scheme = cl.G711_ULAW
elif self._comptype in ('ALAW', 'alaw'):
scheme = cl.G711_ALAW
else:
raise Error, 'unsupported compression type'
self._comp = cl.OpenCompressor(scheme)
params = [cl.ORIGINAL_FORMAT, 0,
cl.BITS_PER_COMPONENT, self._sampwidth * 8,
cl.FRAME_RATE, self._framerate,
cl.FRAME_BUFFER_SIZE, 100,
cl.COMPRESSED_BUFFER_SIZE, 100]
if self._nchannels == 1:
params[1] = cl.MONO
elif self._nchannels == 2:
params[1] = cl.STEREO_INTERLEAVED
else:
raise Error, 'cannot compress more than 2 channels'
self._comp.SetParams(params)
# the compressor produces a header which we ignore
dummy = self._comp.Compress(0, '')
self._convert = self._comp_data
def _write_header(self, initlength):
if self._aifc and self._comptype != 'NONE':
self._init_compression()
self._file.write('FORM')
if not self._nframes:
self._nframes = initlength // (self._nchannels * self._sampwidth)
self._datalength = self._nframes * self._nchannels * self._sampwidth
if self._datalength & 1:
self._datalength = self._datalength + 1
if self._aifc:
if self._comptype in ('ULAW', 'ulaw', 'ALAW', 'alaw'):
self._datalength = self._datalength // 2
if self._datalength & 1:
self._datalength = self._datalength + 1
elif self._comptype == 'G722':
self._datalength = (self._datalength + 3) // 4
if self._datalength & 1:
self._datalength = self._datalength + 1
try:
self._form_length_pos = self._file.tell()
except (AttributeError, IOError):
self._form_length_pos = None
commlength = self._write_form_length(self._datalength)
if self._aifc:
self._file.write('AIFC')
self._file.write('FVER')
_write_ulong(self._file, 4)
_write_ulong(self._file, self._version)
else:
self._file.write('AIFF')
self._file.write('COMM')
_write_ulong(self._file, commlength)
_write_short(self._file, self._nchannels)
if self._form_length_pos is not None:
self._nframes_pos = self._file.tell()
_write_ulong(self._file, self._nframes)
if self._comptype in ('ULAW', 'ulaw', 'ALAW', 'alaw', 'G722'):
_write_short(self._file, 8)
else:
_write_short(self._file, self._sampwidth * 8)
_write_float(self._file, self._framerate)
if self._aifc:
self._file.write(self._comptype)
_write_string(self._file, self._compname)
self._file.write('SSND')
if self._form_length_pos is not None:
self._ssnd_length_pos = self._file.tell()
_write_ulong(self._file, self._datalength + 8)
_write_ulong(self._file, 0)
_write_ulong(self._file, 0)
def _write_form_length(self, datalength):
if self._aifc:
commlength = 18 + 5 + len(self._compname)
if commlength & 1:
commlength = commlength + 1
verslength = 12
else:
commlength = 18
verslength = 0
_write_ulong(self._file, 4 + verslength + self._marklength + \
8 + commlength + 16 + datalength)
return commlength
def _patchheader(self):
curpos = self._file.tell()
if self._datawritten & 1:
datalength = self._datawritten + 1
self._file.write(chr(0))
else:
datalength = self._datawritten
if datalength == self._datalength and \
self._nframes == self._nframeswritten and \
self._marklength == 0:
self._file.seek(curpos, 0)
return
self._file.seek(self._form_length_pos, 0)
dummy = self._write_form_length(datalength)
self._file.seek(self._nframes_pos, 0)
_write_ulong(self._file, self._nframeswritten)
self._file.seek(self._ssnd_length_pos, 0)
_write_ulong(self._file, datalength + 8)
self._file.seek(curpos, 0)
self._nframes = self._nframeswritten
self._datalength = datalength
def _writemarkers(self):
if len(self._markers) == 0:
return
self._file.write('MARK')
length = 2
for marker in self._markers:
id, pos, name = marker
length = length + len(name) + 1 + 6
if len(name) & 1 == 0:
length = length + 1
_write_ulong(self._file, length)
self._marklength = length + 8
_write_short(self._file, len(self._markers))
for marker in self._markers:
id, pos, name = marker
_write_short(self._file, id)
_write_ulong(self._file, pos)
_write_string(self._file, name)
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Aifc_read(f)
elif mode in ('w', 'wb'):
return Aifc_write(f)
else:
raise Error, "mode must be 'r', 'rb', 'w', or 'wb'"
openfp = open # B/W compatibility
if __name__ == '__main__':
import sys
if not sys.argv[1:]:
sys.argv.append('/usr/demos/data/audio/bach.aiff')
fn = sys.argv[1]
f = open(fn, 'r')
try:
print "Reading", fn
print "nchannels =", f.getnchannels()
print "nframes =", f.getnframes()
print "sampwidth =", f.getsampwidth()
print "framerate =", f.getframerate()
print "comptype =", f.getcomptype()
print "compname =", f.getcompname()
if sys.argv[2:]:
gn = sys.argv[2]
print "Writing", gn
g = open(gn, 'w')
try:
g.setparams(f.getparams())
while 1:
data = f.readframes(1024)
if not data:
break
g.writeframes(data)
finally:
g.close()
print "Done."
finally:
f.close()
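A minimal round-trip sketch (not part of aifc.py; 'demo.aiff' is an illustrative file name): write one second of 16-bit mono silence, then read the parameters back.

import aifc

g = aifc.open('demo.aiff', 'w')     # the '.aiff' extension selects AIFF rather than AIFF-C
g.setnchannels(1)                   # mono
g.setsampwidth(2)                   # 2 bytes per sample = 16 bit
g.setframerate(8000)                # 8000 frames per second
g.writeframes('\x00\x00' * 8000)    # one second of silence
g.close()                           # patches the sizes in the header

f = aifc.open('demo.aiff', 'r')
print f.getparams()                 # (1, 2, 8000, 8000, 'NONE', 'not compressed')
f.close()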
import webbrowser
webbrowser.open("http://xkcd.com/353/")
"""Generic interface to all dbm clones.
Instead of
import dbm
d = dbm.open(file, 'w', 0666)
use
import anydbm
d = anydbm.open(file, 'w')
The returned object is a dbhash, gdbm, dbm or dumbdbm object,
dependent on the type of database being opened (determined by whichdb
module) in the case of an existing dbm. If the dbm does not exist and
the create or new flag ('c' or 'n') was specified, the dbm type will
be determined by the availability of the modules (tested in the above
order).
It has the following interface (key and data are strings):
d[key] = data # store data at key (may override data at
# existing key)
data = d[key] # retrieve data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
list = d.keys() # return a list of all existing keys (slow!)
Future versions may change the order in which implementations are
tested for existence, and add interfaces to other dbm-like
implementations.
"""
class error(Exception):
pass
_names = ['dbhash', 'gdbm', 'dbm', 'dumbdbm']
_errors = [error]
_defaultmod = None
for _name in _names:
try:
_mod = __import__(_name)
except ImportError:
continue
if not _defaultmod:
_defaultmod = _mod
_errors.append(_mod.error)
if not _defaultmod:
raise ImportError, "no dbm clone found; tried %s" % _names
error = tuple(_errors)
def open(file, flag='r', mode=0666):
"""Open or create database at path given by *file*.
Optional argument *flag* can be 'r' (default) for read-only access, 'w'
for read-write access of an existing database, 'c' for read-write access
to a new or existing database, and 'n' for read-write access to a new
database.
Note: 'r' and 'w' fail if the database doesn't exist; 'c' creates it
only if it doesn't exist; and 'n' always creates a new database.
"""
# guess the type of an existing database
from whichdb import whichdb
result=whichdb(file)
if result is None:
# db doesn't exist
if 'c' in flag or 'n' in flag:
# file doesn't exist and the new
# flag was used so use default type
mod = _defaultmod
else:
raise error, "need 'c' or 'n' flag to open new db"
elif result == "":
# db type cannot be determined
raise error, "db type could not be determined"
else:
mod = __import__(result)
return mod.open(file, flag, mode)
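A minimal sketch (not part of anydbm.py; the path is illustrative). The 'c' flag opens the database read-write and creates it if it does not exist; the backend is whichever dbm clone is available:

import anydbm

db = anydbm.open('/tmp/demo.db', 'c')
db['language'] = 'IronPython'    # keys and values must be strings
print db['language']             # prints: IronPython
print 'language' in db           # True
del db['language']
db.close()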
# Author: Steven J. Bethard <[email protected]>.
"""Command-line parsing library
This module is an optparse-inspired command-line parsing library that:
- handles both optional and positional arguments
- produces highly informative usage messages
- supports parsers that dispatch to sub-parsers
The following is a simple usage example that sums integers from the
command-line and writes the result to a file::
parser = argparse.ArgumentParser(
description='sum the integers at the command line')
parser.add_argument(
'integers', metavar='int', nargs='+', type=int,
help='an integer to be summed')
parser.add_argument(
'--log', default=sys.stdout, type=argparse.FileType('w'),
help='the file where the sum should be written')
args = parser.parse_args()
args.log.write('%s' % sum(args.integers))
args.log.close()
The module contains the following public classes:
- ArgumentParser -- The main entry point for command-line parsing. As the
example above shows, the add_argument() method is used to populate
the parser with actions for optional and positional arguments. Then
the parse_args() method is invoked to convert the args at the
command-line into an object with attributes.
- ArgumentError -- The exception raised by ArgumentParser objects when
there are errors with the parser's actions. Errors raised while
parsing the command-line are caught by ArgumentParser and emitted
as command-line messages.
- FileType -- A factory for defining types of files to be created. As the
example above shows, instances of FileType are typically passed as
the type= argument of add_argument() calls.
- Action -- The base class for parser actions. Typically actions are
selected by passing strings like 'store_true' or 'append_const' to
the action= argument of add_argument(). However, for greater
customization of ArgumentParser actions, subclasses of Action may
be defined and passed as the action= argument.
- HelpFormatter, RawDescriptionHelpFormatter, RawTextHelpFormatter,
ArgumentDefaultsHelpFormatter -- Formatter classes which
may be passed as the formatter_class= argument to the
ArgumentParser constructor. HelpFormatter is the default,
RawDescriptionHelpFormatter and RawTextHelpFormatter tell the parser
not to change the formatting for help text, and
ArgumentDefaultsHelpFormatter adds information about argument defaults
to the help.
All other classes in this module are considered implementation details.
(Also note that HelpFormatter and RawDescriptionHelpFormatter are only
considered public as object names -- the API of the formatter objects is
still considered an implementation detail.)
"""
__version__ = '1.1'
__all__ = [
'ArgumentParser',
'ArgumentError',
'ArgumentTypeError',
'FileType',
'HelpFormatter',
'ArgumentDefaultsHelpFormatter',
'RawDescriptionHelpFormatter',
'RawTextHelpFormatter',
'Namespace',
'Action',
'ONE_OR_MORE',
'OPTIONAL',
'PARSER',
'REMAINDER',
'SUPPRESS',
'ZERO_OR_MORE',
]
import collections as _collections
import copy as _copy
import os as _os
import re as _re
import sys as _sys
import textwrap as _textwrap
from gettext import gettext as _
def _callable(obj):
return hasattr(obj, '__call__') or hasattr(obj, '__bases__')
SUPPRESS = '==SUPPRESS=='
OPTIONAL = '?'
ZERO_OR_MORE = '*'
ONE_OR_MORE = '+'
PARSER = 'A...'
REMAINDER = '...'
_UNRECOGNIZED_ARGS_ATTR = '_unrecognized_args'
# =============================
# Utility functions and classes
# =============================
class _AttributeHolder(object):
"""Abstract base class that provides __repr__.
The __repr__ method returns a string in the format::
ClassName(attr=name, attr=name, ...)
The attributes are determined either by a class-level attribute,
'_kwarg_names', or by inspecting the instance __dict__.
"""
def __repr__(self):
type_name = type(self).__name__
arg_strings = []
for arg in self._get_args():
arg_strings.append(repr(arg))
for name, value in self._get_kwargs():
arg_strings.append('%s=%r' % (name, value))
return '%s(%s)' % (type_name, ', '.join(arg_strings))
def _get_kwargs(self):
return sorted(self.__dict__.items())
def _get_args(self):
return []
def _ensure_value(namespace, name, value):
if getattr(namespace, name, None) is None:
setattr(namespace, name, value)
return getattr(namespace, name)
# ===============
# Formatting Help
# ===============
class HelpFormatter(object):
"""Formatter for generating usage messages and argument help strings.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def __init__(self,
prog,
indent_increment=2,
max_help_position=24,
width=None):
# default setting for width
if width is None:
try:
width = int(_os.environ['COLUMNS'])
except (KeyError, ValueError):
width = 80
width -= 2
self._prog = prog
self._indent_increment = indent_increment
        self._max_help_position = min(max_help_position,
                                      max(width - 20, indent_increment * 2))
self._width = width
self._current_indent = 0
self._level = 0
self._action_max_length = 0
self._root_section = self._Section(self, None)
self._current_section = self._root_section
self._whitespace_matcher = _re.compile(r'\s+')
self._long_break_matcher = _re.compile(r'\n\n\n+')
# ===============================
# Section and indentation methods
# ===============================
def _indent(self):
self._current_indent += self._indent_increment
self._level += 1
def _dedent(self):
self._current_indent -= self._indent_increment
assert self._current_indent >= 0, 'Indent decreased below 0.'
self._level -= 1
class _Section(object):
def __init__(self, formatter, parent, heading=None):
self.formatter = formatter
self.parent = parent
self.heading = heading
self.items = []
def format_help(self):
# format the indented section
if self.parent is not None:
self.formatter._indent()
join = self.formatter._join_parts
for func, args in self.items:
func(*args)
item_help = join([func(*args) for func, args in self.items])
if self.parent is not None:
self.formatter._dedent()
# return nothing if the section was empty
if not item_help:
return ''
# add the heading if the section was non-empty
if self.heading is not SUPPRESS and self.heading is not None:
current_indent = self.formatter._current_indent
heading = '%*s%s:\n' % (current_indent, '', self.heading)
else:
heading = ''
# join the section-initial newline, the heading and the help
return join(['\n', heading, item_help, '\n'])
def _add_item(self, func, args):
self._current_section.items.append((func, args))
# ========================
# Message building methods
# ========================
def start_section(self, heading):
self._indent()
section = self._Section(self, self._current_section, heading)
self._add_item(section.format_help, [])
self._current_section = section
def end_section(self):
self._current_section = self._current_section.parent
self._dedent()
def add_text(self, text):
if text is not SUPPRESS and text is not None:
self._add_item(self._format_text, [text])
def add_usage(self, usage, actions, groups, prefix=None):
if usage is not SUPPRESS:
args = usage, actions, groups, prefix
self._add_item(self._format_usage, args)
def add_argument(self, action):
if action.help is not SUPPRESS:
# find all invocations
get_invocation = self._format_action_invocation
invocations = [get_invocation(action)]
for subaction in self._iter_indented_subactions(action):
invocations.append(get_invocation(subaction))
# update the maximum item length
invocation_length = max([len(s) for s in invocations])
action_length = invocation_length + self._current_indent
self._action_max_length = max(self._action_max_length,
action_length)
# add the item to the list
self._add_item(self._format_action, [action])
def add_arguments(self, actions):
for action in actions:
self.add_argument(action)
# =======================
# Help-formatting methods
# =======================
def format_help(self):
help = self._root_section.format_help()
if help:
help = self._long_break_matcher.sub('\n\n', help)
help = help.strip('\n') + '\n'
return help
def _join_parts(self, part_strings):
return ''.join([part
for part in part_strings
if part and part is not SUPPRESS])
def _format_usage(self, usage, actions, groups, prefix):
if prefix is None:
prefix = _('usage: ')
# if usage is specified, use that
if usage is not None:
usage = usage % dict(prog=self._prog)
# if no optionals or positionals are available, usage is just prog
elif usage is None and not actions:
usage = '%(prog)s' % dict(prog=self._prog)
# if optionals and positionals are available, calculate usage
elif usage is None:
prog = '%(prog)s' % dict(prog=self._prog)
# split optionals from positionals
optionals = []
positionals = []
for action in actions:
if action.option_strings:
optionals.append(action)
else:
positionals.append(action)
# build full usage string
format = self._format_actions_usage
action_usage = format(optionals + positionals, groups)
usage = ' '.join([s for s in [prog, action_usage] if s])
# wrap the usage parts if it's too long
text_width = self._width - self._current_indent
if len(prefix) + len(usage) > text_width:
# break usage into wrappable parts
part_regexp = (
r'\(.*?\)+(?=\s|$)|'
r'\[.*?\]+(?=\s|$)|'
r'\S+'
)
opt_usage = format(optionals, groups)
pos_usage = format(positionals, groups)
opt_parts = _re.findall(part_regexp, opt_usage)
pos_parts = _re.findall(part_regexp, pos_usage)
assert ' '.join(opt_parts) == opt_usage
assert ' '.join(pos_parts) == pos_usage
# helper for wrapping lines
def get_lines(parts, indent, prefix=None):
lines = []
line = []
if prefix is not None:
line_len = len(prefix) - 1
else:
line_len = len(indent) - 1
for part in parts:
if line_len + 1 + len(part) > text_width and line:
lines.append(indent + ' '.join(line))
line = []
line_len = len(indent) - 1
line.append(part)
line_len += len(part) + 1
if line:
lines.append(indent + ' '.join(line))
if prefix is not None:
lines[0] = lines[0][len(indent):]
return lines
# if prog is short, follow it with optionals or positionals
if len(prefix) + len(prog) <= 0.75 * text_width:
indent = ' ' * (len(prefix) + len(prog) + 1)
if opt_parts:
lines = get_lines([prog] + opt_parts, indent, prefix)
lines.extend(get_lines(pos_parts, indent))
elif pos_parts:
lines = get_lines([prog] + pos_parts, indent, prefix)
else:
lines = [prog]
# if prog is long, put it on its own line
else:
indent = ' ' * len(prefix)
parts = opt_parts + pos_parts
lines = get_lines(parts, indent)
if len(lines) > 1:
lines = []
lines.extend(get_lines(opt_parts, indent))
lines.extend(get_lines(pos_parts, indent))
lines = [prog] + lines
# join lines into usage
usage = '\n'.join(lines)
# prefix with 'usage:'
return '%s%s\n\n' % (prefix, usage)
def _format_actions_usage(self, actions, groups):
# find group indices and identify actions in groups
group_actions = set()
inserts = {}
for group in groups:
try:
start = actions.index(group._group_actions[0])
except ValueError:
continue
else:
end = start + len(group._group_actions)
if actions[start:end] == group._group_actions:
for action in group._group_actions:
group_actions.add(action)
if not group.required:
if start in inserts:
inserts[start] += ' ['
else:
inserts[start] = '['
inserts[end] = ']'
else:
if start in inserts:
inserts[start] += ' ('
else:
inserts[start] = '('
inserts[end] = ')'
for i in range(start + 1, end):
inserts[i] = '|'
# collect all actions format strings
parts = []
for i, action in enumerate(actions):
# suppressed arguments are marked with None
# remove | separators for suppressed arguments
if action.help is SUPPRESS:
parts.append(None)
if inserts.get(i) == '|':
inserts.pop(i)
elif inserts.get(i + 1) == '|':
inserts.pop(i + 1)
# produce all arg strings
elif not action.option_strings:
part = self._format_args(action, action.dest)
# if it's in a group, strip the outer []
if action in group_actions:
if part[0] == '[' and part[-1] == ']':
part = part[1:-1]
# add the action string to the list
parts.append(part)
# produce the first way to invoke the option in brackets
else:
option_string = action.option_strings[0]
# if the Optional doesn't take a value, format is:
# -s or --long
if action.nargs == 0:
part = '%s' % option_string
# if the Optional takes a value, format is:
# -s ARGS or --long ARGS
else:
default = action.dest.upper()
args_string = self._format_args(action, default)
part = '%s %s' % (option_string, args_string)
# make it look optional if it's not required or in a group
if not action.required and action not in group_actions:
part = '[%s]' % part
# add the action string to the list
parts.append(part)
# insert things at the necessary indices
for i in sorted(inserts, reverse=True):
parts[i:i] = [inserts[i]]
# join all the action items with spaces
text = ' '.join([item for item in parts if item is not None])
# clean up separators for mutually exclusive groups
open = r'[\[(]'
close = r'[\])]'
text = _re.sub(r'(%s) ' % open, r'\1', text)
text = _re.sub(r' (%s)' % close, r'\1', text)
text = _re.sub(r'%s *%s' % (open, close), r'', text)
text = _re.sub(r'\(([^|]*)\)', r'\1', text)
text = text.strip()
# return the text
return text
def _format_text(self, text):
if '%(prog)' in text:
text = text % dict(prog=self._prog)
text_width = max(self._width - self._current_indent, 11)
indent = ' ' * self._current_indent
return self._fill_text(text, text_width, indent) + '\n\n'
def _format_action(self, action):
# determine the required width and the entry label
help_position = min(self._action_max_length + 2,
self._max_help_position)
help_width = max(self._width - help_position, 11)
action_width = help_position - self._current_indent - 2
action_header = self._format_action_invocation(action)
        # no help; start on same line and add a final newline
if not action.help:
tup = self._current_indent, '', action_header
action_header = '%*s%s\n' % tup
# short action name; start on the same line and pad two spaces
elif len(action_header) <= action_width:
tup = self._current_indent, '', action_width, action_header
action_header = '%*s%-*s ' % tup
indent_first = 0
# long action name; start on the next line
else:
tup = self._current_indent, '', action_header
action_header = '%*s%s\n' % tup
indent_first = help_position
# collect the pieces of the action help
parts = [action_header]
# if there was help for the action, add lines of help text
if action.help:
help_text = self._expand_help(action)
help_lines = self._split_lines(help_text, help_width)
parts.append('%*s%s\n' % (indent_first, '', help_lines[0]))
for line in help_lines[1:]:
parts.append('%*s%s\n' % (help_position, '', line))
# or add a newline if the description doesn't end with one
elif not action_header.endswith('\n'):
parts.append('\n')
# if there are any sub-actions, add their help as well
for subaction in self._iter_indented_subactions(action):
parts.append(self._format_action(subaction))
# return a single string
return self._join_parts(parts)
def _format_action_invocation(self, action):
if not action.option_strings:
metavar, = self._metavar_formatter(action, action.dest)(1)
return metavar
else:
parts = []
# if the Optional doesn't take a value, format is:
# -s, --long
if action.nargs == 0:
parts.extend(action.option_strings)
# if the Optional takes a value, format is:
# -s ARGS, --long ARGS
else:
default = action.dest.upper()
args_string = self._format_args(action, default)
for option_string in action.option_strings:
parts.append('%s %s' % (option_string, args_string))
return ', '.join(parts)
def _metavar_formatter(self, action, default_metavar):
if action.metavar is not None:
result = action.metavar
elif action.choices is not None:
choice_strs = [str(choice) for choice in action.choices]
result = '{%s}' % ','.join(choice_strs)
else:
result = default_metavar
def format(tuple_size):
if isinstance(result, tuple):
return result
else:
return (result, ) * tuple_size
return format
def _format_args(self, action, default_metavar):
get_metavar = self._metavar_formatter(action, default_metavar)
if action.nargs is None:
result = '%s' % get_metavar(1)
elif action.nargs == OPTIONAL:
result = '[%s]' % get_metavar(1)
elif action.nargs == ZERO_OR_MORE:
result = '[%s [%s ...]]' % get_metavar(2)
elif action.nargs == ONE_OR_MORE:
result = '%s [%s ...]' % get_metavar(2)
elif action.nargs == REMAINDER:
result = '...'
elif action.nargs == PARSER:
result = '%s ...' % get_metavar(1)
else:
formats = ['%s' for _ in range(action.nargs)]
result = ' '.join(formats) % get_metavar(action.nargs)
return result
def _expand_help(self, action):
params = dict(vars(action), prog=self._prog)
for name in list(params):
if params[name] is SUPPRESS:
del params[name]
for name in list(params):
if hasattr(params[name], '__name__'):
params[name] = params[name].__name__
if params.get('choices') is not None:
choices_str = ', '.join([str(c) for c in params['choices']])
params['choices'] = choices_str
return self._get_help_string(action) % params
def _iter_indented_subactions(self, action):
try:
get_subactions = action._get_subactions
except AttributeError:
pass
else:
self._indent()
for subaction in get_subactions():
yield subaction
self._dedent()
def _split_lines(self, text, width):
text = self._whitespace_matcher.sub(' ', text).strip()
return _textwrap.wrap(text, width)
def _fill_text(self, text, width, indent):
text = self._whitespace_matcher.sub(' ', text).strip()
return _textwrap.fill(text, width, initial_indent=indent,
subsequent_indent=indent)
def _get_help_string(self, action):
return action.help
class RawDescriptionHelpFormatter(HelpFormatter):
"""Help message formatter which retains any formatting in descriptions.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _fill_text(self, text, width, indent):
return ''.join([indent + line for line in text.splitlines(True)])
class RawTextHelpFormatter(RawDescriptionHelpFormatter):
"""Help message formatter which retains formatting of all help text.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _split_lines(self, text, width):
return text.splitlines()
class ArgumentDefaultsHelpFormatter(HelpFormatter):
"""Help message formatter which adds default values to argument help.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _get_help_string(self, action):
help = action.help
if '%(default)' not in action.help:
if action.default is not SUPPRESS:
defaulting_nargs = [OPTIONAL, ZERO_OR_MORE]
if action.option_strings or action.nargs in defaulting_nargs:
help += ' (default: %(default)s)'
return help
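# --- Editor's illustrative sketch (not part of the original source) ---
# How the formatter classes above are wired in: only the class name is
# public API, and it is passed as formatter_class= when constructing a
# parser. The function name _demo_formatters is hypothetical.
def _demo_formatters():
    parser = ArgumentParser(
        prog='demo',
        description='line one\nline two',
        formatter_class=RawDescriptionHelpFormatter)
    parser.add_argument('--retries', type=int, default=3,
                        help='number of retries')
    # With ArgumentDefaultsHelpFormatter instead, the --retries help line
    # would end with ' (default: 3)'.
    return parser.format_help()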
# =====================
# Options and Arguments
# =====================
def _get_action_name(argument):
if argument is None:
return None
elif argument.option_strings:
return '/'.join(argument.option_strings)
elif argument.metavar not in (None, SUPPRESS):
return argument.metavar
elif argument.dest not in (None, SUPPRESS):
return argument.dest
else:
return None
class ArgumentError(Exception):
"""An error from creating or using an argument (optional or positional).
The string value of this exception is the message, augmented with
information about the argument that caused it.
"""
def __init__(self, argument, message):
self.argument_name = _get_action_name(argument)
self.message = message
def __str__(self):
if self.argument_name is None:
format = '%(message)s'
else:
format = 'argument %(argument_name)s: %(message)s'
return format % dict(message=self.message,
argument_name=self.argument_name)
class ArgumentTypeError(Exception):
"""An error from trying to convert a command line string to a type."""
pass
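# --- Editor's illustrative sketch (not part of the original source) ---
# A type= callable signals a bad value by raising ArgumentTypeError, which
# _get_value() below converts into a clean parser error. The name
# _positive_int is hypothetical.
def _positive_int(string):
    value = int(string)
    if value <= 0:
        raise ArgumentTypeError('%r is not a positive integer' % string)
    return value
# Usage: parser.add_argument('count', type=_positive_int)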
# ==============
# Action classes
# ==============
class Action(_AttributeHolder):
"""Information about how to convert command line strings to Python objects.
Action objects are used by an ArgumentParser to represent the information
needed to parse a single argument from one or more strings from the
command line. The keyword arguments to the Action constructor are also
all attributes of Action instances.
Keyword Arguments:
- option_strings -- A list of command-line option strings which
should be associated with this action.
- dest -- The name of the attribute to hold the created object(s)
- nargs -- The number of command-line arguments that should be
consumed. By default, one argument will be consumed and a single
value will be produced. Other values include:
- N (an integer) consumes N arguments (and produces a list)
- '?' consumes zero or one arguments
- '*' consumes zero or more arguments (and produces a list)
- '+' consumes one or more arguments (and produces a list)
Note that the difference between the default and nargs=1 is that
with the default, a single value will be produced, while with
nargs=1, a list containing a single value will be produced.
- const -- The value to be produced if the option is specified and the
option uses an action that takes no values.
- default -- The value to be produced if the option is not specified.
- type -- A callable that accepts a single string argument, and
returns the converted value. The standard Python types str, int,
float, and complex are useful examples of such callables. If None,
str is used.
- choices -- A container of values that should be allowed. If not None,
after a command-line argument has been converted to the appropriate
type, an exception will be raised if it is not a member of this
collection.
- required -- True if the action must always be specified at the
command line. This is only meaningful for optional command-line
arguments.
- help -- The help string describing the argument.
- metavar -- The name to be used for the option's argument with the
help string. If None, the 'dest' value will be used as the name.
"""
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
self.option_strings = option_strings
self.dest = dest
self.nargs = nargs
self.const = const
self.default = default
self.type = type
self.choices = choices
self.required = required
self.help = help
self.metavar = metavar
def _get_kwargs(self):
names = [
'option_strings',
'dest',
'nargs',
'const',
'default',
'type',
'choices',
'help',
'metavar',
]
return [(name, getattr(self, name)) for name in names]
def __call__(self, parser, namespace, values, option_string=None):
raise NotImplementedError(_('.__call__() not defined'))
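# --- Editor's illustrative sketch (not part of the original source) ---
# The minimal shape of a custom Action: __call__ receives the parser, the
# namespace being populated, the already-converted values, and the option
# string actually used. The class name _UpperAction is hypothetical.
class _UpperAction(Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # assumes the default nargs, so values is a single string
        setattr(namespace, self.dest, values.upper())
# Usage: parser.add_argument('--name', action=_UpperAction)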
class _StoreAction(Action):
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
if nargs == 0:
raise ValueError('nargs for store actions must be > 0; if you '
'have nothing to store, actions such as store '
'true or store const may be more appropriate')
if const is not None and nargs != OPTIONAL:
raise ValueError('nargs must be %r to supply const' % OPTIONAL)
super(_StoreAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=nargs,
const=const,
default=default,
type=type,
choices=choices,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, values)
class _StoreConstAction(Action):
def __init__(self,
option_strings,
dest,
const,
default=None,
required=False,
help=None,
metavar=None):
super(_StoreConstAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
const=const,
default=default,
required=required,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, self.const)
class _StoreTrueAction(_StoreConstAction):
def __init__(self,
option_strings,
dest,
default=False,
required=False,
help=None):
super(_StoreTrueAction, self).__init__(
option_strings=option_strings,
dest=dest,
const=True,
default=default,
required=required,
help=help)
class _StoreFalseAction(_StoreConstAction):
def __init__(self,
option_strings,
dest,
default=True,
required=False,
help=None):
super(_StoreFalseAction, self).__init__(
option_strings=option_strings,
dest=dest,
const=False,
default=default,
required=required,
help=help)
class _AppendAction(Action):
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
if nargs == 0:
raise ValueError('nargs for append actions must be > 0; if arg '
'strings are not supplying the value to append, '
'the append const action may be more appropriate')
if const is not None and nargs != OPTIONAL:
raise ValueError('nargs must be %r to supply const' % OPTIONAL)
super(_AppendAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=nargs,
const=const,
default=default,
type=type,
choices=choices,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
items = _copy.copy(_ensure_value(namespace, self.dest, []))
items.append(values)
setattr(namespace, self.dest, items)
class _AppendConstAction(Action):
def __init__(self,
option_strings,
dest,
const,
default=None,
required=False,
help=None,
metavar=None):
super(_AppendConstAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
const=const,
default=default,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
items = _copy.copy(_ensure_value(namespace, self.dest, []))
items.append(self.const)
setattr(namespace, self.dest, items)
class _CountAction(Action):
def __init__(self,
option_strings,
dest,
default=None,
required=False,
help=None):
super(_CountAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
default=default,
required=required,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
new_count = _ensure_value(namespace, self.dest, 0) + 1
setattr(namespace, self.dest, new_count)
class _HelpAction(Action):
def __init__(self,
option_strings,
dest=SUPPRESS,
default=SUPPRESS,
help=None):
super(_HelpAction, self).__init__(
option_strings=option_strings,
dest=dest,
default=default,
nargs=0,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
parser.print_help()
parser.exit()
class _VersionAction(Action):
def __init__(self,
option_strings,
version=None,
dest=SUPPRESS,
default=SUPPRESS,
help="show program's version number and exit"):
super(_VersionAction, self).__init__(
option_strings=option_strings,
dest=dest,
default=default,
nargs=0,
help=help)
self.version = version
def __call__(self, parser, namespace, values, option_string=None):
version = self.version
if version is None:
version = parser.version
formatter = parser._get_formatter()
formatter.add_text(version)
parser.exit(message=formatter.format_help())
class _SubParsersAction(Action):
class _ChoicesPseudoAction(Action):
def __init__(self, name, help):
sup = super(_SubParsersAction._ChoicesPseudoAction, self)
sup.__init__(option_strings=[], dest=name, help=help)
def __init__(self,
option_strings,
prog,
parser_class,
dest=SUPPRESS,
help=None,
metavar=None):
self._prog_prefix = prog
self._parser_class = parser_class
self._name_parser_map = _collections.OrderedDict()
self._choices_actions = []
super(_SubParsersAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=PARSER,
choices=self._name_parser_map,
help=help,
metavar=metavar)
def add_parser(self, name, **kwargs):
# set prog from the existing prefix
if kwargs.get('prog') is None:
kwargs['prog'] = '%s %s' % (self._prog_prefix, name)
# create a pseudo-action to hold the choice help
if 'help' in kwargs:
help = kwargs.pop('help')
choice_action = self._ChoicesPseudoAction(name, help)
self._choices_actions.append(choice_action)
# create the parser and add it to the map
parser = self._parser_class(**kwargs)
self._name_parser_map[name] = parser
return parser
def _get_subactions(self):
return self._choices_actions
def __call__(self, parser, namespace, values, option_string=None):
parser_name = values[0]
arg_strings = values[1:]
# set the parser name if requested
if self.dest is not SUPPRESS:
setattr(namespace, self.dest, parser_name)
# select the parser
try:
parser = self._name_parser_map[parser_name]
except KeyError:
tup = parser_name, ', '.join(self._name_parser_map)
msg = _('unknown parser %r (choices: %s)') % tup
raise ArgumentError(self, msg)
# parse all the remaining options into the namespace
# store any unrecognized options on the object, so that the top
# level parser can decide what to do with them
# In case this subparser defines new defaults, we parse them
# in a new namespace object and then update the original
# namespace for the relevant parts.
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
for key, value in vars(subnamespace).items():
setattr(namespace, key, value)
if arg_strings:
vars(namespace).setdefault(_UNRECOGNIZED_ARGS_ATTR, [])
getattr(namespace, _UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)
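# --- Editor's illustrative sketch (not part of the original source) ---
# Typical wiring of _SubParsersAction through the public add_subparsers()
# API. The names _demo_subcommands and 'clone' are hypothetical.
def _demo_subcommands(argv):
    parser = ArgumentParser(prog='tool')
    subparsers = parser.add_subparsers(dest='command')
    clone = subparsers.add_parser('clone', help='copy a repository')
    clone.add_argument('url')
    return parser.parse_args(argv)
# _demo_subcommands(['clone', 'http://example.com/repo.git'])
# -> Namespace(command='clone', url='http://example.com/repo.git')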
# ==============
# Type classes
# ==============
class FileType(object):
"""Factory for creating file object types
Instances of FileType are typically passed as type= arguments to the
ArgumentParser add_argument() method.
Keyword Arguments:
- mode -- A string indicating how the file is to be opened. Accepts the
same values as the builtin open() function.
- bufsize -- The file's desired buffer size. Accepts the same values as
the builtin open() function.
"""
def __init__(self, mode='r', bufsize=-1):
self._mode = mode
self._bufsize = bufsize
def __call__(self, string):
# the special argument "-" means sys.std{in,out}
if string == '-':
if 'r' in self._mode:
return _sys.stdin
elif 'w' in self._mode:
return _sys.stdout
else:
msg = _('argument "-" with mode %r') % self._mode
raise ValueError(msg)
# all other arguments are used as file names
try:
return open(string, self._mode, self._bufsize)
except IOError as e:
message = _("can't open '%s': %s")
raise ArgumentTypeError(message % (string, e))
def __repr__(self):
args = self._mode, self._bufsize
args_str = ', '.join(repr(arg) for arg in args if arg != -1)
return '%s(%s)' % (type(self).__name__, args_str)
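# --- Editor's illustrative sketch (not part of the original source) ---
# FileType instances are passed as type= arguments; '-' maps to
# stdin/stdout as implemented above. The name _demo_filetype is
# hypothetical.
def _demo_filetype(argv):
    parser = ArgumentParser(prog='demo')
    parser.add_argument('infile', type=FileType('r'))
    return parser.parse_args(argv)
# _demo_filetype(['-']).infile is _sys.stdin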
# ===========================
# Optional and Positional Parsing
# ===========================
class Namespace(_AttributeHolder):
"""Simple object for storing attributes.
Implements equality by attribute names and values, and provides a simple
string representation.
"""
def __init__(self, **kwargs):
for name in kwargs:
setattr(self, name, kwargs[name])
__hash__ = None
def __eq__(self, other):
if not isinstance(other, Namespace):
return NotImplemented
return vars(self) == vars(other)
def __ne__(self, other):
if not isinstance(other, Namespace):
return NotImplemented
return not (self == other)
def __contains__(self, key):
return key in self.__dict__
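# --- Editor's illustrative sketch (not part of the original source) ---
# Namespace supports keyword construction, attribute access, equality and
# containment, as defined above. The name _demo_namespace is hypothetical.
def _demo_namespace():
    ns = Namespace(x=1)
    assert ns.x == 1
    assert 'x' in ns
    assert ns == Namespace(x=1)
    return ns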
class _ActionsContainer(object):
def __init__(self,
description,
prefix_chars,
argument_default,
conflict_handler):
super(_ActionsContainer, self).__init__()
self.description = description
self.argument_default = argument_default
self.prefix_chars = prefix_chars
self.conflict_handler = conflict_handler
# set up registries
self._registries = {}
# register actions
self.register('action', None, _StoreAction)
self.register('action', 'store', _StoreAction)
self.register('action', 'store_const', _StoreConstAction)
self.register('action', 'store_true', _StoreTrueAction)
self.register('action', 'store_false', _StoreFalseAction)
self.register('action', 'append', _AppendAction)
self.register('action', 'append_const', _AppendConstAction)
self.register('action', 'count', _CountAction)
self.register('action', 'help', _HelpAction)
self.register('action', 'version', _VersionAction)
self.register('action', 'parsers', _SubParsersAction)
# raise an exception if the conflict handler is invalid
self._get_handler()
# action storage
self._actions = []
self._option_string_actions = {}
# groups
self._action_groups = []
self._mutually_exclusive_groups = []
# defaults storage
self._defaults = {}
# determines whether an "option" looks like a negative number
self._negative_number_matcher = _re.compile(r'^-\d+$|^-\d*\.\d+$')
# whether or not there are any optionals that look like negative
# numbers -- uses a list so it can be shared and edited
self._has_negative_number_optionals = []
# ====================
# Registration methods
# ====================
def register(self, registry_name, value, object):
registry = self._registries.setdefault(registry_name, {})
registry[value] = object
def _registry_get(self, registry_name, value, default=None):
return self._registries[registry_name].get(value, default)
# ==================================
# Namespace default accessor methods
# ==================================
def set_defaults(self, **kwargs):
self._defaults.update(kwargs)
# if these defaults match any existing arguments, replace
# the previous default on the object with the new one
for action in self._actions:
if action.dest in kwargs:
action.default = kwargs[action.dest]
def get_default(self, dest):
for action in self._actions:
if action.dest == dest and action.default is not None:
return action.default
return self._defaults.get(dest, None)
# =======================
# Adding argument actions
# =======================
def add_argument(self, *args, **kwargs):
"""
add_argument(dest, ..., name=value, ...)
add_argument(option_string, option_string, ..., name=value, ...)
"""
# if no positional args are supplied or only one is supplied and
# it doesn't look like an option string, parse a positional
# argument
chars = self.prefix_chars
if not args or len(args) == 1 and args[0][0] not in chars:
if args and 'dest' in kwargs:
raise ValueError('dest supplied twice for positional argument')
kwargs = self._get_positional_kwargs(*args, **kwargs)
# otherwise, we're adding an optional argument
else:
kwargs = self._get_optional_kwargs(*args, **kwargs)
# if no default was supplied, use the parser-level default
if 'default' not in kwargs:
dest = kwargs['dest']
if dest in self._defaults:
kwargs['default'] = self._defaults[dest]
elif self.argument_default is not None:
kwargs['default'] = self.argument_default
# create the action object, and add it to the parser
action_class = self._pop_action_class(kwargs)
if not _callable(action_class):
raise ValueError('unknown action "%s"' % (action_class,))
action = action_class(**kwargs)
# raise an error if the action type is not callable
type_func = self._registry_get('type', action.type, action.type)
if not _callable(type_func):
raise ValueError('%r is not callable' % (type_func,))
# raise an error if the metavar does not match the type
if hasattr(self, "_get_formatter"):
try:
self._get_formatter()._format_args(action, None)
except TypeError:
raise ValueError("length of metavar tuple does not match nargs")
return self._add_action(action)
def add_argument_group(self, *args, **kwargs):
group = _ArgumentGroup(self, *args, **kwargs)
self._action_groups.append(group)
return group
def add_mutually_exclusive_group(self, **kwargs):
group = _MutuallyExclusiveGroup(self, **kwargs)
self._mutually_exclusive_groups.append(group)
return group
def _add_action(self, action):
# resolve any conflicts
self._check_conflict(action)
# add to actions list
self._actions.append(action)
action.container = self
# index the action by any option strings it has
for option_string in action.option_strings:
self._option_string_actions[option_string] = action
# set the flag if any option strings look like negative numbers
for option_string in action.option_strings:
if self._negative_number_matcher.match(option_string):
if not self._has_negative_number_optionals:
self._has_negative_number_optionals.append(True)
# return the created action
return action
def _remove_action(self, action):
self._actions.remove(action)
def _add_container_actions(self, container):
# collect groups by titles
title_group_map = {}
for group in self._action_groups:
if group.title in title_group_map:
msg = _('cannot merge actions - two groups are named %r')
raise ValueError(msg % (group.title))
title_group_map[group.title] = group
# map each action to its group
group_map = {}
for group in container._action_groups:
# if a group with the title exists, use that, otherwise
# create a new group matching the container's group
if group.title not in title_group_map:
title_group_map[group.title] = self.add_argument_group(
title=group.title,
description=group.description,
conflict_handler=group.conflict_handler)
# map the actions to their new group
for action in group._group_actions:
group_map[action] = title_group_map[group.title]
# add container's mutually exclusive groups
# NOTE: if add_mutually_exclusive_group ever gains title= and
# description= then this code will need to be expanded as above
for group in container._mutually_exclusive_groups:
mutex_group = self.add_mutually_exclusive_group(
required=group.required)
# map the actions to their new mutex group
for action in group._group_actions:
group_map[action] = mutex_group
# add all actions to this container or their group
for action in container._actions:
group_map.get(action, self)._add_action(action)
def _get_positional_kwargs(self, dest, **kwargs):
# make sure required is not specified
if 'required' in kwargs:
msg = _("'required' is an invalid argument for positionals")
raise TypeError(msg)
# mark positional arguments as required if at least one is
# always required
if kwargs.get('nargs') not in [OPTIONAL, ZERO_OR_MORE]:
kwargs['required'] = True
if kwargs.get('nargs') == ZERO_OR_MORE and 'default' not in kwargs:
kwargs['required'] = True
# return the keyword arguments with no option strings
return dict(kwargs, dest=dest, option_strings=[])
def _get_optional_kwargs(self, *args, **kwargs):
# determine short and long option strings
option_strings = []
long_option_strings = []
for option_string in args:
# error on strings that don't start with an appropriate prefix
if not option_string[0] in self.prefix_chars:
msg = _('invalid option string %r: '
'must start with a character %r')
tup = option_string, self.prefix_chars
raise ValueError(msg % tup)
# strings starting with two prefix characters are long options
option_strings.append(option_string)
if option_string[0] in self.prefix_chars:
if len(option_string) > 1:
if option_string[1] in self.prefix_chars:
long_option_strings.append(option_string)
# infer destination, '--foo-bar' -> 'foo_bar' and '-x' -> 'x'
dest = kwargs.pop('dest', None)
if dest is None:
if long_option_strings:
dest_option_string = long_option_strings[0]
else:
dest_option_string = option_strings[0]
dest = dest_option_string.lstrip(self.prefix_chars)
if not dest:
msg = _('dest= is required for options like %r')
raise ValueError(msg % option_string)
dest = dest.replace('-', '_')
# return the updated keyword arguments
return dict(kwargs, dest=dest, option_strings=option_strings)
def _pop_action_class(self, kwargs, default=None):
action = kwargs.pop('action', default)
return self._registry_get('action', action, action)
def _get_handler(self):
# determine function from conflict handler string
handler_func_name = '_handle_conflict_%s' % self.conflict_handler
try:
return getattr(self, handler_func_name)
except AttributeError:
            msg = _('invalid conflict handler value: %r')
raise ValueError(msg % self.conflict_handler)
def _check_conflict(self, action):
# find all options that conflict with this option
confl_optionals = []
for option_string in action.option_strings:
if option_string in self._option_string_actions:
confl_optional = self._option_string_actions[option_string]
confl_optionals.append((option_string, confl_optional))
# resolve any conflicts
if confl_optionals:
conflict_handler = self._get_handler()
conflict_handler(action, confl_optionals)
def _handle_conflict_error(self, action, conflicting_actions):
message = _('conflicting option string(s): %s')
conflict_string = ', '.join([option_string
for option_string, action
in conflicting_actions])
raise ArgumentError(action, message % conflict_string)
def _handle_conflict_resolve(self, action, conflicting_actions):
# remove all conflicting options
for option_string, action in conflicting_actions:
# remove the conflicting option
action.option_strings.remove(option_string)
self._option_string_actions.pop(option_string, None)
# if the option now has no option string, remove it from the
# container holding it
if not action.option_strings:
action.container._remove_action(action)
class _ArgumentGroup(_ActionsContainer):
def __init__(self, container, title=None, description=None, **kwargs):
# add any missing keyword arguments by checking the container
update = kwargs.setdefault
update('conflict_handler', container.conflict_handler)
update('prefix_chars', container.prefix_chars)
update('argument_default', container.argument_default)
super_init = super(_ArgumentGroup, self).__init__
super_init(description=description, **kwargs)
# group attributes
self.title = title
self._group_actions = []
# share most attributes with the container
self._registries = container._registries
self._actions = container._actions
self._option_string_actions = container._option_string_actions
self._defaults = container._defaults
self._has_negative_number_optionals = \
container._has_negative_number_optionals
self._mutually_exclusive_groups = container._mutually_exclusive_groups
def _add_action(self, action):
action = super(_ArgumentGroup, self)._add_action(action)
self._group_actions.append(action)
return action
def _remove_action(self, action):
super(_ArgumentGroup, self)._remove_action(action)
self._group_actions.remove(action)
class _MutuallyExclusiveGroup(_ArgumentGroup):
def __init__(self, container, required=False):
super(_MutuallyExclusiveGroup, self).__init__(container)
self.required = required
self._container = container
def _add_action(self, action):
if action.required:
msg = _('mutually exclusive arguments must be optional')
raise ValueError(msg)
action = self._container._add_action(action)
self._group_actions.append(action)
return action
def _remove_action(self, action):
self._container._remove_action(action)
self._group_actions.remove(action)
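# --- Editor's illustrative sketch (not part of the original source) ---
# A required mutually exclusive group built through the public
# add_mutually_exclusive_group() API. The name _demo_mutex is hypothetical.
def _demo_mutex(argv):
    parser = ArgumentParser(prog='demo')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--verbose', action='store_true')
    group.add_argument('--quiet', action='store_true')
    # Passing both options makes the parser exit with
    # "argument --quiet: not allowed with argument --verbose".
    return parser.parse_args(argv)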
class ArgumentParser(_AttributeHolder, _ActionsContainer):
"""Object for parsing command line strings into Python objects.
Keyword Arguments:
- prog -- The name of the program (default: sys.argv[0])
- usage -- A usage message (default: auto-generated from arguments)
- description -- A description of what the program does
- epilog -- Text following the argument descriptions
- parents -- Parsers whose arguments should be copied into this one
- formatter_class -- HelpFormatter class for printing help messages
- prefix_chars -- Characters that prefix optional arguments
- fromfile_prefix_chars -- Characters that prefix files containing
additional arguments
- argument_default -- The default value for all arguments
- conflict_handler -- String indicating how to handle conflicts
    - add_help -- Add a -h/--help option
"""
def __init__(self,
prog=None,
usage=None,
description=None,
epilog=None,
version=None,
parents=[],
formatter_class=HelpFormatter,
prefix_chars='-',
fromfile_prefix_chars=None,
argument_default=None,
conflict_handler='error',
add_help=True):
if version is not None:
import warnings
warnings.warn(
"""The "version" argument to ArgumentParser is deprecated. """
"""Please use """
""""add_argument(..., action='version', version="N", ...)" """
"""instead""", DeprecationWarning)
superinit = super(ArgumentParser, self).__init__
superinit(description=description,
prefix_chars=prefix_chars,
argument_default=argument_default,
conflict_handler=conflict_handler)
# default setting for prog
if prog is None:
prog = _os.path.basename(_sys.argv[0])
self.prog = prog
self.usage = usage
self.epilog = epilog
self.version = version
self.formatter_class = formatter_class
self.fromfile_prefix_chars = fromfile_prefix_chars
self.add_help = add_help
add_group = self.add_argument_group
self._positionals = add_group(_('positional arguments'))
self._optionals = add_group(_('optional arguments'))
self._subparsers = None
# register types
def identity(string):
return string
self.register('type', None, identity)
# add help and version arguments if necessary
# (using explicit default to override global argument_default)
default_prefix = '-' if '-' in prefix_chars else prefix_chars[0]
if self.add_help:
self.add_argument(
default_prefix+'h', default_prefix*2+'help',
action='help', default=SUPPRESS,
help=_('show this help message and exit'))
if self.version:
self.add_argument(
default_prefix+'v', default_prefix*2+'version',
action='version', default=SUPPRESS,
version=self.version,
help=_("show program's version number and exit"))
# add parent arguments and defaults
for parent in parents:
self._add_container_actions(parent)
try:
defaults = parent._defaults
except AttributeError:
pass
else:
self._defaults.update(defaults)
# =======================
# Pretty __repr__ methods
# =======================
def _get_kwargs(self):
names = [
'prog',
'usage',
'description',
'version',
'formatter_class',
'conflict_handler',
'add_help',
]
return [(name, getattr(self, name)) for name in names]
# ==================================
# Optional/Positional adding methods
# ==================================
def add_subparsers(self, **kwargs):
if self._subparsers is not None:
self.error(_('cannot have multiple subparser arguments'))
# add the parser class to the arguments if it's not present
kwargs.setdefault('parser_class', type(self))
if 'title' in kwargs or 'description' in kwargs:
title = _(kwargs.pop('title', 'subcommands'))
description = _(kwargs.pop('description', None))
self._subparsers = self.add_argument_group(title, description)
else:
self._subparsers = self._positionals
# prog defaults to the usage message of this parser, skipping
# optional arguments and with no "usage:" prefix
if kwargs.get('prog') is None:
formatter = self._get_formatter()
positionals = self._get_positional_actions()
groups = self._mutually_exclusive_groups
formatter.add_usage(self.usage, positionals, groups, '')
kwargs['prog'] = formatter.format_help().strip()
# create the parsers action and add it to the positionals list
parsers_class = self._pop_action_class(kwargs, 'parsers')
action = parsers_class(option_strings=[], **kwargs)
self._subparsers._add_action(action)
# return the created parsers action
return action
def _add_action(self, action):
if action.option_strings:
self._optionals._add_action(action)
else:
self._positionals._add_action(action)
return action
def _get_optional_actions(self):
return [action
for action in self._actions
if action.option_strings]
def _get_positional_actions(self):
return [action
for action in self._actions
if not action.option_strings]
# =====================================
# Command line argument parsing methods
# =====================================
def parse_args(self, args=None, namespace=None):
args, argv = self.parse_known_args(args, namespace)
if argv:
msg = _('unrecognized arguments: %s')
self.error(msg % ' '.join(argv))
return args
def parse_known_args(self, args=None, namespace=None):
if args is None:
# args default to the system args
args = _sys.argv[1:]
else:
# make sure that args are mutable
args = list(args)
# default Namespace built from parser defaults
if namespace is None:
namespace = Namespace()
# add any action defaults that aren't present
for action in self._actions:
if action.dest is not SUPPRESS:
if not hasattr(namespace, action.dest):
if action.default is not SUPPRESS:
setattr(namespace, action.dest, action.default)
# add any parser defaults that aren't present
for dest in self._defaults:
if not hasattr(namespace, dest):
setattr(namespace, dest, self._defaults[dest])
# parse the arguments and exit if there are any errors
try:
namespace, args = self._parse_known_args(args, namespace)
if hasattr(namespace, _UNRECOGNIZED_ARGS_ATTR):
args.extend(getattr(namespace, _UNRECOGNIZED_ARGS_ATTR))
delattr(namespace, _UNRECOGNIZED_ARGS_ATTR)
return namespace, args
except ArgumentError:
err = _sys.exc_info()[1]
self.error(str(err))
def _parse_known_args(self, arg_strings, namespace):
# replace arg strings that are file references
if self.fromfile_prefix_chars is not None:
arg_strings = self._read_args_from_files(arg_strings)
# map all mutually exclusive arguments to the other arguments
# they can't occur with
action_conflicts = {}
for mutex_group in self._mutually_exclusive_groups:
group_actions = mutex_group._group_actions
for i, mutex_action in enumerate(mutex_group._group_actions):
conflicts = action_conflicts.setdefault(mutex_action, [])
conflicts.extend(group_actions[:i])
conflicts.extend(group_actions[i + 1:])
# find all option indices, and determine the arg_string_pattern
# which has an 'O' if there is an option at an index,
# an 'A' if there is an argument, or a '-' if there is a '--'
option_string_indices = {}
arg_string_pattern_parts = []
arg_strings_iter = iter(arg_strings)
for i, arg_string in enumerate(arg_strings_iter):
# all args after -- are non-options
if arg_string == '--':
arg_string_pattern_parts.append('-')
for arg_string in arg_strings_iter:
arg_string_pattern_parts.append('A')
# otherwise, add the arg to the arg strings
# and note the index if it was an option
else:
option_tuple = self._parse_optional(arg_string)
if option_tuple is None:
pattern = 'A'
else:
option_string_indices[i] = option_tuple
pattern = 'O'
arg_string_pattern_parts.append(pattern)
# join the pieces together to form the pattern
arg_strings_pattern = ''.join(arg_string_pattern_parts)
        # convert arg strings to the appropriate objects and then take the action
seen_actions = set()
seen_non_default_actions = set()
def take_action(action, argument_strings, option_string=None):
seen_actions.add(action)
argument_values = self._get_values(action, argument_strings)
# error if this argument is not allowed with other previously
# seen arguments, assuming that actions that use the default
# value don't really count as "present"
if argument_values is not action.default:
seen_non_default_actions.add(action)
for conflict_action in action_conflicts.get(action, []):
if conflict_action in seen_non_default_actions:
msg = _('not allowed with argument %s')
action_name = _get_action_name(conflict_action)
raise ArgumentError(action, msg % action_name)
# take the action if we didn't receive a SUPPRESS value
# (e.g. from a default)
if argument_values is not SUPPRESS:
action(self, namespace, argument_values, option_string)
# function to convert arg_strings into an optional action
def consume_optional(start_index):
# get the optional identified at this index
option_tuple = option_string_indices[start_index]
action, option_string, explicit_arg = option_tuple
# identify additional optionals in the same arg string
# (e.g. -xyz is the same as -x -y -z if no args are required)
match_argument = self._match_argument
action_tuples = []
while True:
# if we found no optional action, skip it
if action is None:
extras.append(arg_strings[start_index])
return start_index + 1
# if there is an explicit argument, try to match the
# optional's string arguments to only this
if explicit_arg is not None:
arg_count = match_argument(action, 'A')
# if the action is a single-dash option and takes no
# arguments, try to parse more single-dash options out
# of the tail of the option string
chars = self.prefix_chars
if arg_count == 0 and option_string[1] not in chars:
action_tuples.append((action, [], option_string))
char = option_string[0]
option_string = char + explicit_arg[0]
new_explicit_arg = explicit_arg[1:] or None
optionals_map = self._option_string_actions
if option_string in optionals_map:
action = optionals_map[option_string]
explicit_arg = new_explicit_arg
else:
msg = _('ignored explicit argument %r')
raise ArgumentError(action, msg % explicit_arg)
                # if the action expects exactly one argument, we've
# successfully matched the option; exit the loop
elif arg_count == 1:
stop = start_index + 1
args = [explicit_arg]
action_tuples.append((action, args, option_string))
break
# error if a double-dash option did not use the
# explicit argument
else:
msg = _('ignored explicit argument %r')
raise ArgumentError(action, msg % explicit_arg)
# if there is no explicit argument, try to match the
# optional's string arguments with the following strings
# if successful, exit the loop
else:
start = start_index + 1
selected_patterns = arg_strings_pattern[start:]
arg_count = match_argument(action, selected_patterns)
stop = start + arg_count
args = arg_strings[start:stop]
action_tuples.append((action, args, option_string))
break
# add the Optional to the list and return the index at which
# the Optional's string args stopped
assert action_tuples
for action, args, option_string in action_tuples:
take_action(action, args, option_string)
return stop
# the list of Positionals left to be parsed; this is modified
# by consume_positionals()
positionals = self._get_positional_actions()
# function to convert arg_strings into positional actions
def consume_positionals(start_index):
# match as many Positionals as possible
match_partial = self._match_arguments_partial
selected_pattern = arg_strings_pattern[start_index:]
arg_counts = match_partial(positionals, selected_pattern)
# slice off the appropriate arg strings for each Positional
# and add the Positional and its args to the list
for action, arg_count in zip(positionals, arg_counts):
args = arg_strings[start_index: start_index + arg_count]
start_index += arg_count
take_action(action, args)
# slice off the Positionals that we just parsed and return the
# index at which the Positionals' string args stopped
positionals[:] = positionals[len(arg_counts):]
return start_index
# consume Positionals and Optionals alternately, until we have
# passed the last option string
extras = []
start_index = 0
if option_string_indices:
max_option_string_index = max(option_string_indices)
else:
max_option_string_index = -1
while start_index <= max_option_string_index:
# consume any Positionals preceding the next option
next_option_string_index = min([
index
for index in option_string_indices
if index >= start_index])
if start_index != next_option_string_index:
positionals_end_index = consume_positionals(start_index)
# only try to parse the next optional if we didn't consume
# the option string during the positionals parsing
if positionals_end_index > start_index:
start_index = positionals_end_index
continue
else:
start_index = positionals_end_index
# if we consumed all the positionals we could and we're not
# at the index of an option string, there were extra arguments
if start_index not in option_string_indices:
strings = arg_strings[start_index:next_option_string_index]
extras.extend(strings)
start_index = next_option_string_index
# consume the next optional and any arguments for it
start_index = consume_optional(start_index)
# consume any positionals following the last Optional
stop_index = consume_positionals(start_index)
# if we didn't consume all the argument strings, there were extras
extras.extend(arg_strings[stop_index:])
# if we didn't use all the Positional objects, there were too few
# arg strings supplied.
if positionals:
self.error(_('too few arguments'))
# make sure all required actions were present, and convert defaults.
for action in self._actions:
if action not in seen_actions:
if action.required:
name = _get_action_name(action)
self.error(_('argument %s is required') % name)
else:
# Convert action default now instead of doing it before
# parsing arguments to avoid calling convert functions
# twice (which may fail) if the argument was given, but
# only if it was defined already in the namespace
if (action.default is not None and
isinstance(action.default, basestring) and
hasattr(namespace, action.dest) and
action.default is getattr(namespace, action.dest)):
setattr(namespace, action.dest,
self._get_value(action, action.default))
# make sure all required groups had one option present
for group in self._mutually_exclusive_groups:
if group.required:
for action in group._group_actions:
if action in seen_non_default_actions:
break
# if no actions were used, report the error
else:
names = [_get_action_name(action)
for action in group._group_actions
if action.help is not SUPPRESS]
msg = _('one of the arguments %s is required')
self.error(msg % ' '.join(names))
# return the updated namespace and the extra arguments
return namespace, extras
def _read_args_from_files(self, arg_strings):
# expand arguments referencing files
new_arg_strings = []
for arg_string in arg_strings:
# for regular arguments, just add them back into the list
if not arg_string or arg_string[0] not in self.fromfile_prefix_chars:
new_arg_strings.append(arg_string)
# replace arguments referencing files with the file content
else:
try:
args_file = open(arg_string[1:])
try:
arg_strings = []
for arg_line in args_file.read().splitlines():
for arg in self.convert_arg_line_to_args(arg_line):
arg_strings.append(arg)
arg_strings = self._read_args_from_files(arg_strings)
new_arg_strings.extend(arg_strings)
finally:
args_file.close()
except IOError:
err = _sys.exc_info()[1]
self.error(str(err))
# return the modified argument list
return new_arg_strings
def convert_arg_line_to_args(self, arg_line):
return [arg_line]
def _match_argument(self, action, arg_strings_pattern):
# match the pattern for this action to the arg strings
nargs_pattern = self._get_nargs_pattern(action)
match = _re.match(nargs_pattern, arg_strings_pattern)
# raise an exception if we weren't able to find a match
if match is None:
nargs_errors = {
None: _('expected one argument'),
OPTIONAL: _('expected at most one argument'),
ONE_OR_MORE: _('expected at least one argument'),
}
default = _('expected %s argument(s)') % action.nargs
msg = nargs_errors.get(action.nargs, default)
raise ArgumentError(action, msg)
# return the number of arguments matched
return len(match.group(1))
def _match_arguments_partial(self, actions, arg_strings_pattern):
# progressively shorten the actions list by slicing off the
# final actions until we find a match
result = []
for i in range(len(actions), 0, -1):
actions_slice = actions[:i]
pattern = ''.join([self._get_nargs_pattern(action)
for action in actions_slice])
match = _re.match(pattern, arg_strings_pattern)
if match is not None:
result.extend([len(string) for string in match.groups()])
break
# return the list of arg string counts
return result
def _parse_optional(self, arg_string):
# if it's an empty string, it was meant to be a positional
if not arg_string:
return None
# if it doesn't start with a prefix, it was meant to be positional
if not arg_string[0] in self.prefix_chars:
return None
# if the option string is present in the parser, return the action
if arg_string in self._option_string_actions:
action = self._option_string_actions[arg_string]
return action, arg_string, None
# if it's just a single character, it was meant to be positional
if len(arg_string) == 1:
return None
# if the option string before the "=" is present, return the action
if '=' in arg_string:
option_string, explicit_arg = arg_string.split('=', 1)
if option_string in self._option_string_actions:
action = self._option_string_actions[option_string]
return action, option_string, explicit_arg
# search through all possible prefixes of the option string
# and all actions in the parser for possible interpretations
option_tuples = self._get_option_tuples(arg_string)
# if multiple actions match, the option string was ambiguous
if len(option_tuples) > 1:
options = ', '.join([option_string
for action, option_string, explicit_arg in option_tuples])
tup = arg_string, options
self.error(_('ambiguous option: %s could match %s') % tup)
# if exactly one action matched, this segmentation is good,
# so return the parsed action
elif len(option_tuples) == 1:
option_tuple, = option_tuples
return option_tuple
# if it was not found as an option, but it looks like a negative
# number, it was meant to be positional
# unless there are negative-number-like options
if self._negative_number_matcher.match(arg_string):
if not self._has_negative_number_optionals:
return None
# if it contains a space, it was meant to be a positional
if ' ' in arg_string:
return None
# it was meant to be an optional but there is no such option
# in this parser (though it might be a valid option in a subparser)
return None, arg_string, None
def _get_option_tuples(self, option_string):
result = []
# option strings starting with two prefix characters are only
# split at the '='
chars = self.prefix_chars
if option_string[0] in chars and option_string[1] in chars:
if '=' in option_string:
option_prefix, explicit_arg = option_string.split('=', 1)
else:
option_prefix = option_string
explicit_arg = None
for option_string in self._option_string_actions:
if option_string.startswith(option_prefix):
action = self._option_string_actions[option_string]
tup = action, option_string, explicit_arg
result.append(tup)
# single character options can be concatenated with their arguments
# but multiple character options always have to have their argument
# separate
elif option_string[0] in chars and option_string[1] not in chars:
option_prefix = option_string
explicit_arg = None
short_option_prefix = option_string[:2]
short_explicit_arg = option_string[2:]
for option_string in self._option_string_actions:
if option_string == short_option_prefix:
action = self._option_string_actions[option_string]
tup = action, option_string, short_explicit_arg
result.append(tup)
elif option_string.startswith(option_prefix):
action = self._option_string_actions[option_string]
tup = action, option_string, explicit_arg
result.append(tup)
# shouldn't ever get here
else:
self.error(_('unexpected option string: %s') % option_string)
# return the collected option tuples
return result
def _get_nargs_pattern(self, action):
# in all examples below, we have to allow for '--' args
# which are represented as '-' in the pattern
nargs = action.nargs
# the default (None) is assumed to be a single argument
if nargs is None:
nargs_pattern = '(-*A-*)'
# allow zero or one arguments
elif nargs == OPTIONAL:
nargs_pattern = '(-*A?-*)'
# allow zero or more arguments
elif nargs == ZERO_OR_MORE:
nargs_pattern = '(-*[A-]*)'
# allow one or more arguments
elif nargs == ONE_OR_MORE:
nargs_pattern = '(-*A[A-]*)'
# allow any number of options or arguments
elif nargs == REMAINDER:
nargs_pattern = '([-AO]*)'
# allow one argument followed by any number of options or arguments
elif nargs == PARSER:
nargs_pattern = '(-*A[-AO]*)'
# all others should be integers
else:
nargs_pattern = '(-*%s-*)' % '-*'.join('A' * nargs)
# if this is an optional action, -- is not allowed
if action.option_strings:
nargs_pattern = nargs_pattern.replace('-*', '')
nargs_pattern = nargs_pattern.replace('-', '')
# return the pattern
return nargs_pattern
# ========================
# Value conversion methods
# ========================
def _get_values(self, action, arg_strings):
# for everything but PARSER, REMAINDER args, strip out first '--'
if action.nargs not in [PARSER, REMAINDER]:
try:
arg_strings.remove('--')
except ValueError:
pass
# optional argument produces a default when not present
if not arg_strings and action.nargs == OPTIONAL:
if action.option_strings:
value = action.const
else:
value = action.default
if isinstance(value, basestring):
value = self._get_value(action, value)
self._check_value(action, value)
# when nargs='*' on a positional, if there were no command-line
# args, use the default if it is anything other than None
elif (not arg_strings and action.nargs == ZERO_OR_MORE and
not action.option_strings):
if action.default is not None:
value = action.default
else:
value = arg_strings
self._check_value(action, value)
# single argument or optional argument produces a single value
elif len(arg_strings) == 1 and action.nargs in [None, OPTIONAL]:
arg_string, = arg_strings
value = self._get_value(action, arg_string)
self._check_value(action, value)
# REMAINDER arguments convert all values, checking none
elif action.nargs == REMAINDER:
value = [self._get_value(action, v) for v in arg_strings]
# PARSER arguments convert all values, but check only the first
elif action.nargs == PARSER:
value = [self._get_value(action, v) for v in arg_strings]
self._check_value(action, value[0])
# all other types of nargs produce a list
else:
value = [self._get_value(action, v) for v in arg_strings]
for v in value:
self._check_value(action, v)
# return the converted value
return value
def _get_value(self, action, arg_string):
type_func = self._registry_get('type', action.type, action.type)
if not _callable(type_func):
msg = _('%r is not callable')
raise ArgumentError(action, msg % type_func)
# convert the value to the appropriate type
try:
result = type_func(arg_string)
# ArgumentTypeErrors indicate errors
except ArgumentTypeError:
name = getattr(action.type, '__name__', repr(action.type))
msg = str(_sys.exc_info()[1])
raise ArgumentError(action, msg)
# TypeErrors or ValueErrors also indicate errors
except (TypeError, ValueError):
name = getattr(action.type, '__name__', repr(action.type))
msg = _('invalid %s value: %r')
raise ArgumentError(action, msg % (name, arg_string))
# return the converted value
return result
def _check_value(self, action, value):
# converted value must be one of the choices (if specified)
if action.choices is not None and value not in action.choices:
tup = value, ', '.join(map(repr, action.choices))
msg = _('invalid choice: %r (choose from %s)') % tup
raise ArgumentError(action, msg)
# =======================
# Help-formatting methods
# =======================
def format_usage(self):
formatter = self._get_formatter()
formatter.add_usage(self.usage, self._actions,
self._mutually_exclusive_groups)
return formatter.format_help()
def format_help(self):
formatter = self._get_formatter()
# usage
formatter.add_usage(self.usage, self._actions,
self._mutually_exclusive_groups)
# description
formatter.add_text(self.description)
# positionals, optionals and user-defined groups
for action_group in self._action_groups:
formatter.start_section(action_group.title)
formatter.add_text(action_group.description)
formatter.add_arguments(action_group._group_actions)
formatter.end_section()
# epilog
formatter.add_text(self.epilog)
# determine help from format above
return formatter.format_help()
def format_version(self):
import warnings
warnings.warn(
'The format_version method is deprecated -- the "version" '
'argument to ArgumentParser is no longer supported.',
DeprecationWarning)
formatter = self._get_formatter()
formatter.add_text(self.version)
return formatter.format_help()
def _get_formatter(self):
return self.formatter_class(prog=self.prog)
# =====================
# Help-printing methods
# =====================
def print_usage(self, file=None):
if file is None:
file = _sys.stdout
self._print_message(self.format_usage(), file)
def print_help(self, file=None):
if file is None:
file = _sys.stdout
self._print_message(self.format_help(), file)
def print_version(self, file=None):
import warnings
warnings.warn(
'The print_version method is deprecated -- the "version" '
'argument to ArgumentParser is no longer supported.',
DeprecationWarning)
self._print_message(self.format_version(), file)
def _print_message(self, message, file=None):
if message:
if file is None:
file = _sys.stderr
file.write(message)
# ===============
# Exiting methods
# ===============
def exit(self, status=0, message=None):
if message:
self._print_message(message, _sys.stderr)
_sys.exit(status)
def error(self, message):
"""error(message: string)
Prints a usage message incorporating the message to stderr and
exits.
If you override this in a subclass, it should not return -- it
should either exit or raise an exception.
"""
self.print_usage(_sys.stderr)
self.exit(2, _('%s: error: %s\n') % (self.prog, message))
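# --- Editor's illustrative sketch (not part of the original source) ---
# error() is the documented override point for embedding the parser: a
# subclass may raise instead of exiting, as long as it never returns. The
# class name _RaisingParser is hypothetical.
class _RaisingParser(ArgumentParser):
    def error(self, message):
        raise ValueError(message)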
# -*- coding: utf-8 -*-
"""
ast
~~~
The `ast` module helps Python applications to process trees of the Python
abstract syntax grammar. The abstract syntax itself might change with
each Python release; this module helps to find out programmatically what
the current grammar looks like and allows modifications of it.
An abstract syntax tree can be generated by passing `ast.PyCF_ONLY_AST` as
a flag to the `compile()` builtin function or by using the `parse()`
function from this module. The result will be a tree of objects whose
classes all inherit from `ast.AST`.
A modified abstract syntax tree can be compiled into a Python code object
using the built-in `compile()` function.
    Additionally, various helper functions are provided that make working with
    the trees simpler. The main intention of the helper functions, and this
    module in general, is to provide an easy-to-use interface for libraries
    that work closely with the Python syntax (template engines, for example).
:copyright: Copyright 2008 by Armin Ronacher.
:license: Python License.
"""
from _ast import *
from _ast import __version__
def parse(source, filename='<unknown>', mode='exec'):
"""
Parse the source into an AST node.
Equivalent to compile(source, filename, mode, PyCF_ONLY_AST).
"""
return compile(source, filename, mode, PyCF_ONLY_AST)
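# --- Editor's illustrative sketch (not part of the original source) ---
# parse() returns a Module node whose body holds the parsed statements.
# The name _demo_parse is hypothetical.
def _demo_parse():
    tree = parse('x = 1')
    assert isinstance(tree, Module)
    assert isinstance(tree.body[0], Assign)
    return tree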
def literal_eval(node_or_string):
"""
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
and None.
"""
_safe_names = {'None': None, 'True': True, 'False': False}
if isinstance(node_or_string, basestring):
node_or_string = parse(node_or_string, mode='eval')
if isinstance(node_or_string, Expression):
node_or_string = node_or_string.body
def _convert(node):
if isinstance(node, Str):
return node.s
elif isinstance(node, Num):
return node.n
elif isinstance(node, Tuple):
return tuple(map(_convert, node.elts))
elif isinstance(node, List):
return list(map(_convert, node.elts))
elif isinstance(node, Dict):
return dict((_convert(k), _convert(v)) for k, v
in zip(node.keys, node.values))
elif isinstance(node, Name):
if node.id in _safe_names:
return _safe_names[node.id]
elif isinstance(node, BinOp) and \
isinstance(node.op, (Add, Sub)) and \
isinstance(node.right, Num) and \
isinstance(node.right.n, complex) and \
isinstance(node.left, Num) and \
isinstance(node.left.n, (int, long, float)):
left = node.left.n
right = node.right.n
if isinstance(node.op, Add):
return left + right
else:
return left - right
raise ValueError('malformed string')
return _convert(node_or_string)
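# --- Editor's illustrative sketch (not part of the original source) ---
# literal_eval() accepts only the literal forms handled by _convert above,
# so untrusted input cannot execute code. The name _demo_literal_eval is
# hypothetical.
def _demo_literal_eval():
    value = literal_eval("{'a': (1, 2.5), 'b': [True, None]}")
    assert value == {'a': (1, 2.5), 'b': [True, None]}
    # literal_eval('__import__("os")') raises ValueError('malformed string')
    return value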
def dump(node, annotate_fields=True, include_attributes=False):
"""
Return a formatted dump of the tree in *node*. This is mainly useful for
debugging purposes. The returned string will show the names and the values
for fields. This makes the code impossible to evaluate, so if evaluation is
wanted *annotate_fields* must be set to False. Attributes such as line
numbers and column offsets are not dumped by default. If this is wanted,
*include_attributes* can be set to True.
"""
def _format(node):
if isinstance(node, AST):
fields = [(a, _format(b)) for a, b in iter_fields(node)]
rv = '%s(%s' % (node.__class__.__name__, ', '.join(
('%s=%s' % field for field in fields)
if annotate_fields else
(b for a, b in fields)
))
if include_attributes and node._attributes:
rv += fields and ', ' or ' '
rv += ', '.join('%s=%s' % (a, _format(getattr(node, a)))
for a in node._attributes)
return rv + ')'
elif isinstance(node, list):
return '[%s]' % ', '.join(_format(x) for x in node)
return repr(node)
if not isinstance(node, AST):
raise TypeError('expected AST, got %r' % node.__class__.__name__)
return _format(node)
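# Illustrative sketch (editor's addition): dump() with and without field
# annotations. The _demo_dump helper is hypothetical.
def _demo_dump():
    node = parse("1 + 2", mode='eval')
    print dump(node)                         # Expression(body=BinOp(left=Num(n=1), ...))
    print dump(node, annotate_fields=False)  # positional form, eval()-able for simple trees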
def copy_location(new_node, old_node):
"""
Copy source location (`lineno` and `col_offset` attributes) from
*old_node* to *new_node* if possible, and return *new_node*.
"""
for attr in 'lineno', 'col_offset':
if attr in old_node._attributes and attr in new_node._attributes \
and hasattr(old_node, attr):
setattr(new_node, attr, getattr(old_node, attr))
return new_node
def fix_missing_locations(node):
"""
    When you compile a node tree with compile(), the compiler expects lineno and
    col_offset attributes for every node that supports them. This is rather
    tedious to fill in for generated nodes, so this helper walks the tree
    starting at *node* and, wherever these attributes are not already set,
    copies them from the parent node.
"""
def _fix(node, lineno, col_offset):
if 'lineno' in node._attributes:
if not hasattr(node, 'lineno'):
node.lineno = lineno
else:
lineno = node.lineno
if 'col_offset' in node._attributes:
if not hasattr(node, 'col_offset'):
node.col_offset = col_offset
else:
col_offset = node.col_offset
for child in iter_child_nodes(node):
_fix(child, lineno, col_offset)
_fix(node, 1, 0)
return node
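# Illustrative sketch (editor's addition): hand-built nodes lack lineno and
# col_offset, so compile() rejects them until fix_missing_locations() fills
# the attributes in. The _demo_fix_missing_locations helper is hypothetical.
def _demo_fix_missing_locations():
    tree = parse("x = 1")
    tree.body.append(Expr(value=Num(n=42)))  # generated node, no locations
    fix_missing_locations(tree)
    compile(tree, '<generated>', 'exec')     # would raise TypeError without the fix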
def increment_lineno(node, n=1):
"""
Increment the line number of each node in the tree starting at *node* by *n*.
This is useful to "move code" to a different location in a file.
"""
for child in walk(node):
if 'lineno' in child._attributes:
child.lineno = getattr(child, 'lineno', 0) + n
return node
def iter_fields(node):
"""
Yield a tuple of ``(fieldname, value)`` for each field in ``node._fields``
that is present on *node*.
"""
for field in node._fields:
try:
yield field, getattr(node, field)
except AttributeError:
pass
def iter_child_nodes(node):
"""
Yield all direct child nodes of *node*, that is, all fields that are nodes
and all items of fields that are lists of nodes.
"""
for name, field in iter_fields(node):
if isinstance(field, AST):
yield field
elif isinstance(field, list):
for item in field:
if isinstance(item, AST):
yield item
def get_docstring(node, clean=True):
"""
    Return the docstring for the given node, or None if no docstring can
    be found. If the node's type cannot have a docstring, a TypeError
    is raised.
"""
if not isinstance(node, (FunctionDef, ClassDef, Module)):
raise TypeError("%r can't have docstrings" % node.__class__.__name__)
if node.body and isinstance(node.body[0], Expr) and \
isinstance(node.body[0].value, Str):
if clean:
import inspect
return inspect.cleandoc(node.body[0].value.s)
return node.body[0].value.s
def walk(node):
"""
Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don't care about the context.
"""
from collections import deque
todo = deque([node])
while todo:
node = todo.popleft()
todo.extend(iter_child_nodes(node))
yield node
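# Illustrative sketch (editor's addition): walk() makes quick,
# order-insensitive queries over a tree easy. _demo_walk is hypothetical.
def _demo_walk():
    tree = parse("x = 1\ny = 2")
    names = [n.id for n in walk(tree) if isinstance(n, Name)]
    assert names == ['x', 'y']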
class NodeVisitor(object):
"""
A node visitor base class that walks the abstract syntax tree and calls a
visitor function for every node found. This function may return a value
which is forwarded by the `visit` method.
This class is meant to be subclassed, with the subclass adding visitor
methods.
    By default the visitor function for a node is ``'visit_'`` + the
    class name of the node, so a `TryFinally` node's visit function would
    be `visit_TryFinally`. This behavior can be changed by overriding
    the `visit` method. If no visitor function exists for a node,
    the `generic_visit` visitor is used instead.
    Don't use the `NodeVisitor` if you want to apply changes to nodes during
    traversal. For that a special visitor exists (`NodeTransformer`) that
    allows modifications.
"""
def visit(self, node):
"""Visit a node."""
method = 'visit_' + node.__class__.__name__
visitor = getattr(self, method, self.generic_visit)
return visitor(node)
def generic_visit(self, node):
"""Called if no explicit visitor function exists for a node."""
for field, value in iter_fields(node):
if isinstance(value, list):
for item in value:
if isinstance(item, AST):
self.visit(item)
elif isinstance(value, AST):
self.visit(value)
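# Illustrative sketch (editor's addition): a minimal NodeVisitor subclass
# that counts Name nodes. _NameCounter and _demo_visitor are hypothetical.
class _NameCounter(NodeVisitor):
    def __init__(self):
        self.count = 0
    def visit_Name(self, node):
        self.count += 1
        self.generic_visit(node)

def _demo_visitor():
    counter = _NameCounter()
    counter.visit(parse("a = b + c"))
    assert counter.count == 3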
class NodeTransformer(NodeVisitor):
"""
A :class:`NodeVisitor` subclass that walks the abstract syntax tree and
allows modification of nodes.
The `NodeTransformer` will walk the AST and use the return value of the
visitor methods to replace or remove the old node. If the return value of
the visitor method is ``None``, the node will be removed from its location,
otherwise it is replaced with the return value. The return value may be the
original node in which case no replacement takes place.
Here is an example transformer that rewrites all occurrences of name lookups
(``foo``) to ``data['foo']``::
class RewriteName(NodeTransformer):
def visit_Name(self, node):
return copy_location(Subscript(
value=Name(id='data', ctx=Load()),
slice=Index(value=Str(s=node.id)),
ctx=node.ctx
), node)
Keep in mind that if the node you're operating on has child nodes you must
either transform the child nodes yourself or call the :meth:`generic_visit`
method for the node first.
For nodes that were part of a collection of statements (that applies to all
statement nodes), the visitor may also return a list of nodes rather than
just a single node.
Usually you use the transformer like this::
node = YourTransformer().visit(node)
"""
def generic_visit(self, node):
for field, old_value in iter_fields(node):
old_value = getattr(node, field, None)
if isinstance(old_value, list):
new_values = []
for value in old_value:
if isinstance(value, AST):
value = self.visit(value)
if value is None:
continue
elif not isinstance(value, AST):
new_values.extend(value)
continue
new_values.append(value)
old_value[:] = new_values
elif isinstance(old_value, AST):
new_node = self.visit(old_value)
if new_node is None:
delattr(node, field)
else:
setattr(node, field, new_node)
return node
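# Illustrative sketch (editor's addition): the RewriteName transformer from
# the docstring above, run end-to-end. The _demo_transformer helper is
# hypothetical.
def _demo_transformer():
    class RewriteName(NodeTransformer):
        def visit_Name(self, node):
            return copy_location(Subscript(
                value=Name(id='data', ctx=Load()),
                slice=Index(value=Str(s=node.id)),
                ctx=node.ctx
            ), node)
    tree = RewriteName().visit(parse("foo", mode='eval'))
    code = compile(fix_missing_locations(tree), '<demo>', 'eval')
    assert eval(code, {'data': {'foo': 42}}) == 42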
# -*- Mode: Python; tab-width: 4 -*-
# Id: asynchat.py,v 2.26 2000/09/07 22:29:26 rushing Exp
# Author: Sam Rushing <[email protected]>
# ======================================================================
# Copyright 1996 by Sam Rushing
#
# All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby
# granted, provided that the above copyright notice appear in all
# copies and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of Sam
# Rushing not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior
# permission.
#
# SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
# NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# ======================================================================
r"""A class supporting chat-style (command/response) protocols.
This class adds support for 'chat' style protocols - where one side
sends a 'command', and the other sends a response (examples would be
the common Internet protocols - SMTP, NNTP, FTP, etc.).
The handle_read() method looks at the input stream for the current
'terminator' (usually '\r\n' for single-line responses, '\r\n.\r\n'
for multi-line output), calling self.found_terminator() on its
receipt.
For example:
Say you build an async nntp client using this class. At the start
of the connection, you'll have self.terminator set to '\r\n', in
order to process the single-line greeting. Just before issuing a
'LIST' command you'll set it to '\r\n.\r\n'. The output of the LIST
command will be accumulated (using your own 'collect_incoming_data'
method) up to the terminator, and then control will be returned to
you - by calling your self.found_terminator() method.
"""
import asyncore
import errno
import socket
from collections import deque
from sys import py3kwarning
from warnings import filterwarnings, catch_warnings
_BLOCKING_IO_ERRORS = (errno.EAGAIN, errno.EALREADY, errno.EINPROGRESS,
errno.EWOULDBLOCK)
class async_chat (asyncore.dispatcher):
"""This is an abstract class. You must derive from this class, and add
the two methods collect_incoming_data() and found_terminator()"""
# these are overridable defaults
ac_in_buffer_size = 4096
ac_out_buffer_size = 4096
def __init__ (self, sock=None, map=None):
# for string terminator matching
self.ac_in_buffer = ''
# we use a list here rather than cStringIO for a few reasons...
# del lst[:] is faster than sio.truncate(0)
# lst = [] is faster than sio.truncate(0)
# cStringIO will be gaining unicode support in py3k, which
# will negatively affect the performance of bytes compared to
# a ''.join() equivalent
self.incoming = []
# we toss the use of the "simple producer" and replace it with
# a pure deque, which the original fifo was a wrapping of
self.producer_fifo = deque()
asyncore.dispatcher.__init__ (self, sock, map)
def collect_incoming_data(self, data):
raise NotImplementedError("must be implemented in subclass")
def _collect_incoming_data(self, data):
self.incoming.append(data)
def _get_data(self):
d = ''.join(self.incoming)
del self.incoming[:]
return d
def found_terminator(self):
raise NotImplementedError("must be implemented in subclass")
def set_terminator (self, term):
"Set the input delimiter. Can be a fixed string of any length, an integer, or None"
self.terminator = term
def get_terminator (self):
return self.terminator
# grab some more data from the socket,
# throw it to the collector method,
# check for the terminator,
# if found, transition to the next state.
def handle_read (self):
try:
data = self.recv (self.ac_in_buffer_size)
except socket.error, why:
if why.args[0] in _BLOCKING_IO_ERRORS:
return
self.handle_error()
return
self.ac_in_buffer = self.ac_in_buffer + data
# Continue to search for self.terminator in self.ac_in_buffer,
# while calling self.collect_incoming_data. The while loop
# is necessary because we might read several data+terminator
# combos with a single recv(4096).
while self.ac_in_buffer:
lb = len(self.ac_in_buffer)
terminator = self.get_terminator()
if not terminator:
# no terminator, collect it all
self.collect_incoming_data (self.ac_in_buffer)
self.ac_in_buffer = ''
elif isinstance(terminator, (int, long)):
# numeric terminator
n = terminator
if lb < n:
self.collect_incoming_data (self.ac_in_buffer)
self.ac_in_buffer = ''
self.terminator = self.terminator - lb
else:
self.collect_incoming_data (self.ac_in_buffer[:n])
self.ac_in_buffer = self.ac_in_buffer[n:]
self.terminator = 0
self.found_terminator()
else:
# 3 cases:
# 1) end of buffer matches terminator exactly:
# collect data, transition
# 2) end of buffer matches some prefix:
# collect data to the prefix
# 3) end of buffer does not match any prefix:
# collect data
terminator_len = len(terminator)
index = self.ac_in_buffer.find(terminator)
if index != -1:
# we found the terminator
if index > 0:
# don't bother reporting the empty string (source of subtle bugs)
self.collect_incoming_data (self.ac_in_buffer[:index])
self.ac_in_buffer = self.ac_in_buffer[index+terminator_len:]
# This does the Right Thing if the terminator is changed here.
self.found_terminator()
else:
# check for a prefix of the terminator
index = find_prefix_at_end (self.ac_in_buffer, terminator)
if index:
if index != lb:
# we found a prefix, collect up to the prefix
self.collect_incoming_data (self.ac_in_buffer[:-index])
self.ac_in_buffer = self.ac_in_buffer[-index:]
break
else:
# no prefix, collect it all
self.collect_incoming_data (self.ac_in_buffer)
self.ac_in_buffer = ''
def handle_write (self):
self.initiate_send()
def handle_close (self):
self.close()
def push (self, data):
sabs = self.ac_out_buffer_size
if len(data) > sabs:
for i in xrange(0, len(data), sabs):
self.producer_fifo.append(data[i:i+sabs])
else:
self.producer_fifo.append(data)
self.initiate_send()
def push_with_producer (self, producer):
self.producer_fifo.append(producer)
self.initiate_send()
def readable (self):
"predicate for inclusion in the readable for select()"
# cannot use the old predicate, it violates the claim of the
# set_terminator method.
# return (len(self.ac_in_buffer) <= self.ac_in_buffer_size)
return 1
def writable (self):
"predicate for inclusion in the writable for select()"
return self.producer_fifo or (not self.connected)
def close_when_done (self):
"automatically close this channel once the outgoing queue is empty"
self.producer_fifo.append(None)
def initiate_send(self):
while self.producer_fifo and self.connected:
first = self.producer_fifo[0]
# handle empty string/buffer or None entry
if not first:
del self.producer_fifo[0]
if first is None:
self.handle_close()
return
# handle classic producer behavior
obs = self.ac_out_buffer_size
try:
with catch_warnings():
if py3kwarning:
filterwarnings("ignore", ".*buffer", DeprecationWarning)
data = buffer(first, 0, obs)
except TypeError:
data = first.more()
if data:
self.producer_fifo.appendleft(data)
else:
del self.producer_fifo[0]
continue
# send the data
try:
num_sent = self.send(data)
except socket.error:
self.handle_error()
return
if num_sent:
if num_sent < len(data) or obs < len(first):
self.producer_fifo[0] = first[num_sent:]
else:
del self.producer_fifo[0]
# we tried to send some actual data
return
def discard_buffers (self):
# Emergencies only!
self.ac_in_buffer = ''
del self.incoming[:]
self.producer_fifo.clear()
class simple_producer:
def __init__ (self, data, buffer_size=512):
self.data = data
self.buffer_size = buffer_size
def more (self):
if len (self.data) > self.buffer_size:
result = self.data[:self.buffer_size]
self.data = self.data[self.buffer_size:]
return result
else:
result = self.data
self.data = ''
return result
class fifo:
def __init__ (self, list=None):
if not list:
self.list = deque()
else:
self.list = deque(list)
def __len__ (self):
return len(self.list)
def is_empty (self):
return not self.list
def first (self):
return self.list[0]
def push (self, data):
self.list.append(data)
def pop (self):
if self.list:
return (1, self.list.popleft())
else:
return (0, None)
# Given 'haystack', see if any prefix of 'needle' is at its end. This
# assumes an exact match has already been checked. Return the number of
# characters matched.
# for example:
# f_p_a_e ("qwerty\r", "\r\n") => 1
# f_p_a_e ("qwertydkjf", "\r\n") => 0
# f_p_a_e ("qwerty\r\n", "\r\n") => <undefined>
# this could maybe be made faster with a computed regex?
# [answer: no; circa Python-2.0, Jan 2001]
# new python: 28961/s
# old python: 18307/s
# re: 12820/s
# regex: 14035/s
def find_prefix_at_end (haystack, needle):
l = len(needle) - 1
while l and not haystack.endswith(needle[:l]):
l -= 1
return l
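# Illustrative sketch (editor's addition): the minimal async_chat subclass
# the module docstring describes - collect data up to the '\r\n' terminator,
# then echo each complete line back. _echo_chat is a hypothetical name.
class _echo_chat(async_chat):
    def __init__(self, sock):
        async_chat.__init__(self, sock)
        self.set_terminator('\r\n')
        self.buffer = []
    def collect_incoming_data(self, data):
        self.buffer.append(data)
    def found_terminator(self):
        line = ''.join(self.buffer)
        self.buffer = []
        self.push(line + '\r\n')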
# -*- Mode: Python -*-
# Id: asyncore.py,v 2.51 2000/09/07 22:29:26 rushing Exp
# Author: Sam Rushing <[email protected]>
# ======================================================================
# Copyright 1996 by Sam Rushing
#
# All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby
# granted, provided that the above copyright notice appear in all
# copies and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of Sam
# Rushing not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior
# permission.
#
# SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
# NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# ======================================================================
"""Basic infrastructure for asynchronous socket service clients and servers.
There are only two ways to have a program on a single processor do "more
than one thing at a time". Multi-threaded programming is the simplest and
most popular way to do it, but there is another very different technique
that lets you have nearly all the advantages of multi-threading without
actually using multiple threads. It's really only practical if your program
is largely I/O bound. If your program is CPU bound, then pre-emptive
scheduled threads are probably what you really need. Network servers are
rarely CPU-bound, however.
If your operating system supports the select() system call in its I/O
library (and nearly all do), then you can use it to juggle multiple
communication channels at once; doing other work while your I/O is taking
place in the "background." Although this strategy can seem strange and
complex, especially at first, it is in many ways easier to understand and
control than multi-threaded programming. The module documented here solves
many of the difficult problems for you, making the task of building
sophisticated high-performance network servers and clients a snap.
"""
import select
import socket
import sys
import time
import warnings
import os
from errno import EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET, EINVAL, \
ENOTCONN, ESHUTDOWN, EINTR, EISCONN, EBADF, ECONNABORTED, EPIPE, EAGAIN, \
errorcode
_DISCONNECTED = frozenset((ECONNRESET, ENOTCONN, ESHUTDOWN, ECONNABORTED, EPIPE,
EBADF))
try:
socket_map
except NameError:
socket_map = {}
def _strerror(err):
try:
return os.strerror(err)
except (ValueError, OverflowError, NameError):
if err in errorcode:
return errorcode[err]
return "Unknown error %s" %err
class ExitNow(Exception):
pass
_reraised_exceptions = (ExitNow, KeyboardInterrupt, SystemExit)
def read(obj):
try:
obj.handle_read_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def write(obj):
try:
obj.handle_write_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def _exception(obj):
try:
obj.handle_expt_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def readwrite(obj, flags):
try:
if flags & select.POLLIN:
obj.handle_read_event()
if flags & select.POLLOUT:
obj.handle_write_event()
if flags & select.POLLPRI:
obj.handle_expt_event()
if flags & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
obj.handle_close()
except socket.error, e:
if e.args[0] not in _DISCONNECTED:
obj.handle_error()
else:
obj.handle_close()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def poll(timeout=0.0, map=None):
if map is None:
map = socket_map
if map:
r = []; w = []; e = []
for fd, obj in map.items():
is_r = obj.readable()
is_w = obj.writable()
if is_r:
r.append(fd)
# accepting sockets should not be writable
if is_w and not obj.accepting:
w.append(fd)
if is_r or is_w:
e.append(fd)
if [] == r == w == e:
time.sleep(timeout)
return
try:
r, w, e = select.select(r, w, e, timeout)
except select.error, err:
if err.args[0] != EINTR:
raise
else:
return
for fd in r:
obj = map.get(fd)
if obj is None:
continue
read(obj)
for fd in w:
obj = map.get(fd)
if obj is None:
continue
write(obj)
for fd in e:
obj = map.get(fd)
if obj is None:
continue
_exception(obj)
def poll2(timeout=0.0, map=None):
# Use the poll() support added to the select module in Python 2.0
if map is None:
map = socket_map
if timeout is not None:
# timeout is in milliseconds
timeout = int(timeout*1000)
pollster = select.poll()
if map:
for fd, obj in map.items():
flags = 0
if obj.readable():
flags |= select.POLLIN | select.POLLPRI
# accepting sockets should not be writable
if obj.writable() and not obj.accepting:
flags |= select.POLLOUT
if flags:
# Only check for exceptions if object was either readable
# or writable.
flags |= select.POLLERR | select.POLLHUP | select.POLLNVAL
pollster.register(fd, flags)
try:
r = pollster.poll(timeout)
except select.error, err:
if err.args[0] != EINTR:
raise
r = []
for fd, flags in r:
obj = map.get(fd)
if obj is None:
continue
readwrite(obj, flags)
poll3 = poll2 # Alias for backward compatibility
def loop(timeout=30.0, use_poll=False, map=None, count=None):
if map is None:
map = socket_map
if use_poll and hasattr(select, 'poll'):
poll_fun = poll2
else:
poll_fun = poll
if count is None:
while map:
poll_fun(timeout, map)
else:
while map and count > 0:
poll_fun(timeout, map)
count = count - 1
class dispatcher:
debug = False
connected = False
accepting = False
connecting = False
closing = False
addr = None
ignore_log_types = frozenset(['warning'])
def __init__(self, sock=None, map=None):
if map is None:
self._map = socket_map
else:
self._map = map
self._fileno = None
if sock:
# Set to nonblocking just to make sure for cases where we
# get a socket from a blocking source.
sock.setblocking(0)
self.set_socket(sock, map)
self.connected = True
# The constructor no longer requires that the socket
# passed be connected.
try:
self.addr = sock.getpeername()
except socket.error, err:
if err.args[0] in (ENOTCONN, EINVAL):
# To handle the case where we got an unconnected
# socket.
self.connected = False
else:
# The socket is broken in some unknown way, alert
# the user and remove it from the map (to prevent
# polling of broken sockets).
self.del_channel(map)
raise
else:
self.socket = None
def __repr__(self):
status = [self.__class__.__module__+"."+self.__class__.__name__]
if self.accepting and self.addr:
status.append('listening')
elif self.connected:
status.append('connected')
if self.addr is not None:
try:
status.append('%s:%d' % self.addr)
except TypeError:
status.append(repr(self.addr))
return '<%s at %#x>' % (' '.join(status), id(self))
__str__ = __repr__
def add_channel(self, map=None):
#self.log_info('adding channel %s' % self)
if map is None:
map = self._map
map[self._fileno] = self
def del_channel(self, map=None):
fd = self._fileno
if map is None:
map = self._map
if fd in map:
#self.log_info('closing channel %d:%s' % (fd, self))
del map[fd]
self._fileno = None
def create_socket(self, family, type):
self.family_and_type = family, type
sock = socket.socket(family, type)
sock.setblocking(0)
self.set_socket(sock)
def set_socket(self, sock, map=None):
self.socket = sock
## self.__dict__['socket'] = sock
self._fileno = sock.fileno()
self.add_channel(map)
def set_reuse_addr(self):
# try to re-use a server port if possible
try:
self.socket.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEADDR,
self.socket.getsockopt(socket.SOL_SOCKET,
socket.SO_REUSEADDR) | 1
)
except socket.error:
pass
# ==================================================
# predicates for select()
# these are used as filters for the lists of sockets
# to pass to select().
# ==================================================
def readable(self):
return True
def writable(self):
return True
# ==================================================
# socket object methods.
# ==================================================
def listen(self, num):
self.accepting = True
if os.name == 'nt' and num > 5:
num = 5
return self.socket.listen(num)
def bind(self, addr):
self.addr = addr
return self.socket.bind(addr)
def connect(self, address):
self.connected = False
self.connecting = True
err = self.socket.connect_ex(address)
if err in (EINPROGRESS, EALREADY, EWOULDBLOCK) \
or err == EINVAL and os.name in ('nt', 'ce'):
self.addr = address
return
if err in (0, EISCONN):
self.addr = address
self.handle_connect_event()
else:
raise socket.error(err, errorcode[err])
def accept(self):
# XXX can return either an address pair or None
try:
conn, addr = self.socket.accept()
except TypeError:
return None
except socket.error as why:
if why.args[0] in (EWOULDBLOCK, ECONNABORTED, EAGAIN):
return None
else:
raise
else:
return conn, addr
def send(self, data):
try:
result = self.socket.send(data)
return result
except socket.error, why:
if why.args[0] == EWOULDBLOCK:
return 0
elif why.args[0] in _DISCONNECTED:
self.handle_close()
return 0
else:
raise
def recv(self, buffer_size):
try:
data = self.socket.recv(buffer_size)
if not data:
# a closed connection is indicated by signaling
# a read condition, and having recv() return 0.
self.handle_close()
return ''
else:
return data
except socket.error, why:
# winsock sometimes raises ENOTCONN
if why.args[0] in _DISCONNECTED:
self.handle_close()
return ''
else:
raise
def close(self):
self.connected = False
self.accepting = False
self.connecting = False
self.del_channel()
try:
self.socket.close()
except socket.error, why:
if why.args[0] not in (ENOTCONN, EBADF):
raise
# cheap inheritance, used to pass all other attribute
# references to the underlying socket object.
def __getattr__(self, attr):
try:
retattr = getattr(self.socket, attr)
except AttributeError:
raise AttributeError("%s instance has no attribute '%s'"
%(self.__class__.__name__, attr))
else:
msg = "%(me)s.%(attr)s is deprecated. Use %(me)s.socket.%(attr)s " \
"instead." % {'me': self.__class__.__name__, 'attr':attr}
warnings.warn(msg, DeprecationWarning, stacklevel=2)
return retattr
# log and log_info may be overridden to provide more sophisticated
# logging and warning methods. In general, log is for 'hit' logging
# and 'log_info' is for informational, warning and error logging.
def log(self, message):
sys.stderr.write('log: %s\n' % str(message))
def log_info(self, message, type='info'):
if type not in self.ignore_log_types:
print '%s: %s' % (type, message)
def handle_read_event(self):
if self.accepting:
# accepting sockets are never connected, they "spawn" new
# sockets that are connected
self.handle_accept()
elif not self.connected:
if self.connecting:
self.handle_connect_event()
self.handle_read()
else:
self.handle_read()
def handle_connect_event(self):
err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
raise socket.error(err, _strerror(err))
self.handle_connect()
self.connected = True
self.connecting = False
def handle_write_event(self):
if self.accepting:
# Accepting sockets shouldn't get a write event.
# We will pretend it didn't happen.
return
if not self.connected:
if self.connecting:
self.handle_connect_event()
self.handle_write()
def handle_expt_event(self):
# handle_expt_event() is called if there might be an error on the
# socket, or if there is OOB data
# check for the error condition first
err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
# we can get here when select.select() says that there is an
# exceptional condition on the socket
# since there is an error, we'll go ahead and close the socket
# like we would in a subclassed handle_read() that received no
# data
self.handle_close()
else:
self.handle_expt()
def handle_error(self):
nil, t, v, tbinfo = compact_traceback()
# sometimes a user repr method will crash.
try:
self_repr = repr(self)
except:
self_repr = '<__repr__(self) failed for object at %0x>' % id(self)
self.log_info(
'uncaptured python exception, closing channel %s (%s:%s %s)' % (
self_repr,
t,
v,
tbinfo
),
'error'
)
self.handle_close()
def handle_expt(self):
self.log_info('unhandled incoming priority event', 'warning')
def handle_read(self):
self.log_info('unhandled read event', 'warning')
def handle_write(self):
self.log_info('unhandled write event', 'warning')
def handle_connect(self):
self.log_info('unhandled connect event', 'warning')
def handle_accept(self):
self.log_info('unhandled accept event', 'warning')
def handle_close(self):
self.log_info('unhandled close event', 'warning')
self.close()
# ---------------------------------------------------------------------------
# adds simple buffered output capability, useful for simple clients.
# [for more sophisticated usage use asynchat.async_chat]
# ---------------------------------------------------------------------------
class dispatcher_with_send(dispatcher):
def __init__(self, sock=None, map=None):
dispatcher.__init__(self, sock, map)
self.out_buffer = ''
def initiate_send(self):
        num_sent = dispatcher.send(self, self.out_buffer[:512])
self.out_buffer = self.out_buffer[num_sent:]
def handle_write(self):
self.initiate_send()
def writable(self):
return (not self.connected) or len(self.out_buffer)
def send(self, data):
if self.debug:
self.log_info('sending %s' % repr(data))
self.out_buffer = self.out_buffer + data
self.initiate_send()
# ---------------------------------------------------------------------------
# used for debugging.
# ---------------------------------------------------------------------------
def compact_traceback():
t, v, tb = sys.exc_info()
tbinfo = []
if not tb: # Must have a traceback
raise AssertionError("traceback does not exist")
while tb:
tbinfo.append((
tb.tb_frame.f_code.co_filename,
tb.tb_frame.f_code.co_name,
str(tb.tb_lineno)
))
tb = tb.tb_next
# just to be safe
del tb
file, function, line = tbinfo[-1]
info = ' '.join(['[%s|%s|%s]' % x for x in tbinfo])
return (file, function, line), t, v, info
def close_all(map=None, ignore_all=False):
if map is None:
map = socket_map
for x in map.values():
try:
x.close()
except OSError, x:
if x.args[0] == EBADF:
pass
elif not ignore_all:
raise
except _reraised_exceptions:
raise
except:
if not ignore_all:
raise
map.clear()
# Asynchronous File I/O:
#
# After a little research (reading man pages on various unixen, and
# digging through the linux kernel), I've determined that select()
# isn't meant for doing asynchronous file i/o.
# Heartening, though - reading linux/mm/filemap.c shows that linux
# supports asynchronous read-ahead. So _MOST_ of the time, the data
# will be sitting in memory for us already when we go to read it.
#
# What other OS's (besides NT) support async file i/o? [VMS?]
#
# Regardless, this is useful for pipes, and stdin/stdout...
if os.name == 'posix':
import fcntl
class file_wrapper:
# Here we override just enough to make a file
# look like a socket for the purposes of asyncore.
# The passed fd is automatically os.dup()'d
def __init__(self, fd):
self.fd = os.dup(fd)
def recv(self, *args):
return os.read(self.fd, *args)
def send(self, *args):
return os.write(self.fd, *args)
def getsockopt(self, level, optname, buflen=None):
if (level == socket.SOL_SOCKET and
optname == socket.SO_ERROR and
not buflen):
return 0
raise NotImplementedError("Only asyncore specific behaviour "
"implemented.")
read = recv
write = send
def close(self):
if self.fd < 0:
return
fd = self.fd
self.fd = -1
os.close(fd)
def fileno(self):
return self.fd
class file_dispatcher(dispatcher):
def __init__(self, fd, map=None):
dispatcher.__init__(self, None, map)
self.connected = True
try:
fd = fd.fileno()
except AttributeError:
pass
self.set_file(fd)
# set it to non-blocking mode
flags = fcntl.fcntl(fd, fcntl.F_GETFL, 0)
flags = flags | os.O_NONBLOCK
fcntl.fcntl(fd, fcntl.F_SETFL, flags)
def set_file(self, fd):
self.socket = file_wrapper(fd)
self._fileno = self.socket.fileno()
self.add_channel()
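# Illustrative sketch (editor's addition): a tiny echo server showing the
# accept/spawn pattern described in the module docstring. The class names
# and port number are hypothetical; run with:  _echo_server(); loop()
class _echo_handler(dispatcher_with_send):
    def handle_read(self):
        self.send(self.recv(8192))

class _echo_server(dispatcher):
    def __init__(self, host='127.0.0.1', port=8007):
        dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)
    def handle_accept(self):
        pair = self.accept()
        if pair is not None:
            sock, addr = pair
            _echo_handler(sock)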
"""
atexit.py - allow programmer to define multiple exit functions to be executed
upon normal program termination.
One public function, register, is defined.
"""
__all__ = ["register"]
import sys
_exithandlers = []
def _run_exitfuncs():
"""run any registered exit functions
_exithandlers is traversed in reverse order so functions are executed
last in, first out.
"""
exc_info = None
while _exithandlers:
func, targs, kargs = _exithandlers.pop()
try:
func(*targs, **kargs)
except SystemExit:
exc_info = sys.exc_info()
except:
import traceback
print >> sys.stderr, "Error in atexit._run_exitfuncs:"
traceback.print_exc()
exc_info = sys.exc_info()
if exc_info is not None:
raise exc_info[0], exc_info[1], exc_info[2]
def register(func, *targs, **kargs):
"""register a function to be executed upon normal program termination
func - function to be called at exit
targs - optional arguments to pass to func
kargs - optional keyword arguments to pass to func
func is returned to facilitate usage as a decorator.
"""
_exithandlers.append((func, targs, kargs))
return func
if hasattr(sys, "exitfunc"):
# Assume it's another registered exit function - append it to our list
register(sys.exitfunc)
sys.exitfunc = _run_exitfuncs
if __name__ == "__main__":
def x1():
print "running x1"
def x2(n):
print "running x2(%r)" % (n,)
def x3(n, kwd=None):
print "running x3(%r, kwd=%r)" % (n, kwd)
register(x1)
register(x2, 12)
register(x3, 5, "bar")
register(x3, "no kwd args")
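# Illustrative sketch (editor's addition): because register() returns func,
# it can also be applied as a decorator. _decorator_example is hypothetical.
def _decorator_example():
    @register
    def goodbye():
        print "goodbye at interpreter exit"
    return goodbye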
"""Classes for manipulating audio devices (currently only for Sun and SGI)"""
from warnings import warnpy3k
warnpy3k("the audiodev module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
__all__ = ["error","AudioDev"]
class error(Exception):
pass
class Play_Audio_sgi:
# Private instance variables
## if 0: access frameratelist, nchannelslist, sampwidthlist, oldparams, \
## params, config, inited_outrate, inited_width, \
## inited_nchannels, port, converter, classinited: private
classinited = 0
frameratelist = nchannelslist = sampwidthlist = None
def initclass(self):
import AL
self.frameratelist = [
(48000, AL.RATE_48000),
(44100, AL.RATE_44100),
(32000, AL.RATE_32000),
(22050, AL.RATE_22050),
(16000, AL.RATE_16000),
(11025, AL.RATE_11025),
( 8000, AL.RATE_8000),
]
self.nchannelslist = [
(1, AL.MONO),
(2, AL.STEREO),
(4, AL.QUADRO),
]
self.sampwidthlist = [
(1, AL.SAMPLE_8),
(2, AL.SAMPLE_16),
(3, AL.SAMPLE_24),
]
self.classinited = 1
def __init__(self):
import al, AL
if not self.classinited:
self.initclass()
self.oldparams = []
self.params = [AL.OUTPUT_RATE, 0]
self.config = al.newconfig()
self.inited_outrate = 0
self.inited_width = 0
self.inited_nchannels = 0
self.converter = None
self.port = None
return
def __del__(self):
if self.port:
self.stop()
if self.oldparams:
import al, AL
al.setparams(AL.DEFAULT_DEVICE, self.oldparams)
self.oldparams = []
def wait(self):
if not self.port:
return
import time
while self.port.getfilled() > 0:
time.sleep(0.1)
self.stop()
def stop(self):
if self.port:
self.port.closeport()
self.port = None
if self.oldparams:
import al, AL
al.setparams(AL.DEFAULT_DEVICE, self.oldparams)
self.oldparams = []
def setoutrate(self, rate):
for (raw, cooked) in self.frameratelist:
if rate == raw:
self.params[1] = cooked
self.inited_outrate = 1
break
else:
raise error, 'bad output rate'
def setsampwidth(self, width):
for (raw, cooked) in self.sampwidthlist:
if width == raw:
self.config.setwidth(cooked)
self.inited_width = 1
break
else:
if width == 0:
import AL
self.inited_width = 0
self.config.setwidth(AL.SAMPLE_16)
self.converter = self.ulaw2lin
else:
raise error, 'bad sample width'
def setnchannels(self, nchannels):
for (raw, cooked) in self.nchannelslist:
if nchannels == raw:
self.config.setchannels(cooked)
self.inited_nchannels = 1
break
else:
raise error, 'bad # of channels'
def writeframes(self, data):
if not (self.inited_outrate and self.inited_nchannels):
raise error, 'params not specified'
if not self.port:
import al, AL
self.port = al.openport('Python', 'w', self.config)
self.oldparams = self.params[:]
al.getparams(AL.DEFAULT_DEVICE, self.oldparams)
al.setparams(AL.DEFAULT_DEVICE, self.params)
if self.converter:
data = self.converter(data)
self.port.writesamps(data)
def getfilled(self):
if self.port:
return self.port.getfilled()
else:
return 0
def getfillable(self):
if self.port:
return self.port.getfillable()
else:
return self.config.getqueuesize()
# private methods
## if 0: access *: private
def ulaw2lin(self, data):
import audioop
return audioop.ulaw2lin(data, 2)
class Play_Audio_sun:
## if 0: access outrate, sampwidth, nchannels, inited_outrate, inited_width, \
## inited_nchannels, converter: private
def __init__(self):
self.outrate = 0
self.sampwidth = 0
self.nchannels = 0
self.inited_outrate = 0
self.inited_width = 0
self.inited_nchannels = 0
self.converter = None
self.port = None
return
def __del__(self):
self.stop()
def setoutrate(self, rate):
self.outrate = rate
self.inited_outrate = 1
def setsampwidth(self, width):
self.sampwidth = width
self.inited_width = 1
def setnchannels(self, nchannels):
self.nchannels = nchannels
self.inited_nchannels = 1
def writeframes(self, data):
if not (self.inited_outrate and self.inited_width and self.inited_nchannels):
raise error, 'params not specified'
if not self.port:
import sunaudiodev, SUNAUDIODEV
self.port = sunaudiodev.open('w')
info = self.port.getinfo()
info.o_sample_rate = self.outrate
info.o_channels = self.nchannels
if self.sampwidth == 0:
info.o_precision = 8
self.o_encoding = SUNAUDIODEV.ENCODING_ULAW
# XXX Hack, hack -- leave defaults
else:
info.o_precision = 8 * self.sampwidth
info.o_encoding = SUNAUDIODEV.ENCODING_LINEAR
self.port.setinfo(info)
if self.converter:
data = self.converter(data)
self.port.write(data)
def wait(self):
if not self.port:
return
self.port.drain()
self.stop()
def stop(self):
if self.port:
self.port.flush()
self.port.close()
self.port = None
def getfilled(self):
if self.port:
return self.port.obufcount()
else:
return 0
## # Nobody remembers what this method does, and it's broken. :-(
## def getfillable(self):
## return BUFFERSIZE - self.getfilled()
def AudioDev():
# Dynamically try to import and use a platform specific module.
try:
import al
except ImportError:
try:
import sunaudiodev
return Play_Audio_sun()
except ImportError:
try:
import Audio_mac
except ImportError:
raise error, 'no audio device'
else:
return Audio_mac.Play_Audio_mac()
else:
return Play_Audio_sgi()
def test(fn = None):
import sys
if sys.argv[1:]:
fn = sys.argv[1]
else:
fn = 'f:just samples:just.aif'
import aifc
af = aifc.open(fn, 'r')
print fn, af.getparams()
p = AudioDev()
p.setoutrate(af.getframerate())
p.setsampwidth(af.getsampwidth())
p.setnchannels(af.getnchannels())
BUFSIZ = af.getframerate()/af.getsampwidth()/af.getnchannels()
while 1:
data = af.readframes(BUFSIZ)
if not data: break
print len(data)
p.writeframes(data)
p.wait()
if __name__ == '__main__':
test()
#! /usr/bin/env python
"""RFC 3548: Base16, Base32, Base64 Data Encodings"""
# Modified 04-Oct-1995 by Jack Jansen to use binascii module
# Modified 30-Dec-2003 by Barry Warsaw to add full RFC 3548 support
import re
import struct
import string
import binascii
__all__ = [
# Legacy interface exports traditional RFC 1521 Base64 encodings
'encode', 'decode', 'encodestring', 'decodestring',
# Generalized interface for other encodings
'b64encode', 'b64decode', 'b32encode', 'b32decode',
'b16encode', 'b16decode',
# Standard Base64 encoding
'standard_b64encode', 'standard_b64decode',
# Some common Base64 alternatives. As referenced by RFC 3458, see thread
# starting at:
#
# http://zgp.org/pipermail/p2p-hackers/2001-September/000316.html
'urlsafe_b64encode', 'urlsafe_b64decode',
]
_translation = [chr(_x) for _x in range(256)]
EMPTYSTRING = ''
def _translate(s, altchars):
translation = _translation[:]
for k, v in altchars.items():
translation[ord(k)] = v
return s.translate(''.join(translation))
# Base64 encoding/decoding uses binascii
def b64encode(s, altchars=None):
"""Encode a string using Base64.
s is the string to encode. Optional altchars must be a string of at least
length 2 (additional characters are ignored) which specifies an
alternative alphabet for the '+' and '/' characters. This allows an
    application to e.g. generate URL- or filesystem-safe Base64 strings.
The encoded string is returned.
"""
# Strip off the trailing newline
encoded = binascii.b2a_base64(s)[:-1]
if altchars is not None:
return encoded.translate(string.maketrans(b'+/', altchars[:2]))
return encoded
def b64decode(s, altchars=None):
"""Decode a Base64 encoded string.
s is the string to decode. Optional altchars must be a string of at least
length 2 (additional characters are ignored) which specifies the
alternative alphabet used instead of the '+' and '/' characters.
The decoded string is returned. A TypeError is raised if s is
incorrectly padded. Characters that are neither in the normal base-64
alphabet nor the alternative alphabet are discarded prior to the padding
check.
"""
if altchars is not None:
s = s.translate(string.maketrans(altchars[:2], '+/'))
try:
return binascii.a2b_base64(s)
except binascii.Error, msg:
# Transform this exception for consistency
raise TypeError(msg)
def standard_b64encode(s):
"""Encode a string using the standard Base64 alphabet.
s is the string to encode. The encoded string is returned.
"""
return b64encode(s)
def standard_b64decode(s):
"""Decode a string encoded with the standard Base64 alphabet.
Argument s is the string to decode. The decoded string is returned. A
TypeError is raised if the string is incorrectly padded. Characters that
are not in the standard alphabet are discarded prior to the padding
check.
"""
return b64decode(s)
_urlsafe_encode_translation = string.maketrans(b'+/', b'-_')
_urlsafe_decode_translation = string.maketrans(b'-_', b'+/')
def urlsafe_b64encode(s):
"""Encode a string using the URL- and filesystem-safe Base64 alphabet.
Argument s is the string to encode. The encoded string is returned. The
alphabet uses '-' instead of '+' and '_' instead of '/'.
"""
return b64encode(s).translate(_urlsafe_encode_translation)
def urlsafe_b64decode(s):
"""Decode a string using the URL- and filesystem-safe Base64 alphabet.
Argument s is the string to decode. The decoded string is returned. A
TypeError is raised if the string is incorrectly padded. Characters that
are not in the URL-safe base-64 alphabet, and are not a plus '+' or slash
'/', are discarded prior to the padding check.
The alphabet uses '-' instead of '+' and '_' instead of '/'.
"""
return b64decode(s.translate(_urlsafe_decode_translation))
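# Illustrative sketch (editor's addition): round-tripping with the standard
# and URL-safe alphabets. _demo_b64 and the sample bytes are hypothetical.
def _demo_b64():
    raw = '\xfb\xff\xbf'                  # bytes whose encoding uses '+' and '/'
    assert b64encode(raw) == '+/+/'
    assert b64decode(b64encode(raw)) == raw
    assert urlsafe_b64encode(raw) == '-_-_'
    assert urlsafe_b64decode(urlsafe_b64encode(raw)) == raw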
# Base32 encoding/decoding must be done in Python
_b32alphabet = {
0: 'A', 9: 'J', 18: 'S', 27: '3',
1: 'B', 10: 'K', 19: 'T', 28: '4',
2: 'C', 11: 'L', 20: 'U', 29: '5',
3: 'D', 12: 'M', 21: 'V', 30: '6',
4: 'E', 13: 'N', 22: 'W', 31: '7',
5: 'F', 14: 'O', 23: 'X',
6: 'G', 15: 'P', 24: 'Y',
7: 'H', 16: 'Q', 25: 'Z',
8: 'I', 17: 'R', 26: '2',
}
_b32tab = _b32alphabet.items()
_b32tab.sort()
_b32tab = [v for k, v in _b32tab]
_b32rev = dict([(v, long(k)) for k, v in _b32alphabet.items()])
def b32encode(s):
"""Encode a string using Base32.
s is the string to encode. The encoded string is returned.
"""
parts = []
quanta, leftover = divmod(len(s), 5)
# Pad the last quantum with zero bits if necessary
if leftover:
s += ('\0' * (5 - leftover))
quanta += 1
for i in range(quanta):
# c1 and c2 are 16 bits wide, c3 is 8 bits wide. The intent of this
# code is to process the 40 bits in units of 5 bits. So we take the 1
# leftover bit of c1 and tack it onto c2. Then we take the 2 leftover
# bits of c2 and tack them onto c3. The shifts and masks are intended
# to give us values of exactly 5 bits in width.
c1, c2, c3 = struct.unpack('!HHB', s[i*5:(i+1)*5])
c2 += (c1 & 1) << 16 # 17 bits wide
c3 += (c2 & 3) << 8 # 10 bits wide
parts.extend([_b32tab[c1 >> 11], # bits 1 - 5
_b32tab[(c1 >> 6) & 0x1f], # bits 6 - 10
_b32tab[(c1 >> 1) & 0x1f], # bits 11 - 15
_b32tab[c2 >> 12], # bits 16 - 20 (1 - 5)
_b32tab[(c2 >> 7) & 0x1f], # bits 21 - 25 (6 - 10)
_b32tab[(c2 >> 2) & 0x1f], # bits 26 - 30 (11 - 15)
_b32tab[c3 >> 5], # bits 31 - 35 (1 - 5)
_b32tab[c3 & 0x1f], # bits 36 - 40 (1 - 5)
])
encoded = EMPTYSTRING.join(parts)
# Adjust for any leftover partial quanta
if leftover == 1:
return encoded[:-6] + '======'
elif leftover == 2:
return encoded[:-4] + '===='
elif leftover == 3:
return encoded[:-3] + '==='
elif leftover == 4:
return encoded[:-1] + '='
return encoded
def b32decode(s, casefold=False, map01=None):
"""Decode a Base32 encoded string.
s is the string to decode. Optional casefold is a flag specifying whether
a lowercase alphabet is acceptable as input. For security purposes, the
default is False.
RFC 3548 allows for optional mapping of the digit 0 (zero) to the letter O
(oh), and for optional mapping of the digit 1 (one) to either the letter I
    (eye) or letter L (el). The optional argument map01, when not None,
specifies which letter the digit 1 should be mapped to (when map01 is not
None, the digit 0 is always mapped to the letter O). For security
purposes the default is None, so that 0 and 1 are not allowed in the
input.
    The decoded string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
"""
quanta, leftover = divmod(len(s), 8)
if leftover:
raise TypeError('Incorrect padding')
# Handle section 2.4 zero and one mapping. The flag map01 will be either
# False, or the character to map the digit 1 (one) to. It should be
# either L (el) or I (eye).
if map01:
s = s.translate(string.maketrans(b'01', b'O' + map01))
if casefold:
s = s.upper()
# Strip off pad characters from the right. We need to count the pad
# characters because this will tell us how many null bytes to remove from
# the end of the decoded string.
padchars = 0
mo = re.search('(?P<pad>[=]*)$', s)
if mo:
padchars = len(mo.group('pad'))
if padchars > 0:
s = s[:-padchars]
# Now decode the full quanta
parts = []
acc = 0
shift = 35
for c in s:
val = _b32rev.get(c)
if val is None:
raise TypeError('Non-base32 digit found')
acc += _b32rev[c] << shift
shift -= 5
if shift < 0:
parts.append(binascii.unhexlify('%010x' % acc))
acc = 0
shift = 35
# Process the last, partial quanta
last = binascii.unhexlify('%010x' % acc)
if padchars == 0:
last = '' # No characters
elif padchars == 1:
last = last[:-1]
elif padchars == 3:
last = last[:-2]
elif padchars == 4:
last = last[:-3]
elif padchars == 6:
last = last[:-4]
else:
raise TypeError('Incorrect padding')
parts.append(last)
return EMPTYSTRING.join(parts)
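# Illustrative sketch (editor's addition): Base32 output comes in 8-character
# quanta, padded with '='. _demo_b32 is hypothetical.
def _demo_b32():
    assert b32encode('hello') == 'NBSWY3DP'
    assert b32decode('NBSWY3DP') == 'hello'
    assert b32encode('hi').endswith('====')       # 2 leftover bytes -> 4 pads
    assert b32decode(b32encode('hi')) == 'hi'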
# RFC 3548, Base 16 Alphabet specifies uppercase, but hexlify() returns
# lowercase. The RFC also recommends against accepting input case
# insensitively.
def b16encode(s):
"""Encode a string using Base16.
s is the string to encode. The encoded string is returned.
"""
return binascii.hexlify(s).upper()
def b16decode(s, casefold=False):
"""Decode a Base16 encoded string.
s is the string to decode. Optional casefold is a flag specifying whether
a lowercase alphabet is acceptable as input. For security purposes, the
default is False.
The decoded string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
"""
if casefold:
s = s.upper()
if re.search('[^0-9A-F]', s):
raise TypeError('Non-base16 digit found')
return binascii.unhexlify(s)
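# Illustrative sketch (editor's addition): Base16 is plain uppercase hex.
# _demo_b16 is hypothetical.
def _demo_b16():
    assert b16encode('\x01\xab') == '01AB'
    assert b16decode('01AB') == '\x01\xab'
    assert b16decode('01ab', casefold=True) == '\x01\xab'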
# Legacy interface. This code could be cleaned up since I don't believe
# binascii has any line length limitations. It just doesn't seem worth it
# though.
MAXLINESIZE = 76 # Excluding the CRLF
MAXBINSIZE = (MAXLINESIZE//4)*3
def encode(input, output):
"""Encode a file."""
while True:
s = input.read(MAXBINSIZE)
if not s:
break
while len(s) < MAXBINSIZE:
ns = input.read(MAXBINSIZE-len(s))
if not ns:
break
s += ns
line = binascii.b2a_base64(s)
output.write(line)
def decode(input, output):
"""Decode a file."""
while True:
line = input.readline()
if not line:
break
s = binascii.a2b_base64(line)
output.write(s)
def encodestring(s):
"""Encode a string into multiple lines of base-64 data."""
pieces = []
for i in range(0, len(s), MAXBINSIZE):
chunk = s[i : i + MAXBINSIZE]
pieces.append(binascii.b2a_base64(chunk))
return "".join(pieces)
def decodestring(s):
"""Decode a string."""
return binascii.a2b_base64(s)
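# Illustrative sketch (editor's addition): the legacy string interface emits
# MIME-style lines of at most MAXLINESIZE characters. _demo_legacy is
# hypothetical.
def _demo_legacy():
    blob = 'spam' * 50
    encoded = encodestring(blob)
    assert max(len(line) for line in encoded.splitlines()) <= MAXLINESIZE
    assert decodestring(encoded) == blob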
# Usable as a script...
def test():
"""Small test program"""
import sys, getopt
try:
opts, args = getopt.getopt(sys.argv[1:], 'deut')
except getopt.error, msg:
sys.stdout = sys.stderr
print msg
print """usage: %s [-d|-e|-u|-t] [file|-]
-d, -u: decode
-e: encode (default)
-t: encode and decode string 'Aladdin:open sesame'"""%sys.argv[0]
sys.exit(2)
func = encode
for o, a in opts:
if o == '-e': func = encode
if o == '-d': func = decode
if o == '-u': func = decode
if o == '-t': test1(); return
if args and args[0] != '-':
with open(args[0], 'rb') as f:
func(f, sys.stdout)
else:
func(sys.stdin, sys.stdout)
def test1():
s0 = "Aladdin:open sesame"
s1 = encodestring(s0)
s2 = decodestring(s1)
print s0, repr(s1), s2
if __name__ == '__main__':
test()
"""HTTP server base class.
Note: the class in this module doesn't implement any HTTP request; see
SimpleHTTPServer for simple implementations of GET, HEAD and POST
(including CGI scripts). It does, however, optionally implement HTTP/1.1
persistent connections, as of version 0.3.
Contents:
- BaseHTTPRequestHandler: HTTP request handler base class
- test: test function
XXX To do:
- log requests even later (to capture byte count)
- log user-agent header and other interesting goodies
- send error log to separate file
"""
# See also:
#
# HTTP Working Group T. Berners-Lee
# INTERNET-DRAFT R. T. Fielding
# <draft-ietf-http-v10-spec-00.txt> H. Frystyk Nielsen
# Expires September 8, 1995 March 8, 1995
#
# URL: http://www.ics.uci.edu/pub/ietf/http/draft-ietf-http-v10-spec-00.txt
#
# and
#
# Network Working Group R. Fielding
# Request for Comments: 2616 et al
# Obsoletes: 2068 June 1999
# Category: Standards Track
#
# URL: http://www.faqs.org/rfcs/rfc2616.html
# Log files
# ---------
#
# Here's a quote from the NCSA httpd docs about log file format.
#
# | The logfile format is as follows. Each line consists of:
# |
# | host rfc931 authuser [DD/Mon/YYYY:hh:mm:ss] "request" ddd bbbb
# |
# | host: Either the DNS name or the IP number of the remote client
# | rfc931: Any information returned by identd for this person,
# | - otherwise.
# | authuser: If user sent a userid for authentication, the user name,
# | - otherwise.
# | DD: Day
# | Mon: Month (calendar name)
# | YYYY: Year
# | hh: hour (24-hour format, the machine's timezone)
# | mm: minutes
# | ss: seconds
# | request: The first line of the HTTP request as sent by the client.
# | ddd: the status code returned by the server, - if not available.
# | bbbb: the total number of bytes sent,
# | *not including the HTTP/1.0 header*, - if not available
# |
# | You can determine the name of the file accessed through request.
#
# (Actually, the latter is only true if you know the server configuration
# at the time the request was made!)
__version__ = "0.3"
__all__ = ["HTTPServer", "BaseHTTPRequestHandler"]
import sys
import time
import socket # For gethostbyaddr()
from warnings import filterwarnings, catch_warnings
with catch_warnings():
if sys.py3kwarning:
filterwarnings("ignore", ".*mimetools has been removed",
DeprecationWarning)
import mimetools
import SocketServer
# Default error message template
DEFAULT_ERROR_MESSAGE = """\
<head>
<title>Error response</title>
</head>
<body>
<h1>Error response</h1>
<p>Error code %(code)d.
<p>Message: %(message)s.
<p>Error code explanation: %(code)s = %(explain)s.
</body>
"""
DEFAULT_ERROR_CONTENT_TYPE = "text/html"
def _quote_html(html):
    return html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
class HTTPServer(SocketServer.TCPServer):
allow_reuse_address = 1 # Seems to make sense in testing environment
def server_bind(self):
"""Override server_bind to store the server name."""
SocketServer.TCPServer.server_bind(self)
host, port = self.socket.getsockname()[:2]
self.server_name = socket.getfqdn(host)
self.server_port = port
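# Illustrative sketch (editor's addition): the do_<COMMAND> dispatch that
# BaseHTTPRequestHandler (defined just below) performs. Kept as a comment
# because the base class follows this point in the file; the handler name
# and port are hypothetical.
#
#   class HelloHandler(BaseHTTPRequestHandler):
#       def do_GET(self):
#           body = "hello\n"
#           self.send_response(200)
#           self.send_header("Content-Type", "text/plain")
#           self.send_header("Content-Length", str(len(body)))
#           self.end_headers()
#           self.wfile.write(body)
#
#   HTTPServer(('127.0.0.1', 8000), HelloHandler).serve_forever()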
class BaseHTTPRequestHandler(SocketServer.StreamRequestHandler):
"""HTTP request handler base class.
The following explanation of HTTP serves to guide you through the
code as well as to expose any misunderstandings I may have about
HTTP (so you don't need to read the code to figure out I'm wrong
:-).
HTTP (HyperText Transfer Protocol) is an extensible protocol on
top of a reliable stream transport (e.g. TCP/IP). The protocol
recognizes three parts to a request:
1. One line identifying the request type and path
2. An optional set of RFC-822-style headers
3. An optional data part
The headers and data are separated by a blank line.
The first line of the request has the form
<command> <path> <version>
where <command> is a (case-sensitive) keyword such as GET or POST,
<path> is a string containing path information for the request,
and <version> should be the string "HTTP/1.0" or "HTTP/1.1".
<path> is encoded using the URL encoding scheme (using %xx to signify
the ASCII character with hex code xx).
    The specification states that lines are separated by CRLF, but
for compatibility with the widest range of clients recommends
servers also handle LF. Similarly, whitespace in the request line
is treated sensibly (allowing multiple spaces between components
and allowing trailing whitespace).
Similarly, for output, lines ought to be separated by CRLF pairs
but most clients grok LF characters just fine.
If the first line of the request has the form
<command> <path>
(i.e. <version> is left out) then this is assumed to be an HTTP
0.9 request; this form has no optional headers and data part and
the reply consists of just the data.
The reply form of the HTTP 1.x protocol again has three parts:
1. One line giving the response code
2. An optional set of RFC-822-style headers
3. The data
Again, the headers and data are separated by a blank line.
The response code line has the form
<version> <responsecode> <responsestring>
where <version> is the protocol version ("HTTP/1.0" or "HTTP/1.1"),
<responsecode> is a 3-digit response code indicating success or
failure of the request, and <responsestring> is an optional
human-readable string explaining what the response code means.
This server parses the request and the headers, and then calls a
function specific to the request type (<command>). Specifically,
a request SPAM will be handled by a method do_SPAM(). If no
such method exists the server sends an error response to the
client. If it exists, it is called with no arguments:
do_SPAM()
Note that the request name is case sensitive (i.e. SPAM and spam
are different requests).
The various request details are stored in instance variables:
- client_address is the client IP address in the form (host,
port);
- command, path and version are the broken-down request line;
- headers is an instance of mimetools.Message (or a derived
class) containing the header information;
- rfile is a file object open for reading positioned at the
start of the optional input data part;
- wfile is a file object open for writing.
IT IS IMPORTANT TO ADHERE TO THE PROTOCOL FOR WRITING!
The first thing to be written must be the response line. Then
follow 0 or more header lines, then a blank line, and then the
actual data (if any). The meaning of the header lines depends on
the command executed by the server; in most cases, when data is
returned, there should be at least one header line of the form
Content-type: <type>/<subtype>
where <type> and <subtype> should be registered MIME types,
e.g. "text/html" or "text/plain".
"""
# The Python system version, truncated to its first component.
sys_version = "Python/" + sys.version.split()[0]
# The server software version. You may want to override this.
# The format is multiple whitespace-separated strings,
# where each string is of the form name[/version].
server_version = "BaseHTTP/" + __version__
# The default request version. This only affects responses up until
# the point where the request line is parsed, so it mainly decides what
# the client gets back when sending a malformed request line.
# Most web servers default to HTTP 0.9, i.e. don't send a status line.
default_request_version = "HTTP/0.9"
def parse_request(self):
"""Parse a request (internal).
The request should be stored in self.raw_requestline; the results
are in self.command, self.path, self.request_version and
self.headers.
Return True for success, False for failure; on failure, an
error is sent back.
"""
self.command = None # set in case of error on the first line
self.request_version = version = self.default_request_version
self.close_connection = 1
requestline = self.raw_requestline
requestline = requestline.rstrip('\r\n')
self.requestline = requestline
words = requestline.split()
if len(words) == 3:
command, path, version = words
if version[:5] != 'HTTP/':
self.send_error(400, "Bad request version (%r)" % version)
return False
try:
base_version_number = version.split('/', 1)[1]
version_number = base_version_number.split(".")
# RFC 2145 section 3.1 says there can be only one "." and
# - major and minor numbers MUST be treated as
# separate integers;
# - HTTP/2.4 is a lower version than HTTP/2.13, which in
# turn is lower than HTTP/12.3;
# - Leading zeros MUST be ignored by recipients.
if len(version_number) != 2:
raise ValueError
version_number = int(version_number[0]), int(version_number[1])
except (ValueError, IndexError):
self.send_error(400, "Bad request version (%r)" % version)
return False
if version_number >= (1, 1) and self.protocol_version >= "HTTP/1.1":
self.close_connection = 0
if version_number >= (2, 0):
self.send_error(505,
"Invalid HTTP Version (%s)" % base_version_number)
return False
elif len(words) == 2:
command, path = words
self.close_connection = 1
if command != 'GET':
self.send_error(400,
"Bad HTTP/0.9 request type (%r)" % command)
return False
elif not words:
return False
else:
self.send_error(400, "Bad request syntax (%r)" % requestline)
return False
self.command, self.path, self.request_version = command, path, version
# Examine the headers and look for a Connection directive
self.headers = self.MessageClass(self.rfile, 0)
conntype = self.headers.get('Connection', "")
if conntype.lower() == 'close':
self.close_connection = 1
elif (conntype.lower() == 'keep-alive' and
self.protocol_version >= "HTTP/1.1"):
self.close_connection = 0
return True
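# Example (illustrative): given the raw request line
#   'GET /index.html HTTP/1.1\r\n'
# parse_request() sets command='GET', path='/index.html' and
# request_version='HTTP/1.1'; the connection stays open (close_connection
# is 0) when protocol_version is "HTTP/1.1" and the client did not send
# 'Connection: close'.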
def handle_one_request(self):
"""Handle a single HTTP request.
You normally don't need to override this method; see the class
__doc__ string for information on how to handle specific HTTP
commands such as GET and POST.
"""
try:
self.raw_requestline = self.rfile.readline(65537)
if len(self.raw_requestline) > 65536:
self.requestline = ''
self.request_version = ''
self.command = ''
self.send_error(414)
return
if not self.raw_requestline:
self.close_connection = 1
return
if not self.parse_request():
# An error code has been sent, just exit
return
mname = 'do_' + self.command
if not hasattr(self, mname):
self.send_error(501, "Unsupported method (%r)" % self.command)
return
method = getattr(self, mname)
method()
self.wfile.flush() #actually send the response if not already done.
except socket.timeout, e:
#a read or a write timed out. Discard this connection
self.log_error("Request timed out: %r", e)
self.close_connection = 1
return
def handle(self):
"""Handle multiple requests if necessary."""
self.close_connection = 1
self.handle_one_request()
while not self.close_connection:
self.handle_one_request()
def send_error(self, code, message=None):
"""Send and log an error reply.
Arguments are the error code, and a detailed message.
The detailed message defaults to the short entry matching the
response code.
This sends an error response (so it must be called before any
output has been generated), logs the error, and finally sends
a piece of HTML explaining the error to the user.
"""
try:
short, long = self.responses[code]
except KeyError:
short, long = '???', '???'
if message is None:
message = short
explain = long
self.log_error("code %d, message %s", code, message)
self.send_response(code, message)
self.send_header('Connection', 'close')
# Message body is omitted for cases described in:
# - RFC7230: 3.3. 1xx, 204(No Content), 304(Not Modified)
# - RFC7231: 6.3.6. 205(Reset Content)
content = None
if code >= 200 and code not in (204, 205, 304):
# HTML encode to prevent Cross Site Scripting attacks
# (see bug #1100201)
content = (self.error_message_format % {
'code': code,
'message': _quote_html(message),
'explain': explain
})
self.send_header("Content-Type", self.error_content_type)
self.end_headers()
if self.command != 'HEAD' and content:
self.wfile.write(content)
error_message_format = DEFAULT_ERROR_MESSAGE
error_content_type = DEFAULT_ERROR_CONTENT_TYPE
def send_response(self, code, message=None):
"""Send the response header and log the response code.
Also send two standard headers with the server software
version and the current date.
"""
self.log_request(code)
if message is None:
if code in self.responses:
message = self.responses[code][0]
else:
message = ''
if self.request_version != 'HTTP/0.9':
self.wfile.write("%s %d %s\r\n" %
(self.protocol_version, code, message))
# print (self.protocol_version, code, message)
self.send_header('Server', self.version_string())
self.send_header('Date', self.date_time_string())
def send_header(self, keyword, value):
"""Send a MIME header."""
if self.request_version != 'HTTP/0.9':
self.wfile.write("%s: %s\r\n" % (keyword, value))
if keyword.lower() == 'connection':
if value.lower() == 'close':
self.close_connection = 1
elif value.lower() == 'keep-alive':
self.close_connection = 0
def end_headers(self):
"""Send the blank line ending the MIME headers."""
if self.request_version != 'HTTP/0.9':
self.wfile.write("\r\n")
def log_request(self, code='-', size='-'):
"""Log an accepted request.
This is called by send_response().
"""
self.log_message('"%s" %s %s',
self.requestline, str(code), str(size))
def log_error(self, format, *args):
"""Log an error.
This is called when a request cannot be fulfilled. By
default it passes the message on to log_message().
Arguments are the same as for log_message().
XXX This should go to the separate error log.
"""
self.log_message(format, *args)
def log_message(self, format, *args):
"""Log an arbitrary message.
This is used by all other logging functions. Override
it if you have specific logging wishes.
The first argument, FORMAT, is a format string for the
message to be logged. If the format string contains
any % escapes requiring parameters, they should be
specified as subsequent arguments (it's just like
printf!).
The client ip address and current date/time are prefixed to every
message.
"""
sys.stderr.write("%s - - [%s] %s\n" %
(self.client_address[0],
self.log_date_time_string(),
format%args))
def version_string(self):
"""Return the server software version string."""
return self.server_version + ' ' + self.sys_version
def date_time_string(self, timestamp=None):
"""Return the current date and time formatted for a message header."""
if timestamp is None:
timestamp = time.time()
year, month, day, hh, mm, ss, wd, y, z = time.gmtime(timestamp)
s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
self.weekdayname[wd],
day, self.monthname[month], year,
hh, mm, ss)
return s
def log_date_time_string(self):
"""Return the current time formatted for logging."""
now = time.time()
year, month, day, hh, mm, ss, x, y, z = time.localtime(now)
s = "%02d/%3s/%04d %02d:%02d:%02d" % (
day, self.monthname[month], year, hh, mm, ss)
return s
weekdayname = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
monthname = [None,
'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
def address_string(self):
"""Return the client address formatted for logging.
This version looks up the full hostname using gethostbyaddr(),
and tries to find a name that contains at least one dot.
"""
host, port = self.client_address[:2]
return socket.getfqdn(host)
# Essentially static class variables
# The version of the HTTP protocol we support.
# Set this to HTTP/1.1 to enable automatic keepalive
protocol_version = "HTTP/1.0"
# The Message-like class used to parse headers
MessageClass = mimetools.Message
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
# See RFC 2616.
responses = {
100: ('Continue', 'Request received, please continue'),
101: ('Switching Protocols',
'Switching to new protocol; obey Upgrade header'),
200: ('OK', 'Request fulfilled, document follows'),
201: ('Created', 'Document created, URL follows'),
202: ('Accepted',
'Request accepted, processing continues off-line'),
203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
204: ('No Content', 'Request fulfilled, nothing follows'),
205: ('Reset Content', 'Clear input form for further input.'),
206: ('Partial Content', 'Partial content follows.'),
300: ('Multiple Choices',
'Object has several resources -- see URI list'),
301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
302: ('Found', 'Object moved temporarily -- see URI list'),
303: ('See Other', 'Object moved -- see Method and URL list'),
304: ('Not Modified',
'Document has not changed since given time'),
305: ('Use Proxy',
'You must use proxy specified in Location to access this '
'resource.'),
307: ('Temporary Redirect',
'Object moved temporarily -- see URI list'),
400: ('Bad Request',
'Bad request syntax or unsupported method'),
401: ('Unauthorized',
'No permission -- see authorization schemes'),
402: ('Payment Required',
'No payment -- see charging schemes'),
403: ('Forbidden',
'Request forbidden -- authorization will not help'),
404: ('Not Found', 'Nothing matches the given URI'),
405: ('Method Not Allowed',
'Specified method is invalid for this resource.'),
406: ('Not Acceptable', 'URI not available in preferred format.'),
407: ('Proxy Authentication Required', 'You must authenticate with '
'this proxy before proceeding.'),
408: ('Request Timeout', 'Request timed out; try again later.'),
409: ('Conflict', 'Request conflict.'),
410: ('Gone',
'URI no longer exists and has been permanently removed.'),
411: ('Length Required', 'Client must specify Content-Length.'),
412: ('Precondition Failed', 'Precondition in headers is false.'),
413: ('Request Entity Too Large', 'Entity is too large.'),
414: ('Request-URI Too Long', 'URI is too long.'),
415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
416: ('Requested Range Not Satisfiable',
'Cannot satisfy request range.'),
417: ('Expectation Failed',
'Expect condition could not be satisfied.'),
500: ('Internal Server Error', 'Server got itself in trouble'),
501: ('Not Implemented',
'Server does not support this operation'),
502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
503: ('Service Unavailable',
'The server cannot process the request due to a high load'),
504: ('Gateway Timeout',
'The gateway server did not receive a timely response'),
505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
def test(HandlerClass = BaseHTTPRequestHandler,
ServerClass = HTTPServer, protocol="HTTP/1.0"):
"""Test the HTTP request handler class.
This runs an HTTP server on port 8000 (or the first command line
argument).
"""
if sys.argv[1:]:
port = int(sys.argv[1])
else:
port = 8000
server_address = ('', port)
HandlerClass.protocol_version = protocol
httpd = ServerClass(server_address, HandlerClass)
sa = httpd.socket.getsockname()
print "Serving HTTP on", sa[0], "port", sa[1], "..."
httpd.serve_forever()
if __name__ == '__main__':
test()
"""Bastionification utility.
A bastion (for another object -- the 'original') is an object that has
the same methods as the original but does not give access to its
instance variables. Bastions have a number of uses, but the most
obvious one is to provide code executing in restricted mode with a
safe interface to an object implemented in unrestricted mode.
The bastionification routine has an optional second argument which is
a filter function. Only those methods for which the filter function
(called with the method name as argument) returns true are accessible.
The default filter returns true unless the method name begins
with an underscore.
There are a number of possible implementations of bastions. We use a
'lazy' approach where the bastion's __getattr__() discipline does all
the work for a particular method the first time it is used. This is
usually fastest, especially if the user doesn't call all available
methods. The retrieved methods are stored as instance variables of
the bastion, so the overhead is only incurred on the first use of each
method.
Detail: the bastion class has a __repr__() discipline which includes
the repr() of the original object. This is precomputed when the
bastion is created.
"""
from warnings import warnpy3k
warnpy3k("the Bastion module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
__all__ = ["BastionClass", "Bastion"]
from types import MethodType
class BastionClass:
"""Helper class used by the Bastion() function.
You could subclass this and pass the subclass as the bastionclass
argument to the Bastion() function, as long as the constructor has
the same signature (a get() function and a name for the object).
"""
def __init__(self, get, name):
"""Constructor.
Arguments:
get - a function that gets the attribute value (by name)
name - a human-readable name for the original object
(suggestion: use repr(object))
"""
self._get_ = get
self._name_ = name
def __repr__(self):
"""Return a representation string.
This includes the name passed in to the constructor, so that
if you print the bastion during debugging, at least you have
some idea of what it is.
"""
return "<Bastion for %s>" % self._name_
def __getattr__(self, name):
"""Get an as-yet undefined attribute value.
This calls the get() function that was passed to the
constructor. The result is stored as an instance variable so
that the next time the same attribute is requested,
__getattr__() won't be invoked.
If the get() function raises an exception, this is simply
passed on -- exceptions are not cached.
"""
attribute = self._get_(name)
self.__dict__[name] = attribute
return attribute
def Bastion(object, filter = lambda name: name[:1] != '_',
name=None, bastionclass=BastionClass):
"""Create a bastion for an object, using an optional filter.
See the Bastion module's documentation for background.
Arguments:
object - the original object
filter - a predicate that decides whether a function name is OK;
by default all names are OK that don't start with '_'
name - the name of the object; default repr(object)
bastionclass - class used to create the bastion; default BastionClass
"""
raise RuntimeError, "This code is not secure in Python 2.2 and later"
# Note: we define *two* ad-hoc functions here, get1 and get2.
# Both are intended to be called in the same way: get(name).
# It is clear that the real work (getting the attribute
# from the object and calling the filter) is done in get1.
# Why can't we pass get1 to the bastion? Because the user
# would be able to override the filter argument! With get2,
# overriding the default argument is no security loophole:
# all it does is call it.
# Also notice that we can't place the object and filter as
# instance variables on the bastion object itself, since
# the user has full access to all instance variables!
def get1(name, object=object, filter=filter):
"""Internal function for Bastion(). See source comments."""
if filter(name):
attribute = getattr(object, name)
if type(attribute) == MethodType:
return attribute
raise AttributeError, name
def get2(name, get1=get1):
"""Internal function for Bastion(). See source comments."""
return get1(name)
if name is None:
name = repr(object)
return bastionclass(get2, name)
def _test():
"""Test the Bastion() function."""
class Original:
def __init__(self):
self.sum = 0
def add(self, n):
self._add(n)
def _add(self, n):
self.sum = self.sum + n
def total(self):
return self.sum
o = Original()
b = Bastion(o)
testcode = """if 1:
b.add(81)
b.add(18)
print "b.total() =", b.total()
try:
print "b.sum =", b.sum,
except:
print "inaccessible"
else:
print "accessible"
try:
print "b._add =", b._add,
except:
print "inaccessible"
else:
print "accessible"
try:
print "b._get_.func_defaults =", map(type, b._get_.func_defaults),
except:
print "inaccessible"
else:
print "accessible"
\n"""
exec testcode
print '='*20, "Using rexec:", '='*20
import rexec
r = rexec.RExec()
m = r.add_module('__main__')
m.b = b
r.r_exec(testcode)
if __name__ == '__main__':
_test()
"""Debugger basics"""
import fnmatch
import sys
import os
import types
__all__ = ["BdbQuit","Bdb","Breakpoint"]
class BdbQuit(Exception):
"""Exception to give up completely"""
class Bdb:
"""Generic Python debugger base class.
This class takes care of details of the trace facility;
a derived class should implement user interaction.
The standard debugger class (pdb.Pdb) is an example.
"""
def __init__(self, skip=None):
self.skip = set(skip) if skip else None
self.breaks = {}
self.fncache = {}
self.frame_returning = None
def canonic(self, filename):
if filename == "<" + filename[1:-1] + ">":
return filename
canonic = self.fncache.get(filename)
if not canonic:
canonic = os.path.abspath(filename)
canonic = os.path.normcase(canonic)
self.fncache[filename] = canonic
return canonic
def reset(self):
import linecache
linecache.checkcache()
self.botframe = None
self._set_stopinfo(None, None)
def trace_dispatch(self, frame, event, arg):
if self.quitting:
return # None
if event == 'line':
return self.dispatch_line(frame)
if event == 'call':
return self.dispatch_call(frame, arg)
if event == 'return':
return self.dispatch_return(frame, arg)
if event == 'exception':
return self.dispatch_exception(frame, arg)
if event == 'c_call':
return self.trace_dispatch
if event == 'c_exception':
return self.trace_dispatch
if event == 'c_return':
return self.trace_dispatch
print 'bdb.Bdb.dispatch: unknown debugging event:', repr(event)
return self.trace_dispatch
def dispatch_line(self, frame):
if self.stop_here(frame) or self.break_here(frame):
self.user_line(frame)
if self.quitting: raise BdbQuit
return self.trace_dispatch
def dispatch_call(self, frame, arg):
# XXX 'arg' is no longer used
if self.botframe is None:
# First call of dispatch since reset()
self.botframe = frame.f_back # (CT) Note that this may also be None!
return self.trace_dispatch
if not (self.stop_here(frame) or self.break_anywhere(frame)):
# No need to trace this function
return # None
self.user_call(frame, arg)
if self.quitting: raise BdbQuit
return self.trace_dispatch
def dispatch_return(self, frame, arg):
if self.stop_here(frame) or frame == self.returnframe:
try:
self.frame_returning = frame
self.user_return(frame, arg)
finally:
self.frame_returning = None
if self.quitting: raise BdbQuit
return self.trace_dispatch
def dispatch_exception(self, frame, arg):
if self.stop_here(frame):
self.user_exception(frame, arg)
if self.quitting: raise BdbQuit
return self.trace_dispatch
# Normally derived classes don't override the following
# methods, but they may if they want to redefine the
# definition of stopping and breakpoints.
def is_skipped_module(self, module_name):
for pattern in self.skip:
if fnmatch.fnmatch(module_name, pattern):
return True
return False
def stop_here(self, frame):
# (CT) stopframe may now also be None, see dispatch_call.
# (CT) the former test for None is therefore removed from here.
if self.skip and \
self.is_skipped_module(frame.f_globals.get('__name__')):
return False
if frame is self.stopframe:
if self.stoplineno == -1:
return False
return frame.f_lineno >= self.stoplineno
while frame is not None and frame is not self.stopframe:
if frame is self.botframe:
return True
frame = frame.f_back
return False
def break_here(self, frame):
filename = self.canonic(frame.f_code.co_filename)
if not filename in self.breaks:
return False
lineno = frame.f_lineno
if not lineno in self.breaks[filename]:
# The line itself has no breakpoint, but maybe the line is the
# first line of a function with breakpoint set by function name.
lineno = frame.f_code.co_firstlineno
if not lineno in self.breaks[filename]:
return False
# flag says ok to delete temp. bp
(bp, flag) = effective(filename, lineno, frame)
if bp:
self.currentbp = bp.number
if (flag and bp.temporary):
self.do_clear(str(bp.number))
return True
else:
return False
def do_clear(self, arg):
raise NotImplementedError, "subclass of bdb must implement do_clear()"
def break_anywhere(self, frame):
return self.canonic(frame.f_code.co_filename) in self.breaks
# Derived classes should override the user_* methods
# to gain control.
def user_call(self, frame, argument_list):
"""This method is called when there is the remote possibility
that we ever need to stop in this function."""
pass
def user_line(self, frame):
"""This method is called when we stop or break at this line."""
pass
def user_return(self, frame, return_value):
"""This method is called when a return trap is set here."""
pass
def user_exception(self, frame, exc_info):
exc_type, exc_value, exc_traceback = exc_info
"""This method is called if an exception occurs,
but only if we are to stop at or just below this level."""
pass
def _set_stopinfo(self, stopframe, returnframe, stoplineno=0):
self.stopframe = stopframe
self.returnframe = returnframe
self.quitting = 0
# stoplineno >= 0 means: stop at line >= the stoplineno
# stoplineno -1 means: don't stop at all
self.stoplineno = stoplineno
# Derived classes and clients can call the following methods
# to affect the stepping state.
def set_until(self, frame): #the name "until" is borrowed from gdb
"""Stop when the line with the line no greater than the current one is
reached or when returning from current frame"""
self._set_stopinfo(frame, frame, frame.f_lineno+1)
def set_step(self):
"""Stop after one line of code."""
# Issue #13183: pdb skips frames after hitting a breakpoint and running
# step commands.
# Restore the trace function in the caller (that may not have been set
# for performance reasons) when returning from the current frame.
if self.frame_returning:
caller_frame = self.frame_returning.f_back
if caller_frame and not caller_frame.f_trace:
caller_frame.f_trace = self.trace_dispatch
self._set_stopinfo(None, None)
def set_next(self, frame):
"""Stop on the next line in or below the given frame."""
self._set_stopinfo(frame, None)
def set_return(self, frame):
"""Stop when returning from the given frame."""
self._set_stopinfo(frame.f_back, frame)
def set_trace(self, frame=None):
"""Start debugging from `frame`.
If frame is not specified, debugging starts from caller's frame.
"""
if frame is None:
frame = sys._getframe().f_back
self.reset()
while frame:
frame.f_trace = self.trace_dispatch
self.botframe = frame
frame = frame.f_back
self.set_step()
sys.settrace(self.trace_dispatch)
def set_continue(self):
# Don't stop except at breakpoints or when finished
self._set_stopinfo(self.botframe, None, -1)
if not self.breaks:
# no breakpoints; run without debugger overhead
sys.settrace(None)
frame = sys._getframe().f_back
while frame and frame is not self.botframe:
del frame.f_trace
frame = frame.f_back
def set_quit(self):
self.stopframe = self.botframe
self.returnframe = None
self.quitting = 1
sys.settrace(None)
# Derived classes and clients can call the following methods
# to manipulate breakpoints. These methods return an
# error message if something went wrong, None if all is well.
# Set_break prints out the breakpoint line and file:lineno.
# Call self.get_*break*() to see the breakpoints or better
# for bp in Breakpoint.bpbynumber: if bp: bp.bpprint().
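# Example (illustrative, with a hypothetical Bdb instance `debugger`):
#
#   err = debugger.set_break('spam.py', 14, cond='x > 2')
#   if err: print err                   # e.g. the line does not exist
#   for bp in Breakpoint.bpbynumber:
#       if bp: bp.bpprint()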
def set_break(self, filename, lineno, temporary=0, cond = None,
funcname=None):
filename = self.canonic(filename)
import linecache # Import as late as possible
line = linecache.getline(filename, lineno)
if not line:
return 'Line %s:%d does not exist' % (filename,
lineno)
if not filename in self.breaks:
self.breaks[filename] = []
list = self.breaks[filename]
if not lineno in list:
list.append(lineno)
bp = Breakpoint(filename, lineno, temporary, cond, funcname)
def _prune_breaks(self, filename, lineno):
if (filename, lineno) not in Breakpoint.bplist:
self.breaks[filename].remove(lineno)
if not self.breaks[filename]:
del self.breaks[filename]
def clear_break(self, filename, lineno):
filename = self.canonic(filename)
if not filename in self.breaks:
return 'There are no breakpoints in %s' % filename
if lineno not in self.breaks[filename]:
return 'There is no breakpoint at %s:%d' % (filename,
lineno)
# If there's only one bp in the list for that file,line
# pair, then remove the breaks entry
for bp in Breakpoint.bplist[filename, lineno][:]:
bp.deleteMe()
self._prune_breaks(filename, lineno)
def clear_bpbynumber(self, arg):
try:
number = int(arg)
except ValueError:
return 'Non-numeric breakpoint number (%s)' % arg
try:
bp = Breakpoint.bpbynumber[number]
except IndexError:
return 'Breakpoint number (%d) out of range' % number
if not bp:
return 'Breakpoint (%d) already deleted' % number
bp.deleteMe()
self._prune_breaks(bp.file, bp.line)
def clear_all_file_breaks(self, filename):
filename = self.canonic(filename)
if not filename in self.breaks:
return 'There are no breakpoints in %s' % filename
for line in self.breaks[filename]:
blist = Breakpoint.bplist[filename, line]
for bp in blist:
bp.deleteMe()
del self.breaks[filename]
def clear_all_breaks(self):
if not self.breaks:
return 'There are no breakpoints'
for bp in Breakpoint.bpbynumber:
if bp:
bp.deleteMe()
self.breaks = {}
def get_break(self, filename, lineno):
filename = self.canonic(filename)
return filename in self.breaks and \
lineno in self.breaks[filename]
def get_breaks(self, filename, lineno):
filename = self.canonic(filename)
return filename in self.breaks and \
lineno in self.breaks[filename] and \
Breakpoint.bplist[filename, lineno] or []
def get_file_breaks(self, filename):
filename = self.canonic(filename)
if filename in self.breaks:
return self.breaks[filename]
else:
return []
def get_all_breaks(self):
return self.breaks
# Derived classes and clients can call the following method
# to get a data structure representing a stack trace.
def get_stack(self, f, t):
stack = []
if t and t.tb_frame is f:
t = t.tb_next
while f is not None:
stack.append((f, f.f_lineno))
if f is self.botframe:
break
f = f.f_back
stack.reverse()
i = max(0, len(stack) - 1)
while t is not None:
stack.append((t.tb_frame, t.tb_lineno))
t = t.tb_next
if f is None:
i = max(0, len(stack) - 1)
return stack, i
#
def format_stack_entry(self, frame_lineno, lprefix=': '):
import linecache, repr
frame, lineno = frame_lineno
filename = self.canonic(frame.f_code.co_filename)
s = '%s(%r)' % (filename, lineno)
if frame.f_code.co_name:
s = s + frame.f_code.co_name
else:
s = s + "<lambda>"
if '__args__' in frame.f_locals:
args = frame.f_locals['__args__']
else:
args = None
if args:
s = s + repr.repr(args)
else:
s = s + '()'
if '__return__' in frame.f_locals:
rv = frame.f_locals['__return__']
s = s + '->'
s = s + repr.repr(rv)
line = linecache.getline(filename, lineno, frame.f_globals)
if line: s = s + lprefix + line.strip()
return s
# The following two methods can be called by clients to use
# a debugger to debug a statement, given as a string.
def run(self, cmd, globals=None, locals=None):
if globals is None:
import __main__
globals = __main__.__dict__
if locals is None:
locals = globals
self.reset()
sys.settrace(self.trace_dispatch)
if not isinstance(cmd, types.CodeType):
cmd = cmd+'\n'
try:
exec cmd in globals, locals
except BdbQuit:
pass
finally:
self.quitting = 1
sys.settrace(None)
def runeval(self, expr, globals=None, locals=None):
if globals is None:
import __main__
globals = __main__.__dict__
if locals is None:
locals = globals
self.reset()
sys.settrace(self.trace_dispatch)
if not isinstance(expr, types.CodeType):
expr = expr+'\n'
try:
return eval(expr, globals, locals)
except BdbQuit:
pass
finally:
self.quitting = 1
sys.settrace(None)
def runctx(self, cmd, globals, locals):
# B/W compatibility
self.run(cmd, globals, locals)
# This method is more useful to debug a single function call.
def runcall(self, func, *args, **kwds):
self.reset()
sys.settrace(self.trace_dispatch)
res = None
try:
res = func(*args, **kwds)
except BdbQuit:
pass
finally:
self.quitting = 1
sys.settrace(None)
return res
def set_trace():
Bdb().set_trace()
class Breakpoint:
"""Breakpoint class
Implements temporary breakpoints, ignore counts, disabling and
(re)-enabling, and conditionals.
Breakpoints are indexed by number through bpbynumber and by
the file,line tuple using bplist. The former points to a
single instance of class Breakpoint. The latter points to a
list of such instances since there may be more than one
breakpoint per line.
"""
# XXX Keeping state in the class is a mistake -- this means
# you cannot have more than one active Bdb instance.
next = 1 # Next bp to be assigned
bplist = {} # indexed by (file, lineno) tuple
bpbynumber = [None] # Each entry is None or an instance of Bpt
# index 0 is unused, except for marking an
# effective break .... see effective()
def __init__(self, file, line, temporary=0, cond=None, funcname=None):
self.funcname = funcname
# Needed if funcname is not None.
self.func_first_executable_line = None
self.file = file # This better be in canonical form!
self.line = line
self.temporary = temporary
self.cond = cond
self.enabled = 1
self.ignore = 0
self.hits = 0
self.number = Breakpoint.next
Breakpoint.next = Breakpoint.next + 1
# Build the two lists
self.bpbynumber.append(self)
if (file, line) in self.bplist:
self.bplist[file, line].append(self)
else:
self.bplist[file, line] = [self]
def deleteMe(self):
index = (self.file, self.line)
self.bpbynumber[self.number] = None # No longer in list
self.bplist[index].remove(self)
if not self.bplist[index]:
# No more bp for this f:l combo
del self.bplist[index]
def enable(self):
self.enabled = 1
def disable(self):
self.enabled = 0
def bpprint(self, out=None):
if out is None:
out = sys.stdout
if self.temporary:
disp = 'del '
else:
disp = 'keep '
if self.enabled:
disp = disp + 'yes '
else:
disp = disp + 'no '
print >>out, '%-4dbreakpoint %s at %s:%d' % (self.number, disp,
self.file, self.line)
if self.cond:
print >>out, '\tstop only if %s' % (self.cond,)
if self.ignore:
print >>out, '\tignore next %d hits' % (self.ignore)
if (self.hits):
if (self.hits > 1): ss = 's'
else: ss = ''
print >>out, ('\tbreakpoint already hit %d time%s' %
(self.hits, ss))
# -----------end of Breakpoint class----------
def checkfuncname(b, frame):
"""Check whether we should break here because of `b.funcname`."""
if not b.funcname:
# Breakpoint was set via line number.
if b.line != frame.f_lineno:
# Breakpoint was set at a line with a def statement and the function
# defined is called: don't break.
return False
return True
# Breakpoint set via function name.
if frame.f_code.co_name != b.funcname:
# It's not a function call, but rather execution of def statement.
return False
# We are in the right frame.
if not b.func_first_executable_line:
# The function is entered for the 1st time.
b.func_first_executable_line = frame.f_lineno
if b.func_first_executable_line != frame.f_lineno:
# But we are not at the first line number: don't break.
return False
return True
# Determines if there is an effective (active) breakpoint at this
# line of code. Returns a (breakpoint, flag) pair, or (None, None) if none.
def effective(file, line, frame):
"""Determine which breakpoint for this file:line is to be acted upon.
Called only if we know there is a breakpoint at this
location. Returns the breakpoint that was triggered and a flag
that indicates whether it is ok to delete a temporary breakpoint.
"""
possibles = Breakpoint.bplist[file,line]
for i in range(0, len(possibles)):
b = possibles[i]
if b.enabled == 0:
continue
if not checkfuncname(b, frame):
continue
# Count every hit when bp is enabled
b.hits = b.hits + 1
if not b.cond:
# If unconditional, and ignoring,
# go on to next, else break
if b.ignore > 0:
b.ignore = b.ignore -1
continue
else:
# breakpoint and marker that's ok
# to delete if temporary
return (b,1)
else:
# Conditional bp.
# Ignore count applies only to those bpt hits where the
# condition evaluates to true.
try:
val = eval(b.cond, frame.f_globals,
frame.f_locals)
if val:
if b.ignore > 0:
b.ignore = b.ignore -1
# continue
else:
return (b,1)
# else:
# continue
except:
# if eval fails, most conservative
# thing is to stop on breakpoint
# regardless of ignore count.
# Don't delete temporary,
# as another hint to user.
return (b,0)
return (None, None)
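# Worked example: an enabled, unconditional breakpoint with ignore=1 is
# passed over on its first hit (ignore drops to 0) and fires on the
# second, returning (b, 1); a conditional breakpoint whose condition
# raises fires immediately with (b, 0), so a temporary breakpoint is
# kept as a hint to the user.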
# -------------------- testing --------------------
class Tdb(Bdb):
def user_call(self, frame, args):
name = frame.f_code.co_name
if not name: name = '???'
print '+++ call', name, args
def user_line(self, frame):
import linecache
name = frame.f_code.co_name
if not name: name = '???'
fn = self.canonic(frame.f_code.co_filename)
line = linecache.getline(fn, frame.f_lineno, frame.f_globals)
print '+++', fn, frame.f_lineno, name, ':', line.strip()
def user_return(self, frame, retval):
print '+++ return', retval
def user_exception(self, frame, exc_stuff):
print '+++ exception', exc_stuff
self.set_continue()
def foo(n):
print 'foo(', n, ')'
x = bar(n*10)
print 'bar returned', x
def bar(a):
print 'bar(', a, ')'
return a/2
def test():
t = Tdb()
t.run('import bdb; bdb.foo(10)')
# end
"""Macintosh binhex compression/decompression.
easy interface:
binhex(inputfilename, outputfilename)
hexbin(inputfilename, outputfilename)
"""
#
# Jack Jansen, CWI, August 1995.
#
# The module is supposed to be as compatible as possible. Especially the
# easy interface should work "as expected" on any platform.
# XXXX Note: currently, textfiles appear in mac-form on all platforms.
# We seem to lack a simple character-translate in python.
# (we should probably use ISO-Latin-1 on all but the mac platform).
# XXXX The simple routines are too simple: they expect to hold the complete
# files in-core. Should be fixed.
# XXXX It would be nice to handle AppleDouble format on unix
# (for servers serving macs).
# XXXX I don't understand what happens when you get 0x90 times the same byte on
# input. The resulting code (xx 90 90) would appear to be interpreted as an
# escaped *value* of 0x90. All coders I've seen appear to ignore this nicety...
#
import sys
import os
import struct
import binascii
__all__ = ["binhex","hexbin","Error"]
class Error(Exception):
pass
# States (what have we written)
_DID_HEADER = 0
_DID_DATA = 1
# Various constants
REASONABLY_LARGE=32768 # Minimal amount we pass the rle-coder
LINELEN=64
RUNCHAR=chr(0x90) # run-length introducer
#
# This code is no longer byte-order dependent
#
# Workarounds for non-mac machines.
try:
from Carbon.File import FSSpec, FInfo
from MacOS import openrf
def getfileinfo(name):
finfo = FSSpec(name).FSpGetFInfo()
dir, file = os.path.split(name)
# XXX Get resource/data sizes
fp = open(name, 'rb')
fp.seek(0, 2)
dlen = fp.tell()
fp = openrf(name, '*rb')
fp.seek(0, 2)
rlen = fp.tell()
return file, finfo, dlen, rlen
def openrsrc(name, *mode):
if not mode:
mode = '*rb'
else:
mode = '*' + mode[0]
return openrf(name, mode)
except ImportError:
#
# Glue code for non-macintosh usage
#
class FInfo:
def __init__(self):
self.Type = '????'
self.Creator = '????'
self.Flags = 0
def getfileinfo(name):
finfo = FInfo()
# Quick check for textfile
fp = open(name)
data = fp.read(256)   # reuse fp rather than opening the file a second time
for c in data:
if not c.isspace() and (c<' ' or ord(c) > 0x7f):
break
else:
finfo.Type = 'TEXT'
fp.seek(0, 2)
dsize = fp.tell()
fp.close()
dir, file = os.path.split(name)
file = file.replace(':', '-', 1)
return file, finfo, dsize, 0
class openrsrc:
def __init__(self, *args):
pass
def read(self, *args):
return ''
def write(self, *args):
pass
def close(self):
pass
class _Hqxcoderengine:
"""Write data to the coder in 3-byte chunks"""
def __init__(self, ofp):
self.ofp = ofp
self.data = ''
self.hqxdata = ''
self.linelen = LINELEN-1
def write(self, data):
self.data = self.data + data
datalen = len(self.data)
todo = (datalen//3)*3
data = self.data[:todo]
self.data = self.data[todo:]
if not data:
return
self.hqxdata = self.hqxdata + binascii.b2a_hqx(data)
self._flush(0)
def _flush(self, force):
first = 0
while first <= len(self.hqxdata)-self.linelen:
last = first + self.linelen
self.ofp.write(self.hqxdata[first:last]+'\n')
self.linelen = LINELEN
first = last
self.hqxdata = self.hqxdata[first:]
if force:
self.ofp.write(self.hqxdata + ':\n')
def close(self):
if self.data:
self.hqxdata = \
self.hqxdata + binascii.b2a_hqx(self.data)
self._flush(1)
self.ofp.close()
del self.ofp
class _Rlecoderengine:
"""Write data to the RLE-coder in suitably large chunks"""
def __init__(self, ofp):
self.ofp = ofp
self.data = ''
def write(self, data):
self.data = self.data + data
if len(self.data) < REASONABLY_LARGE:
return
rledata = binascii.rlecode_hqx(self.data)
self.ofp.write(rledata)
self.data = ''
def close(self):
if self.data:
rledata = binascii.rlecode_hqx(self.data)
self.ofp.write(rledata)
self.ofp.close()
del self.ofp
class BinHex:
def __init__(self, name_finfo_dlen_rlen, ofp):
name, finfo, dlen, rlen = name_finfo_dlen_rlen
if type(ofp) == type(''):
ofname = ofp
ofp = open(ofname, 'w')
ofp.write('(This file must be converted with BinHex 4.0)\n\n:')
hqxer = _Hqxcoderengine(ofp)
self.ofp = _Rlecoderengine(hqxer)
self.crc = 0
if finfo is None:
finfo = FInfo()
self.dlen = dlen
self.rlen = rlen
self._writeinfo(name, finfo)
self.state = _DID_HEADER
def _writeinfo(self, name, finfo):
nl = len(name)
if nl > 63:
raise Error, 'Filename too long'
d = chr(nl) + name + '\0'
d2 = finfo.Type + finfo.Creator
# Force all structs to be packed with big-endian
d3 = struct.pack('>h', finfo.Flags)
d4 = struct.pack('>ii', self.dlen, self.rlen)
info = d + d2 + d3 + d4
self._write(info)
self._writecrc()
def _write(self, data):
self.crc = binascii.crc_hqx(data, self.crc)
self.ofp.write(data)
def _writecrc(self):
# XXXX Should this be here??
# self.crc = binascii.crc_hqx('\0\0', self.crc)
if self.crc < 0:
fmt = '>h'
else:
fmt = '>H'
self.ofp.write(struct.pack(fmt, self.crc))
self.crc = 0
def write(self, data):
if self.state != _DID_HEADER:
raise Error, 'Writing data at the wrong time'
self.dlen = self.dlen - len(data)
self._write(data)
def close_data(self):
if self.dlen != 0:
raise Error, 'Incorrect data size, diff=%r' % (self.dlen,)
self._writecrc()
self.state = _DID_DATA
def write_rsrc(self, data):
if self.state < _DID_DATA:
self.close_data()
if self.state != _DID_DATA:
raise Error, 'Writing resource data at the wrong time'
self.rlen = self.rlen - len(data)
self._write(data)
def close(self):
if self.state is None:
return
try:
if self.state < _DID_DATA:
self.close_data()
if self.state != _DID_DATA:
raise Error, 'Close at the wrong time'
if self.rlen != 0:
raise Error, \
"Incorrect resource-datasize, diff=%r" % (self.rlen,)
self._writecrc()
finally:
self.state = None
ofp = self.ofp
del self.ofp
ofp.close()
def binhex(inp, out):
"""(infilename, outfilename) - Create binhex-encoded copy of a file"""
finfo = getfileinfo(inp)
ofp = BinHex(finfo, out)
ifp = open(inp, 'rb')
# XXXX Do textfile translation on non-mac systems
while 1:
d = ifp.read(128000)
if not d: break
ofp.write(d)
ofp.close_data()
ifp.close()
ifp = openrsrc(inp, 'rb')
while 1:
d = ifp.read(128000)
if not d: break
ofp.write_rsrc(d)
ofp.close()
ifp.close()
class _Hqxdecoderengine:
"""Read data via the decoder in 4-byte chunks"""
def __init__(self, ifp):
self.ifp = ifp
self.eof = 0
def read(self, totalwtd):
"""Read at least wtd bytes (or until EOF)"""
decdata = ''
wtd = totalwtd
#
# The loop here is convoluted, since we don't really know how
# much to decode: there may be newlines in the incoming data.
while wtd > 0:
if self.eof: return decdata
wtd = ((wtd+2)//3)*4
data = self.ifp.read(wtd)
#
# Next problem: there may not be a complete number of
# bytes in what we pass to a2b. Solve by yet another
# loop.
#
while 1:
try:
decdatacur, self.eof = \
binascii.a2b_hqx(data)
break
except binascii.Incomplete:
pass
newdata = self.ifp.read(1)
if not newdata:
raise Error, \
'Premature EOF on binhex file'
data = data + newdata
decdata = decdata + decdatacur
wtd = totalwtd - len(decdata)
if not decdata and not self.eof:
raise Error, 'Premature EOF on binhex file'
return decdata
def close(self):
self.ifp.close()
class _Rledecoderengine:
"""Read data via the RLE-coder"""
def __init__(self, ifp):
self.ifp = ifp
self.pre_buffer = ''
self.post_buffer = ''
self.eof = 0
def read(self, wtd):
if wtd > len(self.post_buffer):
self._fill(wtd-len(self.post_buffer))
rv = self.post_buffer[:wtd]
self.post_buffer = self.post_buffer[wtd:]
return rv
def _fill(self, wtd):
self.pre_buffer = self.pre_buffer + self.ifp.read(wtd+4)
if self.ifp.eof:
self.post_buffer = self.post_buffer + \
binascii.rledecode_hqx(self.pre_buffer)
self.pre_buffer = ''
return
#
# Obfuscated code ahead. We have to take care that we don't
# end up with an orphaned RUNCHAR later on. So, we keep a couple
# of bytes in the buffer, depending on what the end of
# the buffer looks like:
# '\220\0\220' - Keep 3 bytes: repeated \220 (escaped as \220\0)
# '?\220' - Keep 2 bytes: repeated something-else
# '\220\0' - Escaped \220: Keep 2 bytes.
# '?\220?' - Complete repeat sequence: decode all
# otherwise: keep 1 byte.
#
mark = len(self.pre_buffer)
if self.pre_buffer[-3:] == RUNCHAR + '\0' + RUNCHAR:
mark = mark - 3
elif self.pre_buffer[-1] == RUNCHAR:
mark = mark - 2
elif self.pre_buffer[-2:] == RUNCHAR + '\0':
mark = mark - 2
elif self.pre_buffer[-2] == RUNCHAR:
pass # Decode all
else:
mark = mark - 1
self.post_buffer = self.post_buffer + \
binascii.rledecode_hqx(self.pre_buffer[:mark])
self.pre_buffer = self.pre_buffer[mark:]
def close(self):
self.ifp.close()
class HexBin:
def __init__(self, ifp):
if type(ifp) == type(''):
ifp = open(ifp)
#
# Find initial colon.
#
while 1:
ch = ifp.read(1)
if not ch:
raise Error, "No binhex data found"
# Cater for \r\n terminated lines (which show up as \n\r, hence
# all lines start with \r)
if ch == '\r':
continue
if ch == ':':
break
if ch != '\n':
dummy = ifp.readline()
hqxifp = _Hqxdecoderengine(ifp)
self.ifp = _Rledecoderengine(hqxifp)
self.crc = 0
self._readheader()
def _read(self, len):
data = self.ifp.read(len)
self.crc = binascii.crc_hqx(data, self.crc)
return data
def _checkcrc(self):
filecrc = struct.unpack('>h', self.ifp.read(2))[0] & 0xffff
#self.crc = binascii.crc_hqx('\0\0', self.crc)
# XXXX Is this needed??
self.crc = self.crc & 0xffff
if filecrc != self.crc:
raise Error, 'CRC error, computed %x, read %x' \
%(self.crc, filecrc)
self.crc = 0
def _readheader(self):
len = self._read(1)
fname = self._read(ord(len))
rest = self._read(1+4+4+2+4+4)
self._checkcrc()
type = rest[1:5]
creator = rest[5:9]
flags = struct.unpack('>h', rest[9:11])[0]
self.dlen = struct.unpack('>l', rest[11:15])[0]
self.rlen = struct.unpack('>l', rest[15:19])[0]
self.FName = fname
self.FInfo = FInfo()
self.FInfo.Creator = creator
self.FInfo.Type = type
self.FInfo.Flags = flags
self.state = _DID_HEADER
def read(self, *n):
if self.state != _DID_HEADER:
raise Error, 'Read data at wrong time'
if n:
n = n[0]
n = min(n, self.dlen)
else:
n = self.dlen
rv = ''
while len(rv) < n:
rv = rv + self._read(n-len(rv))
self.dlen = self.dlen - n
return rv
def close_data(self):
if self.state != _DID_HEADER:
raise Error, 'close_data at wrong time'
if self.dlen:
dummy = self._read(self.dlen)
self._checkcrc()
self.state = _DID_DATA
def read_rsrc(self, *n):
if self.state == _DID_HEADER:
self.close_data()
if self.state != _DID_DATA:
raise Error, 'Read resource data at wrong time'
if n:
n = n[0]
n = min(n, self.rlen)
else:
n = self.rlen
self.rlen = self.rlen - n
return self._read(n)
def close(self):
if self.state is None:
return
try:
if self.rlen:
dummy = self.read_rsrc(self.rlen)
self._checkcrc()
finally:
self.state = None
self.ifp.close()
def hexbin(inp, out):
"""(infilename, outfilename) - Decode binhexed file"""
ifp = HexBin(inp)
finfo = ifp.FInfo
if not out:
out = ifp.FName
ofp = open(out, 'wb')
# XXXX Do translation on non-mac systems
while 1:
d = ifp.read(128000)
if not d: break
ofp.write(d)
ofp.close()
ifp.close_data()
d = ifp.read_rsrc(128000)
if d:
ofp = openrsrc(out, 'wb')
ofp.write(d)
while 1:
d = ifp.read_rsrc(128000)
if not d: break
ofp.write(d)
ofp.close()
ifp.close()
def _test():
fname = sys.argv[1]
binhex(fname, fname+'.hqx')
hexbin(fname+'.hqx', fname+'.viahqx')
#hexbin(fname, fname+'.unpacked')
sys.exit(1)
if __name__ == '__main__':
_test()
"""Bisection algorithms."""
def insort_right(a, x, lo=0, hi=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the right of the rightmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if x < a[mid]: hi = mid
else: lo = mid+1
a.insert(lo, x)
insort = insort_right # backward compatibility
def bisect_right(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e <= x, and all e in
a[i:] have e > x. So if x already appears in the list, a.insert(i, x) will
insert just after the rightmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if x < a[mid]: hi = mid
else: lo = mid+1
return lo
bisect = bisect_right # backward compatibility
def insort_left(a, x, lo=0, hi=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the left of the leftmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
a.insert(lo, x)
def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, a.insert(i, x) will
insert just before the leftmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
return lo
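# Worked example: with a = [1, 2, 2, 3],
#   bisect_left(a, 2)  == 1       # before the leftmost 2
#   bisect_right(a, 2) == 3       # after the rightmost 2
#   insort_right(a, 2)            # a becomes [1, 2, 2, 2, 3]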
# Overwrite above definitions with a fast C implementation
try:
from _bisect import *
except ImportError:
pass
"""Calendar printing functions
Note when comparing these calendars to the ones printed by cal(1): By
default, these calendars have Monday as the first day of the week, and
Sunday as the last (the European convention). Use setfirstweekday() to
set the first day of the week (0=Monday, 6=Sunday)."""
import sys
import datetime
import locale as _locale
__all__ = ["IllegalMonthError", "IllegalWeekdayError", "setfirstweekday",
"firstweekday", "isleap", "leapdays", "weekday", "monthrange",
"monthcalendar", "prmonth", "month", "prcal", "calendar",
"timegm", "month_name", "month_abbr", "day_name", "day_abbr"]
# Exception raised for bad input (with string parameter for details)
error = ValueError
# Exceptions raised for bad input
class IllegalMonthError(ValueError):
def __init__(self, month):
self.month = month
def __str__(self):
return "bad month number %r; must be 1-12" % self.month
class IllegalWeekdayError(ValueError):
def __init__(self, weekday):
self.weekday = weekday
def __str__(self):
return "bad weekday number %r; must be 0 (Monday) to 6 (Sunday)" % self.weekday
# Constants for months referenced later
January = 1
February = 2
# Number of days per month (except for February in leap years)
mdays = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# This module used to have hard-coded lists of day and month names, as
# English strings. The classes following emulate a read-only version of
# that, but supply localized names. Note that the values are computed
# fresh on each call, in case the user changes locale between calls.
class _localized_month:
_months = [datetime.date(2001, i+1, 1).strftime for i in range(12)]
_months.insert(0, lambda x: "")
def __init__(self, format):
self.format = format
def __getitem__(self, i):
funcs = self._months[i]
if isinstance(i, slice):
return [f(self.format) for f in funcs]
else:
return funcs(self.format)
def __len__(self):
return 13
class _localized_day:
# January 1, 2001, was a Monday.
_days = [datetime.date(2001, 1, i+1).strftime for i in range(7)]
def __init__(self, format):
self.format = format
def __getitem__(self, i):
funcs = self._days[i]
if isinstance(i, slice):
return [f(self.format) for f in funcs]
else:
return funcs(self.format)
def __len__(self):
return 7
# Full and abbreviated names of weekdays
day_name = _localized_day('%A')
day_abbr = _localized_day('%a')
# Full and abbreviated names of months (1-based arrays!!!)
month_name = _localized_month('%B')
month_abbr = _localized_month('%b')
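# Example (in the C/POSIX locale): day_name[0] == 'Monday',
# day_abbr[0] == 'Mon', month_name[1] == 'January', month_abbr[1] == 'Jan'.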
# Constants for weekdays
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
def isleap(year):
"""Return True for leap years, False for non-leap years."""
return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
def leapdays(y1, y2):
"""Return number of leap years in range [y1, y2).
Assume y1 <= y2."""
y1 -= 1
y2 -= 1
return (y2//4 - y1//4) - (y2//100 - y1//100) + (y2//400 - y1//400)
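# Worked example: isleap(2000) is True (divisible by 400), isleap(1900)
# is False (a century year not divisible by 400), and
# leapdays(1901, 2001) == 25 (the leap years 1904, 1908, ..., 2000).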
def weekday(year, month, day):
"""Return weekday (0-6 ~ Mon-Sun) for year (1970-...), month (1-12),
day (1-31)."""
return datetime.date(year, month, day).weekday()
def monthrange(year, month):
"""Return weekday (0-6 ~ Mon-Sun) and number of days (28-31) for
year, month."""
if not 1 <= month <= 12:
raise IllegalMonthError(month)
day1 = weekday(year, month, 1)
ndays = mdays[month] + (month == February and isleap(year))
return day1, ndays
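# Worked example: monthrange(2001, 1) == (0, 31) -- January 2001 began
# on a Monday and had 31 days.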
class Calendar(object):
"""
Base calendar class. This class doesn't do any formatting. It simply
provides data to subclasses.
"""
def __init__(self, firstweekday=0):
self.firstweekday = firstweekday # 0 = Monday, 6 = Sunday
def getfirstweekday(self):
return self._firstweekday % 7
def setfirstweekday(self, firstweekday):
self._firstweekday = firstweekday
firstweekday = property(getfirstweekday, setfirstweekday)
def iterweekdays(self):
"""
Return an iterator for one week of weekday numbers starting with the
configured first one.
"""
for i in range(self.firstweekday, self.firstweekday + 7):
yield i%7
def itermonthdates(self, year, month):
"""
Return an iterator for one month. The iterator will yield datetime.date
values and will always iterate through complete weeks, so it will yield
dates outside the specified month.
"""
date = datetime.date(year, month, 1)
# Go back to the beginning of the week
days = (date.weekday() - self.firstweekday) % 7
date -= datetime.timedelta(days=days)
oneday = datetime.timedelta(days=1)
while True:
yield date
try:
date += oneday
except OverflowError:
# Adding one day could fail after datetime.MAXYEAR
break
if date.month != month and date.weekday() == self.firstweekday:
break
def itermonthdays2(self, year, month):
"""
Like itermonthdates(), but will yield (day number, weekday number)
tuples. For days outside the specified month the day number is 0.
"""
for i, d in enumerate(self.itermonthdays(year, month), self.firstweekday):
yield d, i % 7
def itermonthdays(self, year, month):
"""
Like itermonthdates(), but will yield day numbers. For days outside
the specified month the day number is 0.
"""
day1, ndays = monthrange(year, month)
days_before = (day1 - self.firstweekday) % 7
for _ in range(days_before):
yield 0
for d in range(1, ndays + 1):
yield d
days_after = (self.firstweekday - day1 - ndays) % 7
for _ in range(days_after):
yield 0
def monthdatescalendar(self, year, month):
"""
Return a matrix (list of lists) representing a month's calendar.
Each row represents a week; week entries are datetime.date values.
"""
dates = list(self.itermonthdates(year, month))
return [ dates[i:i+7] for i in range(0, len(dates), 7) ]
def monthdays2calendar(self, year, month):
"""
Return a matrix representing a month's calendar.
Each row represents a week; week entries are
(day number, weekday number) tuples. Day numbers outside this month
are zero.
"""
days = list(self.itermonthdays2(year, month))
return [ days[i:i+7] for i in range(0, len(days), 7) ]
def monthdayscalendar(self, year, month):
"""
Return a matrix representing a month's calendar.
Each row represents a week; days outside this month are zero.
"""
days = list(self.itermonthdays(year, month))
return [ days[i:i+7] for i in range(0, len(days), 7) ]
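# Example: with the default firstweekday (Monday),
#   Calendar().monthdayscalendar(2001, 1)
# returns [[1, 2, 3, 4, 5, 6, 7], [8, ..., 14], [15, ..., 21],
#          [22, ..., 28], [29, 30, 31, 0, 0, 0, 0]]
# -- January 2001 began on a Monday, so there are no leading zeros.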
def yeardatescalendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting. The return
value is a list of month rows. Each month row contains up to width months.
Each month contains between 4 and 6 weeks and each week contains 1-7
days. Days are datetime.date objects.
"""
months = [
self.monthdatescalendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
def yeardays2calendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are
(day number, weekday number) tuples. Day numbers outside this month are
zero.
"""
months = [
self.monthdays2calendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
def yeardayscalendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are day numbers.
Day numbers outside this month are zero.
"""
months = [
self.monthdayscalendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
class TextCalendar(Calendar):
"""
Subclass of Calendar that outputs a calendar as simple plain text,
similar to the UNIX program cal.
"""
def prweek(self, theweek, width):
"""
Print a single week (no newline).
"""
print self.formatweek(theweek, width),
def formatday(self, day, weekday, width):
"""
Returns a formatted day.
"""
if day == 0:
s = ''
else:
s = '%2i' % day # right-align single-digit days
return s.center(width)
def formatweek(self, theweek, width):
"""
Returns a single week in a string (no newline).
"""
return ' '.join(self.formatday(d, wd, width) for (d, wd) in theweek)
def formatweekday(self, day, width):
"""
Returns a formatted week day name.
"""
if width >= 9:
names = day_name
else:
names = day_abbr
return names[day][:width].center(width)
def formatweekheader(self, width):
"""
Return a header for a week.
"""
return ' '.join(self.formatweekday(i, width) for i in self.iterweekdays())
def formatmonthname(self, theyear, themonth, width, withyear=True):
"""
Return a formatted month name.
"""
s = month_name[themonth]
if withyear:
s = "%s %r" % (s, theyear)
return s.center(width)
def prmonth(self, theyear, themonth, w=0, l=0):
"""
Print a month's calendar.
"""
print self.formatmonth(theyear, themonth, w, l),
def formatmonth(self, theyear, themonth, w=0, l=0):
"""
Return a month's calendar string (multi-line).
"""
w = max(2, w)
l = max(1, l)
s = self.formatmonthname(theyear, themonth, 7 * (w + 1) - 1)
s = s.rstrip()
s += '\n' * l
s += self.formatweekheader(w).rstrip()
s += '\n' * l
for week in self.monthdays2calendar(theyear, themonth):
s += self.formatweek(week, w).rstrip()
s += '\n' * l
return s
def formatyear(self, theyear, w=2, l=1, c=6, m=3):
"""
Returns a year's calendar as a multi-line string.
"""
w = max(2, w)
l = max(1, l)
c = max(2, c)
colwidth = (w + 1) * 7 - 1
v = []
a = v.append
a(repr(theyear).center(colwidth*m+c*(m-1)).rstrip())
a('\n'*l)
header = self.formatweekheader(w)
for (i, row) in enumerate(self.yeardays2calendar(theyear, m)):
# months in this row
months = range(m*i+1, min(m*(i+1)+1, 13))
a('\n'*l)
names = (self.formatmonthname(theyear, k, colwidth, False)
for k in months)
a(formatstring(names, colwidth, c).rstrip())
a('\n'*l)
headers = (header for k in months)
a(formatstring(headers, colwidth, c).rstrip())
a('\n'*l)
# max number of weeks for this row
height = max(len(cal) for cal in row)
for j in range(height):
weeks = []
for cal in row:
if j >= len(cal):
weeks.append('')
else:
weeks.append(self.formatweek(cal[j], w))
a(formatstring(weeks, colwidth, c).rstrip())
a('\n' * l)
return ''.join(v)
def pryear(self, theyear, w=0, l=0, c=6, m=3):
"""Print a year's calendar."""
print self.formatyear(theyear, w, l, c, m)
class HTMLCalendar(Calendar):
"""
This calendar returns complete HTML pages.
"""
# CSS classes for the day <td>s
cssclasses = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
def formatday(self, day, weekday):
"""
Return a day as a table cell.
"""
if day == 0:
return '<td class="noday"> </td>' # day outside month
else:
return '<td class="%s">%d</td>' % (self.cssclasses[weekday], day)
def formatweek(self, theweek):
"""
Return a complete week as a table row.
"""
s = ''.join(self.formatday(d, wd) for (d, wd) in theweek)
return '<tr>%s</tr>' % s
def formatweekday(self, day):
"""
Return a weekday name as a table header.
"""
return '<th class="%s">%s</th>' % (self.cssclasses[day], day_abbr[day])
def formatweekheader(self):
"""
Return a header for a week as a table row.
"""
s = ''.join(self.formatweekday(i) for i in self.iterweekdays())
return '<tr>%s</tr>' % s
def formatmonthname(self, theyear, themonth, withyear=True):
"""
Return a month name as a table row.
"""
if withyear:
s = '%s %s' % (month_name[themonth], theyear)
else:
s = '%s' % month_name[themonth]
return '<tr><th colspan="7" class="month">%s</th></tr>' % s
def formatmonth(self, theyear, themonth, withyear=True):
"""
Return a formatted month as a table.
"""
v = []
a = v.append
a('<table border="0" cellpadding="0" cellspacing="0" class="month">')
a('\n')
a(self.formatmonthname(theyear, themonth, withyear=withyear))
a('\n')
a(self.formatweekheader())
a('\n')
for week in self.monthdays2calendar(theyear, themonth):
a(self.formatweek(week))
a('\n')
a('</table>')
a('\n')
return ''.join(v)
def formatyear(self, theyear, width=3):
"""
Return a formatted year as a table of tables.
"""
v = []
a = v.append
width = max(width, 1)
a('<table border="0" cellpadding="0" cellspacing="0" class="year">')
a('\n')
a('<tr><th colspan="%d" class="year">%s</th></tr>' % (width, theyear))
for i in range(January, January+12, width):
# months in this row
months = range(i, min(i+width, 13))
a('<tr>')
for m in months:
a('<td>')
a(self.formatmonth(theyear, m, withyear=False))
a('</td>')
a('</tr>')
a('</table>')
return ''.join(v)
def formatyearpage(self, theyear, width=3, css='calendar.css', encoding=None):
"""
Return a formatted year as a complete HTML page.
"""
if encoding is None:
encoding = sys.getdefaultencoding()
v = []
a = v.append
a('<?xml version="1.0" encoding="%s"?>\n' % encoding)
a('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n')
a('<html>\n')
a('<head>\n')
a('<meta http-equiv="Content-Type" content="text/html; charset=%s" />\n' % encoding)
if css is not None:
a('<link rel="stylesheet" type="text/css" href="%s" />\n' % css)
a('<title>Calendar for %d</title>\n' % theyear)
a('</head>\n')
a('<body>\n')
a(self.formatyear(theyear, width))
a('</body>\n')
a('</html>\n')
return ''.join(v).encode(encoding, "xmlcharrefreplace")
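# Editor's sketch (not part of the original module): minimal HTMLCalendar
# usage. formatmonth() yields a <table> fragment, formatyearpage() a full
# encoded page; the year/month values below are arbitrary examples.
def _demo_htmlcalendar():
    cal = HTMLCalendar()
    print cal.formatmonth(2020, 9)             # one month as an HTML table
    page = cal.formatyearpage(2020, css=None)  # complete XHTML page (byte string)
    print page[:60]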
class TimeEncoding:
def __init__(self, locale):
self.locale = locale
def __enter__(self):
self.oldlocale = _locale.getlocale(_locale.LC_TIME)
_locale.setlocale(_locale.LC_TIME, self.locale)
return _locale.getlocale(_locale.LC_TIME)[1]
def __exit__(self, *args):
_locale.setlocale(_locale.LC_TIME, self.oldlocale)
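# Editor's sketch (not part of the original module): TimeEncoding swaps
# LC_TIME for the duration of the block and yields the locale's codeset
# (or None). The "de_DE"/UTF-8 locale is an assumption and must be
# installed on the host system.
def _demo_timeencoding():
    with TimeEncoding(("de_DE", "UTF-8")) as encoding:
        print encoding       # e.g. 'UTF-8'
        print month_name[3]  # month_name localizes lazily, e.g. 'Maerz'/'M\xc3\xa4rz'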
class LocaleTextCalendar(TextCalendar):
"""
This class can be passed a locale name in the constructor and will return
month and weekday names in the specified locale. If this locale includes
an encoding, all strings containing month and weekday names will be returned
as unicode.
"""
def __init__(self, firstweekday=0, locale=None):
TextCalendar.__init__(self, firstweekday)
if locale is None:
locale = _locale.getdefaultlocale()
self.locale = locale
def formatweekday(self, day, width):
with TimeEncoding(self.locale) as encoding:
if width >= 9:
names = day_name
else:
names = day_abbr
name = names[day]
if encoding is not None:
name = name.decode(encoding)
return name[:width].center(width)
def formatmonthname(self, theyear, themonth, width, withyear=True):
with TimeEncoding(self.locale) as encoding:
s = month_name[themonth]
if encoding is not None:
s = s.decode(encoding)
if withyear:
s = "%s %r" % (s, theyear)
return s.center(width)
class LocaleHTMLCalendar(HTMLCalendar):
"""
This class can be passed a locale name in the constructor and will return
month and weekday names in the specified locale. If this locale includes
an encoding, all strings containing month and weekday names will be returned
as unicode.
"""
def __init__(self, firstweekday=0, locale=None):
HTMLCalendar.__init__(self, firstweekday)
if locale is None:
locale = _locale.getdefaultlocale()
self.locale = locale
def formatweekday(self, day):
with TimeEncoding(self.locale) as encoding:
s = day_abbr[day]
if encoding is not None:
s = s.decode(encoding)
return '<th class="%s">%s</th>' % (self.cssclasses[day], s)
def formatmonthname(self, theyear, themonth, withyear=True):
with TimeEncoding(self.locale) as encoding:
s = month_name[themonth]
if encoding is not None:
s = s.decode(encoding)
if withyear:
s = '%s %s' % (s, theyear)
return '<tr><th colspan="7" class="month">%s</th></tr>' % s
# Support for old module level interface
c = TextCalendar()
firstweekday = c.getfirstweekday
def setfirstweekday(firstweekday):
try:
firstweekday.__index__
except AttributeError:
raise IllegalWeekdayError(firstweekday)
if not MONDAY <= firstweekday <= SUNDAY:
raise IllegalWeekdayError(firstweekday)
c.firstweekday = firstweekday
monthcalendar = c.monthdayscalendar
prweek = c.prweek
week = c.formatweek
weekheader = c.formatweekheader
prmonth = c.prmonth
month = c.formatmonth
calendar = c.formatyear
prcal = c.pryear
# Spacing of month columns for multi-column year calendar
_colwidth = 7*3 - 1 # Amount printed by prweek()
_spacing = 6 # Number of spaces between columns
def format(cols, colwidth=_colwidth, spacing=_spacing):
"""Prints multi-column formatting for year calendars"""
print formatstring(cols, colwidth, spacing)
def formatstring(cols, colwidth=_colwidth, spacing=_spacing):
"""Returns a string formatted from n strings, centered within n columns."""
spacing *= ' '
return spacing.join(c.center(colwidth) for c in cols)
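# Editor's sketch (not part of the original module): formatstring() centers
# each string in a colwidth-character field and joins the fields with
# `spacing` spaces; the inputs below are arbitrary.
def _demo_formatstring():
    print formatstring(["January", "February"], colwidth=10, spacing=2)
    # each name centered in 10 columns, columns separated by 2 spaces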
EPOCH = 1970
_EPOCH_ORD = datetime.date(EPOCH, 1, 1).toordinal()
def timegm(tuple):
"""Unrelated but handy function to calculate Unix timestamp from GMT."""
year, month, day, hour, minute, second = tuple[:6]
days = datetime.date(year, month, 1).toordinal() - _EPOCH_ORD + day - 1
hours = days*24 + hour
minutes = hours*60 + minute
seconds = minutes*60 + second
return seconds
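# Editor's sketch (not part of the original module): timegm() inverts
# time.gmtime() for post-1970 timestamps.
def _demo_timegm():
    import time
    t = timegm((1970, 1, 2, 0, 0, 0))    # one day after the epoch
    assert t == 86400
    assert timegm(time.gmtime(t)) == t   # round-trips through gmtime()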
def main(args):
import optparse
parser = optparse.OptionParser(usage="usage: %prog [options] [year [month]]")
parser.add_option(
"-w", "--width",
dest="width", type="int", default=2,
help="width of date column (default 2, text only)"
)
parser.add_option(
"-l", "--lines",
dest="lines", type="int", default=1,
help="number of lines for each week (default 1, text only)"
)
parser.add_option(
"-s", "--spacing",
dest="spacing", type="int", default=6,
help="spacing between months (default 6, text only)"
)
parser.add_option(
"-m", "--months",
dest="months", type="int", default=3,
help="months per row (default 3, text only)"
)
parser.add_option(
"-c", "--css",
dest="css", default="calendar.css",
help="CSS to use for page (html only)"
)
parser.add_option(
"-L", "--locale",
dest="locale", default=None,
help="locale to be used from month and weekday names"
)
parser.add_option(
"-e", "--encoding",
dest="encoding", default=None,
help="Encoding to use for output"
)
parser.add_option(
"-t", "--type",
dest="type", default="text",
choices=("text", "html"),
help="output type (text or html)"
)
(options, args) = parser.parse_args(args)
if options.locale and not options.encoding:
parser.error("if --locale is specified --encoding is required")
sys.exit(1)
locale = options.locale, options.encoding
if options.type == "html":
if options.locale:
cal = LocaleHTMLCalendar(locale=locale)
else:
cal = HTMLCalendar()
encoding = options.encoding
if encoding is None:
encoding = sys.getdefaultencoding()
optdict = dict(encoding=encoding, css=options.css)
if len(args) == 1:
print cal.formatyearpage(datetime.date.today().year, **optdict)
elif len(args) == 2:
print cal.formatyearpage(int(args[1]), **optdict)
else:
parser.error("incorrect number of arguments")
sys.exit(1)
else:
if options.locale:
cal = LocaleTextCalendar(locale=locale)
else:
cal = TextCalendar()
optdict = dict(w=options.width, l=options.lines)
if len(args) != 3:
optdict["c"] = options.spacing
optdict["m"] = options.months
if len(args) == 1:
result = cal.formatyear(datetime.date.today().year, **optdict)
elif len(args) == 2:
result = cal.formatyear(int(args[1]), **optdict)
elif len(args) == 3:
result = cal.formatmonth(int(args[1]), int(args[2]), **optdict)
else:
parser.error("incorrect number of arguments")
sys.exit(1)
if options.encoding:
result = result.encode(options.encoding)
print result
if __name__ == "__main__":
main(sys.argv)
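# Editor's sketch (not part of the original module): main() can be driven
# programmatically with an argv-style list; the first element plays the role
# of the program name, mirroring the main(sys.argv) call above.
def _demo_main():
    main(["calendar", "2020"])        # whole year, text layout
    main(["calendar", "2020", "9"])   # a single month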
#! /usr/local/bin/python
# NOTE: the above "/usr/local/bin/python" is NOT a mistake. It is
# intentionally NOT "/usr/bin/env python". On many systems
# (e.g. Solaris), /usr/local/bin is not in $PATH as passed to CGI
# scripts, and /usr/local/bin is the default directory where Python is
# installed, so /usr/bin/env would be unable to find python. Granted,
# binary installations by Linux vendors often install Python in
# /usr/bin. So let those vendors patch cgi.py to match their choice
# of installation.
"""Support module for CGI (Common Gateway Interface) scripts.
This module defines a number of utilities for use by CGI scripts
written in Python.
"""
# XXX Perhaps there should be a slimmed version that doesn't contain
# all those backwards compatible and debugging classes and functions?
# History
# -------
#
# Michael McLay started this module. Steve Majewski changed the
# interface to SvFormContentDict and FormContentDict. The multipart
# parsing was inspired by code submitted by Andreas Paepcke. Guido van
# Rossum rewrote, reformatted and documented the module and is currently
# responsible for its maintenance.
#
__version__ = "2.6"
# Imports
# =======
from operator import attrgetter
import sys
import os
import UserDict
import urlparse
from warnings import filterwarnings, catch_warnings, warn
with catch_warnings():
if sys.py3kwarning:
filterwarnings("ignore", ".*mimetools has been removed",
DeprecationWarning)
filterwarnings("ignore", ".*rfc822 has been removed",
DeprecationWarning)
import mimetools
import rfc822
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
__all__ = ["MiniFieldStorage", "FieldStorage", "FormContentDict",
"SvFormContentDict", "InterpFormContentDict", "FormContent",
"parse", "parse_qs", "parse_qsl", "parse_multipart",
"parse_header", "print_exception", "print_environ",
"print_form", "print_directory", "print_arguments",
"print_environ_usage", "escape"]
# Logging support
# ===============
logfile = "" # Filename to log to, if not empty
logfp = None # File object to log to, if not None
def initlog(*allargs):
"""Write a log message, if there is a log file.
Even though this function is called initlog(), you should always
use log(); log is a variable that is set either to initlog
(initially), to dolog (once the log file has been opened), or to
nolog (when logging is disabled).
The first argument is a format string; the remaining arguments (if
any) are arguments to the % operator, so e.g.
log("%s: %s", "a", "b")
will write "a: b" to the log file, followed by a newline.
If the global logfp is not None, it should be a file object to
which log data is written.
If the global logfp is None, the global logfile may be a string
giving a filename to open, in append mode. This file should be
world writable!!! If the file can't be opened, logging is
silently disabled (since there is no safe place where we could
send an error message).
"""
global logfp, log
if logfile and not logfp:
try:
logfp = open(logfile, "a")
except IOError:
pass
if not logfp:
log = nolog
else:
log = dolog
log(*allargs)
def dolog(fmt, *args):
"""Write a log message to the log file. See initlog() for docs."""
logfp.write(fmt%args + "\n")
def nolog(*allargs):
"""Dummy function, assigned to log when logging is disabled."""
pass
log = initlog # The current logging function
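# Editor's sketch (not part of the original module): enabling logging by
# pointing the module-level `logfile` at a writable path (the path below is
# a hypothetical example). The first log() call opens the file and rebinds
# `log` to dolog(), as the initlog() docstring describes.
def _demo_logging():
    global logfile
    logfile = "/tmp/cgi.log"              # hypothetical; must be writable
    log("%s: %s", "request", "started")   # appends "request: started"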
# Parsing functions
# =================
# Maximum input we will accept when REQUEST_METHOD is POST
# 0 ==> unlimited input
maxlen = 0
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
"""Parse a query in the environment or from a file (default stdin)
Arguments, all optional:
fp : file pointer; default: sys.stdin
environ : environment dictionary; default: os.environ
keep_blank_values: flag indicating whether blank values in
percent-encoded forms should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
"""
if fp is None:
fp = sys.stdin
if not 'REQUEST_METHOD' in environ:
environ['REQUEST_METHOD'] = 'GET' # For testing stand-alone
if environ['REQUEST_METHOD'] == 'POST':
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
if ctype == 'multipart/form-data':
return parse_multipart(fp, pdict)
elif ctype == 'application/x-www-form-urlencoded':
clength = int(environ['CONTENT_LENGTH'])
if maxlen and clength > maxlen:
raise ValueError, 'Maximum content length exceeded'
qs = fp.read(clength)
else:
qs = '' # Unknown content-type
if 'QUERY_STRING' in environ:
if qs: qs = qs + '&'
qs = qs + environ['QUERY_STRING']
elif sys.argv[1:]:
if qs: qs = qs + '&'
qs = qs + sys.argv[1]
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
elif 'QUERY_STRING' in environ:
qs = environ['QUERY_STRING']
else:
if sys.argv[1:]:
qs = sys.argv[1]
else:
qs = ""
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
# parse query string function called from urlparse,
# this is done in order to maintain backward compatibility.
def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
"""Parse a query given as a string argument."""
warn("cgi.parse_qs is deprecated, use urlparse.parse_qs instead",
PendingDeprecationWarning, 2)
return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
"""Parse a query given as a string argument."""
warn("cgi.parse_qsl is deprecated, use urlparse.parse_qsl instead",
PendingDeprecationWarning, 2)
return urlparse.parse_qsl(qs, keep_blank_values, strict_parsing,
max_num_fields)
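# Editor's sketch (not part of the original module): since the two wrappers
# above merely warn and delegate, new code should call urlparse directly.
def _demo_parse_qs():
    print urlparse.parse_qs("a=1&a=2&b=3")
    # -> {'a': ['1', '2'], 'b': ['3']}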
def parse_multipart(fp, pdict):
"""Parse multipart input.
Arguments:
fp : input file
pdict: dictionary containing other parameters of content-type header
Returns a dictionary just like parse_qs(): keys are the field names, each
value is a list of values for that field. This is easy to use but not
much good if you are expecting megabytes to be uploaded -- in that case,
use the FieldStorage class instead which is much more flexible. Note
that content-type is the raw, unparsed contents of the content-type
header.
XXX This does not parse nested multipart parts -- use FieldStorage for
that.
XXX This should really be subsumed by FieldStorage altogether -- no
point in having two implementations of the same parsing algorithm.
Also, FieldStorage protects itself better against certain DoS attacks
by limiting the size of the data read in one chunk. The API here
does not support that kind of protection. This also affects parse()
since it can call parse_multipart().
"""
boundary = ""
if 'boundary' in pdict:
boundary = pdict['boundary']
if not valid_boundary(boundary):
raise ValueError, ('Invalid boundary in multipart form: %r'
% (boundary,))
nextpart = "--" + boundary
lastpart = "--" + boundary + "--"
partdict = {}
terminator = ""
while terminator != lastpart:
bytes = -1
data = None
if terminator:
# At start of next part. Read headers first.
headers = mimetools.Message(fp)
clength = headers.getheader('content-length')
if clength:
try:
bytes = int(clength)
except ValueError:
pass
if bytes > 0:
if maxlen and bytes > maxlen:
raise ValueError, 'Maximum content length exceeded'
data = fp.read(bytes)
else:
data = ""
# Read lines until end of part.
lines = []
while 1:
line = fp.readline()
if not line:
terminator = lastpart # End outer loop
break
if line[:2] == "--":
terminator = line.strip()
if terminator in (nextpart, lastpart):
break
lines.append(line)
# Done with part.
if data is None:
continue
if bytes < 0:
if lines:
# Strip final line terminator
line = lines[-1]
if line[-2:] == "\r\n":
line = line[:-2]
elif line[-1:] == "\n":
line = line[:-1]
lines[-1] = line
data = "".join(lines)
line = headers['content-disposition']
if not line:
continue
key, params = parse_header(line)
if key != 'form-data':
continue
if 'name' in params:
name = params['name']
else:
continue
if name in partdict:
partdict[name].append(data)
else:
partdict[name] = [data]
return partdict
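# Editor's sketch (not part of the original module): parse_multipart() on a
# hand-built multipart/form-data body; the boundary string is an arbitrary
# assumption.
def _demo_parse_multipart():
    body = ("--BOUNDARY\r\n"
            'Content-Disposition: form-data; name="field1"\r\n'
            "\r\n"
            "value1\r\n"
            "--BOUNDARY--\r\n")
    print parse_multipart(StringIO(body), {"boundary": "BOUNDARY"})
    # -> {'field1': ['value1']}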
def _parseparam(s):
while s[:1] == ';':
s = s[1:]
end = s.find(';')
while end > 0 and (s.count('"', 0, end) - s.count('\\"', 0, end)) % 2:
end = s.find(';', end + 1)
if end < 0:
end = len(s)
f = s[:end]
yield f.strip()
s = s[end:]
def parse_header(line):
"""Parse a Content-type like header.
Return the main content-type and a dictionary of options.
"""
parts = _parseparam(';' + line)
key = parts.next()
pdict = {}
for p in parts:
i = p.find('=')
if i >= 0:
name = p[:i].strip().lower()
value = p[i+1:].strip()
if len(value) >= 2 and value[0] == value[-1] == '"':
value = value[1:-1]
value = value.replace('\\\\', '\\').replace('\\"', '"')
pdict[name] = value
return key, pdict
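# Editor's sketch (not part of the original module): parse_header() splits
# the main value from its ;-separated, possibly quoted parameters.
def _demo_parse_header():
    print parse_header('form-data; name="files"; filename="a b.txt"')
    # -> ('form-data', {'name': 'files', 'filename': 'a b.txt'})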
# Classes for field storage
# =========================
class MiniFieldStorage:
"""Like FieldStorage, for use when no file uploads are possible."""
# Dummy attributes
filename = None
list = None
type = None
file = None
type_options = {}
disposition = None
disposition_options = {}
headers = {}
def __init__(self, name, value):
"""Constructor from field name and value."""
self.name = name
self.value = value
# self.file = StringIO(value)
def __repr__(self):
"""Return printable representation."""
return "MiniFieldStorage(%r, %r)" % (self.name, self.value)
class FieldStorage:
"""Store a sequence of fields, reading multipart/form-data.
This class provides naming, typing, files stored on disk, and
more. At the top level, it is accessible like a dictionary, whose
keys are the field names. (Note: None can occur as a field name.)
The items are either a Python list (if there are multiple values) or
another FieldStorage or MiniFieldStorage object. If it's a single
object, it has the following attributes:
name: the field name, if specified; otherwise None
filename: the filename, if specified; otherwise None; this is the
client side filename, *not* the file name on which it is
stored (that's a temporary file you don't deal with)
value: the value as a *string*; for file uploads, this
transparently reads the file every time you request the value
file: the file(-like) object from which you can read the data;
None if the data is stored as a simple string
type: the content-type, or None if not specified
type_options: dictionary of options specified on the content-type
line
disposition: content-disposition, or None if not specified
disposition_options: dictionary of corresponding options
headers: a dictionary(-like) object (sometimes rfc822.Message or a
subclass thereof) containing *all* headers
The class is subclassable, mostly for the purpose of overriding
the make_file() method, which is called internally to come up with
a file open for reading and writing. This makes it possible to
override the default choice of storing all files in a temporary
directory and unlinking them as soon as they have been opened.
"""
def __init__(self, fp=None, headers=None, outerboundary="",
environ=os.environ, keep_blank_values=0, strict_parsing=0,
max_num_fields=None):
"""Constructor. Read multipart/* until last part.
Arguments, all optional:
fp : file pointer; default: sys.stdin
(not used when the request method is GET)
headers : header dictionary-like object; default:
taken from environ as per CGI spec
outerboundary : terminating multipart boundary
(for internal use only)
environ : environment dictionary; default: os.environ
keep_blank_values: flag indicating whether blank values in
percent-encoded forms should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
max_num_fields: int. If set, then __init__ throws a ValueError
if there are more than n fields read by parse_qsl().
"""
method = 'GET'
self.keep_blank_values = keep_blank_values
self.strict_parsing = strict_parsing
self.max_num_fields = max_num_fields
if 'REQUEST_METHOD' in environ:
method = environ['REQUEST_METHOD'].upper()
self.qs_on_post = None
if method == 'GET' or method == 'HEAD':
if 'QUERY_STRING' in environ:
qs = environ['QUERY_STRING']
elif sys.argv[1:]:
qs = sys.argv[1]
else:
qs = ""
fp = StringIO(qs)
if headers is None:
headers = {'content-type':
"application/x-www-form-urlencoded"}
if headers is None:
headers = {}
if method == 'POST':
# Set default content-type for POST to what's traditional
headers['content-type'] = "application/x-www-form-urlencoded"
if 'CONTENT_TYPE' in environ:
headers['content-type'] = environ['CONTENT_TYPE']
if 'QUERY_STRING' in environ:
self.qs_on_post = environ['QUERY_STRING']
if 'CONTENT_LENGTH' in environ:
headers['content-length'] = environ['CONTENT_LENGTH']
self.fp = fp or sys.stdin
self.headers = headers
self.outerboundary = outerboundary
# Process content-disposition header
cdisp, pdict = "", {}
if 'content-disposition' in self.headers:
cdisp, pdict = parse_header(self.headers['content-disposition'])
self.disposition = cdisp
self.disposition_options = pdict
self.name = None
if 'name' in pdict:
self.name = pdict['name']
self.filename = None
if 'filename' in pdict:
self.filename = pdict['filename']
# Process content-type header
#
# Honor any existing content-type header. But if there is no
# content-type header, use some sensible defaults. Assume
# outerboundary is "" at the outer level, but something non-false
# inside a multi-part. The default for an inner part is text/plain,
# but for an outer part it should be urlencoded. This should catch
# bogus clients which erroneously forget to include a content-type
# header.
#
# See below for what we do if there does exist a content-type header,
# but it happens to be something we don't understand.
if 'content-type' in self.headers:
ctype, pdict = parse_header(self.headers['content-type'])
elif self.outerboundary or method != 'POST':
ctype, pdict = "text/plain", {}
else:
ctype, pdict = 'application/x-www-form-urlencoded', {}
self.type = ctype
self.type_options = pdict
self.innerboundary = ""
if 'boundary' in pdict:
self.innerboundary = pdict['boundary']
clen = -1
if 'content-length' in self.headers:
try:
clen = int(self.headers['content-length'])
except ValueError:
pass
if maxlen and clen > maxlen:
raise ValueError, 'Maximum content length exceeded'
self.length = clen
self.list = self.file = None
self.done = 0
if ctype == 'application/x-www-form-urlencoded':
self.read_urlencoded()
elif ctype[:10] == 'multipart/':
self.read_multi(environ, keep_blank_values, strict_parsing)
else:
self.read_single()
def __repr__(self):
"""Return a printable representation."""
return "FieldStorage(%r, %r, %r)" % (
self.name, self.filename, self.value)
def __iter__(self):
return iter(self.keys())
def __getattr__(self, name):
if name != 'value':
raise AttributeError, name
if self.file:
self.file.seek(0)
value = self.file.read()
self.file.seek(0)
elif self.list is not None:
value = self.list
else:
value = None
return value
def __getitem__(self, key):
"""Dictionary style indexing."""
if self.list is None:
raise TypeError, "not indexable"
found = []
for item in self.list:
if item.name == key: found.append(item)
if not found:
raise KeyError, key
if len(found) == 1:
return found[0]
else:
return found
def getvalue(self, key, default=None):
"""Dictionary style get() method, including 'value' lookup."""
if key in self:
value = self[key]
if type(value) is type([]):
return map(attrgetter('value'), value)
else:
return value.value
else:
return default
def getfirst(self, key, default=None):
""" Return the first value received."""
if key in self:
value = self[key]
if type(value) is type([]):
return value[0].value
else:
return value.value
else:
return default
def getlist(self, key):
""" Return list of received values."""
if key in self:
value = self[key]
if type(value) is type([]):
return map(attrgetter('value'), value)
else:
return [value.value]
else:
return []
def keys(self):
"""Dictionary style keys() method."""
if self.list is None:
raise TypeError, "not indexable"
return list(set(item.name for item in self.list))
def has_key(self, key):
"""Dictionary style has_key() method."""
if self.list is None:
raise TypeError, "not indexable"
return any(item.name == key for item in self.list)
def __contains__(self, key):
"""Dictionary style __contains__ method."""
if self.list is None:
raise TypeError, "not indexable"
return any(item.name == key for item in self.list)
def __len__(self):
"""Dictionary style len(x) support."""
return len(self.keys())
def __nonzero__(self):
return bool(self.list)
def read_urlencoded(self):
"""Internal: read data in query string format."""
qs = self.fp.read(self.length)
if self.qs_on_post:
qs += '&' + self.qs_on_post
query = urlparse.parse_qsl(qs, self.keep_blank_values,
self.strict_parsing, self.max_num_fields)
self.list = [MiniFieldStorage(key, value) for key, value in query]
self.skip_lines()
FieldStorageClass = None
def read_multi(self, environ, keep_blank_values, strict_parsing):
"""Internal: read a part that is itself multipart."""
ib = self.innerboundary
if not valid_boundary(ib):
raise ValueError, 'Invalid boundary in multipart form: %r' % (ib,)
self.list = []
if self.qs_on_post:
query = urlparse.parse_qsl(self.qs_on_post,
self.keep_blank_values,
self.strict_parsing,
self.max_num_fields)
self.list.extend(MiniFieldStorage(key, value)
for key, value in query)
FieldStorageClass = None
# Propagate max_num_fields into the sub class appropriately
max_num_fields = self.max_num_fields
if max_num_fields is not None:
max_num_fields -= len(self.list)
klass = self.FieldStorageClass or self.__class__
part = klass(self.fp, {}, ib,
environ, keep_blank_values, strict_parsing,
max_num_fields)
# Throw first part away
while not part.done:
headers = rfc822.Message(self.fp)
part = klass(self.fp, headers, ib,
environ, keep_blank_values, strict_parsing,
max_num_fields)
if max_num_fields is not None:
max_num_fields -= 1
if part.list:
max_num_fields -= len(part.list)
if max_num_fields < 0:
raise ValueError('Max number of fields exceeded')
self.list.append(part)
self.skip_lines()
def read_single(self):
"""Internal: read an atomic part."""
if self.length >= 0:
self.read_binary()
self.skip_lines()
else:
self.read_lines()
self.file.seek(0)
bufsize = 8*1024 # I/O buffering size for copy to file
def read_binary(self):
"""Internal: read binary data."""
self.file = self.make_file('b')
todo = self.length
if todo >= 0:
while todo > 0:
data = self.fp.read(min(todo, self.bufsize))
if not data:
self.done = -1
break
self.file.write(data)
todo = todo - len(data)
def read_lines(self):
"""Internal: read lines until EOF or outerboundary."""
self.file = self.__file = StringIO()
if self.outerboundary:
self.read_lines_to_outerboundary()
else:
self.read_lines_to_eof()
def __write(self, line):
if self.__file is not None:
if self.__file.tell() + len(line) > 1000:
self.file = self.make_file('')
self.file.write(self.__file.getvalue())
self.__file = None
self.file.write(line)
def read_lines_to_eof(self):
"""Internal: read lines until EOF."""
while 1:
line = self.fp.readline(1<<16)
if not line:
self.done = -1
break
self.__write(line)
def read_lines_to_outerboundary(self):
"""Internal: read lines until outerboundary."""
next = "--" + self.outerboundary
last = next + "--"
delim = ""
last_line_lfend = True
while 1:
line = self.fp.readline(1<<16)
if not line:
self.done = -1
break
if delim == "\r":
line = delim + line
delim = ""
if line[:2] == "--" and last_line_lfend:
strippedline = line.strip()
if strippedline == next:
break
if strippedline == last:
self.done = 1
break
odelim = delim
if line[-2:] == "\r\n":
delim = "\r\n"
line = line[:-2]
last_line_lfend = True
elif line[-1] == "\n":
delim = "\n"
line = line[:-1]
last_line_lfend = True
elif line[-1] == "\r":
# We may interrupt \r\n sequences if they span the 2**16
# byte boundary
delim = "\r"
line = line[:-1]
last_line_lfend = False
else:
delim = ""
last_line_lfend = False
self.__write(odelim + line)
def skip_lines(self):
"""Internal: skip lines until outer boundary if defined."""
if not self.outerboundary or self.done:
return
next = "--" + self.outerboundary
last = next + "--"
last_line_lfend = True
while 1:
line = self.fp.readline(1<<16)
if not line:
self.done = -1
break
if line[:2] == "--" and last_line_lfend:
strippedline = line.strip()
if strippedline == next:
break
if strippedline == last:
self.done = 1
break
last_line_lfend = line.endswith('\n')
def make_file(self, binary=None):
"""Overridable: return a readable & writable file.
The file will be used as follows:
- data is written to it
- seek(0)
- data is read from it
The 'binary' argument is unused -- the file is always opened
in binary mode.
This version opens a temporary file for reading and writing,
and immediately deletes (unlinks) it. The trick (on Unix!) is
that the file can still be used, but it can't be opened by
another process, and it will automatically be deleted when it
is closed or when the current process terminates.
If you want a more permanent file, you derive a class which
overrides this method. If you want a visible temporary file
that is nevertheless automatically deleted when the script
terminates, try defining a __del__ method in a derived class
which unlinks the temporary files you have created.
"""
import tempfile
return tempfile.TemporaryFile("w+b")
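# Editor's sketch (not part of the original module): typical FieldStorage
# use inside a CGI script; the field names are assumptions about the
# submitting form.
def _demo_fieldstorage():
    form = FieldStorage()              # parses stdin/environ exactly once
    name = form.getfirst("name", "")   # first value, or the default
    tags = form.getlist("tag")         # always a list, possibly empty
    if "file" in form and form["file"].filename:
        data = form["file"].value      # uploaded file contents as a string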
# Backwards Compatibility Classes
# ===============================
class FormContentDict(UserDict.UserDict):
"""Form content as dictionary with a list of values per field.
form = FormContentDict()
form[key] -> [value, value, ...]
key in form -> Boolean
form.keys() -> [key, key, ...]
form.values() -> [[val, val, ...], [val, val, ...], ...]
form.items() -> [(key, [val, val, ...]), (key, [val, val, ...]), ...]
form.dict == {key: [val, val, ...], ...}
"""
def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0):
self.dict = self.data = parse(environ=environ,
keep_blank_values=keep_blank_values,
strict_parsing=strict_parsing)
self.query_string = environ['QUERY_STRING']
class SvFormContentDict(FormContentDict):
"""Form content as dictionary expecting a single value per field.
If you only expect a single value for each field, then form[key]
will return that single value. It will raise an IndexError if
that expectation is not true. If you expect a field to have
possibly multiple values, then you can use form.getlist(key) to
get all of the values. values() and items() are a compromise:
they return single strings where there is a single value, and
lists of strings otherwise.
"""
def __getitem__(self, key):
if len(self.dict[key]) > 1:
raise IndexError, 'expecting a single value'
return self.dict[key][0]
def getlist(self, key):
return self.dict[key]
def values(self):
result = []
for value in self.dict.values():
if len(value) == 1:
result.append(value[0])
else: result.append(value)
return result
def items(self):
result = []
for key, value in self.dict.items():
if len(value) == 1:
result.append((key, value[0]))
else: result.append((key, value))
return result
class InterpFormContentDict(SvFormContentDict):
"""This class is present for backwards compatibility only."""
def __getitem__(self, key):
v = SvFormContentDict.__getitem__(self, key)
if v[0] in '0123456789+-.':
try: return int(v)
except ValueError:
try: return float(v)
except ValueError: pass
return v.strip()
def values(self):
result = []
for key in self.keys():
try:
result.append(self[key])
except IndexError:
result.append(self.dict[key])
return result
def items(self):
result = []
for key in self.keys():
try:
result.append((key, self[key]))
except IndexError:
result.append((key, self.dict[key]))
return result
class FormContent(FormContentDict):
"""This class is present for backwards compatibility only."""
def values(self, key):
if key in self.dict :return self.dict[key]
else: return None
def indexed_value(self, key, location):
if key in self.dict:
if len(self.dict[key]) > location:
return self.dict[key][location]
else: return None
else: return None
def value(self, key):
if key in self.dict: return self.dict[key][0]
else: return None
def length(self, key):
return len(self.dict[key])
def stripped(self, key):
if key in self.dict: return self.dict[key][0].strip()
else: return None
def pars(self):
return self.dict
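# Editor's sketch (not part of the original module): the compatibility
# classes parse at construction time, so a synthetic environ is enough to
# exercise them.
def _demo_formcontentdict():
    env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1&a=2&b=3"}
    form = FormContentDict(environ=env)
    print form["a"]   # -> ['1', '2']
    print form["b"]   # -> ['3']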
# Test/debug code
# ===============
def test(environ=os.environ):
"""Robust test CGI script, usable as main program.
Write minimal HTTP headers and dump all information provided to
the script in HTML form.
"""
print "Content-type: text/html"
print
sys.stderr = sys.stdout
try:
form = FieldStorage() # Replace with other classes to test those
print_directory()
print_arguments()
print_form(form)
print_environ(environ)
print_environ_usage()
def f():
exec "testing print_exception() -- <I>italics?</I>"
def g(f=f):
f()
print "<H3>What follows is a test, not an actual exception:</H3>"
g()
except:
print_exception()
print "<H1>Second try with a small maxlen...</H1>"
global maxlen
maxlen = 50
try:
form = FieldStorage() # Replace with other classes to test those
print_directory()
print_arguments()
print_form(form)
print_environ(environ)
except:
print_exception()
def print_exception(type=None, value=None, tb=None, limit=None):
if type is None:
type, value, tb = sys.exc_info()
import traceback
print
print "<H3>Traceback (most recent call last):</H3>"
list = traceback.format_tb(tb, limit) + \
traceback.format_exception_only(type, value)
print "<PRE>%s<B>%s</B></PRE>" % (
escape("".join(list[:-1])),
escape(list[-1]),
)
del tb
def print_environ(environ=os.environ):
"""Dump the shell environment as HTML."""
keys = environ.keys()
keys.sort()
print
print "<H3>Shell Environment:</H3>"
print "<DL>"
for key in keys:
print "<DT>", escape(key), "<DD>", escape(environ[key])
print "</DL>"
print
def print_form(form):
"""Dump the contents of a form as HTML."""
keys = form.keys()
keys.sort()
print
print "<H3>Form Contents:</H3>"
if not keys:
print "<P>No form fields."
print "<DL>"
for key in keys:
print "<DT>" + escape(key) + ":",
value = form[key]
print "<i>" + escape(repr(type(value))) + "</i>"
print "<DD>" + escape(repr(value))
print "</DL>"
print
def print_directory():
"""Dump the current directory as HTML."""
print
print "<H3>Current Working Directory:</H3>"
try:
pwd = os.getcwd()
except os.error, msg:
print "os.error:", escape(str(msg))
else:
print escape(pwd)
print
def print_arguments():
print
print "<H3>Command Line Arguments:</H3>"
print
print sys.argv
print
def print_environ_usage():
"""Dump a list of environment variables used by CGI as HTML."""
print """
<H3>These environment variables could have been set:</H3>
<UL>
<LI>AUTH_TYPE
<LI>CONTENT_LENGTH
<LI>CONTENT_TYPE
<LI>DATE_GMT
<LI>DATE_LOCAL
<LI>DOCUMENT_NAME
<LI>DOCUMENT_ROOT
<LI>DOCUMENT_URI
<LI>GATEWAY_INTERFACE
<LI>LAST_MODIFIED
<LI>PATH
<LI>PATH_INFO
<LI>PATH_TRANSLATED
<LI>QUERY_STRING
<LI>REMOTE_ADDR
<LI>REMOTE_HOST
<LI>REMOTE_IDENT
<LI>REMOTE_USER
<LI>REQUEST_METHOD
<LI>SCRIPT_NAME
<LI>SERVER_NAME
<LI>SERVER_PORT
<LI>SERVER_PROTOCOL
<LI>SERVER_ROOT
<LI>SERVER_SOFTWARE
</UL>
In addition, HTTP headers sent by the server may be passed in the
environment as well. Here are some common variable names:
<UL>
<LI>HTTP_ACCEPT
<LI>HTTP_CONNECTION
<LI>HTTP_HOST
<LI>HTTP_PRAGMA
<LI>HTTP_REFERER
<LI>HTTP_USER_AGENT
</UL>
"""
# Utilities
# =========
def escape(s, quote=None):
'''Replace the special characters "&", "<" and ">" with HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.'''
s = s.replace("&", "&") # Must be done first!
s = s.replace("<", "<")
s = s.replace(">", ">")
if quote:
s = s.replace('"', """)
return s
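# Editor's sketch (not part of the original module): escape() is the last
# line of defense when echoing request data back into HTML.
def _demo_escape():
    print escape('<b>"M & M"</b>', quote=1)
    # -> &lt;b&gt;&quot;M &amp; M&quot;&lt;/b&gt;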
def valid_boundary(s, _vb_pattern="^[ -~]{0,200}[!-~]$"):
import re
return re.match(_vb_pattern, s)
# Invoke mainline
# ===============
# Call test() when this file is run as a script (not imported as a module)
if __name__ == '__main__':
test()
"""CGI-savvy HTTP Server.
This module builds on SimpleHTTPServer by implementing GET and POST
requests to cgi-bin scripts.
If the os.fork() function is not present (e.g. on Windows),
os.popen2() is used as a fallback, with slightly altered semantics; if
that function is not present either (e.g. on Macintosh), only Python
scripts are supported, and they are executed by the current process.
In all cases, the implementation is intentionally naive -- all
requests are executed synchronously.
SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL
-- it may execute arbitrary Python code or external programs.
Note that status code 200 is sent prior to execution of a CGI script, so
scripts cannot send other status codes such as 302 (redirect).
"""
__version__ = "0.4"
__all__ = ["CGIHTTPRequestHandler"]
import os
import sys
import urllib
import BaseHTTPServer
import SimpleHTTPServer
import select
import copy
class CGIHTTPRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
"""Complete HTTP server with GET, HEAD and POST commands.
GET and HEAD also support running CGI scripts.
The POST command is *only* implemented for CGI scripts.
"""
# Determine platform specifics
have_fork = hasattr(os, 'fork')
have_popen2 = hasattr(os, 'popen2')
have_popen3 = hasattr(os, 'popen3')
# Make rfile unbuffered -- we need to read one line and then pass
# the rest to a subprocess, so we can't use buffered input.
rbufsize = 0
def do_POST(self):
"""Serve a POST request.
This is only implemented for CGI scripts.
"""
if self.is_cgi():
self.run_cgi()
else:
self.send_error(501, "Can only POST to CGI scripts")
def send_head(self):
"""Version of send_head that support CGI scripts"""
if self.is_cgi():
return self.run_cgi()
else:
return SimpleHTTPServer.SimpleHTTPRequestHandler.send_head(self)
def is_cgi(self):
"""Test whether self.path corresponds to a CGI script.
Returns True and updates the cgi_info attribute to the tuple
(dir, rest) if self.path requires running a CGI script.
Returns False otherwise.
If any exception is raised, the caller should assume that
self.path was rejected as invalid and act accordingly.
The default implementation tests whether the normalized url
path begins with one of the strings in self.cgi_directories
(and the next character is a '/' or the end of the string).
"""
collapsed_path = _url_collapse_path(self.path)
dir_sep = collapsed_path.find('/', 1)
head, tail = collapsed_path[:dir_sep], collapsed_path[dir_sep+1:]
if head in self.cgi_directories:
self.cgi_info = head, tail
return True
return False
cgi_directories = ['/cgi-bin', '/htbin']
def is_executable(self, path):
"""Test whether argument path is an executable file."""
return executable(path)
def is_python(self, path):
"""Test whether argument path is a Python script."""
head, tail = os.path.splitext(path)
return tail.lower() in (".py", ".pyw")
def run_cgi(self):
"""Execute a CGI script."""
dir, rest = self.cgi_info
path = dir + '/' + rest
i = path.find('/', len(dir)+1)
while i >= 0:
nextdir = path[:i]
nextrest = path[i+1:]
scriptdir = self.translate_path(nextdir)
if os.path.isdir(scriptdir):
dir, rest = nextdir, nextrest
i = path.find('/', len(dir)+1)
else:
break
# find an explicit query string, if present.
rest, _, query = rest.partition('?')
# dissect the part after the directory name into a script name &
# a possible additional path, to be stored in PATH_INFO.
i = rest.find('/')
if i >= 0:
script, rest = rest[:i], rest[i:]
else:
script, rest = rest, ''
scriptname = dir + '/' + script
scriptfile = self.translate_path(scriptname)
if not os.path.exists(scriptfile):
self.send_error(404, "No such CGI script (%r)" % scriptname)
return
if not os.path.isfile(scriptfile):
self.send_error(403, "CGI script is not a plain file (%r)" %
scriptname)
return
ispy = self.is_python(scriptname)
if not ispy:
if not (self.have_fork or self.have_popen2 or self.have_popen3):
self.send_error(403, "CGI script is not a Python script (%r)" %
scriptname)
return
if not self.is_executable(scriptfile):
self.send_error(403, "CGI script is not executable (%r)" %
scriptname)
return
# Reference: http://hoohoo.ncsa.uiuc.edu/cgi/env.html
# XXX Much of the following could be prepared ahead of time!
env = copy.deepcopy(os.environ)
env['SERVER_SOFTWARE'] = self.version_string()
env['SERVER_NAME'] = self.server.server_name
env['GATEWAY_INTERFACE'] = 'CGI/1.1'
env['SERVER_PROTOCOL'] = self.protocol_version
env['SERVER_PORT'] = str(self.server.server_port)
env['REQUEST_METHOD'] = self.command
uqrest = urllib.unquote(rest)
env['PATH_INFO'] = uqrest
env['PATH_TRANSLATED'] = self.translate_path(uqrest)
env['SCRIPT_NAME'] = scriptname
if query:
env['QUERY_STRING'] = query
host = self.address_string()
if host != self.client_address[0]:
env['REMOTE_HOST'] = host
env['REMOTE_ADDR'] = self.client_address[0]
authorization = self.headers.getheader("authorization")
if authorization:
authorization = authorization.split()
if len(authorization) == 2:
import base64, binascii
env['AUTH_TYPE'] = authorization[0]
if authorization[0].lower() == "basic":
try:
authorization = base64.decodestring(authorization[1])
except binascii.Error:
pass
else:
authorization = authorization.split(':')
if len(authorization) == 2:
env['REMOTE_USER'] = authorization[0]
# XXX REMOTE_IDENT
if self.headers.typeheader is None:
env['CONTENT_TYPE'] = self.headers.type
else:
env['CONTENT_TYPE'] = self.headers.typeheader
length = self.headers.getheader('content-length')
if length:
env['CONTENT_LENGTH'] = length
referer = self.headers.getheader('referer')
if referer:
env['HTTP_REFERER'] = referer
accept = []
for line in self.headers.getallmatchingheaders('accept'):
if line[:1] in "\t\n\r ":
accept.append(line.strip())
else:
accept = accept + line[7:].split(',')
env['HTTP_ACCEPT'] = ','.join(accept)
ua = self.headers.getheader('user-agent')
if ua:
env['HTTP_USER_AGENT'] = ua
co = filter(None, self.headers.getheaders('cookie'))
if co:
env['HTTP_COOKIE'] = ', '.join(co)
# XXX Other HTTP_* headers
# Since we're setting the env in the parent, provide empty
# values to override previously set values
for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER'):
env.setdefault(k, "")
self.send_response(200, "Script output follows")
decoded_query = query.replace('+', ' ')
if self.have_fork:
# Unix -- fork as we should
args = [script]
if '=' not in decoded_query:
args.append(decoded_query)
nobody = nobody_uid()
self.wfile.flush() # Always flush before forking
pid = os.fork()
if pid != 0:
# Parent
pid, sts = os.waitpid(pid, 0)
# throw away additional data [see bug #427345]
while select.select([self.rfile], [], [], 0)[0]:
if not self.rfile.read(1):
break
if sts:
self.log_error("CGI script exit status %#x", sts)
return
# Child
try:
try:
os.setuid(nobody)
except os.error:
pass
os.dup2(self.rfile.fileno(), 0)
os.dup2(self.wfile.fileno(), 1)
os.execve(scriptfile, args, env)
except:
self.server.handle_error(self.request, self.client_address)
os._exit(127)
else:
# Non Unix - use subprocess
import subprocess
cmdline = [scriptfile]
if self.is_python(scriptfile):
interp = sys.executable
if interp.lower().endswith("w.exe"):
# On Windows, use python.exe, not pythonw.exe
interp = interp[:-5] + interp[-4:]
cmdline = [interp, '-u'] + cmdline
if '=' not in query:
cmdline.append(query)
self.log_message("command: %s", subprocess.list2cmdline(cmdline))
try:
nbytes = int(length)
except (TypeError, ValueError):
nbytes = 0
p = subprocess.Popen(cmdline,
stdin = subprocess.PIPE,
stdout = subprocess.PIPE,
stderr = subprocess.PIPE,
env = env
)
if self.command.lower() == "post" and nbytes > 0:
data = self.rfile.read(nbytes)
else:
data = None
# throw away additional data [see bug #427345]
while select.select([self.rfile._sock], [], [], 0)[0]:
if not self.rfile._sock.recv(1):
break
stdout, stderr = p.communicate(data)
self.wfile.write(stdout)
if stderr:
self.log_error('%s', stderr)
p.stderr.close()
p.stdout.close()
status = p.returncode
if status:
self.log_error("CGI script exit status %#x", status)
else:
self.log_message("CGI script exited OK")
def _url_collapse_path(path):
"""
Given a URL path, remove extra '/'s and '.' path elements and collapse
any '..' references, returning the collapsed path.
Implements something akin to RFC-2396 5.2 step 6 to parse relative paths.
The utility of this function is limited to the is_cgi method and helps
prevent some security attacks.
Returns: The reconstituted URL, which will always start with a '/'.
Raises: IndexError if too many '..' occur within the path.
"""
# Query component should not be involved.
path, _, query = path.partition('?')
path = urllib.unquote(path)
# Similar to os.path.split(os.path.normpath(path)) but specific to URL
# path semantics rather than local operating system semantics.
path_parts = path.split('/')
head_parts = []
for part in path_parts[:-1]:
if part == '..':
head_parts.pop() # IndexError if more '..' than prior parts
elif part and part != '.':
head_parts.append( part )
if path_parts:
tail_part = path_parts.pop()
if tail_part:
if tail_part == '..':
head_parts.pop()
tail_part = ''
elif tail_part == '.':
tail_part = ''
else:
tail_part = ''
if query:
tail_part = '?'.join((tail_part, query))
splitpath = ('/' + '/'.join(head_parts), tail_part)
collapsed_path = "/".join(splitpath)
return collapsed_path
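# Editor's sketch (not part of the original module): _url_collapse_path()
# normalizes the request path before the cgi-bin prefix check, defeating
# '..' escapes.
def _demo_collapse_path():
    print _url_collapse_path('/cgi-bin//sub/../script.py')
    # -> '/cgi-bin/script.py'
    try:
        _url_collapse_path('/../etc/passwd')
    except IndexError:
        print 'rejected'   # more '..' segments than leading components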
nobody = None
def nobody_uid():
"""Internal routine to get nobody's uid"""
global nobody
if nobody:
return nobody
try:
import pwd
except ImportError:
return -1
try:
nobody = pwd.getpwnam('nobody')[2]
except KeyError:
nobody = 1 + max(map(lambda x: x[2], pwd.getpwall()))
return nobody
def executable(path):
"""Test for executable file."""
try:
st = os.stat(path)
except os.error:
return False
return st.st_mode & 0111 != 0
def test(HandlerClass = CGIHTTPRequestHandler,
ServerClass = BaseHTTPServer.HTTPServer):
SimpleHTTPServer.test(HandlerClass, ServerClass)
if __name__ == '__main__':
test()
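# Editor's sketch (not part of the original module): serving scripts from
# ./cgi-bin on an assumed port.
def _demo_serve():
    httpd = BaseHTTPServer.HTTPServer(("", 8000), CGIHTTPRequestHandler)
    httpd.serve_forever()   # GET/POST /cgi-bin/<script> executes the script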
"""More comprehensive traceback formatting for Python scripts.
To enable this module, do:
import cgitb; cgitb.enable()
at the top of your script. The optional arguments to enable() are:
display - if true, tracebacks are displayed in the web browser
logdir - if set, tracebacks are written to files in this directory
context - number of lines of source code to show for each stack frame
format - 'text' or 'html' controls the output format
By default, tracebacks are displayed but not saved, the context is 5 lines
and the output format is 'html' (for backwards compatibility with the
original use of this module)
Alternatively, if you have caught an exception and want cgitb to display it
for you, call cgitb.handler(). The optional argument to handler() is a
3-item tuple (etype, evalue, etb) just like the value of sys.exc_info().
The default handler displays output as HTML.
"""
import inspect
import keyword
import linecache
import os
import pydoc
import sys
import tempfile
import time
import tokenize
import traceback
import types
def reset():
"""Return a string that resets the CGI and browser to a known state."""
return '''<!--: spam
Content-Type: text/html
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
</font> </font> </font> </script> </object> </blockquote> </pre>
</table> </table> </table> </table> </table> </font> </font> </font>'''
__UNDEF__ = [] # a special sentinel object
def small(text):
if text:
return '<small>' + text + '</small>'
else:
return ''
def strong(text):
if text:
return '<strong>' + text + '</strong>'
else:
return ''
def grey(text):
if text:
return '<font color="#909090">' + text + '</font>'
else:
return ''
def lookup(name, frame, locals):
"""Find the value for a given name in the given environment."""
if name in locals:
return 'local', locals[name]
if name in frame.f_globals:
return 'global', frame.f_globals[name]
if '__builtins__' in frame.f_globals:
builtins = frame.f_globals['__builtins__']
if type(builtins) is type({}):
if name in builtins:
return 'builtin', builtins[name]
else:
if hasattr(builtins, name):
return 'builtin', getattr(builtins, name)
return None, __UNDEF__
def scanvars(reader, frame, locals):
"""Scan one logical line of Python and look up values of variables used."""
vars, lasttoken, parent, prefix, value = [], None, None, '', __UNDEF__
for ttype, token, start, end, line in tokenize.generate_tokens(reader):
if ttype == tokenize.NEWLINE: break
if ttype == tokenize.NAME and token not in keyword.kwlist:
if lasttoken == '.':
if parent is not __UNDEF__:
value = getattr(parent, token, __UNDEF__)
vars.append((prefix + token, prefix, value))
else:
where, value = lookup(token, frame, locals)
vars.append((token, where, value))
elif token == '.':
prefix += lasttoken + '.'
parent = value
else:
parent, prefix = None, ''
lasttoken = token
return vars
def html(einfo, context=5):
"""Return a nice HTML document describing a given traceback."""
etype, evalue, etb = einfo
if type(etype) is types.ClassType:
etype = etype.__name__
pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
date = time.ctime(time.time())
head = '<body bgcolor="#f0f0f8">' + pydoc.html.heading(
'<big><big>%s</big></big>' %
strong(pydoc.html.escape(str(etype))),
'#ffffff', '#6622aa', pyver + '<br>' + date) + '''
<p>A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.</p>'''
indent = '<tt>' + small('&nbsp;' * 5) + '&nbsp;</tt>'
frames = []
records = inspect.getinnerframes(etb, context)
for frame, file, lnum, func, lines, index in records:
if file:
file = os.path.abspath(file)
link = '<a href="file://%s">%s</a>' % (file, pydoc.html.escape(file))
else:
file = link = '?'
args, varargs, varkw, locals = inspect.getargvalues(frame)
call = ''
if func != '?':
call = 'in ' + strong(pydoc.html.escape(func)) + \
inspect.formatargvalues(args, varargs, varkw, locals,
formatvalue=lambda value: '=' + pydoc.html.repr(value))
highlight = {}
def reader(lnum=[lnum]):
highlight[lnum[0]] = 1
try: return linecache.getline(file, lnum[0])
finally: lnum[0] += 1
vars = scanvars(reader, frame, locals)
rows = ['<tr><td bgcolor="#d8bbff">%s%s %s</td></tr>' %
('<big>&nbsp;</big>', link, call)]
if index is not None:
i = lnum - index
for line in lines:
num = small('&nbsp;' * (5-len(str(i))) + str(i)) + '&nbsp;'
if i in highlight:
line = '<tt>=&gt;%s%s</tt>' % (num, pydoc.html.preformat(line))
rows.append('<tr><td bgcolor="#ffccee">%s</td></tr>' % line)
else:
line = '<tt>&nbsp;&nbsp;%s%s</tt>' % (num, pydoc.html.preformat(line))
rows.append('<tr><td>%s</td></tr>' % grey(line))
i += 1
done, dump = {}, []
for name, where, value in vars:
if name in done: continue
done[name] = 1
if value is not __UNDEF__:
if where in ('global', 'builtin'):
name = ('<em>%s</em> ' % where) + strong(name)
elif where == 'local':
name = strong(name)
else:
name = where + strong(name.split('.')[-1])
dump.append('%s = %s' % (name, pydoc.html.repr(value)))
else:
dump.append(name + ' <em>undefined</em>')
rows.append('<tr><td>%s</td></tr>' % small(grey(', '.join(dump))))
frames.append('''
<table width="100%%" cellspacing=0 cellpadding=0 border=0>
%s</table>''' % '\n'.join(rows))
exception = ['<p>%s: %s' % (strong(pydoc.html.escape(str(etype))),
pydoc.html.escape(str(evalue)))]
if isinstance(evalue, BaseException):
for name in dir(evalue):
if name[:1] == '_': continue
value = pydoc.html.repr(getattr(evalue, name))
exception.append('\n<br>%s%s =\n%s' % (indent, name, value))
return head + ''.join(frames) + ''.join(exception) + '''
<!-- The above is a description of an error in a Python program, formatted
for a Web browser because the 'cgitb' module was enabled. In case you
are not reading this in a Web browser, here is the original traceback:
%s
-->
''' % pydoc.html.escape(
''.join(traceback.format_exception(etype, evalue, etb)))
def text(einfo, context=5):
"""Return a plain text document describing a given traceback."""
etype, evalue, etb = einfo
if type(etype) is types.ClassType:
etype = etype.__name__
pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
date = time.ctime(time.time())
head = "%s\n%s\n%s\n" % (str(etype), pyver, date) + '''
A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.
'''
frames = []
records = inspect.getinnerframes(etb, context)
for frame, file, lnum, func, lines, index in records:
file = file and os.path.abspath(file) or '?'
args, varargs, varkw, locals = inspect.getargvalues(frame)
call = ''
if func != '?':
call = 'in ' + func + \
inspect.formatargvalues(args, varargs, varkw, locals,
formatvalue=lambda value: '=' + pydoc.text.repr(value))
highlight = {}
def reader(lnum=[lnum]):
highlight[lnum[0]] = 1
try: return linecache.getline(file, lnum[0])
finally: lnum[0] += 1
vars = scanvars(reader, frame, locals)
rows = [' %s %s' % (file, call)]
if index is not None:
i = lnum - index
for line in lines:
num = '%5d ' % i
rows.append(num+line.rstrip())
i += 1
done, dump = {}, []
for name, where, value in vars:
if name in done: continue
done[name] = 1
if value is not __UNDEF__:
if where == 'global': name = 'global ' + name
elif where != 'local': name = where + name.split('.')[-1]
dump.append('%s = %s' % (name, pydoc.text.repr(value)))
else:
dump.append(name + ' undefined')
rows.append('\n'.join(dump))
frames.append('\n%s\n' % '\n'.join(rows))
exception = ['%s: %s' % (str(etype), str(evalue))]
if isinstance(evalue, BaseException):
for name in dir(evalue):
value = pydoc.text.repr(getattr(evalue, name))
exception.append('\n%s%s = %s' % (" "*4, name, value))
return head + ''.join(frames) + ''.join(exception) + '''
The above is a description of an error in a Python program. Here is
the original traceback:
%s
''' % ''.join(traceback.format_exception(etype, evalue, etb))
class Hook:
"""A hook to replace sys.excepthook that shows tracebacks in HTML."""
def __init__(self, display=1, logdir=None, context=5, file=None,
format="html"):
self.display = display # send tracebacks to browser if true
self.logdir = logdir # log tracebacks to files if not None
self.context = context # number of source code lines per frame
self.file = file or sys.stdout # place to send the output
self.format = format
def __call__(self, etype, evalue, etb):
self.handle((etype, evalue, etb))
def handle(self, info=None):
info = info or sys.exc_info()
if self.format == "html":
self.file.write(reset())
formatter = (self.format=="html") and html or text
plain = False
try:
doc = formatter(info, self.context)
except: # just in case something goes wrong
doc = ''.join(traceback.format_exception(*info))
plain = True
if self.display:
if plain:
doc = pydoc.html.escape(doc)
self.file.write('<pre>' + doc + '</pre>\n')
else:
self.file.write(doc + '\n')
else:
self.file.write('<p>A problem occurred in a Python script.\n')
if self.logdir is not None:
suffix = ['.txt', '.html'][self.format=="html"]
(fd, path) = tempfile.mkstemp(suffix=suffix, dir=self.logdir)
try:
file = os.fdopen(fd, 'w')
file.write(doc)
file.close()
msg = '%s contains the description of this error.' % path
except:
msg = 'Tried to save traceback to %s, but failed.' % path
if self.format == 'html':
self.file.write('<p>%s</p>\n' % msg)
else:
self.file.write(msg + '\n')
try:
self.file.flush()
except: pass
handler = Hook().handle
def enable(display=1, logdir=None, context=5, format="html"):
"""Install an exception handler that formats tracebacks as HTML.
The optional argument 'display' can be set to 0 to suppress sending the
traceback to the browser, and 'logdir' can be set to a directory to cause
tracebacks to be written to files there."""
sys.excepthook = Hook(display=display, logdir=logdir,
context=context, format=format)
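# Editor's sketch (not part of the original module): typical installation at
# the top of a CGI script; the log directory is an assumption and must
# already exist.
def _demo_cgitb():
    enable(display=0, logdir="/tmp/cgitb-logs", format="text")
    1 / 0   # the uncaught error is now formatted (and logged) by Hook.handle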
"""Simple class to read IFF chunks.
An IFF chunk (used in formats such as AIFF, TIFF, RMFF (RealMedia File
Format)) has the following structure:
+----------------+
| ID (4 bytes) |
+----------------+
| size (4 bytes) |
+----------------+
| data |
| ... |
+----------------+
The ID is a 4-byte string which identifies the type of chunk.
The size field (a 32-bit value, encoded using big-endian byte order)
gives the size of the whole chunk, including the 8-byte header.
Usually an IFF-type file consists of one or more chunks. The proposed
usage of the Chunk class defined here is to instantiate an instance at
the start of each chunk and read from the instance until it reaches
the end, after which a new instance can be instantiated. At the end
of the file, creating a new instance will fail with an EOFError
exception.
Usage:
while True:
try:
chunk = Chunk(file)
except EOFError:
break
chunktype = chunk.getname()
while True:
data = chunk.read(nbytes)
if not data:
pass
# do something with data
The interface is file-like. The implemented methods are:
read, close, seek, tell, isatty.
Extra methods are: skip() (called by close, skips to the end of the chunk),
getname() (returns the name (ID) of the chunk)
The __init__ method has one required argument, a file-like object
(including a chunk instance), and one optional argument, a flag which
specifies whether or not chunks are aligned on 2-byte boundaries. The
default is 1, i.e. aligned.
"""
class Chunk:
def __init__(self, file, align=True, bigendian=True, inclheader=False):
import struct
self.closed = False
self.align = align # whether to align to word (2-byte) boundaries
if bigendian:
strflag = '>'
else:
strflag = '<'
self.file = file
self.chunkname = file.read(4)
if len(self.chunkname) < 4:
raise EOFError
try:
self.chunksize = struct.unpack(strflag+'L', file.read(4))[0]
except struct.error:
raise EOFError
if inclheader:
self.chunksize = self.chunksize - 8 # subtract header
self.size_read = 0
try:
self.offset = self.file.tell()
except (AttributeError, IOError):
self.seekable = False
else:
self.seekable = True
def getname(self):
"""Return the name (ID) of the current chunk."""
return self.chunkname
def getsize(self):
"""Return the size of the current chunk."""
return self.chunksize
def close(self):
if not self.closed:
try:
self.skip()
finally:
self.closed = True
def isatty(self):
if self.closed:
raise ValueError, "I/O operation on closed file"
return False
def seek(self, pos, whence=0):
"""Seek to specified position into the chunk.
Default position is 0 (start of chunk).
If the file is not seekable, this will result in an error.
"""
if self.closed:
raise ValueError, "I/O operation on closed file"
if not self.seekable:
raise IOError, "cannot seek"
if whence == 1:
pos = pos + self.size_read
elif whence == 2:
pos = pos + self.chunksize
if pos < 0 or pos > self.chunksize:
raise RuntimeError
self.file.seek(self.offset + pos, 0)
self.size_read = pos
def tell(self):
if self.closed:
raise ValueError, "I/O operation on closed file"
return self.size_read
def read(self, size=-1):
"""Read at most size bytes from the chunk.
If size is omitted or negative, read until the end
of the chunk.
"""
if self.closed:
raise ValueError, "I/O operation on closed file"
if self.size_read >= self.chunksize:
return ''
if size < 0:
size = self.chunksize - self.size_read
if size > self.chunksize - self.size_read:
size = self.chunksize - self.size_read
data = self.file.read(size)
self.size_read = self.size_read + len(data)
if self.size_read == self.chunksize and \
self.align and \
(self.chunksize & 1):
dummy = self.file.read(1)
self.size_read = self.size_read + len(dummy)
return data
def skip(self):
"""Skip the rest of the chunk.
If you are not interested in the contents of the chunk,
this method should be called so that the file points to
the start of the next chunk.
"""
if self.closed:
raise ValueError, "I/O operation on closed file"
if self.seekable:
try:
n = self.chunksize - self.size_read
# maybe fix alignment
if self.align and (self.chunksize & 1):
n = n + 1
self.file.seek(n, 1)
self.size_read = self.size_read + n
return
except IOError:
pass
while self.size_read < self.chunksize:
n = min(8192, self.chunksize - self.size_read)
dummy = self.read(n)
if not dummy:
raise EOFError
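A sketch of walking a RIFF/WAVE file with this class (the file name is invented; WAVE files are little-endian, hence bigendian=False). Note that a Chunk instance can itself be handed to Chunk() as the file-like object, which is how nested sub-chunks are read:

import chunk

f = open("sample.wav", "rb")            # hypothetical input file
riff = chunk.Chunk(f, bigendian=False)
print riff.getname(), riff.getsize()    # 'RIFF' and the form size
print riff.read(4)                      # form type, e.g. 'WAVE'
while True:
    try:
        sub = chunk.Chunk(riff, bigendian=False)
    except EOFError:
        break
    print sub.getname(), sub.getsize()
    sub.close()                         # skip to the start of the next chunk
f.close()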
# Licensed to the .NET Foundation under one or more agreements.
# The .NET Foundation licenses this file to you under the Apache 2.0 License.
# See the LICENSE file in the project root for more information.
__all__ = ["ClrClass", "ClrInterface", "accepts", "returns", "attribute", "propagate_attributes"]
import clr
clr.AddReference("Microsoft.Dynamic")
clr.AddReference("Microsoft.Scripting")
clr.AddReference("IronPython")
if clr.IsNetCoreApp:
clr.AddReference("System.Reflection.Emit")
import System
from System import Char, Void, Boolean, Array, Type, AppDomain
from System.Reflection import FieldAttributes, MethodAttributes, PropertyAttributes, ParameterAttributes
from System.Reflection import CallingConventions, TypeAttributes, AssemblyName
from System.Reflection.Emit import OpCodes, CustomAttributeBuilder, AssemblyBuilder, AssemblyBuilderAccess
from System.Runtime.InteropServices import DllImportAttribute, CallingConvention, CharSet
from Microsoft.Scripting.Generation import Snippets
from Microsoft.Scripting.Runtime import DynamicOperations
from Microsoft.Scripting.Utils import ReflectionUtils
from IronPython.Runtime import NameType, PythonContext
from IronPython.Runtime.Types import PythonType, ReflectedField, ReflectedProperty
def validate_clr_types(signature_types, var_signature = False):
if not isinstance(signature_types, tuple):
signature_types = (signature_types,)
for t in signature_types:
if type(t) is type(System.IComparable): # type overloaded on generic arity, eg IComparable and IComparable[T]
t = t[()] # select non-generic version
clr_type = clr.GetClrType(t)
if t == Void:
raise TypeError("Void cannot be used in signature")
is_typed = clr.GetPythonType(clr_type) == t
# is_typed needs to be weakened until the generated type
# gets explicitly published as the underlying CLR type
is_typed = is_typed or (hasattr(t, "__metaclass__") and t.__metaclass__ in [ClrInterface, ClrClass])
if not is_typed:
raise Exception, "Invalid CLR type %s" % str(t)
if not var_signature:
if clr_type.IsByRef:
raise TypeError("Byref can only be used as arguments and locals")
# ArgIterator is not present in Silverlight
if hasattr(System, "ArgIterator") and t == System.ArgIterator:
raise TypeError("Stack-referencing types can only be used as arguments and locals")
class TypedFunction(object):
"""
A strongly-typed function can get wrapped up as a staticmethod, a property, etc.
This class represents the raw function, but with the type information
it is decorated with.
Other information is stored as attributes on the function. See propagate_attributes
"""
def __init__(self, function, is_static = False, prop_name_if_prop_get = None, prop_name_if_prop_set = None):
self.function = function
self.is_static = is_static
self.prop_name_if_prop_get = prop_name_if_prop_get
self.prop_name_if_prop_set = prop_name_if_prop_set
class ClrType(type):
"""
Base metaclass for creating strongly-typed CLR types
"""
def is_typed_method(self, function):
if hasattr(function, "arg_types") != hasattr(function, "return_type"):
raise TypeError("One of @accepts and @returns is missing for %s" % function.func_name)
return hasattr(function, "arg_types")
def get_typed_properties(self):
for item_name, item in self.__dict__.items():
if isinstance(item, property):
if item.fget:
if not self.is_typed_method(item.fget): continue
prop_type = item.fget.return_type
else:
if not self.is_typed_method(item.fset): continue
prop_type = item.fset.arg_types[0]
validate_clr_types(prop_type)
clr_prop_type = clr.GetClrType(prop_type)
yield item, item_name, clr_prop_type
def emit_properties(self, typebld):
for prop, prop_name, clr_prop_type in self.get_typed_properties():
self.emit_property(typebld, prop, prop_name, clr_prop_type)
def emit_property(self, typebld, prop, name, clrtype):
prpbld = typebld.DefineProperty(name, PropertyAttributes.None, clrtype, None)
if prop.fget:
getter = self.emitted_methods[(prop.fget.func_name, prop.fget.arg_types)]
prpbld.SetGetMethod(getter)
if prop.fset:
setter = self.emitted_methods[(prop.fset.func_name, prop.fset.arg_types)]
prpbld.SetSetMethod(setter)
def dummy_function(self): raise RuntimeError("this should not get called")
def get_typed_methods(self):
"""
Get all the methods with @accepts (and @returns) decorators
Functions are assumed to be instance methods, unless decorated with @staticmethod
"""
# We avoid using the "types" library as it is not a builtin
FunctionType = type(ClrType.__dict__["dummy_function"])
for item_name, item in self.__dict__.items():
function = None
is_static = False
if isinstance(item, FunctionType):
function, is_static = item, False
elif isinstance(item, staticmethod):
function, is_static = getattr(self, item_name), True
elif isinstance(item, property):
if item.fget and self.is_typed_method(item.fget):
if item.fget.func_name == item_name:
# The property hides the getter. So yield the getter
yield TypedFunction(item.fget, False, item_name, None)
if item.fset and self.is_typed_method(item.fset):
if item.fset.func_name == item_name:
# The property hides the setter. So yield the setter
yield TypedFunction(item.fset, False, None, item_name)
continue
else:
continue
if self.is_typed_method(function):
yield TypedFunction(function, is_static)
def emit_methods(self, typebld):
# We need to track the generated methods so that we can emit properties
# referring these methods.
# Also, the hash is indexed by name *and signature*. Even though Python does
# not have method overloading, property getter and setter functions can have
# the same func_name attribute
self.emitted_methods = {}
for function_info in self.get_typed_methods():
method_builder = self.emit_method(typebld, function_info)
function = function_info.function
if self.emitted_methods.has_key((function.func_name, function.arg_types)):
raise TypeError("methods with clashing names")
self.emitted_methods[(function.func_name, function.arg_types)] = method_builder
def emit_classattribs(self, typebld):
if hasattr(self, '_clrclassattribs'):
for attrib_info in self._clrclassattribs:
if isinstance(attrib_info, type):
ci = clr.GetClrType(attrib_info).GetConstructor(())
cab = CustomAttributeBuilder(ci, ())
elif isinstance(attrib_info, CustomAttributeDecorator):
cab = attrib_info.GetBuilder()
else:
make_decorator = attrib_info()
cab = make_decorator.GetBuilder()
typebld.SetCustomAttribute(cab)
def get_clr_type_name(self):
if hasattr(self, "_clrnamespace"):
return self._clrnamespace + "." + self.__name__
else:
return self.__name__
def create_type(self, typebld):
self.emit_members(typebld)
new_type = typebld.CreateType()
self.map_members(new_type)
return new_type
class ClrInterface(ClrType):
"""
Set __metaclass__ in a Python class declaration to declare a
CLR interface type.
You need to specify object as the base-type if you do not specify any other
interfaces as the base interfaces
"""
def __init__(self, *args):
return super(ClrInterface, self).__init__(*args)
def emit_method(self, typebld, function_info):
assert(not function_info.is_static)
function = function_info.function
attributes = MethodAttributes.Public | MethodAttributes.Virtual | MethodAttributes.Abstract
method_builder = typebld.DefineMethod(
function.func_name,
attributes,
function.return_type,
function.arg_types)
instance_offset = 0 if function_info.is_static else 1
arg_names = function.func_code.co_varnames
for i in xrange(len(function.arg_types)):
# TODO - set non-trivial ParameterAttributes, default value and custom attributes
p = method_builder.DefineParameter(i + 1, ParameterAttributes.None, arg_names[i + instance_offset])
if hasattr(function, "CustomAttributeBuilders"):
for cab in function.CustomAttributeBuilders:
method_builder.SetCustomAttribute(cab)
return method_builder
def emit_members(self, typebld):
self.emit_methods(typebld)
self.emit_properties(typebld)
self.emit_classattribs(typebld)
def map_members(self, new_type): pass
interface_module_builder = None
@staticmethod
def define_interface(typename, bases):
for b in bases:
validate_clr_types(b)
if not ClrInterface.interface_module_builder:
name = AssemblyName("interfaces")
access = AssemblyBuilderAccess.Run
assembly_builder = ReflectionUtils.DefineDynamicAssembly(name, access)
ClrInterface.interface_module_builder = assembly_builder.DefineDynamicModule("interfaces")
attrs = TypeAttributes.Public | TypeAttributes.Interface | TypeAttributes.Abstract
return ClrInterface.interface_module_builder.DefineType(typename, attrs, None, bases)
def map_clr_type(self, clr_type):
"""
TODO - Currently "t = clr.GetPythonType(clr.GetClrType(C)); t == C" will be False
for C where C.__metaclass__ is ClrInterface, even though both t and C
represent the same CLR type. This can be fixed by publishing a mapping
between t and C in the IronPython runtime.
"""
pass
def __clrtype__(self):
# CFoo below will use ClrInterface as its metaclass, but the user will not expect CFoo
# to be an interface in this case:
#
# class IFoo(object):
# __metaclass__ = ClrInterface
# class CFoo(IFoo): pass
if not "__metaclass__" in self.__dict__:
return super(ClrInterface, self).__clrtype__()
bases = list(self.__bases__)
bases.remove(object)
bases = tuple(bases)
if False: # Snippets currently does not support creating interfaces
typegen = Snippets.Shared.DefineType(self.get_clr_type_name(), bases, True, False)
typebld = typegen.TypeBuilder
else:
typebld = ClrInterface.define_interface(self.get_clr_type_name(), bases)
clr_type = self.create_type(typebld)
self.map_clr_type(clr_type)
return clr_type
# Note that ClrClass inherits from ClrInterface to satisfy Python requirements of metaclasses.
# A metaclass of a subtype has to be subtype of the metaclass of a base type. As a result,
# if you define a type hierarchy as shown below, it requires ClrClass to be a subtype
# of ClrInterface:
#
# class IFoo(object):
# __metaclass__ = ClrInterface
# class CFoo(IFoo):
# __metaclass__ = ClrClass
class ClrClass(ClrInterface):
"""
Set __metaclass__ in a Python class declaration to specify strong-type
information for the class or its attributes. The Python class
retains its Python attributes, like being able to add or remove methods.
"""
# Holds the FieldInfo for a static CLR field which points to a
# Microsoft.Scripting.Runtime.DynamicOperations corresponding to the current ScriptEngine
dynamic_operations_field = None
def emit_fields(self, typebld):
if hasattr(self, "_clrfields"):
for fldname in self._clrfields:
field_type = self._clrfields[fldname]
validate_clr_types(field_type)
typebld.DefineField(
fldname,
clr.GetClrType(field_type),
FieldAttributes.Public)
def map_fields(self, new_type):
if hasattr(self, "_clrfields"):
for fldname in self._clrfields:
fldinfo = new_type.GetField(fldname)
setattr(self, fldname, ReflectedField(fldinfo))
@staticmethod
def get_dynamic_operations_field():
if ClrClass.dynamic_operations_field:
return ClrClass.dynamic_operations_field
python_context = clr.GetCurrentRuntime().GetLanguage(PythonContext)
dynamic_operations = DynamicOperations(python_context)
typegen = Snippets.Shared.DefineType(
"DynamicOperationsHolder" + str(hash(python_context)),
object,
True,
False)
typebld = typegen.TypeBuilder
typebld.DefineField(
"DynamicOperations",
DynamicOperations,
FieldAttributes.Public | FieldAttributes.Static)
new_type = typebld.CreateType()
ClrClass.dynamic_operations_field = new_type.GetField("DynamicOperations")
ClrClass.dynamic_operations_field.SetValue(None, dynamic_operations)
return ClrClass.dynamic_operations_field
def emit_typed_stub_to_python_method(self, typebld, function_info):
        """
        Generate a stub method that repushes all the arguments and
        dispatches to DynamicOperations.InvokeMember
        """
        function = function_info.function
invoke_member = clr.GetClrType(DynamicOperations).GetMethod(
"InvokeMember",
Array[Type]((object, str, Array[object])))
# Type.GetMethod raises an AmbiguousMatchException if there is a generic and a non-generic method
# (like DynamicOperations.GetMember) with the same name and signature. So we have to do things
# the hard way
get_member_search = [m for m in clr.GetClrType(DynamicOperations).GetMethods() if m.Name == "GetMember" and not m.IsGenericMethod and m.GetParameters().Length == 2]
assert(len(get_member_search) == 1)
get_member = get_member_search[0]
set_member_search = [m for m in clr.GetClrType(DynamicOperations).GetMethods() if m.Name == "SetMember" and not m.IsGenericMethod and m.GetParameters().Length == 3]
assert(len(set_member_search) == 1)
set_member = set_member_search[0]
convert_to = clr.GetClrType(DynamicOperations).GetMethod(
"ConvertTo",
Array[Type]((object, Type)))
get_type_from_handle = clr.GetClrType(Type).GetMethod("GetTypeFromHandle")
attributes = MethodAttributes.Public
if function_info.is_static: attributes |= MethodAttributes.Static
if function.func_name == "__new__":
if function_info.is_static: raise TypeError
method_builder = typebld.DefineConstructor(
attributes,
CallingConventions.HasThis,
function.arg_types)
raise NotImplementedError("Need to call self.baseType ctor passing in self.get_python_type_field()")
else:
method_builder = typebld.DefineMethod(
function.func_name,
attributes,
function.return_type,
function.arg_types)
instance_offset = 0 if function_info.is_static else 1
arg_names = function.func_code.co_varnames
for i in xrange(len(function.arg_types)):
# TODO - set non-trivial ParameterAttributes, default value and custom attributes
p = method_builder.DefineParameter(i + 1, ParameterAttributes.None, arg_names[i + instance_offset])
ilgen = method_builder.GetILGenerator()
args_array = ilgen.DeclareLocal(Array[object])
args_count = len(function.arg_types)
ilgen.Emit(OpCodes.Ldc_I4, args_count)
ilgen.Emit(OpCodes.Newarr, object)
ilgen.Emit(OpCodes.Stloc, args_array)
for i in xrange(args_count):
arg_type = function.arg_types[i]
if clr.GetClrType(arg_type).IsByRef:
raise NotImplementedError("byref params not supported")
ilgen.Emit(OpCodes.Ldloc, args_array)
ilgen.Emit(OpCodes.Ldc_I4, i)
ilgen.Emit(OpCodes.Ldarg, i + int(not function_info.is_static))
ilgen.Emit(OpCodes.Box, arg_type)
ilgen.Emit(OpCodes.Stelem_Ref)
has_return_value = True
if function_info.prop_name_if_prop_get:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function_info.prop_name_if_prop_get)
ilgen.Emit(OpCodes.Callvirt, get_member)
elif function_info.prop_name_if_prop_set:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function_info.prop_name_if_prop_set)
ilgen.Emit(OpCodes.Ldarg, 1)
ilgen.Emit(OpCodes.Callvirt, set_member)
has_return_value = False
else:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
if function_info.is_static:
raise NotImplementedError("need to load Python class object from a CLR static field")
# ilgen.Emit(OpCodes.Ldsfld, class_object)
else:
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function.func_name)
ilgen.Emit(OpCodes.Ldloc, args_array)
ilgen.Emit(OpCodes.Callvirt, invoke_member)
if has_return_value:
if function.return_type == Void:
ilgen.Emit(OpCodes.Pop)
else:
ret_val = ilgen.DeclareLocal(object)
ilgen.Emit(OpCodes.Stloc, ret_val)
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldloc, ret_val)
ilgen.Emit(OpCodes.Ldtoken, clr.GetClrType(function.return_type))
ilgen.Emit(OpCodes.Call, get_type_from_handle)
ilgen.Emit(OpCodes.Callvirt, convert_to)
ilgen.Emit(OpCodes.Unbox_Any, function.return_type)
ilgen.Emit(OpCodes.Ret)
return method_builder
def emit_method(self, typebld, function_info):
function = function_info.function
if hasattr(function, "DllImportAttributeDecorator"):
dllImportAttributeDecorator = function.DllImportAttributeDecorator
name = function.func_name
dllName = dllImportAttributeDecorator.args[0]
entryName = function.func_name
attributes = MethodAttributes.Public | MethodAttributes.Static | MethodAttributes.PinvokeImpl
callingConvention = CallingConventions.Standard
returnType = function.return_type
returnTypeRequiredCustomModifiers = ()
returnTypeOptionalCustomModifiers = ()
parameterTypes = function.arg_types
parameterTypeRequiredCustomModifiers = None
parameterTypeOptionalCustomModifiers = None
nativeCallConv = CallingConvention.Winapi
nativeCharSet = CharSet.Auto
method_builder = typebld.DefinePInvokeMethod(
name,
dllName,
entryName,
attributes,
callingConvention,
returnType,
returnTypeRequiredCustomModifiers,
returnTypeOptionalCustomModifiers,
parameterTypes,
parameterTypeRequiredCustomModifiers,
parameterTypeOptionalCustomModifiers,
nativeCallConv,
nativeCharSet)
else:
method_builder = self.emit_typed_stub_to_python_method(typebld, function_info)
if hasattr(function, "CustomAttributeBuilders"):
for cab in function.CustomAttributeBuilders:
method_builder.SetCustomAttribute(cab)
return method_builder
def map_pinvoke_methods(self, new_type):
pythonType = clr.GetPythonType(new_type)
for function_info in self.get_typed_methods():
function = function_info.function
if hasattr(function, "DllImportAttributeDecorator"):
# Overwrite the Python function with the pinvoke_method
pinvoke_method = getattr(pythonType, function.func_name)
setattr(self, function.func_name, pinvoke_method)
def emit_python_type_field(self, typebld):
return typebld.DefineField(
"PythonType",
PythonType,
FieldAttributes.Public | FieldAttributes.Static)
def set_python_type_field(self, new_type):
self.PythonType = new_type.GetField("PythonType")
self.PythonType.SetValue(None, self)
def add_wrapper_ctors(self, baseType, typebld):
python_type_field = self.emit_python_type_field(typebld)
for ctor in baseType.GetConstructors():
ctorparams = ctor.GetParameters()
# leave out the PythonType argument
assert(ctorparams[0].ParameterType == clr.GetClrType(PythonType))
ctorparams = ctorparams[1:]
ctorbld = typebld.DefineConstructor(
ctor.Attributes,
ctor.CallingConvention,
tuple([p.ParameterType for p in ctorparams]))
ilgen = ctorbld.GetILGenerator()
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldsfld, python_type_field)
for index in xrange(len(ctorparams)):
ilgen.Emit(OpCodes.Ldarg, index + 1)
ilgen.Emit(OpCodes.Call, ctor)
ilgen.Emit(OpCodes.Ret)
def emit_members(self, typebld):
self.emit_fields(typebld)
self.add_wrapper_ctors(self.baseType, typebld)
super(ClrClass, self).emit_members(typebld)
def map_members(self, new_type):
self.map_fields(new_type)
self.map_pinvoke_methods(new_type)
self.set_python_type_field(new_type)
super(ClrClass, self).map_members(new_type)
def __clrtype__(self):
# CDerived below will use ClrClass as its metaclass, but the user may not expect CDerived
# to be a typed .NET class in this case:
#
# class CBase(object):
# __metaclass__ = ClrClass
# class CDerived(CBase): pass
if not "__metaclass__" in self.__dict__:
return super(ClrClass, self).__clrtype__()
# Create a simple Python type first.
self.baseType = super(ClrType, self).__clrtype__()
# We will now subtype it to create a customized class with the
# CLR attributes as defined by the user
typegen = Snippets.Shared.DefineType(self.get_clr_type_name(), self.baseType, True, False)
typebld = typegen.TypeBuilder
return self.create_type(typebld)
def make_cab(attrib_type, *args, **kwds):
clrtype = clr.GetClrType(attrib_type)
argtypes = tuple(map(lambda x:clr.GetClrType(type(x)), args))
ci = clrtype.GetConstructor(argtypes)
props = ([],[])
fields = ([],[])
for kwd in kwds:
pi = clrtype.GetProperty(kwd)
if pi is not None:
props[0].append(pi)
props[1].append(kwds[kwd])
else:
fi = clrtype.GetField(kwd)
if fi is not None:
fields[0].append(fi)
fields[1].append(kwds[kwd])
else:
raise TypeError("No %s Member found on %s" % (kwd, clrtype.Name))
return CustomAttributeBuilder(ci, args,
tuple(props[0]), tuple(props[1]),
tuple(fields[0]), tuple(fields[1]))
def accepts(*args):
"""
TODO - needs to be merged with clr.accepts
"""
validate_clr_types(args, True)
def decorator(function):
function.arg_types = args
return function
return decorator
def returns(return_type = Void):
"""
TODO - needs to be merged with clr.returns
"""
if return_type != Void:
validate_clr_types(return_type)
def decorator(function):
function.return_type = return_type
return function
return decorator
class CustomAttributeDecorator(object):
"""
This represents information about a custom-attribute applied to a type or a method
Note that we cannot use an instance of System.Attribute to capture this information
as it is not possible to go from an instance of System.Attribute to an instance
of System.Reflection.Emit.CustomAttributeBuilder as the latter needs to know
how to represent information in metadata to later *recreate* a similar instance of
System.Attribute.
    Also note that once a CustomAttributeBuilder is created, it is not possible to
    query it. Hence, we need to store the arguments required to create the
    CustomAttributeBuilder so that pseudo-custom-attributes can get to the information.
"""
def __init__(self, attrib_type, *args, **kwargs):
self.attrib_type = attrib_type
self.args = args
self.kwargs = kwargs
def __call__(self, function):
if self.attrib_type == DllImportAttribute:
function.DllImportAttributeDecorator = self
else:
if not hasattr(function, "CustomAttributeBuilders"):
function.CustomAttributeBuilders = []
function.CustomAttributeBuilders.append(self.GetBuilder())
return function
def GetBuilder(self):
assert not self.attrib_type in [DllImportAttribute]
return make_cab(self.attrib_type, *self.args, **self.kwargs)
def attribute(attrib_type):
"""
This decorator is used to specify a CustomAttribute for a type or method.
"""
def make_decorator(*args, **kwargs):
return CustomAttributeDecorator(attrib_type, *args, **kwargs)
return make_decorator
def propagate_attributes(old_function, new_function):
"""
Use this if you replace a function in a type with ClrInterface or ClrClass as the metaclass.
This will typically be needed if you are defining a decorator which wraps functions with
new functions, and want it to work in conjunction with clrtype
"""
if hasattr(old_function, "return_type"):
new_function.func_name = old_function.func_name
new_function.return_type = old_function.return_type
new_function.arg_types = old_function.arg_types
if hasattr(old_function, "CustomAttributeBuilders"):
new_function.CustomAttributeBuilders = old_function.CustomAttributeBuilders
if hasattr(old_function, "CustomAttributeBuilders"):
new_function.DllImportAttributeDecorator = old_function.DllImportAttributeDecorator
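A sketch of how these pieces fit together (this runs only under IronPython; the class, namespace, and member names are invented for illustration). The metaclass triggers __clrtype__, @accepts/@returns record CLR signatures, and _clrfields/_clrnamespace feed the emitted type:

import clrtype
from System import Int32, String

class Product(object):                      # hypothetical example class
    __metaclass__ = clrtype.ClrClass
    _clrnamespace = "Demo.Inventory"        # CLR namespace of the emitted type
    _clrfields = {"stock": Int32}           # strongly-typed CLR field

    @clrtype.accepts(String)
    @clrtype.returns(String)
    def describe(self, label):
        return "Product: " + label

p = Product()
p.stock = 5
print p.describe("widget")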
"""A generic class to build line-oriented command interpreters.
Interpreters constructed with this class obey the following conventions:
1. End of file on input is processed as the command 'EOF'.
2. A command is parsed out of each line by collecting the prefix composed
of characters in the identchars member.
3. A command `foo' is dispatched to a method 'do_foo()'; the do_ method
is passed a single argument consisting of the remainder of the line.
4. Typing an empty line repeats the last command. (Actually, it calls the
method `emptyline', which may be overridden in a subclass.)
5. There is a predefined `help' method. Given an argument `topic', it
calls the command `help_topic'. With no arguments, it lists all topics
   with defined help_ functions, broken into up to three categories: documented
   commands, miscellaneous help topics, and undocumented commands.
6. The command '?' is a synonym for `help'. The command '!' is a synonym
for `shell', if a do_shell method exists.
7. If completion is enabled, completing commands will be done automatically,
   and completion of command arguments is done by calling complete_foo() with
   arguments text, line, begidx, endidx. text is the string we are matching
   against; all returned matches must begin with it. line is the current
   input line (lstripped); begidx and endidx are the beginning and end
   indexes of the text being matched, which could be used to provide
   different completion depending upon which position the argument is in.
The `default' method may be overridden to intercept commands for which there
is no do_ method.
The `completedefault' method may be overridden to intercept completions for
commands that have no complete_ method.
The data member `self.ruler' sets the character used to draw separator lines
in the help messages. If empty, no ruler line is drawn. It defaults to "=".
If the value of `self.intro' is nonempty when the cmdloop method is called,
it is printed out on interpreter startup. This value may be overridden
via an optional argument to the cmdloop() method.
The data members `self.doc_header', `self.misc_header', and
`self.undoc_header' set the headers used for the help function's
listings of documented functions, miscellaneous topics, and undocumented
functions respectively.
These interpreters use raw_input; thus, if the readline module is loaded,
they automatically support Emacs-like command history and editing features.
"""
import string
__all__ = ["Cmd"]
PROMPT = '(Cmd) '
IDENTCHARS = string.ascii_letters + string.digits + '_'
class Cmd:
"""A simple framework for writing line-oriented command interpreters.
These are often useful for test harnesses, administrative tools, and
prototypes that will later be wrapped in a more sophisticated interface.
A Cmd instance or subclass instance is a line-oriented interpreter
framework. There is no good reason to instantiate Cmd itself; rather,
it's useful as a superclass of an interpreter class you define yourself
in order to inherit Cmd's methods and encapsulate action methods.
"""
prompt = PROMPT
identchars = IDENTCHARS
ruler = '='
lastcmd = ''
intro = None
doc_leader = ""
doc_header = "Documented commands (type help <topic>):"
misc_header = "Miscellaneous help topics:"
undoc_header = "Undocumented commands:"
nohelp = "*** No help on %s"
use_rawinput = 1
def __init__(self, completekey='tab', stdin=None, stdout=None):
"""Instantiate a line-oriented interpreter framework.
The optional argument 'completekey' is the readline name of a
completion key; it defaults to the Tab key. If completekey is
not None and the readline module is available, command completion
is done automatically. The optional arguments stdin and stdout
specify alternate input and output file objects; if not specified,
sys.stdin and sys.stdout are used.
"""
import sys
if stdin is not None:
self.stdin = stdin
else:
self.stdin = sys.stdin
if stdout is not None:
self.stdout = stdout
else:
self.stdout = sys.stdout
self.cmdqueue = []
self.completekey = completekey
def cmdloop(self, intro=None):
"""Repeatedly issue a prompt, accept input, parse an initial prefix
off the received input, and dispatch to action methods, passing them
the remainder of the line as argument.
"""
self.preloop()
if self.use_rawinput and self.completekey:
try:
import readline
self.old_completer = readline.get_completer()
readline.set_completer(self.complete)
readline.parse_and_bind(self.completekey+": complete")
except ImportError:
pass
try:
if intro is not None:
self.intro = intro
if self.intro:
self.stdout.write(str(self.intro)+"\n")
stop = None
while not stop:
if self.cmdqueue:
line = self.cmdqueue.pop(0)
else:
if self.use_rawinput:
try:
line = raw_input(self.prompt)
except EOFError:
line = 'EOF'
else:
self.stdout.write(self.prompt)
self.stdout.flush()
line = self.stdin.readline()
if not len(line):
line = 'EOF'
else:
line = line.rstrip('\r\n')
line = self.precmd(line)
stop = self.onecmd(line)
stop = self.postcmd(stop, line)
self.postloop()
finally:
if self.use_rawinput and self.completekey:
try:
import readline
readline.set_completer(self.old_completer)
except ImportError:
pass
def precmd(self, line):
"""Hook method executed just before the command line is
interpreted, but after the input prompt is generated and issued.
"""
return line
def postcmd(self, stop, line):
"""Hook method executed just after a command dispatch is finished."""
return stop
def preloop(self):
"""Hook method executed once when the cmdloop() method is called."""
pass
def postloop(self):
"""Hook method executed once when the cmdloop() method is about to
return.
"""
pass
def parseline(self, line):
"""Parse the line into a command name and a string containing
the arguments. Returns a tuple containing (command, args, line).
'command' and 'args' may be None if the line couldn't be parsed.
"""
line = line.strip()
if not line:
return None, None, line
elif line[0] == '?':
line = 'help ' + line[1:]
elif line[0] == '!':
if hasattr(self, 'do_shell'):
line = 'shell ' + line[1:]
else:
return None, None, line
i, n = 0, len(line)
while i < n and line[i] in self.identchars: i = i+1
cmd, arg = line[:i], line[i:].strip()
return cmd, arg, line
def onecmd(self, line):
"""Interpret the argument as though it had been typed in response
to the prompt.
This may be overridden, but should not normally need to be;
see the precmd() and postcmd() methods for useful execution hooks.
The return value is a flag indicating whether interpretation of
commands by the interpreter should stop.
"""
cmd, arg, line = self.parseline(line)
if not line:
return self.emptyline()
if cmd is None:
return self.default(line)
self.lastcmd = line
if line == 'EOF' :
self.lastcmd = ''
if cmd == '':
return self.default(line)
else:
try:
func = getattr(self, 'do_' + cmd)
except AttributeError:
return self.default(line)
return func(arg)
def emptyline(self):
"""Called when an empty line is entered in response to the prompt.
If this method is not overridden, it repeats the last nonempty
command entered.
"""
if self.lastcmd:
return self.onecmd(self.lastcmd)
def default(self, line):
"""Called on an input line when the command prefix is not recognized.
If this method is not overridden, it prints an error message and
returns.
"""
self.stdout.write('*** Unknown syntax: %s\n'%line)
def completedefault(self, *ignored):
"""Method called to complete an input line when no command-specific
complete_*() method is available.
By default, it returns an empty list.
"""
return []
def completenames(self, text, *ignored):
dotext = 'do_'+text
return [a[3:] for a in self.get_names() if a.startswith(dotext)]
def complete(self, text, state):
"""Return the next possible completion for 'text'.
If a command has not been entered, then complete against command list.
Otherwise try to call complete_<command> to get list of completions.
"""
if state == 0:
import readline
origline = readline.get_line_buffer()
line = origline.lstrip()
stripped = len(origline) - len(line)
begidx = readline.get_begidx() - stripped
endidx = readline.get_endidx() - stripped
if begidx>0:
cmd, args, foo = self.parseline(line)
if cmd == '':
compfunc = self.completedefault
else:
try:
compfunc = getattr(self, 'complete_' + cmd)
except AttributeError:
compfunc = self.completedefault
else:
compfunc = self.completenames
self.completion_matches = compfunc(text, line, begidx, endidx)
try:
return self.completion_matches[state]
except IndexError:
return None
def get_names(self):
        # This method used to pull in base class attributes
        # at a time when dir() didn't yet do it.
return dir(self.__class__)
def complete_help(self, *args):
commands = set(self.completenames(*args))
topics = set(a[5:] for a in self.get_names()
if a.startswith('help_' + args[0]))
return list(commands | topics)
def do_help(self, arg):
'List available commands with "help" or detailed help with "help cmd".'
if arg:
# XXX check arg syntax
try:
func = getattr(self, 'help_' + arg)
except AttributeError:
try:
doc=getattr(self, 'do_' + arg).__doc__
if doc:
self.stdout.write("%s\n"%str(doc))
return
except AttributeError:
pass
self.stdout.write("%s\n"%str(self.nohelp % (arg,)))
return
func()
else:
names = self.get_names()
cmds_doc = []
cmds_undoc = []
help = {}
for name in names:
if name[:5] == 'help_':
help[name[5:]]=1
names.sort()
# There can be duplicates if routines overridden
prevname = ''
for name in names:
if name[:3] == 'do_':
if name == prevname:
continue
prevname = name
cmd=name[3:]
if cmd in help:
cmds_doc.append(cmd)
del help[cmd]
elif getattr(self, name).__doc__:
cmds_doc.append(cmd)
else:
cmds_undoc.append(cmd)
self.stdout.write("%s\n"%str(self.doc_leader))
self.print_topics(self.doc_header, cmds_doc, 15,80)
self.print_topics(self.misc_header, help.keys(),15,80)
self.print_topics(self.undoc_header, cmds_undoc, 15,80)
def print_topics(self, header, cmds, cmdlen, maxcol):
if cmds:
self.stdout.write("%s\n"%str(header))
if self.ruler:
self.stdout.write("%s\n"%str(self.ruler * len(header)))
self.columnize(cmds, maxcol-1)
self.stdout.write("\n")
def columnize(self, list, displaywidth=80):
"""Display a list of strings as a compact set of columns.
Each column is only as wide as necessary.
Columns are separated by two spaces (one was not legible enough).
"""
if not list:
self.stdout.write("<empty>\n")
return
nonstrings = [i for i in range(len(list))
if not isinstance(list[i], str)]
if nonstrings:
raise TypeError, ("list[i] not a string for i in %s" %
", ".join(map(str, nonstrings)))
size = len(list)
if size == 1:
self.stdout.write('%s\n'%str(list[0]))
return
# Try every row count from 1 upwards
for nrows in range(1, len(list)):
ncols = (size+nrows-1) // nrows
colwidths = []
totwidth = -2
for col in range(ncols):
colwidth = 0
for row in range(nrows):
i = row + nrows*col
if i >= size:
break
x = list[i]
colwidth = max(colwidth, len(x))
colwidths.append(colwidth)
totwidth += colwidth + 2
if totwidth > displaywidth:
break
if totwidth <= displaywidth:
break
else:
nrows = len(list)
ncols = 1
colwidths = [0]
for row in range(nrows):
texts = []
for col in range(ncols):
i = row + nrows*col
if i >= size:
x = ""
else:
x = list[i]
texts.append(x)
while texts and not texts[-1]:
del texts[-1]
for col in range(len(texts)):
texts[col] = texts[col].ljust(colwidths[col])
self.stdout.write("%s\n"%str(" ".join(texts)))
"""Utilities needed to emulate Python's interactive interpreter.
"""
# Inspired by similar code by Jeff Epler and Fredrik Lundh.
import sys
import traceback
from codeop import CommandCompiler, compile_command
__all__ = ["InteractiveInterpreter", "InteractiveConsole", "interact",
"compile_command"]
def softspace(file, newvalue):
oldvalue = 0
try:
oldvalue = file.softspace
except AttributeError:
pass
try:
file.softspace = newvalue
except (AttributeError, TypeError):
# "attribute-less object" or "read-only attributes"
pass
return oldvalue
class InteractiveInterpreter:
"""Base class for InteractiveConsole.
This class deals with parsing and interpreter state (the user's
namespace); it doesn't deal with input buffering or prompting or
input file naming (the filename is always passed in explicitly).
"""
def __init__(self, locals=None):
"""Constructor.
The optional 'locals' argument specifies the dictionary in
which code will be executed; it defaults to a newly created
dictionary with key "__name__" set to "__console__" and key
"__doc__" set to None.
"""
if locals is None:
locals = {"__name__": "__console__", "__doc__": None}
self.locals = locals
self.compile = CommandCompiler()
def runsource(self, source, filename="<input>", symbol="single"):
"""Compile and run some source in the interpreter.
Arguments are as for compile_command().
        One of several things can happen:
1) The input is incorrect; compile_command() raised an
exception (SyntaxError or OverflowError). A syntax traceback
will be printed by calling the showsyntaxerror() method.
2) The input is incomplete, and more input is required;
compile_command() returned None. Nothing happens.
3) The input is complete; compile_command() returned a code
object. The code is executed by calling self.runcode() (which
also handles run-time exceptions, except for SystemExit).
The return value is True in case 2, False in the other cases (unless
an exception is raised). The return value can be used to
decide whether to use sys.ps1 or sys.ps2 to prompt the next
line.
"""
try:
code = self.compile(source, filename, symbol)
except (OverflowError, SyntaxError, ValueError):
# Case 1
self.showsyntaxerror(filename)
return False
if code is None:
# Case 2
return True
# Case 3
self.runcode(code)
return False
def runcode(self, code):
"""Execute a code object.
When an exception occurs, self.showtraceback() is called to
display a traceback. All exceptions are caught except
SystemExit, which is reraised.
A note about KeyboardInterrupt: this exception may occur
elsewhere in this code, and may not always be caught. The
caller should be prepared to deal with it.
"""
try:
exec code in self.locals
except SystemExit:
raise
except:
self.showtraceback()
else:
if softspace(sys.stdout, 0):
print
def showsyntaxerror(self, filename=None):
"""Display the syntax error that just occurred.
This doesn't display a stack trace because there isn't one.
If a filename is given, it is stuffed in the exception instead
of what was there before (because Python's parser always uses
"<string>" when reading from a string).
The output is written by self.write(), below.
"""
type, value, sys.last_traceback = sys.exc_info()
sys.last_type = type
sys.last_value = value
if filename and type is SyntaxError:
# Work hard to stuff the correct filename in the exception
try:
msg, (dummy_filename, lineno, offset, line) = value
except:
# Not the format we expect; leave it alone
pass
else:
# Stuff in the right filename
value = SyntaxError(msg, (filename, lineno, offset, line))
sys.last_value = value
list = traceback.format_exception_only(type, value)
map(self.write, list)
def showtraceback(self):
"""Display the exception that just occurred.
We remove the first stack item because it is our own code.
The output is written by self.write(), below.
"""
try:
type, value, tb = sys.exc_info()
sys.last_type = type
sys.last_value = value
sys.last_traceback = tb
tblist = traceback.extract_tb(tb)
del tblist[:1]
list = traceback.format_list(tblist)
if list:
list.insert(0, "Traceback (most recent call last):\n")
list[len(list):] = traceback.format_exception_only(type, value)
finally:
tblist = tb = None
map(self.write, list)
def write(self, data):
"""Write a string.
The base implementation writes to sys.stderr; a subclass may
replace this with a different implementation.
"""
sys.stderr.write(data)
class InteractiveConsole(InteractiveInterpreter):
"""Closely emulate the behavior of the interactive Python interpreter.
This class builds on InteractiveInterpreter and adds prompting
using the familiar sys.ps1 and sys.ps2, and input buffering.
"""
def __init__(self, locals=None, filename="<console>"):
"""Constructor.
The optional locals argument will be passed to the
InteractiveInterpreter base class.
The optional filename argument should specify the (file)name
of the input stream; it will show up in tracebacks.
"""
InteractiveInterpreter.__init__(self, locals)
self.filename = filename
self.resetbuffer()
def resetbuffer(self):
"""Reset the input buffer."""
self.buffer = []
def interact(self, banner=None):
"""Closely emulate the interactive Python console.
        The optional banner argument specifies the banner to print
before the first interaction; by default it prints a banner
similar to the one printed by the real Python interpreter,
followed by the current class name in parentheses (so as not
to confuse this with the real interpreter -- since it's so
close!).
"""
try:
sys.ps1
except AttributeError:
sys.ps1 = ">>> "
try:
sys.ps2
except AttributeError:
sys.ps2 = "... "
cprt = 'Type "help", "copyright", "credits" or "license" for more information.'
if banner is None:
self.write("Python %s on %s\n%s\n(%s)\n" %
(sys.version, sys.platform, cprt,
self.__class__.__name__))
else:
self.write("%s\n" % str(banner))
more = 0
while 1:
try:
if more:
prompt = sys.ps2
else:
prompt = sys.ps1
try:
line = self.raw_input(prompt)
# Can be None if sys.stdin was redefined
encoding = getattr(sys.stdin, "encoding", None)
if encoding and not isinstance(line, unicode):
line = line.decode(encoding)
except EOFError:
self.write("\n")
break
else:
more = self.push(line)
except KeyboardInterrupt:
self.write("\nKeyboardInterrupt\n")
self.resetbuffer()
more = 0
def push(self, line):
"""Push a line to the interpreter.
The line should not have a trailing newline; it may have
internal newlines. The line is appended to a buffer and the
interpreter's runsource() method is called with the
concatenated contents of the buffer as source. If this
indicates that the command was executed or invalid, the buffer
is reset; otherwise, the command is incomplete, and the buffer
is left as it was after the line was appended. The return
value is 1 if more input is required, 0 if the line was dealt
with in some way (this is the same as runsource()).
"""
self.buffer.append(line)
source = "\n".join(self.buffer)
more = self.runsource(source, self.filename)
if not more:
self.resetbuffer()
return more
def raw_input(self, prompt=""):
"""Write a prompt and read a line.
The returned line does not include the trailing newline.
When the user enters the EOF key sequence, EOFError is raised.
The base implementation uses the built-in function
raw_input(); a subclass may replace this with a different
implementation.
"""
return raw_input(prompt)
def interact(banner=None, readfunc=None, local=None):
"""Closely emulate the interactive Python interpreter.
This is a backwards compatible interface to the InteractiveConsole
class. When readfunc is not specified, it attempts to import the
readline module to enable GNU readline if it is available.
Arguments (all optional, all default to None):
banner -- passed to InteractiveConsole.interact()
readfunc -- if not None, replaces InteractiveConsole.raw_input()
local -- passed to InteractiveInterpreter.__init__()
"""
console = InteractiveConsole(local)
if readfunc is not None:
console.raw_input = readfunc
else:
try:
import readline
except ImportError:
pass
console.interact(banner)
if __name__ == "__main__":
interact()
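A sketch of embedding this emulated interpreter in a host program (the preloaded namespace is an invented example). interact() blocks reading from stdin; push() gives finer control and returns a true value while the statement is still incomplete:

import code

namespace = {"answer": 42}              # hypothetical preloaded names
console = code.InteractiveConsole(namespace)
console.push("print answer * 2")        # complete statement: prints 84

# Or hand the whole loop over to the console:
# code.interact(banner="debug console (Ctrl-D to exit)", local=namespace)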
""" codecs -- Python Codec Registry, API and helpers.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import __builtin__, sys
### Registry and builtin stateless codec functions
try:
from _codecs import *
except ImportError, why:
raise SystemError('Failed to load the builtin codecs: %s' % why)
__all__ = ["register", "lookup", "open", "EncodedFile", "BOM", "BOM_BE",
"BOM_LE", "BOM32_BE", "BOM32_LE", "BOM64_BE", "BOM64_LE",
"BOM_UTF8", "BOM_UTF16", "BOM_UTF16_LE", "BOM_UTF16_BE",
"BOM_UTF32", "BOM_UTF32_LE", "BOM_UTF32_BE",
"CodecInfo", "Codec", "IncrementalEncoder", "IncrementalDecoder",
"StreamReader", "StreamWriter",
"StreamReaderWriter", "StreamRecoder",
"getencoder", "getdecoder", "getincrementalencoder",
"getincrementaldecoder", "getreader", "getwriter",
"encode", "decode", "iterencode", "iterdecode",
"strict_errors", "ignore_errors", "replace_errors",
"xmlcharrefreplace_errors", "backslashreplace_errors",
"register_error", "lookup_error"]
### Constants
#
# Byte Order Mark (BOM = ZERO WIDTH NO-BREAK SPACE = U+FEFF)
# and its possible byte string values
# for UTF8/UTF16/UTF32 output and little/big endian machines
#
# UTF-8
BOM_UTF8 = '\xef\xbb\xbf'
# UTF-16, little endian
BOM_LE = BOM_UTF16_LE = '\xff\xfe'
# UTF-16, big endian
BOM_BE = BOM_UTF16_BE = '\xfe\xff'
# UTF-32, little endian
BOM_UTF32_LE = '\xff\xfe\x00\x00'
# UTF-32, big endian
BOM_UTF32_BE = '\x00\x00\xfe\xff'
if sys.byteorder == 'little':
# UTF-16, native endianness
BOM = BOM_UTF16 = BOM_UTF16_LE
# UTF-32, native endianness
BOM_UTF32 = BOM_UTF32_LE
else:
# UTF-16, native endianness
BOM = BOM_UTF16 = BOM_UTF16_BE
# UTF-32, native endianness
BOM_UTF32 = BOM_UTF32_BE
# Old broken names (don't use in new code)
BOM32_LE = BOM_UTF16_LE
BOM32_BE = BOM_UTF16_BE
BOM64_LE = BOM_UTF32_LE
BOM64_BE = BOM_UTF32_BE
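A sketch of sniffing an encoding from these constants (the helper name is invented). The 4-byte UTF-32 BOMs must be tested before the UTF-16 ones, because BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE:

import codecs

def sniff_bom(data):                    # hypothetical helper
    for bom, name in ((codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF32_LE, "utf-32-le"),
                      (codecs.BOM_UTF32_BE, "utf-32-be"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if data.startswith(bom):
            return name
    return None

print sniff_bom('\xff\xfe\x00\x00ab')   # utf-32-le, not utf-16-le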
### Codec base classes (defining the API)
class CodecInfo(tuple):
"""Codec details when looking up the codec registry"""
# Private API to allow Python to blacklist the known non-Unicode
# codecs in the standard library. A more general mechanism to
# reliably distinguish test encodings from other codecs will hopefully
# be defined for Python 3.5
#
# See http://bugs.python.org/issue19619
_is_text_encoding = True # Assume codecs are text encodings by default
def __new__(cls, encode, decode, streamreader=None, streamwriter=None,
incrementalencoder=None, incrementaldecoder=None, name=None,
_is_text_encoding=None):
self = tuple.__new__(cls, (encode, decode, streamreader, streamwriter))
self.name = name
self.encode = encode
self.decode = decode
self.incrementalencoder = incrementalencoder
self.incrementaldecoder = incrementaldecoder
self.streamwriter = streamwriter
self.streamreader = streamreader
if _is_text_encoding is not None:
self._is_text_encoding = _is_text_encoding
return self
def __repr__(self):
return "<%s.%s object for encoding %s at 0x%x>" % (self.__class__.__module__, self.__class__.__name__, self.name, id(self))
class Codec:
""" Defines the interface for stateless encoders/decoders.
The .encode()/.decode() methods may use different error
handling schemes by providing the errors argument. These
string values are predefined:
'strict' - raise a ValueError error (or a subclass)
'ignore' - ignore the character and continue with the next
'replace' - replace with a suitable replacement character;
Python will use the official U+FFFD REPLACEMENT
CHARACTER for the builtin Unicode codecs on
decoding and '?' on encoding.
'xmlcharrefreplace' - Replace with the appropriate XML
character reference (only for encoding).
'backslashreplace' - Replace with backslashed escape sequences
(only for encoding).
The set of allowed values can be extended via register_error.
"""
def encode(self, input, errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling.
The method may not store state in the Codec instance. Use
StreamWriter for codecs which have to keep state in order to
make encoding efficient.
The encoder must be able to handle zero length input and
return an empty object of the output object type in this
situation.
"""
raise NotImplementedError
def decode(self, input, errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling.
The method may not store state in the Codec instance. Use
StreamReader for codecs which have to keep state in order to
make decoding efficient.
The decoder must be able to handle zero length input and
return an empty object of the output object type in this
situation.
"""
raise NotImplementedError
class IncrementalEncoder(object):
"""
An IncrementalEncoder encodes an input in multiple steps. The input can be
passed piece by piece to the encode() method. The IncrementalEncoder remembers
the state of the Encoding process between calls to encode().
"""
def __init__(self, errors='strict'):
"""
Creates an IncrementalEncoder instance.
The IncrementalEncoder may use different error handling schemes by
providing the errors keyword argument. See the module docstring
for a list of possible values.
"""
self.errors = errors
self.buffer = ""
def encode(self, input, final=False):
"""
Encodes input and returns the resulting object.
"""
raise NotImplementedError
def reset(self):
"""
Resets the encoder to the initial state.
"""
def getstate(self):
"""
Return the current state of the encoder.
"""
return 0
def setstate(self, state):
"""
Set the current state of the encoder. state must have been
returned by getstate().
"""
class BufferedIncrementalEncoder(IncrementalEncoder):
"""
This subclass of IncrementalEncoder can be used as the baseclass for an
incremental encoder if the encoder must keep some of the output in a
buffer between calls to encode().
"""
def __init__(self, errors='strict'):
IncrementalEncoder.__init__(self, errors)
self.buffer = "" # unencoded input that is kept between calls to encode()
def _buffer_encode(self, input, errors, final):
# Overwrite this method in subclasses: It must encode input
# and return an (output, length consumed) tuple
raise NotImplementedError
def encode(self, input, final=False):
# encode input (taking the buffer into account)
data = self.buffer + input
(result, consumed) = self._buffer_encode(data, self.errors, final)
# keep unencoded input until the next call
self.buffer = data[consumed:]
return result
def reset(self):
IncrementalEncoder.reset(self)
self.buffer = ""
def getstate(self):
return self.buffer or 0
def setstate(self, state):
self.buffer = state or ""
class IncrementalDecoder(object):
"""
An IncrementalDecoder decodes an input in multiple steps. The input can be
passed piece by piece to the decode() method. The IncrementalDecoder
remembers the state of the decoding process between calls to decode().
"""
def __init__(self, errors='strict'):
"""
Creates an IncrementalDecoder instance.
The IncrementalDecoder may use different error handling schemes by
providing the errors keyword argument. See the module docstring
for a list of possible values.
"""
self.errors = errors
def decode(self, input, final=False):
"""
Decodes input and returns the resulting object.
"""
raise NotImplementedError
def reset(self):
"""
Resets the decoder to the initial state.
"""
def getstate(self):
"""
Return the current state of the decoder.
This must be a (buffered_input, additional_state_info) tuple.
buffered_input must be a bytes object containing bytes that
were passed to decode() that have not yet been converted.
additional_state_info must be a non-negative integer
representing the state of the decoder WITHOUT yet having
processed the contents of buffered_input. In the initial state
and after reset(), getstate() must return (b"", 0).
"""
return (b"", 0)
def setstate(self, state):
"""
Set the current state of the decoder.
state must have been returned by getstate(). The effect of
setstate((b"", 0)) must be equivalent to reset().
"""
class BufferedIncrementalDecoder(IncrementalDecoder):
"""
This subclass of IncrementalDecoder can be used as the baseclass for an
incremental decoder if the decoder must be able to handle incomplete byte
sequences.
"""
def __init__(self, errors='strict'):
IncrementalDecoder.__init__(self, errors)
self.buffer = "" # undecoded input that is kept between calls to decode()
def _buffer_decode(self, input, errors, final):
# Overwrite this method in subclasses: It must decode input
# and return an (output, length consumed) tuple
raise NotImplementedError
def decode(self, input, final=False):
# decode input (taking the buffer into account)
data = self.buffer + input
(result, consumed) = self._buffer_decode(data, self.errors, final)
# keep undecoded input until the next call
self.buffer = data[consumed:]
return result
def reset(self):
IncrementalDecoder.reset(self)
self.buffer = ""
def getstate(self):
# additional state info is always 0
return (self.buffer, 0)
def setstate(self, state):
# ignore additional state info
self.buffer = state[0]
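A sketch of why this buffering matters: the registry's getincrementaldecoder() returns such a class, and bytes split in the middle of a multi-byte character are held over between calls (the two bytes '\xc3\xa9' are the UTF-8 encoding of u'\xe9'):

import codecs

dec = codecs.getincrementaldecoder("utf-8")()
print repr(dec.decode("abc\xc3"))               # u'abc' -- '\xc3' is buffered
print repr(dec.decode("\xa9def", final=True))   # u'\xe9def'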
#
# The StreamWriter and StreamReader class provide generic working
# interfaces which can be used to implement new encoding submodules
# very easily. See encodings/utf_8.py for an example on how this is
# done.
#
class StreamWriter(Codec):
def __init__(self, stream, errors='strict'):
""" Creates a StreamWriter instance.
stream must be a file-like object open for writing
(binary) data.
The StreamWriter may use different error handling
schemes by providing the errors keyword argument. These
parameters are predefined:
'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace'- replace with a suitable replacement character
'xmlcharrefreplace' - Replace with the appropriate XML
character reference.
'backslashreplace' - Replace with backslashed escape
sequences (only for encoding).
The set of allowed parameter values can be extended via
register_error.
"""
self.stream = stream
self.errors = errors
def write(self, object):
""" Writes the object's contents encoded to self.stream.
"""
data, consumed = self.encode(object, self.errors)
self.stream.write(data)
def writelines(self, list):
""" Writes the concatenated list of strings to the stream
using .write().
"""
self.write(''.join(list))
def reset(self):
""" Flushes and resets the codec buffers used for keeping state.
Calling this method should ensure that the data on the
output is put into a clean state, that allows appending
of new fresh data without having to rescan the whole
stream to recover state.
"""
pass
def seek(self, offset, whence=0):
self.stream.seek(offset, whence)
if whence == 0 and offset == 0:
self.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
###
class StreamReader(Codec):
def __init__(self, stream, errors='strict'):
""" Creates a StreamReader instance.
stream must be a file-like object open for reading
(binary) data.
The StreamReader may use different error handling
schemes by providing the errors keyword argument. These
parameters are predefined:
'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace'- replace with a suitable replacement character;
The set of allowed parameter values can be extended via
register_error.
"""
self.stream = stream
self.errors = errors
self.bytebuffer = ""
# For str->str decoding this will stay a str
# For str->unicode decoding the first read will promote it to unicode
self.charbuffer = ""
self.linebuffer = None
def decode(self, input, errors='strict'):
raise NotImplementedError
def read(self, size=-1, chars=-1, firstline=False):
""" Decodes data from the stream self.stream and returns the
resulting object.
chars indicates the number of characters to read from the
stream. read() will never return more than chars
characters, but it might return less, if there are not enough
characters available.
size indicates the approximate maximum number of bytes to
read from the stream for decoding purposes. The decoder
can modify this setting as appropriate. The default value
-1 indicates to read and decode as much as possible. size
is intended to prevent having to decode huge files in one
step.
            If firstline is true, and a UnicodeDecodeError happens
            after the first line terminator in the input, only the first line
            will be returned; the rest of the input will be kept until the
            next call to read().
The method should use a greedy read strategy meaning that
it should read as much data as is allowed within the
definition of the encoding and the given size, e.g. if
optional encoding endings or state markers are available
on the stream, these should be read too.
"""
# If we have lines cached, first merge them back into characters
if self.linebuffer:
self.charbuffer = "".join(self.linebuffer)
self.linebuffer = None
if chars < 0:
# For compatibility with other read() methods that take a
# single argument
chars = size
# read until we get the required number of characters (if available)
while True:
# can the request be satisfied from the character buffer?
if chars >= 0:
if len(self.charbuffer) >= chars:
break
# we need more data
if size < 0:
newdata = self.stream.read()
else:
newdata = self.stream.read(size)
# decode bytes (those remaining from the last call included)
data = self.bytebuffer + newdata
try:
newchars, decodedbytes = self.decode(data, self.errors)
except UnicodeDecodeError, exc:
if firstline:
newchars, decodedbytes = self.decode(data[:exc.start], self.errors)
lines = newchars.splitlines(True)
if len(lines) <= 1:
raise
else:
raise
# keep undecoded bytes until the next call
self.bytebuffer = data[decodedbytes:]
# put new characters in the character buffer
self.charbuffer += newchars
# there was no data available
if not newdata:
break
if chars < 0:
# Return everything we've got
result = self.charbuffer
self.charbuffer = ""
else:
# Return the first chars characters
result = self.charbuffer[:chars]
self.charbuffer = self.charbuffer[chars:]
return result
def readline(self, size=None, keepends=True):
""" Read one line from the input stream and return the
decoded data.
size, if given, is passed as size argument to the
read() method.
"""
# If we have lines cached from an earlier read, return
# them unconditionally
if self.linebuffer:
line = self.linebuffer[0]
del self.linebuffer[0]
if len(self.linebuffer) == 1:
# revert to charbuffer mode; we might need more data
# next time
self.charbuffer = self.linebuffer[0]
self.linebuffer = None
if not keepends:
line = line.splitlines(False)[0]
return line
readsize = size or 72
line = ""
# If size is given, we call read() only once
while True:
data = self.read(readsize, firstline=True)
if data:
# If we're at a "\r" read one extra character (which might
# be a "\n") to get a proper line ending. If the stream is
# temporarily exhausted we return the wrong line ending.
if data.endswith("\r"):
data += self.read(size=1, chars=1)
line += data
lines = line.splitlines(True)
if lines:
if len(lines) > 1:
# More than one line result; the first line is a full line
# to return
line = lines[0]
del lines[0]
if len(lines) > 1:
# cache the remaining lines
lines[-1] += self.charbuffer
self.linebuffer = lines
self.charbuffer = None
else:
# only one remaining line, put it back into charbuffer
self.charbuffer = lines[0] + self.charbuffer
if not keepends:
line = line.splitlines(False)[0]
break
line0withend = lines[0]
line0withoutend = lines[0].splitlines(False)[0]
if line0withend != line0withoutend: # We really have a line end
# Put the rest back together and keep it until the next call
self.charbuffer = "".join(lines[1:]) + self.charbuffer
if keepends:
line = line0withend
else:
line = line0withoutend
break
# we didn't get anything or this was our only try
if not data or size is not None:
if line and not keepends:
line = line.splitlines(False)[0]
break
if readsize < 8000:
readsize *= 2
return line
def readlines(self, sizehint=None, keepends=True):
""" Read all lines available on the input stream
and return them as list of lines.
Line breaks are implemented using the codec's decoder
method and are included in the list entries.
sizehint, if given, is ignored since there is no efficient
way of finding the true end-of-line.
"""
data = self.read()
return data.splitlines(keepends)
def reset(self):
""" Resets the codec buffers used for keeping state.
Note that no stream repositioning should take place.
This method is primarily intended to be able to recover
from decoding errors.
"""
self.bytebuffer = ""
self.charbuffer = u""
self.linebuffer = None
def seek(self, offset, whence=0):
""" Set the input stream's current position.
Resets the codec buffers used for keeping state.
"""
self.stream.seek(offset, whence)
self.reset()
def next(self):
""" Return the next decoded line from the input stream."""
line = self.readline()
if line:
return line
raise StopIteration
def __iter__(self):
return self
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
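# --- Illustrative sketch (not part of the original module) ---------------
# Companion to the writer demo above: wrap a byte stream holding utf-8
# data with the codec's StreamReader (via getreader(), defined later in
# this module) and read decoded unicode lines back out.
def _demo_stream_reader():
    import StringIO
    raw = StringIO.StringIO('caf\xc3\xa9\nbar\n')    # utf-8 encoded bytes
    reader = getreader('utf-8')(raw)
    return reader.readline(), reader.readline()      # (u'caf\xe9\n', u'bar\n')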
###
class StreamReaderWriter:
""" StreamReaderWriter instances allow wrapping streams which
work in both read and write modes.
The design is such that one can use the factory functions
returned by the codec.lookup() function to construct the
instance.
"""
# Optional attributes set by the file wrappers below
encoding = 'unknown'
def __init__(self, stream, Reader, Writer, errors='strict'):
""" Creates a StreamReaderWriter instance.
stream must be a Stream-like object.
Reader, Writer must be factory functions or classes
providing the StreamReader and StreamWriter interfaces, respectively.
Error handling is done in the same way as defined for the
StreamWriter/Readers.
"""
self.stream = stream
self.reader = Reader(stream, errors)
self.writer = Writer(stream, errors)
self.errors = errors
def read(self, size=-1):
return self.reader.read(size)
def readline(self, size=None):
return self.reader.readline(size)
def readlines(self, sizehint=None):
return self.reader.readlines(sizehint)
def next(self):
""" Return the next decoded line from the input stream."""
return self.reader.next()
def __iter__(self):
return self
def write(self, data):
return self.writer.write(data)
def writelines(self, list):
return self.writer.writelines(list)
def reset(self):
self.reader.reset()
self.writer.reset()
def seek(self, offset, whence=0):
self.stream.seek(offset, whence)
self.reader.reset()
if whence == 0 and offset == 0:
self.writer.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
# these are needed to make "with codecs.open(...)" work properly
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
###
class StreamRecoder:
""" StreamRecoder instances provide a frontend - backend
view of encoding data.
They use the complete set of APIs returned by the
codecs.lookup() function to implement their task.
Data written to the stream is first decoded into an
intermediate format (which is dependent on the given codec
combination) and then written to the stream using an instance
of the provided Writer class.
In the other direction, data is read from the stream using a
Reader instance and then return encoded data to the caller.
"""
# Optional attributes set by the file wrappers below
data_encoding = 'unknown'
file_encoding = 'unknown'
def __init__(self, stream, encode, decode, Reader, Writer,
errors='strict'):
""" Creates a StreamRecoder instance which implements a two-way
conversion: encode and decode work on the frontend (the
input to .read() and output of .write()) while
Reader and Writer work on the backend (reading and
writing to the stream).
You can use these objects to do transparent direct
recodings from e.g. latin-1 to utf-8 and back.
stream must be a file-like object.
encode, decode must adhere to the Codec interface, Reader,
Writer must be factory functions or classes providing the
StreamReader and StreamWriter interfaces, respectively.
encode and decode are needed for the frontend translation,
Reader and Writer for the backend translation. Unicode is
used as intermediate encoding.
Error handling is done in the same way as defined for the
StreamWriter/Readers.
"""
self.stream = stream
self.encode = encode
self.decode = decode
self.reader = Reader(stream, errors)
self.writer = Writer(stream, errors)
self.errors = errors
def read(self, size=-1):
data = self.reader.read(size)
data, bytesencoded = self.encode(data, self.errors)
return data
def readline(self, size=None):
if size is None:
data = self.reader.readline()
else:
data = self.reader.readline(size)
data, bytesencoded = self.encode(data, self.errors)
return data
def readlines(self, sizehint=None):
data = self.reader.read()
data, bytesencoded = self.encode(data, self.errors)
return data.splitlines(1)
def next(self):
""" Return the next decoded line from the input stream."""
data = self.reader.next()
data, bytesencoded = self.encode(data, self.errors)
return data
def __iter__(self):
return self
def write(self, data):
data, bytesdecoded = self.decode(data, self.errors)
return self.writer.write(data)
def writelines(self, list):
data = ''.join(list)
data, bytesdecoded = self.decode(data, self.errors)
return self.writer.write(data)
def reset(self):
self.reader.reset()
self.writer.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
### Shortcuts
def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
""" Open an encoded file using the given mode and return
a wrapped version providing transparent encoding/decoding.
Note: The wrapped version will only accept the object format
defined by the codecs, i.e. Unicode objects for most builtin
codecs. Output is also codec dependent and will usually be
Unicode as well.
Files are always opened in binary mode, even if no binary mode
was specified. This is done to avoid data loss due to encodings
using 8-bit values. The default file mode is 'rb' meaning to
open the file in binary read mode.
encoding specifies the encoding which is to be used for the
file.
errors may be given to define the error handling. It defaults
to 'strict' which causes ValueErrors to be raised in case an
encoding error occurs.
buffering has the same meaning as for the builtin open() API.
It defaults to line buffered.
The returned wrapped file object provides an extra attribute
.encoding which allows querying the used encoding. This
attribute is only available if an encoding was specified as
a parameter.
"""
if encoding is not None:
if 'U' in mode:
# No automatic conversion of '\n' is done on reading and writing
mode = mode.strip().replace('U', '')
if mode[:1] not in set('rwa'):
mode = 'r' + mode
if 'b' not in mode:
# Force opening of the file in binary mode
mode = mode + 'b'
file = __builtin__.open(filename, mode, buffering)
if encoding is None:
return file
info = lookup(encoding)
srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
# Add attributes to simplify introspection
srw.encoding = encoding
return srw
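# --- Illustrative sketch (not part of the original module) ---------------
# Typical round trip through the open() shortcut above: unicode goes in,
# utf-8 bytes land on disk, and reading returns unicode again.
# 'example.txt' is a hypothetical file name used only for this demo.
def _demo_codecs_open(path='example.txt'):
    f = open(path, 'w', encoding='utf-8')
    try:
        f.write(u'\u20ac 99')        # unicode in, utf-8 bytes on disk
    finally:
        f.close()
    f = open(path, 'r', encoding='utf-8')
    try:
        return f.read()              # u'\u20ac 99'
    finally:
        f.close()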
def EncodedFile(file, data_encoding, file_encoding=None, errors='strict'):
""" Return a wrapped version of file which provides transparent
encoding translation.
Strings written to the wrapped file are interpreted according
to the given data_encoding and then written to the original
file as string using file_encoding. The intermediate encoding
will usually be Unicode but depends on the specified codecs.
Strings are read from the file using file_encoding and then
passed back to the caller as string using data_encoding.
If file_encoding is not given, it defaults to data_encoding.
errors may be given to define the error handling. It defaults
to 'strict' which causes ValueErrors to be raised in case an
encoding error occurs.
The returned wrapped file object provides two extra attributes
.data_encoding and .file_encoding which reflect the given
parameters of the same name. The attributes can be used for
introspection by Python programs.
"""
if file_encoding is None:
file_encoding = data_encoding
data_info = lookup(data_encoding)
file_info = lookup(file_encoding)
sr = StreamRecoder(file, data_info.encode, data_info.decode,
file_info.streamreader, file_info.streamwriter, errors)
# Add attributes to simplify introspection
sr.data_encoding = data_encoding
sr.file_encoding = file_encoding
return sr
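# --- Illustrative sketch (not part of the original module) ---------------
# EncodedFile() recoding in action: strings handed to the frontend are
# interpreted as latin-1, while the bytes that reach the backing stream
# are utf-8.
def _demo_encoded_file():
    import StringIO
    backing = StringIO.StringIO()
    ef = EncodedFile(backing, 'latin-1', 'utf-8')
    ef.write('caf\xe9')              # latin-1 e-acute ...
    return backing.getvalue()        # ... stored as 'caf\xc3\xa9'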
### Helpers for codec lookup
def getencoder(encoding):
""" Lookup up the codec for the given encoding and return
its encoder function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).encode
def getdecoder(encoding):
""" Lookup up the codec for the given encoding and return
its decoder function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).decode
def getincrementalencoder(encoding):
""" Lookup up the codec for the given encoding and return
its IncrementalEncoder class or factory function.
Raises a LookupError in case the encoding cannot be found
or the codec doesn't provide an incremental encoder.
"""
encoder = lookup(encoding).incrementalencoder
if encoder is None:
raise LookupError(encoding)
return encoder
def getincrementaldecoder(encoding):
""" Lookup up the codec for the given encoding and return
its IncrementalDecoder class or factory function.
Raises a LookupError in case the encoding cannot be found
or the codec doesn't provide an incremental decoder.
"""
decoder = lookup(encoding).incrementaldecoder
if decoder is None:
raise LookupError(encoding)
return decoder
def getreader(encoding):
""" Lookup up the codec for the given encoding and return
its StreamReader class or factory function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).streamreader
def getwriter(encoding):
""" Lookup up the codec for the given encoding and return
its StreamWriter class or factory function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).streamwriter
def iterencode(iterator, encoding, errors='strict', **kwargs):
"""
Encoding iterator.
Encodes the input strings from the iterator using an IncrementalEncoder.
errors and kwargs are passed through to the IncrementalEncoder
constructor.
"""
encoder = getincrementalencoder(encoding)(errors, **kwargs)
for input in iterator:
output = encoder.encode(input)
if output:
yield output
output = encoder.encode("", True)
if output:
yield output
def iterdecode(iterator, encoding, errors='strict', **kwargs):
"""
Decoding iterator.
Decodes the input strings from the iterator using an IncrementalDecoder.
errors and kwargs are passed through to the IncrementalDecoder
constructor.
"""
decoder = getincrementaldecoder(encoding)(errors, **kwargs)
for input in iterator:
output = decoder.decode(input)
if output:
yield output
output = decoder.decode("", True)
if output:
yield output
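# --- Illustrative sketch (not part of the original module) ---------------
# iterencode()/iterdecode() round trip over chunked input; the incremental
# codecs keep state between chunks, so multi-byte sequences may span them.
def _demo_iter_roundtrip():
    chunks = [u'abc', u'\u20ac', u'def']
    encoded = list(iterencode(iter(chunks), 'utf-8'))
    decoded = u''.join(iterdecode(iter(encoded), 'utf-8'))
    return decoded == u''.join(chunks)   # True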
### Helpers for charmap-based codecs
def make_identity_dict(rng):
""" make_identity_dict(rng) -> dict
Return a dictionary where elements of the rng sequence are
mapped to themselves.
"""
res = {}
for i in rng:
res[i] = i
return res
def make_encoding_map(decoding_map):
""" Creates an encoding map from a decoding map.
If a target mapping in the decoding map occurs multiple
times, then that target is mapped to None (undefined mapping),
causing an exception when encountered by the charmap codec
during translation.
One example where this happens is cp875.py which decodes
multiple characters to \\u001a.
"""
m = {}
for k, v in decoding_map.items():
if v not in m:
m[v] = k
else:
m[v] = None
return m
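# --- Illustrative sketch (not part of the original module) ---------------
# How the two charmap helpers above combine: build a decoding map that is
# mostly an identity over printable ASCII, add one custom slot (0x80 is a
# hypothetical choice for this demo), and invert it into an encoding map.
def _demo_charmap_helpers():
    decoding_map = make_identity_dict(range(32, 127))
    decoding_map[0x80] = u'\u20ac'       # hypothetical extra mapping
    encoding_map = make_encoding_map(decoding_map)
    return encoding_map[u'\u20ac']       # 0x80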
### error handlers
try:
strict_errors = lookup_error("strict")
ignore_errors = lookup_error("ignore")
replace_errors = lookup_error("replace")
xmlcharrefreplace_errors = lookup_error("xmlcharrefreplace")
backslashreplace_errors = lookup_error("backslashreplace")
except LookupError:
# In --disable-unicode builds, these error handlers are missing
strict_errors = None
ignore_errors = None
replace_errors = None
xmlcharrefreplace_errors = None
backslashreplace_errors = None
# Tell modulefinder that using codecs probably needs the encodings
# package
_false = 0
if _false:
import encodings
### Tests
if __name__ == '__main__':
# Make stdout translate Latin-1 output into UTF-8 output
sys.stdout = EncodedFile(sys.stdout, 'latin-1', 'utf-8')
# Have stdin translate Latin-1 input into UTF-8 input
sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')
r"""Utilities to compile possibly incomplete Python source code.
This module provides two interfaces, broadly similar to the builtin
function compile(), which take program text, a filename and a 'mode'
and:
- Return code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
Approach:
First, check if the source consists entirely of blank lines and
comments; if so, replace it with 'pass', because the built-in
parser doesn't always do the right thing for these.
Compile three times: as is, with \n, and with \n\n appended. If it
compiles as is, it's complete. If it compiles with one \n appended,
we expect more. If it doesn't compile either way, we compare the
error we get when compiling with \n or \n\n appended. If the errors
are the same, the code is broken. But if the errors are different, we
expect more. Not intuitive; not even guaranteed to hold in future
releases; but this matches the compiler's behavior from Python 1.4
through 2.2, at least.
Caveat:
It is possible (but not likely) that the parser stops parsing with a
successful outcome before reaching the end of the source; in this
case, trailing symbols may be ignored instead of causing an error.
For example, a backslash followed by two newlines may be followed by
arbitrary garbage. This will be fixed once the API for the parser is
better.
The two interfaces are:
compile_command(source, filename, symbol):
Compiles a single command in the manner described above.
CommandCompiler():
Instances of this class have __call__ methods identical in
signature to compile_command; the difference is that if the
instance compiles program text containing a __future__ statement,
the instance 'remembers' and compiles all subsequent program texts
with the statement in force.
The module also provides another class:
Compile():
Instances of this class act like the built-in function compile,
but with 'memory' in the sense described above.
"""
import __future__
_features = [getattr(__future__, fname)
for fname in __future__.all_feature_names]
__all__ = ["compile_command", "Compile", "CommandCompiler"]
PyCF_DONT_IMPLY_DEDENT = 0x200 # Matches pythonrun.h
def _maybe_compile(compiler, source, filename, symbol):
# Check for source consisting of only blank lines and comments
for line in source.split("\n"):
line = line.strip()
if line and line[0] != '#':
break # Leave it alone
else:
if symbol != "eval":
source = "pass" # Replace it with a 'pass' statement
err = err1 = err2 = None
code = code1 = code2 = None
try:
code = compiler(source, filename, symbol)
except SyntaxError, err:
pass
try:
code1 = compiler(source + "\n", filename, symbol)
except SyntaxError, err1:
pass
try:
code2 = compiler(source + "\n\n", filename, symbol)
except SyntaxError, err2:
pass
if code:
return code
if not code1 and repr(err1) == repr(err2):
raise SyntaxError, err1
def _compile(source, filename, symbol):
return compile(source, filename, symbol, PyCF_DONT_IMPLY_DEDENT)
def compile_command(source, filename="<input>", symbol="single"):
r"""Compile a command and determine whether it is incomplete.
Arguments:
source -- the source string; may contain \n characters
filename -- optional filename from which source was read; default
"<input>"
symbol -- optional grammar start symbol; "single" (default) or "eval"
Return value / exceptions raised:
- Return a code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
"""
return _maybe_compile(_compile, source, filename, symbol)
class Compile:
"""Instances of this class behave much like the built-in compile
function, but if one is used to compile text containing a future
statement, it "remembers" and compiles all subsequent program texts
with the statement in force."""
def __init__(self):
self.flags = PyCF_DONT_IMPLY_DEDENT
def __call__(self, source, filename, symbol):
codeob = compile(source, filename, symbol, self.flags, 1)
for feature in _features:
if codeob.co_flags & feature.compiler_flag:
self.flags |= feature.compiler_flag
return codeob
class CommandCompiler:
"""Instances of this class have __call__ methods identical in
signature to compile_command; the difference is that if the
instance compiles program text containing a __future__ statement,
the instance 'remembers' and compiles all subsequent program texts
with the statement in force."""
def __init__(self,):
self.compiler = Compile()
def __call__(self, source, filename="<input>", symbol="single"):
r"""Compile a command and determine whether it is incomplete.
Arguments:
source -- the source string; may contain \n characters
filename -- optional filename from which source was read;
default "<input>"
symbol -- optional grammar start symbol; "single" (default) or
"eval"
Return value / exceptions raised:
- Return a code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
"""
return _maybe_compile(self.compiler, source, filename, symbol)
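# --- Illustrative sketch (not part of the original module) ---------------
# How a read-eval loop would use compile_command() above: None means
# "keep accumulating input", a code object means the statement is complete.
def _demo_compile_command():
    assert compile_command('if 1:') is None        # incomplete: wait for more
    code = compile_command('if 1:\n    x = 1\n\n')
    assert code is not None                        # complete: ready to run
    ns = {}
    exec code in ns
    return ns['x']                                 # 1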
'''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple factory function for creating tuple subclasses with named fields
* deque list-like container with fast appends and pops on either end
* Counter dict subclass for counting hashable objects
* OrderedDict dict subclass that remembers the order entries were added
* defaultdict dict subclass that calls a factory function to supply missing values
'''
__all__ = ['Counter', 'deque', 'defaultdict', 'namedtuple', 'OrderedDict']
# For bootstrapping reasons, the collection ABCs are defined in _abcoll.py.
# They should however be considered an integral part of collections.py.
from _abcoll import *
import _abcoll
__all__ += _abcoll.__all__
from _collections import deque, defaultdict
from operator import itemgetter as _itemgetter, eq as _eq
from keyword import iskeyword as _iskeyword
import sys as _sys
import heapq as _heapq
from itertools import repeat as _repeat, chain as _chain, starmap as _starmap
from itertools import imap as _imap
try:
from thread import get_ident as _get_ident
except ImportError:
from dummy_thread import get_ident as _get_ident
################################################################################
### OrderedDict
################################################################################
class OrderedDict(dict):
'Dictionary that remembers insertion order'
# An inherited dict maps keys to values.
# The inherited dict provides __getitem__, __len__, __contains__, and get.
# The remaining methods are order-aware.
# Big-O running times for all methods are the same as regular dictionaries.
# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# Each link is stored as a list of length three: [PREV, NEXT, KEY].
def __init__(*args, **kwds):
'''Initialize an ordered dictionary. The signature is the same as
regular dictionaries, but keyword arguments are not recommended because
their insertion order is arbitrary.
'''
if not args:
raise TypeError("descriptor '__init__' of 'OrderedDict' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 argument, got %d' % len(args))
try:
self.__root
except AttributeError:
self.__root = root = [] # sentinel node
root[:] = [root, root, None]
self.__map = {}
self.__update(*args, **kwds)
def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
'od.__setitem__(i, y) <==> od[i]=y'
# Setting a new item creates a new link at the end of the linked list,
# and the inherited dictionary is updated with the new key/value pair.
if key not in self:
root = self.__root
last = root[0]
last[1] = root[0] = self.__map[key] = [last, root, key]
return dict_setitem(self, key, value)
def __delitem__(self, key, dict_delitem=dict.__delitem__):
'od.__delitem__(y) <==> del od[y]'
# Deleting an existing item uses self.__map to find the link which gets
# removed by updating the links in the predecessor and successor nodes.
dict_delitem(self, key)
link_prev, link_next, _ = self.__map.pop(key)
link_prev[1] = link_next # update link_prev[NEXT]
link_next[0] = link_prev # update link_next[PREV]
def __iter__(self):
'od.__iter__() <==> iter(od)'
# Traverse the linked list in order.
root = self.__root
curr = root[1] # start at the first node
while curr is not root:
yield curr[2] # yield the curr[KEY]
curr = curr[1] # move to next node
def __reversed__(self):
'od.__reversed__() <==> reversed(od)'
# Traverse the linked list in reverse order.
root = self.__root
curr = root[0] # start at the last node
while curr is not root:
yield curr[2] # yield the curr[KEY]
curr = curr[0] # move to previous node
def clear(self):
'od.clear() -> None. Remove all items from od.'
root = self.__root
root[:] = [root, root, None]
self.__map.clear()
dict.clear(self)
# -- the following methods do not depend on the internal structure --
def keys(self):
'od.keys() -> list of keys in od'
return list(self)
def values(self):
'od.values() -> list of values in od'
return [self[key] for key in self]
def items(self):
'od.items() -> list of (key, value) pairs in od'
return [(key, self[key]) for key in self]
def iterkeys(self):
'od.iterkeys() -> an iterator over the keys in od'
return iter(self)
def itervalues(self):
'od.itervalues() -> an iterator over the values in od'
for k in self:
yield self[k]
def iteritems(self):
'od.iteritems() -> an iterator over the (key, value) pairs in od'
for k in self:
yield (k, self[k])
update = MutableMapping.update
__update = update # let subclasses override update without breaking __init__
__marker = object()
def pop(self, key, default=__marker):
'''od.pop(k[,d]) -> v, remove specified key and return the corresponding
value. If key is not found, d is returned if given, otherwise KeyError
is raised.
'''
if key in self:
result = self[key]
del self[key]
return result
if default is self.__marker:
raise KeyError(key)
return default
def setdefault(self, key, default=None):
'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od'
if key in self:
return self[key]
self[key] = default
return default
def popitem(self, last=True):
'''od.popitem() -> (k, v), return and remove a (key, value) pair.
Pairs are returned in LIFO order if last is true or FIFO order if false.
'''
if not self:
raise KeyError('dictionary is empty')
key = next(reversed(self) if last else iter(self))
value = self.pop(key)
return key, value
def __repr__(self, _repr_running={}):
'od.__repr__() <==> repr(od)'
call_key = id(self), _get_ident()
if call_key in _repr_running:
return '...'
_repr_running[call_key] = 1
try:
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, self.items())
finally:
del _repr_running[call_key]
def __reduce__(self):
'Return state information for pickling'
items = [[k, self[k]] for k in self]
inst_dict = vars(self).copy()
for k in vars(OrderedDict()):
inst_dict.pop(k, None)
if inst_dict:
return (self.__class__, (items,), inst_dict)
return self.__class__, (items,)
def copy(self):
'od.copy() -> a shallow copy of od'
return self.__class__(self)
@classmethod
def fromkeys(cls, iterable, value=None):
'''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
If not specified, the value defaults to None.
'''
self = cls()
for key in iterable:
self[key] = value
return self
def __eq__(self, other):
'''od.__eq__(y) <==> od==y. Comparison to another OD is order-sensitive
while comparison to a regular mapping is order-insensitive.
'''
if isinstance(other, OrderedDict):
return dict.__eq__(self, other) and all(_imap(_eq, self, other))
return dict.__eq__(self, other)
def __ne__(self, other):
'od.__ne__(y) <==> od!=y'
return not self == other
# -- the following methods support python 3.x style dictionary views --
def viewkeys(self):
"od.viewkeys() -> a set-like object providing a view on od's keys"
return KeysView(self)
def viewvalues(self):
"od.viewvalues() -> an object providing a view on od's values"
return ValuesView(self)
def viewitems(self):
"od.viewitems() -> a set-like object providing a view on od's items"
return ItemsView(self)
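# --- Illustrative sketch (not part of the original module) ---------------
# Core OrderedDict behavior: insertion order is preserved, updating an
# existing key keeps its slot, and popitem() is LIFO by default.
def _demo_ordered_dict():
    od = OrderedDict()
    od['b'] = 1
    od['a'] = 2
    od['c'] = 3
    assert od.keys() == ['b', 'a', 'c']
    od['b'] = 99                        # update in place, order unchanged
    assert od.keys() == ['b', 'a', 'c']
    assert od.popitem() == ('c', 3)
    return od.items()                   # [('b', 99), ('a', 2)]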
################################################################################
### namedtuple
################################################################################
_class_template = '''\
class {typename}(tuple):
'{typename}({arg_list})'
__slots__ = ()
_fields = {field_names!r}
def __new__(_cls, {arg_list}):
'Create new instance of {typename}({arg_list})'
return _tuple.__new__(_cls, ({arg_list}))
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new {typename} object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != {num_fields:d}:
raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
return result
def __repr__(self):
'Return a nicely formatted representation string'
return '{typename}({repr_fmt})' % self
def _asdict(self):
'Return a new OrderedDict which maps field names to their values'
return OrderedDict(zip(self._fields, self))
def _replace(_self, **kwds):
'Return a new {typename} object replacing specified fields with new values'
result = _self._make(map(kwds.pop, {field_names!r}, _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % kwds.keys())
return result
def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
__dict__ = _property(_asdict)
def __getstate__(self):
'Exclude the OrderedDict from pickling'
pass
{field_defs}
'''
_repr_template = '{name}=%r'
_field_template = '''\
{name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
'''
def namedtuple(typename, field_names, verbose=False, rename=False):
"""Returns a new subclass of tuple with named fields.
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__ # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22) # instantiate with positional args or keywords
>>> p[0] + p[1] # indexable like a plain tuple
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> d = p._asdict() # convert to a dictionary
>>> d['x']
11
>>> Point(**d) # convert from a dictionary
Point(x=11, y=22)
>>> p._replace(x=100) # _replace() is like str.replace() but targets named fields
Point(x=100, y=22)
"""
# Validate the field names. At the user's option, either generate an error
# message or automatically replace the field name with a valid name.
if isinstance(field_names, basestring):
field_names = field_names.replace(',', ' ').split()
field_names = map(str, field_names)
typename = str(typename)
if rename:
seen = set()
for index, name in enumerate(field_names):
if (not all(c.isalnum() or c == '_' for c in name)
or _iskeyword(name)
or not name
or name[0].isdigit()
or name.startswith('_')
or name in seen):
field_names[index] = '_%d' % index
seen.add(name)
for name in [typename] + field_names:
if type(name) != str:
raise TypeError('Type names and field names must be strings')
if not all(c.isalnum() or c == '_' for c in name):
raise ValueError('Type names and field names can only contain '
'alphanumeric characters and underscores: %r' % name)
if _iskeyword(name):
raise ValueError('Type names and field names cannot be a '
'keyword: %r' % name)
if name[0].isdigit():
raise ValueError('Type names and field names cannot start with '
'a number: %r' % name)
seen = set()
for name in field_names:
if name.startswith('_') and not rename:
raise ValueError('Field names cannot start with an underscore: '
'%r' % name)
if name in seen:
raise ValueError('Encountered duplicate field name: %r' % name)
seen.add(name)
# Fill-in the class template
class_definition = _class_template.format(
typename = typename,
field_names = tuple(field_names),
num_fields = len(field_names),
arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
repr_fmt = ', '.join(_repr_template.format(name=name)
for name in field_names),
field_defs = '\n'.join(_field_template.format(index=index, name=name)
for index, name in enumerate(field_names))
)
if verbose:
print class_definition
# Execute the template string in a temporary namespace and support
# tracing utilities by setting a value for frame.f_globals['__name__']
namespace = dict(_itemgetter=_itemgetter, __name__='namedtuple_%s' % typename,
OrderedDict=OrderedDict, _property=property, _tuple=tuple)
try:
exec class_definition in namespace
except SyntaxError as e:
raise SyntaxError(e.message + ':\n' + class_definition)
result = namespace[typename]
# For pickling to work, the __module__ variable needs to be set to the frame
# where the named tuple is created. Bypass this step in environments where
# sys._getframe is not defined (Jython for example) or sys._getframe is not
# defined for arguments greater than 0 (IronPython).
try:
result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
pass
return result
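# --- Illustrative sketch (not part of the original module) ---------------
# The rename=True option above in action: invalid, keyword, and duplicate
# field names are silently replaced with positional names _0, _1, ...
def _demo_namedtuple_rename():
    Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
    return Row._fields                  # ('id', '_1', '_2')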
########################################################################
### Counter
########################################################################
class Counter(dict):
'''Dict subclass for counting hashable items. Sometimes called a bag
or multiset. Elements are stored as dictionary keys and their counts
are stored as dictionary values.
>>> c = Counter('abcdeabcdabcaba') # count elements from a string
>>> c.most_common(3) # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c) # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements())) # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values()) # total of all counts
15
>>> c['a'] # count of letter 'a'
5
>>> for elem in 'shazam': # update counts from an iterable
... c[elem] += 1 # by adding 1 to each element's count
>>> c['a'] # now there are seven 'a'
7
>>> del c['b'] # remove all 'b'
>>> c['b'] # now there are zero 'b'
0
>>> d = Counter('simsalabim') # make another counter
>>> c.update(d) # add in the second counter
>>> c['a'] # now there are nine 'a'
9
>>> c.clear() # empty the counter
>>> c
Counter()
Note: If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:
>>> c = Counter('aaabbc')
>>> c['b'] -= 2 # reduce the count of 'b' by two
>>> c.most_common() # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
'''
# References:
# http://en.wikipedia.org/wiki/Multiset
# http://www.gnu.org/software/smalltalk/manual-base/html_node/Bag.html
# http://www.demo2s.com/Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm
# http://code.activestate.com/recipes/259174/
# Knuth, TAOCP Vol. II section 4.6.3
def __init__(*args, **kwds):
'''Create a new, empty Counter object. If an input iterable is given,
count its elements. Or, initialize the counts from another mapping
of elements to their counts.
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2}) # a new counter from a mapping
>>> c = Counter(a=4, b=2) # a new counter from keyword args
'''
if not args:
raise TypeError("descriptor '__init__' of 'Counter' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 argument, got %d' % len(args))
super(Counter, self).__init__()
self.update(*args, **kwds)
def __missing__(self, key):
'The count of elements not in the Counter is zero.'
# Needed so that self[missing_item] does not raise KeyError
return 0
def most_common(self, n=None):
'''List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
'''
# Emulate Bag.sortedByCount from Smalltalk
if n is None:
return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
def elements(self):
'''Iterator over elements repeating each as many times as its count.
>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']
# Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements(): # loop over factors
... product *= factor # and multiply them
>>> product
1836
Note, if an element's count has been set to zero or is a negative
number, elements() will ignore it.
'''
# Emulate Bag.do from Smalltalk and Multiset.begin from C++.
return _chain.from_iterable(_starmap(_repeat, self.iteritems()))
# Override dict methods where necessary
@classmethod
def fromkeys(cls, iterable, v=None):
# There is no equivalent method for counters because setting v=1
# means that no element can have a count greater than one.
raise NotImplementedError(
'Counter.fromkeys() is undefined. Use Counter(iterable) instead.')
def update(*args, **kwds):
'''Like dict.update() but add counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.update('witch') # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d) # add elements from another counter
>>> c['h'] # four 'h' in which, witch, and watch
4
'''
# The regular dict.update() operation makes no sense here because its
# replace behavior would leave some of the original counts untouched
# and mix them in with the new counts, a mishmash without a
# straightforward interpretation in most counting contexts. Instead,
# we implement straight addition. Both the inputs and outputs are
# allowed to contain zero and negative counts.
if not args:
raise TypeError("descriptor 'update' of 'Counter' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 argument, got %d' % len(args))
iterable = args[0] if args else None
if iterable is not None:
if isinstance(iterable, Mapping):
if self:
self_get = self.get
for elem, count in iterable.iteritems():
self[elem] = self_get(elem, 0) + count
else:
super(Counter, self).update(iterable) # fast path when counter is empty
else:
self_get = self.get
for elem in iterable:
self[elem] = self_get(elem, 0) + 1
if kwds:
self.update(kwds)
def subtract(*args, **kwds):
'''Like dict.update() but subtracts counts instead of replacing them.
Counts can be reduced below zero. Both the inputs and outputs are
allowed to contain zero and negative counts.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.subtract('witch') # subtract elements from another iterable
>>> c.subtract(Counter('watch')) # subtract elements from another counter
>>> c['h'] # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w'] # 1 in which, minus 1 in witch, minus 1 in watch
-1
'''
if not args:
raise TypeError("descriptor 'subtract' of 'Counter' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 argument, got %d' % len(args))
iterable = args[0] if args else None
if iterable is not None:
self_get = self.get
if isinstance(iterable, Mapping):
for elem, count in iterable.items():
self[elem] = self_get(elem, 0) - count
else:
for elem in iterable:
self[elem] = self_get(elem, 0) - 1
if kwds:
self.subtract(kwds)
def copy(self):
'Return a shallow copy.'
return self.__class__(self)
def __reduce__(self):
return self.__class__, (dict(self),)
def __delitem__(self, elem):
'Like dict.__delitem__() but does not raise KeyError for missing values.'
if elem in self:
super(Counter, self).__delitem__(elem)
def __repr__(self):
if not self:
return '%s()' % self.__class__.__name__
items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
return '%s({%s})' % (self.__class__.__name__, items)
# Multiset-style mathematical operations discussed in:
# Knuth TAOCP Volume II section 4.6.3 exercise 19
# and at http://en.wikipedia.org/wiki/Multiset
#
# Outputs guaranteed to only include positive counts.
#
# To strip negative and zero counts, add-in an empty counter:
# c += Counter()
def __add__(self, other):
'''Add counts from two counters.
>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
newcount = count + other[elem]
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count > 0:
result[elem] = count
return result
def __sub__(self, other):
''' Subtract counts, but keep only results with positive counts.
>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
newcount = count - other[elem]
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count < 0:
result[elem] = 0 - count
return result
def __or__(self, other):
'''Union is the maximum of the counts in either of the input counters.
>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
other_count = other[elem]
newcount = other_count if count < other_count else count
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count > 0:
result[elem] = count
return result
def __and__(self, other):
''' Intersection is the minimum of corresponding counts.
>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
other_count = other[elem]
newcount = count if count < other_count else other_count
if newcount > 0:
result[elem] = newcount
return result
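# --- Illustrative sketch (not part of the original module) ---------------
# The normalization idiom mentioned in the comments above: adding an
# empty Counter strips zero and negative counts from the result.
def _demo_counter_normalize():
    c = Counter(a=2, b=0, c=-1)
    c += Counter()
    return c                            # Counter({'a': 2})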
if __name__ == '__main__':
# verify that instances can be pickled
from cPickle import loads, dumps
Point = namedtuple('Point', 'x, y', True)
p = Point(x=10, y=20)
assert p == loads(dumps(p))
# test and demonstrate ability to override methods
class Point(namedtuple('Point', 'x y')):
__slots__ = ()
@property
def hypot(self):
return (self.x ** 2 + self.y ** 2) ** 0.5
def __str__(self):
return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)
for p in Point(3, 4), Point(14, 5/7.):
print p
class Point(namedtuple('Point', 'x y')):
'Point class with optimized _make() and _replace() without error-checking'
__slots__ = ()
_make = classmethod(tuple.__new__)
def _replace(self, _map=map, **kwds):
return self._make(_map(kwds.get, ('x', 'y'), self))
print Point(11, 22)._replace(x=100)
Point3D = namedtuple('Point3D', Point._fields + ('z',))
print Point3D.__doc__
import doctest
TestResults = namedtuple('TestResults', 'failed attempted')
print TestResults(*doctest.testmod())
"""Conversion functions between RGB and other color systems.
This modules provides two functions for each color system ABC:
rgb_to_abc(r, g, b) --> a, b, c
abc_to_rgb(a, b, c) --> r, g, b
All inputs and outputs are triples of floats in the range [0.0...1.0]
(with the exception of I and Q, which cover a slightly larger range).
Inputs outside the valid range may cause exceptions or invalid outputs.
Supported color systems:
RGB: Red, Green, Blue components
YIQ: Luminance, Chrominance (used by composite video signals)
HLS: Hue, Luminance, Saturation
HSV: Hue, Saturation, Value
"""
# References:
# http://en.wikipedia.org/wiki/YIQ
# http://en.wikipedia.org/wiki/HLS_color_space
# http://en.wikipedia.org/wiki/HSV_color_space
__all__ = ["rgb_to_yiq","yiq_to_rgb","rgb_to_hls","hls_to_rgb",
"rgb_to_hsv","hsv_to_rgb"]
# Some floating point constants
ONE_THIRD = 1.0/3.0
ONE_SIXTH = 1.0/6.0
TWO_THIRD = 2.0/3.0
# YIQ: used by composite video signals (linear combinations of RGB)
# Y: perceived grey level (0.0 == black, 1.0 == white)
# I, Q: color components
def rgb_to_yiq(r, g, b):
y = 0.30*r + 0.59*g + 0.11*b
i = 0.60*r - 0.28*g - 0.32*b
q = 0.21*r - 0.52*g + 0.31*b
return (y, i, q)
def yiq_to_rgb(y, i, q):
r = y + 0.948262*i + 0.624013*q
g = y - 0.276066*i - 0.639810*q
b = y - 1.105450*i + 1.729860*q
if r < 0.0:
r = 0.0
if g < 0.0:
g = 0.0
if b < 0.0:
b = 0.0
if r > 1.0:
r = 1.0
if g > 1.0:
g = 1.0
if b > 1.0:
b = 1.0
return (r, g, b)
# HLS: Hue, Luminance, Saturation
# H: position in the spectrum
# L: color lightness
# S: color saturation
def rgb_to_hls(r, g, b):
maxc = max(r, g, b)
minc = min(r, g, b)
# XXX Can optimize (maxc+minc) and (maxc-minc)
l = (minc+maxc)/2.0
if minc == maxc:
return 0.0, l, 0.0
if l <= 0.5:
s = (maxc-minc) / (maxc+minc)
else:
s = (maxc-minc) / (2.0-maxc-minc)
rc = (maxc-r) / (maxc-minc)
gc = (maxc-g) / (maxc-minc)
bc = (maxc-b) / (maxc-minc)
if r == maxc:
h = bc-gc
elif g == maxc:
h = 2.0+rc-bc
else:
h = 4.0+gc-rc
h = (h/6.0) % 1.0
return h, l, s
def hls_to_rgb(h, l, s):
if s == 0.0:
return l, l, l
if l <= 0.5:
m2 = l * (1.0+s)
else:
m2 = l+s-(l*s)
m1 = 2.0*l - m2
return (_v(m1, m2, h+ONE_THIRD), _v(m1, m2, h), _v(m1, m2, h-ONE_THIRD))
def _v(m1, m2, hue):
hue = hue % 1.0
if hue < ONE_SIXTH:
return m1 + (m2-m1)*hue*6.0
if hue < 0.5:
return m2
if hue < TWO_THIRD:
return m1 + (m2-m1)*(TWO_THIRD-hue)*6.0
return m1
# HSV: Hue, Saturation, Value
# H: position in the spectrum
# S: color saturation ("purity")
# V: color brightness
def rgb_to_hsv(r, g, b):
maxc = max(r, g, b)
minc = min(r, g, b)
v = maxc
if minc == maxc:
return 0.0, 0.0, v
s = (maxc-minc) / maxc
rc = (maxc-r) / (maxc-minc)
gc = (maxc-g) / (maxc-minc)
bc = (maxc-b) / (maxc-minc)
if r == maxc:
h = bc-gc
elif g == maxc:
h = 2.0+rc-bc
else:
h = 4.0+gc-rc
h = (h/6.0) % 1.0
return h, s, v
def hsv_to_rgb(h, s, v):
if s == 0.0:
return v, v, v
i = int(h*6.0) # XXX assume int() truncates!
f = (h*6.0) - i
p = v*(1.0 - s)
q = v*(1.0 - s*f)
t = v*(1.0 - s*(1.0-f))
i = i%6
if i == 0:
return v, t, p
if i == 1:
return q, v, p
if i == 2:
return p, v, t
if i == 3:
return p, q, v
if i == 4:
return t, p, v
if i == 5:
return v, p, q
# Cannot get here
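# --- Illustrative sketch (not part of the original module) ---------------
# The conversions above are designed to round-trip (within float error):
# RGB -> HSV -> RGB returns the original triple.
def _demo_colorsys_roundtrip():
    r, g, b = 0.2, 0.4, 0.4
    h, s, v = rgb_to_hsv(r, g, b)
    back = hsv_to_rgb(h, s, v)
    return max(abs(x - y) for x, y in zip((r, g, b), back)) < 1e-9   # True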
"""Execute shell commands via os.popen() and return status, output.
Interface summary:
import commands
outtext = commands.getoutput(cmd)
(exitstatus, outtext) = commands.getstatusoutput(cmd)
outtext = commands.getstatus(file) # returns output of "ls -ld file"
A trailing newline is removed from the output string.
Encapsulates the basic operation:
pipe = os.popen('{ ' + cmd + '; } 2>&1', 'r')
text = pipe.read()
sts = pipe.close()
[Note: it would be nice to add functions to interpret the exit status.]
"""
from warnings import warnpy3k
warnpy3k("the commands module has been removed in Python 3.0; "
"use the subprocess module instead", stacklevel=2)
del warnpy3k
__all__ = ["getstatusoutput","getoutput","getstatus"]
# Module 'commands'
#
# Various tools for executing commands and looking at their output and status.
#
# NB This only works (and is only relevant) for UNIX.
# Get 'ls -l' status for an object into a string
#
def getstatus(file):
"""Return output of "ls -ld <file>" in a string."""
import warnings
warnings.warn("commands.getstatus() is deprecated", DeprecationWarning, 2)
return getoutput('ls -ld' + mkarg(file))
# Get the output from a shell command into a string.
# The exit status is ignored; a trailing newline is stripped.
# Assume the command will work with '{ ... ; } 2>&1' around it.
#
def getoutput(cmd):
"""Return output (stdout or stderr) of executing cmd in a shell."""
return getstatusoutput(cmd)[1]
# Ditto but preserving the exit status.
# Returns a pair (sts, output)
#
def getstatusoutput(cmd):
"""Return (status, output) of executing cmd in a shell."""
import os
pipe = os.popen('{ ' + cmd + '; } 2>&1', 'r')
text = pipe.read()
sts = pipe.close()
if sts is None: sts = 0
if text[-1:] == '\n': text = text[:-1]
return sts, text
# Make command argument from directory and pathname (prefix space, add quotes).
#
def mk2arg(head, x):
import os
return mkarg(os.path.join(head, x))
# Make a shell command argument from a string.
# Return a string beginning with a space followed by a shell-quoted
# version of the argument.
# Two strategies: enclose in single quotes if it contains none;
# otherwise, enclose in double quotes and prefix quotable characters
# with backslash.
#
def mkarg(x):
if '\'' not in x:
return ' \'' + x + '\''
s = ' "'
for c in x:
if c in '\\$"`':
s = s + '\\'
s = s + c
s = s + '"'
return s
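# --- Illustrative sketch (not part of the original module) ---------------
# Basic use of getstatusoutput() above; UNIX only, as noted. 'echo hello'
# is just a harmless sample command.
def _demo_getstatusoutput():
    status, text = getstatusoutput('echo hello')
    return status, text                 # (0, 'hello')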
"""Module/script to byte-compile all .py files to .pyc (or .pyo) files.
When called as a script with arguments, this compiles the directories
given as arguments recursively; the -l option prevents it from
recursing into directories.
Without arguments, it compiles all modules on sys.path, without
recursing into subdirectories. (Even though it should do so for
packages -- for now, you'll have to deal with packages separately.)
See module py_compile for details of the actual byte-compilation.
"""
import os
import sys
import py_compile
import struct
import imp
__all__ = ["compile_dir","compile_file","compile_path"]
def compile_dir(dir, maxlevels=10, ddir=None,
force=0, rx=None, quiet=0):
"""Byte-compile all modules in the given directory tree.
Arguments (only dir is required):
dir: the directory to byte-compile
maxlevels: maximum recursion level (default 10)
ddir: the directory that will be prepended to the path to the
file as it is compiled into each byte-code file.
force: if 1, force compilation, even if timestamps are up-to-date
rx: if given, a compiled regex; files whose full path it matches are skipped
quiet: if 1, be quiet during compilation
"""
if not quiet:
print 'Listing', dir, '...'
try:
names = os.listdir(dir)
except os.error:
print "Can't list", dir
names = []
names.sort()
success = 1
for name in names:
fullname = os.path.join(dir, name)
if ddir is not None:
dfile = os.path.join(ddir, name)
else:
dfile = None
if not os.path.isdir(fullname):
if not compile_file(fullname, ddir, force, rx, quiet):
success = 0
elif maxlevels > 0 and \
name != os.curdir and name != os.pardir and \
os.path.isdir(fullname) and \
not os.path.islink(fullname):
if not compile_dir(fullname, maxlevels - 1, dfile, force, rx,
quiet):
success = 0
return success
def compile_file(fullname, ddir=None, force=0, rx=None, quiet=0):
"""Byte-compile one file.
Arguments (only fullname is required):
fullname: the file to byte-compile
ddir: if given, the directory name compiled in to the
byte-code file.
force: if 1, force compilation, even if timestamps are up-to-date
rx: if given, a compiled regex; files whose full path it matches are skipped
quiet: if 1, be quiet during compilation
"""
success = 1
name = os.path.basename(fullname)
if ddir is not None:
dfile = os.path.join(ddir, name)
else:
dfile = None
if rx is not None:
mo = rx.search(fullname)
if mo:
return success
if os.path.isfile(fullname):
head, tail = name[:-3], name[-3:]
if tail == '.py':
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sl', imp.get_magic(), mtime)
cfile = fullname + (__debug__ and 'c' or 'o')
with open(cfile, 'rb') as chandle:
actual = chandle.read(8)
if expect == actual:
return success
except IOError:
pass
if not quiet:
print 'Compiling', fullname, '...'
try:
ok = py_compile.compile(fullname, None, dfile, True)
except py_compile.PyCompileError,err:
if quiet:
print 'Compiling', fullname, '...'
print err.msg
success = 0
except IOError, e:
print "Sorry", e
success = 0
else:
if ok == 0:
success = 0
return success
def compile_path(skip_curdir=1, maxlevels=0, force=0, quiet=0):
"""Byte-compile all module on sys.path.
Arguments (all optional):
skip_curdir: if true, skip current directory (default true)
maxlevels: max recursion level (default 0)
force: as for compile_dir() (default 0)
quiet: as for compile_dir() (default 0)
"""
success = 1
for dir in sys.path:
if (not dir or dir == os.curdir) and skip_curdir:
print 'Skipping current directory'
else:
success = success and compile_dir(dir, maxlevels, None,
force, quiet=quiet)
return success
def expand_args(args, flist):
"""read names in flist and append to args"""
expanded = args[:]
if flist:
try:
if flist == '-':
fd = sys.stdin
else:
fd = open(flist)
while 1:
line = fd.readline()
if not line:
break
expanded.append(line[:-1])
except IOError:
print "Error reading file list %s" % flist
raise
return expanded
def main():
"""Script main program."""
import getopt
try:
opts, args = getopt.getopt(sys.argv[1:], 'lfqd:x:i:')
except getopt.error, msg:
print msg
print "usage: python compileall.py [-l] [-f] [-q] [-d destdir] " \
"[-x regexp] [-i list] [directory|file ...]"
print
print "arguments: zero or more file and directory names to compile; " \
"if no arguments given, "
print " defaults to the equivalent of -l sys.path"
print
print "options:"
print "-l: don't recurse into subdirectories"
print "-f: force rebuild even if timestamps are up-to-date"
print "-q: output only error messages"
print "-d destdir: directory to prepend to file paths for use in " \
"compile-time tracebacks and in"
print " runtime tracebacks in cases where the source " \
"file is unavailable"
print "-x regexp: skip files matching the regular expression regexp; " \
"the regexp is searched for"
print " in the full path of each file considered for " \
"compilation"
print "-i file: add all the files and directories listed in file to " \
"the list considered for"
print ' compilation; if "-", names are read from stdin'
sys.exit(2)
maxlevels = 10
ddir = None
force = 0
quiet = 0
rx = None
flist = None
for o, a in opts:
if o == '-l': maxlevels = 0
if o == '-d': ddir = a
if o == '-f': force = 1
if o == '-q': quiet = 1
if o == '-x':
import re
rx = re.compile(a)
if o == '-i': flist = a
if ddir:
if len(args) != 1 and not os.path.isdir(args[0]):
print "-d destdir require exactly one directory argument"
sys.exit(2)
success = 1
try:
if args or flist:
try:
if flist:
args = expand_args(args, flist)
except IOError:
success = 0
if success:
for arg in args:
if os.path.isdir(arg):
if not compile_dir(arg, maxlevels, ddir,
force, rx, quiet):
success = 0
else:
if not compile_file(arg, ddir, force, rx, quiet):
success = 0
else:
success = compile_path()
except KeyboardInterrupt:
print "\n[interrupted]"
success = 0
return success
if __name__ == '__main__':
exit_status = int(not main())
sys.exit(exit_status)
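# --- Illustrative sketch (not part of the original module) ---------------
# compile_dir() can also be driven programmatically; '/tmp/pkg' is a
# hypothetical directory tree of .py files.
def _demo_compile_tree(path='/tmp/pkg'):
    return compile_dir(path, maxlevels=2, force=1, quiet=1)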
"""Configuration file parser.
A setup file consists of sections, led by a "[section]" header,
and followed by "name: value" entries, with continuations and such in
the style of RFC 822.
The option values can contain format strings which refer to other values in
the same section, or values in a special [DEFAULT] section.
For example:
something: %(dir)s/whatever
would resolve the "%(dir)s" to the value of dir. All reference
expansions are done late, on demand.
Intrinsic defaults can be specified by passing them into the
ConfigParser constructor as a dictionary.
class:
ConfigParser -- responsible for parsing a list of
configuration files, and managing the parsed database.
methods:
__init__(defaults=None)
create the parser and specify a dictionary of intrinsic defaults. The
keys must be strings, the values must be appropriate for %()s string
interpolation. Note that `__name__' is always an intrinsic default;
its value is the section's name.
sections()
return all the configuration section names, sans DEFAULT
has_section(section)
return whether the given section exists
has_option(section, option)
return whether the given option exists in the given section
options(section)
return list of configuration options for the named section
read(filenames)
read and parse the list of named configuration files, given by
name. A single filename is also allowed. Non-existing files
are ignored. Return list of successfully read files.
readfp(fp, filename=None)
read and parse one configuration file, given as a file object.
The filename defaults to fp.name; it is only used in error
messages (if fp has no `name' attribute, the string `<???>' is used).
get(section, option, raw=False, vars=None)
return a string value for the named option. All % interpolations are
expanded in the return values, based on the defaults passed into the
constructor and the DEFAULT section. Additional substitutions may be
provided using the `vars' argument, which must be a dictionary whose
contents override any pre-existing defaults.
getint(section, option)
like get(), but convert value to an integer
getfloat(section, option)
like get(), but convert value to a float
getboolean(section, option)
like get(), but convert value to a boolean (currently case
insensitively defined as 0, false, no, off for False, and 1, true,
yes, on for True). Returns False or True.
items(section, raw=False, vars=None)
return a list of tuples with (name, value) for each option
in the section.
remove_section(section)
remove the given file section and all its options
remove_option(section, option)
remove the given option from the given section
set(section, option, value)
set the given option
write(fp)
write the configuration state in .ini format
"""
try:
from collections import OrderedDict as _default_dict
except ImportError:
# fallback for setup.py which hasn't yet built _collections
_default_dict = dict
import re
__all__ = ["NoSectionError", "DuplicateSectionError", "NoOptionError",
"InterpolationError", "InterpolationDepthError",
"InterpolationSyntaxError", "ParsingError",
"MissingSectionHeaderError",
"ConfigParser", "SafeConfigParser", "RawConfigParser",
"DEFAULTSECT", "MAX_INTERPOLATION_DEPTH"]
DEFAULTSECT = "DEFAULT"
MAX_INTERPOLATION_DEPTH = 10
# exception classes
class Error(Exception):
"""Base class for ConfigParser exceptions."""
def _get_message(self):
"""Getter for 'message'; needed only to override deprecation in
BaseException."""
return self.__message
def _set_message(self, value):
"""Setter for 'message'; needed only to override deprecation in
BaseException."""
self.__message = value
# BaseException.message has been deprecated since Python 2.6. To prevent
# DeprecationWarning from popping up over this pre-existing attribute, use
# a new property that takes lookup precedence.
message = property(_get_message, _set_message)
def __init__(self, msg=''):
self.message = msg
Exception.__init__(self, msg)
def __repr__(self):
return self.message
__str__ = __repr__
class NoSectionError(Error):
"""Raised when no section matches a requested option."""
def __init__(self, section):
Error.__init__(self, 'No section: %r' % (section,))
self.section = section
self.args = (section, )
class DuplicateSectionError(Error):
"""Raised when a section is multiply-created."""
def __init__(self, section):
Error.__init__(self, "Section %r already exists" % section)
self.section = section
self.args = (section, )
class NoOptionError(Error):
"""A requested option was not found."""
def __init__(self, option, section):
Error.__init__(self, "No option %r in section: %r" %
(option, section))
self.option = option
self.section = section
self.args = (option, section)
class InterpolationError(Error):
"""Base class for interpolation-related exceptions."""
def __init__(self, option, section, msg):
Error.__init__(self, msg)
self.option = option
self.section = section
self.args = (option, section, msg)
class InterpolationMissingOptionError(InterpolationError):
"""A string substitution required a setting which was not available."""
def __init__(self, option, section, rawval, reference):
msg = ("Bad value substitution:\n"
"\tsection: [%s]\n"
"\toption : %s\n"
"\tkey : %s\n"
"\trawval : %s\n"
% (section, option, reference, rawval))
InterpolationError.__init__(self, option, section, msg)
self.reference = reference
self.args = (option, section, rawval, reference)
class InterpolationSyntaxError(InterpolationError):
"""Raised when the source text into which substitutions are made
does not conform to the required syntax."""
class InterpolationDepthError(InterpolationError):
"""Raised when substitutions are nested too deeply."""
def __init__(self, option, section, rawval):
msg = ("Value interpolation too deeply recursive:\n"
"\tsection: [%s]\n"
"\toption : %s\n"
"\trawval : %s\n"
% (section, option, rawval))
InterpolationError.__init__(self, option, section, msg)
self.args = (option, section, rawval)
class ParsingError(Error):
"""Raised when a configuration file does not follow legal syntax."""
def __init__(self, filename):
Error.__init__(self, 'File contains parsing errors: %s' % filename)
self.filename = filename
self.errors = []
self.args = (filename, )
def append(self, lineno, line):
self.errors.append((lineno, line))
self.message += '\n\t[line %2d]: %s' % (lineno, line)
class MissingSectionHeaderError(ParsingError):
"""Raised when a key-value pair is found before any section header."""
def __init__(self, filename, lineno, line):
Error.__init__(
self,
'File contains no section headers.\nfile: %s, line: %d\n%r' %
(filename, lineno, line))
self.filename = filename
self.lineno = lineno
self.line = line
self.args = (filename, lineno, line)
class RawConfigParser:
def __init__(self, defaults=None, dict_type=_default_dict,
allow_no_value=False):
self._dict = dict_type
self._sections = self._dict()
self._defaults = self._dict()
if allow_no_value:
self._optcre = self.OPTCRE_NV
else:
self._optcre = self.OPTCRE
if defaults:
for key, value in defaults.items():
self._defaults[self.optionxform(key)] = value
def defaults(self):
return self._defaults
def sections(self):
"""Return a list of section names, excluding [DEFAULT]"""
# self._sections will never have [DEFAULT] in it
return self._sections.keys()
def add_section(self, section):
"""Create a new section in the configuration.
Raise DuplicateSectionError if a section by the specified name
already exists. Raise ValueError if name is DEFAULT or any of its
case-insensitive variants.
"""
if section.lower() == "default":
raise ValueError, 'Invalid section name: %s' % section
if section in self._sections:
raise DuplicateSectionError(section)
self._sections[section] = self._dict()
def has_section(self, section):
"""Indicate whether the named section is present in the configuration.
The DEFAULT section is not acknowledged.
"""
return section in self._sections
def options(self, section):
"""Return a list of option names for the given section name."""
try:
opts = self._sections[section].copy()
except KeyError:
raise NoSectionError(section)
opts.update(self._defaults)
if '__name__' in opts:
del opts['__name__']
return opts.keys()
def read(self, filenames):
"""Read and parse a filename or a list of filenames.
Files that cannot be opened are silently ignored; this is
designed so that you can specify a list of potential
configuration file locations (e.g. current directory, user's
home directory, systemwide directory), and all existing
configuration files in the list will be read. A single
filename may also be given.
Return list of successfully read files.
"""
if isinstance(filenames, basestring):
filenames = [filenames]
read_ok = []
for filename in filenames:
try:
fp = open(filename)
except IOError:
continue
self._read(fp, filename)
fp.close()
read_ok.append(filename)
return read_ok
def readfp(self, fp, filename=None):
"""Like read() but the argument must be a file-like object.
The `fp' argument must have a `readline' method. Optional
second argument is the `filename', which if not given, is
taken from fp.name. If fp has no `name' attribute, `<???>' is
used.
"""
if filename is None:
try:
filename = fp.name
except AttributeError:
filename = '<???>'
self._read(fp, filename)
def get(self, section, option):
opt = self.optionxform(option)
if section not in self._sections:
if section != DEFAULTSECT:
raise NoSectionError(section)
if opt in self._defaults:
return self._defaults[opt]
else:
raise NoOptionError(option, section)
elif opt in self._sections[section]:
return self._sections[section][opt]
elif opt in self._defaults:
return self._defaults[opt]
else:
raise NoOptionError(option, section)
def items(self, section):
try:
d2 = self._sections[section]
except KeyError:
if section != DEFAULTSECT:
raise NoSectionError(section)
d2 = self._dict()
d = self._defaults.copy()
d.update(d2)
if "__name__" in d:
del d["__name__"]
return d.items()
def _get(self, section, conv, option):
return conv(self.get(section, option))
def getint(self, section, option):
return self._get(section, int, option)
def getfloat(self, section, option):
return self._get(section, float, option)
_boolean_states = {'1': True, 'yes': True, 'true': True, 'on': True,
'0': False, 'no': False, 'false': False, 'off': False}
def getboolean(self, section, option):
v = self.get(section, option)
if v.lower() not in self._boolean_states:
raise ValueError, 'Not a boolean: %s' % v
return self._boolean_states[v.lower()]
def optionxform(self, optionstr):
return optionstr.lower()
def has_option(self, section, option):
"""Check for the existence of a given option in a given section."""
if not section or section == DEFAULTSECT:
option = self.optionxform(option)
return option in self._defaults
elif section not in self._sections:
return False
else:
option = self.optionxform(option)
return (option in self._sections[section]
or option in self._defaults)
def set(self, section, option, value=None):
"""Set an option."""
if not section or section == DEFAULTSECT:
sectdict = self._defaults
else:
try:
sectdict = self._sections[section]
except KeyError:
raise NoSectionError(section)
sectdict[self.optionxform(option)] = value
def write(self, fp):
"""Write an .ini-format representation of the configuration state."""
if self._defaults:
fp.write("[%s]\n" % DEFAULTSECT)
for (key, value) in self._defaults.items():
fp.write("%s = %s\n" % (key, str(value).replace('\n', '\n\t')))
fp.write("\n")
for section in self._sections:
fp.write("[%s]\n" % section)
for (key, value) in self._sections[section].items():
if key == "__name__":
continue
if (value is not None) or (self._optcre == self.OPTCRE):
key = " = ".join((key, str(value).replace('\n', '\n\t')))
fp.write("%s\n" % (key))
fp.write("\n")
def remove_option(self, section, option):
"""Remove an option."""
if not section or section == DEFAULTSECT:
sectdict = self._defaults
else:
try:
sectdict = self._sections[section]
except KeyError:
raise NoSectionError(section)
option = self.optionxform(option)
existed = option in sectdict
if existed:
del sectdict[option]
return existed
def remove_section(self, section):
"""Remove a file section."""
existed = section in self._sections
if existed:
del self._sections[section]
return existed
#
# Regular expressions for parsing section headers and options.
#
SECTCRE = re.compile(
r'\[' # [
r'(?P<header>[^]]+)' # very permissive!
r'\]' # ]
)
OPTCRE = re.compile(
r'(?P<option>[^:=\s][^:=]*)' # very permissive!
r'\s*(?P<vi>[:=])\s*' # any number of space/tab,
# followed by separator
# (either : or =), followed
# by any space/tab
r'(?P<value>.*)$' # everything up to eol
)
OPTCRE_NV = re.compile(
r'(?P<option>[^:=\s][^:=]*)' # very permissive!
r'\s*(?:' # any number of space/tab,
r'(?P<vi>[:=])\s*' # optionally followed by
# separator (either : or
# =), followed by any
# space/tab
r'(?P<value>.*))?$' # everything up to eol
)
def _read(self, fp, fpname):
"""Parse a sectioned setup file.
Each section in a setup file contains a title line at the top,
indicated by a name in square brackets (`[]'), plus key/value
option lines, indicated by `name: value' format lines.
Continuations are represented by an embedded newline then
leading whitespace. Blank lines, lines beginning with a '#',
and just about everything else are ignored.
"""
cursect = None # None, or a dictionary
optname = None
lineno = 0
e = None # None, or an exception
while True:
line = fp.readline()
if not line:
break
lineno = lineno + 1
# comment or blank line?
if line.strip() == '' or line[0] in '#;':
continue
if line.split(None, 1)[0].lower() == 'rem' and line[0] in "rR":
# no leading whitespace
continue
# continuation line?
if line[0].isspace() and cursect is not None and optname:
value = line.strip()
if value:
cursect[optname].append(value)
# a section header or option header?
else:
# is it a section header?
mo = self.SECTCRE.match(line)
if mo:
sectname = mo.group('header')
if sectname in self._sections:
cursect = self._sections[sectname]
elif sectname == DEFAULTSECT:
cursect = self._defaults
else:
cursect = self._dict()
cursect['__name__'] = sectname
self._sections[sectname] = cursect
# So sections can't start with a continuation line
optname = None
# no section header in the file?
elif cursect is None:
raise MissingSectionHeaderError(fpname, lineno, line)
# an option line?
else:
mo = self._optcre.match(line)
if mo:
optname, vi, optval = mo.group('option', 'vi', 'value')
optname = self.optionxform(optname.rstrip())
# This check is fine because the OPTCRE cannot
# match if it would set optval to None
if optval is not None:
if vi in ('=', ':') and ';' in optval:
# ';' is a comment delimiter only if it follows
# a spacing character
pos = optval.find(';')
if pos != -1 and optval[pos-1].isspace():
optval = optval[:pos]
optval = optval.strip()
# allow empty values
if optval == '""':
optval = ''
cursect[optname] = [optval]
else:
# valueless option handling
cursect[optname] = optval
else:
# a non-fatal parsing error occurred. set up the
# exception but keep going. the exception will be
# raised at the end of the file and will contain a
# list of all bogus lines
if not e:
e = ParsingError(fpname)
e.append(lineno, repr(line))
# if any parsing errors occurred, raise an exception
if e:
raise e
# join the multi-line values collected while reading
all_sections = [self._defaults]
all_sections.extend(self._sections.values())
for options in all_sections:
for name, val in options.items():
if isinstance(val, list):
options[name] = '\n'.join(val)
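# Editor's note: a minimal usage sketch, not part of the original source,
# feeding RawConfigParser from an in-memory file via readfp().
from StringIO import StringIO as _ExampleIO
_example_cfg = RawConfigParser()
_example_cfg.readfp(_ExampleIO("[server]\nhost: localhost\nport: 8080\n"))
assert _example_cfg.get('server', 'host') == 'localhost'
assert _example_cfg.getint('server', 'port') == 8080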
import UserDict as _UserDict
class _Chainmap(_UserDict.DictMixin):
"""Combine multiple mappings for successive lookups.
For example, to emulate Python's normal lookup sequence:
import __builtin__
pylookup = _Chainmap(locals(), globals(), vars(__builtin__))
"""
def __init__(self, *maps):
self._maps = maps
def __getitem__(self, key):
for mapping in self._maps:
try:
return mapping[key]
except KeyError:
pass
raise KeyError(key)
def keys(self):
result = []
seen = set()
for mapping in self._maps:
for key in mapping:
if key not in seen:
result.append(key)
seen.add(key)
return result
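# Editor's note: a small illustrative sketch, not part of the original
# source, showing _Chainmap's left-to-right lookup order.
_vm = {'answer': '42'}
_dm = {'answer': '0', 'greeting': 'hello'}
_chain = _Chainmap(_vm, _dm)
assert _chain['answer'] == '42'       # found in the first mapping
assert _chain['greeting'] == 'hello'  # falls through to the second
assert sorted(_chain.keys()) == ['answer', 'greeting']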
class ConfigParser(RawConfigParser):
def get(self, section, option, raw=False, vars=None):
"""Get an option value for a given section.
If `vars' is provided, it must be a dictionary. The option is looked up
in `vars' (if provided), `section', and in `defaults' in that order.
All % interpolations are expanded in the return values, unless the
optional argument `raw' is true. Values for interpolation keys are
looked up in the same manner as the option.
The section DEFAULT is special.
"""
sectiondict = {}
try:
sectiondict = self._sections[section]
except KeyError:
if section != DEFAULTSECT:
raise NoSectionError(section)
# Update with the entry specific variables
vardict = {}
if vars:
for key, value in vars.items():
vardict[self.optionxform(key)] = value
d = _Chainmap(vardict, sectiondict, self._defaults)
option = self.optionxform(option)
try:
value = d[option]
except KeyError:
raise NoOptionError(option, section)
if raw or value is None:
return value
else:
return self._interpolate(section, option, value, d)
def items(self, section, raw=False, vars=None):
"""Return a list of tuples with (name, value) for each option
in the section.
All % interpolations are expanded in the return values, based on the
defaults passed into the constructor, unless the optional argument
`raw' is true. Additional substitutions may be provided using the
`vars' argument, which must be a dictionary whose contents override
any pre-existing defaults.
The section DEFAULT is special.
"""
d = self._defaults.copy()
try:
d.update(self._sections[section])
except KeyError:
if section != DEFAULTSECT:
raise NoSectionError(section)
# Update with the entry specific variables
if vars:
for key, value in vars.items():
d[self.optionxform(key)] = value
options = d.keys()
if "__name__" in options:
options.remove("__name__")
if raw:
return [(option, d[option])
for option in options]
else:
return [(option, self._interpolate(section, option, d[option], d))
for option in options]
def _interpolate(self, section, option, rawval, vars):
# do the string interpolation
value = rawval
depth = MAX_INTERPOLATION_DEPTH
while depth: # Loop through this until it's done
depth -= 1
if value and "%(" in value:
value = self._KEYCRE.sub(self._interpolation_replace, value)
try:
value = value % vars
except KeyError, e:
raise InterpolationMissingOptionError(
option, section, rawval, e.args[0])
else:
break
if value and "%(" in value:
raise InterpolationDepthError(option, section, rawval)
return value
_KEYCRE = re.compile(r"%\(([^)]*)\)s|.")
def _interpolation_replace(self, match):
s = match.group(1)
if s is None:
return match.group()
else:
return "%%(%s)s" % self.optionxform(s)
class SafeConfigParser(ConfigParser):
def _interpolate(self, section, option, rawval, vars):
# do the string interpolation
L = []
self._interpolate_some(option, L, rawval, section, vars, 1)
return ''.join(L)
_interpvar_re = re.compile(r"%\(([^)]+)\)s")
def _interpolate_some(self, option, accum, rest, section, map, depth):
if depth > MAX_INTERPOLATION_DEPTH:
raise InterpolationDepthError(option, section, rest)
while rest:
p = rest.find("%")
if p < 0:
accum.append(rest)
return
if p > 0:
accum.append(rest[:p])
rest = rest[p:]
# p is no longer used
c = rest[1:2]
if c == "%":
accum.append("%")
rest = rest[2:]
elif c == "(":
m = self._interpvar_re.match(rest)
if m is None:
raise InterpolationSyntaxError(option, section,
"bad interpolation variable reference %r" % rest)
var = self.optionxform(m.group(1))
rest = rest[m.end():]
try:
v = map[var]
except KeyError:
raise InterpolationMissingOptionError(
option, section, rest, var)
if "%" in v:
self._interpolate_some(option, accum, v,
section, map, depth + 1)
else:
accum.append(v)
else:
raise InterpolationSyntaxError(
option, section,
"'%%' must be followed by '%%' or '(', found: %r" % (rest,))
def set(self, section, option, value=None):
"""Set an option. Extend ConfigParser.set: check for string values."""
# The only legal non-string value if we allow valueless
# options is None, so we need to check if the value is a
# string if:
# - we do not allow valueless options, or
# - we allow valueless options but the value is not None
if self._optcre is self.OPTCRE or value:
if not isinstance(value, basestring):
raise TypeError("option values must be strings")
if value is not None:
# check for bad percent signs:
# first, replace all "good" interpolations
tmp_value = value.replace('%%', '')
tmp_value = self._interpvar_re.sub('', tmp_value)
# then, check if there's a lone percent sign left
if '%' in tmp_value:
raise ValueError("invalid interpolation syntax in %r at "
"position %d" % (value, tmp_value.find('%')))
ConfigParser.set(self, section, option, value)
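# Editor's note: an illustrative sketch, not part of the original source.
# SafeConfigParser.set() rejects values whose '%' signs are not valid
# interpolation syntax.
_safe_cfg = SafeConfigParser()
_safe_cfg.add_section('demo')
_safe_cfg.set('demo', 'ok', '100%% done')       # escaped percent: accepted
try:
    _safe_cfg.set('demo', 'bad', '100% done')   # lone percent: rejected
except ValueError:
    pass
else:
    raise AssertionError('expected ValueError for invalid interpolation')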
"""Utilities for with-statement contexts. See PEP 343."""
import sys
from functools import wraps
from warnings import warn
__all__ = ["contextmanager", "nested", "closing"]
class GeneratorContextManager(object):
"""Helper for @contextmanager decorator."""
def __init__(self, gen):
self.gen = gen
def __enter__(self):
try:
return self.gen.next()
except StopIteration:
raise RuntimeError("generator didn't yield")
def __exit__(self, type, value, traceback):
if type is None:
try:
self.gen.next()
except StopIteration:
return
else:
raise RuntimeError("generator didn't stop")
else:
if value is None:
# Need to force instantiation so we can reliably
# tell if we get the same exception back
value = type()
try:
self.gen.throw(type, value, traceback)
raise RuntimeError("generator didn't stop after throw()")
except StopIteration, exc:
# Suppress the exception *unless* it's the same exception that
# was passed to throw(). This prevents a StopIteration
# raised inside the "with" statement from being suppressed
return exc is not value
except:
# only re-raise if it's *not* the exception that was
# passed to throw(), because __exit__() must not raise
# an exception unless __exit__() itself failed. But throw()
# has to raise the exception to signal propagation, so this
# fixes the impedance mismatch between the throw() protocol
# and the __exit__() protocol.
#
if sys.exc_info()[1] is not value:
raise
def contextmanager(func):
"""@contextmanager decorator.
Typical usage:
@contextmanager
def some_generator(<arguments>):
<setup>
try:
yield <value>
finally:
<cleanup>
This makes this:
with some_generator(<arguments>) as <variable>:
<body>
equivalent to this:
<setup>
try:
<variable> = <value>
<body>
finally:
<cleanup>
"""
@wraps(func)
def helper(*args, **kwds):
return GeneratorContextManager(func(*args, **kwds))
return helper
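# Editor's note: a minimal usage sketch, not part of the original source,
# for the decorator defined above.
@contextmanager
def _tagged(name):
    print "enter", name
    try:
        yield name
    finally:
        print "exit", name

with _tagged("demo") as _tag:
    assert _tag == "demo"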
@contextmanager
def nested(*managers):
"""Combine multiple context managers into a single nested context manager.
This function has been deprecated in favour of the multiple manager form
of the with statement.
The one advantage of this function over the multiple manager form of the
with statement is that argument unpacking allows it to be
used with a variable number of context managers as follows:
with nested(*managers):
do_something()
"""
warn("With-statements now directly support multiple context managers",
DeprecationWarning, 3)
exits = []
vars = []
exc = (None, None, None)
try:
for mgr in managers:
exit = mgr.__exit__
enter = mgr.__enter__
vars.append(enter())
exits.append(exit)
yield vars
except:
exc = sys.exc_info()
finally:
while exits:
exit = exits.pop()
try:
if exit(*exc):
exc = (None, None, None)
except:
exc = sys.exc_info()
if exc != (None, None, None):
# Don't rely on sys.exc_info() still containing
# the right information. Another exception may
# have been raised and caught by an exit method
raise exc[0], exc[1], exc[2]
class closing(object):
"""Context to automatically close something at the end of a block.
Code like this:
with closing(<module>.open(<arguments>)) as f:
<block>
is equivalent to this:
f = <module>.open(<arguments>)
try:
<block>
finally:
f.close()
"""
def __init__(self, thing):
self.thing = thing
def __enter__(self):
return self.thing
def __exit__(self, *exc_info):
self.thing.close()
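# Editor's note: a minimal usage sketch, not part of the original source;
# the StringIO object is guaranteed to be closed when the block exits.
from StringIO import StringIO as _ClosingIO
with closing(_ClosingIO('payload')) as _fp:
    assert _fp.read() == 'payload'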
"""Generic (shallow and deep) copying operations.
Interface summary:
import copy
x = copy.copy(y) # make a shallow copy of y
x = copy.deepcopy(y) # make a deep copy of y
For module specific errors, copy.Error is raised.
The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances).
- A shallow copy constructs a new compound object and then (to the
extent possible) inserts *the same objects* into it that the
original contains.
- A deep copy constructs a new compound object and then, recursively,
inserts *copies* into it of the objects found in the original.
Two problems often exist with deep copy operations that don't exist
with shallow copy operations:
a) recursive objects (compound objects that, directly or indirectly,
contain a reference to themselves) may cause a recursive loop
b) because deep copy copies *everything* it may copy too much, e.g.
administrative data structures that should be shared even between
copies
Python's deep copy operation avoids these problems by:
a) keeping a table of objects already copied during the current
copying pass
b) letting user-defined classes override the copying operation or the
set of components copied
This version does not copy types like module, class, function, method,
stack trace, stack frame, file, socket, window, array, or any
similar types.
Classes can use the same interfaces to control copying that they use
to control pickling: they can define methods called __getinitargs__(),
__getstate__() and __setstate__(). See the documentation for module
"pickle" for information on these methods.
"""
import types
import weakref
from copy_reg import dispatch_table
class Error(Exception):
pass
error = Error # backward compatibility
try:
from org.python.core import PyStringMap
except ImportError:
PyStringMap = None
__all__ = ["Error", "copy", "deepcopy"]
def copy(x):
"""Shallow copy operation on arbitrary Python objects.
See the module's __doc__ string for more info.
"""
cls = type(x)
copier = _copy_dispatch.get(cls)
if copier:
return copier(x)
copier = getattr(cls, "__copy__", None)
if copier:
return copier(x)
reductor = dispatch_table.get(cls)
if reductor:
rv = reductor(x)
else:
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
else:
raise Error("un(shallow)copyable object of type %s" % cls)
return _reconstruct(x, rv, 0)
_copy_dispatch = d = {}
def _copy_immutable(x):
return x
for t in (type(None), int, long, float, bool, str, tuple,
frozenset, type, xrange, types.ClassType,
types.BuiltinFunctionType, type(Ellipsis),
types.FunctionType, weakref.ref):
d[t] = _copy_immutable
for name in ("ComplexType", "UnicodeType", "CodeType"):
t = getattr(types, name, None)
if t is not None:
d[t] = _copy_immutable
def _copy_with_constructor(x):
return type(x)(x)
for t in (list, dict, set):
d[t] = _copy_with_constructor
def _copy_with_copy_method(x):
return x.copy()
if PyStringMap is not None:
d[PyStringMap] = _copy_with_copy_method
def _copy_inst(x):
if hasattr(x, '__copy__'):
return x.__copy__()
if hasattr(x, '__getinitargs__'):
args = x.__getinitargs__()
y = x.__class__(*args)
else:
y = _EmptyClass()
y.__class__ = x.__class__
if hasattr(x, '__getstate__'):
state = x.__getstate__()
else:
state = x.__dict__
if hasattr(y, '__setstate__'):
y.__setstate__(state)
else:
y.__dict__.update(state)
return y
d[types.InstanceType] = _copy_inst
del d
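# Editor's note: an illustrative sketch, not part of the original source.
# A shallow copy duplicates the outer list but shares the inner objects.
_orig = [[1, 2], [3, 4]]
_shallow = copy(_orig)
assert _shallow is not _orig
assert _shallow[0] is _orig[0]   # inner lists are shared, not copied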
def deepcopy(x, memo=None, _nil=[]):
"""Deep copy operation on arbitrary Python objects.
See the module's __doc__ string for more info.
"""
if memo is None:
memo = {}
d = id(x)
y = memo.get(d, _nil)
if y is not _nil:
return y
cls = type(x)
copier = _deepcopy_dispatch.get(cls)
if copier:
y = copier(x, memo)
else:
try:
issc = issubclass(cls, type)
except TypeError: # cls is not a class (old Boost; see SF #502085)
issc = 0
if issc:
y = _deepcopy_atomic(x, memo)
else:
copier = getattr(x, "__deepcopy__", None)
if copier:
y = copier(memo)
else:
reductor = dispatch_table.get(cls)
if reductor:
rv = reductor(x)
else:
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
else:
raise Error(
"un(deep)copyable object of type %s" % cls)
y = _reconstruct(x, rv, 1, memo)
memo[d] = y
_keep_alive(x, memo) # Make sure x lives at least as long as d
return y
_deepcopy_dispatch = d = {}
def _deepcopy_atomic(x, memo):
return x
d[type(None)] = _deepcopy_atomic
d[type(Ellipsis)] = _deepcopy_atomic
d[int] = _deepcopy_atomic
d[long] = _deepcopy_atomic
d[float] = _deepcopy_atomic
d[bool] = _deepcopy_atomic
try:
d[complex] = _deepcopy_atomic
except NameError:
pass
d[str] = _deepcopy_atomic
try:
d[unicode] = _deepcopy_atomic
except NameError:
pass
try:
d[types.CodeType] = _deepcopy_atomic
except AttributeError:
pass
d[type] = _deepcopy_atomic
d[xrange] = _deepcopy_atomic
d[types.ClassType] = _deepcopy_atomic
d[types.BuiltinFunctionType] = _deepcopy_atomic
d[types.FunctionType] = _deepcopy_atomic
d[weakref.ref] = _deepcopy_atomic
def _deepcopy_list(x, memo):
y = []
memo[id(x)] = y
for a in x:
y.append(deepcopy(a, memo))
return y
d[list] = _deepcopy_list
def _deepcopy_tuple(x, memo):
y = []
for a in x:
y.append(deepcopy(a, memo))
d = id(x)
try:
return memo[d]
except KeyError:
pass
for i in range(len(x)):
if x[i] is not y[i]:
y = tuple(y)
break
else:
y = x
memo[d] = y
return y
d[tuple] = _deepcopy_tuple
def _deepcopy_dict(x, memo):
y = {}
memo[id(x)] = y
for key, value in x.iteritems():
y[deepcopy(key, memo)] = deepcopy(value, memo)
return y
d[dict] = _deepcopy_dict
if PyStringMap is not None:
d[PyStringMap] = _deepcopy_dict
def _deepcopy_method(x, memo): # Copy instance methods
return type(x)(x.im_func, deepcopy(x.im_self, memo), x.im_class)
_deepcopy_dispatch[types.MethodType] = _deepcopy_method
def _keep_alive(x, memo):
"""Keeps a reference to the object x in the memo.
Because we remember objects by their id, we have
to ensure that possibly temporary objects are kept
alive by referencing them.
We store a reference at the id of the memo, which should
normally not be used unless someone tries to deepcopy
the memo itself...
"""
try:
memo[id(memo)].append(x)
except KeyError:
# aha, this is the first one :-)
memo[id(memo)]=[x]
def _deepcopy_inst(x, memo):
if hasattr(x, '__deepcopy__'):
return x.__deepcopy__(memo)
if hasattr(x, '__getinitargs__'):
args = x.__getinitargs__()
args = deepcopy(args, memo)
y = x.__class__(*args)
else:
y = _EmptyClass()
y.__class__ = x.__class__
memo[id(x)] = y
if hasattr(x, '__getstate__'):
state = x.__getstate__()
else:
state = x.__dict__
state = deepcopy(state, memo)
if hasattr(y, '__setstate__'):
y.__setstate__(state)
else:
y.__dict__.update(state)
return y
d[types.InstanceType] = _deepcopy_inst
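# Editor's note: an illustrative sketch, not part of the original source.
# The memo table lets deepcopy handle self-referential structures without
# recursing forever.
_node = [1, 2]
_node.append(_node)          # a list that contains itself
_clone = deepcopy(_node)
assert _clone is not _node
assert _clone[2] is _clone   # the cycle is reproduced in the copy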
def _reconstruct(x, info, deep, memo=None):
if isinstance(info, str):
return x
assert isinstance(info, tuple)
if memo is None:
memo = {}
n = len(info)
assert n in (2, 3, 4, 5)
callable, args = info[:2]
if n > 2:
state = info[2]
else:
state = None
if n > 3:
listiter = info[3]
else:
listiter = None
if n > 4:
dictiter = info[4]
else:
dictiter = None
if deep:
args = deepcopy(args, memo)
y = callable(*args)
memo[id(x)] = y
if state is not None:
if deep:
state = deepcopy(state, memo)
if hasattr(y, '__setstate__'):
y.__setstate__(state)
else:
if isinstance(state, tuple) and len(state) == 2:
state, slotstate = state
else:
slotstate = None
if state is not None:
y.__dict__.update(state)
if slotstate is not None:
for key, value in slotstate.iteritems():
setattr(y, key, value)
if listiter is not None:
for item in listiter:
if deep:
item = deepcopy(item, memo)
y.append(item)
if dictiter is not None:
for key, value in dictiter:
if deep:
key = deepcopy(key, memo)
value = deepcopy(value, memo)
y[key] = value
return y
del d
del types
# Helper for instance creation without calling __init__
class _EmptyClass:
pass
def _test():
l = [None, 1, 2L, 3.14, 'xyzzy', (1, 2L), [3.14, 'abc'],
{'abc': 'ABC'}, (), [], {}]
l1 = copy(l)
print l1==l
l1 = map(copy, l)
print l1==l
l1 = deepcopy(l)
print l1==l
class C:
def __init__(self, arg=None):
self.a = 1
self.arg = arg
if __name__ == '__main__':
import sys
file = sys.argv[0]
else:
file = __file__
self.fp = open(file)
self.fp.close()
def __getstate__(self):
return {'a': self.a, 'arg': self.arg}
def __setstate__(self, state):
for key, value in state.iteritems():
setattr(self, key, value)
def __deepcopy__(self, memo=None):
new = self.__class__(deepcopy(self.arg, memo))
new.a = self.a
return new
c = C('argument sketch')
l.append(c)
l2 = copy(l)
print l == l2
print l
print l2
l2 = deepcopy(l)
print l == l2
print l
print l2
l.append({l[1]: l, 'xyz': l[2]})
l3 = copy(l)
import repr
print map(repr.repr, l)
print map(repr.repr, l1)
print map(repr.repr, l2)
print map(repr.repr, l3)
l3 = deepcopy(l)
import repr
print map(repr.repr, l)
print map(repr.repr, l1)
print map(repr.repr, l2)
print map(repr.repr, l3)
class odict(dict):
def __init__(self, d = {}):
self.a = 99
dict.__init__(self, d)
def __setitem__(self, k, i):
dict.__setitem__(self, k, i)
self.a
o = odict({"A" : "B"})
x = deepcopy(o)
print(o, x)
if __name__ == '__main__':
_test()
"""
csv.py - read/write/investigate CSV files
"""
import re
from functools import reduce
from _csv import Error, __version__, writer, reader, register_dialect, \
unregister_dialect, get_dialect, list_dialects, \
field_size_limit, \
QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE, \
__doc__
from _csv import Dialect as _Dialect
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
__all__ = [ "QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE",
"Error", "Dialect", "__doc__", "excel", "excel_tab",
"field_size_limit", "reader", "writer",
"register_dialect", "get_dialect", "list_dialects", "Sniffer",
"unregister_dialect", "__version__", "DictReader", "DictWriter" ]
class Dialect:
"""Describe an Excel dialect.
This must be subclassed (see csv.excel). Valid attributes are:
delimiter, quotechar, escapechar, doublequote, skipinitialspace,
lineterminator, quoting.
"""
_name = ""
_valid = False
# placeholders
delimiter = None
quotechar = None
escapechar = None
doublequote = None
skipinitialspace = None
lineterminator = None
quoting = None
def __init__(self):
if self.__class__ != Dialect:
self._valid = True
self._validate()
def _validate(self):
try:
_Dialect(self)
except TypeError, e:
# We do this for compatibility with py2.3
raise Error(str(e))
class excel(Dialect):
"""Describe the usual properties of Excel-generated CSV files."""
delimiter = ','
quotechar = '"'
doublequote = True
skipinitialspace = False
lineterminator = '\r\n'
quoting = QUOTE_MINIMAL
register_dialect("excel", excel)
class excel_tab(excel):
"""Describe the usual properties of Excel-generated TAB-delimited files."""
delimiter = '\t'
register_dialect("excel-tab", excel_tab)
class DictReader:
def __init__(self, f, fieldnames=None, restkey=None, restval=None,
dialect="excel", *args, **kwds):
self._fieldnames = fieldnames # list of keys for the dict
self.restkey = restkey # key to catch long rows
self.restval = restval # default value for short rows
self.reader = reader(f, dialect, *args, **kwds)
self.dialect = dialect
self.line_num = 0
def __iter__(self):
return self
@property
def fieldnames(self):
if self._fieldnames is None:
try:
self._fieldnames = self.reader.next()
except StopIteration:
pass
self.line_num = self.reader.line_num
return self._fieldnames
# Issue 20004: Because DictReader is a classic class, this setter is
# ignored. At this point in 2.7's lifecycle, it is too late to change the
# base class for fear of breaking working code. If you want to change
# fieldnames without overwriting the getter, set _fieldnames directly.
@fieldnames.setter
def fieldnames(self, value):
self._fieldnames = value
def next(self):
if self.line_num == 0:
# Used only for its side effect.
self.fieldnames
row = self.reader.next()
self.line_num = self.reader.line_num
# unlike the basic reader, we prefer not to return blanks,
# because we will typically wind up with a dict full of None
# values
while row == []:
row = self.reader.next()
d = dict(zip(self.fieldnames, row))
lf = len(self.fieldnames)
lr = len(row)
if lf < lr:
d[self.restkey] = row[lf:]
elif lf > lr:
for key in self.fieldnames[lr:]:
d[key] = self.restval
return d
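# Editor's note: a minimal usage sketch, not part of the original source.
from StringIO import StringIO as _CsvIO
_dict_reader = DictReader(_CsvIO("name,age\r\nada,36\r\n"))
_row = _dict_reader.next()
assert _row == {'name': 'ada', 'age': '36'}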
class DictWriter:
def __init__(self, f, fieldnames, restval="", extrasaction="raise",
dialect="excel", *args, **kwds):
self.fieldnames = fieldnames # list of keys for the dict
self.restval = restval # for writing short dicts
if extrasaction.lower() not in ("raise", "ignore"):
raise ValueError, \
("extrasaction (%s) must be 'raise' or 'ignore'" %
extrasaction)
self.extrasaction = extrasaction
self.writer = writer(f, dialect, *args, **kwds)
def writeheader(self):
header = dict(zip(self.fieldnames, self.fieldnames))
self.writerow(header)
def _dict_to_list(self, rowdict):
if self.extrasaction == "raise":
wrong_fields = [k for k in rowdict if k not in self.fieldnames]
if wrong_fields:
raise ValueError("dict contains fields not in fieldnames: "
+ ", ".join([repr(x) for x in wrong_fields]))
return [rowdict.get(key, self.restval) for key in self.fieldnames]
def writerow(self, rowdict):
return self.writer.writerow(self._dict_to_list(rowdict))
def writerows(self, rowdicts):
rows = []
for rowdict in rowdicts:
rows.append(self._dict_to_list(rowdict))
return self.writer.writerows(rows)
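# Editor's note: a minimal usage sketch, not part of the original source.
from StringIO import StringIO as _OutIO
_buf = _OutIO()
_dict_writer = DictWriter(_buf, fieldnames=['name', 'age'])
_dict_writer.writeheader()
_dict_writer.writerow({'name': 'ada', 'age': 36})
assert _buf.getvalue() == 'name,age\r\nada,36\r\n'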
# Guard Sniffer's type checking against builds that exclude complex()
try:
complex
except NameError:
complex = float
class Sniffer:
'''
"Sniffs" the format of a CSV file (i.e. delimiter, quotechar)
Returns a Dialect object.
'''
def __init__(self):
# in case there is more than one possible delimiter
self.preferred = [',', '\t', ';', ' ', ':']
def sniff(self, sample, delimiters=None):
"""
Returns a dialect (or None) corresponding to the sample
"""
quotechar, doublequote, delimiter, skipinitialspace = \
self._guess_quote_and_delimiter(sample, delimiters)
if not delimiter:
delimiter, skipinitialspace = self._guess_delimiter(sample,
delimiters)
if not delimiter:
raise Error, "Could not determine delimiter"
class dialect(Dialect):
_name = "sniffed"
lineterminator = '\r\n'
quoting = QUOTE_MINIMAL
# escapechar = ''
dialect.doublequote = doublequote
dialect.delimiter = delimiter
# _csv.reader won't accept a quotechar of ''
dialect.quotechar = quotechar or '"'
dialect.skipinitialspace = skipinitialspace
return dialect
def _guess_quote_and_delimiter(self, data, delimiters):
"""
Looks for text enclosed between two identical quotes
(the probable quotechar) which are preceded and followed
by the same character (the probable delimiter).
For example:
,'some text',
The quote character that appears most often wins; likewise for the delimiter.
If there is no quotechar the delimiter can't be determined
this way.
"""
matches = []
for restr in ('(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?P<delim>[^\w\n"\'])(?P<space> ?)', # ".*?",
'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)', # ,".*?"
'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'): # ".*?" (no delim, no space)
regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
matches = regexp.findall(data)
if matches:
break
if not matches:
# (quotechar, doublequote, delimiter, skipinitialspace)
return ('', False, None, 0)
quotes = {}
delims = {}
spaces = 0
for m in matches:
n = regexp.groupindex['quote'] - 1
key = m[n]
if key:
quotes[key] = quotes.get(key, 0) + 1
try:
n = regexp.groupindex['delim'] - 1
key = m[n]
except KeyError:
continue
if key and (delimiters is None or key in delimiters):
delims[key] = delims.get(key, 0) + 1
try:
n = regexp.groupindex['space'] - 1
except KeyError:
continue
if m[n]:
spaces += 1
quotechar = reduce(lambda a, b, quotes = quotes:
(quotes[a] > quotes[b]) and a or b, quotes.keys())
if delims:
delim = reduce(lambda a, b, delims = delims:
(delims[a] > delims[b]) and a or b, delims.keys())
skipinitialspace = delims[delim] == spaces
if delim == '\n': # most likely a file with a single column
delim = ''
else:
# there is *no* delimiter, it's a single column of quoted data
delim = ''
skipinitialspace = 0
# if we see an extra quote between delimiters, we've got a
# double quoted format
dq_regexp = re.compile(
r"((%(delim)s)|^)\W*%(quote)s[^%(delim)s\n]*%(quote)s[^%(delim)s\n]*%(quote)s\W*((%(delim)s)|$)" % \
{'delim':re.escape(delim), 'quote':quotechar}, re.MULTILINE)
if dq_regexp.search(data):
doublequote = True
else:
doublequote = False
return (quotechar, doublequote, delim, skipinitialspace)
def _guess_delimiter(self, data, delimiters):
"""
The delimiter /should/ occur the same number of times on
each row. However, due to malformed data, it may not. We don't want
an all or nothing approach, so we allow for small variations in this
number.
1) build a table of the frequency of each character on every line.
2) build a table of frequencies of this frequency (meta-frequency?),
e.g. 'x occurred 5 times in 10 rows, 6 times in 1000 rows,
7 times in 2 rows'
3) use the mode of the meta-frequency to determine the /expected/
frequency for that character
4) find out how often the character actually meets that goal
5) the character that best meets its goal is the delimiter
For performance reasons, the data is evaluated in chunks, so it can
try to evaluate the smallest portion of the data possible, evaluating
additional chunks as necessary.
"""
data = filter(None, data.split('\n'))
ascii = [chr(c) for c in range(127)] # 7-bit ASCII
# build frequency tables
chunkLength = min(10, len(data))
iteration = 0
charFrequency = {}
modes = {}
delims = {}
start, end = 0, min(chunkLength, len(data))
while start < len(data):
iteration += 1
for line in data[start:end]:
for char in ascii:
metaFrequency = charFrequency.get(char, {})
# must count even if frequency is 0
freq = line.count(char)
# value is the mode
metaFrequency[freq] = metaFrequency.get(freq, 0) + 1
charFrequency[char] = metaFrequency
for char in charFrequency.keys():
items = charFrequency[char].items()
if len(items) == 1 and items[0][0] == 0:
continue
# get the mode of the frequencies
if len(items) > 1:
modes[char] = reduce(lambda a, b: a[1] > b[1] and a or b,
items)
# adjust the mode - subtract the sum of all
# other frequencies
items.remove(modes[char])
modes[char] = (modes[char][0], modes[char][1]
- reduce(lambda a, b: (0, a[1] + b[1]),
items)[1])
else:
modes[char] = items[0]
# build a list of possible delimiters
modeList = modes.items()
total = float(chunkLength * iteration)
# (rows of consistent data) / (number of rows) = 100%
consistency = 1.0
# minimum consistency threshold
threshold = 0.9
while len(delims) == 0 and consistency >= threshold:
for k, v in modeList:
if v[0] > 0 and v[1] > 0:
if ((v[1]/total) >= consistency and
(delimiters is None or k in delimiters)):
delims[k] = v
consistency -= 0.01
if len(delims) == 1:
delim = delims.keys()[0]
skipinitialspace = (data[0].count(delim) ==
data[0].count("%c " % delim))
return (delim, skipinitialspace)
# analyze another chunkLength lines
start = end
end += chunkLength
if not delims:
return ('', 0)
# if there's more than one, fall back to a 'preferred' list
if len(delims) > 1:
for d in self.preferred:
if d in delims.keys():
skipinitialspace = (data[0].count(d) ==
data[0].count("%c " % d))
return (d, skipinitialspace)
# nothing else indicates a preference, pick the character that
# dominates(?)
items = [(v,k) for (k,v) in delims.items()]
items.sort()
delim = items[-1][1]
skipinitialspace = (data[0].count(delim) ==
data[0].count("%c " % delim))
return (delim, skipinitialspace)
def has_header(self, sample):
# Creates a dictionary of types of data in each column. If any
# column is of a single type (say, integers), *except* for the first
# row, then the first row is presumed to be labels. If the type
# can't be determined, it is assumed to be a string in which case
# the length of the string is the determining factor: if all of the
# rows except for the first are the same length, it's a header.
# Finally, a 'vote' is taken at the end for each column, adding or
# subtracting from the likelihood of the first row being a header.
rdr = reader(StringIO(sample), self.sniff(sample))
header = rdr.next() # assume first row is header
columns = len(header)
columnTypes = {}
for i in range(columns): columnTypes[i] = None
checked = 0
for row in rdr:
# arbitrary number of rows to check, to keep it sane
if checked > 20:
break
checked += 1
if len(row) != columns:
continue # skip rows that have irregular number of columns
for col in columnTypes.keys():
for thisType in [int, long, float, complex]:
try:
thisType(row[col])
break
except (ValueError, OverflowError):
pass
else:
# fallback to length of string
thisType = len(row[col])
# treat longs as ints
if thisType == long:
thisType = int
if thisType != columnTypes[col]:
if columnTypes[col] is None: # add new column type
columnTypes[col] = thisType
else:
# type is inconsistent, remove column from
# consideration
del columnTypes[col]
# finally, compare results against first row and "vote"
# on whether it's a header
hasHeader = 0
for col, colType in columnTypes.items():
if type(colType) == type(0): # it's a length
if len(header[col]) != colType:
hasHeader += 1
else:
hasHeader -= 1
else: # attempt typecast
try:
colType(header[col])
except (ValueError, TypeError):
hasHeader += 1
else:
hasHeader -= 1
return hasHeader > 0
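# Editor's note: an illustrative sketch, not part of the original source.
# Sniffer deduces the delimiter from a sample and votes on a header row.
_sample = "name;age\r\nada;36\r\ngrace;45\r\n"
_sniffer = Sniffer()
_dialect = _sniffer.sniff(_sample)
assert _dialect.delimiter == ';'
assert _sniffer.has_header(_sample)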
"""
dyld emulation
"""
import os
from framework import framework_info
from dylib import dylib_info
from itertools import *
__all__ = [
'dyld_find', 'framework_find',
'framework_info', 'dylib_info',
]
# These are the defaults as per man dyld(1)
#
DEFAULT_FRAMEWORK_FALLBACK = [
os.path.expanduser("~/Library/Frameworks"),
"/Library/Frameworks",
"/Network/Library/Frameworks",
"/System/Library/Frameworks",
]
DEFAULT_LIBRARY_FALLBACK = [
os.path.expanduser("~/lib"),
"/usr/local/lib",
"/lib",
"/usr/lib",
]
def ensure_utf8(s):
"""Not all of PyObjC and Python understand unicode paths very well yet"""
if isinstance(s, unicode):
return s.encode('utf8')
return s
def dyld_env(env, var):
if env is None:
env = os.environ
rval = env.get(var)
if rval is None:
return []
return rval.split(':')
def dyld_image_suffix(env=None):
if env is None:
env = os.environ
return env.get('DYLD_IMAGE_SUFFIX')
def dyld_framework_path(env=None):
return dyld_env(env, 'DYLD_FRAMEWORK_PATH')
def dyld_library_path(env=None):
return dyld_env(env, 'DYLD_LIBRARY_PATH')
def dyld_fallback_framework_path(env=None):
return dyld_env(env, 'DYLD_FALLBACK_FRAMEWORK_PATH')
def dyld_fallback_library_path(env=None):
return dyld_env(env, 'DYLD_FALLBACK_LIBRARY_PATH')
def dyld_image_suffix_search(iterator, env=None):
"""For a potential path iterator, add DYLD_IMAGE_SUFFIX semantics"""
suffix = dyld_image_suffix(env)
if suffix is None:
return iterator
def _inject(iterator=iterator, suffix=suffix):
for path in iterator:
if path.endswith('.dylib'):
yield path[:-len('.dylib')] + suffix + '.dylib'
else:
yield path + suffix
yield path
return _inject()
def dyld_override_search(name, env=None):
# If DYLD_FRAMEWORK_PATH is set and this dylib_name is a
# framework name, use the first file that exists in the framework
# path if any. If there is none go on to search the DYLD_LIBRARY_PATH
# if any.
framework = framework_info(name)
if framework is not None:
for path in dyld_framework_path(env):
yield os.path.join(path, framework['name'])
# If DYLD_LIBRARY_PATH is set then use the first file that exists
# in the path. If none use the original name.
for path in dyld_library_path(env):
yield os.path.join(path, os.path.basename(name))
def dyld_executable_path_search(name, executable_path=None):
# If we haven't done any searching and found a library and the
# dylib_name starts with "@executable_path/" then construct the
# library name.
if name.startswith('@executable_path/') and executable_path is not None:
yield os.path.join(executable_path, name[len('@executable_path/'):])
def dyld_default_search(name, env=None):
yield name
framework = framework_info(name)
if framework is not None:
fallback_framework_path = dyld_fallback_framework_path(env)
for path in fallback_framework_path:
yield os.path.join(path, framework['name'])
fallback_library_path = dyld_fallback_library_path(env)
for path in fallback_library_path:
yield os.path.join(path, os.path.basename(name))
if framework is not None and not fallback_framework_path:
for path in DEFAULT_FRAMEWORK_FALLBACK:
yield os.path.join(path, framework['name'])
if not fallback_library_path:
for path in DEFAULT_LIBRARY_FALLBACK:
yield os.path.join(path, os.path.basename(name))
def dyld_find(name, executable_path=None, env=None):
"""
Find a library or framework using dyld semantics
"""
name = ensure_utf8(name)
executable_path = ensure_utf8(executable_path)
for path in dyld_image_suffix_search(chain(
dyld_override_search(name, env),
dyld_executable_path_search(name, executable_path),
dyld_default_search(name, env),
), env):
if os.path.isfile(path):
return path
raise ValueError("dylib %s could not be found" % (name,))
def framework_find(fn, executable_path=None, env=None):
"""
Find a framework using dyld semantics in a very loose manner.
Will take input such as:
Python
Python.framework
Python.framework/Versions/Current
"""
try:
return dyld_find(fn, executable_path=executable_path, env=env)
except ValueError, e:
pass
fmwk_index = fn.rfind('.framework')
if fmwk_index == -1:
fmwk_index = len(fn)
fn += '.framework'
fn = os.path.join(fn, os.path.basename(fn[:fmwk_index]))
try:
return dyld_find(fn, executable_path=executable_path, env=env)
except ValueError:
raise e
def test_dyld_find():
env = {}
assert dyld_find('libSystem.dylib') == '/usr/lib/libSystem.dylib'
assert dyld_find('System.framework/System') == '/System/Library/Frameworks/System.framework/System'
if __name__ == '__main__':
test_dyld_find()
"""
Generic dylib path manipulation
"""
import re
__all__ = ['dylib_info']
DYLIB_RE = re.compile(r"""(?x)
(?P<location>^.*)(?:^|/)
(?P<name>
(?P<shortname>\w+?)
(?:\.(?P<version>[^._]+))?
(?:_(?P<suffix>[^._]+))?
\.dylib$
)
""")
def dylib_info(filename):
"""
A dylib name can take one of the following four forms:
Location/Name.SomeVersion_Suffix.dylib
Location/Name.SomeVersion.dylib
Location/Name_Suffix.dylib
Location/Name.dylib
returns None if not found or a mapping equivalent to:
dict(
location='Location',
name='Name.SomeVersion_Suffix.dylib',
shortname='Name',
version='SomeVersion',
suffix='Suffix',
)
Note that SomeVersion and Suffix are optional and may be None
if not present.
"""
is_dylib = DYLIB_RE.match(filename)
if not is_dylib:
return None
return is_dylib.groupdict()
def test_dylib_info():
def d(location=None, name=None, shortname=None, version=None, suffix=None):
return dict(
location=location,
name=name,
shortname=shortname,
version=version,
suffix=suffix
)
assert dylib_info('completely/invalid') is None
assert dylib_info('completely/invalide_debug') is None
assert dylib_info('P/Foo.dylib') == d('P', 'Foo.dylib', 'Foo')
assert dylib_info('P/Foo_debug.dylib') == d('P', 'Foo_debug.dylib', 'Foo', suffix='debug')
assert dylib_info('P/Foo.A.dylib') == d('P', 'Foo.A.dylib', 'Foo', 'A')
assert dylib_info('P/Foo_debug.A.dylib') == d('P', 'Foo_debug.A.dylib', 'Foo_debug', 'A')
assert dylib_info('P/Foo.A_debug.dylib') == d('P', 'Foo.A_debug.dylib', 'Foo', 'A', 'debug')
if __name__ == '__main__':
test_dylib_info()
"""
Generic framework path manipulation
"""
import re
__all__ = ['framework_info']
STRICT_FRAMEWORK_RE = re.compile(r"""(?x)
(?P<location>^.*)(?:^|/)
(?P<name>
(?P<shortname>\w+).framework/
(?:Versions/(?P<version>[^/]+)/)?
(?P=shortname)
(?:_(?P<suffix>[^_]+))?
)$
""")
def framework_info(filename):
"""
A framework name can take one of the following four forms:
Location/Name.framework/Versions/SomeVersion/Name_Suffix
Location/Name.framework/Versions/SomeVersion/Name
Location/Name.framework/Name_Suffix
Location/Name.framework/Name
returns None if not found, or a mapping equivalent to:
dict(
location='Location',
name='Name.framework/Versions/SomeVersion/Name_Suffix',
shortname='Name',
version='SomeVersion',
suffix='Suffix',
)
Note that SomeVersion and Suffix are optional and may be None
if not present.
"""
is_framework = STRICT_FRAMEWORK_RE.match(filename)
if not is_framework:
return None
return is_framework.groupdict()
def test_framework_info():
def d(location=None, name=None, shortname=None, version=None, suffix=None):
return dict(
location=location,
name=name,
shortname=shortname,
version=version,
suffix=suffix
)
assert framework_info('completely/invalid') is None
assert framework_info('completely/invalid/_debug') is None
assert framework_info('P/F.framework') is None
assert framework_info('P/F.framework/_debug') is None
assert framework_info('P/F.framework/F') == d('P', 'F.framework/F', 'F')
assert framework_info('P/F.framework/F_debug') == d('P', 'F.framework/F_debug', 'F', suffix='debug')
assert framework_info('P/F.framework/Versions') is None
assert framework_info('P/F.framework/Versions/A') is None
assert framework_info('P/F.framework/Versions/A/F') == d('P', 'F.framework/Versions/A/F', 'F', 'A')
assert framework_info('P/F.framework/Versions/A/F_debug') == d('P', 'F.framework/Versions/A/F_debug', 'F', 'A', 'debug')
if __name__ == '__main__':
test_framework_info()
"""
Enough Mach-O to make your head spin.
See the relevant header files in /usr/include/mach-o, and also
Apple's documentation.
"""
__version__ = '1.0'
import os
import subprocess
import sys
# find_library(name) returns the pathname of a library, or None.
if os.name == "nt":
def _get_build_version():
"""Return the version of MSVC that was used to build Python.
For Python 2.3 and up, the version number is included in
sys.version. For earlier versions, assume the compiler is MSVC 6.
"""
# This function was copied from Lib/distutils/msvccompiler.py
prefix = "MSC v."
i = sys.version.find(prefix)
if i == -1:
return 6
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0
# I don't think paths are affected by minor version in version 6
if majorVersion == 6:
minorVersion = 0
if majorVersion >= 6:
return majorVersion + minorVersion
# else we don't know what version of the compiler this is
return None
def find_msvcrt():
"""Return the name of the VC runtime dll"""
version = _get_build_version()
if version is None:
# better be safe than sorry
return None
if version <= 6:
clibname = 'msvcrt'
else:
clibname = 'msvcr%d' % (version * 10)
# If Python was built in debug mode, use the debug version of the runtime
import imp
if imp.get_suffixes()[0][0] == '_d.pyd':
clibname += 'd'
return clibname+'.dll'
def find_library(name):
if name in ('c', 'm'):
return find_msvcrt()
# See MSDN for the REAL search order.
for directory in os.environ['PATH'].split(os.pathsep):
fname = os.path.join(directory, name)
if os.path.isfile(fname):
return fname
if fname.lower().endswith(".dll"):
continue
fname = fname + ".dll"
if os.path.isfile(fname):
return fname
return None
if os.name == "ce":
# search path according to MSDN:
# - absolute path specified by filename
# - The .exe launch directory
# - the Windows directory
# - ROM dll files (where are they?)
# - OEM specified search path: HKLM\Loader\SystemPath
def find_library(name):
return name
if os.name == "posix" and sys.platform == "darwin":
from ctypes.macholib.dyld import dyld_find as _dyld_find
def find_library(name):
possible = ['lib%s.dylib' % name,
'%s.dylib' % name,
'%s.framework/%s' % (name, name)]
for name in possible:
try:
return _dyld_find(name)
except ValueError:
continue
return None
elif os.name == "posix":
# Andreas Degert's find functions, using gcc, /sbin/ldconfig, objdump
import re, tempfile, errno
def _findLib_gcc(name):
# Run GCC's linker with the -t (aka --trace) option and examine the
# library name it prints out. The GCC command will fail because we
# haven't supplied a proper program with main(), but that does not
# matter.
expr = r'[^\(\)\s]*lib%s\.[^\(\)\s]*' % re.escape(name)
cmd = 'if type gcc >/dev/null 2>&1; then CC=gcc; elif type cc >/dev/null 2>&1; then CC=cc;else exit; fi;' \
'LANG=C LC_ALL=C $CC -Wl,-t -o "$2" 2>&1 -l"$1"'
temp = tempfile.NamedTemporaryFile()
try:
proc = subprocess.Popen((cmd, '_findLib_gcc', name, temp.name),
shell=True,
stdout=subprocess.PIPE)
[trace, _] = proc.communicate()
finally:
try:
temp.close()
except OSError, e:
# ENOENT is raised if the file was already removed, which is
# the normal behaviour of GCC if linking fails
if e.errno != errno.ENOENT:
raise
res = re.search(expr, trace)
if not res:
return None
return res.group(0)
if sys.platform == "sunos5":
# use /usr/ccs/bin/dump on solaris
def _get_soname(f):
if not f:
return None
null = open(os.devnull, "wb")
try:
with null:
proc = subprocess.Popen(("/usr/ccs/bin/dump", "-Lpv", f),
stdout=subprocess.PIPE,
stderr=null)
except OSError: # E.g. command not found
return None
[data, _] = proc.communicate()
res = re.search(br'\[.*\]\sSONAME\s+([^\s]+)', data)
if not res:
return None
return res.group(1)
else:
def _get_soname(f):
# assuming GNU binutils / ELF
if not f:
return None
cmd = 'if ! type objdump >/dev/null 2>&1; then exit; fi;' \
'objdump -p -j .dynamic 2>/dev/null "$1"'
proc = subprocess.Popen((cmd, '_get_soname', f), shell=True,
stdout=subprocess.PIPE)
[dump, _] = proc.communicate()
res = re.search(br'\sSONAME\s+([^\s]+)', dump)
if not res:
return None
return res.group(1)
if (sys.platform.startswith("freebsd")
or sys.platform.startswith("openbsd")
or sys.platform.startswith("dragonfly")):
def _num_version(libname):
# "libxyz.so.MAJOR.MINOR" => [ MAJOR, MINOR ]
parts = libname.split(b".")
nums = []
try:
while parts:
nums.insert(0, int(parts.pop()))
except ValueError:
pass
return nums or [sys.maxint]
def find_library(name):
ename = re.escape(name)
expr = r':-l%s\.\S+ => \S*/(lib%s\.\S+)' % (ename, ename)
null = open(os.devnull, 'wb')
try:
with null:
proc = subprocess.Popen(('/sbin/ldconfig', '-r'),
stdout=subprocess.PIPE,
stderr=null)
except OSError: # E.g. command not found
data = b''
else:
[data, _] = proc.communicate()
res = re.findall(expr, data)
if not res:
return _get_soname(_findLib_gcc(name))
res.sort(key=_num_version)
return res[-1]
elif sys.platform == "sunos5":
def _findLib_crle(name, is64):
if not os.path.exists('/usr/bin/crle'):
return None
env = dict(os.environ)
env['LC_ALL'] = 'C'
if is64:
args = ('/usr/bin/crle', '-64')
else:
args = ('/usr/bin/crle',)
paths = None
null = open(os.devnull, 'wb')
try:
with null:
proc = subprocess.Popen(args,
stdout=subprocess.PIPE,
stderr=null,
env=env)
except OSError: # E.g. bad executable
return None
try:
for line in proc.stdout:
line = line.strip()
if line.startswith(b'Default Library Path (ELF):'):
paths = line.split()[4]
finally:
proc.stdout.close()
proc.wait()
if not paths:
return None
for dir in paths.split(":"):
libfile = os.path.join(dir, "lib%s.so" % name)
if os.path.exists(libfile):
return libfile
return None
def find_library(name, is64 = False):
return _get_soname(_findLib_crle(name, is64) or _findLib_gcc(name))
else:
def _findSoname_ldconfig(name):
import struct
if struct.calcsize('l') == 4:
machine = os.uname()[4] + '-32'
else:
machine = os.uname()[4] + '-64'
mach_map = {
'x86_64-64': 'libc6,x86-64',
'ppc64-64': 'libc6,64bit',
'sparc64-64': 'libc6,64bit',
's390x-64': 'libc6,64bit',
'ia64-64': 'libc6,IA-64',
}
abi_type = mach_map.get(machine, 'libc6')
# XXX assuming GLIBC's ldconfig (with option -p)
expr = r'\s+(lib%s\.[^\s]+)\s+\(%s' % (re.escape(name), abi_type)
env = dict(os.environ)
env['LC_ALL'] = 'C'
env['LANG'] = 'C'
null = open(os.devnull, 'wb')
try:
with null:
p = subprocess.Popen(['/sbin/ldconfig', '-p'],
stderr=null,
stdout=subprocess.PIPE,
env=env)
except OSError: # E.g. command not found
return None
[data, _] = p.communicate()
res = re.search(expr, data)
if not res:
return None
return res.group(1)
def find_library(name):
return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
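# Editor's note: an illustrative sketch of the public helper defined above.
# Results depend on the host platform and the libraries installed; e.g. on a
# glibc-based Linux system:
#
# >>> from ctypes.util import find_library
# >>> find_library("m")       # typically 'libm.so.6'
# >>> find_library("nosuch")  # returns None when nothing matches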
################################################################
# test code
def test():
from ctypes import cdll
if os.name == "nt":
print cdll.msvcrt
        print cdll.LoadLibrary("msvcrt")
print find_library("msvcrt")
if os.name == "posix":
# find and load_version
print find_library("m")
print find_library("c")
print find_library("bz2")
# getattr
## print cdll.m
## print cdll.bz2
# load
if sys.platform == "darwin":
print cdll.LoadLibrary("libm.dylib")
print cdll.LoadLibrary("libcrypto.dylib")
print cdll.LoadLibrary("libSystem.dylib")
print cdll.LoadLibrary("System.framework/System")
else:
print cdll.LoadLibrary("libm.so")
print cdll.LoadLibrary("libcrypt.so")
print find_library("crypt")
if __name__ == "__main__":
test()
# The most useful windows datatypes
from ctypes import *
BYTE = c_byte
WORD = c_ushort
DWORD = c_ulong
WCHAR = c_wchar
UINT = c_uint
INT = c_int
DOUBLE = c_double
FLOAT = c_float
BOOLEAN = BYTE
BOOL = c_long
from ctypes import _SimpleCData
class VARIANT_BOOL(_SimpleCData):
_type_ = "v"
def __repr__(self):
return "%s(%r)" % (self.__class__.__name__, self.value)
ULONG = c_ulong
LONG = c_long
USHORT = c_ushort
SHORT = c_short
# in the windows header files, these are structures.
_LARGE_INTEGER = LARGE_INTEGER = c_longlong
_ULARGE_INTEGER = ULARGE_INTEGER = c_ulonglong
LPCOLESTR = LPOLESTR = OLESTR = c_wchar_p
LPCWSTR = LPWSTR = c_wchar_p
LPCSTR = LPSTR = c_char_p
LPCVOID = LPVOID = c_void_p
# WPARAM is defined as UINT_PTR (unsigned type)
# LPARAM is defined as LONG_PTR (signed type)
if sizeof(c_long) == sizeof(c_void_p):
WPARAM = c_ulong
LPARAM = c_long
elif sizeof(c_longlong) == sizeof(c_void_p):
WPARAM = c_ulonglong
LPARAM = c_longlong
ATOM = WORD
LANGID = WORD
COLORREF = DWORD
LGRPID = DWORD
LCTYPE = DWORD
LCID = DWORD
################################################################
# HANDLE types
HANDLE = c_void_p # in the header files: void *
HACCEL = HANDLE
HBITMAP = HANDLE
HBRUSH = HANDLE
HCOLORSPACE = HANDLE
HDC = HANDLE
HDESK = HANDLE
HDWP = HANDLE
HENHMETAFILE = HANDLE
HFONT = HANDLE
HGDIOBJ = HANDLE
HGLOBAL = HANDLE
HHOOK = HANDLE
HICON = HANDLE
HINSTANCE = HANDLE
HKEY = HANDLE
HKL = HANDLE
HLOCAL = HANDLE
HMENU = HANDLE
HMETAFILE = HANDLE
HMODULE = HANDLE
HMONITOR = HANDLE
HPALETTE = HANDLE
HPEN = HANDLE
HRGN = HANDLE
HRSRC = HANDLE
HSTR = HANDLE
HTASK = HANDLE
HWINSTA = HANDLE
HWND = HANDLE
SC_HANDLE = HANDLE
SERVICE_STATUS_HANDLE = HANDLE
################################################################
# Some important structure definitions
class RECT(Structure):
_fields_ = [("left", c_long),
("top", c_long),
("right", c_long),
("bottom", c_long)]
tagRECT = _RECTL = RECTL = RECT
class _SMALL_RECT(Structure):
_fields_ = [('Left', c_short),
('Top', c_short),
('Right', c_short),
('Bottom', c_short)]
SMALL_RECT = _SMALL_RECT
class _COORD(Structure):
_fields_ = [('X', c_short),
('Y', c_short)]
class POINT(Structure):
_fields_ = [("x", c_long),
("y", c_long)]
tagPOINT = _POINTL = POINTL = POINT
class SIZE(Structure):
_fields_ = [("cx", c_long),
("cy", c_long)]
tagSIZE = SIZEL = SIZE
def RGB(red, green, blue):
return red + (green << 8) + (blue << 16)
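# Editor's note: RGB packs the three channels into a COLORREF-style value
# (layout 0x00BBGGRR), e.g. hex(RGB(0x12, 0x34, 0x56)) == '0x563412'.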
class FILETIME(Structure):
_fields_ = [("dwLowDateTime", DWORD),
("dwHighDateTime", DWORD)]
_FILETIME = FILETIME
class MSG(Structure):
_fields_ = [("hWnd", HWND),
("message", c_uint),
("wParam", WPARAM),
("lParam", LPARAM),
("time", DWORD),
("pt", POINT)]
tagMSG = MSG
MAX_PATH = 260
class WIN32_FIND_DATAA(Structure):
_fields_ = [("dwFileAttributes", DWORD),
("ftCreationTime", FILETIME),
("ftLastAccessTime", FILETIME),
("ftLastWriteTime", FILETIME),
("nFileSizeHigh", DWORD),
("nFileSizeLow", DWORD),
("dwReserved0", DWORD),
("dwReserved1", DWORD),
("cFileName", c_char * MAX_PATH),
("cAlternateFileName", c_char * 14)]
class WIN32_FIND_DATAW(Structure):
_fields_ = [("dwFileAttributes", DWORD),
("ftCreationTime", FILETIME),
("ftLastAccessTime", FILETIME),
("ftLastWriteTime", FILETIME),
("nFileSizeHigh", DWORD),
("nFileSizeLow", DWORD),
("dwReserved0", DWORD),
("dwReserved1", DWORD),
("cFileName", c_wchar * MAX_PATH),
("cAlternateFileName", c_wchar * 14)]
__all__ = ['ATOM', 'BOOL', 'BOOLEAN', 'BYTE', 'COLORREF', 'DOUBLE', 'DWORD',
'FILETIME', 'FLOAT', 'HACCEL', 'HANDLE', 'HBITMAP', 'HBRUSH',
'HCOLORSPACE', 'HDC', 'HDESK', 'HDWP', 'HENHMETAFILE', 'HFONT',
'HGDIOBJ', 'HGLOBAL', 'HHOOK', 'HICON', 'HINSTANCE', 'HKEY',
'HKL', 'HLOCAL', 'HMENU', 'HMETAFILE', 'HMODULE', 'HMONITOR',
'HPALETTE', 'HPEN', 'HRGN', 'HRSRC', 'HSTR', 'HTASK', 'HWINSTA',
'HWND', 'INT', 'LANGID', 'LARGE_INTEGER', 'LCID', 'LCTYPE',
'LGRPID', 'LONG', 'LPARAM', 'LPCOLESTR', 'LPCSTR', 'LPCVOID',
'LPCWSTR', 'LPOLESTR', 'LPSTR', 'LPVOID', 'LPWSTR', 'MAX_PATH',
'MSG', 'OLESTR', 'POINT', 'POINTL', 'RECT', 'RECTL', 'RGB',
'SC_HANDLE', 'SERVICE_STATUS_HANDLE', 'SHORT', 'SIZE', 'SIZEL',
'SMALL_RECT', 'UINT', 'ULARGE_INTEGER', 'ULONG', 'USHORT',
'VARIANT_BOOL', 'WCHAR', 'WIN32_FIND_DATAA', 'WIN32_FIND_DATAW',
'WORD', 'WPARAM', '_COORD', '_FILETIME', '_LARGE_INTEGER',
'_POINTL', '_RECTL', '_SMALL_RECT', '_ULARGE_INTEGER', 'tagMSG',
'tagPOINT', 'tagRECT', 'tagSIZE']
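# Editor's sketch (Windows-only, illustrative): the structures above mirror
# the Win32 headers, so instances can be passed straight to API calls:
#
# >>> from ctypes import windll, byref
# >>> pt = POINT()
# >>> if windll.user32.GetCursorPos(byref(pt)): print pt.x, pt.y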
import sys
from ctypes import *
_array_type = type(Array)
def _other_endian(typ):
"""Return the type with the 'other' byte order. Simple types like
c_int and so on already have __ctype_be__ and __ctype_le__
attributes which contain the types, for more complicated types
arrays and structures are supported.
"""
# check _OTHER_ENDIAN attribute (present if typ is primitive type)
if hasattr(typ, _OTHER_ENDIAN):
return getattr(typ, _OTHER_ENDIAN)
# if typ is array
if isinstance(typ, _array_type):
return _other_endian(typ._type_) * typ._length_
# if typ is structure
if issubclass(typ, Structure):
return typ
raise TypeError("This type does not support other endian: %s" % typ)
class _swapped_meta(type(Structure)):
def __setattr__(self, attrname, value):
if attrname == "_fields_":
fields = []
for desc in value:
name = desc[0]
typ = desc[1]
rest = desc[2:]
fields.append((name, _other_endian(typ)) + rest)
value = fields
super(_swapped_meta, self).__setattr__(attrname, value)
################################################################
# Note: The Structure metaclass checks for the *presence* (not the
# value!) of a _swapped_bytes_ attribute to determine the bit order in
# structures containing bit fields.
if sys.byteorder == "little":
_OTHER_ENDIAN = "__ctype_be__"
LittleEndianStructure = Structure
class BigEndianStructure(Structure):
"""Structure with big endian byte order"""
__metaclass__ = _swapped_meta
_swappedbytes_ = None
elif sys.byteorder == "big":
_OTHER_ENDIAN = "__ctype_le__"
BigEndianStructure = Structure
class LittleEndianStructure(Structure):
"""Structure with little endian byte order"""
__metaclass__ = _swapped_meta
_swappedbytes_ = None
else:
raise RuntimeError("Invalid byteorder")
"""create and manipulate C data types in Python"""
import os as _os, sys as _sys
__version__ = "1.1.0"
from _ctypes import Union, Structure, Array
from _ctypes import _Pointer
from _ctypes import CFuncPtr as _CFuncPtr
from _ctypes import __version__ as _ctypes_version
from _ctypes import RTLD_LOCAL, RTLD_GLOBAL
from _ctypes import ArgumentError
from struct import calcsize as _calcsize
if __version__ != _ctypes_version:
raise Exception("Version number mismatch", __version__, _ctypes_version)
if _os.name in ("nt", "ce"):
from _ctypes import FormatError
DEFAULT_MODE = RTLD_LOCAL
if _os.name == "posix" and _sys.platform == "darwin":
# On OS X 10.3, we use RTLD_GLOBAL as default mode
# because RTLD_LOCAL does not work at least on some
# libraries. OS X 10.3 is Darwin 7, so we check for
# that.
if int(_os.uname()[2].split('.')[0]) < 8:
DEFAULT_MODE = RTLD_GLOBAL
from _ctypes import FUNCFLAG_CDECL as _FUNCFLAG_CDECL, \
FUNCFLAG_PYTHONAPI as _FUNCFLAG_PYTHONAPI, \
FUNCFLAG_USE_ERRNO as _FUNCFLAG_USE_ERRNO, \
FUNCFLAG_USE_LASTERROR as _FUNCFLAG_USE_LASTERROR
"""
WINOLEAPI -> HRESULT
WINOLEAPI_(type)
STDMETHODCALLTYPE
STDMETHOD(name)
STDMETHOD_(type, name)
STDAPICALLTYPE
"""
def create_string_buffer(init, size=None):
"""create_string_buffer(aString) -> character array
create_string_buffer(anInteger) -> character array
create_string_buffer(aString, anInteger) -> character array
"""
if isinstance(init, (str, unicode)):
if size is None:
size = len(init)+1
buftype = c_char * size
buf = buftype()
buf.value = init
return buf
elif isinstance(init, (int, long)):
buftype = c_char * init
buf = buftype()
return buf
raise TypeError(init)
def c_buffer(init, size=None):
## "deprecated, use create_string_buffer instead"
## import warnings
## warnings.warn("c_buffer is deprecated, use create_string_buffer instead",
## DeprecationWarning, stacklevel=2)
return create_string_buffer(init, size)
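# Editor's sketch: create_string_buffer makes a mutable, NUL-terminated
# block that C code can write into.
#
# >>> from ctypes import create_string_buffer, sizeof
# >>> buf = create_string_buffer("hello")   # 6 bytes: the text plus a NUL
# >>> sizeof(buf), buf.value
# (6, 'hello')
# >>> create_string_buffer(3).raw           # integer argument: zero-filled
# '\x00\x00\x00'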
_c_functype_cache = {}
def CFUNCTYPE(restype, *argtypes, **kw):
"""CFUNCTYPE(restype, *argtypes,
use_errno=False, use_last_error=False) -> function prototype.
restype: the result type
argtypes: a sequence specifying the argument types
The function prototype can be called in different ways to create a
callable object:
prototype(integer address) -> foreign function
prototype(callable) -> create and return a C callable function from callable
prototype(integer index, method name[, paramflags]) -> foreign function calling a COM method
prototype((ordinal number, dll object)[, paramflags]) -> foreign function exported by ordinal
prototype((function name, dll object)[, paramflags]) -> foreign function exported by name
"""
flags = _FUNCFLAG_CDECL
if kw.pop("use_errno", False):
flags |= _FUNCFLAG_USE_ERRNO
if kw.pop("use_last_error", False):
flags |= _FUNCFLAG_USE_LASTERROR
if kw:
raise ValueError("unexpected keyword argument(s) %s" % kw.keys())
try:
return _c_functype_cache[(restype, argtypes, flags)]
except KeyError:
class CFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = flags
_c_functype_cache[(restype, argtypes, flags)] = CFunctionType
return CFunctionType
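# Editor's sketch (POSIX-only, illustrative; the library path is an
# assumption): a Python callable wrapped by the prototype factory above
# becomes a C callback, here handed to libc's qsort.
#
# >>> from ctypes import CDLL, CFUNCTYPE, POINTER, c_int, sizeof
# >>> libc = CDLL("libc.so.6")
# >>> libc.qsort.restype = None
# >>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
# >>> vals = (c_int * 4)(3, 1, 4, 1)
# >>> libc.qsort(vals, len(vals), sizeof(c_int), CMPFUNC(lambda a, b: a[0] - b[0]))
# >>> [v for v in vals]
# [1, 1, 3, 4]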
if _os.name in ("nt", "ce"):
from _ctypes import LoadLibrary as _dlopen
from _ctypes import FUNCFLAG_STDCALL as _FUNCFLAG_STDCALL
if _os.name == "ce":
# 'ce' doesn't have the stdcall calling convention
_FUNCFLAG_STDCALL = _FUNCFLAG_CDECL
_win_functype_cache = {}
def WINFUNCTYPE(restype, *argtypes, **kw):
# docstring set later (very similar to CFUNCTYPE.__doc__)
flags = _FUNCFLAG_STDCALL
if kw.pop("use_errno", False):
flags |= _FUNCFLAG_USE_ERRNO
if kw.pop("use_last_error", False):
flags |= _FUNCFLAG_USE_LASTERROR
if kw:
raise ValueError("unexpected keyword argument(s) %s" % kw.keys())
try:
return _win_functype_cache[(restype, argtypes, flags)]
except KeyError:
class WinFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = flags
_win_functype_cache[(restype, argtypes, flags)] = WinFunctionType
return WinFunctionType
if WINFUNCTYPE.__doc__:
WINFUNCTYPE.__doc__ = CFUNCTYPE.__doc__.replace("CFUNCTYPE", "WINFUNCTYPE")
elif _os.name == "posix":
from _ctypes import dlopen as _dlopen
from _ctypes import sizeof, byref, addressof, alignment, resize
from _ctypes import get_errno, set_errno
from _ctypes import _SimpleCData
def _check_size(typ, typecode=None):
    # Check sizeof(ctypes_type) against struct.calcsize. This
    # should protect somewhat against a misconfigured libffi.
from struct import calcsize
if typecode is None:
# Most _type_ codes are the same as used in struct
typecode = typ._type_
actual, required = sizeof(typ), calcsize(typecode)
if actual != required:
raise SystemError("sizeof(%s) wrong: %d instead of %d" % \
(typ, actual, required))
class py_object(_SimpleCData):
_type_ = "O"
def __repr__(self):
try:
return super(py_object, self).__repr__()
except ValueError:
return "%s(<NULL>)" % type(self).__name__
_check_size(py_object, "P")
class c_short(_SimpleCData):
_type_ = "h"
_check_size(c_short)
class c_ushort(_SimpleCData):
_type_ = "H"
_check_size(c_ushort)
class c_long(_SimpleCData):
_type_ = "l"
_check_size(c_long)
class c_ulong(_SimpleCData):
_type_ = "L"
_check_size(c_ulong)
if _calcsize("i") == _calcsize("l"):
# if int and long have the same size, make c_int an alias for c_long
c_int = c_long
c_uint = c_ulong
else:
class c_int(_SimpleCData):
_type_ = "i"
_check_size(c_int)
class c_uint(_SimpleCData):
_type_ = "I"
_check_size(c_uint)
class c_float(_SimpleCData):
_type_ = "f"
_check_size(c_float)
class c_double(_SimpleCData):
_type_ = "d"
_check_size(c_double)
class c_longdouble(_SimpleCData):
_type_ = "g"
if sizeof(c_longdouble) == sizeof(c_double):
c_longdouble = c_double
if _calcsize("l") == _calcsize("q"):
# if long and long long have the same size, make c_longlong an alias for c_long
c_longlong = c_long
c_ulonglong = c_ulong
else:
class c_longlong(_SimpleCData):
_type_ = "q"
_check_size(c_longlong)
class c_ulonglong(_SimpleCData):
_type_ = "Q"
## def from_param(cls, val):
## return ('d', float(val), val)
## from_param = classmethod(from_param)
_check_size(c_ulonglong)
class c_ubyte(_SimpleCData):
_type_ = "B"
c_ubyte.__ctype_le__ = c_ubyte.__ctype_be__ = c_ubyte
# backward compatibility:
##c_uchar = c_ubyte
_check_size(c_ubyte)
class c_byte(_SimpleCData):
_type_ = "b"
c_byte.__ctype_le__ = c_byte.__ctype_be__ = c_byte
_check_size(c_byte)
class c_char(_SimpleCData):
_type_ = "c"
c_char.__ctype_le__ = c_char.__ctype_be__ = c_char
_check_size(c_char)
class c_char_p(_SimpleCData):
_type_ = "z"
if _os.name == "nt":
def __repr__(self):
if not windll.kernel32.IsBadStringPtrA(self, -1):
return "%s(%r)" % (self.__class__.__name__, self.value)
return "%s(%s)" % (self.__class__.__name__, cast(self, c_void_p).value)
else:
def __repr__(self):
return "%s(%s)" % (self.__class__.__name__, cast(self, c_void_p).value)
_check_size(c_char_p, "P")
class c_void_p(_SimpleCData):
_type_ = "P"
c_voidp = c_void_p # backwards compatibility (to a bug)
_check_size(c_void_p)
class c_bool(_SimpleCData):
_type_ = "?"
from _ctypes import POINTER, pointer, _pointer_type_cache
def _reset_cache():
_pointer_type_cache.clear()
_c_functype_cache.clear()
if _os.name in ("nt", "ce"):
_win_functype_cache.clear()
# _SimpleCData.c_wchar_p_from_param
POINTER(c_wchar).from_param = c_wchar_p.from_param
# _SimpleCData.c_char_p_from_param
POINTER(c_char).from_param = c_char_p.from_param
_pointer_type_cache[None] = c_void_p
# XXX for whatever reasons, creating the first instance of a callback
# function is needed for the unittests on Win64 to succeed. This MAY
# be a compiler bug, since the problem occurs only when _ctypes is
# compiled with the MS SDK compiler. Or an uninitialized variable?
CFUNCTYPE(c_int)(lambda: None)
try:
from _ctypes import set_conversion_mode
except ImportError:
pass
else:
if _os.name in ("nt", "ce"):
set_conversion_mode("mbcs", "ignore")
else:
set_conversion_mode("ascii", "strict")
class c_wchar_p(_SimpleCData):
_type_ = "Z"
class c_wchar(_SimpleCData):
_type_ = "u"
def create_unicode_buffer(init, size=None):
"""create_unicode_buffer(aString) -> character array
create_unicode_buffer(anInteger) -> character array
create_unicode_buffer(aString, anInteger) -> character array
"""
if isinstance(init, (str, unicode)):
if size is None:
size = len(init)+1
buftype = c_wchar * size
buf = buftype()
buf.value = init
return buf
elif isinstance(init, (int, long)):
buftype = c_wchar * init
buf = buftype()
return buf
raise TypeError(init)
# XXX Deprecated
def SetPointerType(pointer, cls):
if _pointer_type_cache.get(cls, None) is not None:
raise RuntimeError("This type already exists in the cache")
if id(pointer) not in _pointer_type_cache:
raise RuntimeError("What's this???")
pointer.set_type(cls)
_pointer_type_cache[cls] = pointer
del _pointer_type_cache[id(pointer)]
# XXX Deprecated
def ARRAY(typ, len):
return typ * len
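# Editor's sketch: POINTER builds (and caches) pointer *types*; pointer()
# builds an instance aimed at existing data; ARRAY(t, n) is simply t * n.
#
# >>> from ctypes import c_int, pointer, POINTER
# >>> i = c_int(42)
# >>> p = pointer(i)
# >>> p.contents.value
# 42
# >>> p[0] = 7; i.value
# 7
# >>> type(p) is POINTER(c_int)   # the cache returns the same type
# True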
################################################################
class CDLL(object):
"""An instance of this class represents a loaded dll/shared
library, exporting functions using the standard C calling
convention (named 'cdecl' on Windows).
The exported functions can be accessed as attributes, or by
indexing with the function name. Examples:
<obj>.qsort -> callable object
<obj>['qsort'] -> callable object
Calling the functions releases the Python GIL during the call and
reacquires it afterwards.
"""
_func_flags_ = _FUNCFLAG_CDECL
_func_restype_ = c_int
# default values for repr
_name = '<uninitialized>'
_handle = 0
_FuncPtr = None
def __init__(self, name, mode=DEFAULT_MODE, handle=None,
use_errno=False,
use_last_error=False):
self._name = name
flags = self._func_flags_
if use_errno:
flags |= _FUNCFLAG_USE_ERRNO
if use_last_error:
flags |= _FUNCFLAG_USE_LASTERROR
class _FuncPtr(_CFuncPtr):
_flags_ = flags
_restype_ = self._func_restype_
self._FuncPtr = _FuncPtr
if handle is None:
self._handle = _dlopen(self._name, mode)
else:
self._handle = handle
def __repr__(self):
return "<%s '%s', handle %x at %x>" % \
(self.__class__.__name__, self._name,
(self._handle & (_sys.maxint*2 + 1)),
id(self) & (_sys.maxint*2 + 1))
def __getattr__(self, name):
if name.startswith('__') and name.endswith('__'):
raise AttributeError(name)
func = self.__getitem__(name)
setattr(self, name, func)
return func
def __getitem__(self, name_or_ordinal):
func = self._FuncPtr((name_or_ordinal, self))
if not isinstance(name_or_ordinal, (int, long)):
func.__name__ = name_or_ordinal
return func
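# Editor's sketch (illustrative; the library name is platform-specific):
#
# >>> libc = CDLL("libc.so.6")       # cdll.msvcrt would be the Windows analogue
# >>> libc.strlen("hello")
# 5
# >>> libc.strlen is libc.strlen     # attribute access caches the function
# True
# >>> libc["strlen"] is libc.strlen  # indexing creates a fresh object each time
# False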
class PyDLL(CDLL):
"""This class represents the Python library itself. It allows
accessing Python API functions. The GIL is not released, and
Python exceptions are handled correctly.
"""
_func_flags_ = _FUNCFLAG_CDECL | _FUNCFLAG_PYTHONAPI
if _os.name in ("nt", "ce"):
class WinDLL(CDLL):
"""This class represents a dll exporting functions using the
Windows stdcall calling convention.
"""
_func_flags_ = _FUNCFLAG_STDCALL
# XXX Hm, what about HRESULT as normal parameter?
# Mustn't it derive from c_long then?
from _ctypes import _check_HRESULT, _SimpleCData
class HRESULT(_SimpleCData):
_type_ = "l"
# _check_retval_ is called with the function's result when it
# is used as restype. It checks for the FAILED bit, and
# raises a WindowsError if it is set.
#
# The _check_retval_ method is implemented in C, so that the
# method definition itself is not included in the traceback
# when it raises an error - that is what we want (and Python
# doesn't have a way to raise an exception in the caller's
# frame).
_check_retval_ = _check_HRESULT
class OleDLL(CDLL):
"""This class represents a dll exporting functions using the
Windows stdcall calling convention, and returning HRESULT.
HRESULT error values are automatically raised as WindowsError
exceptions.
"""
_func_flags_ = _FUNCFLAG_STDCALL
_func_restype_ = HRESULT
class LibraryLoader(object):
def __init__(self, dlltype):
self._dlltype = dlltype
def __getattr__(self, name):
if name[0] == '_':
raise AttributeError(name)
dll = self._dlltype(name)
setattr(self, name, dll)
return dll
def __getitem__(self, name):
return getattr(self, name)
def LoadLibrary(self, name):
return self._dlltype(name)
cdll = LibraryLoader(CDLL)
pydll = LibraryLoader(PyDLL)
if _os.name in ("nt", "ce"):
pythonapi = PyDLL("python dll", None, _sys.dllhandle)
elif _sys.platform == "cygwin":
pythonapi = PyDLL("libpython%d.%d.dll" % _sys.version_info[:2])
elif _sys.platform == "cli": # Need to determine how to do this
pythonapi = None
else:
pythonapi = PyDLL(None)
if _os.name in ("nt", "ce"):
windll = LibraryLoader(WinDLL)
oledll = LibraryLoader(OleDLL)
if _os.name == "nt":
GetLastError = windll.kernel32.GetLastError
else:
GetLastError = windll.coredll.GetLastError
from _ctypes import get_last_error, set_last_error
def WinError(code=None, descr=None):
if code is None:
code = GetLastError()
if descr is None:
descr = FormatError(code).strip()
return WindowsError(code, descr)
if sizeof(c_uint) == sizeof(c_void_p):
c_size_t = c_uint
c_ssize_t = c_int
elif sizeof(c_ulong) == sizeof(c_void_p):
c_size_t = c_ulong
c_ssize_t = c_long
elif sizeof(c_ulonglong) == sizeof(c_void_p):
c_size_t = c_ulonglong
c_ssize_t = c_longlong
# functions
from _ctypes import _memmove_addr, _memset_addr, _string_at_addr, _cast_addr
## void *memmove(void *, const void *, size_t);
memmove = CFUNCTYPE(c_void_p, c_void_p, c_void_p, c_size_t)(_memmove_addr)
## void *memset(void *, int, size_t)
memset = CFUNCTYPE(c_void_p, c_void_p, c_int, c_size_t)(_memset_addr)
def PYFUNCTYPE(restype, *argtypes):
class CFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = _FUNCFLAG_CDECL | _FUNCFLAG_PYTHONAPI
return CFunctionType
_cast = PYFUNCTYPE(py_object, c_void_p, py_object, py_object)(_cast_addr)
def cast(obj, typ):
return _cast(obj, obj, typ)
_string_at = PYFUNCTYPE(py_object, c_void_p, c_int)(_string_at_addr)
def string_at(ptr, size=-1):
"""string_at(addr[, size]) -> string
Return the string at addr."""
return _string_at(ptr, size)
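# Editor's sketch: the raw-memory helpers defined above.
#
# >>> from ctypes import create_string_buffer, memmove, string_at
# >>> src = create_string_buffer("abcdef")
# >>> dst = create_string_buffer(6)
# >>> _ = memmove(dst, src, 6)   # returns the destination address
# >>> string_at(dst, 3)
# 'abc'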
try:
from _ctypes import _wstring_at_addr
except ImportError:
pass
else:
_wstring_at = PYFUNCTYPE(py_object, c_void_p, c_int)(_wstring_at_addr)
def wstring_at(ptr, size=-1):
"""wstring_at(addr[, size]) -> string
Return the string at addr."""
return _wstring_at(ptr, size)
if _os.name in ("nt", "ce"): # COM stuff
def DllGetClassObject(rclsid, riid, ppv):
try:
ccom = __import__("comtypes.server.inprocserver", globals(), locals(), ['*'])
except ImportError:
return -2147221231 # CLASS_E_CLASSNOTAVAILABLE
else:
return ccom.DllGetClassObject(rclsid, riid, ppv)
def DllCanUnloadNow():
try:
ccom = __import__("comtypes.server.inprocserver", globals(), locals(), ['*'])
except ImportError:
return 0 # S_OK
return ccom.DllCanUnloadNow()
from ctypes._endian import BigEndianStructure, LittleEndianStructure
# Fill in specifically-sized types
c_int8 = c_byte
c_uint8 = c_ubyte
for kind in [c_short, c_int, c_long, c_longlong]:
if sizeof(kind) == 2: c_int16 = kind
elif sizeof(kind) == 4: c_int32 = kind
elif sizeof(kind) == 8: c_int64 = kind
for kind in [c_ushort, c_uint, c_ulong, c_ulonglong]:
if sizeof(kind) == 2: c_uint16 = kind
elif sizeof(kind) == 4: c_uint32 = kind
elif sizeof(kind) == 8: c_uint64 = kind
del(kind)
_reset_cache()
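# Editor's note: the aliases above give exact widths whatever the platform's
# native int/long sizes, e.g. sizeof(c_int32) == 4 and sizeof(c_uint64) == 8.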
# Copyright (c) 2004 Python Software Foundation.
# All rights reserved.
# Written by Eric Price <eprice at tjhsst.edu>
# and Facundo Batista <facundo at taniquetil.com.ar>
# and Raymond Hettinger <python at rcn.com>
# and Aahz <aahz at pobox.com>
# and Tim Peters
# This module is currently Py2.3 compatible and should be kept that way
# unless a major compelling advantage arises. IOW, 2.3 compatibility is
# strongly preferred, but not guaranteed.
# Also, this module should be kept in sync with the latest updates of
# the IBM specification as it evolves. Those updates will be treated
# as bug fixes (deviation from the spec is a compatibility, usability
# bug) and will be backported. At this point the spec is stabilizing
# and the updates are becoming fewer, smaller, and less significant.
"""
This is a Py2.3 implementation of decimal floating point arithmetic based on
the General Decimal Arithmetic Specification:
http://speleotrove.com/decimal/decarith.html
and IEEE standard 854-1987:
http://en.wikipedia.org/wiki/IEEE_854-1987
Decimal floating point has finite precision with arbitrarily large bounds.
The purpose of this module is to support arithmetic using familiar
"schoolhouse" rules and to avoid some of the tricky representation
issues associated with binary floating point. The package is especially
useful for financial applications or for contexts where users have
expectations that are at odds with binary floating point (for instance,
in binary floating point, 1.00 % 0.1 gives 0.09999999999999995 instead
of the expected Decimal('0.00') returned by decimal floating point).
Here are some examples of using the decimal module:
>>> from decimal import *
>>> setcontext(ExtendedContext)
>>> Decimal(0)
Decimal('0')
>>> Decimal('1')
Decimal('1')
>>> Decimal('-.0123')
Decimal('-0.0123')
>>> Decimal(123456)
Decimal('123456')
>>> Decimal('123.45e12345678901234567890')
Decimal('1.2345E+12345678901234567892')
>>> Decimal('1.33') + Decimal('1.27')
Decimal('2.60')
>>> Decimal('12.34') + Decimal('3.87') - Decimal('18.41')
Decimal('-2.20')
>>> dig = Decimal(1)
>>> print dig / Decimal(3)
0.333333333
>>> getcontext().prec = 18
>>> print dig / Decimal(3)
0.333333333333333333
>>> print dig.sqrt()
1
>>> print Decimal(3).sqrt()
1.73205080756887729
>>> print Decimal(3) ** 123
4.85192780976896427E+58
>>> inf = Decimal(1) / Decimal(0)
>>> print inf
Infinity
>>> neginf = Decimal(-1) / Decimal(0)
>>> print neginf
-Infinity
>>> print neginf + inf
NaN
>>> print neginf * inf
-Infinity
>>> print dig / 0
Infinity
>>> getcontext().traps[DivisionByZero] = 1
>>> print dig / 0
Traceback (most recent call last):
...
...
...
DivisionByZero: x / 0
>>> c = Context()
>>> c.traps[InvalidOperation] = 0
>>> print c.flags[InvalidOperation]
0
>>> c.divide(Decimal(0), Decimal(0))
Decimal('NaN')
>>> c.traps[InvalidOperation] = 1
>>> print c.flags[InvalidOperation]
1
>>> c.flags[InvalidOperation] = 0
>>> print c.flags[InvalidOperation]
0
>>> print c.divide(Decimal(0), Decimal(0))
Traceback (most recent call last):
...
...
...
InvalidOperation: 0 / 0
>>> print c.flags[InvalidOperation]
1
>>> c.flags[InvalidOperation] = 0
>>> c.traps[InvalidOperation] = 0
>>> print c.divide(Decimal(0), Decimal(0))
NaN
>>> print c.flags[InvalidOperation]
1
>>>
"""
__all__ = [
# Two major classes
'Decimal', 'Context',
# Contexts
'DefaultContext', 'BasicContext', 'ExtendedContext',
# Exceptions
'DecimalException', 'Clamped', 'InvalidOperation', 'DivisionByZero',
'Inexact', 'Rounded', 'Subnormal', 'Overflow', 'Underflow',
# Constants for use in setting up contexts
'ROUND_DOWN', 'ROUND_HALF_UP', 'ROUND_HALF_EVEN', 'ROUND_CEILING',
'ROUND_FLOOR', 'ROUND_UP', 'ROUND_HALF_DOWN', 'ROUND_05UP',
# Functions for manipulating contexts
'setcontext', 'getcontext', 'localcontext'
]
__version__ = '1.70' # Highest version of the spec this complies with
import math as _math
import numbers as _numbers
try:
from collections import namedtuple as _namedtuple
DecimalTuple = _namedtuple('DecimalTuple', 'sign digits exponent')
except ImportError:
DecimalTuple = lambda *args: args
# Rounding
ROUND_DOWN = 'ROUND_DOWN'
ROUND_HALF_UP = 'ROUND_HALF_UP'
ROUND_HALF_EVEN = 'ROUND_HALF_EVEN'
ROUND_CEILING = 'ROUND_CEILING'
ROUND_FLOOR = 'ROUND_FLOOR'
ROUND_UP = 'ROUND_UP'
ROUND_HALF_DOWN = 'ROUND_HALF_DOWN'
ROUND_05UP = 'ROUND_05UP'
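# Editor's sketch: a rounding constant selects the behaviour of quantize()
# and of context arithmetic, e.g.:
#
# >>> Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
# Decimal('2.68')
# >>> Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_DOWN)
# Decimal('2.67')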
# Errors
class DecimalException(ArithmeticError):
"""Base exception class.
Used exceptions derive from this.
    If an exception derives from another exception besides this one (such
    as Underflow, which also derives from Inexact, Rounded, and Subnormal),
    that indicates that it is only raised if the others are present. This
    isn't actually used for anything, though.
handle -- Called when context._raise_error is called and the
trap_enabler is not set. First argument is self, second is the
context. More arguments can be given, those being after
the explanation in _raise_error (For example,
context._raise_error(NewError, '(-x)!', self._sign) would
call NewError().handle(context, self._sign).)
To define a new exception, it should be sufficient to have it derive
from DecimalException.
"""
def handle(self, context, *args):
pass
class Clamped(DecimalException):
"""Exponent of a 0 changed to fit bounds.
This occurs and signals clamped if the exponent of a result has been
altered in order to fit the constraints of a specific concrete
representation. This may occur when the exponent of a zero result would
be outside the bounds of a representation, or when a large normal
number would have an encoded exponent that cannot be represented. In
this latter case, the exponent is reduced to fit and the corresponding
number of zero digits are appended to the coefficient ("fold-down").
"""
class InvalidOperation(DecimalException):
"""An invalid operation was performed.
Various bad things cause this:
Something creates a signaling NaN
-INF + INF
0 * (+-)INF
(+-)INF / (+-)INF
x % 0
(+-)INF % x
x._rescale( non-integer )
sqrt(-x) , x > 0
0 ** 0
x ** (non-integer)
x ** (+-)INF
An operand is invalid
The result of the operation after these is a quiet positive NaN,
except when the cause is a signaling NaN, in which case the result is
also a quiet NaN, but with the original sign, and an optional
diagnostic information.
"""
def handle(self, context, *args):
if args:
ans = _dec_from_triple(args[0]._sign, args[0]._int, 'n', True)
return ans._fix_nan(context)
return _NaN
class ConversionSyntax(InvalidOperation):
"""Trying to convert badly formed string.
This occurs and signals invalid-operation if a string is being
converted to a number and it does not conform to the numeric string
syntax. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class DivisionByZero(DecimalException, ZeroDivisionError):
"""Division by 0.
This occurs and signals division-by-zero if division of a finite number
by zero was attempted (during a divide-integer or divide operation, or a
power operation with negative right-hand operand), and the dividend was
not zero.
The result of the operation is [sign,inf], where sign is the exclusive
or of the signs of the operands for divide, or is 1 for an odd power of
-0, for power.
"""
def handle(self, context, sign, *args):
return _SignedInfinity[sign]
class DivisionImpossible(InvalidOperation):
"""Cannot perform the division adequately.
This occurs and signals invalid-operation if the integer result of a
divide-integer or remainder operation had too many digits (would be
longer than precision). The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class DivisionUndefined(InvalidOperation, ZeroDivisionError):
"""Undefined result of division.
This occurs and signals invalid-operation if division by zero was
attempted (during a divide-integer, divide, or remainder operation), and
the dividend is also zero. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class Inexact(DecimalException):
"""Had to round, losing information.
This occurs and signals inexact whenever the result of an operation is
not exact (that is, it needed to be rounded and any discarded digits
were non-zero), or if an overflow or underflow condition occurs. The
result in all cases is unchanged.
The inexact signal may be tested (or trapped) to determine if a given
operation (or sequence of operations) was inexact.
"""
class InvalidContext(InvalidOperation):
"""Invalid context. Unknown rounding, for example.
This occurs and signals invalid-operation if an invalid context was
detected during an operation. This can occur if contexts are not checked
on creation and either the precision exceeds the capability of the
underlying concrete representation or an unknown or unsupported rounding
was specified. These aspects of the context need only be checked when
the values are required to be used. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class Rounded(DecimalException):
"""Number got rounded (not necessarily changed during rounding).
This occurs and signals rounded whenever the result of an operation is
rounded (that is, some zero or non-zero digits were discarded from the
coefficient), or if an overflow or underflow condition occurs. The
result in all cases is unchanged.
The rounded signal may be tested (or trapped) to determine if a given
operation (or sequence of operations) caused a loss of precision.
"""
class Subnormal(DecimalException):
"""Exponent < Emin before rounding.
This occurs and signals subnormal whenever the result of a conversion or
operation is subnormal (that is, its adjusted exponent is less than
Emin, before any rounding). The result in all cases is unchanged.
    The subnormal signal may be tested (or trapped) to determine if a given
    operation (or sequence of operations) yielded a subnormal result.
"""
class Overflow(Inexact, Rounded):
"""Numerical overflow.
This occurs and signals overflow if the adjusted exponent of a result
(from a conversion or from an operation that is not an attempt to divide
by zero), after rounding, would be greater than the largest value that
can be handled by the implementation (the value Emax).
The result depends on the rounding mode:
For round-half-up and round-half-even (and for round-half-down and
round-up, if implemented), the result of the operation is [sign,inf],
where sign is the sign of the intermediate result. For round-down, the
result is the largest finite number that can be represented in the
current precision, with the sign of the intermediate result. For
round-ceiling, the result is the same as for round-down if the sign of
the intermediate result is 1, or is [0,inf] otherwise. For round-floor,
the result is the same as for round-down if the sign of the intermediate
result is 0, or is [1,inf] otherwise. In all cases, Inexact and Rounded
will also be raised.
"""
def handle(self, context, sign, *args):
if context.rounding in (ROUND_HALF_UP, ROUND_HALF_EVEN,
ROUND_HALF_DOWN, ROUND_UP):
return _SignedInfinity[sign]
if sign == 0:
if context.rounding == ROUND_CEILING:
return _SignedInfinity[sign]
return _dec_from_triple(sign, '9'*context.prec,
context.Emax-context.prec+1)
if sign == 1:
if context.rounding == ROUND_FLOOR:
return _SignedInfinity[sign]
return _dec_from_triple(sign, '9'*context.prec,
context.Emax-context.prec+1)
class Underflow(Inexact, Rounded, Subnormal):
"""Numerical underflow with result rounded to 0.
This occurs and signals underflow if a result is inexact and the
adjusted exponent of the result would be smaller (more negative) than
the smallest value that can be handled by the implementation (the value
Emin). That is, the result is both inexact and subnormal.
The result after an underflow will be a subnormal number rounded, if
necessary, so that its exponent is not less than Etiny. This may result
in 0 with the sign of the intermediate result and an exponent of Etiny.
In all cases, Inexact, Rounded, and Subnormal will also be raised.
"""
# List of public traps and flags
_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded,
Underflow, InvalidOperation, Subnormal]
# Map conditions (per the spec) to signals
_condition_map = {ConversionSyntax:InvalidOperation,
DivisionImpossible:InvalidOperation,
DivisionUndefined:InvalidOperation,
InvalidContext:InvalidOperation}
##### Context Functions ##################################################
# The getcontext() and setcontext() functions manage access to a thread-local
# current context. Py2.4 offers direct support for thread locals. If that
# is not available, use threading.currentThread() which is slower but will
# work for older Pythons. If threads are not part of the build, create a
# mock threading object with threading.local() returning the module namespace.
try:
import threading
except ImportError:
# Python was compiled without threads; create a mock object instead
import sys
class MockThreading(object):
def local(self, sys=sys):
return sys.modules[__name__]
threading = MockThreading()
del sys, MockThreading
try:
threading.local
except AttributeError:
# To fix reloading, force it to create a new context
# Old contexts have different exceptions in their dicts, making problems.
if hasattr(threading.currentThread(), '__decimal_context__'):
del threading.currentThread().__decimal_context__
def setcontext(context):
"""Set this thread's context to context."""
if context in (DefaultContext, BasicContext, ExtendedContext):
context = context.copy()
context.clear_flags()
threading.currentThread().__decimal_context__ = context
def getcontext():
"""Returns this thread's context.
If this thread does not yet have a context, returns
a new context and sets this thread's context.
New contexts are copies of DefaultContext.
"""
try:
return threading.currentThread().__decimal_context__
except AttributeError:
context = Context()
threading.currentThread().__decimal_context__ = context
return context
else:
local = threading.local()
if hasattr(local, '__decimal_context__'):
del local.__decimal_context__
def getcontext(_local=local):
"""Returns this thread's context.
If this thread does not yet have a context, returns
a new context and sets this thread's context.
New contexts are copies of DefaultContext.
"""
try:
return _local.__decimal_context__
except AttributeError:
context = Context()
_local.__decimal_context__ = context
return context
def setcontext(context, _local=local):
"""Set this thread's context to context."""
if context in (DefaultContext, BasicContext, ExtendedContext):
context = context.copy()
context.clear_flags()
_local.__decimal_context__ = context
del threading, local # Don't contaminate the namespace
def localcontext(ctx=None):
"""Return a context manager for a copy of the supplied context
Uses a copy of the current context if no context is specified
The returned context manager creates a local decimal context
in a with statement:
def sin(x):
with localcontext() as ctx:
ctx.prec += 2
# Rest of sin calculation algorithm
# uses a precision 2 greater than normal
return +s # Convert result to normal precision
def sin(x):
with localcontext(ExtendedContext):
# Rest of sin calculation algorithm
# uses the Extended Context from the
# General Decimal Arithmetic Specification
return +s # Convert result to normal context
>>> setcontext(DefaultContext)
>>> print getcontext().prec
28
>>> with localcontext():
... ctx = getcontext()
... ctx.prec += 2
... print ctx.prec
...
30
>>> with localcontext(ExtendedContext):
... print getcontext().prec
...
9
>>> print getcontext().prec
28
"""
if ctx is None: ctx = getcontext()
return _ContextManager(ctx)
##### Decimal class #######################################################
class Decimal(object):
"""Floating point class for decimal arithmetic."""
__slots__ = ('_exp','_int','_sign', '_is_special')
# Generally, the value of the Decimal instance is given by
# (-1)**_sign * _int * 10**_exp
# Special values are signified by _is_special == True
# We're immutable, so use __new__ not __init__
def __new__(cls, value="0", context=None):
"""Create a decimal point instance.
>>> Decimal('3.14') # string input
Decimal('3.14')
>>> Decimal((0, (3, 1, 4), -2)) # tuple (sign, digit_tuple, exponent)
Decimal('3.14')
>>> Decimal(314) # int or long
Decimal('314')
>>> Decimal(Decimal(314)) # another decimal instance
Decimal('314')
>>> Decimal(' 3.14 \\n') # leading and trailing whitespace okay
Decimal('3.14')
"""
# Note that the coefficient, self._int, is actually stored as
# a string rather than as a tuple of digits. This speeds up
# the "digits to integer" and "integer to digits" conversions
# that are used in almost every arithmetic operation on
# Decimals. This is an internal detail: the as_tuple function
# and the Decimal constructor still deal with tuples of
# digits.
self = object.__new__(cls)
import sys
if sys.platform == 'cli':
import System
if isinstance(value, System.Decimal):
value = str(value)
# From a string
# REs insist on real strings, so we can too.
if isinstance(value, basestring):
m = _parser(value.strip())
if m is None:
if context is None:
context = getcontext()
return context._raise_error(ConversionSyntax,
"Invalid literal for Decimal: %r" % value)
if m.group('sign') == "-":
self._sign = 1
else:
self._sign = 0
intpart = m.group('int')
if intpart is not None:
# finite number
fracpart = m.group('frac') or ''
exp = int(m.group('exp') or '0')
self._int = str(int(intpart+fracpart))
self._exp = exp - len(fracpart)
self._is_special = False
else:
diag = m.group('diag')
if diag is not None:
# NaN
self._int = str(int(diag or '0')).lstrip('0')
if m.group('signal'):
self._exp = 'N'
else:
self._exp = 'n'
else:
# infinity
self._int = '0'
self._exp = 'F'
self._is_special = True
return self
# From an integer
if isinstance(value, (int,long)):
if value >= 0:
self._sign = 0
else:
self._sign = 1
self._exp = 0
self._int = str(abs(value))
self._is_special = False
return self
# From another decimal
if isinstance(value, Decimal):
self._exp = value._exp
self._sign = value._sign
self._int = value._int
self._is_special = value._is_special
return self
# From an internal working value
if isinstance(value, _WorkRep):
self._sign = value.sign
self._int = str(value.int)
self._exp = int(value.exp)
self._is_special = False
return self
# tuple/list conversion (possibly from as_tuple())
if isinstance(value, (list,tuple)):
if len(value) != 3:
raise ValueError('Invalid tuple size in creation of Decimal '
'from list or tuple. The list or tuple '
'should have exactly three elements.')
# process sign. The isinstance test rejects floats
if not (isinstance(value[0], (int, long)) and value[0] in (0,1)):
raise ValueError("Invalid sign. The first value in the tuple "
"should be an integer; either 0 for a "
"positive number or 1 for a negative number.")
self._sign = value[0]
if value[2] == 'F':
# infinity: value[1] is ignored
self._int = '0'
self._exp = value[2]
self._is_special = True
else:
# process and validate the digits in value[1]
digits = []
for digit in value[1]:
if isinstance(digit, (int, long)) and 0 <= digit <= 9:
# skip leading zeros
if digits or digit != 0:
digits.append(digit)
else:
raise ValueError("The second value in the tuple must "
"be composed of integers in the range "
"0 through 9.")
if value[2] in ('n', 'N'):
# NaN: digits form the diagnostic
self._int = ''.join(map(str, digits))
self._exp = value[2]
self._is_special = True
elif isinstance(value[2], (int, long)):
# finite number: digits give the coefficient
self._int = ''.join(map(str, digits or [0]))
self._exp = value[2]
self._is_special = False
else:
raise ValueError("The third value in the tuple must "
"be an integer, or one of the "
"strings 'F', 'n', 'N'.")
return self
if isinstance(value, float):
value = Decimal.from_float(value)
self._exp = value._exp
self._sign = value._sign
self._int = value._int
self._is_special = value._is_special
return self
raise TypeError("Cannot convert %r to Decimal" % value)
# @classmethod, but @decorator is not valid Python 2.3 syntax, so
# don't use it (see notes on Py2.3 compatibility at top of file)
def from_float(cls, f):
"""Converts a float to a decimal number, exactly.
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1').
Since 0.1 is not exactly representable in binary floating point, the
value is stored as the nearest representable value which is
0x1.999999999999ap-4. The exact equivalent of the value in decimal
is 0.1000000000000000055511151231257827021181583404541015625.
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal.from_float(float('nan'))
Decimal('NaN')
>>> Decimal.from_float(float('inf'))
Decimal('Infinity')
>>> Decimal.from_float(-float('inf'))
Decimal('-Infinity')
>>> Decimal.from_float(-0.0)
Decimal('-0')
"""
if isinstance(f, (int, long)): # handle integer inputs
return cls(f)
if _math.isinf(f) or _math.isnan(f): # raises TypeError if not a float
return cls(repr(f))
if _math.copysign(1.0, f) == 1.0:
sign = 0
else:
sign = 1
n, d = abs(f).as_integer_ratio()
k = d.bit_length() - 1
result = _dec_from_triple(sign, str(n*5**k), -k)
if cls is Decimal:
return result
else:
return cls(result)
from_float = classmethod(from_float)
def _isnan(self):
"""Returns whether the number is not actually one.
0 if a number
1 if NaN
2 if sNaN
"""
if self._is_special:
exp = self._exp
if exp == 'n':
return 1
elif exp == 'N':
return 2
return 0
def _isinfinity(self):
"""Returns whether the number is infinite
0 if finite or not a number
1 if +INF
-1 if -INF
"""
if self._exp == 'F':
if self._sign:
return -1
return 1
return 0
def _check_nans(self, other=None, context=None):
"""Returns whether the number is not actually one.
if self, other are sNaN, signal
if self, other are NaN return nan
return 0
Done before operations.
"""
self_is_nan = self._isnan()
if other is None:
other_is_nan = False
else:
other_is_nan = other._isnan()
if self_is_nan or other_is_nan:
if context is None:
context = getcontext()
if self_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
self)
if other_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
other)
if self_is_nan:
return self._fix_nan(context)
return other._fix_nan(context)
return 0
def _compare_check_nans(self, other, context):
"""Version of _check_nans used for the signaling comparisons
compare_signal, __le__, __lt__, __ge__, __gt__.
Signal InvalidOperation if either self or other is a (quiet
or signaling) NaN. Signaling NaNs take precedence over quiet
NaNs.
Return 0 if neither operand is a NaN.
"""
if context is None:
context = getcontext()
if self._is_special or other._is_special:
if self.is_snan():
return context._raise_error(InvalidOperation,
'comparison involving sNaN',
self)
elif other.is_snan():
return context._raise_error(InvalidOperation,
'comparison involving sNaN',
other)
elif self.is_qnan():
return context._raise_error(InvalidOperation,
'comparison involving NaN',
self)
elif other.is_qnan():
return context._raise_error(InvalidOperation,
'comparison involving NaN',
other)
return 0
def __nonzero__(self):
"""Return True if self is nonzero; otherwise return False.
NaNs and infinities are considered nonzero.
"""
return self._is_special or self._int != '0'
def _cmp(self, other):
"""Compare the two non-NaN decimal instances self and other.
Returns -1 if self < other, 0 if self == other and 1
if self > other. This routine is for internal use only."""
if self._is_special or other._is_special:
self_inf = self._isinfinity()
other_inf = other._isinfinity()
if self_inf == other_inf:
return 0
elif self_inf < other_inf:
return -1
else:
return 1
# check for zeros; Decimal('0') == Decimal('-0')
if not self:
if not other:
return 0
else:
return -((-1)**other._sign)
if not other:
return (-1)**self._sign
# If different signs, neg one is less
if other._sign < self._sign:
return -1
if self._sign < other._sign:
return 1
self_adjusted = self.adjusted()
other_adjusted = other.adjusted()
if self_adjusted == other_adjusted:
self_padded = self._int + '0'*(self._exp - other._exp)
other_padded = other._int + '0'*(other._exp - self._exp)
if self_padded == other_padded:
return 0
elif self_padded < other_padded:
return -(-1)**self._sign
else:
return (-1)**self._sign
elif self_adjusted > other_adjusted:
return (-1)**self._sign
else: # self_adjusted < other_adjusted
return -((-1)**self._sign)
# Note: The Decimal standard doesn't cover rich comparisons for
# Decimals. In particular, the specification is silent on the
# subject of what should happen for a comparison involving a NaN.
# We take the following approach:
#
# == comparisons involving a quiet NaN always return False
# != comparisons involving a quiet NaN always return True
# == or != comparisons involving a signaling NaN signal
# InvalidOperation, and return False or True as above if the
# InvalidOperation is not trapped.
# <, >, <= and >= comparisons involving a (quiet or signaling)
# NaN signal InvalidOperation, and return False if the
# InvalidOperation is not trapped.
#
# This behavior is designed to conform as closely as possible to
# that specified by IEEE 754.
def __eq__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
if self._check_nans(other, context):
return False
return self._cmp(other) == 0
def __ne__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
if self._check_nans(other, context):
return True
return self._cmp(other) != 0
def __lt__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) < 0
def __le__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) <= 0
def __gt__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) > 0
def __ge__(self, other, context=None):
other = _convert_other(other, allow_float=True)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) >= 0
def compare(self, other, context=None):
"""Compares one to another.
-1 => a < b
0 => a = b
1 => a > b
NaN => one is NaN
Like __cmp__, but returns Decimal instances.
"""
other = _convert_other(other, raiseit=True)
# Compare(NaN, NaN) = NaN
if (self._is_special or other and other._is_special):
ans = self._check_nans(other, context)
if ans:
return ans
return Decimal(self._cmp(other))
def __hash__(self):
"""x.__hash__() <==> hash(x)"""
# Decimal integers must hash the same as the ints
#
# The hash of a nonspecial noninteger Decimal must depend only
# on the value of that Decimal, and not on its representation.
# For example: hash(Decimal('100E-1')) == hash(Decimal('10')).
# Equality comparisons involving signaling nans can raise an
# exception; since equality checks are implicitly and
# unpredictably used when checking set and dict membership, we
# prevent signaling nans from being used as set elements or
# dict keys by making __hash__ raise an exception.
if self._is_special:
if self.is_snan():
raise TypeError('Cannot hash a signaling NaN value.')
elif self.is_nan():
# 0 to match hash(float('nan'))
return 0
else:
# values chosen to match hash(float('inf')) and
# hash(float('-inf')).
if self._sign:
return -271828
else:
return 314159
# In Python 2.7, we're allowing comparisons (but not
# arithmetic operations) between floats and Decimals; so if
# a Decimal instance is exactly representable as a float then
# its hash should match that of the float.
self_as_float = float(self)
if Decimal.from_float(self_as_float) == self:
return hash(self_as_float)
if self._isinteger():
op = _WorkRep(self.to_integral_value())
# to make computation feasible for Decimals with large
# exponent, we use the fact that hash(n) == hash(m) for
# any two nonzero integers n and m such that (i) n and m
# have the same sign, and (ii) n is congruent to m modulo
# 2**64-1. So we can replace hash((-1)**s*c*10**e) with
# hash((-1)**s*c*pow(10, e, 2**64-1).
return hash((-1)**op.sign*op.int*pow(10, op.exp, 2**64-1))
# The value of a nonzero nonspecial Decimal instance is
# faithfully represented by the triple consisting of its sign,
# its adjusted exponent, and its coefficient with trailing
# zeros removed.
return hash((self._sign,
self._exp+len(self._int),
self._int.rstrip('0')))
def as_tuple(self):
"""Represents the number as a triple tuple.
To show the internals exactly as they are.
"""
return DecimalTuple(self._sign, tuple(map(int, self._int)), self._exp)
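    # Editor's example:
    # >>> Decimal('-3.14').as_tuple()
    # DecimalTuple(sign=1, digits=(3, 1, 4), exponent=-2)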
def __repr__(self):
"""Represents the number as an instance of Decimal."""
# Invariant: eval(repr(d)) == d
return "Decimal('%s')" % str(self)
def __str__(self, eng=False, context=None):
"""Return string representation of the number in scientific notation.
Captures all of the information in the underlying representation.
"""
sign = ['', '-'][self._sign]
if self._is_special:
if self._exp == 'F':
return sign + 'Infinity'
elif self._exp == 'n':
return sign + 'NaN' + self._int
else: # self._exp == 'N'
return sign + 'sNaN' + self._int
# number of digits of self._int to left of decimal point
leftdigits = self._exp + len(self._int)
# dotplace is number of digits of self._int to the left of the
# decimal point in the mantissa of the output string (that is,
# after adjusting the exponent)
if self._exp <= 0 and leftdigits > -6:
# no exponent required
dotplace = leftdigits
elif not eng:
# usual scientific notation: 1 digit on left of the point
dotplace = 1
elif self._int == '0':
# engineering notation, zero
dotplace = (leftdigits + 1) % 3 - 1
else:
# engineering notation, nonzero
dotplace = (leftdigits - 1) % 3 + 1
if dotplace <= 0:
intpart = '0'
fracpart = '.' + '0'*(-dotplace) + self._int
elif dotplace >= len(self._int):
intpart = self._int+'0'*(dotplace-len(self._int))
fracpart = ''
else:
intpart = self._int[:dotplace]
fracpart = '.' + self._int[dotplace:]
if leftdigits == dotplace:
exp = ''
else:
if context is None:
context = getcontext()
exp = ['e', 'E'][context.capitals] + "%+d" % (leftdigits-dotplace)
return sign + intpart + fracpart + exp
def to_eng_string(self, context=None):
"""Convert to a string, using engineering notation if an exponent is needed.
Engineering notation has an exponent which is a multiple of 3. This
can leave up to 3 digits to the left of the decimal place and may
require the addition of either one or two trailing zeros.
"""
return self.__str__(eng=True, context=context)
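    # Editor's example: the exponent is forced to a multiple of 3.
    # >>> Decimal('123E+1').to_eng_string()
    # '1.23E+3'
    # >>> Decimal('0.000001').to_eng_string()   # small values need no exponent
    # '0.000001'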
def __neg__(self, context=None):
"""Returns a copy with the sign switched.
Rounds, if it has reason.
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
if not self and context.rounding != ROUND_FLOOR:
# -Decimal('0') is Decimal('0'), not Decimal('-0'), except
# in ROUND_FLOOR rounding mode.
ans = self.copy_abs()
else:
ans = self.copy_negate()
return ans._fix(context)
def __pos__(self, context=None):
"""Returns a copy, unless it is a sNaN.
Rounds the number (if more than precision digits)
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
if not self and context.rounding != ROUND_FLOOR:
# + (-0) = 0, except in ROUND_FLOOR rounding mode.
ans = self.copy_abs()
else:
ans = Decimal(self)
return ans._fix(context)
def __abs__(self, round=True, context=None):
"""Returns the absolute value of self.
If the keyword argument 'round' is false, do not round. The
expression self.__abs__(round=False) is equivalent to
self.copy_abs().
"""
if not round:
return self.copy_abs()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if self._sign:
ans = self.__neg__(context=context)
else:
ans = self.__pos__(context=context)
return ans
def __add__(self, other, context=None):
"""Returns self + other.
-INF + INF (or the reverse) cause InvalidOperation errors.
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
# If both INF, same sign => same as both, opposite => error.
if self._sign != other._sign and other._isinfinity():
return context._raise_error(InvalidOperation, '-INF + INF')
return Decimal(self)
if other._isinfinity():
return Decimal(other) # Can't both be infinity here
exp = min(self._exp, other._exp)
negativezero = 0
if context.rounding == ROUND_FLOOR and self._sign != other._sign:
# If the answer is 0, the sign should be negative, in this case.
negativezero = 1
if not self and not other:
sign = min(self._sign, other._sign)
if negativezero:
sign = 1
ans = _dec_from_triple(sign, '0', exp)
ans = ans._fix(context)
return ans
if not self:
exp = max(exp, other._exp - context.prec-1)
ans = other._rescale(exp, context.rounding)
ans = ans._fix(context)
return ans
if not other:
exp = max(exp, self._exp - context.prec-1)
ans = self._rescale(exp, context.rounding)
ans = ans._fix(context)
return ans
op1 = _WorkRep(self)
op2 = _WorkRep(other)
op1, op2 = _normalize(op1, op2, context.prec)
result = _WorkRep()
if op1.sign != op2.sign:
# Equal and opposite
if op1.int == op2.int:
ans = _dec_from_triple(negativezero, '0', exp)
ans = ans._fix(context)
return ans
if op1.int < op2.int:
op1, op2 = op2, op1
# OK, now abs(op1) > abs(op2)
if op1.sign == 1:
result.sign = 1
op1.sign, op2.sign = op2.sign, op1.sign
else:
result.sign = 0
# So we know the sign, and op1 > 0.
elif op1.sign == 1:
result.sign = 1
op1.sign, op2.sign = (0, 0)
else:
result.sign = 0
# Now, op1 > abs(op2) > 0
if op2.sign == 0:
result.int = op1.int + op2.int
else:
result.int = op1.int - op2.int
result.exp = op1.exp
ans = Decimal(result)
ans = ans._fix(context)
return ans
__radd__ = __add__
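# Illustrative usage (hedged sketch): the sign of an exactly-zero sum depends
# on the rounding mode, as handled by the negativezero logic above.
#   >>> from decimal import Decimal, localcontext, ROUND_FLOOR
#   >>> Decimal('1') + Decimal('-1')
#   Decimal('0')
#   >>> with localcontext() as ctx:
#   ...     ctx.rounding = ROUND_FLOOR
#   ...     Decimal('1') + Decimal('-1')
#   Decimal('-0')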
def __sub__(self, other, context=None):
"""Return self - other"""
other = _convert_other(other)
if other is NotImplemented:
return other
if self._is_special or other._is_special:
ans = self._check_nans(other, context=context)
if ans:
return ans
# self - other is computed as self + other.copy_negate()
return self.__add__(other.copy_negate(), context=context)
def __rsub__(self, other, context=None):
"""Return other - self"""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__sub__(self, context=context)
def __mul__(self, other, context=None):
"""Return self * other.
(+-) INF * 0 (or the reverse) raises InvalidOperation.
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
resultsign = self._sign ^ other._sign
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
if not other:
return context._raise_error(InvalidOperation, '(+-)INF * 0')
return _SignedInfinity[resultsign]
if other._isinfinity():
if not self:
return context._raise_error(InvalidOperation, '0 * (+-)INF')
return _SignedInfinity[resultsign]
resultexp = self._exp + other._exp
# Special case for multiplying by zero
if not self or not other:
ans = _dec_from_triple(resultsign, '0', resultexp)
# Fixing in case the exponent is out of bounds
ans = ans._fix(context)
return ans
# Special case for multiplying by power of 10
if self._int == '1':
ans = _dec_from_triple(resultsign, other._int, resultexp)
ans = ans._fix(context)
return ans
if other._int == '1':
ans = _dec_from_triple(resultsign, self._int, resultexp)
ans = ans._fix(context)
return ans
op1 = _WorkRep(self)
op2 = _WorkRep(other)
ans = _dec_from_triple(resultsign, str(op1.int * op2.int), resultexp)
ans = ans._fix(context)
return ans
__rmul__ = __mul__
def __truediv__(self, other, context=None):
"""Return self / other."""
other = _convert_other(other)
if other is NotImplemented:
return NotImplemented
if context is None:
context = getcontext()
sign = self._sign ^ other._sign
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity() and other._isinfinity():
return context._raise_error(InvalidOperation, '(+-)INF/(+-)INF')
if self._isinfinity():
return _SignedInfinity[sign]
if other._isinfinity():
context._raise_error(Clamped, 'Division by infinity')
return _dec_from_triple(sign, '0', context.Etiny())
# Special cases for zeroes
if not other:
if not self:
return context._raise_error(DivisionUndefined, '0 / 0')
return context._raise_error(DivisionByZero, 'x / 0', sign)
if not self:
exp = self._exp - other._exp
coeff = 0
else:
# OK, so neither operand is 0, INF or NaN
shift = len(other._int) - len(self._int) + context.prec + 1
exp = self._exp - other._exp - shift
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if shift >= 0:
coeff, remainder = divmod(op1.int * 10**shift, op2.int)
else:
coeff, remainder = divmod(op1.int, op2.int * 10**-shift)
if remainder:
# result is not exact; adjust to ensure correct rounding
if coeff % 5 == 0:
coeff += 1
else:
# result is exact; get as close to ideal exponent as possible
ideal_exp = self._exp - other._exp
while exp < ideal_exp and coeff % 10 == 0:
coeff //= 10
exp += 1
ans = _dec_from_triple(sign, str(coeff), exp)
return ans._fix(context)
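# Illustrative usage (hedged sketch): an inexact quotient is rounded to the
# context precision, 28 significant digits in the default context.
#   >>> from decimal import Decimal
#   >>> Decimal(1) / Decimal(7)
#   Decimal('0.1428571428571428571428571429')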
def _divide(self, other, context):
"""Return (self // other, self % other), to context.prec precision.
Assumes that neither self nor other is a NaN, that self is not
infinite and that other is nonzero.
"""
sign = self._sign ^ other._sign
if other._isinfinity():
ideal_exp = self._exp
else:
ideal_exp = min(self._exp, other._exp)
expdiff = self.adjusted() - other.adjusted()
if not self or other._isinfinity() or expdiff <= -2:
return (_dec_from_triple(sign, '0', 0),
self._rescale(ideal_exp, context.rounding))
if expdiff <= context.prec:
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if op1.exp >= op2.exp:
op1.int *= 10**(op1.exp - op2.exp)
else:
op2.int *= 10**(op2.exp - op1.exp)
q, r = divmod(op1.int, op2.int)
if q < 10**context.prec:
return (_dec_from_triple(sign, str(q), 0),
_dec_from_triple(self._sign, str(r), ideal_exp))
# Here the quotient is too large to be representable
ans = context._raise_error(DivisionImpossible,
'quotient too large in //, % or divmod')
return ans, ans
def __rtruediv__(self, other, context=None):
"""Swaps self/other and returns __truediv__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__truediv__(self, context=context)
__div__ = __truediv__
__rdiv__ = __rtruediv__
def __divmod__(self, other, context=None):
"""
Return (self // other, self % other)
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return (ans, ans)
sign = self._sign ^ other._sign
if self._isinfinity():
if other._isinfinity():
ans = context._raise_error(InvalidOperation, 'divmod(INF, INF)')
return ans, ans
else:
return (_SignedInfinity[sign],
context._raise_error(InvalidOperation, 'INF % x'))
if not other:
if not self:
ans = context._raise_error(DivisionUndefined, 'divmod(0, 0)')
return ans, ans
else:
return (context._raise_error(DivisionByZero, 'x // 0', sign),
context._raise_error(InvalidOperation, 'x % 0'))
quotient, remainder = self._divide(other, context)
remainder = remainder._fix(context)
return quotient, remainder
def __rdivmod__(self, other, context=None):
"""Swaps self/other and returns __divmod__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__divmod__(self, context=context)
def __mod__(self, other, context=None):
"""
self % other
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
return context._raise_error(InvalidOperation, 'INF % x')
elif not other:
if self:
return context._raise_error(InvalidOperation, 'x % 0')
else:
return context._raise_error(DivisionUndefined, '0 % 0')
remainder = self._divide(other, context)[1]
remainder = remainder._fix(context)
return remainder
def __rmod__(self, other, context=None):
"""Swaps self/other and returns __mod__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__mod__(self, context=context)
def remainder_near(self, other, context=None):
"""
Remainder nearest to 0: abs(remainder_near) <= abs(other)/2.
"""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
# self == +/-infinity -> InvalidOperation
if self._isinfinity():
return context._raise_error(InvalidOperation,
'remainder_near(infinity, x)')
# other == 0 -> either InvalidOperation or DivisionUndefined
if not other:
if self:
return context._raise_error(InvalidOperation,
'remainder_near(x, 0)')
else:
return context._raise_error(DivisionUndefined,
'remainder_near(0, 0)')
# other = +/-infinity -> remainder = self
if other._isinfinity():
ans = Decimal(self)
return ans._fix(context)
# self = 0 -> remainder = self, with ideal exponent
ideal_exponent = min(self._exp, other._exp)
if not self:
ans = _dec_from_triple(self._sign, '0', ideal_exponent)
return ans._fix(context)
# catch most cases of large or small quotient
expdiff = self.adjusted() - other.adjusted()
if expdiff >= context.prec + 1:
# expdiff >= prec+1 => abs(self/other) > 10**prec
return context._raise_error(DivisionImpossible)
if expdiff <= -2:
# expdiff <= -2 => abs(self/other) < 0.1
ans = self._rescale(ideal_exponent, context.rounding)
return ans._fix(context)
# adjust both arguments to have the same exponent, then divide
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if op1.exp >= op2.exp:
op1.int *= 10**(op1.exp - op2.exp)
else:
op2.int *= 10**(op2.exp - op1.exp)
q, r = divmod(op1.int, op2.int)
# remainder is r*10**ideal_exponent; other is +/-op2.int *
# 10**ideal_exponent. Apply correction to ensure that
# abs(remainder) <= abs(other)/2
if 2*r + (q&1) > op2.int:
r -= op2.int
q += 1
if q >= 10**context.prec:
return context._raise_error(DivisionImpossible)
# result has same sign as self unless r is negative
sign = self._sign
if r < 0:
sign = 1-sign
r = -r
ans = _dec_from_triple(sign, str(r), ideal_exponent)
return ans._fix(context)
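# Illustrative usage (hedged sketch): unlike %, remainder_near picks the
# remainder closest to zero, so it can be negative.
#   >>> from decimal import Decimal
#   >>> Decimal(18) % Decimal(10)
#   Decimal('8')
#   >>> Decimal(18).remainder_near(Decimal(10))
#   Decimal('-2')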
def __floordiv__(self, other, context=None):
"""self // other"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
if other._isinfinity():
return context._raise_error(InvalidOperation, 'INF // INF')
else:
return _SignedInfinity[self._sign ^ other._sign]
if not other:
if self:
return context._raise_error(DivisionByZero, 'x // 0',
self._sign ^ other._sign)
else:
return context._raise_error(DivisionUndefined, '0 // 0')
return self._divide(other, context)[0]
def __rfloordiv__(self, other, context=None):
"""Swaps self/other and returns __floordiv__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__floordiv__(self, context=context)
def __float__(self):
"""Float representation."""
if self._isnan():
if self.is_snan():
raise ValueError("Cannot convert signaling NaN to float")
s = "-nan" if self._sign else "nan"
else:
s = str(self)
return float(s)
def __int__(self):
"""Converts self to an int, truncating if necessary."""
if self._is_special:
if self._isnan():
raise ValueError("Cannot convert NaN to integer")
elif self._isinfinity():
raise OverflowError("Cannot convert infinity to integer")
s = (-1)**self._sign
if self._exp >= 0:
return s*int(self._int)*10**self._exp
else:
return s*int(self._int[:self._exp] or '0')
__trunc__ = __int__
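# Illustrative usage (hedged sketch): conversion to int truncates toward
# zero rather than rounding.
#   >>> from decimal import Decimal
#   >>> int(Decimal('-7.9'))
#   -7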
def real(self):
return self
real = property(real)
def imag(self):
return Decimal(0)
imag = property(imag)
def conjugate(self):
return self
def __complex__(self):
return complex(float(self))
def __long__(self):
"""Converts to a long.
Equivalent to long(int(self))
"""
return long(self.__int__())
def _fix_nan(self, context):
"""Decapitate the payload of a NaN to fit the context"""
payload = self._int
# maximum length of payload is precision if _clamp=0,
# precision-1 if _clamp=1.
max_payload_len = context.prec - context._clamp
if len(payload) > max_payload_len:
payload = payload[len(payload)-max_payload_len:].lstrip('0')
return _dec_from_triple(self._sign, payload, self._exp, True)
return Decimal(self)
def _fix(self, context):
"""Round if it is necessary to keep self within prec precision.
Rounds and fixes the exponent. Does not raise on a sNaN.
Arguments:
self - Decimal instance
context - context used.
"""
if self._is_special:
if self._isnan():
# decapitate payload if necessary
return self._fix_nan(context)
else:
# self is +/-Infinity; return unaltered
return Decimal(self)
# if self is zero then exponent should be between Etiny and
# Emax if _clamp==0, and between Etiny and Etop if _clamp==1.
Etiny = context.Etiny()
Etop = context.Etop()
if not self:
exp_max = [context.Emax, Etop][context._clamp]
new_exp = min(max(self._exp, Etiny), exp_max)
if new_exp != self._exp:
context._raise_error(Clamped)
return _dec_from_triple(self._sign, '0', new_exp)
else:
return Decimal(self)
# exp_min is the smallest allowable exponent of the result,
# equal to max(self.adjusted()-context.prec+1, Etiny)
exp_min = len(self._int) + self._exp - context.prec
if exp_min > Etop:
# overflow: exp_min > Etop iff self.adjusted() > Emax
ans = context._raise_error(Overflow, 'above Emax', self._sign)
context._raise_error(Inexact)
context._raise_error(Rounded)
return ans
self_is_subnormal = exp_min < Etiny
if self_is_subnormal:
exp_min = Etiny
# round if self has too many digits
if self._exp < exp_min:
digits = len(self._int) + self._exp - exp_min
if digits < 0:
self = _dec_from_triple(self._sign, '1', exp_min-1)
digits = 0
rounding_method = self._pick_rounding_function[context.rounding]
changed = rounding_method(self, digits)
coeff = self._int[:digits] or '0'
if changed > 0:
coeff = str(int(coeff)+1)
if len(coeff) > context.prec:
coeff = coeff[:-1]
exp_min += 1
# check whether the rounding pushed the exponent out of range
if exp_min > Etop:
ans = context._raise_error(Overflow, 'above Emax', self._sign)
else:
ans = _dec_from_triple(self._sign, coeff, exp_min)
# raise the appropriate signals, taking care to respect
# the precedence described in the specification
if changed and self_is_subnormal:
context._raise_error(Underflow)
if self_is_subnormal:
context._raise_error(Subnormal)
if changed:
context._raise_error(Inexact)
context._raise_error(Rounded)
if not ans:
# raise Clamped on underflow to 0
context._raise_error(Clamped)
return ans
if self_is_subnormal:
context._raise_error(Subnormal)
# fold down if _clamp == 1 and self has too few digits
if context._clamp == 1 and self._exp > Etop:
context._raise_error(Clamped)
self_padded = self._int + '0'*(self._exp - Etop)
return _dec_from_triple(self._sign, self_padded, Etop)
# here self was representable to begin with; return unchanged
return Decimal(self)
# for each of the rounding functions below:
# self is a finite, nonzero Decimal
# prec is an integer satisfying 0 <= prec < len(self._int)
#
# each function returns either -1, 0, or 1, as follows:
# 1 indicates that self should be rounded up (away from zero)
# 0 indicates that self should be truncated, and that all the
# digits to be truncated are zeros (so the value is unchanged)
# -1 indicates that there are nonzero digits to be truncated
def _round_down(self, prec):
"""Also known as round-towards-0, truncate."""
if _all_zeros(self._int, prec):
return 0
else:
return -1
def _round_up(self, prec):
"""Rounds away from 0."""
return -self._round_down(prec)
def _round_half_up(self, prec):
"""Rounds 5 up (away from 0)"""
if self._int[prec] in '56789':
return 1
elif _all_zeros(self._int, prec):
return 0
else:
return -1
def _round_half_down(self, prec):
"""Round 5 down"""
if _exact_half(self._int, prec):
return -1
else:
return self._round_half_up(prec)
def _round_half_even(self, prec):
"""Round 5 to even, rest to nearest."""
if _exact_half(self._int, prec) and \
(prec == 0 or self._int[prec-1] in '02468'):
return -1
else:
return self._round_half_up(prec)
def _round_ceiling(self, prec):
"""Rounds up (not away from 0 if negative.)"""
if self._sign:
return self._round_down(prec)
else:
return -self._round_down(prec)
def _round_floor(self, prec):
"""Rounds down (not towards 0 if negative)"""
if not self._sign:
return self._round_down(prec)
else:
return -self._round_down(prec)
def _round_05up(self, prec):
"""Round down unless digit prec-1 is 0 or 5."""
if prec and self._int[prec-1] not in '05':
return self._round_down(prec)
else:
return -self._round_down(prec)
_pick_rounding_function = dict(
ROUND_DOWN = _round_down,
ROUND_UP = _round_up,
ROUND_HALF_UP = _round_half_up,
ROUND_HALF_DOWN = _round_half_down,
ROUND_HALF_EVEN = _round_half_even,
ROUND_CEILING = _round_ceiling,
ROUND_FLOOR = _round_floor,
ROUND_05UP = _round_05up,
)
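# Illustrative usage (hedged sketch): the table above is what quantize() and
# _rescale() consult, so ties resolve differently per rounding mode.
#   >>> from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP
#   >>> Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_EVEN)
#   Decimal('2')
#   >>> Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP)
#   Decimal('3')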
def fma(self, other, third, context=None):
"""Fused multiply-add.
Returns self*other+third with no rounding of the intermediate
product self*other.
self and other are multiplied together, with no rounding of
the result. The third operand is then added to the result,
and a single final rounding is performed.
"""
other = _convert_other(other, raiseit=True)
# compute product; raise InvalidOperation if either operand is
# a signaling NaN or if the product is zero times infinity.
if self._is_special or other._is_special:
if context is None:
context = getcontext()
if self._exp == 'N':
return context._raise_error(InvalidOperation, 'sNaN', self)
if other._exp == 'N':
return context._raise_error(InvalidOperation, 'sNaN', other)
if self._exp == 'n':
product = self
elif other._exp == 'n':
product = other
elif self._exp == 'F':
if not other:
return context._raise_error(InvalidOperation,
'INF * 0 in fma')
product = _SignedInfinity[self._sign ^ other._sign]
elif other._exp == 'F':
if not self:
return context._raise_error(InvalidOperation,
'0 * INF in fma')
product = _SignedInfinity[self._sign ^ other._sign]
else:
product = _dec_from_triple(self._sign ^ other._sign,
str(int(self._int) * int(other._int)),
self._exp + other._exp)
third = _convert_other(third, raiseit=True)
return product.__add__(third, context)
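# Illustrative usage (hedged sketch): only one rounding happens, at the
# final addition, never on the intermediate product.
#   >>> from decimal import Decimal
#   >>> Decimal(2).fma(3, 5)
#   Decimal('11')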
def _power_modulo(self, other, modulo, context=None):
"""Three argument version of __pow__"""
# if can't convert other and modulo to Decimal, raise
# TypeError; there's no point returning NotImplemented (no
# equivalent of __rpow__ for three argument pow)
other = _convert_other(other, raiseit=True)
modulo = _convert_other(modulo, raiseit=True)
if context is None:
context = getcontext()
# deal with NaNs: if there are any sNaNs then first one wins,
# (i.e. behaviour for NaNs is identical to that of fma)
self_is_nan = self._isnan()
other_is_nan = other._isnan()
modulo_is_nan = modulo._isnan()
if self_is_nan or other_is_nan or modulo_is_nan:
if self_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
self)
if other_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
other)
if modulo_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
modulo)
if self_is_nan:
return self._fix_nan(context)
if other_is_nan:
return other._fix_nan(context)
return modulo._fix_nan(context)
# check inputs: we apply same restrictions as Python's pow()
if not (self._isinteger() and
other._isinteger() and
modulo._isinteger()):
return context._raise_error(InvalidOperation,
'pow() 3rd argument not allowed '
'unless all arguments are integers')
if other < 0:
return context._raise_error(InvalidOperation,
'pow() 2nd argument cannot be '
'negative when 3rd argument specified')
if not modulo:
return context._raise_error(InvalidOperation,
'pow() 3rd argument cannot be 0')
# additional restriction for decimal: the modulus must be less
# than 10**prec in absolute value
if modulo.adjusted() >= context.prec:
return context._raise_error(InvalidOperation,
'insufficient precision: pow() 3rd '
'argument must not have more than '
'precision digits')
# define 0**0 == NaN, for consistency with two-argument pow
# (even though it hurts!)
if not other and not self:
return context._raise_error(InvalidOperation,
'at least one of pow() 1st argument '
'and 2nd argument must be nonzero; '
'0**0 is not defined')
# compute sign of result
if other._iseven():
sign = 0
else:
sign = self._sign
# convert modulo to a Python integer, and self and other to
# Decimal integers (i.e. force their exponents to be >= 0)
modulo = abs(int(modulo))
base = _WorkRep(self.to_integral_value())
exponent = _WorkRep(other.to_integral_value())
# compute result using integer pow()
base = (base.int % modulo * pow(10, base.exp, modulo)) % modulo
for i in xrange(exponent.exp):
base = pow(base, 10, modulo)
base = pow(base, exponent.int, modulo)
return _dec_from_triple(sign, str(base), 0)
def _power_exact(self, other, p):
"""Attempt to compute self**other exactly.
Given Decimals self and other and an integer p, attempt to
compute an exact result for the power self**other, with p
digits of precision. Return None if self**other is not
exactly representable in p digits.
Assumes that elimination of special cases has already been
performed: self and other must both be nonspecial; self must
be positive and not numerically equal to 1; other must be
nonzero. For efficiency, other._exp should not be too large,
so that 10**abs(other._exp) is a feasible calculation."""
# In the comments below, we write x for the value of self and y for the
# value of other. Write x = xc*10**xe and abs(y) = yc*10**ye, with xc
# and yc positive integers not divisible by 10.
# The main purpose of this method is to identify the *failure*
# of x**y to be exactly representable with as little effort as
# possible. So we look for cheap and easy tests that
# eliminate the possibility of x**y being exact. Only if all
# these tests are passed do we go on to actually compute x**y.
# Here's the main idea. Express y as a rational number m/n, with m and
# n relatively prime and n>0. Then for x**y to be exactly
# representable (at *any* precision), xc must be the nth power of a
# positive integer and xe must be divisible by n. If y is negative
# then additionally xc must be a power of either 2 or 5, hence a power
# of 2**n or 5**n.
#
# There's a limit to how small |y| can be: if y=m/n as above
# then:
#
# (1) if xc != 1 then for the result to be representable we
# need xc**(1/n) >= 2, and hence also xc**|y| >= 2. So
# if |y| <= 1/nbits(xc) then xc < 2**nbits(xc) <=
# 2**(1/|y|), hence xc**|y| < 2 and the result is not
# representable.
#
# (2) if xe != 0, |xe|*(1/n) >= 1, so |xe|*|y| >= 1. Hence if
# |y| < 1/|xe| then the result is not representable.
#
# Note that since x is not equal to 1, at least one of (1) and
# (2) must apply. Now |y| < 1/nbits(xc) iff |yc|*nbits(xc) <
# 10**-ye iff len(str(|yc|*nbits(xc))) <= -ye.
#
# There's also a limit to how large y can be, at least if it's
# positive: the normalized result will have coefficient xc**y,
# so if it's representable then xc**y < 10**p, and y <
# p/log10(xc). Hence if y*log10(xc) >= p then the result is
# not exactly representable.
# if len(str(abs(yc*xe))) <= -ye then abs(yc*xe) < 10**-ye,
# so |y| < 1/|xe| and the result is not representable.
# Similarly, len(str(abs(yc)*xc_bits)) <= -ye implies |y|
# < 1/nbits(xc).
x = _WorkRep(self)
xc, xe = x.int, x.exp
while xc % 10 == 0:
xc //= 10
xe += 1
y = _WorkRep(other)
yc, ye = y.int, y.exp
while yc % 10 == 0:
yc //= 10
ye += 1
# case where xc == 1: result is 10**(xe*y), with xe*y
# required to be an integer
if xc == 1:
xe *= yc
# result is now 10**(xe * 10**ye); xe * 10**ye must be integral
while xe % 10 == 0:
xe //= 10
ye += 1
if ye < 0:
return None
exponent = xe * 10**ye
if y.sign == 1:
exponent = -exponent
# if other is a nonnegative integer, use ideal exponent
if other._isinteger() and other._sign == 0:
ideal_exponent = self._exp*int(other)
zeros = min(exponent-ideal_exponent, p-1)
else:
zeros = 0
return _dec_from_triple(0, '1' + '0'*zeros, exponent-zeros)
# case where y is negative: xc must be either a power
# of 2 or a power of 5.
if y.sign == 1:
last_digit = xc % 10
if last_digit in (2,4,6,8):
# quick test for power of 2
if xc & -xc != xc:
return None
# now xc is a power of 2; e is its exponent
e = _nbits(xc)-1
# We now have:
#
# x = 2**e * 10**xe, e > 0, and y < 0.
#
# The exact result is:
#
# x**y = 5**(-e*y) * 10**(e*y + xe*y)
#
# provided that both e*y and xe*y are integers. Note that if
# 5**(-e*y) >= 10**p, then the result can't be expressed
# exactly with p digits of precision.
#
# Using the above, we can guard against large values of ye.
# 93/65 is an upper bound for log(10)/log(5), so if
#
# ye >= len(str(93*p//65))
#
# then
#
# -e*y >= -y >= 10**ye > 93*p/65 > p*log(10)/log(5),
#
# so 5**(-e*y) >= 10**p, and the coefficient of the result
# can't be expressed in p digits.
# emax >= largest e such that 5**e < 10**p.
emax = p*93//65
if ye >= len(str(emax)):
return None
# Find -e*y and -xe*y; both must be integers
e = _decimal_lshift_exact(e * yc, ye)
xe = _decimal_lshift_exact(xe * yc, ye)
if e is None or xe is None:
return None
if e > emax:
return None
xc = 5**e
elif last_digit == 5:
# e >= log_5(xc) if xc is a power of 5; we have
# equality all the way up to xc=5**2658
e = _nbits(xc)*28//65
xc, remainder = divmod(5**e, xc)
if remainder:
return None
while xc % 5 == 0:
xc //= 5
e -= 1
# Guard against large values of ye, using the same logic as in
# the 'xc is a power of 2' branch. 10/3 is an upper bound for
# log(10)/log(2).
emax = p*10//3
if ye >= len(str(emax)):
return None
e = _decimal_lshift_exact(e * yc, ye)
xe = _decimal_lshift_exact(xe * yc, ye)
if e is None or xe is None:
return None
if e > emax:
return None
xc = 2**e
else:
return None
if xc >= 10**p:
return None
xe = -e-xe
return _dec_from_triple(0, str(xc), xe)
# now y is positive; find m and n such that y = m/n
if ye >= 0:
m, n = yc*10**ye, 1
else:
if xe != 0 and len(str(abs(yc*xe))) <= -ye:
return None
xc_bits = _nbits(xc)
if xc != 1 and len(str(abs(yc)*xc_bits)) <= -ye:
return None
m, n = yc, 10**(-ye)
while m % 2 == n % 2 == 0:
m //= 2
n //= 2
while m % 5 == n % 5 == 0:
m //= 5
n //= 5
# compute nth root of xc*10**xe
if n > 1:
# if 1 < xc < 2**n then xc isn't an nth power
if xc != 1 and xc_bits <= n:
return None
xe, rem = divmod(xe, n)
if rem != 0:
return None
# compute nth root of xc using Newton's method
a = 1L << -(-_nbits(xc)//n) # initial estimate
while True:
q, r = divmod(xc, a**(n-1))
if a <= q:
break
else:
a = (a*(n-1) + q)//n
if not (a == q and r == 0):
return None
xc = a
# now xc*10**xe is the nth root of the original xc*10**xe
# compute mth power of xc*10**xe
# if m > p*100//_log10_lb(xc) then m > p/log10(xc), hence xc**m >
# 10**p and the result is not representable.
if xc > 1 and m > p*100//_log10_lb(xc):
return None
xc = xc**m
xe *= m
if xc > 10**p:
return None
# by this point the result *is* exactly representable
# adjust the exponent to get as close as possible to the ideal
# exponent, if necessary
str_xc = str(xc)
if other._isinteger() and other._sign == 0:
ideal_exponent = self._exp*int(other)
zeros = min(xe-ideal_exponent, p-len(str_xc))
else:
zeros = 0
return _dec_from_triple(0, str_xc+'0'*zeros, xe-zeros)
def __pow__(self, other, modulo=None, context=None):
"""Return self ** other [ % modulo].
With two arguments, compute self**other.
With three arguments, compute (self**other) % modulo. For the
three argument form, the following restrictions on the
arguments hold:
- all three arguments must be integral
- other must be nonnegative
- either self or other (or both) must be nonzero
- modulo must be nonzero and must have at most p digits,
where p is the context precision.
If any of these restrictions is violated the InvalidOperation
flag is raised.
The result of pow(self, other, modulo) is identical to the
result that would be obtained by computing (self**other) %
modulo with unbounded precision, but is computed more
efficiently. It is always exact.
"""
if modulo is not None:
return self._power_modulo(other, modulo, context)
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
# either argument is a NaN => result is NaN
ans = self._check_nans(other, context)
if ans:
return ans
# 0**0 = NaN (!), x**0 = 1 for nonzero x (including +/-Infinity)
if not other:
if not self:
return context._raise_error(InvalidOperation, '0 ** 0')
else:
return _One
# result has sign 1 iff self._sign is 1 and other is an odd integer
result_sign = 0
if self._sign == 1:
if other._isinteger():
if not other._iseven():
result_sign = 1
else:
# -ve**noninteger = NaN
# (-0)**noninteger = 0**noninteger
if self:
return context._raise_error(InvalidOperation,
'x ** y with x negative and y not an integer')
# negate self, without doing any unwanted rounding
self = self.copy_negate()
# 0**(+ve or Inf) = 0; 0**(-ve or -Inf) = Infinity
if not self:
if other._sign == 0:
return _dec_from_triple(result_sign, '0', 0)
else:
return _SignedInfinity[result_sign]
# Inf**(+ve or Inf) = Inf; Inf**(-ve or -Inf) = 0
if self._isinfinity():
if other._sign == 0:
return _SignedInfinity[result_sign]
else:
return _dec_from_triple(result_sign, '0', 0)
# 1**other = 1, but the choice of exponent and the flags
# depend on the exponent of self, and on whether other is a
# positive integer, a negative integer, or neither
if self == _One:
if other._isinteger():
# exp = max(self._exp*max(int(other), 0),
# 1-context.prec) but evaluating int(other) directly
# is dangerous until we know other is small (other
# could be 1e999999999)
if other._sign == 1:
multiplier = 0
elif other > context.prec:
multiplier = context.prec
else:
multiplier = int(other)
exp = self._exp * multiplier
if exp < 1-context.prec:
exp = 1-context.prec
context._raise_error(Rounded)
else:
context._raise_error(Inexact)
context._raise_error(Rounded)
exp = 1-context.prec
return _dec_from_triple(result_sign, '1'+'0'*-exp, exp)
# compute adjusted exponent of self
self_adj = self.adjusted()
# self ** infinity is infinity if self > 1, 0 if self < 1
# self ** -infinity is infinity if self < 1, 0 if self > 1
if other._isinfinity():
if (other._sign == 0) == (self_adj < 0):
return _dec_from_triple(result_sign, '0', 0)
else:
return _SignedInfinity[result_sign]
# from here on, the result always goes through the call
# to _fix at the end of this function.
ans = None
exact = False
# crude test to catch cases of extreme overflow/underflow. If
# log10(self)*other >= 10**bound and bound >= len(str(Emax))
# then 10**bound >= 10**len(str(Emax)) >= Emax+1 and hence
# self**other >= 10**(Emax+1), so overflow occurs. The test
# for underflow is similar.
bound = self._log10_exp_bound() + other.adjusted()
if (self_adj >= 0) == (other._sign == 0):
# self > 1 and other +ve, or self < 1 and other -ve
# possibility of overflow
if bound >= len(str(context.Emax)):
ans = _dec_from_triple(result_sign, '1', context.Emax+1)
else:
# self > 1 and other -ve, or self < 1 and other +ve
# possibility of underflow to 0
Etiny = context.Etiny()
if bound >= len(str(-Etiny)):
ans = _dec_from_triple(result_sign, '1', Etiny-1)
# try for an exact result with precision +1
if ans is None:
ans = self._power_exact(other, context.prec + 1)
if ans is not None:
if result_sign == 1:
ans = _dec_from_triple(1, ans._int, ans._exp)
exact = True
# usual case: inexact result, x**y computed directly as exp(y*log(x))
if ans is None:
p = context.prec
x = _WorkRep(self)
xc, xe = x.int, x.exp
y = _WorkRep(other)
yc, ye = y.int, y.exp
if y.sign == 1:
yc = -yc
# compute correctly rounded result: start with precision +3,
# then increase precision until result is unambiguously roundable
extra = 3
while True:
coeff, exp = _dpower(xc, xe, yc, ye, p+extra)
if coeff % (5*10**(len(str(coeff))-p-1)):
break
extra += 3
ans = _dec_from_triple(result_sign, str(coeff), exp)
# unlike exp, ln and log10, the power function respects the
# rounding mode; no need to switch to ROUND_HALF_EVEN here
# There's a difficulty here when 'other' is not an integer and
# the result is exact. In this case, the specification
# requires that the Inexact flag be raised (in spite of
# exactness), but since the result is exact _fix won't do this
# for us. (Correspondingly, the Underflow signal should also
# be raised for subnormal results.) We can't directly raise
# these signals either before or after calling _fix, since
# that would violate the precedence for signals. So we wrap
# the ._fix call in a temporary context, and reraise
# afterwards.
if exact and not other._isinteger():
# pad with zeros up to length context.prec+1 if necessary; this
# ensures that the Rounded signal will be raised.
if len(ans._int) <= context.prec:
expdiff = context.prec + 1 - len(ans._int)
ans = _dec_from_triple(ans._sign, ans._int+'0'*expdiff,
ans._exp-expdiff)
# create a copy of the current context, with cleared flags/traps
newcontext = context.copy()
newcontext.clear_flags()
for exception in _signals:
newcontext.traps[exception] = 0
# round in the new context
ans = ans._fix(newcontext)
# raise Inexact, and if necessary, Underflow
newcontext._raise_error(Inexact)
if newcontext.flags[Subnormal]:
newcontext._raise_error(Underflow)
# propagate signals to the original context; _fix could
# have raised any of Overflow, Underflow, Subnormal,
# Inexact, Rounded, Clamped. Overflow needs the correct
# arguments. Note that the order of the exceptions is
# important here.
if newcontext.flags[Overflow]:
context._raise_error(Overflow, 'above Emax', ans._sign)
for exception in Underflow, Subnormal, Inexact, Rounded, Clamped:
if newcontext.flags[exception]:
context._raise_error(exception)
else:
ans = ans._fix(context)
return ans
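# Illustrative usage (hedged sketch): the three-argument form requires
# integral operands and computes (self**other) % modulo exactly.
#   >>> from decimal import Decimal
#   >>> Decimal(2) ** 8
#   Decimal('256')
#   >>> pow(Decimal(10), Decimal(3), Decimal(7))
#   Decimal('6')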
def __rpow__(self, other, context=None):
"""Swaps self/other and returns __pow__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__pow__(self, context=context)
def normalize(self, context=None):
"""Normalize- strip trailing 0s, change anything equal to 0 to 0e0"""
if context is None:
context = getcontext()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
dup = self._fix(context)
if dup._isinfinity():
return dup
if not dup:
return _dec_from_triple(dup._sign, '0', 0)
exp_max = [context.Emax, context.Etop()][context._clamp]
end = len(dup._int)
exp = dup._exp
while dup._int[end-1] == '0' and exp < exp_max:
exp += 1
end -= 1
return _dec_from_triple(dup._sign, dup._int[:end], exp)
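# Illustrative usage (hedged sketch): trailing zeros are stripped, which can
# change the exponent and hence the printed form.
#   >>> from decimal import Decimal
#   >>> Decimal('120.00').normalize()
#   Decimal('1.2E+2')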
def quantize(self, exp, rounding=None, context=None, watchexp=True):
"""Quantize self so its exponent is the same as that of exp.
Similar to self._rescale(exp._exp) but with error checking.
"""
exp = _convert_other(exp, raiseit=True)
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
if self._is_special or exp._is_special:
ans = self._check_nans(exp, context)
if ans:
return ans
if exp._isinfinity() or self._isinfinity():
if exp._isinfinity() and self._isinfinity():
return Decimal(self) # if both are inf, it is OK
return context._raise_error(InvalidOperation,
'quantize with one INF')
# if we're not watching exponents, do a simple rescale
if not watchexp:
ans = self._rescale(exp._exp, rounding)
# raise Inexact and Rounded where appropriate
if ans._exp > self._exp:
context._raise_error(Rounded)
if ans != self:
context._raise_error(Inexact)
return ans
# exp._exp should be between Etiny and Emax
if not (context.Etiny() <= exp._exp <= context.Emax):
return context._raise_error(InvalidOperation,
'target exponent out of bounds in quantize')
if not self:
ans = _dec_from_triple(self._sign, '0', exp._exp)
return ans._fix(context)
self_adjusted = self.adjusted()
if self_adjusted > context.Emax:
return context._raise_error(InvalidOperation,
'exponent of quantize result too large for current context')
if self_adjusted - exp._exp + 1 > context.prec:
return context._raise_error(InvalidOperation,
'quantize result has too many digits for current context')
ans = self._rescale(exp._exp, rounding)
if ans.adjusted() > context.Emax:
return context._raise_error(InvalidOperation,
'exponent of quantize result too large for current context')
if len(ans._int) > context.prec:
return context._raise_error(InvalidOperation,
'quantize result has too many digits for current context')
# raise appropriate flags
if ans and ans.adjusted() < context.Emin:
context._raise_error(Subnormal)
if ans._exp > self._exp:
if ans != self:
context._raise_error(Inexact)
context._raise_error(Rounded)
# call to fix takes care of any necessary folddown, and
# signals Clamped if necessary
ans = ans._fix(context)
return ans
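# Illustrative usage (hedged sketch): quantize rounds to a fixed exponent,
# the usual way to round a Decimal to a fixed number of decimal places.
#   >>> from decimal import Decimal
#   >>> Decimal('1.41421356').quantize(Decimal('1.000'))
#   Decimal('1.414')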
def same_quantum(self, other):
"""Return True if self and other have the same exponent; otherwise
return False.
If either operand is a special value, the following rules are used:
* return True if both operands are infinities
* return True if both operands are NaNs
* otherwise, return False.
"""
other = _convert_other(other, raiseit=True)
if self._is_special or other._is_special:
return (self.is_nan() and other.is_nan() or
self.is_infinite() and other.is_infinite())
return self._exp == other._exp
def _rescale(self, exp, rounding):
"""Rescale self so that the exponent is exp, either by padding with zeros
or by truncating digits, using the given rounding mode.
Specials are returned without change. This operation is
quiet: it raises no flags, and uses no information from the
context.
exp = exp to scale to (an integer)
rounding = rounding mode
"""
if self._is_special:
return Decimal(self)
if not self:
return _dec_from_triple(self._sign, '0', exp)
if self._exp >= exp:
# pad answer with zeros if necessary
return _dec_from_triple(self._sign,
self._int + '0'*(self._exp - exp), exp)
# too many digits; round and lose data. If self.adjusted() <
# exp-1, replace self by 10**(exp-1) before rounding
digits = len(self._int) + self._exp - exp
if digits < 0:
self = _dec_from_triple(self._sign, '1', exp-1)
digits = 0
this_function = self._pick_rounding_function[rounding]
changed = this_function(self, digits)
coeff = self._int[:digits] or '0'
if changed == 1:
coeff = str(int(coeff)+1)
return _dec_from_triple(self._sign, coeff, exp)
def _round(self, places, rounding):
"""Round a nonzero, nonspecial Decimal to a fixed number of
significant figures, using the given rounding mode.
Infinities, NaNs and zeros are returned unaltered.
This operation is quiet: it raises no flags, and uses no
information from the context.
"""
if places <= 0:
raise ValueError("argument should be at least 1 in _round")
if self._is_special or not self:
return Decimal(self)
ans = self._rescale(self.adjusted()+1-places, rounding)
# it can happen that the rescale alters the adjusted exponent;
# for example when rounding 99.97 to 3 significant figures.
# When this happens we end up with an extra 0 at the end of
# the number; a second rescale fixes this.
if ans.adjusted() != self.adjusted():
ans = ans._rescale(ans.adjusted()+1-places, rounding)
return ans
def to_integral_exact(self, rounding=None, context=None):
"""Rounds to a nearby integer.
If no rounding mode is specified, take the rounding mode from
the context. This method raises the Rounded and Inexact flags
when appropriate.
See also: to_integral_value, which does exactly the same as
this method except that it doesn't raise Inexact or Rounded.
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
return Decimal(self)
if self._exp >= 0:
return Decimal(self)
if not self:
return _dec_from_triple(self._sign, '0', 0)
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
ans = self._rescale(0, rounding)
if ans != self:
context._raise_error(Inexact)
context._raise_error(Rounded)
return ans
def to_integral_value(self, rounding=None, context=None):
"""Rounds to the nearest integer, without raising inexact, rounded."""
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
return Decimal(self)
if self._exp >= 0:
return Decimal(self)
else:
return self._rescale(0, rounding)
# the method name changed, but we also provide the old one for compatibility
to_integral = to_integral_value
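# Illustrative usage (hedged sketch): with the default ROUND_HALF_EVEN mode,
# ties go to the even neighbour; only to_integral_exact raises flags.
#   >>> from decimal import Decimal
#   >>> Decimal('1.5').to_integral_value()
#   Decimal('2')
#   >>> Decimal('2.5').to_integral_value()
#   Decimal('2')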
def sqrt(self, context=None):
"""Return the square root of self."""
if context is None:
context = getcontext()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() and self._sign == 0:
return Decimal(self)
if not self:
# exponent = self._exp // 2. sqrt(-0) = -0
ans = _dec_from_triple(self._sign, '0', self._exp // 2)
return ans._fix(context)
if self._sign == 1:
return context._raise_error(InvalidOperation, 'sqrt(-x), x > 0')
# At this point self represents a positive number. Let p be
# the desired precision and express self in the form c*100**e
# with c a positive real number and e an integer, c and e
# being chosen so that 100**(p-1) <= c < 100**p. Then the
# (exact) square root of self is sqrt(c)*10**e, and 10**(p-1)
# <= sqrt(c) < 10**p, so the closest representable Decimal at
# precision p is n*10**e where n = round_half_even(sqrt(c)),
# the closest integer to sqrt(c) with the even integer chosen
# in the case of a tie.
#
# To ensure correct rounding in all cases, we use the
# following trick: we compute the square root to an extra
# place (precision p+1 instead of precision p), rounding down.
# Then, if the result is inexact and its last digit is 0 or 5,
# we increase the last digit to 1 or 6 respectively; if it's
# exact we leave the last digit alone. Now the final round to
# p places (or fewer in the case of underflow) will round
# correctly and raise the appropriate flags.
# use an extra digit of precision
prec = context.prec+1
# write argument in the form c*100**e where e = self._exp//2
# is the 'ideal' exponent, to be used if the square root is
# exactly representable. l is the number of 'digits' of c in
# base 100, so that 100**(l-1) <= c < 100**l.
op = _WorkRep(self)
e = op.exp >> 1
if op.exp & 1:
c = op.int * 10
l = (len(self._int) >> 1) + 1
else:
c = op.int
l = len(self._int)+1 >> 1
# rescale so that c has exactly prec base 100 'digits'
shift = prec-l
if shift >= 0:
c *= 100**shift
exact = True
else:
c, remainder = divmod(c, 100**-shift)
exact = not remainder
e -= shift
# find n = floor(sqrt(c)) using Newton's method
n = 10**prec
while True:
q = c//n
if n <= q:
break
else:
n = n + q >> 1
exact = exact and n*n == c
if exact:
# result is exact; rescale to use ideal exponent e
if shift >= 0:
# assert n % 10**shift == 0
n //= 10**shift
else:
n *= 10**-shift
e += shift
else:
# result is not exact; fix last digit as described above
if n % 5 == 0:
n += 1
ans = _dec_from_triple(0, str(n), e)
# round, and fit to current context
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
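# Illustrative usage (hedged sketch): the square root is correctly rounded
# to the context precision (28 digits in the default context).
#   >>> from decimal import Decimal
#   >>> Decimal(2).sqrt()
#   Decimal('1.414213562373095048801688724')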
def max(self, other, context=None):
"""Returns the larger value.
Like max(self, other) except if one is not a number, returns
NaN (and signals if one is sNaN). Also rounds.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then the
# number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self._cmp(other)
if c == 0:
# If both operands are finite and equal in numerical value
# then an ordering is applied:
#
# If the signs differ then max returns the operand with the
# positive sign and min returns the operand with the negative sign
#
# If the signs are the same then the exponent is used to select
# the result. This is exactly the ordering used in compare_total.
c = self.compare_total(other)
if c == -1:
ans = other
else:
ans = self
return ans._fix(context)
def min(self, other, context=None):
"""Returns the smaller value.
Like min(self, other) except if one is not a number, returns
NaN (and signals if one is sNaN). Also rounds.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then the
# number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self._cmp(other)
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = self
else:
ans = other
return ans._fix(context)
def _isinteger(self):
"""Returns whether self is an integer"""
if self._is_special:
return False
if self._exp >= 0:
return True
rest = self._int[self._exp:]
return rest == '0'*len(rest)
def _iseven(self):
"""Returns True if self is even. Assumes self is an integer."""
if not self or self._exp > 0:
return True
return self._int[-1+self._exp] in '02468'
def adjusted(self):
"""Return the adjusted exponent of self"""
try:
return self._exp + len(self._int) - 1
# If NaN or Infinity, self._exp is string
except TypeError:
return 0
def canonical(self, context=None):
"""Returns the same Decimal object.
As we do not have different encodings for the same number, the
received object already is in its canonical form.
"""
return self
def compare_signal(self, other, context=None):
"""Compares self to the other operand numerically.
It's pretty much like compare(), but all NaNs signal, with signaling
NaNs taking precedence over quiet NaNs.
"""
other = _convert_other(other, raiseit = True)
ans = self._compare_check_nans(other, context)
if ans:
return ans
return self.compare(other, context=context)
def compare_total(self, other):
"""Compares self to other using the abstract representations.
This is not like the standard compare, which uses the numerical
values. Note that a total ordering is defined for all possible abstract
representations.
"""
other = _convert_other(other, raiseit=True)
# if one is negative and the other is positive, it's easy
if self._sign and not other._sign:
return _NegativeOne
if not self._sign and other._sign:
return _One
sign = self._sign
# let's handle both NaN types
self_nan = self._isnan()
other_nan = other._isnan()
if self_nan or other_nan:
if self_nan == other_nan:
# compare payloads as though they're integers
self_key = len(self._int), self._int
other_key = len(other._int), other._int
if self_key < other_key:
if sign:
return _One
else:
return _NegativeOne
if self_key > other_key:
if sign:
return _NegativeOne
else:
return _One
return _Zero
if sign:
if self_nan == 1:
return _NegativeOne
if other_nan == 1:
return _One
if self_nan == 2:
return _NegativeOne
if other_nan == 2:
return _One
else:
if self_nan == 1:
return _One
if other_nan == 1:
return _NegativeOne
if self_nan == 2:
return _One
if other_nan == 2:
return _NegativeOne
if self < other:
return _NegativeOne
if self > other:
return _One
if self._exp < other._exp:
if sign:
return _One
else:
return _NegativeOne
if self._exp > other._exp:
if sign:
return _NegativeOne
else:
return _One
return _Zero
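# Illustrative usage (hedged sketch): compare_total distinguishes values
# that compare numerically equal but have different exponents.
#   >>> from decimal import Decimal
#   >>> Decimal('12.0').compare_total(Decimal('12'))
#   Decimal('-1')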
def compare_total_mag(self, other):
"""Compares self to other using abstract repr., ignoring sign.
Like compare_total, but with the operands' signs ignored and assumed to be 0.
"""
other = _convert_other(other, raiseit=True)
s = self.copy_abs()
o = other.copy_abs()
return s.compare_total(o)
def copy_abs(self):
"""Returns a copy with the sign set to 0. """
return _dec_from_triple(0, self._int, self._exp, self._is_special)
def copy_negate(self):
"""Returns a copy with the sign inverted."""
if self._sign:
return _dec_from_triple(0, self._int, self._exp, self._is_special)
else:
return _dec_from_triple(1, self._int, self._exp, self._is_special)
def copy_sign(self, other):
"""Returns self with the sign of other."""
other = _convert_other(other, raiseit=True)
return _dec_from_triple(other._sign, self._int,
self._exp, self._is_special)
def exp(self, context=None):
"""Returns e ** self."""
if context is None:
context = getcontext()
# exp(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# exp(-Infinity) = 0
if self._isinfinity() == -1:
return _Zero
# exp(0) = 1
if not self:
return _One
# exp(Infinity) = Infinity
if self._isinfinity() == 1:
return Decimal(self)
# the result is now guaranteed to be inexact (the true
# mathematical result is transcendental). There's no need to
# raise Rounded and Inexact here---they'll always be raised as
# a result of the call to _fix.
p = context.prec
adj = self.adjusted()
# we only need to do any computation for quite a small range
# of adjusted exponents---for example, -29 <= adj <= 10 for
# the default context. For smaller exponent the result is
# indistinguishable from 1 at the given precision, while for
# larger exponent the result either overflows or underflows.
if self._sign == 0 and adj > len(str((context.Emax+1)*3)):
# overflow
ans = _dec_from_triple(0, '1', context.Emax+1)
elif self._sign == 1 and adj > len(str((-context.Etiny()+1)*3)):
# underflow to 0
ans = _dec_from_triple(0, '1', context.Etiny()-1)
elif self._sign == 0 and adj < -p:
# p+1 digits; final round will raise correct flags
ans = _dec_from_triple(0, '1' + '0'*(p-1) + '1', -p)
elif self._sign == 1 and adj < -p-1:
# p+1 digits; final round will raise correct flags
ans = _dec_from_triple(0, '9'*(p+1), -p-1)
# general case
else:
op = _WorkRep(self)
c, e = op.int, op.exp
if op.sign == 1:
c = -c
# compute correctly rounded result: increase precision by
# 3 digits at a time until we get an unambiguously
# roundable result
extra = 3
while True:
coeff, exp = _dexp(c, e, p+extra)
if coeff % (5*10**(len(str(coeff))-p-1)):
break
extra += 3
ans = _dec_from_triple(0, str(coeff), exp)
# at this stage, ans should round correctly with *any*
# rounding mode, not just with ROUND_HALF_EVEN
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
def is_canonical(self):
"""Return True if self is canonical; otherwise return False.
Currently, the encoding of a Decimal instance is always
canonical, so this method returns True for any Decimal.
"""
return True
def is_finite(self):
"""Return True if self is finite; otherwise return False.
A Decimal instance is considered finite if it is neither
infinite nor a NaN.
"""
return not self._is_special
def is_infinite(self):
"""Return True if self is infinite; otherwise return False."""
return self._exp == 'F'
def is_nan(self):
"""Return True if self is a qNaN or sNaN; otherwise return False."""
return self._exp in ('n', 'N')
def is_normal(self, context=None):
"""Return True if self is a normal number; otherwise return False."""
if self._is_special or not self:
return False
if context is None:
context = getcontext()
return context.Emin <= self.adjusted()
def is_qnan(self):
"""Return True if self is a quiet NaN; otherwise return False."""
return self._exp == 'n'
def is_signed(self):
"""Return True if self is negative; otherwise return False."""
return self._sign == 1
def is_snan(self):
"""Return True if self is a signaling NaN; otherwise return False."""
return self._exp == 'N'
def is_subnormal(self, context=None):
"""Return True if self is subnormal; otherwise return False."""
if self._is_special or not self:
return False
if context is None:
context = getcontext()
return self.adjusted() < context.Emin
def is_zero(self):
"""Return True if self is a zero; otherwise return False."""
return not self._is_special and self._int == '0'
def _ln_exp_bound(self):
"""Compute a lower bound for the adjusted exponent of self.ln().
In other words, compute r such that self.ln() >= 10**r. Assumes
that self is finite and positive and that self != 1.
"""
# for 0.1 <= x <= 10 we use the inequalities 1-1/x <= ln(x) <= x-1
adj = self._exp + len(self._int) - 1
if adj >= 1:
# argument >= 10; we use 23/10 = 2.3 as a lower bound for ln(10)
return len(str(adj*23//10)) - 1
if adj <= -2:
# argument <= 0.1
return len(str((-1-adj)*23//10)) - 1
op = _WorkRep(self)
c, e = op.int, op.exp
if adj == 0:
# 1 < self < 10
num = str(c-10**-e)
den = str(c)
return len(num) - len(den) - (num < den)
# adj == -1, 0.1 <= self < 1
return e + len(str(10**-e - c)) - 1
def ln(self, context=None):
"""Returns the natural (base e) logarithm of self."""
if context is None:
context = getcontext()
# ln(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# ln(0.0) == -Infinity
if not self:
return _NegativeInfinity
# ln(Infinity) = Infinity
if self._isinfinity() == 1:
return _Infinity
# ln(1.0) == 0.0
if self == _One:
return _Zero
# ln(negative) raises InvalidOperation
if self._sign == 1:
return context._raise_error(InvalidOperation,
'ln of a negative value')
# result is irrational, so necessarily inexact
op = _WorkRep(self)
c, e = op.int, op.exp
p = context.prec
# correctly rounded result: repeatedly increase precision by 3
# until we get an unambiguously roundable result
places = p - self._ln_exp_bound() + 2 # at least p+3 places
while True:
coeff = _dlog(c, e, places)
# assert len(str(abs(coeff)))-p >= 1
if coeff % (5*10**(len(str(abs(coeff)))-p-1)):
break
places += 3
ans = _dec_from_triple(int(coeff<0), str(abs(coeff)), -places)
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
def _log10_exp_bound(self):
"""Compute a lower bound for the adjusted exponent of self.log10().
In other words, find r such that self.log10() >= 10**r.
Assumes that self is finite and positive and that self != 1.
"""
# For x >= 10 or x < 0.1 we only need a bound on the integer
# part of log10(self), and this comes directly from the
# exponent of x. For 0.1 <= x <= 10 we use the inequalities
# 1-1/x <= log(x) <= x-1. If x > 1 we have |log10(x)| >
# (1-1/x)/2.31 > 0. If x < 1 then |log10(x)| > (1-x)/2.31 > 0
adj = self._exp + len(self._int) - 1
if adj >= 1:
# self >= 10
return len(str(adj))-1
if adj <= -2:
# self < 0.1
return len(str(-1-adj))-1
op = _WorkRep(self)
c, e = op.int, op.exp
if adj == 0:
# 1 < self < 10
num = str(c-10**-e)
den = str(231*c)
return len(num) - len(den) - (num < den) + 2
# adj == -1, 0.1 <= self < 1
num = str(10**-e-c)
return len(num) + e - (num < "231") - 1
def log10(self, context=None):
"""Returns the base 10 logarithm of self."""
if context is None:
context = getcontext()
# log10(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# log10(0.0) == -Infinity
if not self:
return _NegativeInfinity
# log10(Infinity) = Infinity
if self._isinfinity() == 1:
return _Infinity
# log10(negative or -Infinity) raises InvalidOperation
if self._sign == 1:
return context._raise_error(InvalidOperation,
'log10 of a negative value')
# log10(10**n) = n
if self._int[0] == '1' and self._int[1:] == '0'*(len(self._int) - 1):
# answer may need rounding
ans = Decimal(self._exp + len(self._int) - 1)
else:
# result is irrational, so necessarily inexact
op = _WorkRep(self)
c, e = op.int, op.exp
p = context.prec
# correctly rounded result: repeatedly increase precision
# until result is unambiguously roundable
places = p-self._log10_exp_bound()+2
while True:
coeff = _dlog10(c, e, places)
# assert len(str(abs(coeff)))-p >= 1
if coeff % (5*10**(len(str(abs(coeff)))-p-1)):
break
places += 3
ans = _dec_from_triple(int(coeff<0), str(abs(coeff)), -places)
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
def logb(self, context=None):
""" Returns the exponent of the magnitude of self's MSD.
The result is the integer which is the exponent of the magnitude
of the most significant digit of self (as though it were truncated
to a single digit while maintaining the value of that digit and
without limiting the resulting exponent).
"""
# logb(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
# logb(+/-Inf) = +Inf
if self._isinfinity():
return _Infinity
# logb(0) = -Inf, DivisionByZero
if not self:
return context._raise_error(DivisionByZero, 'logb(0)', 1)
# otherwise, simply return the adjusted exponent of self, as a
# Decimal. Note that no attempt is made to fit the result
# into the current context.
ans = Decimal(self.adjusted())
return ans._fix(context)
def _islogical(self):
"""Return True if self is a logical operand.
To be logical, it must be a finite number with a sign of 0,
an exponent of 0, and a coefficient whose digits must all be
either 0 or 1.
"""
if self._sign != 0 or self._exp != 0:
return False
for dig in self._int:
if dig not in '01':
return False
return True
def _fill_logical(self, context, opa, opb):
dif = context.prec - len(opa)
if dif > 0:
opa = '0'*dif + opa
elif dif < 0:
opa = opa[-context.prec:]
dif = context.prec - len(opb)
if dif > 0:
opb = '0'*dif + opb
elif dif < 0:
opb = opb[-context.prec:]
return opa, opb
def logical_and(self, other, context=None):
"""Applies an 'and' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)&int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
def logical_invert(self, context=None):
"""Invert all its digits."""
if context is None:
context = getcontext()
return self.logical_xor(_dec_from_triple(0,'1'*context.prec,0),
context)
def logical_or(self, other, context=None):
"""Applies an 'or' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)|int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
def logical_xor(self, other, context=None):
"""Applies an 'xor' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)^int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
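# Illustrative doctest-style usage (a sketch; the precision only matters
# once one operand is shorter than the other, since _fill_logical pads
# with zeros):
# >>> Decimal('1100').logical_and(Decimal('1010'))
# Decimal('1000')
# >>> Decimal('1100').logical_or(Decimal('1010'))
# Decimal('1110')
# >>> Decimal('1100').logical_xor(Decimal('1010'))
# Decimal('110')
# >>> Decimal('101').logical_invert(Context(prec=3))
# Decimal('10')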
def max_mag(self, other, context=None):
"""Compares the values numerically with their sign ignored."""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then the
# number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self.copy_abs()._cmp(other.copy_abs())
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = other
else:
ans = self
return ans._fix(context)
def min_mag(self, other, context=None):
"""Compares the values numerically with their sign ignored."""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then the
# number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self.copy_abs()._cmp(other.copy_abs())
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = self
else:
ans = other
return ans._fix(context)
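# Illustrative usage (a sketch): the *_mag methods compare absolute
# values but return the operand with its own sign, and a quiet NaN
# loses to any number:
# >>> Decimal('7').max_mag(Decimal('-10'))
# Decimal('-10')
# >>> Decimal('7').min_mag(Decimal('-10'))
# Decimal('7')
# >>> Decimal('-7').max_mag(Decimal('NaN'))
# Decimal('-7')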
def next_minus(self, context=None):
"""Returns the largest representable number smaller than itself."""
if context is None:
context = getcontext()
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() == -1:
return _NegativeInfinity
if self._isinfinity() == 1:
return _dec_from_triple(0, '9'*context.prec, context.Etop())
context = context.copy()
context._set_rounding(ROUND_FLOOR)
context._ignore_all_flags()
new_self = self._fix(context)
if new_self != self:
return new_self
return self.__sub__(_dec_from_triple(0, '1', context.Etiny()-1),
context)
def next_plus(self, context=None):
"""Returns the smallest representable number larger than itself."""
if context is None:
context = getcontext()
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() == 1:
return _Infinity
if self._isinfinity() == -1:
return _dec_from_triple(1, '9'*context.prec, context.Etop())
context = context.copy()
context._set_rounding(ROUND_CEILING)
context._ignore_all_flags()
new_self = self._fix(context)
if new_self != self:
return new_self
return self.__add__(_dec_from_triple(0, '1', context.Etiny()-1),
context)
def next_toward(self, other, context=None):
"""Returns the number closest to self, in the direction towards other.
The result is the closest representable number to self
(excluding self) that is in the direction towards other,
unless both have the same value. If the two operands are
numerically equal, then the result is a copy of self with the
sign set to be the same as the sign of other.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
comparison = self._cmp(other)
if comparison == 0:
return self.copy_sign(other)
if comparison == -1:
ans = self.next_plus(context)
else: # comparison == 1
ans = self.next_minus(context)
# decide which flags to raise using value of ans
if ans._isinfinity():
context._raise_error(Overflow,
'Infinite result from next_toward',
ans._sign)
context._raise_error(Inexact)
context._raise_error(Rounded)
elif ans.adjusted() < context.Emin:
context._raise_error(Underflow)
context._raise_error(Subnormal)
context._raise_error(Inexact)
context._raise_error(Rounded)
# if precision == 1 then we don't raise Clamped for a
# result 0E-Etiny.
if not ans:
context._raise_error(Clamped)
return ans
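# Illustrative usage (a sketch, pinning a small context so the outputs
# stay short; these mirror the Context method doctests further below):
# >>> c = Context(prec=9, Emin=-999, Emax=999)
# >>> Decimal('1').next_plus(c)
# Decimal('1.00000001')
# >>> Decimal('1').next_minus(c)
# Decimal('0.999999999')
# >>> Decimal('1').next_toward(Decimal('0'), c)
# Decimal('0.999999999')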
def number_class(self, context=None):
"""Returns an indication of the class of self.
The class is one of the following strings:
sNaN
NaN
-Infinity
-Normal
-Subnormal
-Zero
+Zero
+Subnormal
+Normal
+Infinity
"""
if self.is_snan():
return "sNaN"
if self.is_qnan():
return "NaN"
inf = self._isinfinity()
if inf == 1:
return "+Infinity"
if inf == -1:
return "-Infinity"
if self.is_zero():
if self._sign:
return "-Zero"
else:
return "+Zero"
if context is None:
context = getcontext()
if self.is_subnormal(context=context):
if self._sign:
return "-Subnormal"
else:
return "+Subnormal"
# just a normal, regular, boring number, :)
if self._sign:
return "-Normal"
else:
return "+Normal"
def radix(self):
"""Just returns 10, as this is Decimal, :)"""
return Decimal(10)
def rotate(self, other, context=None):
"""Returns a rotated copy of self, value-of-other times."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
if not (-context.prec <= int(other) <= context.prec):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
# get values, pad if necessary
torot = int(other)
rotdig = self._int
topad = context.prec - len(rotdig)
if topad > 0:
rotdig = '0'*topad + rotdig
elif topad < 0:
rotdig = rotdig[-topad:]
# let's rotate!
rotated = rotdig[torot:] + rotdig[:torot]
return _dec_from_triple(self._sign,
rotated.lstrip('0') or '0', self._exp)
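# Illustrative usage (a sketch): the rotation width is context.prec, so
# pin the precision to make the result predictable:
# >>> Decimal('123456789').rotate(2, Context(prec=9))
# Decimal('345678912')
# >>> Decimal('123456789').rotate(-2, Context(prec=9))
# Decimal('891234567')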
def scaleb(self, other, context=None):
"""Returns self operand after adding the second value to its exp."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
liminf = -2 * (context.Emax + context.prec)
limsup = 2 * (context.Emax + context.prec)
if not (liminf <= int(other) <= limsup):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
d = _dec_from_triple(self._sign, self._int, self._exp + int(other))
d = d._fix(context)
return d
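# Illustrative usage (a sketch): scaleb moves only the exponent; the
# coefficient is untouched:
# >>> Decimal('7.50').scaleb(3)
# Decimal('7.50E+3')
# >>> Decimal('7.50').scaleb(-2)
# Decimal('0.0750')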
def shift(self, other, context=None):
"""Returns a shifted copy of self, value-of-other times."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
if not (-context.prec <= int(other) <= context.prec):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
# get values, pad if necessary
torot = int(other)
rotdig = self._int
topad = context.prec - len(rotdig)
if topad > 0:
rotdig = '0'*topad + rotdig
elif topad < 0:
rotdig = rotdig[-topad:]
# let's shift!
if torot < 0:
shifted = rotdig[:torot]
else:
shifted = rotdig + '0'*torot
shifted = shifted[-context.prec:]
return _dec_from_triple(self._sign,
shifted.lstrip('0') or '0', self._exp)
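# Illustrative usage (a sketch): unlike rotate, shift drops the digits
# that fall off one end and fills the other end with zeros:
# >>> Decimal('123456789').shift(2, Context(prec=9))
# Decimal('345678900')
# >>> Decimal('123456789').shift(-2, Context(prec=9))
# Decimal('1234567')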
# Support for pickling, copy, and deepcopy
def __reduce__(self):
return (self.__class__, (str(self),))
def __copy__(self):
if type(self) is Decimal:
return self # I'm immutable; therefore I am my own clone
return self.__class__(str(self))
def __deepcopy__(self, memo):
if type(self) is Decimal:
return self # My components are also immutable
return self.__class__(str(self))
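# Illustrative usage (a sketch): since Decimal instances are immutable,
# copying returns the object itself and pickling round-trips through the
# string form used by __reduce__:
# >>> import copy, pickle
# >>> d = Decimal('2.50')
# >>> copy.copy(d) is d
# True
# >>> pickle.loads(pickle.dumps(d))
# Decimal('2.50')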
# PEP 3101 support. The _localeconv keyword argument should be
# considered private: it's provided for ease of testing only.
def __format__(self, specifier, context=None, _localeconv=None):
"""Format a Decimal instance according to the given specifier.
The specifier should be a standard format specifier, with the
form described in PEP 3101. Formatting types 'e', 'E', 'f',
'F', 'g', 'G', 'n' and '%' are supported. If the formatting
type is omitted it defaults to 'g' or 'G', depending on the
value of context.capitals.
"""
# Note: PEP 3101 says that if the type is not present then
# there should be at least one digit after the decimal point.
# We take the liberty of ignoring this requirement for
# Decimal---it's presumably there to make sure that
# format(float, '') behaves similarly to str(float).
if context is None:
context = getcontext()
spec = _parse_format_specifier(specifier, _localeconv=_localeconv)
# special values don't care about the type or precision
if self._is_special:
sign = _format_sign(self._sign, spec)
body = str(self.copy_abs())
if spec['type'] == '%':
body += '%'
return _format_align(sign, body, spec)
# a type of None defaults to 'g' or 'G', depending on context
if spec['type'] is None:
spec['type'] = ['g', 'G'][context.capitals]
# if type is '%', adjust exponent of self accordingly
if spec['type'] == '%':
self = _dec_from_triple(self._sign, self._int, self._exp+2)
# round if necessary, taking rounding mode from the context
rounding = context.rounding
precision = spec['precision']
if precision is not None:
if spec['type'] in 'eE':
self = self._round(precision+1, rounding)
elif spec['type'] in 'fF%':
self = self._rescale(-precision, rounding)
elif spec['type'] in 'gG' and len(self._int) > precision:
self = self._round(precision, rounding)
# special case: zeros with a positive exponent can't be
# represented in fixed point; rescale them to 0e0.
if not self and self._exp > 0 and spec['type'] in 'fF%':
self = self._rescale(0, rounding)
# figure out placement of the decimal point
leftdigits = self._exp + len(self._int)
if spec['type'] in 'eE':
if not self and precision is not None:
dotplace = 1 - precision
else:
dotplace = 1
elif spec['type'] in 'fF%':
dotplace = leftdigits
elif spec['type'] in 'gG':
if self._exp <= 0 and leftdigits > -6:
dotplace = leftdigits
else:
dotplace = 1
# find digits before and after decimal point, and get exponent
if dotplace < 0:
intpart = '0'
fracpart = '0'*(-dotplace) + self._int
elif dotplace > len(self._int):
intpart = self._int + '0'*(dotplace-len(self._int))
fracpart = ''
else:
intpart = self._int[:dotplace] or '0'
fracpart = self._int[dotplace:]
exp = leftdigits-dotplace
# done with the decimal-specific stuff; hand over the rest
# of the formatting to the _format_number function
return _format_number(self._sign, intpart, fracpart, exp, spec)
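# Illustrative usage (a sketch, under the default context): a few
# format() results exercising the fixed-point, exponent, and percent
# branches above:
# >>> format(Decimal('1234.5678'), '.2f')
# '1234.57'
# >>> format(Decimal('1.23E+5'), 'e')
# '1.23e+5'
# >>> format(Decimal('0.25'), '%')
# '25%'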
def _dec_from_triple(sign, coefficient, exponent, special=False):
"""Create a decimal instance directly, without any validation,
normalization (e.g. removal of leading zeros) or argument
conversion.
This function is for *internal use only*.
"""
self = object.__new__(Decimal)
self._sign = sign
self._int = coefficient
self._exp = exponent
self._is_special = special
return self
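# Illustrative usage (a sketch): the triple mirrors the internal
# (sign, coefficient string, exponent) representation:
# >>> _dec_from_triple(1, '2345', -2)
# Decimal('-23.45')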
# Register Decimal as a kind of Number (an abstract base class).
# However, do not register it as Real (because Decimals are not
# interoperable with floats).
_numbers.Number.register(Decimal)
##### Context class #######################################################
class _ContextManager(object):
"""Context manager class to support localcontext().
Sets a copy of the supplied context in __enter__() and restores
the previous decimal context in __exit__()
"""
def __init__(self, new_context):
self.new_context = new_context.copy()
def __enter__(self):
self.saved_context = getcontext()
setcontext(self.new_context)
return self.new_context
def __exit__(self, t, v, tb):
setcontext(self.saved_context)
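# Illustrative usage (a sketch): _ContextManager is what localcontext()
# (defined earlier in this module) hands back, so the usual pattern is:
# >>> with localcontext() as ctx:
# ...     ctx.prec = 5
# ...     Decimal('1') / Decimal('3')
# Decimal('0.33333')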
class Context(object):
"""Contains the context for a Decimal instance.
Contains:
prec - precision (for use in rounding, division, square roots..)
rounding - rounding type (how you round)
traps - If traps[exception] = 1, then the exception is
raised when it is caused. Otherwise, a value is
substituted in.
flags - When an exception is caused, flags[exception] is set.
(Whether or not the trap_enabler is set)
Should be reset by user of Decimal instance.
Emin - Minimum exponent
Emax - Maximum exponent
capitals - If 1, 1*10^1 is printed as 1E+1.
If 0, printed as 1e1
_clamp - If 1, change exponents if too high (Default 0)
"""
def __init__(self, prec=None, rounding=None,
traps=None, flags=None,
Emin=None, Emax=None,
capitals=None, _clamp=0,
_ignored_flags=None):
# Set defaults; for everything except flags and _ignored_flags,
# inherit from DefaultContext.
try:
dc = DefaultContext
except NameError:
pass
self.prec = prec if prec is not None else dc.prec
self.rounding = rounding if rounding is not None else dc.rounding
self.Emin = Emin if Emin is not None else dc.Emin
self.Emax = Emax if Emax is not None else dc.Emax
self.capitals = capitals if capitals is not None else dc.capitals
self._clamp = _clamp if _clamp is not None else dc._clamp
if _ignored_flags is None:
self._ignored_flags = []
else:
self._ignored_flags = _ignored_flags
if traps is None:
self.traps = dc.traps.copy()
elif not isinstance(traps, dict):
self.traps = dict((s, int(s in traps)) for s in _signals)
else:
self.traps = traps
if flags is None:
self.flags = dict.fromkeys(_signals, 0)
elif not isinstance(flags, dict):
self.flags = dict((s, int(s in flags)) for s in _signals)
else:
self.flags = flags
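# Illustrative usage (a sketch): traps and flags accept either dicts or
# lists of signal classes; a list is expanded to a {signal: 0/1} map:
# >>> c = Context(prec=6, traps=[DivisionByZero])
# >>> c.traps[DivisionByZero]
# 1
# >>> c.flags[Inexact]
# 0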
def __repr__(self):
"""Show the current context."""
s = []
s.append('Context(prec=%(prec)d, rounding=%(rounding)s, '
'Emin=%(Emin)d, Emax=%(Emax)d, capitals=%(capitals)d'
% vars(self))
names = [f.__name__ for f, v in self.flags.items() if v]
s.append('flags=[' + ', '.join(names) + ']')
names = [t.__name__ for t, v in self.traps.items() if v]
s.append('traps=[' + ', '.join(names) + ']')
return ', '.join(s) + ')'
def clear_flags(self):
"""Reset all flags to zero"""
for flag in self.flags:
self.flags[flag] = 0
def _shallow_copy(self):
"""Returns a shallow copy from self."""
nc = Context(self.prec, self.rounding, self.traps,
self.flags, self.Emin, self.Emax,
self.capitals, self._clamp, self._ignored_flags)
return nc
def copy(self):
"""Returns a deep copy from self."""
nc = Context(self.prec, self.rounding, self.traps.copy(),
self.flags.copy(), self.Emin, self.Emax,
self.capitals, self._clamp, self._ignored_flags)
return nc
__copy__ = copy
def _raise_error(self, condition, explanation = None, *args):
"""Handles an error
If the flag is in _ignored_flags, returns the default response.
Otherwise, it sets the flag, then, if the corresponding
trap_enabler is set, it reraises the exception. Otherwise, it returns
the default value after setting the flag.
"""
error = _condition_map.get(condition, condition)
if error in self._ignored_flags:
# Don't touch the flag
return error().handle(self, *args)
self.flags[error] = 1
if not self.traps[error]:
# The errors define how to handle themselves.
return condition().handle(self, *args)
# Errors should only be risked on copies of the context
# self._ignored_flags = []
raise error(explanation)
def _ignore_all_flags(self):
"""Ignore all flags, if they are raised"""
return self._ignore_flags(*_signals)
def _ignore_flags(self, *flags):
"""Ignore the flags, if they are raised"""
# Do not mutate; this way, copies of a context leave the original
# alone.
self._ignored_flags = (self._ignored_flags + list(flags))
return list(flags)
def _regard_flags(self, *flags):
"""Stop ignoring the flags, if they are raised"""
if flags and isinstance(flags[0], (tuple,list)):
flags = flags[0]
for flag in flags:
self._ignored_flags.remove(flag)
# We inherit object.__hash__, so we must deny this explicitly
__hash__ = None
def Etiny(self):
"""Returns Etiny (= Emin - prec + 1)"""
return int(self.Emin - self.prec + 1)
def Etop(self):
"""Returns maximum exponent (= Emax - prec + 1)"""
return int(self.Emax - self.prec + 1)
def _set_rounding(self, type):
"""Sets the rounding type.
Sets the rounding type, and returns the current (previous)
rounding type. Often used like:
context = context.copy()
# so you don't change the calling context
# if an error occurs in the middle.
rounding = context._set_rounding(ROUND_UP)
val = self.__sub__(other, context=context)
context._set_rounding(rounding)
This will make it round up for that operation.
"""
rounding = self.rounding
self.rounding = type
return rounding
def create_decimal(self, num='0'):
"""Creates a new Decimal instance but using self as context.
This method implements the to-number operation of the
IBM Decimal specification."""
if isinstance(num, basestring) and num != num.strip():
return self._raise_error(ConversionSyntax,
"no trailing or leading whitespace is "
"permitted.")
d = Decimal(num, context=self)
if d._isnan() and len(d._int) > self.prec - self._clamp:
return self._raise_error(ConversionSyntax,
"diagnostic info too long in NaN")
return d._fix(self)
def create_decimal_from_float(self, f):
"""Creates a new Decimal instance from a float but rounding using self
as the context.
>>> context = Context(prec=5, rounding=ROUND_DOWN)
>>> context.create_decimal_from_float(3.1415926535897932)
Decimal('3.1415')
>>> context = Context(prec=5, traps=[Inexact])
>>> context.create_decimal_from_float(3.1415926535897932)
Traceback (most recent call last):
...
Inexact: None
"""
d = Decimal.from_float(f) # An exact conversion
return d._fix(self) # Apply the context rounding
# Methods
def abs(self, a):
"""Returns the absolute value of the operand.
If the operand is negative, the result is the same as using the minus
operation on the operand. Otherwise, the result is the same as using
the plus operation on the operand.
>>> ExtendedContext.abs(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.abs(Decimal('-100'))
Decimal('100')
>>> ExtendedContext.abs(Decimal('101.5'))
Decimal('101.5')
>>> ExtendedContext.abs(Decimal('-101.5'))
Decimal('101.5')
>>> ExtendedContext.abs(-1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.__abs__(context=self)
def add(self, a, b):
"""Return the sum of the two operands.
>>> ExtendedContext.add(Decimal('12'), Decimal('7.00'))
Decimal('19.00')
>>> ExtendedContext.add(Decimal('1E+2'), Decimal('1.01E+4'))
Decimal('1.02E+4')
>>> ExtendedContext.add(1, Decimal(2))
Decimal('3')
>>> ExtendedContext.add(Decimal(8), 5)
Decimal('13')
>>> ExtendedContext.add(5, 5)
Decimal('10')
"""
a = _convert_other(a, raiseit=True)
r = a.__add__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def _apply(self, a):
return str(a._fix(self))
def canonical(self, a):
"""Returns the same Decimal object.
As we do not have different encodings for the same number, the
received object already is in its canonical form.
>>> ExtendedContext.canonical(Decimal('2.50'))
Decimal('2.50')
"""
return a.canonical(context=self)
def compare(self, a, b):
"""Compares values numerically.
If the signs of the operands differ, a value representing each operand
('-1' if the operand is less than zero, '0' if the operand is zero or
negative zero, or '1' if the operand is greater than zero) is used in
place of that operand for the comparison instead of the actual
operand.
The comparison is then effected by subtracting the second operand from
the first and then returning a value according to the result of the
subtraction: '-1' if the result is less than zero, '0' if the result is
zero or negative zero, or '1' if the result is greater than zero.
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('2.1'))
Decimal('0')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('2.10'))
Decimal('0')
>>> ExtendedContext.compare(Decimal('3'), Decimal('2.1'))
Decimal('1')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('-3'))
Decimal('1')
>>> ExtendedContext.compare(Decimal('-3'), Decimal('2.1'))
Decimal('-1')
>>> ExtendedContext.compare(1, 2)
Decimal('-1')
>>> ExtendedContext.compare(Decimal(1), 2)
Decimal('-1')
>>> ExtendedContext.compare(1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare(b, context=self)
def compare_signal(self, a, b):
"""Compares the values of the two operands numerically.
It's pretty much like compare(), but all NaNs signal, with signaling
NaNs taking precedence over quiet NaNs.
>>> c = ExtendedContext
>>> c.compare_signal(Decimal('2.1'), Decimal('3'))
Decimal('-1')
>>> c.compare_signal(Decimal('2.1'), Decimal('2.1'))
Decimal('0')
>>> c.flags[InvalidOperation] = 0
>>> print c.flags[InvalidOperation]
0
>>> c.compare_signal(Decimal('NaN'), Decimal('2.1'))
Decimal('NaN')
>>> print c.flags[InvalidOperation]
1
>>> c.flags[InvalidOperation] = 0
>>> print c.flags[InvalidOperation]
0
>>> c.compare_signal(Decimal('sNaN'), Decimal('2.1'))
Decimal('NaN')
>>> print c.flags[InvalidOperation]
1
>>> c.compare_signal(-1, 2)
Decimal('-1')
>>> c.compare_signal(Decimal(-1), 2)
Decimal('-1')
>>> c.compare_signal(-1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare_signal(b, context=self)
def compare_total(self, a, b):
"""Compares two operands using their abstract representation.
This is not like the standard compare, which uses the operands'
numerical values. Note that a total ordering is defined for all
possible abstract representations.
>>> ExtendedContext.compare_total(Decimal('12.73'), Decimal('127.9'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('-127'), Decimal('12'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('12.30'), Decimal('12.3'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('12.30'), Decimal('12.30'))
Decimal('0')
>>> ExtendedContext.compare_total(Decimal('12.3'), Decimal('12.300'))
Decimal('1')
>>> ExtendedContext.compare_total(Decimal('12.3'), Decimal('NaN'))
Decimal('-1')
>>> ExtendedContext.compare_total(1, 2)
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal(1), 2)
Decimal('-1')
>>> ExtendedContext.compare_total(1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare_total(b)
def compare_total_mag(self, a, b):
"""Compares two operands using their abstract representation ignoring sign.
Like compare_total, but with the operands' signs ignored and assumed to be 0.
"""
a = _convert_other(a, raiseit=True)
return a.compare_total_mag(b)
def copy_abs(self, a):
"""Returns a copy of the operand with the sign set to 0.
>>> ExtendedContext.copy_abs(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.copy_abs(Decimal('-100'))
Decimal('100')
>>> ExtendedContext.copy_abs(-1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_abs()
def copy_decimal(self, a):
"""Returns a copy of the decimal object.
>>> ExtendedContext.copy_decimal(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.copy_decimal(Decimal('-1.00'))
Decimal('-1.00')
>>> ExtendedContext.copy_decimal(1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return Decimal(a)
def copy_negate(self, a):
"""Returns a copy of the operand with the sign inverted.
>>> ExtendedContext.copy_negate(Decimal('101.5'))
Decimal('-101.5')
>>> ExtendedContext.copy_negate(Decimal('-101.5'))
Decimal('101.5')
>>> ExtendedContext.copy_negate(1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_negate()
def copy_sign(self, a, b):
"""Copies the second operand's sign to the first one.
In detail, it returns a copy of the first operand with the sign
equal to the sign of the second operand.
>>> ExtendedContext.copy_sign(Decimal( '1.50'), Decimal('7.33'))
Decimal('1.50')
>>> ExtendedContext.copy_sign(Decimal('-1.50'), Decimal('7.33'))
Decimal('1.50')
>>> ExtendedContext.copy_sign(Decimal( '1.50'), Decimal('-7.33'))
Decimal('-1.50')
>>> ExtendedContext.copy_sign(Decimal('-1.50'), Decimal('-7.33'))
Decimal('-1.50')
>>> ExtendedContext.copy_sign(1, -2)
Decimal('-1')
>>> ExtendedContext.copy_sign(Decimal(1), -2)
Decimal('-1')
>>> ExtendedContext.copy_sign(1, Decimal(-2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_sign(b)
def divide(self, a, b):
"""Decimal division in a specified context.
>>> ExtendedContext.divide(Decimal('1'), Decimal('3'))
Decimal('0.333333333')
>>> ExtendedContext.divide(Decimal('2'), Decimal('3'))
Decimal('0.666666667')
>>> ExtendedContext.divide(Decimal('5'), Decimal('2'))
Decimal('2.5')
>>> ExtendedContext.divide(Decimal('1'), Decimal('10'))
Decimal('0.1')
>>> ExtendedContext.divide(Decimal('12'), Decimal('12'))
Decimal('1')
>>> ExtendedContext.divide(Decimal('8.00'), Decimal('2'))
Decimal('4.00')
>>> ExtendedContext.divide(Decimal('2.400'), Decimal('2.0'))
Decimal('1.20')
>>> ExtendedContext.divide(Decimal('1000'), Decimal('100'))
Decimal('10')
>>> ExtendedContext.divide(Decimal('1000'), Decimal('1'))
Decimal('1000')
>>> ExtendedContext.divide(Decimal('2.40E+6'), Decimal('2'))
Decimal('1.20E+6')
>>> ExtendedContext.divide(5, 5)
Decimal('1')
>>> ExtendedContext.divide(Decimal(5), 5)
Decimal('1')
>>> ExtendedContext.divide(5, Decimal(5))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
r = a.__div__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def divide_int(self, a, b):
"""Divides two numbers and returns the integer part of the result.
>>> ExtendedContext.divide_int(Decimal('2'), Decimal('3'))
Decimal('0')
>>> ExtendedContext.divide_int(Decimal('10'), Decimal('3'))
Decimal('3')
>>> ExtendedContext.divide_int(Decimal('1'), Decimal('0.3'))
Decimal('3')
>>> ExtendedContext.divide_int(10, 3)
Decimal('3')
>>> ExtendedContext.divide_int(Decimal(10), 3)
Decimal('3')
>>> ExtendedContext.divide_int(10, Decimal(3))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
r = a.__floordiv__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def divmod(self, a, b):
"""Return (a // b, a % b).
>>> ExtendedContext.divmod(Decimal(8), Decimal(3))
(Decimal('2'), Decimal('2'))
>>> ExtendedContext.divmod(Decimal(8), Decimal(4))
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(8, 4)
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(Decimal(8), 4)
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(8, Decimal(4))
(Decimal('2'), Decimal('0'))
"""
a = _convert_other(a, raiseit=True)
r = a.__divmod__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def exp(self, a):
"""Returns e ** a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.exp(Decimal('-Infinity'))
Decimal('0')
>>> c.exp(Decimal('-1'))
Decimal('0.367879441')
>>> c.exp(Decimal('0'))
Decimal('1')
>>> c.exp(Decimal('1'))
Decimal('2.71828183')
>>> c.exp(Decimal('0.693147181'))
Decimal('2.00000000')
>>> c.exp(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.exp(10)
Decimal('22026.4658')
"""
a = _convert_other(a, raiseit=True)
return a.exp(context=self)
def fma(self, a, b, c):
"""Returns a multiplied by b, plus c.
The first two operands are multiplied together, using multiply,
the third operand is then added to the result of that
multiplication, using add, all with only one final rounding.
>>> ExtendedContext.fma(Decimal('3'), Decimal('5'), Decimal('7'))
Decimal('22')
>>> ExtendedContext.fma(Decimal('3'), Decimal('-5'), Decimal('7'))
Decimal('-8')
>>> ExtendedContext.fma(Decimal('888565290'), Decimal('1557.96930'), Decimal('-86087.7578'))
Decimal('1.38435736E+12')
>>> ExtendedContext.fma(1, 3, 4)
Decimal('7')
>>> ExtendedContext.fma(1, Decimal(3), 4)
Decimal('7')
>>> ExtendedContext.fma(1, 3, Decimal(4))
Decimal('7')
"""
a = _convert_other(a, raiseit=True)
return a.fma(b, c, context=self)
def is_canonical(self, a):
"""Return True if the operand is canonical; otherwise return False.
Currently, the encoding of a Decimal instance is always
canonical, so this method returns True for any Decimal.
>>> ExtendedContext.is_canonical(Decimal('2.50'))
True
"""
return a.is_canonical()
def is_finite(self, a):
"""Return True if the operand is finite; otherwise return False.
A Decimal instance is considered finite if it is neither
infinite nor a NaN.
>>> ExtendedContext.is_finite(Decimal('2.50'))
True
>>> ExtendedContext.is_finite(Decimal('-0.3'))
True
>>> ExtendedContext.is_finite(Decimal('0'))
True
>>> ExtendedContext.is_finite(Decimal('Inf'))
False
>>> ExtendedContext.is_finite(Decimal('NaN'))
False
>>> ExtendedContext.is_finite(1)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_finite()
def is_infinite(self, a):
"""Return True if the operand is infinite; otherwise return False.
>>> ExtendedContext.is_infinite(Decimal('2.50'))
False
>>> ExtendedContext.is_infinite(Decimal('-Inf'))
True
>>> ExtendedContext.is_infinite(Decimal('NaN'))
False
>>> ExtendedContext.is_infinite(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_infinite()
def is_nan(self, a):
"""Return True if the operand is a qNaN or sNaN;
otherwise return False.
>>> ExtendedContext.is_nan(Decimal('2.50'))
False
>>> ExtendedContext.is_nan(Decimal('NaN'))
True
>>> ExtendedContext.is_nan(Decimal('-sNaN'))
True
>>> ExtendedContext.is_nan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_nan()
def is_normal(self, a):
"""Return True if the operand is a normal number;
otherwise return False.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.is_normal(Decimal('2.50'))
True
>>> c.is_normal(Decimal('0.1E-999'))
False
>>> c.is_normal(Decimal('0.00'))
False
>>> c.is_normal(Decimal('-Inf'))
False
>>> c.is_normal(Decimal('NaN'))
False
>>> c.is_normal(1)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_normal(context=self)
def is_qnan(self, a):
"""Return True if the operand is a quiet NaN; otherwise return False.
>>> ExtendedContext.is_qnan(Decimal('2.50'))
False
>>> ExtendedContext.is_qnan(Decimal('NaN'))
True
>>> ExtendedContext.is_qnan(Decimal('sNaN'))
False
>>> ExtendedContext.is_qnan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_qnan()
def is_signed(self, a):
"""Return True if the operand is negative; otherwise return False.
>>> ExtendedContext.is_signed(Decimal('2.50'))
False
>>> ExtendedContext.is_signed(Decimal('-12'))
True
>>> ExtendedContext.is_signed(Decimal('-0'))
True
>>> ExtendedContext.is_signed(8)
False
>>> ExtendedContext.is_signed(-8)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_signed()
def is_snan(self, a):
"""Return True if the operand is a signaling NaN;
otherwise return False.
>>> ExtendedContext.is_snan(Decimal('2.50'))
False
>>> ExtendedContext.is_snan(Decimal('NaN'))
False
>>> ExtendedContext.is_snan(Decimal('sNaN'))
True
>>> ExtendedContext.is_snan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_snan()
def is_subnormal(self, a):
"""Return True if the operand is subnormal; otherwise return False.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.is_subnormal(Decimal('2.50'))
False
>>> c.is_subnormal(Decimal('0.1E-999'))
True
>>> c.is_subnormal(Decimal('0.00'))
False
>>> c.is_subnormal(Decimal('-Inf'))
False
>>> c.is_subnormal(Decimal('NaN'))
False
>>> c.is_subnormal(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_subnormal(context=self)
def is_zero(self, a):
"""Return True if the operand is a zero; otherwise return False.
>>> ExtendedContext.is_zero(Decimal('0'))
True
>>> ExtendedContext.is_zero(Decimal('2.50'))
False
>>> ExtendedContext.is_zero(Decimal('-0E+2'))
True
>>> ExtendedContext.is_zero(1)
False
>>> ExtendedContext.is_zero(0)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_zero()
def ln(self, a):
"""Returns the natural (base e) logarithm of the operand.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.ln(Decimal('0'))
Decimal('-Infinity')
>>> c.ln(Decimal('1.000'))
Decimal('0')
>>> c.ln(Decimal('2.71828183'))
Decimal('1.00000000')
>>> c.ln(Decimal('10'))
Decimal('2.30258509')
>>> c.ln(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.ln(1)
Decimal('0')
"""
a = _convert_other(a, raiseit=True)
return a.ln(context=self)
def log10(self, a):
"""Returns the base 10 logarithm of the operand.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.log10(Decimal('0'))
Decimal('-Infinity')
>>> c.log10(Decimal('0.001'))
Decimal('-3')
>>> c.log10(Decimal('1.000'))
Decimal('0')
>>> c.log10(Decimal('2'))
Decimal('0.301029996')
>>> c.log10(Decimal('10'))
Decimal('1')
>>> c.log10(Decimal('70'))
Decimal('1.84509804')
>>> c.log10(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.log10(0)
Decimal('-Infinity')
>>> c.log10(1)
Decimal('0')
"""
a = _convert_other(a, raiseit=True)
return a.log10(context=self)
def logb(self, a):
""" Returns the exponent of the magnitude of the operand's MSD.
The result is the integer which is the exponent of the magnitude
of the most significant digit of the operand (as though the
operand were truncated to a single digit while maintaining the
value of that digit and without limiting the resulting exponent).
>>> ExtendedContext.logb(Decimal('250'))
Decimal('2')
>>> ExtendedContext.logb(Decimal('2.50'))
Decimal('0')
>>> ExtendedContext.logb(Decimal('0.03'))
Decimal('-2')
>>> ExtendedContext.logb(Decimal('0'))
Decimal('-Infinity')
>>> ExtendedContext.logb(1)
Decimal('0')
>>> ExtendedContext.logb(10)
Decimal('1')
>>> ExtendedContext.logb(100)
Decimal('2')
"""
a = _convert_other(a, raiseit=True)
return a.logb(context=self)
def logical_and(self, a, b):
"""Applies the logical operation 'and' between each operand's digits.
Both operands must be logical numbers.
>>> ExtendedContext.logical_and(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('0'), Decimal('1'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('1'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('1'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_and(Decimal('1100'), Decimal('1010'))
Decimal('1000')
>>> ExtendedContext.logical_and(Decimal('1111'), Decimal('10'))
Decimal('10')
>>> ExtendedContext.logical_and(110, 1101)
Decimal('100')
>>> ExtendedContext.logical_and(Decimal(110), 1101)
Decimal('100')
>>> ExtendedContext.logical_and(110, Decimal(1101))
Decimal('100')
"""
a = _convert_other(a, raiseit=True)
return a.logical_and(b, context=self)
def logical_invert(self, a):
"""Invert all the digits in the operand.
The operand must be a logical number.
>>> ExtendedContext.logical_invert(Decimal('0'))
Decimal('111111111')
>>> ExtendedContext.logical_invert(Decimal('1'))
Decimal('111111110')
>>> ExtendedContext.logical_invert(Decimal('111111111'))
Decimal('0')
>>> ExtendedContext.logical_invert(Decimal('101010101'))
Decimal('10101010')
>>> ExtendedContext.logical_invert(1101)
Decimal('111110010')
"""
a = _convert_other(a, raiseit=True)
return a.logical_invert(context=self)
def logical_or(self, a, b):
"""Applies the logical operation 'or' between each operand's digits.
Both operands must be logical numbers.
>>> ExtendedContext.logical_or(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_or(Decimal('0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1'), Decimal('0'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1100'), Decimal('1010'))
Decimal('1110')
>>> ExtendedContext.logical_or(Decimal('1110'), Decimal('10'))
Decimal('1110')
>>> ExtendedContext.logical_or(110, 1101)
Decimal('1111')
>>> ExtendedContext.logical_or(Decimal(110), 1101)
Decimal('1111')
>>> ExtendedContext.logical_or(110, Decimal(1101))
Decimal('1111')
"""
a = _convert_other(a, raiseit=True)
return a.logical_or(b, context=self)
def logical_xor(self, a, b):
"""Applies the logical operation 'xor' between each operand's digits.
Both operands must be logical numbers.
>>> ExtendedContext.logical_xor(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_xor(Decimal('0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_xor(Decimal('1'), Decimal('0'))
Decimal('1')
>>> ExtendedContext.logical_xor(Decimal('1'), Decimal('1'))
Decimal('0')
>>> ExtendedContext.logical_xor(Decimal('1100'), Decimal('1010'))
Decimal('110')
>>> ExtendedContext.logical_xor(Decimal('1111'), Decimal('10'))
Decimal('1101')
>>> ExtendedContext.logical_xor(110, 1101)
Decimal('1011')
>>> ExtendedContext.logical_xor(Decimal(110), 1101)
Decimal('1011')
>>> ExtendedContext.logical_xor(110, Decimal(1101))
Decimal('1011')
"""
a = _convert_other(a, raiseit=True)
return a.logical_xor(b, context=self)
def max(self, a, b):
"""max compares two values numerically and returns the maximum.
If either operand is a NaN then the general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand
is chosen as the result. Otherwise the maximum (closer to positive
infinity) of the two operands is chosen as the result.
>>> ExtendedContext.max(Decimal('3'), Decimal('2'))
Decimal('3')
>>> ExtendedContext.max(Decimal('-10'), Decimal('3'))
Decimal('3')
>>> ExtendedContext.max(Decimal('1.0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.max(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.max(1, 2)
Decimal('2')
>>> ExtendedContext.max(Decimal(1), 2)
Decimal('2')
>>> ExtendedContext.max(1, Decimal(2))
Decimal('2')
"""
a = _convert_other(a, raiseit=True)
return a.max(b, context=self)
def max_mag(self, a, b):
"""Compares the values numerically with their sign ignored.
>>> ExtendedContext.max_mag(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.max_mag(Decimal('7'), Decimal('-10'))
Decimal('-10')
>>> ExtendedContext.max_mag(1, -2)
Decimal('-2')
>>> ExtendedContext.max_mag(Decimal(1), -2)
Decimal('-2')
>>> ExtendedContext.max_mag(1, Decimal(-2))
Decimal('-2')
"""
a = _convert_other(a, raiseit=True)
return a.max_mag(b, context=self)
def min(self, a, b):
"""min compares two values numerically and returns the minimum.
If either operand is a NaN then the general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand
is chosen as the result. Otherwise the minimum (closer to negative
infinity) of the two operands is chosen as the result.
>>> ExtendedContext.min(Decimal('3'), Decimal('2'))
Decimal('2')
>>> ExtendedContext.min(Decimal('-10'), Decimal('3'))
Decimal('-10')
>>> ExtendedContext.min(Decimal('1.0'), Decimal('1'))
Decimal('1.0')
>>> ExtendedContext.min(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.min(1, 2)
Decimal('1')
>>> ExtendedContext.min(Decimal(1), 2)
Decimal('1')
>>> ExtendedContext.min(1, Decimal(29))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.min(b, context=self)
def min_mag(self, a, b):
"""Compares the values numerically with their sign ignored.
>>> ExtendedContext.min_mag(Decimal('3'), Decimal('-2'))
Decimal('-2')
>>> ExtendedContext.min_mag(Decimal('-3'), Decimal('NaN'))
Decimal('-3')
>>> ExtendedContext.min_mag(1, -2)
Decimal('1')
>>> ExtendedContext.min_mag(Decimal(1), -2)
Decimal('1')
>>> ExtendedContext.min_mag(1, Decimal(-2))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.min_mag(b, context=self)
def minus(self, a):
"""Minus corresponds to unary prefix minus in Python.
The operation is evaluated using the same rules as subtract; the
operation minus(a) is calculated as subtract('0', a) where the '0'
has the same exponent as the operand.
>>> ExtendedContext.minus(Decimal('1.3'))
Decimal('-1.3')
>>> ExtendedContext.minus(Decimal('-1.3'))
Decimal('1.3')
>>> ExtendedContext.minus(1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.__neg__(context=self)
def multiply(self, a, b):
"""multiply multiplies two operands.
If either operand is a special value then the general rules apply.
Otherwise, the operands are multiplied together
('long multiplication'), resulting in a number which may be as long as
the sum of the lengths of the two operands.
>>> ExtendedContext.multiply(Decimal('1.20'), Decimal('3'))
Decimal('3.60')
>>> ExtendedContext.multiply(Decimal('7'), Decimal('3'))
Decimal('21')
>>> ExtendedContext.multiply(Decimal('0.9'), Decimal('0.8'))
Decimal('0.72')
>>> ExtendedContext.multiply(Decimal('0.9'), Decimal('-0'))
Decimal('-0.0')
>>> ExtendedContext.multiply(Decimal('654321'), Decimal('654321'))
Decimal('4.28135971E+11')
>>> ExtendedContext.multiply(7, 7)
Decimal('49')
>>> ExtendedContext.multiply(Decimal(7), 7)
Decimal('49')
>>> ExtendedContext.multiply(7, Decimal(7))
Decimal('49')
"""
a = _convert_other(a, raiseit=True)
r = a.__mul__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def next_minus(self, a):
"""Returns the largest representable number smaller than a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> ExtendedContext.next_minus(Decimal('1'))
Decimal('0.999999999')
>>> c.next_minus(Decimal('1E-1007'))
Decimal('0E-1007')
>>> ExtendedContext.next_minus(Decimal('-1.00000003'))
Decimal('-1.00000004')
>>> c.next_minus(Decimal('Infinity'))
Decimal('9.99999999E+999')
>>> c.next_minus(1)
Decimal('0.999999999')
"""
a = _convert_other(a, raiseit=True)
return a.next_minus(context=self)
def next_plus(self, a):
"""Returns the smallest representable number larger than a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> ExtendedContext.next_plus(Decimal('1'))
Decimal('1.00000001')
>>> c.next_plus(Decimal('-1E-1007'))
Decimal('-0E-1007')
>>> ExtendedContext.next_plus(Decimal('-1.00000003'))
Decimal('-1.00000002')
>>> c.next_plus(Decimal('-Infinity'))
Decimal('-9.99999999E+999')
>>> c.next_plus(1)
Decimal('1.00000001')
"""
a = _convert_other(a, raiseit=True)
return a.next_plus(context=self)
def next_toward(self, a, b):
"""Returns the number closest to a, in direction towards b.
The result is the closest representable number from the first
operand (but not the first operand) that is in the direction
towards the second operand, unless the operands have the same
value.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.next_toward(Decimal('1'), Decimal('2'))
Decimal('1.00000001')
>>> c.next_toward(Decimal('-1E-1007'), Decimal('1'))
Decimal('-0E-1007')
>>> c.next_toward(Decimal('-1.00000003'), Decimal('0'))
Decimal('-1.00000002')
>>> c.next_toward(Decimal('1'), Decimal('0'))
Decimal('0.999999999')
>>> c.next_toward(Decimal('1E-1007'), Decimal('-100'))
Decimal('0E-1007')
>>> c.next_toward(Decimal('-1.00000003'), Decimal('-10'))
Decimal('-1.00000004')
>>> c.next_toward(Decimal('0.00'), Decimal('-0.0000'))
Decimal('-0.00')
>>> c.next_toward(0, 1)
Decimal('1E-1007')
>>> c.next_toward(Decimal(0), 1)
Decimal('1E-1007')
>>> c.next_toward(0, Decimal(1))
Decimal('1E-1007')
"""
a = _convert_other(a, raiseit=True)
return a.next_toward(b, context=self)
def normalize(self, a):
"""normalize reduces an operand to its simplest form.
Essentially a plus operation with all trailing zeros removed from the
result.
>>> ExtendedContext.normalize(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.normalize(Decimal('-2.0'))
Decimal('-2')
>>> ExtendedContext.normalize(Decimal('1.200'))
Decimal('1.2')
>>> ExtendedContext.normalize(Decimal('-120'))
Decimal('-1.2E+2')
>>> ExtendedContext.normalize(Decimal('120.00'))
Decimal('1.2E+2')
>>> ExtendedContext.normalize(Decimal('0.00'))
Decimal('0')
>>> ExtendedContext.normalize(6)
Decimal('6')
"""
a = _convert_other(a, raiseit=True)
return a.normalize(context=self)
def number_class(self, a):
"""Returns an indication of the class of the operand.
The class is one of the following strings:
sNaN
NaN
-Infinity
-Normal
-Subnormal
-Zero
+Zero
+Subnormal
+Normal
+Infinity
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.number_class(Decimal('Infinity'))
'+Infinity'
>>> c.number_class(Decimal('1E-10'))
'+Normal'
>>> c.number_class(Decimal('2.50'))
'+Normal'
>>> c.number_class(Decimal('0.1E-999'))
'+Subnormal'
>>> c.number_class(Decimal('0'))
'+Zero'
>>> c.number_class(Decimal('-0'))
'-Zero'
>>> c.number_class(Decimal('-0.1E-999'))
'-Subnormal'
>>> c.number_class(Decimal('-1E-10'))
'-Normal'
>>> c.number_class(Decimal('-2.50'))
'-Normal'
>>> c.number_class(Decimal('-Infinity'))
'-Infinity'
>>> c.number_class(Decimal('NaN'))
'NaN'
>>> c.number_class(Decimal('-NaN'))
'NaN'
>>> c.number_class(Decimal('sNaN'))
'sNaN'
>>> c.number_class(123)
'+Normal'
"""
a = _convert_other(a, raiseit=True)
return a.number_class(context=self)
def plus(self, a):
"""Plus corresponds to unary prefix plus in Python.
The operation is evaluated using the same rules as add; the
operation plus(a) is calculated as add('0', a) where the '0'
has the same exponent as the operand.
>>> ExtendedContext.plus(Decimal('1.3'))
Decimal('1.3')
>>> ExtendedContext.plus(Decimal('-1.3'))
Decimal('-1.3')
>>> ExtendedContext.plus(-1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.__pos__(context=self)
def power(self, a, b, modulo=None):
"""Raises a to the power of b, to modulo if given.
With two arguments, compute a**b. If a is negative then b
must be integral. The result will be inexact unless b is
integral and the result is finite and can be expressed exactly
in 'precision' digits.
With three arguments, compute (a**b) % modulo. For the
three argument form, the following restrictions on the
arguments hold:
- all three arguments must be integral
- b must be nonnegative
- at least one of a or b must be nonzero
- modulo must be nonzero and have at most 'precision' digits
The result of pow(a, b, modulo) is identical to the result
that would be obtained by computing (a**b) % modulo with
unbounded precision, but is computed more efficiently. It is
always exact.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.power(Decimal('2'), Decimal('3'))
Decimal('8')
>>> c.power(Decimal('-2'), Decimal('3'))
Decimal('-8')
>>> c.power(Decimal('2'), Decimal('-3'))
Decimal('0.125')
>>> c.power(Decimal('1.7'), Decimal('8'))
Decimal('69.7575744')
>>> c.power(Decimal('10'), Decimal('0.301029996'))
Decimal('2.00000000')
>>> c.power(Decimal('Infinity'), Decimal('-1'))
Decimal('0')
>>> c.power(Decimal('Infinity'), Decimal('0'))
Decimal('1')
>>> c.power(Decimal('Infinity'), Decimal('1'))
Decimal('Infinity')
>>> c.power(Decimal('-Infinity'), Decimal('-1'))
Decimal('-0')
>>> c.power(Decimal('-Infinity'), Decimal('0'))
Decimal('1')
>>> c.power(Decimal('-Infinity'), Decimal('1'))
Decimal('-Infinity')
>>> c.power(Decimal('-Infinity'), Decimal('2'))
Decimal('Infinity')
>>> c.power(Decimal('0'), Decimal('0'))
Decimal('NaN')
>>> c.power(Decimal('3'), Decimal('7'), Decimal('16'))
Decimal('11')
>>> c.power(Decimal('-3'), Decimal('7'), Decimal('16'))
Decimal('-11')
>>> c.power(Decimal('-3'), Decimal('8'), Decimal('16'))
Decimal('1')
>>> c.power(Decimal('3'), Decimal('7'), Decimal('-16'))
Decimal('11')
>>> c.power(Decimal('23E12345'), Decimal('67E189'), Decimal('123456789'))
Decimal('11729830')
>>> c.power(Decimal('-0'), Decimal('17'), Decimal('1729'))
Decimal('-0')
>>> c.power(Decimal('-23'), Decimal('0'), Decimal('65537'))
Decimal('1')
>>> ExtendedContext.power(7, 7)
Decimal('823543')
>>> ExtendedContext.power(Decimal(7), 7)
Decimal('823543')
>>> ExtendedContext.power(7, Decimal(7), 2)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
r = a.__pow__(b, modulo, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def quantize(self, a, b):
"""Returns a value equal to 'a' (rounded), having the exponent of 'b'.
The coefficient of the result is derived from that of the left-hand
operand. It may be rounded using the current rounding setting (if the
exponent is being increased), multiplied by a positive power of ten (if
the exponent is being decreased), or is unchanged (if the exponent is
already equal to that of the right-hand operand).
Unlike other operations, if the length of the coefficient after the
quantize operation would be greater than precision then an Invalid
operation condition is raised. This guarantees that, unless there is
an error condition, the exponent of the result of a quantize is always
equal to that of the right-hand operand.
Also unlike other operations, quantize will never raise Underflow, even
if the result is subnormal and inexact.
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.001'))
Decimal('2.170')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.01'))
Decimal('2.17')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.1'))
Decimal('2.2')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('1e+0'))
Decimal('2')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('1e+1'))
Decimal('0E+1')
>>> ExtendedContext.quantize(Decimal('-Inf'), Decimal('Infinity'))
Decimal('-Infinity')
>>> ExtendedContext.quantize(Decimal('2'), Decimal('Infinity'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('-0.1'), Decimal('1'))
Decimal('-0')
>>> ExtendedContext.quantize(Decimal('-0'), Decimal('1e+5'))
Decimal('-0E+5')
>>> ExtendedContext.quantize(Decimal('+35236450.6'), Decimal('1e-2'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('-35236450.6'), Decimal('1e-2'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e-1'))
Decimal('217.0')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e-0'))
Decimal('217')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e+1'))
Decimal('2.2E+2')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e+2'))
Decimal('2E+2')
>>> ExtendedContext.quantize(1, 2)
Decimal('1')
>>> ExtendedContext.quantize(Decimal(1), 2)
Decimal('1')
>>> ExtendedContext.quantize(1, Decimal(2))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.quantize(b, context=self)
def radix(self):
"""Just returns 10, as this is Decimal, :)
>>> ExtendedContext.radix()
Decimal('10')
"""
return Decimal(10)
def remainder(self, a, b):
"""Returns the remainder from integer division.
The result is the residue of the dividend after the operation of
calculating integer division as described for divide-integer, rounded
to precision digits if necessary. The sign of the result, if
non-zero, is the same as that of the original dividend.
This operation will fail under the same conditions as integer division
(that is, if integer division on the same two operands would fail, the
remainder cannot be calculated).
>>> ExtendedContext.remainder(Decimal('2.1'), Decimal('3'))
Decimal('2.1')
>>> ExtendedContext.remainder(Decimal('10'), Decimal('3'))
Decimal('1')
>>> ExtendedContext.remainder(Decimal('-10'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.remainder(Decimal('10.2'), Decimal('1'))
Decimal('0.2')
>>> ExtendedContext.remainder(Decimal('10'), Decimal('0.3'))
Decimal('0.1')
>>> ExtendedContext.remainder(Decimal('3.6'), Decimal('1.3'))
Decimal('1.0')
>>> ExtendedContext.remainder(22, 6)
Decimal('4')
>>> ExtendedContext.remainder(Decimal(22), 6)
Decimal('4')
>>> ExtendedContext.remainder(22, Decimal(6))
Decimal('4')
"""
a = _convert_other(a, raiseit=True)
r = a.__mod__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def remainder_near(self, a, b):
"""Returns to be "a - b * n", where n is the integer nearest the exact
value of "x / b" (if two integers are equally near then the even one
is chosen). If the result is equal to 0 then its sign will be the
sign of a.
This operation will fail under the same conditions as integer division
(that is, if integer division on the same two operands would fail, the
remainder cannot be calculated).
>>> ExtendedContext.remainder_near(Decimal('2.1'), Decimal('3'))
Decimal('-0.9')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('6'))
Decimal('-2')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('3'))
Decimal('1')
>>> ExtendedContext.remainder_near(Decimal('-10'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.remainder_near(Decimal('10.2'), Decimal('1'))
Decimal('0.2')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('0.3'))
Decimal('0.1')
>>> ExtendedContext.remainder_near(Decimal('3.6'), Decimal('1.3'))
Decimal('-0.3')
>>> ExtendedContext.remainder_near(3, 11)
Decimal('3')
>>> ExtendedContext.remainder_near(Decimal(3), 11)
Decimal('3')
>>> ExtendedContext.remainder_near(3, Decimal(11))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
return a.remainder_near(b, context=self)
def rotate(self, a, b):
"""Returns a rotated copy of a, b times.
The coefficient of the result is a rotated copy of the digits in
the coefficient of the first operand. The number of places of
rotation is taken from the absolute value of the second operand,
with the rotation being to the left if the second operand is
positive or to the right otherwise.
>>> ExtendedContext.rotate(Decimal('34'), Decimal('8'))
Decimal('400000003')
>>> ExtendedContext.rotate(Decimal('12'), Decimal('9'))
Decimal('12')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('-2'))
Decimal('891234567')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('0'))
Decimal('123456789')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('+2'))
Decimal('345678912')
>>> ExtendedContext.rotate(1333333, 1)
Decimal('13333330')
>>> ExtendedContext.rotate(Decimal(1333333), 1)
Decimal('13333330')
>>> ExtendedContext.rotate(1333333, Decimal(1))
Decimal('13333330')
"""
a = _convert_other(a, raiseit=True)
return a.rotate(b, context=self)
def same_quantum(self, a, b):
"""Returns True if the two operands have the same exponent.
The result is never affected by either the sign or the coefficient of
either operand.
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('0.001'))
False
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('0.01'))
True
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('1'))
False
>>> ExtendedContext.same_quantum(Decimal('Inf'), Decimal('-Inf'))
True
>>> ExtendedContext.same_quantum(10000, -1)
True
>>> ExtendedContext.same_quantum(Decimal(10000), -1)
True
>>> ExtendedContext.same_quantum(10000, Decimal(-1))
True
"""
a = _convert_other(a, raiseit=True)
return a.same_quantum(b)
def scaleb(self, a, b):
"""Returns the first operand after adding the second value to its exp.
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('-2'))
Decimal('0.0750')
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('0'))
Decimal('7.50')
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('3'))
Decimal('7.50E+3')
>>> ExtendedContext.scaleb(1, 4)
Decimal('1E+4')
>>> ExtendedContext.scaleb(Decimal(1), 4)
Decimal('1E+4')
>>> ExtendedContext.scaleb(1, Decimal(4))
Decimal('1E+4')
"""
a = _convert_other(a, raiseit=True)
return a.scaleb(b, context=self)
def shift(self, a, b):
"""Returns a shifted copy of a, b times.
The coefficient of the result is a shifted copy of the digits
in the coefficient of the first operand. The number of places
to shift is taken from the absolute value of the second operand,
with the shift being to the left if the second operand is
positive or to the right otherwise. Digits shifted into the
coefficient are zeros.
>>> ExtendedContext.shift(Decimal('34'), Decimal('8'))
Decimal('400000000')
>>> ExtendedContext.shift(Decimal('12'), Decimal('9'))
Decimal('0')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('-2'))
Decimal('1234567')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('0'))
Decimal('123456789')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('+2'))
Decimal('345678900')
>>> ExtendedContext.shift(88888888, 2)
Decimal('888888800')
>>> ExtendedContext.shift(Decimal(88888888), 2)
Decimal('888888800')
>>> ExtendedContext.shift(88888888, Decimal(2))
Decimal('888888800')
"""
a = _convert_other(a, raiseit=True)
return a.shift(b, context=self)
def sqrt(self, a):
"""Square root of a non-negative number to context precision.
If the result must be inexact, it is rounded using the round-half-even
algorithm.
>>> ExtendedContext.sqrt(Decimal('0'))
Decimal('0')
>>> ExtendedContext.sqrt(Decimal('-0'))
Decimal('-0')
>>> ExtendedContext.sqrt(Decimal('0.39'))
Decimal('0.624499800')
>>> ExtendedContext.sqrt(Decimal('100'))
Decimal('10')
>>> ExtendedContext.sqrt(Decimal('1'))
Decimal('1')
>>> ExtendedContext.sqrt(Decimal('1.0'))
Decimal('1.0')
>>> ExtendedContext.sqrt(Decimal('1.00'))
Decimal('1.0')
>>> ExtendedContext.sqrt(Decimal('7'))
Decimal('2.64575131')
>>> ExtendedContext.sqrt(Decimal('10'))
Decimal('3.16227766')
>>> ExtendedContext.sqrt(2)
Decimal('1.41421356')
>>> ExtendedContext.prec
9
"""
a = _convert_other(a, raiseit=True)
return a.sqrt(context=self)
def subtract(self, a, b):
"""Return the difference between the two operands.
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('1.07'))
Decimal('0.23')
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('1.30'))
Decimal('0.00')
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('2.07'))
Decimal('-0.77')
>>> ExtendedContext.subtract(8, 5)
Decimal('3')
>>> ExtendedContext.subtract(Decimal(8), 5)
Decimal('3')
>>> ExtendedContext.subtract(8, Decimal(5))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
r = a.__sub__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def to_eng_string(self, a):
"""Convert to a string, using engineering notation if an exponent is needed.
Engineering notation has an exponent which is a multiple of 3. This
can leave up to 3 digits to the left of the decimal place and may
require the addition of either one or two trailing zeros.
The operation is not affected by the context.
>>> ExtendedContext.to_eng_string(Decimal('123E+1'))
'1.23E+3'
>>> ExtendedContext.to_eng_string(Decimal('123E+3'))
'123E+3'
>>> ExtendedContext.to_eng_string(Decimal('123E-10'))
'12.3E-9'
>>> ExtendedContext.to_eng_string(Decimal('-123E-12'))
'-123E-12'
>>> ExtendedContext.to_eng_string(Decimal('7E-7'))
'700E-9'
>>> ExtendedContext.to_eng_string(Decimal('7E+1'))
'70'
>>> ExtendedContext.to_eng_string(Decimal('0E+1'))
'0.00E+3'
"""
a = _convert_other(a, raiseit=True)
return a.to_eng_string(context=self)
def to_sci_string(self, a):
"""Converts a number to a string, using scientific notation.
The operation is not affected by the context.
"""
a = _convert_other(a, raiseit=True)
return a.__str__(context=self)
def to_integral_exact(self, a):
"""Rounds to an integer.
When the operand has a negative exponent, the result is the same
as using the quantize() operation using the given operand as the
left-hand-operand, 1E+0 as the right-hand-operand, and the precision
of the operand as the precision setting; Inexact and Rounded flags
are allowed in this operation. The rounding mode is taken from the
context.
>>> ExtendedContext.to_integral_exact(Decimal('2.1'))
Decimal('2')
>>> ExtendedContext.to_integral_exact(Decimal('100'))
Decimal('100')
>>> ExtendedContext.to_integral_exact(Decimal('100.0'))
Decimal('100')
>>> ExtendedContext.to_integral_exact(Decimal('101.5'))
Decimal('102')
>>> ExtendedContext.to_integral_exact(Decimal('-101.5'))
Decimal('-102')
>>> ExtendedContext.to_integral_exact(Decimal('10E+5'))
Decimal('1.0E+6')
>>> ExtendedContext.to_integral_exact(Decimal('7.89E+77'))
Decimal('7.89E+77')
>>> ExtendedContext.to_integral_exact(Decimal('-Inf'))
Decimal('-Infinity')
"""
a = _convert_other(a, raiseit=True)
return a.to_integral_exact(context=self)
def to_integral_value(self, a):
"""Rounds to an integer.
When the operand has a negative exponent, the result is the same
as using the quantize() operation using the given operand as the
left-hand-operand, 1E+0 as the right-hand-operand, and the precision
of the operand as the precision setting, except that no flags will
be set. The rounding mode is taken from the context.
>>> ExtendedContext.to_integral_value(Decimal('2.1'))
Decimal('2')
>>> ExtendedContext.to_integral_value(Decimal('100'))
Decimal('100')
>>> ExtendedContext.to_integral_value(Decimal('100.0'))
Decimal('100')
>>> ExtendedContext.to_integral_value(Decimal('101.5'))
Decimal('102')
>>> ExtendedContext.to_integral_value(Decimal('-101.5'))
Decimal('-102')
>>> ExtendedContext.to_integral_value(Decimal('10E+5'))
Decimal('1.0E+6')
>>> ExtendedContext.to_integral_value(Decimal('7.89E+77'))
Decimal('7.89E+77')
>>> ExtendedContext.to_integral_value(Decimal('-Inf'))
Decimal('-Infinity')
"""
a = _convert_other(a, raiseit=True)
return a.to_integral_value(context=self)
# the method name changed, but we also provide the old one, for compatibility
to_integral = to_integral_value
class _WorkRep(object):
__slots__ = ('sign','int','exp')
# sign: 0 or 1
# int: int or long
# exp: None, int, or string
def __init__(self, value=None):
if value is None:
self.sign = None
self.int = 0
self.exp = None
elif isinstance(value, Decimal):
self.sign = value._sign
self.int = int(value._int)
self.exp = value._exp
else:
# assert isinstance(value, tuple)
self.sign = value[0]
self.int = value[1]
self.exp = value[2]
def __repr__(self):
return "(%r, %r, %r)" % (self.sign, self.int, self.exp)
__str__ = __repr__
def _normalize(op1, op2, prec = 0):
"""Normalizes op1, op2 to have the same exp and length of coefficient.
Done during addition.
"""
if op1.exp < op2.exp:
tmp = op2
other = op1
else:
tmp = op1
other = op2
# Let exp = min(tmp.exp - 1, tmp.adjusted() - precision - 1).
# Then adding 10**exp to tmp has the same effect (after rounding)
# as adding any positive quantity smaller than 10**exp; similarly
# for subtraction. So if other is smaller than 10**exp we replace
# it with 10**exp. This avoids tmp.exp - other.exp getting too large.
tmp_len = len(str(tmp.int))
other_len = len(str(other.int))
exp = tmp.exp + min(-1, tmp_len - prec - 2)
if other_len + other.exp - 1 < exp:
other.int = 1
other.exp = exp
tmp.int *= 10 ** (tmp.exp - other.exp)
tmp.exp = other.exp
return op1, op2
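# Illustrative sketch, not part of the original module: the hypothetical
# _demo_normalize below shows _normalize bringing two _WorkRep operands
# to a common exponent so their coefficients line up for addition.
def _demo_normalize():
    op1 = _WorkRep((0, 123, 2))   # 123E+2
    op2 = _WorkRep((0, 45, 0))    # 45E+0
    _normalize(op1, op2)
    # the larger-exponent operand is rescaled down to the common exponent
    assert (op1.int, op1.exp) == (12300, 0)
    assert (op2.int, op2.exp) == (45, 0)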
##### Integer arithmetic functions used by ln, log10, exp and __pow__ #####
# This function from Tim Peters was taken from here:
# http://mail.python.org/pipermail/python-list/1999-July/007758.html
# The correction table is embedded in the function definition for speed;
# math.log is deliberately avoided so the computation never touches
# floats.
def _nbits(n, correction = {
'0': 4, '1': 3, '2': 2, '3': 2,
'4': 1, '5': 1, '6': 1, '7': 1,
'8': 0, '9': 0, 'a': 0, 'b': 0,
'c': 0, 'd': 0, 'e': 0, 'f': 0}):
"""Number of bits in binary representation of the positive integer n,
or 0 if n == 0.
"""
if n < 0:
raise ValueError("The argument to _nbits should be nonnegative.")
hex_n = "%x" % n
return 4*len(hex_n) - correction[hex_n[0]]
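# Illustrative sketch, not part of the original module: the hypothetical
# _demo_nbits below shows how the bit length falls out of the hex length
# minus a per-leading-digit correction.
def _demo_nbits():
    assert _nbits(255) == 8   # '%x' gives 'ff': 4*2 - correction['f'] (0)
    assert _nbits(256) == 9   # '100': 4*3 - correction['1'] (3)
    assert _nbits(0) == 0     # '0': 4*1 - correction['0'] (4)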
def _decimal_lshift_exact(n, e):
""" Given integers n and e, return n * 10**e if it's an integer, else None.
The computation is designed to avoid computing large powers of 10
unnecessarily.
>>> _decimal_lshift_exact(3, 4)
30000
>>> _decimal_lshift_exact(300, -999999999) # returns None
"""
if n == 0:
return 0
elif e >= 0:
return n * 10**e
else:
# val_n = largest power of 10 dividing n.
str_n = str(abs(n))
val_n = len(str_n) - len(str_n.rstrip('0'))
return None if val_n < -e else n // 10**-e
def _sqrt_nearest(n, a):
"""Closest integer to the square root of the positive integer n. a is
an initial approximation to the square root. Any positive integer
will do for a, but the closer a is to the square root of n the
faster convergence will be.
"""
if n <= 0 or a <= 0:
raise ValueError("Both arguments to _sqrt_nearest should be positive.")
b=0
while a != b:
# a - (-n//a) is a + ceil(n/a); halving it is one Newton step
b, a = a, (a - (-n // a)) >> 1
return a
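# Illustrative sketch, not part of the original module: the hypothetical
# _demo_sqrt_nearest below shows the Newton iteration settling on the
# *closest* integer to the square root, not the floor.
def _demo_sqrt_nearest():
    assert _sqrt_nearest(2, 1) == 1    # sqrt(2) = 1.414... -> 1
    assert _sqrt_nearest(24, 1) == 5   # sqrt(24) = 4.899... -> 5, not 4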
def _rshift_nearest(x, shift):
"""Given an integer x and a nonnegative integer shift, return closest
integer to x / 2**shift; use round-to-even in case of a tie.
"""
b, q = 1L << shift, x >> shift
return q + (2*(x & (b-1)) + (q&1) > b)
def _div_nearest(a, b):
"""Closest integer to a/b, a and b positive integers; rounds to even
in the case of a tie.
"""
q, r = divmod(a, b)
return q + (2*r + (q&1) > b)
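# Illustrative sketch, not part of the original module: the hypothetical
# _demo_round_half_even below shows both helpers resolving exact halves
# to the even neighbour.
def _demo_round_half_even():
    assert _rshift_nearest(5, 1) == 2   # 2.5 -> 2 (even)
    assert _rshift_nearest(7, 1) == 4   # 3.5 -> 4 (even)
    assert _div_nearest(5, 2) == 2      # 2.5 -> 2 (even)
    assert _div_nearest(7, 2) == 4      # 3.5 -> 4 (even)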
def _ilog(x, M, L = 8):
"""Integer approximation to M*log(x/M), with absolute error boundable
in terms only of x/M.
Given positive integers x and M, return an integer approximation to
M * log(x/M). For L = 8 and 0.1 <= x/M <= 10 the difference
between the approximation and the exact result is at most 22. For
L = 8 and 1.0 <= x/M <= 10.0 the difference is at most 15. In
both cases these are upper bounds on the error; it will usually be
much smaller."""
# The basic algorithm is the following: let log1p be the function
# log1p(x) = log(1+x). Then log(x/M) = log1p((x-M)/M). We use
# the reduction
#
# log1p(y) = 2*log1p(y/(1+sqrt(1+y)))
#
# repeatedly until the argument to log1p is small (< 2**-L in
# absolute value). For small y we can use the Taylor series
# expansion
#
# log1p(y) ~ y - y**2/2 + y**3/3 - ... - (-y)**T/T
#
# truncating at T such that y**T is small enough. The whole
# computation is carried out in a form of fixed-point arithmetic,
# with a real number z being represented by an integer
# approximation to z*M. To avoid loss of precision, the y below
# is actually an integer approximation to 2**R*y*M, where R is the
# number of reductions performed so far.
y = x-M
# argument reduction; R = number of reductions performed
R = 0
while (R <= L and long(abs(y)) << L-R >= M or
R > L and abs(y) >> R-L >= M):
y = _div_nearest(long(M*y) << 1,
M + _sqrt_nearest(M*(M+_rshift_nearest(y, R)), M))
R += 1
# Taylor series with T terms
T = -int(-10*len(str(M))//(3*L))
yshift = _rshift_nearest(y, R)
w = _div_nearest(M, T)
for k in xrange(T-1, 0, -1):
w = _div_nearest(M, k) - _div_nearest(yshift*w, M)
return _div_nearest(w*y, M)
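# Illustrative sketch, not part of the original module: the hypothetical
# _demo_ilog below checks the fixed-point result against the documented
# error bound; M*log(2) with M = 10**6 is 693147.18...
def _demo_ilog():
    approx = _ilog(2 * 10**6, 10**6)
    assert abs(approx - 693147) <= 22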
def _dlog10(c, e, p):
"""Given integers c, e and p with c > 0, p >= 0, compute an integer
approximation to 10**p * log10(c*10**e), with an absolute error of
at most 1. Assumes that c*10**e is not exactly 1."""
# increase precision by 2; compensate for this by dividing
# final result by 100
p += 2
# write c*10**e as d*10**f with either:
# f >= 0 and 1 <= d <= 10, or
# f <= 0 and 0.1 <= d <= 1.
# Thus for c*10**e close to 1, f = 0
l = len(str(c))
f = e+l - (e+l >= 1)
if p > 0:
M = 10**p
k = e+p-f
if k >= 0:
c *= 10**k
else:
c = _div_nearest(c, 10**-k)
log_d = _ilog(c, M) # error < 5 + 22 = 27
log_10 = _log10_digits(p) # error < 1
log_d = _div_nearest(log_d*M, log_10)
log_tenpower = f*M # exact
else:
log_d = 0 # error < 2.31
log_tenpower = _div_nearest(f, 10**-p) # error < 0.5
return _div_nearest(log_tenpower+log_d, 100)
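# Illustrative sketch, not part of the original module: the hypothetical
# _demo_dlog10 below exercises the advertised contract; 10**6 * log10(2)
# is 301029.99..., and the result is correct to within 1.
def _demo_dlog10():
    assert abs(_dlog10(2, 0, 6) - 301030) <= 1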
def _dlog(c, e, p):
"""Given integers c, e and p with c > 0, compute an integer
approximation to 10**p * log(c*10**e), with an absolute error of
at most 1. Assumes that c*10**e is not exactly 1."""
# Increase precision by 2. The precision increase is compensated
# for at the end with a division by 100.
p += 2
# rewrite c*10**e as d*10**f with either f >= 0 and 1 <= d <= 10,
# or f <= 0 and 0.1 <= d <= 1. Then we can compute 10**p * log(c*10**e)
# as 10**p * log(d) + 10**p*f * log(10).
l = len(str(c))
f = e+l - (e+l >= 1)
# compute approximation to 10**p*log(d), with error < 27
if p > 0:
k = e+p-f
if k >= 0:
c *= 10**k
else:
c = _div_nearest(c, 10**-k) # error of <= 0.5 in c
# _ilog magnifies existing error in c by a factor of at most 10
log_d = _ilog(c, 10**p) # error < 5 + 22 = 27
else:
# p <= 0: just approximate the whole thing by 0; error < 2.31
log_d = 0
# compute approximation to f*10**p*log(10), with error < 11.
if f:
extra = len(str(abs(f)))-1
if p + extra >= 0:
# error in f * _log10_digits(p+extra) < |f| * 1 = |f|
# after division, error < |f|/10**extra + 0.5 < 10 + 0.5 < 11
f_log_ten = _div_nearest(f*_log10_digits(p+extra), 10**extra)
else:
f_log_ten = 0
else:
f_log_ten = 0
# error in sum < 11+27 = 38; error after division < 0.38 + 0.5 < 1
return _div_nearest(f_log_ten + log_d, 100)
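# Illustrative sketch, not part of the original module: the hypothetical
# _demo_dlog below does the same check for the natural log; 10**6 * log(2)
# is 693147.18..., and the result is correct to within 1.
def _demo_dlog():
    assert abs(_dlog(2, 0, 6) - 693147) <= 1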
class _Log10Memoize(object):
"""Class to compute, store, and allow retrieval of, digits of the
constant log(10) = 2.302585.... This constant is needed by
Decimal.ln, Decimal.log10, Decimal.exp and Decimal.__pow__."""
def __init__(self):
self.digits = "23025850929940456840179914546843642076011014886"
def getdigits(self, p):
"""Given an integer p >= 0, return floor(10**p)*log(10).
For example, self.getdigits(3) returns 2302.
"""
# digits are stored as a string, for quick conversion to
# integer in the case that we've already computed enough
# digits; the stored digits should always be correct
# (truncated, not rounded to nearest).
if p < 0:
raise ValueError("p should be nonnegative")
if p >= len(self.digits):
# compute p+3, p+6, p+9, ... digits; continue until at
# least one of the extra digits is nonzero
extra = 3
while True:
# compute p+extra digits, correct to within 1ulp
M = 10**(p+extra+2)
digits = str(_div_nearest(_ilog(10*M, M), 100))
if digits[-extra:] != '0'*extra:
break
extra += 3
# keep all reliable digits so far; remove trailing zeros
# and next nonzero digit
self.digits = digits.rstrip('0')[:-1]
return int(self.digits[:p+1])
_log10_digits = _Log10Memoize().getdigits
def _iexp(x, M, L=8):
"""Given integers x and M, M > 0, such that x/M is small in absolute
value, compute an integer approximation to M*exp(x/M). For 0 <=
x/M <= 2.4, the absolute error in the result is bounded by 60 (and
is usually much smaller)."""
# Algorithm: to compute exp(z) for a real number z, first divide z
# by a suitable power R of 2 so that |z/2**R| < 2**-L. Then
# compute expm1(z/2**R) = exp(z/2**R) - 1 using the usual Taylor
# series
#
# expm1(x) = x + x**2/2! + x**3/3! + ...
#
# Now use the identity
#
# expm1(2x) = expm1(x)*(expm1(x)+2)
#
# R times to compute the sequence expm1(z/2**R),
# expm1(z/2**(R-1)), ... , exp(z/2), exp(z).
# Find R such that x/2**R/M <= 2**-L
R = _nbits((long(x)<<L)//M)
# Taylor series. (2**L)**T > M
T = -int(-10*len(str(M))//(3*L))
y = _div_nearest(x, T)
Mshift = long(M)<<R
for i in xrange(T-1, 0, -1):
y = _div_nearest(x*(Mshift + y), Mshift * i)
# Expansion
for k in xrange(R-1, -1, -1):
Mshift = long(M)<<(k+2)
y = _div_nearest(y*(y+Mshift), Mshift)
return M+y
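# Illustrative sketch, not part of the original module: the hypothetical
# _demo_iexp below checks the scaled exponential against its documented
# bound; M*exp(1) with M = 10**6 is 2718281.82...
def _demo_iexp():
    assert abs(_iexp(10**6, 10**6) - 2718282) <= 60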
def _dexp(c, e, p):
"""Compute an approximation to exp(c*10**e), with p decimal places of
precision.
Returns integers d, f such that:
10**(p-1) <= d <= 10**p, and
(d-1)*10**f < exp(c*10**e) < (d+1)*10**f
In other words, d*10**f is an approximation to exp(c*10**e) with p
digits of precision, and with an error in d of at most 1. This is
almost, but not quite, the same as the error being < 1ulp: when d
= 10**(p-1) the error could be up to 10 ulp."""
# we'll call iexp with M = 10**(p+2), giving p+3 digits of precision
p += 2
# compute log(10) with extra precision = adjusted exponent of c*10**e
extra = max(0, e + len(str(c)) - 1)
q = p + extra
# compute quotient c*10**e/(log(10)) = c*10**(e+q)/(log(10)*10**q),
# rounding down
shift = e+q
if shift >= 0:
cshift = c*10**shift
else:
cshift = c//10**-shift
quot, rem = divmod(cshift, _log10_digits(q))
# reduce remainder back to original precision
rem = _div_nearest(rem, 10**extra)
# error in result of _iexp < 120; error after division < 0.62
return _div_nearest(_iexp(rem, 10**p), 1000), quot - p + 3
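# Illustrative sketch, not part of the original module: the hypothetical
# _demo_dexp below unpacks the (d, f) contract; exp(1) = 2.718281828...,
# so ten digits of precision give d near 2718281828 with f = -9.
def _demo_dexp():
    d, f = _dexp(1, 0, 10)
    assert f == -9
    assert abs(d - 2718281828) <= 1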
def _dpower(xc, xe, yc, ye, p):
"""Given integers xc, xe, yc and ye representing Decimals x = xc*10**xe and
y = yc*10**ye, compute x**y. Returns a pair of integers (c, e) such that:
10**(p-1) <= c <= 10**p, and
(c-1)*10**e < x**y < (c+1)*10**e
in other words, c*10**e is an approximation to x**y with p digits
of precision, and with an error in c of at most 1. (This is
almost, but not quite, the same as the error being < 1ulp: when c
== 10**(p-1) we can only guarantee error < 10ulp.)
We assume that: x is positive and not equal to 1, and y is nonzero.
"""
# Find b such that 10**(b-1) <= |y| <= 10**b
b = len(str(abs(yc))) + ye
# log(x) = lxc*10**(-p-b-1), to p+b+1 places after the decimal point
lxc = _dlog(xc, xe, p+b+1)
# compute product y*log(x) = yc*lxc*10**(-p-b-1+ye) = pc*10**(-p-1)
shift = ye-b
if shift >= 0:
pc = lxc*yc*10**shift
else:
pc = _div_nearest(lxc*yc, 10**-shift)
if pc == 0:
# we prefer a result that isn't exactly 1; this makes it
# easier to compute a correctly rounded result in __pow__
if ((len(str(xc)) + xe >= 1) == (yc > 0)): # if x**y > 1:
coeff, exp = 10**(p-1)+1, 1-p
else:
coeff, exp = 10**p-1, -p
else:
coeff, exp = _dexp(pc, -(p+1), p+1)
coeff = _div_nearest(coeff, 10)
exp += 1
return coeff, exp
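# Illustrative sketch, not part of the original module: the hypothetical
# _demo_dpower below evaluates 2**10 through the (c, e) contract; with
# p = 4 digits the admissible answer is c = 1024 (to within 1), e = 0.
def _demo_dpower():
    c, e = _dpower(2, 0, 1, 1, 4)   # x = 2*10**0, y = 1*10**1
    assert e == 0
    assert abs(c - 1024) <= 1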
def _log10_lb(c, correction = {
'1': 100, '2': 70, '3': 53, '4': 40, '5': 31,
'6': 23, '7': 16, '8': 10, '9': 5}):
"""Compute a lower bound for 100*log10(c) for a positive integer c."""
if c <= 0:
raise ValueError("The argument to _log10_lb should be nonnegative.")
str_c = str(c)
return 100*len(str_c) - correction[str_c[0]]
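# Illustrative sketch, not part of the original module: the hypothetical
# _demo_log10_lb below shows the bound; 100*log10(2) = 30.10..., and the
# table value never overshoots.
def _demo_log10_lb():
    assert _log10_lb(2) == 30     # 100*1 - correction['2'] (70)
    assert _log10_lb(999) == 295  # 100*3 - correction['9'] (5), below 299.9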
##### Helper Functions ####################################################
def _convert_other(other, raiseit=False, allow_float=False):
"""Convert other to Decimal.
Verifies that it's ok to use in an implicit construction.
If allow_float is true, allow conversion from float; this
is used in the comparison methods (__eq__ and friends).
"""
if isinstance(other, Decimal):
return other
if isinstance(other, (int, long)):
return Decimal(other)
if allow_float and isinstance(other, float):
return Decimal.from_float(other)
import sys
if sys.platform == 'cli':
import System
if isinstance(other, System.Decimal):
return Decimal(other)
if raiseit:
raise TypeError("Unable to convert %s to Decimal" % other)
return NotImplemented
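# Illustrative sketch, not part of the original module: the hypothetical
# _demo_convert_other below shows the three conversion outcomes.
def _demo_convert_other():
    assert _convert_other(5) == Decimal(5)                        # ints: ok
    assert _convert_other(0.25, allow_float=True) == Decimal('0.25')
    assert _convert_other('5') is NotImplemented                  # no raiseit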
##### Setup Specific Contexts ############################################
# The default context prototype used by Context()
# Is mutable, so that new contexts can have different default values
DefaultContext = Context(
prec=28, rounding=ROUND_HALF_EVEN,
traps=[DivisionByZero, Overflow, InvalidOperation],
flags=[],
Emax=999999999,
Emin=-999999999,
capitals=1
)
# Pre-made alternate contexts offered by the specification
# Don't change these; the user should be able to select these
# contexts and be able to reproduce results from other implementations
# of the spec.
BasicContext = Context(
prec=9, rounding=ROUND_HALF_UP,
traps=[DivisionByZero, Overflow, InvalidOperation, Clamped, Underflow],
flags=[],
)
ExtendedContext = Context(
prec=9, rounding=ROUND_HALF_EVEN,
traps=[],
flags=[],
)
##### crud for parsing strings #############################################
#
# Regular expression used for parsing numeric strings. Additional
# comments:
#
# 1. Uncomment the two '\s*' lines to allow leading and/or trailing
# whitespace. But note that the specification disallows whitespace in
# a numeric string.
#
# 2. For finite numbers (not infinities and NaNs) the body of the
# number between the optional sign and the optional exponent must have
# at least one decimal digit, possibly after the decimal point. The
# lookahead expression '(?=\d|\.\d)' checks this.
import re
_parser = re.compile(r""" # A numeric string consists of:
# \s*
(?P<sign>[-+])? # an optional sign, followed by either...
(
(?=\d|\.\d) # ...a number (with at least one digit)
(?P<int>\d*) # having a (possibly empty) integer part
(\.(?P<frac>\d*))? # followed by an optional fractional part
(E(?P<exp>[-+]?\d+))? # followed by an optional exponent, or...
|
Inf(inity)? # ...an infinity, or...
|
(?P<signal>s)? # ...an (optionally signaling)
NaN # NaN
(?P<diag>\d*) # with (possibly empty) diagnostic info.
)
# \s*
\Z
""", re.VERBOSE | re.IGNORECASE | re.UNICODE).match
_all_zeros = re.compile('0*$').match
_exact_half = re.compile('50*$').match
##### PEP3101 support functions ##############################################
# The functions in this section have little to do with the Decimal
# class, and could potentially be reused or adapted for other pure
# Python numeric classes that want to implement __format__
#
# A format specifier for Decimal looks like:
#
# [[fill]align][sign][0][minimumwidth][,][.precision][type]
_parse_format_specifier_regex = re.compile(r"""\A
(?:
(?P<fill>.)?
(?P<align>[<>=^])
)?
(?P<sign>[-+ ])?
(?P<zeropad>0)?
(?P<minimumwidth>(?!0)\d+)?
(?P<thousands_sep>,)?
(?:\.(?P<precision>0|(?!0)\d+))?
(?P<type>[eEfFgGn%])?
\Z
""", re.VERBOSE)
del re
# The locale module is only needed for the 'n' format specifier. The
# rest of the PEP 3101 code functions quite happily without it, so we
# don't care too much if locale isn't present.
try:
import locale as _locale
except ImportError:
pass
def _parse_format_specifier(format_spec, _localeconv=None):
"""Parse and validate a format specifier.
Turns a standard numeric format specifier into a dict, with the
following entries:
fill: fill character to pad field to minimum width
align: alignment type, either '<', '>', '=' or '^'
sign: either '+', '-' or ' '
minimumwidth: nonnegative integer giving minimum width
zeropad: boolean, indicating whether to pad with zeros
thousands_sep: string to use as thousands separator, or ''
grouping: grouping for thousands separators, in format
used by localeconv
decimal_point: string to use for decimal point
precision: nonnegative integer giving precision, or None
type: one of the characters 'eEfFgG%', or None
unicode: boolean (always True for Python 3.x)
"""
m = _parse_format_specifier_regex.match(format_spec)
if m is None:
raise ValueError("Invalid format specifier: " + format_spec)
# get the dictionary
format_dict = m.groupdict()
# zeropad; defaults for fill and alignment. If zero padding
# is requested, the fill and align fields should be absent.
fill = format_dict['fill']
align = format_dict['align']
format_dict['zeropad'] = (format_dict['zeropad'] is not None)
if format_dict['zeropad']:
if fill is not None:
raise ValueError("Fill character conflicts with '0'"
" in format specifier: " + format_spec)
if align is not None:
raise ValueError("Alignment conflicts with '0' in "
"format specifier: " + format_spec)
format_dict['fill'] = fill or ' '
# PEP 3101 originally specified that the default alignment should
# be left; it was later agreed that right-aligned makes more sense
# for numeric types. See http://bugs.python.org/issue6857.
format_dict['align'] = align or '>'
# default sign handling: '-' for negative, '' for positive
if format_dict['sign'] is None:
format_dict['sign'] = '-'
# minimumwidth defaults to 0; precision remains None if not given
format_dict['minimumwidth'] = int(format_dict['minimumwidth'] or '0')
if format_dict['precision'] is not None:
format_dict['precision'] = int(format_dict['precision'])
# if format type is 'g' or 'G' then a precision of 0 makes little
# sense; convert it to 1. Same if format type is unspecified.
if format_dict['precision'] == 0:
if format_dict['type'] is None or format_dict['type'] in 'gG':
format_dict['precision'] = 1
# determine thousands separator, grouping, and decimal separator, and
# add appropriate entries to format_dict
if format_dict['type'] == 'n':
# apart from separators, 'n' behaves just like 'g'
format_dict['type'] = 'g'
if _localeconv is None:
_localeconv = _locale.localeconv()
if format_dict['thousands_sep'] is not None:
raise ValueError("Explicit thousands separator conflicts with "
"'n' type in format specifier: " + format_spec)
format_dict['thousands_sep'] = _localeconv['thousands_sep']
format_dict['grouping'] = _localeconv['grouping']
format_dict['decimal_point'] = _localeconv['decimal_point']
else:
if format_dict['thousands_sep'] is None:
format_dict['thousands_sep'] = ''
format_dict['grouping'] = [3, 0]
format_dict['decimal_point'] = '.'
# record whether return type should be str or unicode
try:
format_dict['unicode'] = isinstance(format_spec, unicode)
except NameError:
format_dict['unicode'] = False
return format_dict
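# Illustrative sketch, not part of the original module: the hypothetical
# _demo_parse_format_specifier below shows a spec being split into its
# alignment, width, precision and type fields.
def _demo_parse_format_specifier():
    spec = _parse_format_specifier('>10.2f')
    assert spec['align'] == '>'
    assert spec['minimumwidth'] == 10
    assert spec['precision'] == 2
    assert spec['type'] == 'f'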
def _format_align(sign, body, spec):
"""Given an unpadded, non-aligned numeric string 'body' and sign
string 'sign', add padding and alignment conforming to the given
format specifier dictionary 'spec' (as produced by
parse_format_specifier).
Also converts result to unicode if necessary.
"""
# how much extra space do we have to play with?
minimumwidth = spec['minimumwidth']
fill = spec['fill']
padding = fill*(minimumwidth - len(sign) - len(body))
align = spec['align']
if align == '<':
result = sign + body + padding
elif align == '>':
result = padding + sign + body
elif align == '=':
result = sign + padding + body
elif align == '^':
half = len(padding)//2
result = padding[:half] + sign + body + padding[half:]
else:
raise ValueError('Unrecognised alignment field')
# make sure that result is unicode if necessary
if spec['unicode']:
result = unicode(result)
return result
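# Illustrative sketch, not part of the original module: the hypothetical
# _demo_format_align below shows '=' alignment placing the padding
# between the sign and the digits, as zero padding requires.
def _demo_format_align():
    spec = {'minimumwidth': 8, 'fill': '0', 'align': '=', 'unicode': False}
    assert _format_align('-', '1.23', spec) == '-0001.23'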
def _group_lengths(grouping):
"""Convert a localeconv-style grouping into a (possibly infinite)
iterable of integers representing group lengths.
"""
# The result from localeconv()['grouping'], and the input to this
# function, should be a list of integers in one of the
# following three forms:
#
# (1) an empty list, or
# (2) a nonempty list of positive integers + [0], or
# (3) a list of positive integers + [locale.CHAR_MAX]
from itertools import chain, repeat
if not grouping:
return []
elif grouping[-1] == 0 and len(grouping) >= 2:
return chain(grouping[:-1], repeat(grouping[-2]))
elif grouping[-1] == _locale.CHAR_MAX:
return grouping[:-1]
else:
raise ValueError('unrecognised format for grouping')
def _insert_thousands_sep(digits, spec, min_width=1):
"""Insert thousands separators into a digit string.
spec is a dictionary whose keys should include 'thousands_sep' and
'grouping'; typically it's the result of parsing the format
specifier using _parse_format_specifier.
The min_width keyword argument gives the minimum length of the
result, which will be padded on the left with zeros if necessary.
If necessary, the zero padding adds an extra '0' on the left to
avoid a leading thousands separator. For example, inserting
commas every three digits in '123456', with min_width=8, gives
'0,123,456', even though that has length 9.
"""
sep = spec['thousands_sep']
grouping = spec['grouping']
groups = []
for l in _group_lengths(grouping):
if l <= 0:
raise ValueError("group length should be positive")
# max(..., 1) forces at least 1 digit to the left of a separator
l = min(max(len(digits), min_width, 1), l)
groups.append('0'*(l - len(digits)) + digits[-l:])
digits = digits[:-l]
min_width -= l
if not digits and min_width <= 0:
break
min_width -= len(sep)
else:
l = max(len(digits), min_width, 1)
groups.append('0'*(l - len(digits)) + digits[-l:])
return sep.join(reversed(groups))
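# Illustrative sketch, not part of the original module: the hypothetical
# _demo_insert_thousands_sep below reproduces the docstring example with
# a minimal spec dict (only the two keys the function reads).
def _demo_insert_thousands_sep():
    spec = {'thousands_sep': ',', 'grouping': [3, 0]}
    assert _insert_thousands_sep('123456', spec) == '123,456'
    assert _insert_thousands_sep('123456', spec, min_width=8) == '0,123,456'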
def _format_sign(is_negative, spec):
"""Determine sign character."""
if is_negative:
return '-'
elif spec['sign'] in ' +':
return spec['sign']
else:
return ''
def _format_number(is_negative, intpart, fracpart, exp, spec):
"""Format a number, given the following data:
is_negative: true if the number is negative, else false
intpart: string of digits that must appear before the decimal point
fracpart: string of digits that must come after the point
exp: exponent, as an integer
spec: dictionary resulting from parsing the format specifier
This function uses the information in spec to:
insert separators (decimal separator and thousands separators)
format the sign
format the exponent
add trailing '%' for the '%' type
zero-pad if necessary
fill and align if necessary
"""
sign = _format_sign(is_negative, spec)
if fracpart:
fracpart = spec['decimal_point'] + fracpart
if exp != 0 or spec['type'] in 'eE':
echar = {'E': 'E', 'e': 'e', 'G': 'E', 'g': 'e'}[spec['type']]
fracpart += "{0}{1:+}".format(echar, exp)
if spec['type'] == '%':
fracpart += '%'
if spec['zeropad']:
min_width = spec['minimumwidth'] - len(fracpart) - len(sign)
else:
min_width = 0
intpart = _insert_thousands_sep(intpart, spec, min_width)
return _format_align(sign, intpart+fracpart, spec)
##### Useful Constants (internal use only) ################################
# Reusable defaults
_Infinity = Decimal('Inf')
_NegativeInfinity = Decimal('-Inf')
_NaN = Decimal('NaN')
_Zero = Decimal(0)
_One = Decimal(1)
_NegativeOne = Decimal(-1)
# _SignedInfinity[sign] is infinity w/ that sign
_SignedInfinity = (_Infinity, _NegativeInfinity)
if __name__ == '__main__':
import doctest, sys
doctest.testmod(sys.modules[__name__])
"""
Module difflib -- helpers for computing deltas between objects.
Function get_close_matches(word, possibilities, n=3, cutoff=0.6):
Use SequenceMatcher to return list of the best "good enough" matches.
Function context_diff(a, b):
For two lists of strings, return a delta in context diff format.
Function ndiff(a, b):
Return a delta: the difference between `a` and `b` (lists of strings).
Function restore(delta, which):
Return one of the two sequences that generated an ndiff delta.
Function unified_diff(a, b):
For two lists of strings, return a delta in unified diff format.
Class SequenceMatcher:
A flexible class for comparing pairs of sequences of any type.
Class Differ:
For producing human-readable deltas from sequences of lines of text.
Class HtmlDiff:
For producing HTML side by side comparison with change highlights.
"""
__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
'Differ','IS_CHARACTER_JUNK', 'IS_LINE_JUNK', 'context_diff',
'unified_diff', 'HtmlDiff', 'Match']
import heapq
from collections import namedtuple as _namedtuple
from functools import reduce
Match = _namedtuple('Match', 'a b size')
def _calculate_ratio(matches, length):
if length:
return 2.0 * matches / length
return 1.0
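# Illustrative sketch, not part of the original module: the hypothetical
# _demo_calculate_ratio below shows the 2.0*M/T formula, including the
# convention that two empty sequences are a perfect match.
def _demo_calculate_ratio():
    assert _calculate_ratio(3, 8) == 0.75   # 3 matches over 8 elements
    assert _calculate_ratio(0, 0) == 1.0    # empty vs. empty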
class SequenceMatcher:
"""
SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable. The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the
hyperbolic name "gestalt pattern matching". The basic idea is to find
the longest contiguous matching subsequence that contains no "junk"
elements (R-O doesn't address junk). The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to people.
SequenceMatcher tries to compute a "human-friendly diff" between two
sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
longest *contiguous* & junk-free matching subsequence. That's what
catches peoples' eyes. The Windows(tm) windiff has another interesting
notion, pairing up elements that appear uniquely in each sequence.
That, and the method here, appear to yield more intuitive difference
reports than does diff. This method appears to be the least vulnerable
to synching up on blocks of "junk lines", though (like blank lines in
ordinary text files, or maybe "<P>" lines in HTML files). That may be
because this is the only method of the 3 that has a *concept* of
"junk" <wink>.
Example, comparing two strings, and considering blanks to be "junk":
>>> s = SequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
>>>
.ratio() returns a float in [0, 1], measuring the "similarity" of the
sequences. As a rule of thumb, a .ratio() value over 0.6 means the
sequences are close matches:
>>> print round(s.ratio(), 3)
0.866
>>>
If you're only interested in where the sequences match,
.get_matching_blocks() is handy:
>>> for block in s.get_matching_blocks():
... print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
Note that the last tuple returned by .get_matching_blocks() is always a
dummy, (len(a), len(b), 0), and this is the only case in which the last
tuple element (number of elements matched) is 0.
If you want to know how to change the first sequence into the second,
use .get_opcodes():
>>> for opcode in s.get_opcodes():
... print "%6s a[%d:%d] b[%d:%d]" % opcode
equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
equal a[8:29] b[17:38]
See the Differ class for a fancy human-friendly file differencer, which
uses SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.
See also function get_close_matches() in this module, which shows how
simple code building on SequenceMatcher can be used to do useful work.
Timing: Basic R-O is cubic time worst case and quadratic time expected
case. SequenceMatcher is quadratic time for the worst case and has
expected-case behavior dependent in a complicated way on how many
elements the sequences have in common; best case time is linear.
Methods:
__init__(isjunk=None, a='', b='')
Construct a SequenceMatcher.
set_seqs(a, b)
Set the two sequences to be compared.
set_seq1(a)
Set the first sequence to be compared.
set_seq2(b)
Set the second sequence to be compared.
find_longest_match(alo, ahi, blo, bhi)
Find longest matching block in a[alo:ahi] and b[blo:bhi].
get_matching_blocks()
Return list of triples describing matching subsequences.
get_opcodes()
Return list of 5-tuples describing how to turn a into b.
ratio()
Return a measure of the sequences' similarity (float in [0,1]).
quick_ratio()
Return an upper bound on .ratio() relatively quickly.
real_quick_ratio()
Return an upper bound on ratio() very quickly.
"""
def __init__(self, isjunk=None, a='', b='', autojunk=True):
"""Construct a SequenceMatcher.
Optional arg isjunk is None (the default), or a one-argument
function that takes a sequence element and returns true iff the
element is junk. None is equivalent to passing "lambda x: 0", i.e.
no elements are considered to be junk. For example, pass
lambda x: x in " \\t"
if you're comparing lines as sequences of characters, and don't
want to synch up on blanks or hard tabs.
Optional arg a is the first of two sequences to be compared. By
default, an empty string. The elements of a must be hashable. See
also .set_seqs() and .set_seq1().
Optional arg b is the second of two sequences to be compared. By
default, an empty string. The elements of b must be hashable. See
also .set_seqs() and .set_seq2().
Optional arg autojunk should be set to False to disable the
"automatic junk heuristic" that treats popular elements as junk
(see module documentation for more information).
"""
# Members:
# a
# first sequence
# b
# second sequence; differences are computed as "what do
# we need to do to 'a' to change it into 'b'?"
# b2j
# for x in b, b2j[x] is a list of the indices (into b)
# at which x appears; junk elements do not appear
# fullbcount
# for x in b, fullbcount[x] == the number of times x
# appears in b; only materialized if really needed (used
# only for computing quick_ratio())
# matching_blocks
# a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
# ascending & non-overlapping in i and in j; terminated by
# a dummy (len(a), len(b), 0) sentinel
# opcodes
# a list of (tag, i1, i2, j1, j2) tuples, where tag is
# one of
# 'replace' a[i1:i2] should be replaced by b[j1:j2]
# 'delete' a[i1:i2] should be deleted
# 'insert' b[j1:j2] should be inserted
# 'equal' a[i1:i2] == b[j1:j2]
# isjunk
# a user-supplied function taking a sequence element and
# returning true iff the element is "junk" -- this has
# subtle but helpful effects on the algorithm, which I'll
# get around to writing up someday <0.9 wink>.
# DON'T USE! Only __chain_b uses this. Use isbjunk.
# isbjunk
# for x in b, isbjunk(x) == isjunk(x) but much faster;
# it's really the __contains__ method of a hidden dict.
# DOES NOT WORK for x in a!
# isbpopular
# for x in b, isbpopular(x) is true iff b is reasonably long
# (at least 200 elements) and x accounts for more than 1 + 1% of
# its elements (when autojunk is enabled).
# DOES NOT WORK for x in a!
self.isjunk = isjunk
self.a = self.b = None
self.autojunk = autojunk
self.set_seqs(a, b)
def set_seqs(self, a, b):
"""Set the two sequences to be compared.
>>> s = SequenceMatcher()
>>> s.set_seqs("abcd", "bcde")
>>> s.ratio()
0.75
"""
self.set_seq1(a)
self.set_seq2(b)
def set_seq1(self, a):
"""Set the first sequence to be compared.
The second sequence to be compared is not changed.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq1("bcde")
>>> s.ratio()
1.0
>>>
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.
See also set_seqs() and set_seq2().
"""
if a is self.a:
return
self.a = a
self.matching_blocks = self.opcodes = None
def set_seq2(self, b):
"""Set the second sequence to be compared.
The first sequence to be compared is not changed.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq2("abcd")
>>> s.ratio()
1.0
>>>
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.
See also set_seqs() and set_seq1().
"""
if b is self.b:
return
self.b = b
self.matching_blocks = self.opcodes = None
self.fullbcount = None
self.__chain_b()
# For each element x in b, set b2j[x] to a list of the indices in
# b where x appears; the indices are in increasing order; note that
# the number of times x appears in b is len(b2j[x]) ...
# when self.isjunk is defined, junk elements don't show up in this
# map at all, which stops the central find_longest_match method
# from starting any matching block at a junk element ...
# also creates the fast isbjunk function ...
# b2j also does not contain entries for "popular" elements, meaning
# elements that account for more than 1 + 1% of the total elements, and
# when the sequence is reasonably large (>= 200 elements); this can
# be viewed as an adaptive notion of semi-junk, and yields an enormous
# speedup when, e.g., comparing program files with hundreds of
# instances of "return NULL;" ...
# note that this is only called when b changes; so for cross-product
# kinds of matches, it's best to call set_seq2 once, then set_seq1
# repeatedly
def __chain_b(self):
# Because isjunk is a user-defined (not C) function, and we test
# for junk a LOT, it's important to minimize the number of calls.
# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.
# The first trick is to build b2j ignoring the possibility
# of junk. I.e., we don't call isjunk at all yet. Throwing
# out the junk later is much cheaper than building b2j "right"
# from the start.
b = self.b
self.b2j = b2j = {}
for i, elt in enumerate(b):
indices = b2j.setdefault(elt, [])
indices.append(i)
# Purge junk elements
junk = set()
isjunk = self.isjunk
if isjunk:
for elt in list(b2j.keys()): # using list() since b2j is modified
if isjunk(elt):
junk.add(elt)
del b2j[elt]
# Purge popular elements that are not junk
popular = set()
n = len(b)
if self.autojunk and n >= 200:
ntest = n // 100 + 1
for elt, idxs in list(b2j.items()):
if len(idxs) > ntest:
popular.add(elt)
del b2j[elt]
# Now for x in b, isjunk(x) == x in junk, but the latter is much faster.
# Since the number of *unique* junk elements is probably small, the
# memory burden of keeping this set alive is likely trivial compared to
# the size of b2j.
self.isbjunk = junk.__contains__
self.isbpopular = popular.__contains__
def find_longest_match(self, alo, ahi, blo, bhi):
"""Find longest matching block in a[alo:ahi] and b[blo:bhi].
If isjunk is not defined:
Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
alo <= i <= i+k <= ahi
blo <= j <= j+k <= bhi
and for all (i',j',k') meeting those conditions,
k >= k'
i <= i'
and if i == i', j <= j'
In other words, of all maximal matching blocks, return one that
starts earliest in a, and of all those maximal matching blocks that
start earliest in a, return the one that starts earliest in b.
>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
If isjunk is defined, first the longest matching block is
determined as above, but with the additional restriction that no
junk element appears in the block. Then that block is extended as
far as possible by matching (only) junk elements on both sides. So
the resulting block never matches on junk except as identical junk
happens to be adjacent to an "interesting" match.
Here's the same example as before, but considering blanks to be
junk. That prevents " abcd" from matching the " abcd" at the tail
end of the second sequence directly. Instead only the "abcd" can
match, and matches the leftmost "abcd" in the second sequence:
>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)
If no blocks match, return (alo, blo, 0).
>>> s = SequenceMatcher(None, "ab", "c")
>>> s.find_longest_match(0, 2, 0, 1)
Match(a=0, b=0, size=0)
"""
# CAUTION: stripping common prefix or suffix would be incorrect.
# E.g.,
# ab
# acab
# Longest matching block is "ab", but if common prefix is
# stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
# strip, so ends up claiming that ab is changed to acab by
# inserting "ca" in the middle. That's minimal but unintuitive:
# "it's obvious" that someone inserted "ac" at the front.
# Windiff ends up at the same place as diff, but by pairing up
# the unique 'b's and then matching the first two 'a's.
a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
besti, bestj, bestsize = alo, blo, 0
# find longest junk-free match
# during an iteration of the loop, j2len[j] = length of longest
# junk-free match ending with a[i-1] and b[j]
j2len = {}
nothing = []
for i in xrange(alo, ahi):
# look at all instances of a[i] in b; note that because
# b2j has no junk keys, the loop is skipped if a[i] is junk
j2lenget = j2len.get
newj2len = {}
for j in b2j.get(a[i], nothing):
# a[i] matches b[j]
if j < blo:
continue
if j >= bhi:
break
k = newj2len[j] = j2lenget(j-1, 0) + 1
if k > bestsize:
besti, bestj, bestsize = i-k+1, j-k+1, k
j2len = newj2len
# Extend the best by non-junk elements on each end. In particular,
# "popular" non-junk elements aren't in b2j, which greatly speeds
# the inner loop above, but also means "the best" match so far
# doesn't contain any junk *or* popular non-junk elements.
while besti > alo and bestj > blo and \
not isbjunk(b[bestj-1]) and \
a[besti-1] == b[bestj-1]:
besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
while besti+bestsize < ahi and bestj+bestsize < bhi and \
not isbjunk(b[bestj+bestsize]) and \
a[besti+bestsize] == b[bestj+bestsize]:
bestsize += 1
# Now that we have a wholly interesting match (albeit possibly
# empty!), we may as well suck up the matching junk on each
# side of it too. Can't think of a good reason not to, and it
# saves post-processing the (possibly considerable) expense of
# figuring out what to do with it. In the case of an empty
# interesting match, this is clearly the right thing to do,
# because no other kind of match is possible in the regions.
while besti > alo and bestj > blo and \
isbjunk(b[bestj-1]) and \
a[besti-1] == b[bestj-1]:
besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
while besti+bestsize < ahi and bestj+bestsize < bhi and \
isbjunk(b[bestj+bestsize]) and \
a[besti+bestsize] == b[bestj+bestsize]:
bestsize = bestsize + 1
return Match(besti, bestj, bestsize)
def get_matching_blocks(self):
"""Return list of triples describing matching subsequences.
Each triple is of the form (i, j, n), and means that
a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in
i and in j. New in Python 2.5, it's also guaranteed that if
(i, j, n) and (i', j', n') are adjacent triples in the list, and
the second is not the last triple in the list, then i+n != i' or
j+n != j'. IOW, adjacent triples never describe adjacent equal
blocks.
The last triple is a dummy, (len(a), len(b), 0), and is the only
triple with n==0.
>>> s = SequenceMatcher(None, "abxcd", "abcd")
>>> s.get_matching_blocks()
[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
"""
if self.matching_blocks is not None:
return self.matching_blocks
la, lb = len(self.a), len(self.b)
# This is most naturally expressed as a recursive algorithm, but
# at least one user bumped into extreme use cases that exceeded
# the recursion limit on their box. So, now we maintain a list
# (`queue`) of blocks we still need to look at, and append partial
# results to `matching_blocks` in a loop; the matches are sorted
# at the end.
queue = [(0, la, 0, lb)]
matching_blocks = []
while queue:
alo, ahi, blo, bhi = queue.pop()
i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
# a[alo:i] vs b[blo:j] unknown
# a[i:i+k] same as b[j:j+k]
# a[i+k:ahi] vs b[j+k:bhi] unknown
if k: # if k is 0, there was no matching block
matching_blocks.append(x)
if alo < i and blo < j:
queue.append((alo, i, blo, j))
if i+k < ahi and j+k < bhi:
queue.append((i+k, ahi, j+k, bhi))
matching_blocks.sort()
# It's possible that we have adjacent equal blocks in the
# matching_blocks list now. Starting with 2.5, this code was added
# to collapse them.
i1 = j1 = k1 = 0
non_adjacent = []
for i2, j2, k2 in matching_blocks:
# Is this block adjacent to i1, j1, k1?
if i1 + k1 == i2 and j1 + k1 == j2:
# Yes, so collapse them -- this just increases the length of
# the first block by the length of the second, and the first
# block so lengthened remains the block to compare against.
k1 += k2
else:
# Not adjacent. Remember the first block (k1==0 means it's
# the dummy we started with), and make the second block the
# new block to compare against.
if k1:
non_adjacent.append((i1, j1, k1))
i1, j1, k1 = i2, j2, k2
if k1:
non_adjacent.append((i1, j1, k1))
non_adjacent.append( (la, lb, 0) )
self.matching_blocks = map(Match._make, non_adjacent)
return self.matching_blocks
def get_opcodes(self):
"""Return list of 5-tuples describing how to turn a into b.
Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple
has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
tuple preceding it, and likewise for j1 == the previous j2.
The tags are strings, with these meanings:
'replace': a[i1:i2] should be replaced by b[j1:j2]
'delete': a[i1:i2] should be deleted.
Note that j1==j2 in this case.
'insert': b[j1:j2] should be inserted at a[i1:i1].
Note that i1==i2 in this case.
'equal': a[i1:i2] == b[j1:j2]
>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
delete a[0:1] (q) b[0:0] ()
equal a[1:3] (ab) b[0:2] (ab)
replace a[3:4] (x) b[2:3] (y)
equal a[4:6] (cd) b[3:5] (cd)
insert a[6:6] () b[5:6] (f)
"""
if self.opcodes is not None:
return self.opcodes
i = j = 0
self.opcodes = answer = []
for ai, bj, size in self.get_matching_blocks():
# invariant: we've pumped out correct diffs to change
# a[:i] into b[:j], and the next matching block is
# a[ai:ai+size] == b[bj:bj+size]. So we need to pump
# out a diff to change a[i:ai] into b[j:bj], pump out
# the matching block, and move (i,j) beyond the match
tag = ''
if i < ai and j < bj:
tag = 'replace'
elif i < ai:
tag = 'delete'
elif j < bj:
tag = 'insert'
if tag:
answer.append( (tag, i, ai, j, bj) )
i, j = ai+size, bj+size
# the list of matching blocks is terminated by a
# sentinel with size 0
if size:
answer.append( ('equal', ai, i, bj, j) )
return answer
def get_grouped_opcodes(self, n=3):
""" Isolate change clusters by eliminating ranges with no changes.
Return a generator of groups with up to n lines of context.
Each group is in the same format as returned by get_opcodes().
>>> from pprint import pprint
>>> a = map(str, range(1,40))
>>> b = a[:]
>>> b[8:8] = ['i'] # Make an insertion
>>> b[20] += 'x' # Make a replacement
>>> b[23:28] = [] # Make a deletion
>>> b[30] += 'y' # Make another replacement
>>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes()))
[[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)],
[('equal', 16, 19, 17, 20),
('replace', 19, 20, 20, 21),
('equal', 20, 22, 21, 23),
('delete', 22, 27, 23, 23),
('equal', 27, 30, 23, 26)],
[('equal', 31, 34, 27, 30),
('replace', 34, 35, 30, 31),
('equal', 35, 38, 31, 34)]]
"""
codes = self.get_opcodes()
if not codes:
codes = [("equal", 0, 1, 0, 1)]
# Fixup leading and trailing groups if they show no changes.
if codes[0][0] == 'equal':
tag, i1, i2, j1, j2 = codes[0]
codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
if codes[-1][0] == 'equal':
tag, i1, i2, j1, j2 = codes[-1]
codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
nn = n + n
group = []
for tag, i1, i2, j1, j2 in codes:
# End the current group and start a new one whenever
# there is a large range with no changes.
if tag == 'equal' and i2-i1 > nn:
group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n)))
yield group
group = []
i1, j1 = max(i1, i2-n), max(j1, j2-n)
group.append((tag, i1, i2, j1, j2))
if group and not (len(group)==1 and group[0][0] == 'equal'):
yield group
def ratio(self):
"""Return a measure of the sequences' similarity (float in [0,1]).
Where T is the total number of elements in both sequences, and
M is the number of matches, this is 2.0*M / T.
Note that this is 1 if the sequences are identical, and 0 if
they have nothing in common.
.ratio() is expensive to compute if you haven't already computed
.get_matching_blocks() or .get_opcodes(), in which case you may
want to try .quick_ratio() or .real_quick_ratio() first to get an
upper bound.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0
"""
matches = reduce(lambda sum, triple: sum + triple[-1],
self.get_matching_blocks(), 0)
return _calculate_ratio(matches, len(self.a) + len(self.b))
def quick_ratio(self):
"""Return an upper bound on ratio() relatively quickly.
This isn't defined beyond that it is an upper bound on .ratio(), and
is faster to compute.
"""
# viewing a and b as multisets, set matches to the cardinality
# of their intersection; this counts the number of matches
# without regard to order, so is clearly an upper bound
if self.fullbcount is None:
self.fullbcount = fullbcount = {}
for elt in self.b:
fullbcount[elt] = fullbcount.get(elt, 0) + 1
fullbcount = self.fullbcount
# avail[x] is the number of times x appears in 'b' less the
# number of times we've seen it in 'a' so far ... kinda
avail = {}
availhas, matches = avail.__contains__, 0
for elt in self.a:
if availhas(elt):
numb = avail[elt]
else:
numb = fullbcount.get(elt, 0)
avail[elt] = numb - 1
if numb > 0:
matches = matches + 1
return _calculate_ratio(matches, len(self.a) + len(self.b))
def real_quick_ratio(self):
"""Return an upper bound on ratio() very quickly.
This isn't defined beyond that it is an upper bound on .ratio(), and
is faster to compute than either .ratio() or .quick_ratio().
"""
la, lb = len(self.a), len(self.b)
# can't have more matches than the number of elements in the
# shorter sequence
return _calculate_ratio(min(la, lb), la + lb)
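# Illustrative sketch, not part of the original module: the hypothetical
# _demo_autojunk below shows the "popular elements" heuristic described
# in __chain_b: in a sequence of 200+ elements, anything occupying more
# than about 1% of b is dropped from b2j unless autojunk is disabled.
def _demo_autojunk():
    b = 'x' * 200 + 'y'
    s = SequenceMatcher(None, 'x', b)            # autojunk on by default
    assert 'x' not in s.b2j and s.isbpopular('x')
    s = SequenceMatcher(None, 'x', b, autojunk=False)
    assert 'x' in s.b2j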
def get_close_matches(word, possibilities, n=3, cutoff=0.6):
"""Use SequenceMatcher to return list of the best "good enough" matches.
word is a sequence for which close matches are desired (typically a
string).
possibilities is a list of sequences against which to match word
(typically a list of strings).
Optional arg n (default 3) is the maximum number of close matches to
return. n must be > 0.
Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
that don't score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned
in a list, sorted by similarity score, most similar first.
>>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
['apple', 'ape']
>>> import keyword as _keyword
>>> get_close_matches("wheel", _keyword.kwlist)
['while']
>>> get_close_matches("apple", _keyword.kwlist)
[]
>>> get_close_matches("accept", _keyword.kwlist)
['except']
"""
if not n > 0:
raise ValueError("n must be > 0: %r" % (n,))
if not 0.0 <= cutoff <= 1.0:
raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
result = []
s = SequenceMatcher()
s.set_seq2(word)
for x in possibilities:
s.set_seq1(x)
if s.real_quick_ratio() >= cutoff and \
s.quick_ratio() >= cutoff and \
s.ratio() >= cutoff:
result.append((s.ratio(), x))
# Move the best scorers to head of list
result = heapq.nlargest(n, result)
# Strip scores for the best n matches
return [x for score, x in result]
def _count_leading(line, ch):
"""
Return number of `ch` characters at the start of `line`.
Example:
>>> _count_leading(' abc', ' ')
3
"""
i, n = 0, len(line)
while i < n and line[i] == ch:
i += 1
return i
class Differ:
r"""
Differ is a class for comparing sequences of lines of text, and
producing human-readable differences or deltas. Differ uses
SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.
Each line of a Differ delta begins with a two-letter code:
'- ' line unique to sequence 1
'+ ' line unique to sequence 2
' ' line common to both sequences
'? ' line not present in either input sequence
Lines beginning with '? ' attempt to guide the eye to intraline
differences, and were not present in either input sequence. These lines
can be confusing if the sequences contain tab characters.
Note that Differ makes no claim to produce a *minimal* diff. To the
contrary, minimal diffs are often counter-intuitive, because they synch
up anywhere possible, sometimes accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
Example: Comparing two texts.
First we set up the texts, sequences of individual single-line strings
ending with newlines (such sequences can also be obtained from the
`readlines()` method of file-like objects):
>>> text1 = ''' 1. Beautiful is better than ugly.
... 2. Explicit is better than implicit.
... 3. Simple is better than complex.
... 4. Complex is better than complicated.
... '''.splitlines(1)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = ''' 1. Beautiful is better than ugly.
... 3. Simple is better than complex.
... 4. Complicated is better than complex.
... 5. Flat is better than nested.
... '''.splitlines(1)
Next we instantiate a Differ object:
>>> d = Differ()
Note that when instantiating a Differ object we may pass functions to
filter out line and character 'junk'. See Differ.__init__ for details.
Finally, we compare the two:
>>> result = list(d.compare(text1, text2))
'result' is a list of strings, so let's pretty-print it:
>>> from pprint import pprint as _pprint
>>> _pprint(result)
[' 1. Beautiful is better than ugly.\n',
'- 2. Explicit is better than implicit.\n',
'- 3. Simple is better than complex.\n',
'+ 3. Simple is better than complex.\n',
'? ++\n',
'- 4. Complex is better than complicated.\n',
'? ^ ---- ^\n',
'+ 4. Complicated is better than complex.\n',
'? ++++ ^ ^\n',
'+ 5. Flat is better than nested.\n']
As a single multi-line string it looks like this:
>>> print ''.join(result),
1. Beautiful is better than ugly.
- 2. Explicit is better than implicit.
- 3. Simple is better than complex.
+ 3. Simple is better than complex.
? ++
- 4. Complex is better than complicated.
? ^ ---- ^
+ 4. Complicated is better than complex.
? ++++ ^ ^
+ 5. Flat is better than nested.
Methods:
__init__(linejunk=None, charjunk=None)
Construct a text differencer, with optional filters.
compare(a, b)
Compare two sequences of lines; generate the resulting delta.
"""
def __init__(self, linejunk=None, charjunk=None):
"""
Construct a text differencer, with optional filters.
The two optional keyword parameters are for filter functions:
- `linejunk`: A function that should accept a single string argument,
and return true iff the string is junk. The module-level function
`IS_LINE_JUNK` may be used to filter out lines without visible
characters, except for at most one splat ('#'). It is recommended
to leave linejunk None; as of Python 2.3, the underlying
SequenceMatcher class has grown an adaptive notion of "noise" lines
that's better than any static definition the author has ever been
able to craft.
- `charjunk`: A function that should accept a string of length 1. The
module-level function `IS_CHARACTER_JUNK` may be used to filter out
whitespace characters (a blank or tab; **note**: bad idea to include
newline in this!). Use of IS_CHARACTER_JUNK is recommended.
"""
self.linejunk = linejunk
self.charjunk = charjunk
def compare(self, a, b):
r"""
Compare two sequences of lines; generate the resulting delta.
Each sequence must contain individual single-line strings ending with
newlines. Such sequences can be obtained from the `readlines()` method
of file-like objects. The delta generated also consists of newline-
terminated strings, ready to be printed as-is via the writelines()
method of a file-like object.
Example:
>>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))),
- one
? ^
+ ore
? ^
- two
- three
? -
+ tree
+ emu
"""
cruncher = SequenceMatcher(self.linejunk, a, b)
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if tag == 'replace':
g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
elif tag == 'delete':
g = self._dump('-', a, alo, ahi)
elif tag == 'insert':
g = self._dump('+', b, blo, bhi)
elif tag == 'equal':
g = self._dump(' ', a, alo, ahi)
else:
raise ValueError, 'unknown tag %r' % (tag,)
for line in g:
yield line
def _dump(self, tag, x, lo, hi):
"""Generate comparison results for a same-tagged range."""
for i in xrange(lo, hi):
yield '%s %s' % (tag, x[i])
def _plain_replace(self, a, alo, ahi, b, blo, bhi):
assert alo < ahi and blo < bhi
# dump the shorter block first -- reduces the burden on short-term
# memory if the blocks are of very different sizes
if bhi - blo < ahi - alo:
first = self._dump('+', b, blo, bhi)
second = self._dump('-', a, alo, ahi)
else:
first = self._dump('-', a, alo, ahi)
second = self._dump('+', b, blo, bhi)
for g in first, second:
for line in g:
yield line
def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
r"""
When replacing one block of lines with another, search the blocks
for *similar* lines; the best-matching pair (if any) is used as a
synch point, and intraline difference marking is done on the
similar pair. Lots of work, but often worth it.
Example:
>>> d = Differ()
>>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
... ['abcdefGhijkl\n'], 0, 1)
>>> print ''.join(results),
- abcDefghiJkl
? ^ ^ ^
+ abcdefGhijkl
? ^ ^ ^
"""
# don't synch up unless the lines have a similarity score of at
# least cutoff; best_ratio tracks the best score seen so far
best_ratio, cutoff = 0.74, 0.75
cruncher = SequenceMatcher(self.charjunk)
eqi, eqj = None, None # 1st indices of equal lines (if any)
# search for the pair that matches best without being identical
# (identical lines must be junk lines, & we don't want to synch up
# on junk -- unless we have to)
for j in xrange(blo, bhi):
bj = b[j]
cruncher.set_seq2(bj)
for i in xrange(alo, ahi):
ai = a[i]
if ai == bj:
if eqi is None:
eqi, eqj = i, j
continue
cruncher.set_seq1(ai)
# computing similarity is expensive, so use the quick
# upper bounds first -- have seen this speed up messy
# compares by a factor of 3.
# note that ratio() is only expensive to compute the first
# time it's called on a sequence pair; the expensive part
# of the computation is cached by cruncher
if cruncher.real_quick_ratio() > best_ratio and \
cruncher.quick_ratio() > best_ratio and \
cruncher.ratio() > best_ratio:
best_ratio, best_i, best_j = cruncher.ratio(), i, j
if best_ratio < cutoff:
# no non-identical "pretty close" pair
if eqi is None:
# no identical pair either -- treat it as a straight replace
for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
yield line
return
# no close pair, but an identical pair -- synch up on that
best_i, best_j, best_ratio = eqi, eqj, 1.0
else:
# there's a close pair, so forget the identical pair (if any)
eqi = None
# a[best_i] very similar to b[best_j]; eqi is None iff they're not
# identical
# pump out diffs from before the synch point
for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
yield line
# do intraline marking on the synch pair
aelt, belt = a[best_i], b[best_j]
if eqi is None:
# pump out a '-', '?', '+', '?' quad for the synched lines
atags = btags = ""
cruncher.set_seqs(aelt, belt)
for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
la, lb = ai2 - ai1, bj2 - bj1
if tag == 'replace':
atags += '^' * la
btags += '^' * lb
elif tag == 'delete':
atags += '-' * la
elif tag == 'insert':
btags += '+' * lb
elif tag == 'equal':
atags += ' ' * la
btags += ' ' * lb
else:
raise ValueError, 'unknown tag %r' % (tag,)
for line in self._qformat(aelt, belt, atags, btags):
yield line
else:
# the synch pair is identical
yield ' ' + aelt
# pump out diffs from after the synch point
for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
yield line
def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
g = []
if alo < ahi:
if blo < bhi:
g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
else:
g = self._dump('-', a, alo, ahi)
elif blo < bhi:
g = self._dump('+', b, blo, bhi)
for line in g:
yield line
def _qformat(self, aline, bline, atags, btags):
r"""
Format "?" output and deal with leading tabs.
Example:
>>> d = Differ()
>>> results = d._qformat('\tabcDefghiJkl\n', '\tabcdefGhijkl\n',
...                      '  ^ ^  ^      ', '  ^ ^  ^      ')
>>> for line in results: print repr(line)
...
'- \tabcDefghiJkl\n'
'? \t ^ ^  ^\n'
'+ \tabcdefGhijkl\n'
'? \t ^ ^  ^\n'
"""
# Can hurt, but will probably help most of the time.
common = min(_count_leading(aline, "\t"),
_count_leading(bline, "\t"))
common = min(common, _count_leading(atags[:common], " "))
common = min(common, _count_leading(btags[:common], " "))
atags = atags[common:].rstrip()
btags = btags[common:].rstrip()
yield "- " + aline
if atags:
yield "? %s%s\n" % ("\t" * common, atags)
yield "+ " + bline
if btags:
yield "? %s%s\n" % ("\t" * common, btags)
# With respect to junk, an earlier version of ndiff simply refused to
# *start* a match with a junk element. The result was cases like this:
# before: private Thread currentThread;
# after: private volatile Thread currentThread;
# If you consider whitespace to be junk, the longest contiguous match
# not starting with junk is "e Thread currentThread". So ndiff reported
# that "e volatil" was inserted between the 't' and the 'e' in "private".
# While an accurate view, to people that's absurd. The current version
# looks for matching blocks that are entirely junk-free, then extends the
# longest one of those as far as possible but only with matching junk.
# So now "currentThread" is matched, then extended to suck up the
# preceding blank; then "private" is matched, and extended to suck up the
# following blank; then "Thread" is matched; and finally ndiff reports
# that "volatile " was inserted before "Thread". The only quibble
# remaining is that perhaps it was really the case that " volatile"
# was inserted after "private". I can live with that <wink>.
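# Illustrative sketch (not part of the original module): the junk handling
# described above, in action. `_junk_demo` is a hypothetical helper; it uses
# ndiff() and IS_CHARACTER_JUNK, both defined later in this module.
def _junk_demo():
    before = ['private Thread currentThread;\n']
    after = ['private volatile Thread currentThread;\n']
    # With the current strategy, "volatile " is reported as one clean
    # insertion rather than the absurd "e volatil" split described above.
    for line in ndiff(before, after, charjunk=IS_CHARACTER_JUNK):
        print line,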
import re
def IS_LINE_JUNK(line, pat=re.compile(r"\s*(?:#\s*)?$").match):
r"""
Return 1 for ignorable line: iff `line` is blank or contains a single '#'.
Examples:
>>> IS_LINE_JUNK('\n')
True
>>> IS_LINE_JUNK(' # \n')
True
>>> IS_LINE_JUNK('hello\n')
False
"""
return pat(line) is not None
def IS_CHARACTER_JUNK(ch, ws=" \t"):
r"""
Return 1 for ignorable character: iff `ch` is a space or tab.
Examples:
>>> IS_CHARACTER_JUNK(' ')
True
>>> IS_CHARACTER_JUNK('\t')
True
>>> IS_CHARACTER_JUNK('\n')
False
>>> IS_CHARACTER_JUNK('x')
False
"""
return ch in ws
########################################################################
### Unified Diff
########################################################################
def _format_range_unified(start, stop):
'Convert range to the "ed" format'
# Per the diff spec at http://www.unix.org/single_unix_specification/
beginning = start + 1 # lines start numbering with one
length = stop - start
if length == 1:
return '{}'.format(beginning)
if not length:
beginning -= 1 # empty ranges begin at line just before the range
return '{},{}'.format(beginning, length)
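# A minimal sketch (hypothetical helper) of the range strings produced by
# _format_range_unified(): one-line ranges print as a bare line number,
# empty ranges as the line just before the range.
def _range_unified_demo():
    print _format_range_unified(0, 1)   # '1'   -- a single line
    print _format_range_unified(0, 4)   # '1,4' -- four lines from line 1
    print _format_range_unified(3, 3)   # '3,0' -- empty range after line 3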
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='',
tofiledate='', n=3, lineterm='\n'):
r"""
Compare two sequences of lines; generate the delta as a unified diff.
Unified diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by 'n' which
defaults to three.
By default, the diff control lines (those with ---, +++, or @@) are
created with a trailing newline. This is helpful so that inputs
created from file.readlines() result in diffs that are suitable for
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm
argument to "" so that the output will be uniformly newline free.
The unidiff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for
'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'.
The modification times are normally expressed in the ISO 8601 format.
Example:
>>> for line in unified_diff('one two three four'.split(),
... 'zero one tree four'.split(), 'Original', 'Current',
... '2005-01-26 23:30:50', '2010-04-02 10:20:52',
... lineterm=''):
... print line # doctest: +NORMALIZE_WHITESPACE
--- Original 2005-01-26 23:30:50
+++ Current 2010-04-02 10:20:52
@@ -1,4 +1,4 @@
+zero
one
-two
-three
+tree
four
"""
started = False
for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
if not started:
started = True
fromdate = '\t{}'.format(fromfiledate) if fromfiledate else ''
todate = '\t{}'.format(tofiledate) if tofiledate else ''
yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
yield '+++ {}{}{}'.format(tofile, todate, lineterm)
first, last = group[0], group[-1]
file1_range = _format_range_unified(first[1], last[2])
file2_range = _format_range_unified(first[3], last[4])
yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
for tag, i1, i2, j1, j2 in group:
if tag == 'equal':
for line in a[i1:i2]:
yield ' ' + line
continue
if tag in ('replace', 'delete'):
for line in a[i1:i2]:
yield '-' + line
if tag in ('replace', 'insert'):
for line in b[j1:j2]:
yield '+' + line
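# Usage sketch for unified_diff() (hypothetical file names; assumes both
# files end with newlines so lineterm can keep its default):
def _unified_diff_demo(path_a='old.txt', path_b='new.txt'):
    import sys
    a = open(path_a).readlines()
    b = open(path_b).readlines()
    for line in unified_diff(a, b, fromfile=path_a, tofile=path_b):
        sys.stdout.write(line)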
########################################################################
### Context Diff
########################################################################
def _format_range_context(start, stop):
'Convert range to the "ed" format'
# Per the diff spec at http://www.unix.org/single_unix_specification/
beginning = start + 1 # lines start numbering with one
length = stop - start
if not length:
beginning -= 1 # empty ranges begin at line just before the range
if length <= 1:
return '{}'.format(beginning)
return '{},{}'.format(beginning, beginning + length - 1)
# See http://www.unix.org/single_unix_specification/
def context_diff(a, b, fromfile='', tofile='',
fromfiledate='', tofiledate='', n=3, lineterm='\n'):
r"""
Compare two sequences of lines; generate the delta as a context diff.
Context diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by 'n' which
defaults to three.
By default, the diff control lines (those with *** or ---) are
created with a trailing newline. This is helpful so that inputs
created from file.readlines() result in diffs that are suitable for
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm
argument to "" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and
modification times. Any or all of these may be specified using
strings for 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'.
The modification times are normally expressed in the ISO 8601 format.
If not specified, the strings default to blanks.
Example:
>>> print ''.join(context_diff('one\ntwo\nthree\nfour\n'.splitlines(1),
... 'zero\none\ntree\nfour\n'.splitlines(1), 'Original', 'Current')),
*** Original
--- Current
***************
*** 1,4 ****
one
! two
! three
four
--- 1,4 ----
+ zero
one
! tree
four
"""
prefix = dict(insert='+ ', delete='- ', replace='! ', equal=' ')
started = False
for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
if not started:
started = True
fromdate = '\t{}'.format(fromfiledate) if fromfiledate else ''
todate = '\t{}'.format(tofiledate) if tofiledate else ''
yield '*** {}{}{}'.format(fromfile, fromdate, lineterm)
yield '--- {}{}{}'.format(tofile, todate, lineterm)
first, last = group[0], group[-1]
yield '***************' + lineterm
file1_range = _format_range_context(first[1], last[2])
yield '*** {} ****{}'.format(file1_range, lineterm)
if any(tag in ('replace', 'delete') for tag, _, _, _, _ in group):
for tag, i1, i2, _, _ in group:
if tag != 'insert':
for line in a[i1:i2]:
yield prefix[tag] + line
file2_range = _format_range_context(first[3], last[4])
yield '--- {} ----{}'.format(file2_range, lineterm)
if any(tag in ('replace', 'insert') for tag, _, _, _, _ in group):
for tag, _, _, j1, j2 in group:
if tag != 'delete':
for line in b[j1:j2]:
yield prefix[tag] + line
def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
r"""
Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
Optional keyword parameters `linejunk` and `charjunk` are for filter
functions (or None):
- linejunk: A function that should accept a single string argument, and
return true iff the string is junk. The default is None, and is
recommended; as of Python 2.3, an adaptive notion of "noise" lines is
used that does a good job on its own.
- charjunk: A function that should accept a string of length 1. The
default is module-level function IS_CHARACTER_JUNK, which filters out
whitespace characters (a blank or tab; note: bad idea to include newline
in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> print ''.join(diff),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
"""
return Differ(linejunk, charjunk).compare(a, b)
def _mdiff(fromlines, tolines, context=None, linejunk=None,
charjunk=IS_CHARACTER_JUNK):
r"""Returns generator yielding marked up from/to side by side differences.
Arguments:
fromlines -- list of text lines to be compared to tolines
tolines -- list of text lines to be compared to fromlines
context -- number of context lines to display on each side of difference,
if None, all from/to text lines will be generated.
linejunk -- passed on to ndiff (see ndiff documentation)
charjunk -- passed on to ndiff (see ndiff documentation)
This function returns an iterator which returns a tuple:
(from line tuple, to line tuple, boolean flag)
from/to line tuple -- (line num, line text)
line num -- integer or None (to indicate a context separation)
line text -- original line text with following markers inserted:
'\0+' -- marks start of added text
'\0-' -- marks start of deleted text
'\0^' -- marks start of changed text
'\1' -- marks end of added/deleted/changed text
boolean flag -- None indicates context separation, True indicates
either "from" or "to" line contains a change, otherwise False.
This function/iterator was originally developed to generate side by side
file difference for making HTML pages (see HtmlDiff class for example
usage).
Note, this function utilizes the ndiff function to generate the side by
side difference markup. Optional ndiff arguments may be passed to this
function and they in turn will be passed to ndiff.
"""
import re
# regular expression for finding intraline change indices
change_re = re.compile('(\++|\-+|\^+)')
# create the difference iterator to generate the differences
diff_lines_iterator = ndiff(fromlines,tolines,linejunk,charjunk)
def _make_line(lines, format_key, side, num_lines=[0,0]):
"""Returns line of text with user's change markup and line formatting.
lines -- list of lines from the ndiff generator to produce a line of
text from. When producing the line of text to return, the
lines used are removed from this list.
format_key -- '+' return first line in list with "add" markup around
the entire line.
'-' return first line in list with "delete" markup around
the entire line.
'?' return first line in list with add/delete/change
intraline markup (indices obtained from second line)
None return first line in list with no markup
side -- index into the num_lines list (0=from,1=to)
num_lines -- from/to current line number. This is NOT intended to be a
passed parameter. It is present as a keyword argument to
maintain memory of the current line numbers between calls
of this function.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
num_lines[side] += 1
# Handle case where no user markup is to be added, just return line of
# text with user's line format to allow for usage of the line number.
if format_key is None:
return (num_lines[side],lines.pop(0)[2:])
# Handle case of intraline changes
if format_key == '?':
text, markers = lines.pop(0), lines.pop(0)
# find intraline changes (store change type and indices in tuples)
sub_info = []
def record_sub_info(match_object,sub_info=sub_info):
sub_info.append([match_object.group(1)[0],match_object.span()])
return match_object.group(1)
change_re.sub(record_sub_info,markers)
# process each tuple inserting our special marks that won't be
# noticed by an xml/html escaper.
for key,(begin,end) in sub_info[::-1]:
text = text[0:begin]+'\0'+key+text[begin:end]+'\1'+text[end:]
text = text[2:]
# Handle case of add/delete entire line
else:
text = lines.pop(0)[2:]
# if line of text is just a newline, insert a space so there is
# something for the user to highlight and see.
if not text:
text = ' '
# insert marks that won't be noticed by an xml/html escaper.
text = '\0' + format_key + text + '\1'
# Return line of text, first allow user's line formatter to do its
# thing (such as adding the line number) then replace the special
# marks with the user's change markup.
return (num_lines[side],text)
def _line_iterator():
"""Yields from/to lines of text with a change indication.
This function is an iterator. It itself pulls lines from a
differencing iterator, processes them and yields them. When it can
it yields both a "from" and a "to" line, otherwise it will yield one
or the other. In addition to yielding the lines of from/to text, a
boolean flag is yielded to indicate if the text line(s) have
differences in them.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
lines = []
num_blanks_pending, num_blanks_to_yield = 0, 0
while True:
# Load up next 4 lines so we can look ahead, create strings which
# are a concatenation of the first character of each of the 4 lines
# so we can do some very readable comparisons.
while len(lines) < 4:
try:
lines.append(diff_lines_iterator.next())
except StopIteration:
lines.append('X')
s = ''.join([line[0] for line in lines])
if s.startswith('X'):
# When no more lines, pump out any remaining blank lines so the
# corresponding add/delete lines get a matching blank line so
# all line pairs get yielded at the next level.
num_blanks_to_yield = num_blanks_pending
elif s.startswith('-?+?'):
# simple intraline change
yield _make_line(lines,'?',0), _make_line(lines,'?',1), True
continue
elif s.startswith('--++'):
# in delete block, add block coming: we do NOT want to get
# caught up on blank lines yet, just process the delete line
num_blanks_pending -= 1
yield _make_line(lines,'-',0), None, True
continue
elif s.startswith(('--?+', '--+', '- ')):
# in delete block and see an intraline change or unchanged line
# coming: yield the delete line and then blanks
from_line,to_line = _make_line(lines,'-',0), None
num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
elif s.startswith('-+?'):
# intraline change
yield _make_line(lines,None,0), _make_line(lines,'?',1), True
continue
elif s.startswith('-?+'):
# intraline change
yield _make_line(lines,'?',0), _make_line(lines,None,1), True
continue
elif s.startswith('-'):
# delete FROM line
num_blanks_pending -= 1
yield _make_line(lines,'-',0), None, True
continue
elif s.startswith('+--'):
# in add block, delete block coming: we do NOT want to get
# caught up on blank lines yet, just process the add line
num_blanks_pending += 1
yield None, _make_line(lines,'+',1), True
continue
elif s.startswith(('+ ', '+-')):
# will be leaving an add block: yield blanks then add line
from_line, to_line = None, _make_line(lines,'+',1)
num_blanks_to_yield,num_blanks_pending = num_blanks_pending+1,0
elif s.startswith('+'):
# inside an add block, yield the add line
num_blanks_pending += 1
yield None, _make_line(lines,'+',1), True
continue
elif s.startswith(' '):
# unchanged text, yield it to both sides
yield _make_line(lines[:],None,0),_make_line(lines,None,1),False
continue
# Catch up on the blank lines so when we yield the next from/to
# pair, they are lined up.
while(num_blanks_to_yield < 0):
num_blanks_to_yield += 1
yield None,('','\n'),True
while(num_blanks_to_yield > 0):
num_blanks_to_yield -= 1
yield ('','\n'),None,True
if s.startswith('X'):
raise StopIteration
else:
yield from_line,to_line,True
def _line_pair_iterator():
"""Yields from/to lines of text with a change indication.
This function is an iterator. It itself pulls lines from the line
iterator. Its difference from that iterator is that this function
always yields a pair of from/to text lines (with the change
indication). If necessary it will collect single from/to lines
until it has a matching from/to pair to yield.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
line_iterator = _line_iterator()
fromlines,tolines=[],[]
while True:
# Collecting lines of text until we have a from/to pair
while (len(fromlines)==0 or len(tolines)==0):
from_line, to_line, found_diff =line_iterator.next()
if from_line is not None:
fromlines.append((from_line,found_diff))
if to_line is not None:
tolines.append((to_line,found_diff))
# Once we have a pair, remove them from the collection and yield it
from_line, fromDiff = fromlines.pop(0)
to_line, to_diff = tolines.pop(0)
yield (from_line,to_line,fromDiff or to_diff)
# Handle case where user does not want context differencing, just yield
# them up without doing anything else with them.
line_pair_iterator = _line_pair_iterator()
if context is None:
while True:
yield line_pair_iterator.next()
# Handle case where user wants context differencing. We must do some
# storage of lines until we know for sure that they are to be yielded.
else:
context += 1
lines_to_write = 0
while True:
# Store lines up until we find a difference, note use of a
# circular queue because we only need to keep around what
# we need for context.
index, contextLines = 0, [None]*(context)
found_diff = False
while(found_diff is False):
from_line, to_line, found_diff = line_pair_iterator.next()
i = index % context
contextLines[i] = (from_line, to_line, found_diff)
index += 1
# Yield lines that we have collected so far, but first yield
# the user's separator.
if index > context:
yield None, None, None
lines_to_write = context
else:
lines_to_write = index
index = 0
while(lines_to_write):
i = index % context
index += 1
yield contextLines[i]
lines_to_write -= 1
# Now yield the context lines after the change
lines_to_write = context-1
while(lines_to_write):
from_line, to_line, found_diff = line_pair_iterator.next()
# If another change within the context, extend the context
if found_diff:
lines_to_write = context-1
else:
lines_to_write -= 1
yield from_line, to_line, found_diff
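# Sketch of consuming _mdiff() directly (it is private; the HtmlDiff class
# below is the supported consumer). Each item is the (from, to, changed)
# triple described in the docstring above; changed text carries the
# '\0'/'\1' markers.
def _mdiff_demo():
    fromlines = ['one\n', 'two\n', 'three\n']
    tolines = ['one\n', 'too\n', 'three\n']
    for from_tuple, to_tuple, changed in _mdiff(fromlines, tolines):
        print repr(from_tuple), repr(to_tuple), changed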
_file_template = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">%(styles)s
</style>
</head>
<body>
%(table)s%(legend)s
</body>
</html>"""
_styles = """
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}"""
_table_template = """
<table class="diff" id="difflib_chg_%(prefix)s_top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
%(header_row)s
<tbody>
%(data_rows)s </tbody>
</table>"""
_legend = """
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>"""
class HtmlDiff(object):
"""For producing HTML side by side comparison with change highlights.
This class can be used to create an HTML table (or a complete HTML file
containing the table) showing a side by side, line by line comparison
of text with inter-line and intra-line change highlights. The table can
be generated in either full or contextual difference mode.
The following methods are provided for HTML generation:
make_table -- generates HTML for a single side by side table
make_file -- generates complete HTML file with a single side by side table
See tools/scripts/diff.py for an example usage of this class.
"""
_file_template = _file_template
_styles = _styles
_table_template = _table_template
_legend = _legend
_default_prefix = 0
def __init__(self,tabsize=8,wrapcolumn=None,linejunk=None,
charjunk=IS_CHARACTER_JUNK):
"""HtmlDiff instance initializer
Arguments:
tabsize -- tab stop spacing, defaults to 8.
wrapcolumn -- column number where lines are broken and wrapped,
defaults to None where lines are not wrapped.
linejunk,charjunk -- keyword arguments passed into ndiff() (used by
HtmlDiff() to generate the side by side HTML differences). See
ndiff() documentation for argument default values and descriptions.
"""
self._tabsize = tabsize
self._wrapcolumn = wrapcolumn
self._linejunk = linejunk
self._charjunk = charjunk
def make_file(self,fromlines,tolines,fromdesc='',todesc='',context=False,
numlines=5):
"""Returns HTML file of side by side comparison with change highlights
Arguments:
fromlines -- list of "from" lines
tolines -- list of "to" lines
fromdesc -- "from" file column header string
todesc -- "to" file column header string
context -- set to True for contextual differences (defaults to False
which shows full differences).
numlines -- number of context lines. When context is set True,
controls number of lines displayed before and after the change.
When context is False, controls the number of lines to place
the "next" link anchors before the next change (so click of
"next" link jumps to just before the change).
"""
return self._file_template % dict(
styles = self._styles,
legend = self._legend,
table = self.make_table(fromlines,tolines,fromdesc,todesc,
context=context,numlines=numlines))
def _tab_newline_replace(self,fromlines,tolines):
"""Returns from/to line lists with tabs expanded and newlines removed.
Instead of tab characters being replaced by the number of spaces
needed to fill in to the next tab stop, this function will fill
the space with tab characters. This is done so that the difference
algorithms can identify changes in a file when tabs are replaced by
spaces and vice versa. At the end of the HTML generation, the tab
characters will be replaced with a nonbreakable space.
"""
def expand_tabs(line):
# hide real spaces
line = line.replace(' ','\0')
# expand tabs into spaces
line = line.expandtabs(self._tabsize)
# replace spaces from expanded tabs back into tab characters
# (we'll replace them with markup after we do differencing)
line = line.replace(' ','\t')
return line.replace('\0',' ').rstrip('\n')
fromlines = [expand_tabs(line) for line in fromlines]
tolines = [expand_tabs(line) for line in tolines]
return fromlines,tolines
def _split_line(self,data_list,line_num,text):
"""Builds list of text lines by splitting text lines at wrap point
This function will determine if the input text line needs to be
wrapped (split) into separate lines. If so, the first wrap point
will be determined and the first line appended to the output
text line list. This function is used recursively to handle
the second part of the split line to further split it.
"""
# if blank line or context separator, just add it to the output list
if not line_num:
data_list.append((line_num,text))
return
# if line text doesn't need wrapping, just add it to the output list
size = len(text)
max = self._wrapcolumn
if (size <= max) or ((size -(text.count('\0')*3)) <= max):
data_list.append((line_num,text))
return
# scan text looking for the wrap point, keeping track if the wrap
# point is inside markers
i = 0
n = 0
mark = ''
while n < max and i < size:
if text[i] == '\0':
i += 1
mark = text[i]
i += 1
elif text[i] == '\1':
i += 1
mark = ''
else:
i += 1
n += 1
# wrap point is inside text, break it up into separate lines
line1 = text[:i]
line2 = text[i:]
# if wrap point is inside markers, place end marker at end of first
# line and start marker at beginning of second line because each
# line will have its own table tag markup around it.
if mark:
line1 = line1 + '\1'
line2 = '\0' + mark + line2
# tack on first line onto the output list
data_list.append((line_num,line1))
# use this routine again to wrap the remaining text
self._split_line(data_list,'>',line2)
def _line_wrapper(self,diffs):
"""Returns iterator that splits (wraps) mdiff text lines"""
# pull from/to data and flags from mdiff iterator
for fromdata,todata,flag in diffs:
# check for context separators and pass them through
if flag is None:
yield fromdata,todata,flag
continue
(fromline,fromtext),(toline,totext) = fromdata,todata
# for each from/to line split it at the wrap column to form
# list of text lines.
fromlist,tolist = [],[]
self._split_line(fromlist,fromline,fromtext)
self._split_line(tolist,toline,totext)
# yield from/to line in pairs inserting blank lines as
# necessary when one side has more wrapped lines
while fromlist or tolist:
if fromlist:
fromdata = fromlist.pop(0)
else:
fromdata = ('',' ')
if tolist:
todata = tolist.pop(0)
else:
todata = ('',' ')
yield fromdata,todata,flag
def _collect_lines(self,diffs):
"""Collects mdiff output into separate lists
Before storing the mdiff from/to data into a list, it is converted
into a single line of text with HTML markup.
"""
fromlist,tolist,flaglist = [],[],[]
# pull from/to data and flags from mdiff style iterator
for fromdata,todata,flag in diffs:
try:
# store HTML markup of the lines into the lists
fromlist.append(self._format_line(0,flag,*fromdata))
tolist.append(self._format_line(1,flag,*todata))
except TypeError:
# exceptions occur for lines where context separators go
fromlist.append(None)
tolist.append(None)
flaglist.append(flag)
return fromlist,tolist,flaglist
def _format_line(self,side,flag,linenum,text):
"""Returns HTML markup of "from" / "to" text lines
side -- 0 or 1 indicating "from" or "to" text
flag -- indicates if difference on line
linenum -- line number (used for line number column)
text -- line text to be marked up
"""
try:
linenum = '%d' % linenum
id = ' id="%s%s"' % (self._prefix[side],linenum)
except TypeError:
# handle blank lines where linenum is '>' or ''
id = ''
# replace those things that would get confused with HTML symbols
text=text.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
# make spaces non-breakable so they don't get compressed or line wrapped
text = text.replace(' ','&nbsp;').rstrip()
return '<td class="diff_header"%s>%s</td><td nowrap="nowrap">%s</td>' \
% (id,linenum,text)
def _make_prefix(self):
"""Create unique anchor prefixes"""
# Generate a unique anchor prefix so multiple tables
# can exist on the same HTML page without conflicts.
fromprefix = "from%d_" % HtmlDiff._default_prefix
toprefix = "to%d_" % HtmlDiff._default_prefix
HtmlDiff._default_prefix += 1
# store prefixes so line format method has access
self._prefix = [fromprefix,toprefix]
def _convert_flags(self,fromlist,tolist,flaglist,context,numlines):
"""Makes list of "next" links"""
# all anchor names will be generated using the unique "to" prefix
toprefix = self._prefix[1]
# process change flags, generating middle column of next anchors/links
next_id = ['']*len(flaglist)
next_href = ['']*len(flaglist)
num_chg, in_change = 0, False
last = 0
for i,flag in enumerate(flaglist):
if flag:
if not in_change:
in_change = True
last = i
# at the beginning of a change, drop an anchor a few lines
# (the context lines) before the change for the previous
# link
i = max([0,i-numlines])
next_id[i] = ' id="difflib_chg_%s_%d"' % (toprefix,num_chg)
# at the beginning of a change, drop a link to the next
# change
num_chg += 1
next_href[last] = '<a href="#difflib_chg_%s_%d">n</a>' % (
toprefix,num_chg)
else:
in_change = False
# check for cases where there is no content to avoid exceptions
if not flaglist:
flaglist = [False]
next_id = ['']
next_href = ['']
last = 0
if context:
fromlist = ['<td></td><td> No Differences Found </td>']
tolist = fromlist
else:
fromlist = tolist = ['<td></td><td> Empty File </td>']
# if not a change on first line, drop a link
if not flaglist[0]:
next_href[0] = '<a href="#difflib_chg_%s_0">f</a>' % toprefix
# redo the last link to link to the top
next_href[last] = '<a href="#difflib_chg_%s_top">t</a>' % (toprefix)
return fromlist,tolist,flaglist,next_href,next_id
def make_table(self,fromlines,tolines,fromdesc='',todesc='',context=False,
numlines=5):
"""Returns HTML table of side by side comparison with change highlights
Arguments:
fromlines -- list of "from" lines
tolines -- list of "to" lines
fromdesc -- "from" file column header string
todesc -- "to" file column header string
context -- set to True for contextual differences (defaults to False
which shows full differences).
numlines -- number of context lines. When context is set True,
controls number of lines displayed before and after the change.
When context is False, controls the number of lines to place
the "next" link anchors before the next change (so click of
"next" link jumps to just before the change).
"""
# make unique anchor prefixes so that multiple tables may exist
# on the same page without conflict.
self._make_prefix()
# change tabs to spaces before it gets more difficult after we insert
# markup
fromlines,tolines = self._tab_newline_replace(fromlines,tolines)
# create diffs iterator which generates side by side from/to data
if context:
context_lines = numlines
else:
context_lines = None
diffs = _mdiff(fromlines,tolines,context_lines,linejunk=self._linejunk,
charjunk=self._charjunk)
# set up iterator to wrap lines that exceed desired width
if self._wrapcolumn:
diffs = self._line_wrapper(diffs)
# collect up from/to lines and flags into lists (also format the lines)
fromlist,tolist,flaglist = self._collect_lines(diffs)
# process change flags, generating middle column of next anchors/links
fromlist,tolist,flaglist,next_href,next_id = self._convert_flags(
fromlist,tolist,flaglist,context,numlines)
s = []
fmt = ' <tr><td class="diff_next"%s>%s</td>%s' + \
'<td class="diff_next">%s</td>%s</tr>\n'
for i in range(len(flaglist)):
if flaglist[i] is None:
# mdiff yields None on separator lines; skip the bogus ones
# generated for the first line
if i > 0:
s.append(' </tbody> \n <tbody>\n')
else:
s.append( fmt % (next_id[i],next_href[i],fromlist[i],
next_href[i],tolist[i]))
if fromdesc or todesc:
header_row = '<thead><tr>%s%s%s%s</tr></thead>' % (
'<th class="diff_next"><br /></th>',
'<th colspan="2" class="diff_header">%s</th>' % fromdesc,
'<th class="diff_next"><br /></th>',
'<th colspan="2" class="diff_header">%s</th>' % todesc)
else:
header_row = ''
table = self._table_template % dict(
data_rows=''.join(s),
header_row=header_row,
prefix=self._prefix[1])
return table.replace('\0+','<span class="diff_add">'). \
replace('\0-','<span class="diff_sub">'). \
replace('\0^','<span class="diff_chg">'). \
replace('\1','</span>'). \
replace('\t','&nbsp;')
del re
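# Usage sketch for HtmlDiff (hypothetical inputs and output path): writes a
# complete HTML page containing one side-by-side comparison table.
def _html_diff_demo():
    fromlines = ['one\n', 'two\n', 'three\n']
    tolines = ['one\n', 'too\n', 'three\n']
    page = HtmlDiff(wrapcolumn=70).make_file(fromlines, tolines,
                                             'before', 'after')
    f = open('diff.html', 'w')
    try:
        f.write(page)
    finally:
        f.close()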
def restore(delta, which):
r"""
Generate one of the two sequences that generated a delta.
Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
lines originating from file 1 or 2 (parameter `which`), stripping off line
prefixes.
Examples:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff)
>>> print ''.join(restore(diff, 1)),
one
two
three
>>> print ''.join(restore(diff, 2)),
ore
tree
emu
"""
try:
tag = {1: "- ", 2: "+ "}[int(which)]
except KeyError:
raise ValueError, ('unknown delta choice (must be 1 or 2): %r'
% which)
prefixes = (" ", tag)
for line in delta:
if line[:2] in prefixes:
yield line[2:]
def _test():
import doctest, difflib
return doctest.testmod(difflib)
if __name__ == "__main__":
_test()
"""Read and cache directory listings.
The listdir() routine returns a sorted list of the files in a directory,
using a cache to avoid reading the directory more often than necessary.
The annotate() routine appends slashes to directories."""
from warnings import warnpy3k
warnpy3k("the dircache module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import os
__all__ = ["listdir", "opendir", "annotate", "reset"]
cache = {}
def reset():
"""Reset the cache completely."""
global cache
cache = {}
def listdir(path):
"""List directory contents, using cache."""
try:
cached_mtime, list = cache[path]
del cache[path]
except KeyError:
cached_mtime, list = -1, []
mtime = os.stat(path).st_mtime
if mtime != cached_mtime:
list = os.listdir(path)
list.sort()
cache[path] = mtime, list
return list
opendir = listdir # XXX backward compatibility
def annotate(head, list):
"""Add '/' suffixes to directories."""
for i in range(len(list)):
if os.path.isdir(os.path.join(head, list[i])):
list[i] = list[i] + '/'
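# Usage sketch (hypothetical path): list a directory through the cache and
# mark subdirectories with a trailing '/'. Note listdir() returns the cached
# list itself and annotate() mutates in place, so copy before annotating.
def _dircache_demo(path=os.curdir):
    entries = listdir(path)[:]
    annotate(path, entries)
    for entry in entries:
        print entry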
"""Disassembler of Python byte code into mnemonics."""
import sys
import types
from opcode import *
from opcode import __all__ as _opcodes_all
__all__ = ["dis", "disassemble", "distb", "disco",
"findlinestarts", "findlabels"] + _opcodes_all
del _opcodes_all
_have_code = (types.MethodType, types.FunctionType, types.CodeType,
types.ClassType, type)
def dis(x=None):
"""Disassemble classes, methods, functions, or code.
With no argument, disassemble the last traceback.
"""
if x is None:
distb()
return
if isinstance(x, types.InstanceType):
x = x.__class__
if hasattr(x, 'im_func'):
x = x.im_func
if hasattr(x, 'func_code'):
x = x.func_code
if hasattr(x, '__dict__'):
items = x.__dict__.items()
items.sort()
for name, x1 in items:
if isinstance(x1, _have_code):
print "Disassembly of %s:" % name
try:
dis(x1)
except TypeError, msg:
print "Sorry:", msg
print
elif hasattr(x, 'co_code'):
disassemble(x)
elif isinstance(x, str):
disassemble_string(x)
else:
raise TypeError, \
"don't know how to disassemble %s objects" % \
type(x).__name__
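# Usage sketch: disassemble a small expression. dis() sees the code object's
# co_code attribute and hands it to disassemble() below.
def _dis_demo():
    code = compile('a + b', '<demo>', 'eval')
    dis(code)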
def distb(tb=None):
"""Disassemble a traceback (default: last traceback)."""
if tb is None:
try:
tb = sys.last_traceback
except AttributeError:
raise RuntimeError, "no last traceback to disassemble"
while tb.tb_next: tb = tb.tb_next
disassemble(tb.tb_frame.f_code, tb.tb_lasti)
def disassemble(co, lasti=-1):
"""Disassemble a code object."""
code = co.co_code
labels = findlabels(code)
linestarts = dict(findlinestarts(co))
n = len(code)
i = 0
extended_arg = 0
free = None
while i < n:
c = code[i]
op = ord(c)
if i in linestarts:
if i > 0:
print
print "%3d" % linestarts[i],
else:
print ' ',
if i == lasti: print '-->',
else: print ' ',
if i in labels: print '>>',
else: print ' ',
print repr(i).rjust(4),
print opname[op].ljust(20),
i = i+1
if op >= HAVE_ARGUMENT:
oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
extended_arg = 0
i = i+2
if op == EXTENDED_ARG:
extended_arg = oparg*65536L
print repr(oparg).rjust(5),
if op in hasconst:
print '(' + repr(co.co_consts[oparg]) + ')',
elif op in hasname:
print '(' + co.co_names[oparg] + ')',
elif op in hasjrel:
print '(to ' + repr(i + oparg) + ')',
elif op in haslocal:
print '(' + co.co_varnames[oparg] + ')',
elif op in hascompare:
print '(' + cmp_op[oparg] + ')',
elif op in hasfree:
if free is None:
free = co.co_cellvars + co.co_freevars
print '(' + free[oparg] + ')',
print
def disassemble_string(code, lasti=-1, varnames=None, names=None,
constants=None):
labels = findlabels(code)
n = len(code)
i = 0
while i < n:
c = code[i]
op = ord(c)
if i == lasti: print '-->',
else: print ' ',
if i in labels: print '>>',
else: print ' ',
print repr(i).rjust(4),
print opname[op].ljust(15),
i = i+1
if op >= HAVE_ARGUMENT:
oparg = ord(code[i]) + ord(code[i+1])*256
i = i+2
print repr(oparg).rjust(5),
if op in hasconst:
if constants:
print '(' + repr(constants[oparg]) + ')',
else:
print '(%d)'%oparg,
elif op in hasname:
if names is not None:
print '(' + names[oparg] + ')',
else:
print '(%d)'%oparg,
elif op in hasjrel:
print '(to ' + repr(i + oparg) + ')',
elif op in haslocal:
if varnames:
print '(' + varnames[oparg] + ')',
else:
print '(%d)' % oparg,
elif op in hascompare:
print '(' + cmp_op[oparg] + ')',
print
disco = disassemble # XXX For backwards compatibility
def findlabels(code):
"""Detect all offsets in a byte code which are jump targets.
Return the list of offsets.
"""
labels = []
n = len(code)
i = 0
while i < n:
c = code[i]
op = ord(c)
i = i+1
if op >= HAVE_ARGUMENT:
oparg = ord(code[i]) + ord(code[i+1])*256
i = i+2
label = -1
if op in hasjrel:
label = i+oparg
elif op in hasjabs:
label = oparg
if label >= 0:
if label not in labels:
labels.append(label)
return labels
def findlinestarts(code):
"""Find the offsets in a byte code which are start of lines in the source.
Generate pairs (offset, lineno) as described in Python/compile.c.
"""
byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
line_increments = [ord(c) for c in code.co_lnotab[1::2]]
lastlineno = None
lineno = code.co_firstlineno
addr = 0
for byte_incr, line_incr in zip(byte_increments, line_increments):
if byte_incr:
if lineno != lastlineno:
yield (addr, lineno)
lastlineno = lineno
addr += byte_incr
lineno += line_incr
if lineno != lastlineno:
yield (addr, lineno)
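# Sketch: the (offset, lineno) pairs for a tiny function. Exact offsets
# depend on the bytecode the compiler emits, so none are asserted here.
def _linestarts_demo():
    def sample():
        x = 1
        y = 2
        return x + y
    for offset, lineno in findlinestarts(sample.func_code):
        print offset, lineno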
def _test():
"""Simple test program to disassemble a file."""
if sys.argv[1:]:
if sys.argv[2:]:
sys.stderr.write("usage: python dis.py [-|file]\n")
sys.exit(2)
fn = sys.argv[1]
if not fn or fn == "-":
fn = None
else:
fn = None
if fn is None:
f = sys.stdin
else:
f = open(fn)
source = f.read()
if fn is not None:
f.close()
else:
fn = "<stdin>"
code = compile(source, fn, "exec")
dis(code)
if __name__ == "__main__":
_test()
"""distutils.archive_util
Utility functions for creating archive files (tarballs, zip files,
that sort of thing)."""
__revision__ = "$Id$"
import os
from warnings import warn
import sys
from distutils.errors import DistutilsExecError
from distutils.spawn import spawn
from distutils.dir_util import mkpath
from distutils import log
try:
from pwd import getpwnam
except ImportError:
getpwnam = None
try:
from grp import getgrnam
except ImportError:
getgrnam = None
def _get_gid(name):
"""Returns a gid, given a group name."""
if getgrnam is None or name is None:
return None
try:
result = getgrnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _get_uid(name):
"""Returns an uid, given a user name."""
if getpwnam is None or name is None:
return None
try:
result = getpwnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def make_tarball(base_name, base_dir, compress="gzip", verbose=0, dry_run=0,
owner=None, group=None):
"""Create a (possibly compressed) tar file from all the files under
'base_dir'.
'compress' must be "gzip" (the default), "compress", "bzip2", or None.
(compress will be deprecated in Python 3.2)
'owner' and 'group' can be used to define an owner and a group for the
archive that is being built. If not provided, the current owner and group
will be used.
The output tar file will be named 'base_dir' + ".tar", possibly plus
the appropriate compression extension (".gz", ".bz2" or ".Z").
Returns the output filename.
"""
tar_compression = {'gzip': 'gz', 'bzip2': 'bz2', None: '', 'compress': ''}
compress_ext = {'gzip': '.gz', 'bzip2': '.bz2', 'compress': '.Z'}
# flags for compression program, each element of list will be an argument
if compress is not None and compress not in compress_ext.keys():
raise ValueError, \
("bad value for 'compress': must be None, 'gzip', 'bzip2' "
"or 'compress'")
archive_name = base_name + '.tar'
if compress != 'compress':
archive_name += compress_ext.get(compress, '')
mkpath(os.path.dirname(archive_name), dry_run=dry_run)
# creating the tarball
import tarfile # late import so Python build itself doesn't break
log.info('Creating tar archive')
uid = _get_uid(owner)
gid = _get_gid(group)
def _set_uid_gid(tarinfo):
if gid is not None:
tarinfo.gid = gid
tarinfo.gname = group
if uid is not None:
tarinfo.uid = uid
tarinfo.uname = owner
return tarinfo
if not dry_run:
tar = tarfile.open(archive_name, 'w|%s' % tar_compression[compress])
try:
tar.add(base_dir, filter=_set_uid_gid)
finally:
tar.close()
# compression using `compress`
if compress == 'compress':
warn("'compress' will be deprecated.", PendingDeprecationWarning)
# the option varies depending on the platform
compressed_name = archive_name + compress_ext[compress]
if sys.platform == 'win32':
cmd = [compress, archive_name, compressed_name]
else:
cmd = [compress, '-f', archive_name]
spawn(cmd, dry_run=dry_run)
return compressed_name
return archive_name
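# Usage sketch (hypothetical paths): builds src.tar.gz from everything under
# the 'src' directory and returns the archive name.
def _tarball_demo():
    name = make_tarball('src', 'src', compress='gzip')
    log.info("wrote %s", name)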
def make_zipfile(base_name, base_dir, verbose=0, dry_run=0):
"""Create a zip file from all the files under 'base_dir'.
The output zip file will be named 'base_name' + ".zip". Uses either the
"zipfile" Python module (if available) or the InfoZIP "zip" utility
(if installed and found on the default search path). If neither tool is
available, raises DistutilsExecError. Returns the name of the output zip
file.
"""
try:
import zipfile
except ImportError:
zipfile = None
zip_filename = base_name + ".zip"
mkpath(os.path.dirname(zip_filename), dry_run=dry_run)
# If zipfile module is not available, try spawning an external
# 'zip' command.
if zipfile is None:
if verbose:
zipoptions = "-r"
else:
zipoptions = "-rq"
try:
spawn(["zip", zipoptions, zip_filename, base_dir],
dry_run=dry_run)
except DistutilsExecError:
# XXX really should distinguish between "couldn't find
# external 'zip' command" and "zip failed".
raise DistutilsExecError, \
("unable to create zip file '%s': "
"could neither import the 'zipfile' module nor "
"find a standalone zip utility") % zip_filename
else:
log.info("creating '%s' and adding '%s' to it",
zip_filename, base_dir)
if not dry_run:
zip = zipfile.ZipFile(zip_filename, "w",
compression=zipfile.ZIP_DEFLATED)
if base_dir != os.curdir:
path = os.path.normpath(os.path.join(base_dir, ''))
zip.write(path, path)
log.info("adding '%s'", path)
for dirpath, dirnames, filenames in os.walk(base_dir):
for name in dirnames:
path = os.path.normpath(os.path.join(dirpath, name, ''))
zip.write(path, path)
log.info("adding '%s'", path)
for name in filenames:
path = os.path.normpath(os.path.join(dirpath, name))
if os.path.isfile(path):
zip.write(path, path)
log.info("adding '%s'" % path)
zip.close()
return zip_filename
ARCHIVE_FORMATS = {
'gztar': (make_tarball, [('compress', 'gzip')], "gzip'ed tar-file"),
'bztar': (make_tarball, [('compress', 'bzip2')], "bzip2'ed tar-file"),
'ztar': (make_tarball, [('compress', 'compress')], "compressed tar file"),
'tar': (make_tarball, [('compress', None)], "uncompressed tar file"),
'zip': (make_zipfile, [],"ZIP file")
}
def check_archive_formats(formats):
"""Returns the first format from the 'format' list that is unknown.
If all formats are known, returns None
"""
for format in formats:
if format not in ARCHIVE_FORMATS:
return format
return None
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
dry_run=0, owner=None, group=None):
"""Create an archive file (eg. zip or tar).
'base_name' is the name of the file to create, minus any format-specific
extension; 'format' is the archive format: one of "zip", "tar", "ztar",
or "gztar".
'root_dir' is a directory that will be the root directory of the
archive; ie. we typically chdir into 'root_dir' before creating the
archive. 'base_dir' is the directory where we start archiving from;
ie. 'base_dir' will be the common prefix of all files and
directories in the archive. 'root_dir' and 'base_dir' both default
to the current directory. Returns the name of the archive file.
'owner' and 'group' are used when creating a tar archive. By default,
uses the current owner and group.
"""
save_cwd = os.getcwd()
if root_dir is not None:
log.debug("changing into '%s'", root_dir)
base_name = os.path.abspath(base_name)
if not dry_run:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
kwargs = {'dry_run': dry_run}
try:
format_info = ARCHIVE_FORMATS[format]
except KeyError:
raise ValueError, "unknown archive format '%s'" % format
func = format_info[0]
for arg, val in format_info[1]:
kwargs[arg] = val
if format != 'zip':
kwargs['owner'] = owner
kwargs['group'] = group
try:
filename = func(base_name, base_dir, **kwargs)
finally:
if root_dir is not None:
log.debug("changing back to '%s'", save_cwd)
os.chdir(save_cwd)
return filename
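# Usage sketch (hypothetical paths): make_archive() resolves 'gztar' through
# ARCHIVE_FORMATS, chdirs into root_dir, and archives base_dir beneath it.
def _archive_demo():
    filename = make_archive('myproject-1.0', 'gztar',
                            root_dir='build', base_dir='myproject-1.0')
    log.info("created %s", filename)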
"""distutils.bcppcompiler
Contains BorlandCCompiler, an implementation of the abstract CCompiler class
for the Borland C++ compiler.
"""
# This implementation by Lyle Johnson, based on the original msvccompiler.py
# module and using the directions originally published by Gordon Williams.
# XXX looks like there's a LOT of overlap between these two classes:
# someone should sit down and factor out the common code as
# WindowsCCompiler! --GPW
__revision__ = "$Id$"
import os
from distutils.errors import (DistutilsExecError, CompileError, LibError,
LinkError, UnknownFileError)
from distutils.ccompiler import CCompiler, gen_preprocess_options
from distutils.file_util import write_file
from distutils.dep_util import newer
from distutils import log
class BCPPCompiler(CCompiler):
"""Concrete class that implements an interface to the Borland C/C++
compiler, as defined by the CCompiler abstract class.
"""
compiler_type = 'bcpp'
# Just set this so CCompiler's constructor doesn't barf. We currently
# don't use the 'set_executables()' bureaucracy provided by CCompiler,
# as it really isn't necessary for this sort of single-compiler class.
# Would be nice to have a consistent interface with UnixCCompiler,
# though, so it's worth thinking about.
executables = {}
# Private class data (need to distinguish C from C++ source for compiler)
_c_extensions = ['.c']
_cpp_extensions = ['.cc', '.cpp', '.cxx']
# Needed for the filename generation methods provided by the
# base class, CCompiler.
src_extensions = _c_extensions + _cpp_extensions
obj_extension = '.obj'
static_lib_extension = '.lib'
shared_lib_extension = '.dll'
static_lib_format = shared_lib_format = '%s%s'
exe_extension = '.exe'
def __init__ (self,
verbose=0,
dry_run=0,
force=0):
CCompiler.__init__ (self, verbose, dry_run, force)
# These executables are assumed to all be in the path.
# Borland doesn't seem to use any special registry settings to
# indicate their installation locations.
self.cc = "bcc32.exe"
self.linker = "ilink32.exe"
self.lib = "tlib.exe"
self.preprocess_options = None
self.compile_options = ['/tWM', '/O2', '/q', '/g0']
self.compile_options_debug = ['/tWM', '/Od', '/q', '/g0']
self.ldflags_shared = ['/Tpd', '/Gn', '/q', '/x']
self.ldflags_shared_debug = ['/Tpd', '/Gn', '/q', '/x']
self.ldflags_static = []
self.ldflags_exe = ['/Gn', '/q', '/x']
self.ldflags_exe_debug = ['/Gn', '/q', '/x','/r']
# -- Worker methods ------------------------------------------------
def compile(self, sources,
output_dir=None, macros=None, include_dirs=None, debug=0,
extra_preargs=None, extra_postargs=None, depends=None):
macros, objects, extra_postargs, pp_opts, build = \
self._setup_compile(output_dir, macros, include_dirs, sources,
depends, extra_postargs)
compile_opts = extra_preargs or []
compile_opts.append ('-c')
if debug:
compile_opts.extend (self.compile_options_debug)
else:
compile_opts.extend (self.compile_options)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
# XXX why do the normpath here?
src = os.path.normpath(src)
obj = os.path.normpath(obj)
# XXX _setup_compile() did a mkpath() too but before the normpath.
# Is it possible to skip the normpath?
self.mkpath(os.path.dirname(obj))
if ext == '.res':
# This is already a binary file -- skip it.
continue # the 'for' loop
if ext == '.rc':
# This needs to be compiled to a .res file -- do it now.
try:
self.spawn (["brcc32", "-fo", obj, src])
except DistutilsExecError, msg:
raise CompileError, msg
continue # the 'for' loop
# The next two are both for the real compiler.
if ext in self._c_extensions:
input_opt = ""
elif ext in self._cpp_extensions:
input_opt = "-P"
else:
# Unknown file type -- no extra options. The compiler
# will probably fail, but let it just in case this is a
# file the compiler recognizes even if we don't.
input_opt = ""
output_opt = "-o" + obj
# Compiler command line syntax is: "bcc32 [options] file(s)".
# Note that the source file names must appear at the end of
# the command line.
try:
self.spawn ([self.cc] + compile_opts + pp_opts +
[input_opt, output_opt] +
extra_postargs + [src])
except DistutilsExecError, msg:
raise CompileError, msg
return objects
# compile ()
def create_static_lib (self,
objects,
output_libname,
output_dir=None,
debug=0,
target_lang=None):
(objects, output_dir) = self._fix_object_args (objects, output_dir)
output_filename = \
self.library_filename (output_libname, output_dir=output_dir)
if self._need_link (objects, output_filename):
lib_args = [output_filename, '/u'] + objects
if debug:
pass # XXX what goes here?
try:
self.spawn ([self.lib] + lib_args)
except DistutilsExecError, msg:
raise LibError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
# create_static_lib ()
def link (self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
# XXX this ignores 'build_temp'! should follow the lead of
# msvccompiler.py
(objects, output_dir) = self._fix_object_args (objects, output_dir)
(libraries, library_dirs, runtime_library_dirs) = \
self._fix_lib_args (libraries, library_dirs, runtime_library_dirs)
if runtime_library_dirs:
log.warn("I don't know what to do with 'runtime_library_dirs': %s",
str(runtime_library_dirs))
if output_dir is not None:
output_filename = os.path.join (output_dir, output_filename)
if self._need_link (objects, output_filename):
# Figure out linker args based on type of target.
if target_desc == CCompiler.EXECUTABLE:
startup_obj = 'c0w32'
if debug:
ld_args = self.ldflags_exe_debug[:]
else:
ld_args = self.ldflags_exe[:]
else:
startup_obj = 'c0d32'
if debug:
ld_args = self.ldflags_shared_debug[:]
else:
ld_args = self.ldflags_shared[:]
# Create a temporary exports file for use by the linker
if export_symbols is None:
def_file = ''
else:
head, tail = os.path.split (output_filename)
modname, ext = os.path.splitext (tail)
temp_dir = os.path.dirname(objects[0]) # preserve tree structure
def_file = os.path.join (temp_dir, '%s.def' % modname)
contents = ['EXPORTS']
for sym in (export_symbols or []):
contents.append(' %s=_%s' % (sym, sym))
self.execute(write_file, (def_file, contents),
"writing %s" % def_file)
# Borland C++ has problems with '/' in paths
objects2 = map(os.path.normpath, objects)
# split objects in .obj and .res files
# Borland C++ needs them at different positions in the command line
objects = [startup_obj]
resources = []
for file in objects2:
(base, ext) = os.path.splitext(os.path.normcase(file))
if ext == '.res':
resources.append(file)
else:
objects.append(file)
for l in library_dirs:
ld_args.append("/L%s" % os.path.normpath(l))
ld_args.append("/L.") # we sometimes use relative paths
# list of object files
ld_args.extend(objects)
# XXX the command-line syntax for Borland C++ is a bit wonky;
# certain filenames are jammed together in one big string, but
# comma-delimited. This doesn't mesh too well with the
# Unix-centric attitude (with a DOS/Windows quoting hack) of
# 'spawn()', so constructing the argument list is a bit
# awkward. Note that doing the obvious thing and jamming all
# the filenames and commas into one argument would be wrong,
# because 'spawn()' would quote any filenames with spaces in
# them. Arghghh! Apparently it works fine as coded...
# name of dll/exe file
ld_args.extend([',',output_filename])
# no map file and start libraries
ld_args.append(',,')
for lib in libraries:
# see if we find it and if there is a bcpp specific lib
# (xxx_bcpp.lib)
libfile = self.find_library_file(library_dirs, lib, debug)
if libfile is None:
ld_args.append(lib)
# probably a BCPP internal library -- don't warn
else:
# full name which prefers bcpp_xxx.lib over xxx.lib
ld_args.append(libfile)
# some default libraries
ld_args.append ('import32')
ld_args.append ('cw32mt')
# def file for export symbols
ld_args.extend([',',def_file])
# add resource files
ld_args.append(',')
ld_args.extend(resources)
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath (os.path.dirname (output_filename))
try:
self.spawn ([self.linker] + ld_args)
except DistutilsExecError, msg:
raise LinkError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
# link ()
# -- Miscellaneous methods -----------------------------------------
def find_library_file (self, dirs, lib, debug=0):
# List of effective library names to try, in order of preference:
# xxx_bcpp.lib is better than xxx.lib
# and xxx_d.lib is better than xxx.lib if debug is set
#
# The "_bcpp" suffix is to handle a Python installation for people
# with multiple compilers (primarily Distutils hackers, I suspect
# ;-). The idea is they'd have one static library for each
# compiler they care about, since (almost?) every Windows compiler
# seems to have a different format for static libraries.
if debug:
dlib = (lib + "_d")
try_names = (dlib + "_bcpp", lib + "_bcpp", dlib, lib)
else:
try_names = (lib + "_bcpp", lib)
for dir in dirs:
for name in try_names:
libfile = os.path.join(dir, self.library_filename(name))
if os.path.exists(libfile):
return libfile
else:
# Oops, didn't find it in *any* of 'dirs'
return None
# overwrite the one from CCompiler to support rc and res-files
def object_filenames (self,
source_filenames,
strip_dir=0,
output_dir=''):
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
# use normcase to make sure '.rc' is really '.rc' and not '.RC'
(base, ext) = os.path.splitext (os.path.normcase(src_name))
if ext not in (self.src_extensions + ['.rc','.res']):
raise UnknownFileError, \
"unknown file type '%s' (from '%s')" % \
(ext, src_name)
if strip_dir:
base = os.path.basename (base)
if ext == '.res':
# these can go unchanged
obj_names.append (os.path.join (output_dir, base + ext))
elif ext == '.rc':
# these need to be compiled to .res-files
obj_names.append (os.path.join (output_dir, base + '.res'))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
# object_filenames ()
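# Illustrative sketch (editor's addition): given the override above and
# hypothetical inputs, BCPPCompiler maps sources to objects like so:
#
#     ['foo.c', 'bar.rc', 'baz.res']
#         -> ['foo.obj', 'bar.res', 'baz.res']
#
# .rc files become the .res the resource compiler will emit, .res files
# pass through unchanged, and everything else gets obj_extension.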
def preprocess (self,
source,
output_file=None,
macros=None,
include_dirs=None,
extra_preargs=None,
extra_postargs=None):
(_, macros, include_dirs) = \
self._fix_compile_args(None, macros, include_dirs)
pp_opts = gen_preprocess_options(macros, include_dirs)
pp_args = ['cpp32.exe'] + pp_opts
if output_file is not None:
pp_args.append('-o' + output_file)
if extra_preargs:
pp_args[:0] = extra_preargs
if extra_postargs:
pp_args.extend(extra_postargs)
pp_args.append(source)
# We need to preprocess: either we're being forced to, or the
# source file is newer than the target (or the target doesn't
# exist).
if self.force or output_file is None or newer(source, output_file):
if output_file:
self.mkpath(os.path.dirname(output_file))
try:
self.spawn(pp_args)
except DistutilsExecError, msg:
print msg
raise CompileError, msg
# preprocess()
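# Illustrative sketch (editor's addition, not distutils code): for a
# hypothetical call, preprocess() above assembles a Borland cpp32
# command line like this:
#
#     from distutils.bcppcompiler import BCPPCompiler
#     cc = BCPPCompiler()
#     cc.preprocess('foo.c', output_file='foo.i',
#                   macros=[('DEBUG', '1')], include_dirs=['include'])
#     # spawns: cpp32.exe -DDEBUG=1 -Iinclude -ofoo.i foo.c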
"""distutils.ccompiler
Contains CCompiler, an abstract base class that defines the interface
for the Distutils compiler abstraction model."""
__revision__ = "$Id$"
import sys
import os
import re
from distutils.errors import (CompileError, LinkError, UnknownFileError,
DistutilsPlatformError, DistutilsModuleError)
from distutils.spawn import spawn
from distutils.file_util import move_file
from distutils.dir_util import mkpath
from distutils.dep_util import newer_group
from distutils.util import split_quoted, execute
from distutils import log
# following import is for backward compatibility
from distutils.sysconfig import customize_compiler
class CCompiler:
"""Abstract base class to define the interface that must be implemented
by real compiler classes. Also has some utility methods used by
several compiler classes.
The basic idea behind a compiler abstraction class is that each
instance can be used for all the compile/link steps in building a
single project. Thus, attributes common to all of those compile and
link steps -- include directories, macros to define, libraries to link
against, etc. -- are attributes of the compiler instance. To allow for
variability in how individual files are treated, most of those
attributes may be varied on a per-compilation or per-link basis.
"""
# 'compiler_type' is a class attribute that identifies this class. It
# keeps code that wants to know what kind of compiler it's dealing with
# from having to import all possible compiler classes just to do an
# 'isinstance'. In concrete CCompiler subclasses, 'compiler_type'
# should really, really be one of the keys of the 'compiler_class'
# dictionary (see below -- used by the 'new_compiler()' factory
# function) -- authors of new compiler interface classes are
# responsible for updating 'compiler_class'!
compiler_type = None
# XXX things not handled by this compiler abstraction model:
# * client can't provide additional options for a compiler,
# e.g. warning, optimization, debugging flags. Perhaps this
# should be the domain of concrete compiler abstraction classes
# (UnixCCompiler, MSVCCompiler, etc.) -- or perhaps the base
# class should have methods for the common ones.
# * can't completely override the include or library search
# path, ie. no "cc -I -Idir1 -Idir2" or "cc -L -Ldir1 -Ldir2".
# I'm not sure how widely supported this is even by Unix
# compilers, much less on other platforms. And I'm even less
# sure how useful it is; maybe for cross-compiling, but
# support for that is a ways off. (And anyways, cross
# compilers probably have a dedicated binary with the
# right paths compiled in. I hope.)
# * can't do really freaky things with the library list/library
# dirs, e.g. "-Ldir1 -lfoo -Ldir2 -lfoo" to link against
# different versions of libfoo.a in different locations. I
# think this is useless without the ability to null out the
# library search path anyways.
# Subclasses that rely on the standard filename generation methods
# implemented below should override these; see the comment near
# those methods ('object_filenames()' et. al.) for details:
src_extensions = None # list of strings
obj_extension = None # string
static_lib_extension = None
shared_lib_extension = None # string
static_lib_format = None # format string
shared_lib_format = None # prob. same as static_lib_format
exe_extension = None # string
# Default language settings. language_map is used to detect a source
# file or Extension target language, checking source filenames.
# language_order is used to detect the language precedence, when deciding
# what language to use when mixing source types. For example, if some
# extension has two files with ".c" extension, and one with ".cpp", it
# is still linked as c++.
language_map = {".c" : "c",
".cc" : "c++",
".cpp" : "c++",
".cxx" : "c++",
".m" : "objc",
}
language_order = ["c++", "objc", "c"]
def __init__ (self, verbose=0, dry_run=0, force=0):
self.dry_run = dry_run
self.force = force
self.verbose = verbose
# 'output_dir': a common output directory for object, library,
# shared object, and shared library files
self.output_dir = None
# 'macros': a list of macro definitions (or undefinitions). A
# macro definition is a 2-tuple (name, value), where the value is
# either a string or None (no explicit value). A macro
# undefinition is a 1-tuple (name,).
self.macros = []
# 'include_dirs': a list of directories to search for include files
self.include_dirs = []
# 'libraries': a list of libraries to include in any link
# (library names, not filenames: eg. "foo" not "libfoo.a")
self.libraries = []
# 'library_dirs': a list of directories to search for libraries
self.library_dirs = []
# 'runtime_library_dirs': a list of directories to search for
# shared libraries/objects at runtime
self.runtime_library_dirs = []
# 'objects': a list of object files (or similar, such as explicitly
# named library files) to include on any link
self.objects = []
for key in self.executables.keys():
self.set_executable(key, self.executables[key])
def set_executables(self, **args):
"""Define the executables (and options for them) that will be run
to perform the various stages of compilation. The exact set of
executables that may be specified here depends on the compiler
class (via the 'executables' class attribute), but most will have:
compiler the C/C++ compiler
linker_so linker used to create shared objects and libraries
linker_exe linker used to create binary executables
archiver static library creator
On platforms with a command-line (Unix, DOS/Windows), each of these
is a string that will be split into executable name and (optional)
list of arguments. (Splitting the string is done similarly to how
Unix shells operate: words are delimited by spaces, but quotes and
backslashes can override this. See
'distutils.util.split_quoted()'.)
"""
# Note that some CCompiler implementation classes will define class
# attributes 'cpp', 'cc', etc. with hard-coded executable names;
# this is appropriate when a compiler class is for exactly one
# compiler/OS combination (eg. MSVCCompiler). Other compiler
# classes (UnixCCompiler, in particular) are driven by information
# discovered at run-time, since there are many different ways to do
# basically the same things with Unix C compilers.
for key in args.keys():
if key not in self.executables:
raise ValueError, \
"unknown executable '%s' for class %s" % \
(key, self.__class__.__name__)
self.set_executable(key, args[key])
def set_executable(self, key, value):
if isinstance(value, basestring):
setattr(self, key, split_quoted(value))
else:
setattr(self, key, value)
def _find_macro(self, name):
i = 0
for defn in self.macros:
if defn[0] == name:
return i
i = i + 1
return None
def _check_macro_definitions(self, definitions):
"""Ensures that every element of 'definitions' is a valid macro
definition, ie. either (name,value) 2-tuple or a (name,) tuple. Do
nothing if all definitions are OK, raise TypeError otherwise.
"""
for defn in definitions:
if not (isinstance(defn, tuple) and
(len (defn) == 1 or
(len (defn) == 2 and
(isinstance(defn[1], str) or defn[1] is None))) and
isinstance(defn[0], str)):
raise TypeError, \
("invalid macro definition '%s': " % defn) + \
"must be tuple (string,), (string, string), or " + \
"(string, None)"
# -- Bookkeeping methods -------------------------------------------
def define_macro(self, name, value=None):
"""Define a preprocessor macro for all compilations driven by this
compiler object. The optional parameter 'value' should be a
string; if it is not supplied, then the macro will be defined
without an explicit value and the exact outcome depends on the
compiler used (XXX true? does ANSI say anything about this?)
"""
# Delete from the list of macro definitions/undefinitions if
# already there (so that this one will take precedence).
i = self._find_macro (name)
if i is not None:
del self.macros[i]
defn = (name, value)
self.macros.append (defn)
def undefine_macro(self, name):
"""Undefine a preprocessor macro for all compilations driven by
this compiler object. If the same macro is defined by
'define_macro()' and undefined by 'undefine_macro()' the last call
takes precedence (including multiple redefinitions or
undefinitions). If the macro is redefined/undefined on a
per-compilation basis (ie. in the call to 'compile()'), then that
takes precedence.
"""
# Delete from the list of macro definitions/undefinitions if
# already there (so that this one will take precedence).
i = self._find_macro (name)
if i is not None:
del self.macros[i]
undefn = (name,)
self.macros.append (undefn)
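# Illustrative sketch (editor's addition): the last call wins, so after
#
#     cc.define_macro('NDEBUG')
#     cc.define_macro('VERSION', '1.0')
#     cc.undefine_macro('NDEBUG')
#
# cc.macros == [('VERSION', '1.0'), ('NDEBUG',)] -- the trailing
# 1-tuple is the undefinition that becomes -UNDEBUG on the command line.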
def add_include_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
header files. The compiler is instructed to search directories in
the order in which they are supplied by successive calls to
'add_include_dir()'.
"""
self.include_dirs.append (dir)
def set_include_dirs(self, dirs):
"""Set the list of directories that will be searched to 'dirs' (a
list of strings). Overrides any preceding calls to
'add_include_dir()'; subsequent calls to 'add_include_dir()' add
to the list passed to 'set_include_dirs()'. This does not affect
any list of standard include directories that the compiler may
search by default.
"""
self.include_dirs = dirs[:]
def add_library(self, libname):
"""Add 'libname' to the list of libraries that will be included in
all links driven by this compiler object. Note that 'libname'
should *not* be the name of a file containing a library, but the
name of the library itself: the actual filename will be inferred by
the linker, the compiler, or the compiler class (depending on the
platform).
The linker will be instructed to link against libraries in the
order they were supplied to 'add_library()' and/or
'set_libraries()'. It is perfectly valid to duplicate library
names; the linker will be instructed to link against libraries as
many times as they are mentioned.
"""
self.libraries.append (libname)
def set_libraries(self, libnames):
"""Set the list of libraries to be included in all links driven by
this compiler object to 'libnames' (a list of strings). This does
not affect any standard system libraries that the linker may
include by default.
"""
self.libraries = libnames[:]
def add_library_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
libraries specified to 'add_library()' and 'set_libraries()'. The
linker will be instructed to search for libraries in the order they
are supplied to 'add_library_dir()' and/or 'set_library_dirs()'.
"""
self.library_dirs.append(dir)
def set_library_dirs(self, dirs):
"""Set the list of library search directories to 'dirs' (a list of
strings). This does not affect any standard library search path
that the linker may search by default.
"""
self.library_dirs = dirs[:]
def add_runtime_library_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
shared libraries at runtime.
"""
self.runtime_library_dirs.append(dir)
def set_runtime_library_dirs(self, dirs):
"""Set the list of directories to search for shared libraries at
runtime to 'dirs' (a list of strings). This does not affect any
standard search path that the runtime linker may search by
default.
"""
self.runtime_library_dirs = dirs[:]
def add_link_object(self, object):
"""Add 'object' to the list of object files (or analogues, such as
explicitly named library files or the output of "resource
compilers") to be included in every link driven by this compiler
object.
"""
self.objects.append(object)
def set_link_objects(self, objects):
"""Set the list of object files (or analogues) to be included in
every link to 'objects'. This does not affect any standard object
files that the linker may include by default (such as system
libraries).
"""
self.objects = objects[:]
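# Illustrative sketch (editor's addition): typical bookkeeping on a
# compiler instance before building, with hypothetical paths:
#
#     cc.add_include_dir('include')
#     cc.add_library_dir('/opt/foo/lib')
#     cc.add_library('foo')          # a library *name*, not 'libfoo.a'
#     cc.add_link_object('extra.o')  # folded into every link() call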
# -- Private utility methods --------------------------------------
# (here for the convenience of subclasses)
# Helper method to prep compiler in subclass compile() methods
def _setup_compile(self, outdir, macros, incdirs, sources, depends,
extra):
"""Process arguments and decide which source files to compile."""
if outdir is None:
outdir = self.output_dir
elif not isinstance(outdir, str):
raise TypeError, "'output_dir' must be a string or None"
if macros is None:
macros = self.macros
elif isinstance(macros, list):
macros = macros + (self.macros or [])
else:
raise TypeError, "'macros' (if supplied) must be a list of tuples"
if incdirs is None:
incdirs = self.include_dirs
elif isinstance(incdirs, (list, tuple)):
incdirs = list(incdirs) + (self.include_dirs or [])
else:
raise TypeError, \
"'include_dirs' (if supplied) must be a list of strings"
if extra is None:
extra = []
# Get the list of expected output (object) files
objects = self.object_filenames(sources,
strip_dir=0,
output_dir=outdir)
assert len(objects) == len(sources)
pp_opts = gen_preprocess_options(macros, incdirs)
build = {}
for i in range(len(sources)):
src = sources[i]
obj = objects[i]
ext = os.path.splitext(src)[1]
self.mkpath(os.path.dirname(obj))
build[obj] = (src, ext)
return macros, objects, extra, pp_opts, build
def _get_cc_args(self, pp_opts, debug, before):
# works for unixccompiler, emxccompiler, cygwinccompiler
cc_args = pp_opts + ['-c']
if debug:
cc_args[:0] = ['-g']
if before:
cc_args[:0] = before
return cc_args
def _fix_compile_args(self, output_dir, macros, include_dirs):
"""Typecheck and fix-up some of the arguments to the 'compile()'
method, and return fixed-up values. Specifically: if 'output_dir'
is None, replaces it with 'self.output_dir'; ensures that 'macros'
is a list, and augments it with 'self.macros'; ensures that
'include_dirs' is a list, and augments it with 'self.include_dirs'.
Guarantees that the returned values are of the correct type,
i.e. for 'output_dir' either string or None, and for 'macros' and
'include_dirs' either list or None.
"""
if output_dir is None:
output_dir = self.output_dir
elif not isinstance(output_dir, str):
raise TypeError, "'output_dir' must be a string or None"
if macros is None:
macros = self.macros
elif isinstance(macros, list):
macros = macros + (self.macros or [])
else:
raise TypeError, "'macros' (if supplied) must be a list of tuples"
if include_dirs is None:
include_dirs = self.include_dirs
elif isinstance(include_dirs, (list, tuple)):
include_dirs = list (include_dirs) + (self.include_dirs or [])
else:
raise TypeError, \
"'include_dirs' (if supplied) must be a list of strings"
return output_dir, macros, include_dirs
def _fix_object_args(self, objects, output_dir):
"""Typecheck and fix up some arguments supplied to various methods.
Specifically: ensure that 'objects' is a list; if output_dir is
None, replace with self.output_dir. Return fixed versions of
'objects' and 'output_dir'.
"""
if not isinstance(objects, (list, tuple)):
raise TypeError, \
"'objects' must be a list or tuple of strings"
objects = list (objects)
if output_dir is None:
output_dir = self.output_dir
elif not isinstance(output_dir, str):
raise TypeError, "'output_dir' must be a string or None"
return (objects, output_dir)
def _fix_lib_args(self, libraries, library_dirs, runtime_library_dirs):
"""Typecheck and fix up some of the arguments supplied to the
'link_*' methods. Specifically: ensure that all arguments are
lists, and augment them with their permanent versions
(eg. 'self.libraries' augments 'libraries'). Return a tuple with
fixed versions of all arguments.
"""
if libraries is None:
libraries = self.libraries
elif isinstance(libraries, (list, tuple)):
libraries = list (libraries) + (self.libraries or [])
else:
raise TypeError, \
"'libraries' (if supplied) must be a list of strings"
if library_dirs is None:
library_dirs = self.library_dirs
elif isinstance(library_dirs, (list, tuple)):
library_dirs = list (library_dirs) + (self.library_dirs or [])
else:
raise TypeError, \
"'library_dirs' (if supplied) must be a list of strings"
if runtime_library_dirs is None:
runtime_library_dirs = self.runtime_library_dirs
elif isinstance(runtime_library_dirs, (list, tuple)):
runtime_library_dirs = (list (runtime_library_dirs) +
(self.runtime_library_dirs or []))
else:
raise TypeError, \
"'runtime_library_dirs' (if supplied) " + \
"must be a list of strings"
return (libraries, library_dirs, runtime_library_dirs)
def _need_link(self, objects, output_file):
"""Return true if we need to relink the files listed in 'objects'
to recreate 'output_file'.
"""
if self.force:
return 1
else:
if self.dry_run:
newer = newer_group (objects, output_file, missing='newer')
else:
newer = newer_group (objects, output_file)
return newer
def detect_language(self, sources):
"""Detect the language of a given file, or list of files. Uses
language_map, and language_order to do the job.
"""
if not isinstance(sources, list):
sources = [sources]
lang = None
index = len(self.language_order)
for source in sources:
base, ext = os.path.splitext(source)
extlang = self.language_map.get(ext)
try:
extindex = self.language_order.index(extlang)
if extindex < index:
lang = extlang
index = extindex
except ValueError:
pass
return lang
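# Illustrative sketch (editor's addition): with the default
# language_map/language_order above, mixed sources resolve to the
# highest-precedence language present:
#
#     cc.detect_language(['a.c', 'b.c'])           # -> 'c'
#     cc.detect_language(['a.c', 'b.cpp', 'c.m'])  # -> 'c++'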
# -- Worker methods ------------------------------------------------
# (must be implemented by subclasses)
def preprocess(self, source, output_file=None, macros=None,
include_dirs=None, extra_preargs=None, extra_postargs=None):
"""Preprocess a single C/C++ source file, named in 'source'.
Output will be written to file named 'output_file', or stdout if
'output_file' not supplied. 'macros' is a list of macro
definitions as for 'compile()', which will augment the macros set
with 'define_macro()' and 'undefine_macro()'. 'include_dirs' is a
list of directory names that will be added to the default list.
Raises PreprocessError on failure.
"""
pass
def compile(self, sources, output_dir=None, macros=None,
include_dirs=None, debug=0, extra_preargs=None,
extra_postargs=None, depends=None):
"""Compile one or more source files.
'sources' must be a list of filenames, most likely C/C++
files, but in reality anything that can be handled by a
particular compiler and compiler class (eg. MSVCCompiler can
handle resource files in 'sources'). Return a list of object
filenames, one per source filename in 'sources'. Depending on
the implementation, not all source files will necessarily be
compiled, but all corresponding object filenames will be
returned.
If 'output_dir' is given, object files will be put under it, while
retaining their original path component. That is, "foo/bar.c"
normally compiles to "foo/bar.o" (for a Unix implementation); if
'output_dir' is "build", then it would compile to
"build/foo/bar.o".
'macros', if given, must be a list of macro definitions. A macro
definition is either a (name, value) 2-tuple or a (name,) 1-tuple.
The former defines a macro; if the value is None, the macro is
defined without an explicit value. The 1-tuple case undefines a
macro. Later definitions/redefinitions/undefinitions take
precedence.
'include_dirs', if given, must be a list of strings, the
directories to add to the default include file search path for this
compilation only.
'debug' is a boolean; if true, the compiler will be instructed to
output debug symbols in (or alongside) the object file(s).
'extra_preargs' and 'extra_postargs' are implementation-dependent.
On platforms that have the notion of a command-line (e.g. Unix,
DOS/Windows), they are most likely lists of strings: extra
command-line arguments to prepend/append to the compiler command
line. On other platforms, consult the implementation class
documentation. In any event, they are intended as an escape hatch
for those occasions when the abstract compiler framework doesn't
cut the mustard.
'depends', if given, is a list of filenames that all targets
depend on. If a source file is older than any file in
depends, then the source file will be recompiled. This
supports dependency tracking, but only at a coarse
granularity.
Raises CompileError on failure.
"""
# A concrete compiler class can either override this method
# entirely or implement _compile().
macros, objects, extra_postargs, pp_opts, build = \
self._setup_compile(output_dir, macros, include_dirs, sources,
depends, extra_postargs)
cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
# Return *all* object filenames, not just the ones we just built.
return objects
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
"""Compile 'src' to product 'obj'."""
# A concrete compiler class that does not override compile()
# should implement _compile().
pass
def create_static_lib(self, objects, output_libname, output_dir=None,
debug=0, target_lang=None):
"""Link a bunch of stuff together to create a static library file.
The "bunch of stuff" consists of the list of object files supplied
as 'objects', the extra object files supplied to
'add_link_object()' and/or 'set_link_objects()', the libraries
supplied to 'add_library()' and/or 'set_libraries()', and the
libraries supplied as 'libraries' (if any).
'output_libname' should be a library name, not a filename; the
filename will be inferred from the library name. 'output_dir' is
the directory where the library file will be put.
'debug' is a boolean; if true, debugging information will be
included in the library (note that on most platforms, it is the
compile step where this matters: the 'debug' flag is included here
just for consistency).
'target_lang' is the target language for which the given objects
are being compiled. This allows specific linkage time treatment of
certain languages.
Raises LibError on failure.
"""
pass
# values for target_desc parameter in link()
SHARED_OBJECT = "shared_object"
SHARED_LIBRARY = "shared_library"
EXECUTABLE = "executable"
def link(self, target_desc, objects, output_filename, output_dir=None,
libraries=None, library_dirs=None, runtime_library_dirs=None,
export_symbols=None, debug=0, extra_preargs=None,
extra_postargs=None, build_temp=None, target_lang=None):
"""Link a bunch of stuff together to create an executable or
shared library file.
The "bunch of stuff" consists of the list of object files supplied
as 'objects'. 'output_filename' should be a filename. If
'output_dir' is supplied, 'output_filename' is relative to it
(i.e. 'output_filename' can provide directory components if
needed).
'libraries' is a list of libraries to link against. These are
library names, not filenames, since they're translated into
filenames in a platform-specific way (eg. "foo" becomes "libfoo.a"
on Unix and "foo.lib" on DOS/Windows). However, they can include a
directory component, which means the linker will look in that
specific directory rather than searching all the normal locations.
'library_dirs', if supplied, should be a list of directories to
search for libraries that were specified as bare library names
(ie. no directory component). These are on top of the system
default and those supplied to 'add_library_dir()' and/or
'set_library_dirs()'. 'runtime_library_dirs' is a list of
directories that will be embedded into the shared library and used
to search for other shared libraries that *it* depends on at
run-time. (This may only be relevant on Unix.)
'export_symbols' is a list of symbols that the shared library will
export. (This appears to be relevant only on Windows.)
'debug' is as for 'compile()' and 'create_static_lib()', with the
slight distinction that it actually matters on most platforms (as
opposed to 'create_static_lib()', which includes a 'debug' flag
mostly for form's sake).
'extra_preargs' and 'extra_postargs' are as for 'compile()' (except
of course that they supply command-line arguments for the
particular linker being used).
'target_lang' is the target language for which the given objects
are being compiled. This allows specific linkage time treatment of
certain languages.
Raises LinkError on failure.
"""
raise NotImplementedError
# Old 'link_*()' methods, rewritten to use the new 'link()' method.
def link_shared_lib(self, objects, output_libname, output_dir=None,
libraries=None, library_dirs=None,
runtime_library_dirs=None, export_symbols=None,
debug=0, extra_preargs=None, extra_postargs=None,
build_temp=None, target_lang=None):
self.link(CCompiler.SHARED_LIBRARY, objects,
self.library_filename(output_libname, lib_type='shared'),
output_dir,
libraries, library_dirs, runtime_library_dirs,
export_symbols, debug,
extra_preargs, extra_postargs, build_temp, target_lang)
def link_shared_object(self, objects, output_filename, output_dir=None,
libraries=None, library_dirs=None,
runtime_library_dirs=None, export_symbols=None,
debug=0, extra_preargs=None, extra_postargs=None,
build_temp=None, target_lang=None):
self.link(CCompiler.SHARED_OBJECT, objects,
output_filename, output_dir,
libraries, library_dirs, runtime_library_dirs,
export_symbols, debug,
extra_preargs, extra_postargs, build_temp, target_lang)
def link_executable(self, objects, output_progname, output_dir=None,
libraries=None, library_dirs=None,
runtime_library_dirs=None, debug=0, extra_preargs=None,
extra_postargs=None, target_lang=None):
self.link(CCompiler.EXECUTABLE, objects,
self.executable_filename(output_progname), output_dir,
libraries, library_dirs, runtime_library_dirs, None,
debug, extra_preargs, extra_postargs, None, target_lang)
# -- Miscellaneous methods -----------------------------------------
# These are all used by the 'gen_lib_options()' function; there is
# no appropriate default implementation so subclasses should
# implement all of these.
def library_dir_option(self, dir):
"""Return the compiler option to add 'dir' to the list of
directories searched for libraries.
"""
raise NotImplementedError
def runtime_library_dir_option(self, dir):
"""Return the compiler option to add 'dir' to the list of
directories searched for runtime libraries.
"""
raise NotImplementedError
def library_option(self, lib):
"""Return the compiler option to add 'lib' to the list of libraries
linked into the shared library or executable.
"""
raise NotImplementedError
def has_function(self, funcname, includes=None, include_dirs=None,
libraries=None, library_dirs=None):
"""Return a boolean indicating whether funcname is supported on
the current platform. The optional arguments can be used to
augment the compilation environment.
"""
# this can't be included at module scope because it tries to
# import math which might not be available at that point - maybe
# the necessary logic should just be inlined?
import tempfile
if includes is None:
includes = []
if include_dirs is None:
include_dirs = []
if libraries is None:
libraries = []
if library_dirs is None:
library_dirs = []
fd, fname = tempfile.mkstemp(".c", funcname, text=True)
f = os.fdopen(fd, "w")
try:
for incl in includes:
f.write("""#include "%s"\n""" % incl)
f.write("""\
int main (int argc, char **argv) {
%s();
return 0;
}
""" % funcname)
finally:
f.close()
try:
objects = self.compile([fname], include_dirs=include_dirs)
except CompileError:
return False
try:
self.link_executable(objects, "a.out",
libraries=libraries,
library_dirs=library_dirs)
except (LinkError, TypeError):
return False
return True
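# Illustrative sketch (editor's addition): a typical feature probe,
# assuming a working compiler for the current platform:
#
#     cc = new_compiler()
#     if cc.has_function('clock_gettime', libraries=['rt']):
#         cc.define_macro('HAVE_CLOCK_GETTIME')
#
# Note that the probe really compiles and links a tiny C program, so it
# is slow, and on success it leaves an 'a.out' (and the temporary .c
# file) behind in the working directory.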
def find_library_file (self, dirs, lib, debug=0):
"""Search the specified list of directories for a static or shared
library file 'lib' and return the full path to that file. If
'debug' true, look for a debugging version (if that makes sense on
the current platform). Return None if 'lib' wasn't found in any of
the specified directories.
"""
raise NotImplementedError
# -- Filename generation methods -----------------------------------
# The default implementation of the filename generating methods are
# prejudiced towards the Unix/DOS/Windows view of the world:
# * object files are named by replacing the source file extension
# (eg. .c/.cpp -> .o/.obj)
# * library files (shared or static) are named by plugging the
# library name and extension into a format string, eg.
# "lib%s.%s" % (lib_name, ".a") for Unix static libraries
# * executables are named by appending an extension (possibly
# empty) to the program name: eg. progname + ".exe" for
# Windows
#
# To reduce redundant code, these methods expect to find
# several attributes in the current object (presumably defined
# as class attributes):
# * src_extensions -
# list of C/C++ source file extensions, eg. ['.c', '.cpp']
# * obj_extension -
# object file extension, eg. '.o' or '.obj'
# * static_lib_extension -
# extension for static library files, eg. '.a' or '.lib'
# * shared_lib_extension -
# extension for shared library/object files, eg. '.so', '.dll'
# * static_lib_format -
# format string for generating static library filenames,
# eg. 'lib%s.%s' or '%s.%s'
# * shared_lib_format
# format string for generating shared library filenames
# (probably same as static_lib_format, since the extension
# is one of the intended parameters to the format string)
# * exe_extension -
# extension for executable files, eg. '' or '.exe'
def object_filenames(self, source_filenames, strip_dir=0, output_dir=''):
if output_dir is None:
output_dir = ''
obj_names = []
for src_name in source_filenames:
base, ext = os.path.splitext(src_name)
base = os.path.splitdrive(base)[1] # Chop off the drive
base = base[os.path.isabs(base):] # If abs, chop off leading /
if ext not in self.src_extensions:
raise UnknownFileError, \
"unknown file type '%s' (from '%s')" % (ext, src_name)
if strip_dir:
base = os.path.basename(base)
obj_names.append(os.path.join(output_dir,
base + self.obj_extension))
return obj_names
def shared_object_filename(self, basename, strip_dir=0, output_dir=''):
assert output_dir is not None
if strip_dir:
basename = os.path.basename (basename)
return os.path.join(output_dir, basename + self.shared_lib_extension)
def executable_filename(self, basename, strip_dir=0, output_dir=''):
assert output_dir is not None
if strip_dir:
basename = os.path.basename (basename)
return os.path.join(output_dir, basename + (self.exe_extension or ''))
def library_filename(self, libname, lib_type='static', # or 'shared'
strip_dir=0, output_dir=''):
assert output_dir is not None
if lib_type not in ("static", "shared", "dylib", "xcode_stub"):
raise ValueError, ("""'lib_type' must be "static", "shared", """
""""dylib", or "xcode_stub".""")
fmt = getattr(self, lib_type + "_lib_format")
ext = getattr(self, lib_type + "_lib_extension")
dir, base = os.path.split (libname)
filename = fmt % (base, ext)
if strip_dir:
dir = ''
return os.path.join(output_dir, dir, filename)
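# Illustrative sketch (editor's addition): with UnixCCompiler's class
# attributes (static_lib_format = 'lib%s%s', static_lib_extension =
# '.a'), library_filename() behaves like this:
#
#     cc.library_filename('foo')                   # -> 'libfoo.a'
#     cc.library_filename('sub/foo')               # -> 'sub/libfoo.a'
#     cc.library_filename('sub/foo', strip_dir=1)  # -> 'libfoo.a'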
# -- Utility methods -----------------------------------------------
def announce(self, msg, level=1):
log.debug(msg)
def debug_print(self, msg):
from distutils.debug import DEBUG
if DEBUG:
print msg
def warn(self, msg):
sys.stderr.write("warning: %s\n" % msg)
def execute(self, func, args, msg=None, level=1):
execute(func, args, msg, self.dry_run)
def spawn(self, cmd):
spawn(cmd, dry_run=self.dry_run)
def move_file(self, src, dst):
return move_file(src, dst, dry_run=self.dry_run)
def mkpath(self, name, mode=0777):
mkpath(name, mode, dry_run=self.dry_run)
# class CCompiler
# Map a sys.platform/os.name ('posix', 'nt') to the default compiler
# type for that platform. Keys are interpreted as re match
# patterns. Order is important; platform mappings are preferred over
# OS names.
_default_compilers = (
# Platform string mappings
# on a cygwin built python we can use gcc like an ordinary UNIXish
# compiler
('cygwin.*', 'unix'),
('os2emx', 'emx'),
# OS name mappings
('posix', 'unix'),
('nt', 'msvc'),
)
def get_default_compiler(osname=None, platform=None):
""" Determine the default compiler to use for the given platform.
osname should be one of the standard Python OS names (i.e. the
ones returned by os.name) and platform the common value
returned by sys.platform for the platform in question.
The default values are os.name and sys.platform in case the
parameters are not given.
"""
if osname is None:
osname = os.name
if platform is None:
platform = sys.platform
for pattern, compiler in _default_compilers:
if re.match(pattern, platform) is not None or \
re.match(pattern, osname) is not None:
return compiler
# Default to Unix compiler
return 'unix'
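# Illustrative sketch (editor's addition):
#
#     get_default_compiler('posix', 'linux2')  # -> 'unix'
#     get_default_compiler('nt', 'win32')      # -> 'msvc'
#     get_default_compiler('posix', 'cygwin')  # -> 'unix' (platform
#                                              #    pattern wins)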
# Map compiler types to (module_name, class_name) pairs -- ie. where to
# find the code that implements an interface to this compiler. (The module
# is assumed to be in the 'distutils' package.)
compiler_class = { 'unix': ('unixccompiler', 'UnixCCompiler',
"standard UNIX-style compiler"),
'msvc': ('msvccompiler', 'MSVCCompiler',
"Microsoft Visual C++"),
'cygwin': ('cygwinccompiler', 'CygwinCCompiler',
"Cygwin port of GNU C Compiler for Win32"),
'mingw32': ('cygwinccompiler', 'Mingw32CCompiler',
"Mingw32 port of GNU C Compiler for Win32"),
'bcpp': ('bcppcompiler', 'BCPPCompiler',
"Borland C++ Compiler"),
'emx': ('emxccompiler', 'EMXCCompiler',
"EMX port of GNU C Compiler for OS/2"),
}
def show_compilers():
"""Print list of available compilers (used by the "--help-compiler"
options to "build", "build_ext", "build_clib").
"""
# XXX this "knows" that the compiler option it's describing is
# "--compiler", which just happens to be the case for the three
# commands that use it.
from distutils.fancy_getopt import FancyGetopt
compilers = []
for compiler in compiler_class.keys():
compilers.append(("compiler="+compiler, None,
compiler_class[compiler][2]))
compilers.sort()
pretty_printer = FancyGetopt(compilers)
pretty_printer.print_help("List of available compilers:")
def new_compiler(plat=None, compiler=None, verbose=0, dry_run=0, force=0):
"""Generate an instance of some CCompiler subclass for the supplied
platform/compiler combination. 'plat' defaults to 'os.name'
(eg. 'posix', 'nt'), and 'compiler' defaults to the default compiler
for that platform. Currently only 'posix' and 'nt' are supported, and
the default compilers are "traditional Unix interface" (UnixCCompiler
class) and Visual C++ (MSVCCompiler class). Note that it's perfectly
possible to ask for a Unix compiler object under Windows, and a
Microsoft compiler object under Unix -- if you supply a value for
'compiler', 'plat' is ignored.
"""
if plat is None:
plat = os.name
try:
if compiler is None:
compiler = get_default_compiler(plat)
(module_name, class_name, long_description) = compiler_class[compiler]
except KeyError:
msg = "don't know how to compile C/C++ code on platform '%s'" % plat
if compiler is not None:
msg = msg + " with '%s' compiler" % compiler
raise DistutilsPlatformError, msg
try:
module_name = "distutils." + module_name
__import__ (module_name)
module = sys.modules[module_name]
klass = vars(module)[class_name]
except ImportError:
raise DistutilsModuleError, \
"can't compile C/C++ code: unable to load module '%s'" % \
module_name
except KeyError:
raise DistutilsModuleError, \
("can't compile C/C++ code: unable to find class '%s' " +
"in module '%s'") % (class_name, module_name)
# XXX The None is necessary to preserve backwards compatibility
# with classes that expect verbose to be the first positional
# argument.
return klass(None, dry_run, force)
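# Illustrative sketch (editor's addition): one full pass through the
# abstraction, assuming a Unix-ish platform and a local hello.c:
#
#     cc = new_compiler()                  # e.g. a UnixCCompiler
#     objs = cc.compile(['hello.c'], output_dir='build')
#     cc.link_executable(objs, 'hello', output_dir='build')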
def gen_preprocess_options(macros, include_dirs):
"""Generate C pre-processor options (-D, -U, -I) as used by at least
two types of compilers: the typical Unix compiler and Visual C++.
'macros' is the usual thing, a list of 1- or 2-tuples, where (name,)
means undefine (-U) macro 'name', and (name,value) means define (-D)
macro 'name' to 'value'. 'include_dirs' is just a list of directory
names to be added to the header file search path (-I). Returns a list
of command-line options suitable for either Unix compilers or Visual
C++.
"""
# XXX it would be nice (mainly aesthetic, and so we don't generate
# stupid-looking command lines) to go over 'macros' and eliminate
# redundant definitions/undefinitions (ie. ensure that only the
# latest mention of a particular macro winds up on the command
# line). I don't think it's essential, though, since most (all?)
# Unix C compilers only pay attention to the latest -D or -U
# mention of a macro on their command line. Similar situation for
# 'include_dirs'. I'm punting on both for now. Anyways, weeding out
# redundancies like this should probably be the province of
# CCompiler, since the data structures used are inherited from it
# and therefore common to all CCompiler classes.
pp_opts = []
for macro in macros:
if not (isinstance(macro, tuple) and
1 <= len (macro) <= 2):
raise TypeError, \
("bad macro definition '%s': " +
"each element of 'macros' list must be a 1- or 2-tuple") % \
macro
if len (macro) == 1: # undefine this macro
pp_opts.append ("-U%s" % macro[0])
elif len (macro) == 2:
if macro[1] is None: # define with no explicit value
pp_opts.append ("-D%s" % macro[0])
else:
# XXX *don't* need to be clever about quoting the
# macro value here, because we're going to avoid the
# shell at all costs when we spawn the command!
pp_opts.append ("-D%s=%s" % macro)
for dir in include_dirs:
pp_opts.append ("-I%s" % dir)
return pp_opts
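# Illustrative sketch (editor's addition):
#
#     gen_preprocess_options([('DEBUG', None), ('VER', '2'), ('NDEBUG',)],
#                            ['include'])
#     # -> ['-DDEBUG', '-DVER=2', '-UNDEBUG', '-Iinclude']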
def gen_lib_options(compiler, library_dirs, runtime_library_dirs, libraries):
"""Generate linker options for searching library directories and
linking with specific libraries.
'libraries' and 'library_dirs' are, respectively, lists of library names
(not filenames!) and search directories. Returns a list of command-line
options suitable for use with some compiler (depending on the two format
strings passed in).
"""
lib_opts = []
for dir in library_dirs:
lib_opts.append(compiler.library_dir_option(dir))
for dir in runtime_library_dirs:
opt = compiler.runtime_library_dir_option(dir)
if isinstance(opt, list):
lib_opts.extend(opt)
else:
lib_opts.append(opt)
# XXX it's important that we *not* remove redundant library mentions!
# sometimes you really do have to say "-lfoo -lbar -lfoo" in order to
# resolve all symbols. I just hope we never have to say "-lfoo obj.o
# -lbar" to get things to work -- that's certainly a possibility, but a
# pretty nasty way to arrange your C code.
for lib in libraries:
lib_dir, lib_name = os.path.split(lib)
if lib_dir != '':
lib_file = compiler.find_library_file([lib_dir], lib_name)
if lib_file is not None:
lib_opts.append(lib_file)
else:
compiler.warn("no library file corresponding to "
"'%s' found (skipping)" % lib)
else:
lib_opts.append(compiler.library_option(lib))
return lib_opts
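# Illustrative sketch (editor's addition): for some Unix-style compiler
# instance 'cc',
#
#     gen_lib_options(cc, ['/opt/lib'], ['/opt/lib'], ['m'])
#     # -> ['-L/opt/lib', <runtime option for /opt/lib>, '-lm']
#
# where the runtime entry might be '-R/opt/lib' or '-Wl,-rpath,/opt/lib'
# depending on what runtime_library_dir_option() returns.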
"""distutils.cmd
Provides the Command class, the base class for the command classes
in the distutils.command package.
"""
__revision__ = "$Id$"
import sys, os, re
from distutils.errors import DistutilsOptionError
from distutils import util, dir_util, file_util, archive_util, dep_util
from distutils import log
class Command:
"""Abstract base class for defining command classes, the "worker bees"
of the Distutils. A useful analogy for command classes is to think of
them as subroutines with local variables called "options". The options
are "declared" in 'initialize_options()' and "defined" (given their
final values, aka "finalized") in 'finalize_options()', both of which
must be defined by every command class. The distinction between the
two is necessary because option values might come from the outside
world (command line, config file, ...), and any options dependent on
other options must be computed *after* these outside influences have
been processed -- hence 'finalize_options()'. The "body" of the
subroutine, where it does all its work based on the values of its
options, is the 'run()' method, which must also be implemented by every
command class.
"""
# 'sub_commands' formalizes the notion of a "family" of commands,
# eg. "install" as the parent with sub-commands "install_lib",
# "install_headers", etc. The parent of a family of commands
# defines 'sub_commands' as a class attribute; it's a list of
# (command_name : string, predicate : unbound_method | string | None)
# tuples, where 'predicate' is a method of the parent command that
# determines whether the corresponding command is applicable in the
# current situation. (Eg. "install_headers" is only applicable if
# we have any C header files to install.) If 'predicate' is None,
# that command is always applicable.
#
# 'sub_commands' is usually defined at the *end* of a class, because
# predicates can be unbound methods, so they must already have been
# defined. The canonical example is the "install" command.
sub_commands = []
# -- Creation/initialization methods -------------------------------
def __init__(self, dist):
"""Create and initialize a new Command object. Most importantly,
invokes the 'initialize_options()' method, which is the real
initializer and depends on the actual command being
instantiated.
"""
# late import because of mutual dependence between these classes
from distutils.dist import Distribution
if not isinstance(dist, Distribution):
raise TypeError, "dist must be a Distribution instance"
if self.__class__ is Command:
raise RuntimeError, "Command is an abstract class"
self.distribution = dist
self.initialize_options()
# Per-command versions of the global flags, so that the user can
# customize Distutils' behaviour command-by-command and let some
# commands fall back on the Distribution's behaviour. None means
# "not defined, check self.distribution's copy", while 0 or 1 mean
# false and true (duh). Note that this means figuring out the real
# value of each flag is a touch complicated -- hence "self._dry_run"
# will be handled by __getattr__, below.
# XXX This needs to be fixed.
self._dry_run = None
# verbose is largely ignored, but needs to be set for
# backwards compatibility (I think)?
self.verbose = dist.verbose
# Some commands define a 'self.force' option to ignore file
# timestamps, but methods defined *here* assume that
# 'self.force' exists for all commands. So define it here
# just to be safe.
self.force = None
# The 'help' flag is just used for command-line parsing, so
# none of that complicated bureaucracy is needed.
self.help = 0
# 'finalized' records whether or not 'finalize_options()' has been
# called. 'finalize_options()' itself should not pay attention to
# this flag: it is the business of 'ensure_finalized()', which
# always calls 'finalize_options()', to respect/update it.
self.finalized = 0
# XXX A more explicit way to customize dry_run would be better.
def __getattr__(self, attr):
if attr == 'dry_run':
myval = getattr(self, "_" + attr)
if myval is None:
return getattr(self.distribution, attr)
else:
return myval
else:
raise AttributeError, attr
def ensure_finalized(self):
if not self.finalized:
self.finalize_options()
self.finalized = 1
# Subclasses must define:
# initialize_options()
# provide default values for all options; may be customized by
# setup script, by options from config file(s), or by command-line
# options
# finalize_options()
# decide on the final values for all options; this is called
# after all possible intervention from the outside world
# (command-line, option file, etc.) has been processed
# run()
# run the command: do whatever it is we're here to do,
# controlled by the command's various option values
def initialize_options(self):
"""Set default values for all the options that this command
supports. Note that these defaults may be overridden by other
commands, by the setup script, by config files, or by the
command-line. Thus, this is not the place to code dependencies
between options; generally, 'initialize_options()' implementations
are just a bunch of "self.foo = None" assignments.
This method must be implemented by all command classes.
"""
raise RuntimeError, \
"abstract method -- subclass %s must override" % self.__class__
def finalize_options(self):
"""Set final values for all the options that this command supports.
This is always called as late as possible, ie. after any option
assignments from the command-line or from other commands have been
done. Thus, this is the place to code option dependencies: if
'foo' depends on 'bar', then it is safe to set 'foo' from 'bar' as
long as 'foo' still has the same value it was assigned in
'initialize_options()'.
This method must be implemented by all command classes.
"""
raise RuntimeError, \
"abstract method -- subclass %s must override" % self.__class__
def dump_options(self, header=None, indent=""):
from distutils.fancy_getopt import longopt_xlate
if header is None:
header = "command options for '%s':" % self.get_command_name()
self.announce(indent + header, level=log.INFO)
indent = indent + " "
for (option, _, _) in self.user_options:
option = option.translate(longopt_xlate)
if option[-1] == "=":
option = option[:-1]
value = getattr(self, option)
self.announce(indent + "%s = %s" % (option, value),
level=log.INFO)
def run(self):
"""A command's raison d'etre: carry out the action it exists to
perform, controlled by the options initialized in
'initialize_options()', customized by other commands, the setup
script, the command-line, and config files, and finalized in
'finalize_options()'. All terminal output and filesystem
interaction should be done by 'run()'.
This method must be implemented by all command classes.
"""
raise RuntimeError, \
"abstract method -- subclass %s must override" % self.__class__
def announce(self, msg, level=1):
"""If the current verbosity level is of greater than or equal to
'level' print 'msg' to stdout.
"""
log.log(level, msg)
def debug_print(self, msg):
"""Print 'msg' to stdout if the global DEBUG (taken from the
DISTUTILS_DEBUG environment variable) flag is true.
"""
from distutils.debug import DEBUG
if DEBUG:
print msg
sys.stdout.flush()
# -- Option validation methods -------------------------------------
# (these are very handy in writing the 'finalize_options()' method)
#
# NB. the general philosophy here is to ensure that a particular option
# value meets certain type and value constraints. If not, we try to
# force it into conformance (eg. if we expect a list but have a string,
# split the string on comma and/or whitespace). If we can't force the
# option into conformance, raise DistutilsOptionError. Thus, command
# classes need do nothing more than (eg.)
# self.ensure_string_list('foo')
# and they can be guaranteed that thereafter, self.foo will be
# a list of strings.
def _ensure_stringlike(self, option, what, default=None):
val = getattr(self, option)
if val is None:
setattr(self, option, default)
return default
elif not isinstance(val, str):
raise DistutilsOptionError, \
"'%s' must be a %s (got `%s`)" % (option, what, val)
return val
def ensure_string(self, option, default=None):
"""Ensure that 'option' is a string; if not defined, set it to
'default'.
"""
self._ensure_stringlike(option, "string", default)
def ensure_string_list(self, option):
"""Ensure that 'option' is a list of strings. If 'option' is
currently a string, we split it either on /,\s*/ or /\s+/, so
"foo bar baz", "foo,bar,baz", and "foo, bar baz" all become
["foo", "bar", "baz"].
"""
val = getattr(self, option)
if val is None:
return
elif isinstance(val, str):
setattr(self, option, re.split(r',\s*|\s+', val))
else:
if isinstance(val, list):
# checks if all elements are str
ok = 1
for element in val:
if not isinstance(element, str):
ok = 0
break
else:
ok = 0
if not ok:
raise DistutilsOptionError, \
"'%s' must be a list of strings (got %r)" % \
(option, val)
def _ensure_tested_string(self, option, tester,
what, error_fmt, default=None):
val = self._ensure_stringlike(option, what, default)
if val is not None and not tester(val):
raise DistutilsOptionError, \
("error in '%s' option: " + error_fmt) % (option, val)
def ensure_filename(self, option):
"""Ensure that 'option' is the name of an existing file."""
self._ensure_tested_string(option, os.path.isfile,
"filename",
"'%s' does not exist or is not a file")
def ensure_dirname(self, option):
self._ensure_tested_string(option, os.path.isdir,
"directory name",
"'%s' does not exist or is not a directory")
# -- Convenience methods for commands ------------------------------
def get_command_name(self):
if hasattr(self, 'command_name'):
return self.command_name
else:
return self.__class__.__name__
def set_undefined_options(self, src_cmd, *option_pairs):
"""Set the values of any "undefined" options from corresponding
option values in some other command object. "Undefined" here means
"is None", which is the convention used to indicate that an option
has not been changed between 'initialize_options()' and
'finalize_options()'. Usually called from 'finalize_options()' for
options that depend on some other command rather than another
option of the same command. 'src_cmd' is the other command from
which option values will be taken (a command object will be created
for it if necessary); the remaining arguments are
'(src_option,dst_option)' tuples which mean "take the value of
'src_option' in the 'src_cmd' command object, and copy it to
'dst_option' in the current command object".
"""
# Option_pairs: list of (src_option, dst_option) tuples
src_cmd_obj = self.distribution.get_command_obj(src_cmd)
src_cmd_obj.ensure_finalized()
for (src_option, dst_option) in option_pairs:
if getattr(self, dst_option) is None:
setattr(self, dst_option,
getattr(src_cmd_obj, src_option))
def get_finalized_command(self, command, create=1):
"""Wrapper around Distribution's 'get_command_obj()' method: find
(create if necessary and 'create' is true) the command object for
'command', call its 'ensure_finalized()' method, and return the
finalized command object.
"""
cmd_obj = self.distribution.get_command_obj(command, create)
cmd_obj.ensure_finalized()
return cmd_obj
# XXX rename to 'get_reinitialized_command()'? (should do the
# same in dist.py, if so)
def reinitialize_command(self, command, reinit_subcommands=0):
return self.distribution.reinitialize_command(
command, reinit_subcommands)
def run_command(self, command):
"""Run some other command: uses the 'run_command()' method of
Distribution, which creates and finalizes the command object if
necessary and then invokes its 'run()' method.
"""
self.distribution.run_command(command)
def get_sub_commands(self):
"""Determine the sub-commands that are relevant in the current
distribution (ie. that need to be run). This is based on the
'sub_commands' class attribute: each tuple in that list may include
a method that we call to determine if the subcommand needs to be
run for the current distribution. Return a list of command names.
"""
commands = []
for (cmd_name, method) in self.sub_commands:
if method is None or method(self):
commands.append(cmd_name)
return commands
# -- External world manipulation -----------------------------------
def warn(self, msg):
log.warn("warning: %s: %s\n" %
(self.get_command_name(), msg))
def execute(self, func, args, msg=None, level=1):
util.execute(func, args, msg, dry_run=self.dry_run)
def mkpath(self, name, mode=0777):
dir_util.mkpath(name, mode, dry_run=self.dry_run)
def copy_file(self, infile, outfile,
preserve_mode=1, preserve_times=1, link=None, level=1):
"""Copy a file respecting verbose, dry-run and force flags. (The
former two default to whatever is in the Distribution object, and
the latter defaults to false for commands that don't define it.)"""
return file_util.copy_file(
infile, outfile,
preserve_mode, preserve_times,
not self.force,
link,
dry_run=self.dry_run)
def copy_tree(self, infile, outfile,
preserve_mode=1, preserve_times=1, preserve_symlinks=0,
level=1):
"""Copy an entire directory tree respecting verbose, dry-run,
and force flags.
"""
return dir_util.copy_tree(
infile, outfile,
preserve_mode,preserve_times,preserve_symlinks,
not self.force,
dry_run=self.dry_run)
def move_file (self, src, dst, level=1):
"""Move a file respecting dry-run flag."""
return file_util.move_file(src, dst, dry_run=self.dry_run)
def spawn (self, cmd, search_path=1, level=1):
"""Spawn an external command respecting dry-run flag."""
from distutils.spawn import spawn
spawn(cmd, search_path, dry_run=self.dry_run)
def make_archive(self, base_name, format, root_dir=None, base_dir=None,
owner=None, group=None):
return archive_util.make_archive(base_name, format, root_dir,
base_dir, dry_run=self.dry_run,
owner=owner, group=group)
def make_file(self, infiles, outfile, func, args,
exec_msg=None, skip_msg=None, level=1):
"""Special case of 'execute()' for operations that process one or
more input files and generate one output file. Works just like
'execute()', except the operation is skipped and a different
message printed if 'outfile' already exists and is newer than all
files listed in 'infiles'. If the command defined 'self.force',
and it is true, then the command is unconditionally run -- does no
timestamp checks.
"""
if skip_msg is None:
skip_msg = "skipping %s (inputs unchanged)" % outfile
# Allow 'infiles' to be a single string
if isinstance(infiles, str):
infiles = (infiles,)
elif not isinstance(infiles, (list, tuple)):
raise TypeError, \
"'infiles' must be a string, or a list or tuple of strings"
if exec_msg is None:
exec_msg = "generating %s from %s" % \
(outfile, ', '.join(infiles))
# If 'outfile' must be regenerated (either because it doesn't
# exist, is out-of-date, or the 'force' flag is true) then
# perform the action that presumably regenerates it
if self.force or dep_util.newer_group(infiles, outfile):
self.execute(func, args, exec_msg, level)
# Otherwise, print the "skip" message
else:
log.debug(skip_msg)
# XXX 'install_misc' class not currently used -- it was the base class for
# both 'install_scripts' and 'install_data', but they outgrew it. It might
# still be useful for 'install_headers', though, so I'm keeping it around
# for the time being.
class install_misc(Command):
"""Common base class for installing some files in a subdirectory.
Currently used by install_data and install_scripts.
"""
user_options = [('install-dir=', 'd', "directory to install the files to")]
def initialize_options (self):
self.install_dir = None
self.outfiles = []
def _install_dir_from(self, dirname):
self.set_undefined_options('install', (dirname, 'install_dir'))
def _copy_files(self, filelist):
self.outfiles = []
if not filelist:
return
self.mkpath(self.install_dir)
for f in filelist:
self.copy_file(f, self.install_dir)
self.outfiles.append(os.path.join(self.install_dir, f))
def get_outputs(self):
return self.outfiles
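# Illustrative sketch (editor's addition, not distutils code): the
# minimal shape of a concrete command class, using a hypothetical
# 'flavor' option:
class _example_cmd(Command):
    description = "example command"
    user_options = [('flavor=', 'f', "which flavor to build")]
    def initialize_options(self):
        self.flavor = None            # "declare" the option
    def finalize_options(self):
        if self.flavor is None:       # "define" its final value
            self.flavor = 'plain'
    def run(self):
        self.announce("building %s flavor" % self.flavor)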
"""distutils.command.bdist
Implements the Distutils 'bdist' command (create a built [binary]
distribution)."""
__revision__ = "$Id$"
import os
from distutils.util import get_platform
from distutils.core import Command
from distutils.errors import DistutilsPlatformError, DistutilsOptionError
def show_formats():
"""Print list of available formats (arguments to "--format" option).
"""
from distutils.fancy_getopt import FancyGetopt
formats = []
for format in bdist.format_commands:
formats.append(("formats=" + format, None,
bdist.format_command[format][1]))
pretty_printer = FancyGetopt(formats)
pretty_printer.print_help("List of available distribution formats:")
class bdist(Command):
description = "create a built (binary) distribution"
user_options = [('bdist-base=', 'b',
"temporary directory for creating built distributions"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('formats=', None,
"formats for distribution (comma-separated list)"),
('dist-dir=', 'd',
"directory to put final built distributions in "
"[default: dist]"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('owner=', 'u',
"Owner name used when creating a tar file"
" [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file"
" [default: current group]"),
]
boolean_options = ['skip-build']
help_options = [
('help-formats', None,
"lists available distribution formats", show_formats),
]
# The following commands do not take a format option from bdist
no_format_option = ('bdist_rpm',)
# This won't do in reality: will need to distinguish RPM-ish Linux,
# Debian-ish Linux, Solaris, FreeBSD, ..., Windows, Mac OS.
default_format = {'posix': 'gztar',
'nt': 'zip',
'os2': 'zip'}
# Establish the preferred order (for the --help-formats option).
format_commands = ['rpm', 'gztar', 'bztar', 'ztar', 'tar',
'wininst', 'zip', 'msi']
# And the real information.
format_command = {'rpm': ('bdist_rpm', "RPM distribution"),
'gztar': ('bdist_dumb', "gzip'ed tar file"),
'bztar': ('bdist_dumb', "bzip2'ed tar file"),
'ztar': ('bdist_dumb', "compressed tar file"),
'tar': ('bdist_dumb', "tar file"),
'wininst': ('bdist_wininst',
"Windows executable installer"),
'zip': ('bdist_dumb', "ZIP file"),
'msi': ('bdist_msi', "Microsoft Installer")
}
def initialize_options(self):
self.bdist_base = None
self.plat_name = None
self.formats = None
self.dist_dir = None
self.skip_build = 0
self.group = None
self.owner = None
def finalize_options(self):
# have to finalize 'plat_name' before 'bdist_base'
if self.plat_name is None:
if self.skip_build:
self.plat_name = get_platform()
else:
self.plat_name = self.get_finalized_command('build').plat_name
# 'bdist_base' -- parent of per-built-distribution-format
# temporary directories (eg. we'll probably have
# "build/bdist.<plat>/dumb", "build/bdist.<plat>/rpm", etc.)
if self.bdist_base is None:
build_base = self.get_finalized_command('build').build_base
self.bdist_base = os.path.join(build_base,
'bdist.' + self.plat_name)
self.ensure_string_list('formats')
if self.formats is None:
try:
self.formats = [self.default_format[os.name]]
except KeyError:
raise DistutilsPlatformError, \
"don't know how to create built distributions " + \
"on platform %s" % os.name
if self.dist_dir is None:
self.dist_dir = "dist"
def run(self):
# Figure out which sub-commands we need to run.
commands = []
for format in self.formats:
try:
commands.append(self.format_command[format][0])
except KeyError:
raise DistutilsOptionError, "invalid format '%s'" % format
# Reinitialize and run each command.
for i in range(len(self.formats)):
cmd_name = commands[i]
sub_cmd = self.reinitialize_command(cmd_name)
if cmd_name not in self.no_format_option:
sub_cmd.format = self.formats[i]
# passing the owner and group names for tar archiving
if cmd_name == 'bdist_dumb':
sub_cmd.owner = self.owner
sub_cmd.group = self.group
# If we're going to need to run this command again, tell it to
# keep its temporary files around so subsequent runs go faster.
if cmd_name in commands[i+1:]:
sub_cmd.keep_temp = 1
self.run_command(cmd_name)
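# Typical invocations of the 'bdist' command driven by the table above:
#
#     python setup.py bdist                      # default format for os.name
#     python setup.py bdist --formats=gztar,zip  # runs bdist_dumb twice
#     python setup.py bdist --help-formats       # pretty-prints the table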
"""distutils.command.bdist_dumb
Implements the Distutils 'bdist_dumb' command (create a "dumb" built
distribution -- i.e., just an archive to be unpacked under $prefix or
$exec_prefix)."""
__revision__ = "$Id$"
import os
from sysconfig import get_python_version
from distutils.util import get_platform
from distutils.core import Command
from distutils.dir_util import remove_tree, ensure_relative
from distutils.errors import DistutilsPlatformError
from distutils import log
class bdist_dumb (Command):
description = 'create a "dumb" built distribution'
user_options = [('bdist-dir=', 'd',
"temporary directory for creating the distribution"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('format=', 'f',
"archive format to create (tar, ztar, gztar, zip)"),
('keep-temp', 'k',
"keep the pseudo-installation tree around after " +
"creating the distribution archive"),
('dist-dir=', 'd',
"directory to put final built distributions in"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('relative', None,
"build the archive using relative paths "
"(default: false)"),
('owner=', 'u',
"Owner name used when creating a tar file"
" [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file"
" [default: current group]"),
]
boolean_options = ['keep-temp', 'skip-build', 'relative']
default_format = { 'posix': 'gztar',
'nt': 'zip',
'os2': 'zip' }
def initialize_options (self):
self.bdist_dir = None
self.plat_name = None
self.format = None
self.keep_temp = 0
self.dist_dir = None
self.skip_build = None
self.relative = 0
self.owner = None
self.group = None
def finalize_options(self):
if self.bdist_dir is None:
bdist_base = self.get_finalized_command('bdist').bdist_base
self.bdist_dir = os.path.join(bdist_base, 'dumb')
if self.format is None:
try:
self.format = self.default_format[os.name]
except KeyError:
raise DistutilsPlatformError, \
("don't know how to create dumb built distributions " +
"on platform %s") % os.name
self.set_undefined_options('bdist',
('dist_dir', 'dist_dir'),
('plat_name', 'plat_name'),
('skip_build', 'skip_build'))
def run(self):
if not self.skip_build:
self.run_command('build')
install = self.reinitialize_command('install', reinit_subcommands=1)
install.root = self.bdist_dir
install.skip_build = self.skip_build
install.warn_dir = 0
log.info("installing to %s" % self.bdist_dir)
self.run_command('install')
# And make an archive relative to the root of the
# pseudo-installation tree.
archive_basename = "%s.%s" % (self.distribution.get_fullname(),
self.plat_name)
# OS/2 objects to any ":" characters in a filename (such as when
# a timestamp is used in a version) so change them to hyphens.
if os.name == "os2":
archive_basename = archive_basename.replace(":", "-")
pseudoinstall_root = os.path.join(self.dist_dir, archive_basename)
if not self.relative:
archive_root = self.bdist_dir
else:
if (self.distribution.has_ext_modules() and
(install.install_base != install.install_platbase)):
raise DistutilsPlatformError, \
("can't make a dumb built distribution where "
"base and platbase are different (%s, %s)"
% (repr(install.install_base),
repr(install.install_platbase)))
else:
archive_root = os.path.join(self.bdist_dir,
ensure_relative(install.install_base))
# Make the archive
filename = self.make_archive(pseudoinstall_root,
self.format, root_dir=archive_root,
owner=self.owner, group=self.group)
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
self.distribution.dist_files.append(('bdist_dumb', pyversion,
filename))
if not self.keep_temp:
remove_tree(self.bdist_dir, dry_run=self.dry_run)
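# Example: build a relocatable "dumb" archive (standard usage):
#
#     python setup.py bdist_dumb --format=gztar --relative
#
# With --relative the archive root is the install base inside the
# pseudo-installation tree, so the archive unpacks under any prefix
# instead of reproducing absolute paths from the build machine.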
"""distutils.command.bdist_rpm
Implements the Distutils 'bdist_rpm' command (create RPM source and binary
distributions)."""
__revision__ = "$Id$"
import sys
import os
import string
from distutils.core import Command
from distutils.debug import DEBUG
from distutils.file_util import write_file
from distutils.sysconfig import get_python_version
from distutils.errors import (DistutilsOptionError, DistutilsPlatformError,
DistutilsFileError, DistutilsExecError)
from distutils import log
class bdist_rpm (Command):
description = "create an RPM distribution"
user_options = [
('bdist-base=', None,
"base directory for creating built distributions"),
('rpm-base=', None,
"base directory for creating RPMs (defaults to \"rpm\" under "
"--bdist-base; must be specified for RPM 2)"),
('dist-dir=', 'd',
"directory to put final RPM files in "
"(and .spec files if --spec-only)"),
('python=', None,
"path to Python interpreter to hard-code in the .spec file "
"(default: \"python\")"),
('fix-python', None,
"hard-code the exact path to the current Python interpreter in "
"the .spec file"),
('spec-only', None,
"only regenerate spec file"),
('source-only', None,
"only generate source RPM"),
('binary-only', None,
"only generate binary RPM"),
('use-bzip2', None,
"use bzip2 instead of gzip to create source distribution"),
# More meta-data: too RPM-specific to put in the setup script,
# but needs to go in the .spec file -- so we make these options
# to "bdist_rpm". The idea is that packagers would put this
# info in setup.cfg, although they are of course free to
# supply it on the command line.
('distribution-name=', None,
"name of the (Linux) distribution to which this "
"RPM applies (*not* the name of the module distribution!)"),
('group=', None,
"package classification [default: \"Development/Libraries\"]"),
('release=', None,
"RPM release number"),
('serial=', None,
"RPM serial number"),
('vendor=', None,
"RPM \"vendor\" (eg. \"Joe Blow <[email protected]>\") "
"[default: maintainer or author from setup script]"),
('packager=', None,
"RPM packager (eg. \"Jane Doe <[email protected]>\") "
"[default: vendor]"),
('doc-files=', None,
"list of documentation files (space or comma-separated)"),
('changelog=', None,
"RPM changelog"),
('icon=', None,
"name of icon file"),
('provides=', None,
"capabilities provided by this package"),
('requires=', None,
"capabilities required by this package"),
('conflicts=', None,
"capabilities which conflict with this package"),
('build-requires=', None,
"capabilities required to build this package"),
('obsoletes=', None,
"capabilities made obsolete by this package"),
('no-autoreq', None,
"do not automatically calculate dependencies"),
# Actions to take when building RPM
('keep-temp', 'k',
"don't clean up RPM build directory"),
('no-keep-temp', None,
"clean up RPM build directory [default]"),
('use-rpm-opt-flags', None,
"compile with RPM_OPT_FLAGS when building from source RPM"),
('no-rpm-opt-flags', None,
"do not pass any RPM CFLAGS to compiler"),
('rpm3-mode', None,
"RPM 3 compatibility mode (default)"),
('rpm2-mode', None,
"RPM 2 compatibility mode"),
# Add the hooks necessary for specifying custom scripts
('prep-script=', None,
"Specify a script for the PREP phase of RPM building"),
('build-script=', None,
"Specify a script for the BUILD phase of RPM building"),
('pre-install=', None,
"Specify a script for the pre-INSTALL phase of RPM building"),
('install-script=', None,
"Specify a script for the INSTALL phase of RPM building"),
('post-install=', None,
"Specify a script for the post-INSTALL phase of RPM building"),
('pre-uninstall=', None,
"Specify a script for the pre-UNINSTALL phase of RPM building"),
('post-uninstall=', None,
"Specify a script for the post-UNINSTALL phase of RPM building"),
('clean-script=', None,
"Specify a script for the CLEAN phase of RPM building"),
('verify-script=', None,
"Specify a script for the VERIFY phase of the RPM build"),
# Allow a packager to explicitly force an architecture
('force-arch=', None,
"Force an architecture onto the RPM build process"),
('quiet', 'q',
"Run the INSTALL phase of RPM building in quiet mode"),
]
boolean_options = ['keep-temp', 'use-rpm-opt-flags', 'rpm3-mode',
'no-autoreq', 'quiet']
negative_opt = {'no-keep-temp': 'keep-temp',
'no-rpm-opt-flags': 'use-rpm-opt-flags',
'rpm2-mode': 'rpm3-mode'}
def initialize_options (self):
self.bdist_base = None
self.rpm_base = None
self.dist_dir = None
self.python = None
self.fix_python = None
self.spec_only = None
self.binary_only = None
self.source_only = None
self.use_bzip2 = None
self.distribution_name = None
self.group = None
self.release = None
self.serial = None
self.vendor = None
self.packager = None
self.doc_files = None
self.changelog = None
self.icon = None
self.prep_script = None
self.build_script = None
self.install_script = None
self.clean_script = None
self.verify_script = None
self.pre_install = None
self.post_install = None
self.pre_uninstall = None
self.post_uninstall = None
self.prep = None
self.provides = None
self.requires = None
self.conflicts = None
self.build_requires = None
self.obsoletes = None
self.keep_temp = 0
self.use_rpm_opt_flags = 1
self.rpm3_mode = 1
self.no_autoreq = 0
self.force_arch = None
self.quiet = 0
# initialize_options()
def finalize_options (self):
self.set_undefined_options('bdist', ('bdist_base', 'bdist_base'))
if self.rpm_base is None:
if not self.rpm3_mode:
raise DistutilsOptionError, \
"you must specify --rpm-base in RPM 2 mode"
self.rpm_base = os.path.join(self.bdist_base, "rpm")
if self.python is None:
if self.fix_python:
self.python = sys.executable
else:
self.python = "python"
elif self.fix_python:
raise DistutilsOptionError, \
"--python and --fix-python are mutually exclusive options"
if os.name != 'posix':
raise DistutilsPlatformError, \
("don't know how to create RPM "
"distributions on platform %s" % os.name)
if self.binary_only and self.source_only:
raise DistutilsOptionError, \
"cannot supply both '--source-only' and '--binary-only'"
# don't pass CFLAGS to pure python distributions
if not self.distribution.has_ext_modules():
self.use_rpm_opt_flags = 0
self.set_undefined_options('bdist', ('dist_dir', 'dist_dir'))
self.finalize_package_data()
# finalize_options()
def finalize_package_data (self):
self.ensure_string('group', "Development/Libraries")
self.ensure_string('vendor',
"%s <%s>" % (self.distribution.get_contact(),
self.distribution.get_contact_email()))
self.ensure_string('packager')
self.ensure_string_list('doc_files')
if isinstance(self.doc_files, list):
for readme in ('README', 'README.txt'):
if os.path.exists(readme) and readme not in self.doc_files:
self.doc_files.append(readme)
self.ensure_string('release', "1")
self.ensure_string('serial') # should it be an int?
self.ensure_string('distribution_name')
self.ensure_string('changelog')
# Format changelog correctly
self.changelog = self._format_changelog(self.changelog)
self.ensure_filename('icon')
self.ensure_filename('prep_script')
self.ensure_filename('build_script')
self.ensure_filename('install_script')
self.ensure_filename('clean_script')
self.ensure_filename('verify_script')
self.ensure_filename('pre_install')
self.ensure_filename('post_install')
self.ensure_filename('pre_uninstall')
self.ensure_filename('post_uninstall')
# XXX don't forget we punted on summaries and descriptions -- they
# should be handled here eventually!
# Now *this* is some meta-data that belongs in the setup script...
self.ensure_string_list('provides')
self.ensure_string_list('requires')
self.ensure_string_list('conflicts')
self.ensure_string_list('build_requires')
self.ensure_string_list('obsoletes')
self.ensure_string('force_arch')
# finalize_package_data ()
def run (self):
if DEBUG:
print "before _get_package_data():"
print "vendor =", self.vendor
print "packager =", self.packager
print "doc_files =", self.doc_files
print "changelog =", self.changelog
# make directories
if self.spec_only:
spec_dir = self.dist_dir
self.mkpath(spec_dir)
else:
rpm_dir = {}
for d in ('SOURCES', 'SPECS', 'BUILD', 'RPMS', 'SRPMS'):
rpm_dir[d] = os.path.join(self.rpm_base, d)
self.mkpath(rpm_dir[d])
spec_dir = rpm_dir['SPECS']
# Spec file goes into 'dist_dir' if '--spec-only' was specified,
# into build/rpm.<plat> otherwise.
spec_path = os.path.join(spec_dir,
"%s.spec" % self.distribution.get_name())
self.execute(write_file,
(spec_path,
self._make_spec_file()),
"writing '%s'" % spec_path)
if self.spec_only: # stop if requested
return
# Make a source distribution and copy to SOURCES directory with
# optional icon.
saved_dist_files = self.distribution.dist_files[:]
sdist = self.reinitialize_command('sdist')
if self.use_bzip2:
sdist.formats = ['bztar']
else:
sdist.formats = ['gztar']
self.run_command('sdist')
self.distribution.dist_files = saved_dist_files
source = sdist.get_archive_files()[0]
source_dir = rpm_dir['SOURCES']
self.copy_file(source, source_dir)
if self.icon:
if os.path.exists(self.icon):
self.copy_file(self.icon, source_dir)
else:
raise DistutilsFileError, \
"icon file '%s' does not exist" % self.icon
# build package
log.info("building RPMs")
rpm_cmd = ['rpm']
if os.path.exists('/usr/bin/rpmbuild') or \
os.path.exists('/bin/rpmbuild'):
rpm_cmd = ['rpmbuild']
if self.source_only: # what kind of RPMs?
rpm_cmd.append('-bs')
elif self.binary_only:
rpm_cmd.append('-bb')
else:
rpm_cmd.append('-ba')
if self.rpm3_mode:
rpm_cmd.extend(['--define',
'_topdir %s' % os.path.abspath(self.rpm_base)])
if not self.keep_temp:
rpm_cmd.append('--clean')
if self.quiet:
rpm_cmd.append('--quiet')
rpm_cmd.append(spec_path)
# Determine the binary rpm names that should be built out of this
# spec file.  Note that some of these may not really be built (if
# the corresponding file list is empty).
nvr_string = "%{name}-%{version}-%{release}"
src_rpm = nvr_string + ".src.rpm"
non_src_rpm = "%{arch}/" + nvr_string + ".%{arch}.rpm"
q_cmd = r"rpm -q --qf '%s %s\n' --specfile '%s'" % (
src_rpm, non_src_rpm, spec_path)
out = os.popen(q_cmd)
try:
binary_rpms = []
source_rpm = None
while 1:
line = out.readline()
if not line:
break
l = string.split(string.strip(line))
assert(len(l) == 2)
binary_rpms.append(l[1])
# The source rpm is named after the first entry in the spec file
if source_rpm is None:
source_rpm = l[0]
status = out.close()
if status:
raise DistutilsExecError("Failed to execute: %s" % repr(q_cmd))
finally:
out.close()
self.spawn(rpm_cmd)
if not self.dry_run:
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
if not self.binary_only:
srpm = os.path.join(rpm_dir['SRPMS'], source_rpm)
assert(os.path.exists(srpm))
self.move_file(srpm, self.dist_dir)
filename = os.path.join(self.dist_dir, source_rpm)
self.distribution.dist_files.append(
('bdist_rpm', pyversion, filename))
if not self.source_only:
for rpm in binary_rpms:
rpm = os.path.join(rpm_dir['RPMS'], rpm)
if os.path.exists(rpm):
self.move_file(rpm, self.dist_dir)
filename = os.path.join(self.dist_dir,
os.path.basename(rpm))
self.distribution.dist_files.append(
('bdist_rpm', pyversion, filename))
# run()
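# The resulting rpmbuild invocation looks roughly like this (paths are
# illustrative, for a package named 'spam' built in rpm3 mode):
#
#     rpmbuild -ba --define '_topdir /abs/path/build/bdist.linux-x86_64/rpm' \
#              --clean build/bdist.linux-x86_64/rpm/SPECS/spam.spec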
def _dist_path(self, path):
return os.path.join(self.dist_dir, os.path.basename(path))
def _make_spec_file(self):
"""Generate the text of an RPM spec file and return it as a
list of strings (one per line).
"""
# definitions and headers
spec_file = [
'%define name ' + self.distribution.get_name(),
'%define version ' + self.distribution.get_version().replace('-','_'),
'%define unmangled_version ' + self.distribution.get_version(),
'%define release ' + self.release.replace('-','_'),
'',
'Summary: ' + self.distribution.get_description(),
]
# put locale summaries into spec file
# XXX not supported for now (hard to put a dictionary
# in a config file -- arg!)
#for locale in self.summaries.keys():
# spec_file.append('Summary(%s): %s' % (locale,
# self.summaries[locale]))
spec_file.extend([
'Name: %{name}',
'Version: %{version}',
'Release: %{release}',])
# XXX yuck! this filename is available from the "sdist" command,
# but only after it has run: and we create the spec file before
# running "sdist", in case of --spec-only.
if self.use_bzip2:
spec_file.append('Source0: %{name}-%{unmangled_version}.tar.bz2')
else:
spec_file.append('Source0: %{name}-%{unmangled_version}.tar.gz')
spec_file.extend([
'License: ' + self.distribution.get_license(),
'Group: ' + self.group,
'BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-buildroot',
'Prefix: %{_prefix}', ])
if not self.force_arch:
# noarch if no extension modules
if not self.distribution.has_ext_modules():
spec_file.append('BuildArch: noarch')
else:
spec_file.append( 'BuildArch: %s' % self.force_arch )
for field in ('Vendor',
'Packager',
'Provides',
'Requires',
'Conflicts',
'Obsoletes',
):
val = getattr(self, string.lower(field))
if isinstance(val, list):
spec_file.append('%s: %s' % (field, string.join(val)))
elif val is not None:
spec_file.append('%s: %s' % (field, val))
if self.distribution.get_url() != 'UNKNOWN':
spec_file.append('Url: ' + self.distribution.get_url())
if self.distribution_name:
spec_file.append('Distribution: ' + self.distribution_name)
if self.build_requires:
spec_file.append('BuildRequires: ' +
string.join(self.build_requires))
if self.icon:
spec_file.append('Icon: ' + os.path.basename(self.icon))
if self.no_autoreq:
spec_file.append('AutoReq: 0')
spec_file.extend([
'',
'%description',
self.distribution.get_long_description()
])
# put locale descriptions into spec file
# XXX again, suppressed because config file syntax doesn't
# easily support this ;-(
#for locale in self.descriptions.keys():
# spec_file.extend([
# '',
# '%description -l ' + locale,
# self.descriptions[locale],
# ])
# rpm scripts
# figure out default build script
def_setup_call = "%s %s" % (self.python,os.path.basename(sys.argv[0]))
def_build = "%s build" % def_setup_call
if self.use_rpm_opt_flags:
def_build = 'env CFLAGS="$RPM_OPT_FLAGS" ' + def_build
# insert contents of files
# XXX this is kind of misleading: user-supplied options are files
# that we open and interpolate into the spec file, but the defaults
# are just text that we drop in as-is. Hmmm.
install_cmd = ('%s install -O1 --root=$RPM_BUILD_ROOT '
'--record=INSTALLED_FILES') % def_setup_call
script_options = [
('prep', 'prep_script', "%setup -n %{name}-%{unmangled_version}"),
('build', 'build_script', def_build),
('install', 'install_script', install_cmd),
('clean', 'clean_script', "rm -rf $RPM_BUILD_ROOT"),
('verifyscript', 'verify_script', None),
('pre', 'pre_install', None),
('post', 'post_install', None),
('preun', 'pre_uninstall', None),
('postun', 'post_uninstall', None),
]
for (rpm_opt, attr, default) in script_options:
# Insert the contents of the file referred to, if one is named;
# otherwise fall back to 'default' as the contents of the script.
val = getattr(self, attr)
if val or default:
spec_file.extend([
'',
'%' + rpm_opt,])
if val:
spec_file.extend(string.split(open(val, 'r').read(), '\n'))
else:
spec_file.append(default)
# files section
spec_file.extend([
'',
'%files -f INSTALLED_FILES',
'%defattr(-,root,root)',
])
if self.doc_files:
spec_file.append('%doc ' + string.join(self.doc_files))
if self.changelog:
spec_file.extend([
'',
'%changelog',])
spec_file.extend(self.changelog)
return spec_file
# _make_spec_file ()
def _format_changelog(self, changelog):
"""Format the changelog correctly and convert it to a list of strings
"""
if not changelog:
return changelog
new_changelog = []
for line in string.split(string.strip(changelog), '\n'):
line = string.strip(line)
if line[0] == '*':
new_changelog.extend(['', line])
elif line[0] == '-':
new_changelog.append(line)
else:
new_changelog.append(' ' + line)
# strip trailing newline inserted by first changelog entry
if not new_changelog[0]:
del new_changelog[0]
return new_changelog
# _format_changelog()
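# Example of the normalization performed above (input is illustrative):
#
#     _format_changelog("* Mon Jul 09 2001 Jane Doe <[email protected]>\n"
#                       "- fixed the frobnicator\n"
#                       "second line of the entry")
#
# returns
#
#     ['* Mon Jul 09 2001 Jane Doe <[email protected]>',
#      '- fixed the frobnicator',
#      '  second line of the entry']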
# class bdist_rpm
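# Packagers usually supply the RPM-specific options above via setup.cfg
# rather than the command line, e.g. (values illustrative):
#
#     [bdist_rpm]
#     release = 1
#     group = Development/Libraries
#     doc_files = README CHANGES.txt
#     requires = python >= 2.7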
"""distutils.command.bdist_wininst
Implements the Distutils 'bdist_wininst' command: create a windows installer
exe-program."""
__revision__ = "$Id$"
import sys
import os
import string
from sysconfig import get_python_version
from distutils.core import Command
from distutils.dir_util import remove_tree
from distutils.errors import DistutilsOptionError, DistutilsPlatformError
from distutils import log
from distutils.util import get_platform
class bdist_wininst (Command):
description = "create an executable installer for MS Windows"
user_options = [('bdist-dir=', None,
"temporary directory for creating the distribution"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('keep-temp', 'k',
"keep the pseudo-installation tree around after " +
"creating the distribution archive"),
('target-version=', None,
"require a specific python version" +
" on the target system"),
('no-target-compile', 'c',
"do not compile .py to .pyc on the target system"),
('no-target-optimize', 'o',
"do not compile .py to .pyo (optimized) "
"on the target system"),
('dist-dir=', 'd',
"directory to put final built distributions in"),
('bitmap=', 'b',
"bitmap to use for the installer instead of python-powered logo"),
('title=', 't',
"title to display on the installer background instead of default"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('install-script=', None,
"basename of installation script to be run after "
"installation or before deinstallation"),
('pre-install-script=', None,
"Fully qualified filename of a script to be run before "
"any files are installed. This script need not be in the "
"distribution"),
('user-access-control=', None,
"specify Vista's UAC handling - 'none'/default=no "
"handling, 'auto'=use UAC if target Python installed for "
"all users, 'force'=always use UAC"),
]
boolean_options = ['keep-temp', 'no-target-compile', 'no-target-optimize',
'skip-build']
def initialize_options (self):
self.bdist_dir = None
self.plat_name = None
self.keep_temp = 0
self.no_target_compile = 0
self.no_target_optimize = 0
self.target_version = None
self.dist_dir = None
self.bitmap = None
self.title = None
self.skip_build = None
self.install_script = None
self.pre_install_script = None
self.user_access_control = None
# initialize_options()
def finalize_options (self):
self.set_undefined_options('bdist', ('skip_build', 'skip_build'))
if self.bdist_dir is None:
if self.skip_build and self.plat_name:
# If build is skipped and plat_name is overridden, bdist will
# not see the correct 'plat_name' - so set that up manually.
bdist = self.distribution.get_command_obj('bdist')
bdist.plat_name = self.plat_name
# next the command will be initialized using that name
bdist_base = self.get_finalized_command('bdist').bdist_base
self.bdist_dir = os.path.join(bdist_base, 'wininst')
if not self.target_version:
self.target_version = ""
if not self.skip_build and self.distribution.has_ext_modules():
short_version = get_python_version()
if self.target_version and self.target_version != short_version:
raise DistutilsOptionError, \
"target version can only be %s, or the '--skip-build'" \
" option must be specified" % (short_version,)
self.target_version = short_version
self.set_undefined_options('bdist',
('dist_dir', 'dist_dir'),
('plat_name', 'plat_name'),
)
if self.install_script:
for script in self.distribution.scripts:
if self.install_script == os.path.basename(script):
break
else:
raise DistutilsOptionError, \
"install_script '%s' not found in scripts" % \
self.install_script
# finalize_options()
def run (self):
if (sys.platform != "win32" and
(self.distribution.has_ext_modules() or
self.distribution.has_c_libraries())):
raise DistutilsPlatformError \
("distribution contains extensions and/or C libraries; "
"must be compiled on a Windows 32 platform")
if not self.skip_build:
self.run_command('build')
install = self.reinitialize_command('install', reinit_subcommands=1)
install.root = self.bdist_dir
install.skip_build = self.skip_build
install.warn_dir = 0
install.plat_name = self.plat_name
install_lib = self.reinitialize_command('install_lib')
# we do not want to include pyc or pyo files
install_lib.compile = 0
install_lib.optimize = 0
if self.distribution.has_ext_modules():
# If we are building an installer for a Python version other
# than the one we are currently running, then we need to ensure
# our build_lib reflects the other Python version rather than ours.
# Note that for target_version!=sys.version, we must have skipped the
# build step, so there is no issue with enforcing the build of this
# version.
target_version = self.target_version
if not target_version:
assert self.skip_build, "Should have already checked this"
target_version = sys.version[0:3]
plat_specifier = ".%s-%s" % (self.plat_name, target_version)
build = self.get_finalized_command('build')
build.build_lib = os.path.join(build.build_base,
'lib' + plat_specifier)
# Use a custom scheme for the zip-file, because we have to decide
# at installation time which scheme to use.
for key in ('purelib', 'platlib', 'headers', 'scripts', 'data'):
value = string.upper(key)
if key == 'headers':
value = value + '/Include/$dist_name'
setattr(install,
'install_' + key,
value)
log.info("installing to %s", self.bdist_dir)
install.ensure_finalized()
# avoid warning of 'install_lib' about installing
# into a directory not in sys.path
sys.path.insert(0, os.path.join(self.bdist_dir, 'PURELIB'))
install.run()
del sys.path[0]
# And make an archive relative to the root of the
# pseudo-installation tree.
from tempfile import mktemp
archive_basename = mktemp()
fullname = self.distribution.get_fullname()
arcname = self.make_archive(archive_basename, "zip",
root_dir=self.bdist_dir)
# create an exe containing the zip-file
self.create_exe(arcname, fullname, self.bitmap)
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
self.distribution.dist_files.append(('bdist_wininst', pyversion,
self.get_installer_filename(fullname)))
# remove the zip-file again
log.debug("removing temporary file '%s'", arcname)
os.remove(arcname)
if not self.keep_temp:
remove_tree(self.bdist_dir, dry_run=self.dry_run)
# run()
def get_inidata (self):
# Return data describing the installation.
lines = []
metadata = self.distribution.metadata
# Write the [metadata] section.
lines.append("[metadata]")
# 'info' will be displayed in the installer's dialog box,
# describing the items to be installed.
info = (metadata.long_description or '') + '\n'
# Escape newline characters
def escape(s):
return string.replace(s, "\n", "\\n")
for name in ["author", "author_email", "description", "maintainer",
"maintainer_email", "name", "url", "version"]:
data = getattr(metadata, name, "")
if data:
info = info + ("\n %s: %s" % \
(string.capitalize(name), escape(data)))
lines.append("%s=%s" % (name, escape(data)))
# The [setup] section contains entries controlling
# the installer runtime.
lines.append("\n[Setup]")
if self.install_script:
lines.append("install_script=%s" % self.install_script)
lines.append("info=%s" % escape(info))
lines.append("target_compile=%d" % (not self.no_target_compile))
lines.append("target_optimize=%d" % (not self.no_target_optimize))
if self.target_version:
lines.append("target_version=%s" % self.target_version)
if self.user_access_control:
lines.append("user_access_control=%s" % self.user_access_control)
title = self.title or self.distribution.get_fullname()
lines.append("title=%s" % escape(title))
import time
import distutils
build_info = "Built %s with distutils-%s" % \
(time.ctime(time.time()), distutils.__version__)
lines.append("build_info=%s" % build_info)
return string.join(lines, "\n")
# get_inidata()
def create_exe (self, arcname, fullname, bitmap=None):
import struct
self.mkpath(self.dist_dir)
cfgdata = self.get_inidata()
installer_name = self.get_installer_filename(fullname)
self.announce("creating %s" % installer_name)
if bitmap:
bitmapdata = open(bitmap, "rb").read()
bitmaplen = len(bitmapdata)
else:
bitmaplen = 0
file = open(installer_name, "wb")
file.write(self.get_exe_bytes())
if bitmap:
file.write(bitmapdata)
# Convert cfgdata from unicode to ascii, mbcs encoded
try:
unicode
except NameError:
pass
else:
if isinstance(cfgdata, unicode):
cfgdata = cfgdata.encode("mbcs")
# Append the pre-install script
cfgdata = cfgdata + "\0"
if self.pre_install_script:
script_data = open(self.pre_install_script, "r").read()
cfgdata = cfgdata + script_data + "\n\0"
else:
# empty pre-install script
cfgdata = cfgdata + "\0"
file.write(cfgdata)
# The 'magic number' 0x1234567B is used to make sure that the
# binary layout of 'cfgdata' is what the wininst.exe binary
# expects. If the layout changes, increment that number, make
# the corresponding changes to the wininst.exe sources, and
# recompile them.
header = struct.pack("<iii",
0x1234567B, # tag
len(cfgdata), # length
bitmaplen, # number of bytes in bitmap
)
file.write(header)
file.write(open(arcname, "rb").read())
# create_exe()
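# Informal byte layout of the installer produced above:
#
#     [wininst-x.y.exe stub][optional bitmap][cfgdata, NUL-terminated,
#      followed by the (possibly empty) pre-install script and a NUL]
#     [12-byte struct header][zip archive of the pseudo-install tree]
#
# The header is struct.pack("<iii", 0x1234567B, len(cfgdata), bitmaplen),
# which the wininst stub is expected to use to locate and validate the
# embedded data.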
def get_installer_filename(self, fullname):
# Factored out to allow overriding in subclasses
if self.target_version:
# if we create an installer for a specific python version,
# it's better to include this in the name
installer_name = os.path.join(self.dist_dir,
"%s.%s-py%s.exe" %
(fullname, self.plat_name, self.target_version))
else:
installer_name = os.path.join(self.dist_dir,
"%s.%s.exe" % (fullname, self.plat_name))
return installer_name
# get_installer_filename()
def get_exe_bytes (self):
from distutils.msvccompiler import get_build_version
# If a target-version other than the current version has been
# specified, then using the MSVC version from *this* build is no good.
# Without actually finding and executing the target version and parsing
# its sys.version, we just hard-code our knowledge of old versions.
# NOTE: Possible alternative is to allow "--target-version" to
# specify a Python executable rather than a simple version string.
# We can then execute this program to obtain any info we need, such
# as the real sys.version string for the build.
cur_version = get_python_version()
if self.target_version and self.target_version != cur_version:
# If the target version is *later* than us, then we assume they
# use what we use
# string compares seem wrong, but are what sysconfig.py itself uses
if self.target_version > cur_version:
bv = get_build_version()
else:
if self.target_version < "2.4":
bv = 6.0
else:
bv = 7.1
else:
# for current version - use authoritative check.
bv = get_build_version()
# wininst-x.y.exe is in the same directory as this file
directory = os.path.dirname(__file__)
# we must use a wininst-x.y.exe built with the same C compiler
# used for python. XXX What about mingw, borland, and so on?
# if plat_name starts with "win" but is not "win32"
# we want to strip "win" and leave the rest (e.g. -amd64)
# for all other cases, we don't want any suffix
if self.plat_name != 'win32' and self.plat_name[:3] == 'win':
sfix = self.plat_name[3:]
else:
sfix = ''
filename = os.path.join(directory, "wininst-%.1f%s.exe" % (bv, sfix))
f = open(filename, "rb")
try:
return f.read()
finally:
f.close()
# class bdist_wininst
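# Typical usage (must run on win32 when the distribution contains
# compiled extensions or C libraries; the title and script name here
# are illustrative):
#
#     python setup.py bdist_wininst --title="Spam 1.0" \
#         --target-version=2.7 --install-script=postinstall.py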
"""distutils.command.build
Implements the Distutils 'build' command."""
__revision__ = "$Id$"
import sys, os
from distutils.util import get_platform
from distutils.core import Command
from distutils.errors import DistutilsOptionError
def show_compilers():
from distutils.ccompiler import show_compilers
show_compilers()
class build(Command):
description = "build everything needed to install"
user_options = [
('build-base=', 'b',
"base directory for build library"),
('build-purelib=', None,
"build directory for platform-neutral distributions"),
('build-platlib=', None,
"build directory for platform-specific distributions"),
('build-lib=', None,
"build directory for all distribution (defaults to either " +
"build-purelib or build-platlib"),
('build-scripts=', None,
"build directory for scripts"),
('build-temp=', 't',
"temporary build directory"),
('plat-name=', 'p',
"platform name to build for, if supported "
"(default: %s)" % get_platform()),
('compiler=', 'c',
"specify the compiler type"),
('debug', 'g',
"compile extensions and libraries with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('executable=', 'e',
"specify final destination interpreter path (build.py)"),
]
boolean_options = ['debug', 'force']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options(self):
self.build_base = 'build'
# these are decided only after 'build_base' has its final value
# (unless overridden by the user or client)
self.build_purelib = None
self.build_platlib = None
self.build_lib = None
self.build_temp = None
self.build_scripts = None
self.compiler = None
self.plat_name = None
self.debug = None
self.force = 0
self.executable = None
def finalize_options(self):
if self.plat_name is None:
self.plat_name = get_platform()
else:
# plat-name only supported for windows (other platforms are
# supported via ./configure flags, if at all). Avoid misleading
# other platforms.
if os.name != 'nt':
raise DistutilsOptionError(
"--plat-name only supported on Windows (try "
"using './configure --help' on your platform)")
plat_specifier = ".%s-%s" % (self.plat_name, sys.version[0:3])
# Make it so Python 2.x and Python 2.x with --with-pydebug don't
# share the same build directories. Doing so confuses the build
# process for C modules
if hasattr(sys, 'gettotalrefcount'):
plat_specifier += '-pydebug'
# 'build_purelib' and 'build_platlib' just default to 'lib' and
# 'lib.<plat>' under the base build directory.  We only use one of
# them for a given distribution, though -- see 'build_lib' below.
if self.build_purelib is None:
self.build_purelib = os.path.join(self.build_base, 'lib')
if self.build_platlib is None:
self.build_platlib = os.path.join(self.build_base,
'lib' + plat_specifier)
# 'build_lib' is the actual directory that we will use for this
# particular module distribution -- if user didn't supply it, pick
# one of 'build_purelib' or 'build_platlib'.
if self.build_lib is None:
if self.distribution.ext_modules:
self.build_lib = self.build_platlib
else:
self.build_lib = self.build_purelib
# 'build_temp' -- temporary directory for compiler turds,
# "build/temp.<plat>"
if self.build_temp is None:
self.build_temp = os.path.join(self.build_base,
'temp' + plat_specifier)
if self.build_scripts is None:
self.build_scripts = os.path.join(self.build_base,
'scripts-' + sys.version[0:3])
if self.executable is None and sys.executable:
self.executable = os.path.normpath(sys.executable)
def run(self):
# Run all relevant sub-commands. This will be some subset of:
# - build_py - pure Python modules
# - build_clib - standalone C libraries
# - build_ext - Python extensions
# - build_scripts - (Python) scripts
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
# -- Predicates for the sub-command list ---------------------------
def has_pure_modules (self):
return self.distribution.has_pure_modules()
def has_c_libraries (self):
return self.distribution.has_c_libraries()
def has_ext_modules (self):
return self.distribution.has_ext_modules()
def has_scripts (self):
return self.distribution.has_scripts()
sub_commands = [('build_py', has_pure_modules),
('build_clib', has_c_libraries),
('build_ext', has_ext_modules),
('build_scripts', has_scripts),
]
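# Resulting tree for a distribution with extension modules, assuming a
# plat_specifier of ".linux-x86_64-2.7" (illustrative):
#
#     build/lib.linux-x86_64-2.7/   # build_lib == build_platlib here
#     build/temp.linux-x86_64-2.7/  # compiler by-products
#     build/scripts-2.7/            # copied (shebang-adjusted) scripts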
"""distutils.command.build_clib
Implements the Distutils 'build_clib' command, to build a C/C++ library
that is included in the module distribution and needed by an extension
module."""
__revision__ = "$Id$"
# XXX this module has *lots* of code ripped-off quite transparently from
# build_ext.py -- not surprisingly really, as the work required to build
# a static library from a collection of C source files is not really all
# that different from what's required to build a shared object file from
# a collection of C source files. Nevertheless, I haven't done the
# necessary refactoring to account for the overlap in code between the
# two modules, mainly because a number of subtle details changed in the
# cut 'n paste. Sigh.
import os
from distutils.core import Command
from distutils.errors import DistutilsSetupError
from distutils.sysconfig import customize_compiler
from distutils import log
def show_compilers():
from distutils.ccompiler import show_compilers
show_compilers()
class build_clib(Command):
description = "build C/C++ libraries used by Python extensions"
user_options = [
('build-clib=', 'b',
"directory to build C/C++ libraries to"),
('build-temp=', 't',
"directory to put temporary build by-products"),
('debug', 'g',
"compile with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('compiler=', 'c',
"specify the compiler type"),
]
boolean_options = ['debug', 'force']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options(self):
self.build_clib = None
self.build_temp = None
# List of libraries to build
self.libraries = None
# Compilation options for all libraries
self.include_dirs = None
self.define = None
self.undef = None
self.debug = None
self.force = 0
self.compiler = None
def finalize_options(self):
# This might be confusing: both build-clib and build-temp default
# to build-temp as defined by the "build" command. This is because
# I think that C libraries are really just temporary build
# by-products, at least from the point of view of building Python
# extensions -- but I want to keep my options open.
self.set_undefined_options('build',
('build_temp', 'build_clib'),
('build_temp', 'build_temp'),
('compiler', 'compiler'),
('debug', 'debug'),
('force', 'force'))
self.libraries = self.distribution.libraries
if self.libraries:
self.check_library_list(self.libraries)
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
if isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
# XXX same as for build_ext -- what about 'self.define' and
# 'self.undef' ?
def run(self):
if not self.libraries:
return
# Yech -- this is cut 'n pasted from build_ext.py!
from distutils.ccompiler import new_compiler
self.compiler = new_compiler(compiler=self.compiler,
dry_run=self.dry_run,
force=self.force)
customize_compiler(self.compiler)
if self.include_dirs is not None:
self.compiler.set_include_dirs(self.include_dirs)
if self.define is not None:
# 'define' option is a list of (name,value) tuples
for (name,value) in self.define:
self.compiler.define_macro(name, value)
if self.undef is not None:
for macro in self.undef:
self.compiler.undefine_macro(macro)
self.build_libraries(self.libraries)
def check_library_list(self, libraries):
"""Ensure that the list of libraries is valid.
'libraries' is presumably provided as the 'libraries' command option.
This method checks that it is a list of 2-tuples, where the tuples
are (library_name, build_info_dict).
Raise DistutilsSetupError if the structure is invalid anywhere;
just returns otherwise.
"""
if not isinstance(libraries, list):
raise DistutilsSetupError, \
"'libraries' option must be a list of tuples"
for lib in libraries:
if not isinstance(lib, tuple) or len(lib) != 2:
raise DistutilsSetupError, \
"each element of 'libraries' must be a 2-tuple"
name, build_info = lib
if not isinstance(name, str):
raise DistutilsSetupError, \
"first element of each tuple in 'libraries' " + \
"must be a string (the library name)"
if '/' in name or (os.sep != '/' and os.sep in name):
raise DistutilsSetupError, \
("bad library name '%s': " +
"may not contain directory separators") % \
lib[0]
if not isinstance(build_info, dict):
raise DistutilsSetupError, \
"second element of each tuple in 'libraries' " + \
"must be a dictionary (build info)"
def get_library_names(self):
# Assume the library list is valid -- 'check_library_list()' is
# called from 'finalize_options()', so it should be!
if not self.libraries:
return None
lib_names = []
for (lib_name, build_info) in self.libraries:
lib_names.append(lib_name)
return lib_names
def get_source_files(self):
self.check_library_list(self.libraries)
filenames = []
for (lib_name, build_info) in self.libraries:
sources = build_info.get('sources')
if sources is None or not isinstance(sources, (list, tuple)):
raise DistutilsSetupError, \
("in 'libraries' option (library '%s'), "
"'sources' must be present and must be "
"a list of source filenames") % lib_name
filenames.extend(sources)
return filenames
def build_libraries(self, libraries):
for (lib_name, build_info) in libraries:
sources = build_info.get('sources')
if sources is None or not isinstance(sources, (list, tuple)):
raise DistutilsSetupError, \
("in 'libraries' option (library '%s'), " +
"'sources' must be present and must be " +
"a list of source filenames") % lib_name
sources = list(sources)
log.info("building '%s' library", lib_name)
# First, compile the source code to object files in the library
# directory. (This should probably change to putting object
# files in a temporary build directory.)
macros = build_info.get('macros')
include_dirs = build_info.get('include_dirs')
objects = self.compiler.compile(sources,
output_dir=self.build_temp,
macros=macros,
include_dirs=include_dirs,
debug=self.debug)
# Now "link" the object files together into a static library.
# (On Unix at least, this isn't really linking -- it just
# builds an archive. Whatever.)
self.compiler.create_static_lib(objects, lib_name,
output_dir=self.build_clib,
debug=self.debug)
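# Hedged sketch of the setup-script side that drives build_clib (all
# names are illustrative):
#
#     from distutils.core import setup, Extension
#     setup(name='spam',
#           libraries=[('spamlib', {'sources': ['spamlib.c'],
#                                   'macros': [('NDEBUG', '1')],
#                                   'include_dirs': ['include']})],
#           ext_modules=[Extension('spam', ['spammodule.c'],
#                                  libraries=['spamlib'])])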
"""distutils.command.build_ext
Implements the Distutils 'build_ext' command, for building extension
modules (currently limited to C extensions, should accommodate C++
extensions ASAP)."""
# This module should be kept compatible with Python 2.1.
__revision__ = "$Id$"
import sys, os, string, re
from types import *
from site import USER_BASE, USER_SITE
from distutils.core import Command
from distutils.errors import *
from distutils.sysconfig import customize_compiler, get_python_version
from distutils.dep_util import newer_group
from distutils.extension import Extension
from distutils.util import get_platform
from distutils import log
if os.name == 'nt':
from distutils.msvccompiler import get_build_version
MSVC_VERSION = int(get_build_version())
# An extension name is just a dot-separated list of Python NAMEs (ie.
# the same as a fully-qualified module name).
extension_name_re = re.compile \
(r'^[a-zA-Z_][a-zA-Z_0-9]*(\.[a-zA-Z_][a-zA-Z_0-9]*)*$')
def show_compilers ():
from distutils.ccompiler import show_compilers
show_compilers()
class build_ext (Command):
description = "build C/C++ extensions (compile/link to build directory)"
# XXX thoughts on how to deal with complex command-line options like
# these, i.e. how to make it so fancy_getopt can suck them off the
# command line and make it look like setup.py defined the appropriate
# lists of tuples of what-have-you.
# - each command needs a callback to process its command-line options
# - Command.__init__() needs access to its share of the whole
# command line (must ultimately come from
# Distribution.parse_command_line())
# - it then calls the current command class' option-parsing
# callback to deal with weird options like -D, which have to
# parse the option text and churn out some custom data
# structure
# - that data structure (in this case, a list of 2-tuples)
# will then be present in the command object by the time
# we get to finalize_options() (i.e. the constructor
# takes care of both command-line and client options
# in between initialize_options() and finalize_options())
sep_by = " (separated by '%s')" % os.pathsep
user_options = [
('build-lib=', 'b',
"directory for compiled extension modules"),
('build-temp=', 't',
"directory for temporary files (build by-products)"),
('plat-name=', 'p',
"platform name to cross-compile for, if supported "
"(default: %s)" % get_platform()),
('inplace', 'i',
"ignore build-lib and put compiled extensions into the source " +
"directory alongside your pure Python modules"),
('include-dirs=', 'I',
"list of directories to search for header files" + sep_by),
('define=', 'D',
"C preprocessor macros to define"),
('undef=', 'U',
"C preprocessor macros to undefine"),
('libraries=', 'l',
"external C libraries to link with"),
('library-dirs=', 'L',
"directories to search for external C libraries" + sep_by),
('rpath=', 'R',
"directories to search for shared C libraries at runtime"),
('link-objects=', 'O',
"extra explicit link objects to include in the link"),
('debug', 'g',
"compile/link with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('compiler=', 'c',
"specify the compiler type"),
('swig-cpp', None,
"make SWIG create C++ files (default is C)"),
('swig-opts=', None,
"list of SWIG command line options"),
('swig=', None,
"path to the SWIG executable"),
('user', None,
"add user include, library and rpath"),
]
boolean_options = ['inplace', 'debug', 'force', 'swig-cpp', 'user']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options (self):
self.extensions = None
self.build_lib = None
self.plat_name = None
self.build_temp = None
self.inplace = 0
self.package = None
self.include_dirs = None
self.define = None
self.undef = None
self.libraries = None
self.library_dirs = None
self.rpath = None
self.link_objects = None
self.debug = None
self.force = None
self.compiler = None
self.swig = None
self.swig_cpp = None
self.swig_opts = None
self.user = None
def finalize_options(self):
from distutils import sysconfig
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'),
('compiler', 'compiler'),
('debug', 'debug'),
('force', 'force'),
('plat_name', 'plat_name'),
)
if self.package is None:
self.package = self.distribution.ext_package
self.extensions = self.distribution.ext_modules
# Make sure Python's include directories (for Python.h, pyconfig.h,
# etc.) are in the include search path.
py_include = sysconfig.get_python_inc()
plat_py_include = sysconfig.get_python_inc(plat_specific=1)
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
if isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
# Put the Python "system" include dir at the end, so that
# any local include dirs take precedence.
self.include_dirs.append(py_include)
if plat_py_include != py_include:
self.include_dirs.append(plat_py_include)
self.ensure_string_list('libraries')
self.ensure_string_list('link_objects')
# Life is easier if we're not forever checking for None, so
# simplify these options to empty lists if unset
if self.libraries is None:
self.libraries = []
if self.library_dirs is None:
self.library_dirs = []
elif type(self.library_dirs) is StringType:
self.library_dirs = string.split(self.library_dirs, os.pathsep)
if self.rpath is None:
self.rpath = []
elif type(self.rpath) is StringType:
self.rpath = string.split(self.rpath, os.pathsep)
# for extensions under windows use different directories
# for Release and Debug builds.
# also Python's library directory must be appended to library_dirs
if os.name == 'nt':
# the 'libs' directory is for binary installs - we assume that
# must be the *native* platform. But we don't really support
# cross-compiling via a binary install anyway, so we let it go.
self.library_dirs.append(os.path.join(sys.exec_prefix, 'libs'))
if self.debug:
self.build_temp = os.path.join(self.build_temp, "Debug")
else:
self.build_temp = os.path.join(self.build_temp, "Release")
# Append the source distribution include and library directories,
# this allows distutils on windows to work in the source tree
self.include_dirs.append(os.path.join(sys.exec_prefix, 'PC'))
if MSVC_VERSION == 9:
# Use the .lib files for the correct architecture
if self.plat_name == 'win32':
suffix = ''
else:
# win-amd64 or win-ia64
suffix = self.plat_name[4:]
# We could have been built in one of two places; add both
for d in ('PCbuild',), ('PC', 'VS9.0'):
new_lib = os.path.join(sys.exec_prefix, *d)
if suffix:
new_lib = os.path.join(new_lib, suffix)
self.library_dirs.append(new_lib)
elif MSVC_VERSION == 8:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VS8.0'))
elif MSVC_VERSION == 7:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VS7.1'))
else:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VC6'))
# OS/2 (EMX) doesn't support Debug vs Release builds, but has the
# import libraries in its "Config" subdirectory
if os.name == 'os2':
self.library_dirs.append(os.path.join(sys.exec_prefix, 'Config'))
# for extensions under Cygwin and AtheOS Python's library directory must be
# appended to library_dirs
if sys.platform[:6] == 'cygwin' or sys.platform[:6] == 'atheos':
if sys.executable.startswith(os.path.join(sys.exec_prefix, "bin")):
# building third party extensions
self.library_dirs.append(os.path.join(sys.prefix, "lib",
"python" + get_python_version(),
"config"))
else:
# building python standard extensions
self.library_dirs.append('.')
# For building extensions with a shared Python library,
# Python's library directory must be appended to library_dirs
# See Issues: #1600860, #4366
if (sysconfig.get_config_var('Py_ENABLE_SHARED')):
if not sysconfig.python_build:
# building third party extensions
self.library_dirs.append(sysconfig.get_config_var('LIBDIR'))
else:
# building python standard extensions
self.library_dirs.append('.')
# The argument parsing will result in self.define being a string, but
# it has to be a list of 2-tuples. All the preprocessor symbols
# specified by the 'define' option will be set to '1'. Multiple
# symbols can be separated with commas.
if self.define:
defines = self.define.split(',')
self.define = map(lambda symbol: (symbol, '1'), defines)
# The option for macros to undefine is also a string from the
# option parsing, but has to be a list. Multiple symbols can also
# be separated with commas here.
if self.undef:
self.undef = self.undef.split(',')
if self.swig_opts is None:
self.swig_opts = []
else:
self.swig_opts = self.swig_opts.split(' ')
# Finally add the user include and library directories if requested
if self.user:
user_include = os.path.join(USER_BASE, "include")
user_lib = os.path.join(USER_BASE, "lib")
if os.path.isdir(user_include):
self.include_dirs.append(user_include)
if os.path.isdir(user_lib):
self.library_dirs.append(user_lib)
self.rpath.append(user_lib)
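# Example of the option munging above: the command line
#
#     python setup.py build_ext --define=WITH_FOO,WITH_BAR --undef=NDEBUG
#
# leaves self.define == [('WITH_FOO', '1'), ('WITH_BAR', '1')] and
# self.undef == ['NDEBUG'] by the end of finalize_options().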
def run(self):
from distutils.ccompiler import new_compiler
# 'self.extensions', as supplied by setup.py, is a list of
# Extension instances. See the documentation for Extension (in
# distutils.extension) for details.
#
# For backwards compatibility with Distutils 0.8.2 and earlier, we
# also allow the 'extensions' list to be a list of tuples:
# (ext_name, build_info)
# where build_info is a dictionary containing everything that
# Extension instances do except the name, with a few things being
# differently named. We convert these 2-tuples to Extension
# instances as needed.
if not self.extensions:
return
# If we were asked to build any C/C++ libraries, make sure that the
# directory where we put them is in the library search path for
# linking extensions.
if self.distribution.has_c_libraries():
build_clib = self.get_finalized_command('build_clib')
self.libraries.extend(build_clib.get_library_names() or [])
self.library_dirs.append(build_clib.build_clib)
# Setup the CCompiler object that we'll use to do all the
# compiling and linking
self.compiler = new_compiler(compiler=self.compiler,
verbose=self.verbose,
dry_run=self.dry_run,
force=self.force)
customize_compiler(self.compiler)
# If we are cross-compiling, init the compiler now (if we are not
# cross-compiling, init would not hurt, but people may rely on
# late initialization of compiler even if they shouldn't...)
if os.name == 'nt' and self.plat_name != get_platform():
self.compiler.initialize(self.plat_name)
# And make sure that any compile/link-related options (which might
# come from the command-line or from the setup script) are set in
# that CCompiler object -- that way, they automatically apply to
# all compiling and linking done here.
if self.include_dirs is not None:
self.compiler.set_include_dirs(self.include_dirs)
if self.define is not None:
# 'define' option is a list of (name,value) tuples
for (name, value) in self.define:
self.compiler.define_macro(name, value)
if self.undef is not None:
for macro in self.undef:
self.compiler.undefine_macro(macro)
if self.libraries is not None:
self.compiler.set_libraries(self.libraries)
if self.library_dirs is not None:
self.compiler.set_library_dirs(self.library_dirs)
if self.rpath is not None:
self.compiler.set_runtime_library_dirs(self.rpath)
if self.link_objects is not None:
self.compiler.set_link_objects(self.link_objects)
# Now actually compile and link everything.
self.build_extensions()
def check_extensions_list(self, extensions):
"""Ensure that the list of extensions (presumably provided as a
command option 'extensions') is valid, i.e. it is a list of
Extension objects. We also support the old-style list of 2-tuples,
where the tuples are (ext_name, build_info), which are converted to
Extension instances here.
Raise DistutilsSetupError if the structure is invalid anywhere;
just returns otherwise.
"""
if not isinstance(extensions, list):
raise DistutilsSetupError, \
"'ext_modules' option must be a list of Extension instances"
for i, ext in enumerate(extensions):
if isinstance(ext, Extension):
continue # OK! (assume type-checking done
# by Extension constructor)
if not isinstance(ext, tuple) or len(ext) != 2:
raise DistutilsSetupError, \
("each element of 'ext_modules' option must be an "
"Extension instance or 2-tuple")
ext_name, build_info = ext
log.warn(("old-style (ext_name, build_info) tuple found in "
"ext_modules for extension '%s' "
"-- please convert to Extension instance" % ext_name))
if not (isinstance(ext_name, str) and
extension_name_re.match(ext_name)):
raise DistutilsSetupError, \
("first element of each tuple in 'ext_modules' "
"must be the extension name (a string)")
if not isinstance(build_info, dict):
raise DistutilsSetupError, \
("second element of each tuple in 'ext_modules' "
"must be a dictionary (build info)")
# OK, the (ext_name, build_info) dict is type-safe: convert it
# to an Extension instance.
ext = Extension(ext_name, build_info['sources'])
# Easy stuff: one-to-one mapping from dict elements to
# instance attributes.
for key in ('include_dirs', 'library_dirs', 'libraries',
'extra_objects', 'extra_compile_args',
'extra_link_args'):
val = build_info.get(key)
if val is not None:
setattr(ext, key, val)
# Medium-easy stuff: same syntax/semantics, different names.
ext.runtime_library_dirs = build_info.get('rpath')
if 'def_file' in build_info:
log.warn("'def_file' element of build info dict "
"no longer supported")
# Non-trivial stuff: 'macros' split into 'define_macros'
# and 'undef_macros'.
macros = build_info.get('macros')
if macros:
ext.define_macros = []
ext.undef_macros = []
for macro in macros:
if not (isinstance(macro, tuple) and len(macro) in (1, 2)):
raise DistutilsSetupError, \
("'macros' element of build info dict "
"must be 1- or 2-tuple")
if len(macro) == 1:
ext.undef_macros.append(macro[0])
elif len(macro) == 2:
ext.define_macros.append(macro)
extensions[i] = ext
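    # Illustrative sketch (names assumed, not part of the original source):
    # check_extensions_list() turns the deprecated 2-tuple
    #   ('demo.fast', {'sources': ['demo/fast.c'],
    #                  'macros': [('NDEBUG', '1'), ('DEBUG',)]})
    # into the equivalent
    #   Extension('demo.fast', sources=['demo/fast.c'],
    #             define_macros=[('NDEBUG', '1')], undef_macros=['DEBUG'])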
def get_source_files(self):
self.check_extensions_list(self.extensions)
filenames = []
# Wouldn't it be neat if we knew the names of header files too...
for ext in self.extensions:
filenames.extend(ext.sources)
return filenames
def get_outputs(self):
# Sanity check the 'extensions' list -- can't assume this is being
# done in the same run as a 'build_extensions()' call (in fact, we
# can probably assume that it *isn't*!).
self.check_extensions_list(self.extensions)
# And build the list of output (built) filenames. Note that this
# ignores the 'inplace' flag, and assumes everything goes in the
# "build" tree.
outputs = []
for ext in self.extensions:
outputs.append(self.get_ext_fullpath(ext.name))
return outputs
def build_extensions(self):
# First, sanity-check the 'extensions' list
self.check_extensions_list(self.extensions)
for ext in self.extensions:
self.build_extension(ext)
def build_extension(self, ext):
sources = ext.sources
if sources is None or type(sources) not in (ListType, TupleType):
raise DistutilsSetupError, \
("in 'ext_modules' option (extension '%s'), " +
"'sources' must be present and must be " +
"a list of source filenames") % ext.name
sources = list(sources)
ext_path = self.get_ext_fullpath(ext.name)
depends = sources + ext.depends
if not (self.force or newer_group(depends, ext_path, 'newer')):
log.debug("skipping '%s' extension (up-to-date)", ext.name)
return
else:
log.info("building '%s' extension", ext.name)
# First, scan the sources for SWIG definition files (.i), run
# SWIG on 'em to create .c files, and modify the sources list
# accordingly.
sources = self.swig_sources(sources, ext)
# Next, compile the source code to object files.
# XXX not honouring 'define_macros' or 'undef_macros' -- the
# CCompiler API needs to change to accommodate this, and I
# want to do one thing at a time!
# Two possible sources for extra compiler arguments:
# - 'extra_compile_args' in Extension object
# - CFLAGS environment variable (not particularly
# elegant, but people seem to expect it and I
# guess it's useful)
# The environment variable should take precedence, and
# any sensible compiler will give precedence to later
# command line args. Hence we combine them in order:
extra_args = ext.extra_compile_args or []
macros = ext.define_macros[:]
for undef in ext.undef_macros:
macros.append((undef,))
objects = self.compiler.compile(sources,
output_dir=self.build_temp,
macros=macros,
include_dirs=ext.include_dirs,
debug=self.debug,
extra_postargs=extra_args,
depends=ext.depends)
# XXX -- this is a Vile HACK!
#
# The setup.py script for Python on Unix needs to be able to
# get this list so it can perform all the clean up needed to
# avoid keeping object files around when cleaning out a failed
# build of an extension module. Since Distutils does not
# track dependencies, we have to get rid of intermediates to
# ensure all the intermediates will be properly re-built.
#
self._built_objects = objects[:]
# Now link the object files together into a "shared object" --
# of course, first we have to figure out all the other things
# that go into the mix.
if ext.extra_objects:
objects.extend(ext.extra_objects)
extra_args = ext.extra_link_args or []
# Detect target language, if not provided
language = ext.language or self.compiler.detect_language(sources)
self.compiler.link_shared_object(
objects, ext_path,
libraries=self.get_libraries(ext),
library_dirs=ext.library_dirs,
runtime_library_dirs=ext.runtime_library_dirs,
extra_postargs=extra_args,
export_symbols=self.get_export_symbols(ext),
debug=self.debug,
build_temp=self.build_temp,
target_lang=language)
def swig_sources (self, sources, extension):
"""Walk the list of source files in 'sources', looking for SWIG
interface (.i) files. Run SWIG on all that are found, and
return a modified 'sources' list with SWIG source files replaced
by the generated C (or C++) files.
"""
new_sources = []
swig_sources = []
swig_targets = {}
# XXX this drops generated C/C++ files into the source tree, which
# is fine for developers who want to distribute the generated
# source -- but there should be an option to put SWIG output in
# the temp dir.
if self.swig_cpp:
log.warn("--swig-cpp is deprecated - use --swig-opts=-c++")
if self.swig_cpp or ('-c++' in self.swig_opts) or \
('-c++' in extension.swig_opts):
target_ext = '.cpp'
else:
target_ext = '.c'
for source in sources:
(base, ext) = os.path.splitext(source)
if ext == ".i": # SWIG interface file
new_sources.append(base + '_wrap' + target_ext)
swig_sources.append(source)
swig_targets[source] = new_sources[-1]
else:
new_sources.append(source)
if not swig_sources:
return new_sources
swig = self.swig or self.find_swig()
swig_cmd = [swig, "-python"]
swig_cmd.extend(self.swig_opts)
if self.swig_cpp:
swig_cmd.append("-c++")
        # Do not override command-line arguments
if not self.swig_opts:
for o in extension.swig_opts:
swig_cmd.append(o)
for source in swig_sources:
target = swig_targets[source]
log.info("swigging %s to %s", source, target)
self.spawn(swig_cmd + ["-o", target, source])
return new_sources
# swig_sources ()
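    # Minimal sketch, assuming sources=['demo.i', 'helper.c'] and no '-c++'
    # option anywhere: swig_sources() returns ['demo_wrap.c', 'helper.c']
    # after spawning roughly "swig -python -o demo_wrap.c demo.i".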
def find_swig (self):
"""Return the name of the SWIG executable. On Unix, this is
just "swig" -- it should be in the PATH. Tries a bit harder on
Windows.
"""
if os.name == "posix":
return "swig"
elif os.name == "nt":
# Look for SWIG in its standard installation directory on
# Windows (or so I presume!). If we find it there, great;
# if not, act like Unix and assume it's in the PATH.
for vers in ("1.3", "1.2", "1.1"):
fn = os.path.join("c:\\swig%s" % vers, "swig.exe")
if os.path.isfile(fn):
return fn
else:
return "swig.exe"
elif os.name == "os2":
# assume swig available in the PATH.
return "swig.exe"
else:
raise DistutilsPlatformError, \
("I don't know how to find (much less run) SWIG "
"on platform '%s'") % os.name
# find_swig ()
# -- Name generators -----------------------------------------------
# (extension names, filenames, whatever)
def get_ext_fullpath(self, ext_name):
"""Returns the path of the filename for a given extension.
The file is located in `build_lib` or directly in the package
(inplace option).
"""
# makes sure the extension name is only using dots
all_dots = string.maketrans('/'+os.sep, '..')
ext_name = ext_name.translate(all_dots)
fullname = self.get_ext_fullname(ext_name)
modpath = fullname.split('.')
filename = self.get_ext_filename(ext_name)
filename = os.path.split(filename)[-1]
if not self.inplace:
# no further work needed
# returning :
# build_dir/package/path/filename
filename = os.path.join(*modpath[:-1]+[filename])
return os.path.join(self.build_lib, filename)
        # the inplace option requires finding the package directory;
        # we use the build_py command for that
package = '.'.join(modpath[0:-1])
build_py = self.get_finalized_command('build_py')
package_dir = os.path.abspath(build_py.get_package_dir(package))
# returning
# package_dir/filename
return os.path.join(package_dir, filename)
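    # Minimal sketch (values assumed): with build_lib='build/lib' and an
    # extension named 'pkg.mod' on Unix, get_ext_fullpath() yields
    # 'build/lib/pkg/mod.so' for a normal build, or '<pkg dir>/mod.so'
    # when the --inplace option is set.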
def get_ext_fullname(self, ext_name):
"""Returns the fullname of a given extension name.
Adds the `package.` prefix"""
if self.package is None:
return ext_name
else:
return self.package + '.' + ext_name
def get_ext_filename(self, ext_name):
r"""Convert the name of an extension (eg. "foo.bar") into the name
of the file from which it will be loaded (eg. "foo/bar.so", or
"foo\bar.pyd").
"""
from distutils.sysconfig import get_config_var
ext_path = string.split(ext_name, '.')
# OS/2 has an 8 character module (extension) limit :-(
if os.name == "os2":
ext_path[len(ext_path) - 1] = ext_path[len(ext_path) - 1][:8]
# extensions in debug_mode are named 'module_d.pyd' under windows
so_ext = get_config_var('SO')
if os.name == 'nt' and self.debug:
return os.path.join(*ext_path) + '_d' + so_ext
return os.path.join(*ext_path) + so_ext
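    # Minimal sketch: on Unix with SO='.so', get_ext_filename('foo.bar')
    # returns 'foo/bar.so'; a Windows debug build with SO='.pyd' returns
    # 'foo\bar_d.pyd' instead.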
def get_export_symbols (self, ext):
"""Return the list of symbols that a shared extension has to
export. This either uses 'ext.export_symbols' or, if it's not
provided, "init" + module_name. Only relevant on Windows, where
the .pyd file (DLL) must export the module "init" function.
"""
initfunc_name = "init" + ext.name.split('.')[-1]
if initfunc_name not in ext.export_symbols:
ext.export_symbols.append(initfunc_name)
return ext.export_symbols
def get_libraries (self, ext):
"""Return the list of libraries to link against when building a
shared extension. On most platforms, this is just 'ext.libraries';
on Windows and OS/2, we add the Python library (eg. python20.dll).
"""
# The python library is always needed on Windows. For MSVC, this
# is redundant, since the library is mentioned in a pragma in
# pyconfig.h that MSVC groks. The other Windows compilers all seem
# to need it mentioned explicitly, though, so that's what we do.
# Append '_d' to the python import library on debug builds.
if sys.platform == "win32":
from distutils.msvccompiler import MSVCCompiler
if not isinstance(self.compiler, MSVCCompiler):
template = "python%d%d"
if self.debug:
template = template + '_d'
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib]
else:
return ext.libraries
elif sys.platform == "os2emx":
# EMX/GCC requires the python library explicitly, and I
# believe VACPP does as well (though not confirmed) - AIM Apr01
template = "python%d%d"
# debug versions of the main DLL aren't supported, at least
# not at this time - AIM Apr01
#if self.debug:
# template = template + '_d'
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib]
elif sys.platform[:6] == "cygwin":
template = "python%d.%d"
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib]
elif sys.platform[:6] == "atheos":
from distutils import sysconfig
template = "python%d.%d"
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# Get SHLIBS from Makefile
extra = []
for lib in sysconfig.get_config_var('SHLIBS').split():
if lib.startswith('-l'):
extra.append(lib[2:])
else:
extra.append(lib)
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib, "m"] + extra
elif sys.platform == 'darwin':
# Don't use the default code below
return ext.libraries
elif sys.platform[:3] == 'aix':
# Don't use the default code below
return ext.libraries
else:
from distutils import sysconfig
if sysconfig.get_config_var('Py_ENABLE_SHARED'):
template = "python%d.%d"
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
return ext.libraries + [pythonlib]
else:
return ext.libraries
# class build_ext
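# A minimal sketch (project names assumed, not from the original sources)
# of a setup.py that drives the build_ext command defined above.
from distutils.core import setup, Extension

setup(name='demo',
      version='1.0',
      ext_modules=[Extension('demo.fast',
                             sources=['demo/fast.c'],
                             define_macros=[('NDEBUG', '1')],
                             undef_macros=['DEBUG'])])
# Typically run as: python setup.py build_ext --inplace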
"""distutils.command.build_py
Implements the Distutils 'build_py' command."""
__revision__ = "$Id$"
import os
import sys
from glob import glob
from distutils.core import Command
from distutils.errors import DistutilsOptionError, DistutilsFileError
from distutils.util import convert_path
from distutils import log
class build_py(Command):
description = "\"build\" pure Python modules (copy to build directory)"
user_options = [
('build-lib=', 'd', "directory to \"build\" (copy) to"),
('compile', 'c', "compile .py to .pyc"),
('no-compile', None, "don't compile .py files [default]"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
('force', 'f', "forcibly build everything (ignore file timestamps)"),
]
boolean_options = ['compile', 'force']
negative_opt = {'no-compile' : 'compile'}
def initialize_options(self):
self.build_lib = None
self.py_modules = None
self.package = None
self.package_data = None
self.package_dir = None
self.compile = 0
self.optimize = 0
self.force = None
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('force', 'force'))
# Get the distribution options that are aliases for build_py
# options -- list of packages and list of modules.
self.packages = self.distribution.packages
self.py_modules = self.distribution.py_modules
self.package_data = self.distribution.package_data
self.package_dir = {}
if self.distribution.package_dir:
for name, path in self.distribution.package_dir.items():
self.package_dir[name] = convert_path(path)
self.data_files = self.get_data_files()
# Ick, copied straight from install_lib.py (fancy_getopt needs a
# type system! Hell, *everything* needs a type system!!!)
if not isinstance(self.optimize, int):
try:
self.optimize = int(self.optimize)
assert 0 <= self.optimize <= 2
except (ValueError, AssertionError):
raise DistutilsOptionError("optimize must be 0, 1, or 2")
def run(self):
# XXX copy_file by default preserves atime and mtime. IMHO this is
# the right thing to do, but perhaps it should be an option -- in
# particular, a site administrator might want installed files to
# reflect the time of installation rather than the last
# modification time before the installed release.
# XXX copy_file by default preserves mode, which appears to be the
# wrong thing to do: if a file is read-only in the working
# directory, we want it to be installed read/write so that the next
# installation of the same module distribution can overwrite it
# without problems. (This might be a Unix-specific issue.) Thus
# we turn off 'preserve_mode' when copying to the build directory,
# since the build directory is supposed to be exactly what the
# installation will look like (ie. we preserve mode when
# installing).
# Two options control which modules will be installed: 'packages'
# and 'py_modules'. The former lets us work with whole packages, not
# specifying individual modules at all; the latter is for
# specifying modules one-at-a-time.
if self.py_modules:
self.build_modules()
if self.packages:
self.build_packages()
self.build_package_data()
self.byte_compile(self.get_outputs(include_bytecode=0))
def get_data_files(self):
"""Generate list of '(package,src_dir,build_dir,filenames)' tuples"""
data = []
if not self.packages:
return data
for package in self.packages:
# Locate package source directory
src_dir = self.get_package_dir(package)
# Compute package build directory
build_dir = os.path.join(*([self.build_lib] + package.split('.')))
# Length of path to strip from found files
plen = 0
if src_dir:
plen = len(src_dir)+1
# Strip directory from globbed filenames
filenames = [
file[plen:] for file in self.find_data_files(package, src_dir)
]
data.append((package, src_dir, build_dir, filenames))
return data
def find_data_files(self, package, src_dir):
"""Return filenames for package's data files in 'src_dir'"""
globs = (self.package_data.get('', [])
+ self.package_data.get(package, []))
files = []
for pattern in globs:
# Each pattern has to be converted to a platform-specific path
filelist = glob(os.path.join(src_dir, convert_path(pattern)))
# Files that match more than one pattern are only added once
files.extend([fn for fn in filelist if fn not in files
and os.path.isfile(fn)])
return files
def build_package_data(self):
"""Copy data files into build directory"""
for package, src_dir, build_dir, filenames in self.data_files:
for filename in filenames:
target = os.path.join(build_dir, filename)
self.mkpath(os.path.dirname(target))
self.copy_file(os.path.join(src_dir, filename), target,
preserve_mode=False)
def get_package_dir(self, package):
"""Return the directory, relative to the top of the source
distribution, where package 'package' should be found
(at least according to the 'package_dir' option, if any)."""
path = package.split('.')
if not self.package_dir:
if path:
return os.path.join(*path)
else:
return ''
else:
tail = []
while path:
try:
pdir = self.package_dir['.'.join(path)]
except KeyError:
tail.insert(0, path[-1])
del path[-1]
else:
tail.insert(0, pdir)
return os.path.join(*tail)
else:
# Oops, got all the way through 'path' without finding a
# match in package_dir. If package_dir defines a directory
                # for the root (nameless) package, then fall back on it;
# otherwise, we might as well have not consulted
# package_dir at all, as we just use the directory implied
# by 'tail' (which should be the same as the original value
# of 'path' at this point).
pdir = self.package_dir.get('')
if pdir is not None:
tail.insert(0, pdir)
if tail:
return os.path.join(*tail)
else:
return ''
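    # Minimal sketch, assuming package_dir={'': 'src'}: get_package_dir('pkg.sub')
    # finds no entry for 'pkg.sub' or 'pkg', falls back on the root entry,
    # and returns os.path.join('src', 'pkg', 'sub').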
def check_package(self, package, package_dir):
# Empty dir name means current directory, which we can probably
# assume exists. Also, os.path.exists and isdir don't know about
# my "empty string means current dir" convention, so we have to
# circumvent them.
if package_dir != "":
if not os.path.exists(package_dir):
raise DistutilsFileError(
"package directory '%s' does not exist" % package_dir)
if not os.path.isdir(package_dir):
raise DistutilsFileError(
"supposed package directory '%s' exists, "
"but is not a directory" % package_dir)
# Require __init__.py for all but the "root package"
if package:
init_py = os.path.join(package_dir, "__init__.py")
if os.path.isfile(init_py):
return init_py
else:
log.warn(("package init file '%s' not found " +
"(or not a regular file)"), init_py)
# Either not in a package at all (__init__.py not expected), or
# __init__.py doesn't exist -- so don't return the filename.
return None
def check_module(self, module, module_file):
if not os.path.isfile(module_file):
log.warn("file %s (for module %s) not found", module_file, module)
return False
else:
return True
def find_package_modules(self, package, package_dir):
self.check_package(package, package_dir)
module_files = glob(os.path.join(package_dir, "*.py"))
modules = []
setup_script = os.path.abspath(self.distribution.script_name)
for f in module_files:
abs_f = os.path.abspath(f)
if abs_f != setup_script:
module = os.path.splitext(os.path.basename(f))[0]
modules.append((package, module, f))
else:
self.debug_print("excluding %s" % setup_script)
return modules
def find_modules(self):
"""Finds individually-specified Python modules, ie. those listed by
module name in 'self.py_modules'. Returns a list of tuples (package,
module_base, filename): 'package' is a tuple of the path through
package-space to the module; 'module_base' is the bare (no
packages, no dots) module name, and 'filename' is the path to the
".py" file (relative to the distribution root) that implements the
module.
"""
# Map package names to tuples of useful info about the package:
# (package_dir, checked)
# package_dir - the directory where we'll find source files for
# this package
# checked - true if we have checked that the package directory
# is valid (exists, contains __init__.py, ... ?)
packages = {}
# List of (package, module, filename) tuples to return
modules = []
# We treat modules-in-packages almost the same as toplevel modules,
# just the "package" for a toplevel is empty (either an empty
# string or empty list, depending on context). Differences:
# - don't check for __init__.py in directory for empty package
for module in self.py_modules:
path = module.split('.')
package = '.'.join(path[0:-1])
module_base = path[-1]
try:
(package_dir, checked) = packages[package]
except KeyError:
package_dir = self.get_package_dir(package)
checked = 0
if not checked:
init_py = self.check_package(package, package_dir)
packages[package] = (package_dir, 1)
if init_py:
modules.append((package, "__init__", init_py))
# XXX perhaps we should also check for just .pyc files
# (so greedy closed-source bastards can distribute Python
# modules too)
module_file = os.path.join(package_dir, module_base + ".py")
if not self.check_module(module, module_file):
continue
modules.append((package, module_base, module_file))
return modules
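    # Minimal sketch, assuming py_modules=['pkg.mod'] and a standard Unix
    # layout: find_modules() returns [('pkg', '__init__', 'pkg/__init__.py'),
    # ('pkg', 'mod', 'pkg/mod.py')] (the __init__ entry only when that file
    # actually exists).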
def find_all_modules(self):
"""Compute the list of all modules that will be built, whether
they are specified one-module-at-a-time ('self.py_modules') or
by whole packages ('self.packages'). Return a list of tuples
(package, module, module_file), just like 'find_modules()' and
'find_package_modules()' do."""
modules = []
if self.py_modules:
modules.extend(self.find_modules())
if self.packages:
for package in self.packages:
package_dir = self.get_package_dir(package)
m = self.find_package_modules(package, package_dir)
modules.extend(m)
return modules
def get_source_files(self):
return [module[-1] for module in self.find_all_modules()]
def get_module_outfile(self, build_dir, package, module):
outfile_path = [build_dir] + list(package) + [module + ".py"]
return os.path.join(*outfile_path)
def get_outputs(self, include_bytecode=1):
modules = self.find_all_modules()
outputs = []
for (package, module, module_file) in modules:
package = package.split('.')
filename = self.get_module_outfile(self.build_lib, package, module)
outputs.append(filename)
if include_bytecode:
if self.compile:
outputs.append(filename + "c")
if self.optimize > 0:
outputs.append(filename + "o")
outputs += [
os.path.join(build_dir, filename)
for package, src_dir, build_dir, filenames in self.data_files
for filename in filenames
]
return outputs
def build_module(self, module, module_file, package):
if isinstance(package, str):
package = package.split('.')
elif not isinstance(package, (list, tuple)):
raise TypeError(
"'package' must be a string (dot-separated), list, or tuple")
# Now put the module source file into the "build" area -- this is
# easy, we just copy it somewhere under self.build_lib (the build
# directory for Python source).
outfile = self.get_module_outfile(self.build_lib, package, module)
dir = os.path.dirname(outfile)
self.mkpath(dir)
return self.copy_file(module_file, outfile, preserve_mode=0)
def build_modules(self):
modules = self.find_modules()
for (package, module, module_file) in modules:
# Now "build" the module -- ie. copy the source file to
# self.build_lib (the build directory for Python source).
# (Actually, it gets copied to the directory for this package
# under self.build_lib.)
self.build_module(module, module_file, package)
def build_packages(self):
for package in self.packages:
# Get list of (package, module, module_file) tuples based on
# scanning the package directory. 'package' is only included
# in the tuple so that 'find_modules()' and
            # 'find_package_modules()' have a consistent interface; it's
# ignored here (apart from a sanity check). Also, 'module' is
# the *unqualified* module name (ie. no dots, no package -- we
# already know its package!), and 'module_file' is the path to
# the .py file, relative to the current directory
# (ie. including 'package_dir').
package_dir = self.get_package_dir(package)
modules = self.find_package_modules(package, package_dir)
# Now loop over the modules we found, "building" each one (just
# copy it to self.build_lib).
for (package_, module, module_file) in modules:
assert package == package_
self.build_module(module, module_file, package)
def byte_compile(self, files):
if sys.dont_write_bytecode:
self.warn('byte-compiling is disabled, skipping.')
return
from distutils.util import byte_compile
prefix = self.build_lib
if prefix[-1] != os.sep:
prefix = prefix + os.sep
# XXX this code is essentially the same as the 'byte_compile()
# method of the "install_lib" command, except for the determination
# of the 'prefix' string. Hmmm.
if self.compile:
byte_compile(files, optimize=0,
force=self.force, prefix=prefix, dry_run=self.dry_run)
if self.optimize > 0:
byte_compile(files, optimize=self.optimize,
force=self.force, prefix=prefix, dry_run=self.dry_run)
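# A minimal sketch (project layout assumed, not from the original sources)
# of the distribution options that drive the build_py command above:
# 'packages' selects whole packages, 'py_modules' individual modules, and
# 'package_data' extra files copied by build_package_data().
from distutils.core import setup

setup(name='demo',
      version='1.0',
      packages=['demo'],
      py_modules=['standalone'],
      package_data={'demo': ['data/*.txt']})
# Typically run as: python setup.py build_py --compile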
"""distutils.command.build_scripts
Implements the Distutils 'build_scripts' command."""
__revision__ = "$Id$"
import os, re
from stat import ST_MODE
from distutils.core import Command
from distutils.dep_util import newer
from distutils.util import convert_path
from distutils import log
# check if Python is called on the first line with this expression
first_line_re = re.compile('^#!.*python[0-9.]*([ \t].*)?$')
class build_scripts (Command):
description = "\"build\" scripts (copy and fixup #! line)"
user_options = [
('build-dir=', 'd', "directory to \"build\" (copy) to"),
        ('force', 'f', "forcibly build everything (ignore file timestamps)"),
('executable=', 'e', "specify final destination interpreter path"),
]
boolean_options = ['force']
def initialize_options (self):
self.build_dir = None
self.scripts = None
self.force = None
self.executable = None
self.outfiles = None
def finalize_options (self):
self.set_undefined_options('build',
('build_scripts', 'build_dir'),
('force', 'force'),
('executable', 'executable'))
self.scripts = self.distribution.scripts
def get_source_files(self):
return self.scripts
def run (self):
if not self.scripts:
return
self.copy_scripts()
def copy_scripts (self):
"""Copy each script listed in 'self.scripts'; if it's marked as a
Python script in the Unix way (first line matches 'first_line_re',
ie. starts with "\#!" and contains "python"), then adjust the first
line to refer to the current Python interpreter as we copy.
"""
_sysconfig = __import__('sysconfig')
self.mkpath(self.build_dir)
outfiles = []
for script in self.scripts:
adjust = 0
script = convert_path(script)
outfile = os.path.join(self.build_dir, os.path.basename(script))
outfiles.append(outfile)
if not self.force and not newer(script, outfile):
log.debug("not copying %s (up-to-date)", script)
continue
# Always open the file, but ignore failures in dry-run mode --
# that way, we'll get accurate feedback if we can read the
# script.
try:
f = open(script, "r")
except IOError:
if not self.dry_run:
raise
f = None
else:
first_line = f.readline()
if not first_line:
self.warn("%s is an empty file (skipping)" % script)
continue
match = first_line_re.match(first_line)
if match:
adjust = 1
post_interp = match.group(1) or ''
if adjust:
log.info("copying and adjusting %s -> %s", script,
self.build_dir)
if not self.dry_run:
outf = open(outfile, "w")
if not _sysconfig.is_python_build():
outf.write("#!%s%s\n" %
(self.executable,
post_interp))
else:
outf.write("#!%s%s\n" %
(os.path.join(
_sysconfig.get_config_var("BINDIR"),
"python%s%s" % (_sysconfig.get_config_var("VERSION"),
_sysconfig.get_config_var("EXE"))),
post_interp))
outf.writelines(f.readlines())
outf.close()
if f:
f.close()
else:
if f:
f.close()
self.copy_file(script, outfile)
if os.name == 'posix':
for file in outfiles:
if self.dry_run:
log.info("changing mode of %s", file)
else:
oldmode = os.stat(file)[ST_MODE] & 07777
newmode = (oldmode | 0555) & 07777
if newmode != oldmode:
log.info("changing mode of %s from %o to %o",
file, oldmode, newmode)
os.chmod(file, newmode)
# copy_scripts ()
# class build_scripts
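# A minimal sketch (inputs assumed, not from the original sources) of how
# first_line_re above classifies shebang lines; only matching scripts get
# their first line rewritten by copy_scripts().
import re

_demo_re = re.compile('^#!.*python[0-9.]*([ \t].*)?$')
for line in ('#!/usr/bin/env python', '#!/usr/bin/python2.7 -u', '#!/bin/sh'):
    match = _demo_re.match(line)
    if match:
        print "%s -> adjust shebang (args=%r)" % (line, match.group(1) or '')
    else:
        print "%s -> copy unchanged" % line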
"""distutils.command.check
Implements the Distutils 'check' command.
"""
__revision__ = "$Id$"
from distutils.core import Command
from distutils.dist import PKG_INFO_ENCODING
from distutils.errors import DistutilsSetupError
try:
# docutils is installed
from docutils.utils import Reporter
from docutils.parsers.rst import Parser
from docutils import frontend
from docutils import nodes
from StringIO import StringIO
class SilentReporter(Reporter):
def __init__(self, source, report_level, halt_level, stream=None,
debug=0, encoding='ascii', error_handler='replace'):
self.messages = []
Reporter.__init__(self, source, report_level, halt_level, stream,
debug, encoding, error_handler)
def system_message(self, level, message, *children, **kwargs):
self.messages.append((level, message, children, kwargs))
return nodes.system_message(message, level=level,
type=self.levels[level],
*children, **kwargs)
HAS_DOCUTILS = True
except ImportError:
# docutils is not installed
HAS_DOCUTILS = False
class check(Command):
"""This command checks the meta-data of the package.
"""
description = ("perform some checks on the package")
user_options = [('metadata', 'm', 'Verify meta-data'),
('restructuredtext', 'r',
                     ('Checks whether the long string meta-data '
                      'is reStructuredText-compliant')),
('strict', 's',
'Will exit with an error if a check fails')]
boolean_options = ['metadata', 'restructuredtext', 'strict']
def initialize_options(self):
"""Sets default values for options."""
self.restructuredtext = 0
self.metadata = 1
self.strict = 0
self._warnings = 0
def finalize_options(self):
pass
def warn(self, msg):
"""Counts the number of warnings that occurs."""
self._warnings += 1
return Command.warn(self, msg)
def run(self):
"""Runs the command."""
# perform the various tests
if self.metadata:
self.check_metadata()
if self.restructuredtext:
if HAS_DOCUTILS:
self.check_restructuredtext()
elif self.strict:
raise DistutilsSetupError('The docutils package is needed.')
# let's raise an error in strict mode, if we have at least
# one warning
if self.strict and self._warnings > 0:
raise DistutilsSetupError('Please correct your package.')
def check_metadata(self):
"""Ensures that all required elements of meta-data are supplied.
name, version, URL, (author and author_email) or
(maintainer and maintainer_email)).
Warns if any are missing.
"""
metadata = self.distribution.metadata
missing = []
for attr in ('name', 'version', 'url'):
if not (hasattr(metadata, attr) and getattr(metadata, attr)):
missing.append(attr)
if missing:
self.warn("missing required meta-data: %s" % ', '.join(missing))
if metadata.author:
if not metadata.author_email:
self.warn("missing meta-data: if 'author' supplied, " +
"'author_email' must be supplied too")
elif metadata.maintainer:
if not metadata.maintainer_email:
self.warn("missing meta-data: if 'maintainer' supplied, " +
"'maintainer_email' must be supplied too")
else:
self.warn("missing meta-data: either (author and author_email) " +
"or (maintainer and maintainer_email) " +
"must be supplied")
def check_restructuredtext(self):
"""Checks if the long string fields are reST-compliant."""
data = self.distribution.get_long_description()
if not isinstance(data, unicode):
data = data.decode(PKG_INFO_ENCODING)
for warning in self._check_rst_data(data):
line = warning[-1].get('line')
if line is None:
warning = warning[1]
else:
warning = '%s (line %s)' % (warning[1], line)
self.warn(warning)
def _check_rst_data(self, data):
"""Returns warnings when the provided data doesn't compile."""
# the include and csv_table directives need this to be a path
source_path = self.distribution.script_name or 'setup.py'
parser = Parser()
settings = frontend.OptionParser(components=(Parser,)).get_default_values()
settings.tab_width = 4
settings.pep_references = None
settings.rfc_references = None
reporter = SilentReporter(source_path,
settings.report_level,
settings.halt_level,
stream=settings.warning_stream,
debug=settings.debug,
encoding=settings.error_encoding,
error_handler=settings.error_encoding_error_handler)
document = nodes.document(settings, reporter, source=source_path)
document.note_source(source_path, -1)
try:
parser.parse(data, document)
except AttributeError as e:
reporter.messages.append(
(-1, 'Could not finish the parsing: %s.' % e, '', {}))
return reporter.messages
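# A minimal sketch (metadata assumed) of what the check command above
# reports: this distribution supplies name, version and url but neither
# (author, author_email) nor (maintainer, maintainer_email), so
# "python setup.py check --strict" warns and then raises.
from distutils.core import setup

setup(name='demo',
      version='1.0',
      url='http://example.com/demo')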
"""distutils.command.clean
Implements the Distutils 'clean' command."""
# contributed by Bastian Kleineidam <[email protected]>, added 2000-03-18
__revision__ = "$Id$"
import os
from distutils.core import Command
from distutils.dir_util import remove_tree
from distutils import log
class clean(Command):
description = "clean up temporary files from 'build' command"
user_options = [
('build-base=', 'b',
"base build directory (default: 'build.build-base')"),
('build-lib=', None,
"build directory for all modules (default: 'build.build-lib')"),
('build-temp=', 't',
"temporary build directory (default: 'build.build-temp')"),
('build-scripts=', None,
"build directory for scripts (default: 'build.build-scripts')"),
('bdist-base=', None,
"temporary directory for built distributions"),
('all', 'a',
"remove all build output, not just temporary by-products")
]
boolean_options = ['all']
def initialize_options(self):
self.build_base = None
self.build_lib = None
self.build_temp = None
self.build_scripts = None
self.bdist_base = None
self.all = None
def finalize_options(self):
self.set_undefined_options('build',
('build_base', 'build_base'),
('build_lib', 'build_lib'),
('build_scripts', 'build_scripts'),
('build_temp', 'build_temp'))
self.set_undefined_options('bdist',
('bdist_base', 'bdist_base'))
def run(self):
# remove the build/temp.<plat> directory (unless it's already
# gone)
if os.path.exists(self.build_temp):
remove_tree(self.build_temp, dry_run=self.dry_run)
else:
log.debug("'%s' does not exist -- can't clean it",
self.build_temp)
if self.all:
# remove build directories
for directory in (self.build_lib,
self.bdist_base,
self.build_scripts):
if os.path.exists(directory):
remove_tree(directory, dry_run=self.dry_run)
else:
log.warn("'%s' does not exist -- can't clean it",
directory)
# just for the heck of it, try to remove the base build directory:
# we might have emptied it right now, but if not we don't care
if not self.dry_run:
try:
os.rmdir(self.build_base)
log.info("removing '%s'", self.build_base)
except OSError:
pass
# class clean
"""distutils.command.config
Implements the Distutils 'config' command, a (mostly) empty command class
that exists mainly to be sub-classed by specific module distributions and
applications. The idea is that while every "config" command is different,
at least they're all named the same, and users always see "config" in the
list of standard commands. Also, this is a good place to put common
configure-like tasks: "try to compile this C code", or "figure out where
this header file lives".
"""
__revision__ = "$Id$"
import os
import re
from distutils.core import Command
from distutils.errors import DistutilsExecError
from distutils.sysconfig import customize_compiler
from distutils import log
LANG_EXT = {'c': '.c', 'c++': '.cxx'}
class config(Command):
description = "prepare to build"
user_options = [
('compiler=', None,
"specify the compiler type"),
('cc=', None,
"specify the compiler executable"),
('include-dirs=', 'I',
"list of directories to search for header files"),
('define=', 'D',
"C preprocessor macros to define"),
('undef=', 'U',
"C preprocessor macros to undefine"),
('libraries=', 'l',
"external C libraries to link with"),
('library-dirs=', 'L',
"directories to search for external C libraries"),
('noisy', None,
"show every action (compile, link, run, ...) taken"),
('dump-source', None,
"dump generated source files before attempting to compile them"),
]
# The three standard command methods: since the "config" command
# does nothing by default, these are empty.
def initialize_options(self):
self.compiler = None
self.cc = None
self.include_dirs = None
self.libraries = None
self.library_dirs = None
# maximal output for now
self.noisy = 1
self.dump_source = 1
# list of temporary files generated along-the-way that we have
# to clean at some point
self.temp_files = []
def finalize_options(self):
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
elif isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
if self.libraries is None:
self.libraries = []
elif isinstance(self.libraries, str):
self.libraries = [self.libraries]
if self.library_dirs is None:
self.library_dirs = []
elif isinstance(self.library_dirs, str):
self.library_dirs = self.library_dirs.split(os.pathsep)
def run(self):
pass
# Utility methods for actual "config" commands. The interfaces are
# loosely based on Autoconf macros of similar names. Sub-classes
# may use these freely.
def _check_compiler(self):
"""Check that 'self.compiler' really is a CCompiler object;
if not, make it one.
"""
# We do this late, and only on-demand, because this is an expensive
# import.
from distutils.ccompiler import CCompiler, new_compiler
if not isinstance(self.compiler, CCompiler):
self.compiler = new_compiler(compiler=self.compiler,
dry_run=self.dry_run, force=1)
customize_compiler(self.compiler)
if self.include_dirs:
self.compiler.set_include_dirs(self.include_dirs)
if self.libraries:
self.compiler.set_libraries(self.libraries)
if self.library_dirs:
self.compiler.set_library_dirs(self.library_dirs)
def _gen_temp_sourcefile(self, body, headers, lang):
filename = "_configtest" + LANG_EXT[lang]
file = open(filename, "w")
if headers:
for header in headers:
file.write("#include <%s>\n" % header)
file.write("\n")
file.write(body)
if body[-1] != "\n":
file.write("\n")
file.close()
return filename
def _preprocess(self, body, headers, include_dirs, lang):
src = self._gen_temp_sourcefile(body, headers, lang)
out = "_configtest.i"
self.temp_files.extend([src, out])
self.compiler.preprocess(src, out, include_dirs=include_dirs)
return (src, out)
def _compile(self, body, headers, include_dirs, lang):
src = self._gen_temp_sourcefile(body, headers, lang)
if self.dump_source:
dump_file(src, "compiling '%s':" % src)
(obj,) = self.compiler.object_filenames([src])
self.temp_files.extend([src, obj])
self.compiler.compile([src], include_dirs=include_dirs)
return (src, obj)
def _link(self, body, headers, include_dirs, libraries, library_dirs,
lang):
(src, obj) = self._compile(body, headers, include_dirs, lang)
prog = os.path.splitext(os.path.basename(src))[0]
self.compiler.link_executable([obj], prog,
libraries=libraries,
library_dirs=library_dirs,
target_lang=lang)
if self.compiler.exe_extension is not None:
prog = prog + self.compiler.exe_extension
self.temp_files.append(prog)
return (src, obj, prog)
def _clean(self, *filenames):
if not filenames:
filenames = self.temp_files
self.temp_files = []
log.info("removing: %s", ' '.join(filenames))
for filename in filenames:
try:
os.remove(filename)
except OSError:
pass
# XXX these ignore the dry-run flag: what to do, what to do? even if
# you want a dry-run build, you still need some sort of configuration
# info. My inclination is to make it up to the real config command to
# consult 'dry_run', and assume a default (minimal) configuration if
# true. The problem with trying to do it here is that you'd have to
# return either true or false from all the 'try' methods, neither of
# which is correct.
# XXX need access to the header search path and maybe default macros.
def try_cpp(self, body=None, headers=None, include_dirs=None, lang="c"):
"""Construct a source file from 'body' (a string containing lines
of C/C++ code) and 'headers' (a list of header files to include)
and run it through the preprocessor. Return true if the
preprocessor succeeded, false if there were any errors.
('body' probably isn't of much use, but what the heck.)
"""
from distutils.ccompiler import CompileError
self._check_compiler()
ok = 1
try:
self._preprocess(body, headers, include_dirs, lang)
except CompileError:
ok = 0
self._clean()
return ok
def search_cpp(self, pattern, body=None, headers=None, include_dirs=None,
lang="c"):
"""Construct a source file (just like 'try_cpp()'), run it through
the preprocessor, and return true if any line of the output matches
'pattern'. 'pattern' should either be a compiled regex object or a
string containing a regex. If both 'body' and 'headers' are None,
preprocesses an empty file -- which can be useful to determine the
symbols the preprocessor and compiler set by default.
"""
self._check_compiler()
src, out = self._preprocess(body, headers, include_dirs, lang)
if isinstance(pattern, str):
pattern = re.compile(pattern)
file = open(out)
match = 0
while 1:
line = file.readline()
if line == '':
break
if pattern.search(line):
match = 1
break
file.close()
self._clean()
return match
def try_compile(self, body, headers=None, include_dirs=None, lang="c"):
"""Try to compile a source file built from 'body' and 'headers'.
Return true on success, false otherwise.
"""
from distutils.ccompiler import CompileError
self._check_compiler()
try:
self._compile(body, headers, include_dirs, lang)
ok = 1
except CompileError:
ok = 0
log.info(ok and "success!" or "failure.")
self._clean()
return ok
def try_link(self, body, headers=None, include_dirs=None, libraries=None,
library_dirs=None, lang="c"):
"""Try to compile and link a source file, built from 'body' and
'headers', to executable form. Return true on success, false
otherwise.
"""
from distutils.ccompiler import CompileError, LinkError
self._check_compiler()
try:
self._link(body, headers, include_dirs,
libraries, library_dirs, lang)
ok = 1
except (CompileError, LinkError):
ok = 0
log.info(ok and "success!" or "failure.")
self._clean()
return ok
def try_run(self, body, headers=None, include_dirs=None, libraries=None,
library_dirs=None, lang="c"):
"""Try to compile, link to an executable, and run a program
built from 'body' and 'headers'. Return true on success, false
otherwise.
"""
from distutils.ccompiler import CompileError, LinkError
self._check_compiler()
try:
src, obj, exe = self._link(body, headers, include_dirs,
libraries, library_dirs, lang)
self.spawn([exe])
ok = 1
except (CompileError, LinkError, DistutilsExecError):
ok = 0
log.info(ok and "success!" or "failure.")
self._clean()
return ok
# -- High-level methods --------------------------------------------
# (these are the ones that are actually likely to be useful
# when implementing a real-world config command!)
def check_func(self, func, headers=None, include_dirs=None,
libraries=None, library_dirs=None, decl=0, call=0):
"""Determine if function 'func' is available by constructing a
source file that refers to 'func', and compiles and links it.
If everything succeeds, returns true; otherwise returns false.
The constructed source file starts out by including the header
files listed in 'headers'. If 'decl' is true, it then declares
'func' (as "int func()"); you probably shouldn't supply 'headers'
and set 'decl' true in the same call, or you might get errors about
        conflicting declarations for 'func'. Finally, the constructed
'main()' function either references 'func' or (if 'call' is true)
calls it. 'libraries' and 'library_dirs' are used when
linking.
"""
self._check_compiler()
body = []
if decl:
body.append("int %s ();" % func)
body.append("int main () {")
if call:
body.append(" %s();" % func)
else:
body.append(" %s;" % func)
body.append("}")
body = "\n".join(body) + "\n"
return self.try_link(body, headers, include_dirs,
libraries, library_dirs)
# check_func ()
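    # Sketch of the source that check_func('pow', decl=1, call=1) generates:
    #   int pow ();
    #   int main () {
    #    pow();
    #   }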
def check_lib(self, library, library_dirs=None, headers=None,
include_dirs=None, other_libraries=[]):
"""Determine if 'library' is available to be linked against,
without actually checking that any particular symbols are provided
by it. 'headers' will be used in constructing the source file to
be compiled, but the only effect of this is to check if all the
header files listed are available. Any libraries listed in
'other_libraries' will be included in the link, in case 'library'
has symbols that depend on other libraries.
"""
self._check_compiler()
return self.try_link("int main (void) { }",
headers, include_dirs,
[library]+other_libraries, library_dirs)
def check_header(self, header, include_dirs=None, library_dirs=None,
lang="c"):
"""Determine if the system header file named by 'header_file'
exists and can be found by the preprocessor; return true if so,
false otherwise.
"""
return self.try_cpp(body="/* No body */", headers=[header],
include_dirs=include_dirs)
def dump_file(filename, head=None):
"""Dumps a file content into log.info.
If head is not None, will be dumped before the file content.
"""
if head is None:
log.info('%s' % filename)
else:
log.info(head)
file = open(filename)
try:
log.info(file.read())
finally:
file.close()
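# A minimal sketch (names assumed, not from the original sources) of
# subclassing the config command above to probe for a library before a
# build; check_header() and check_lib() come from the class defined above.
from distutils.command.config import config

class my_config(config):
    def run(self):
        # probe the preprocessor and linker; warn instead of failing hard
        if not (self.check_header('zlib.h') and self.check_lib('z')):
            self.warn("zlib not found; building without compression support")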
"""distutils.command.install
Implements the Distutils 'install' command."""
from distutils import log
# This module should be kept compatible with Python 2.1.
__revision__ = "$Id$"
import sys, os, string
from types import *
from distutils.core import Command
from distutils.debug import DEBUG
from distutils.sysconfig import get_config_vars
from distutils.errors import DistutilsPlatformError
from distutils.file_util import write_file
from distutils.util import convert_path, subst_vars, change_root
from distutils.util import get_platform
from distutils.errors import DistutilsOptionError
from site import USER_BASE
from site import USER_SITE
if sys.version < "2.2":
WINDOWS_SCHEME = {
'purelib': '$base',
'platlib': '$base',
'headers': '$base/Include/$dist_name',
'scripts': '$base/Scripts',
'data' : '$base',
}
else:
WINDOWS_SCHEME = {
'purelib': '$base/Lib/site-packages',
'platlib': '$base/Lib/site-packages',
'headers': '$base/Include/$dist_name',
'scripts': '$base/Scripts',
'data' : '$base',
}
INSTALL_SCHEMES = {
'unix_prefix': {
'purelib': '$base/lib/python$py_version_short/site-packages',
'platlib': '$platbase/lib/python$py_version_short/site-packages',
'headers': '$base/include/python$py_version_short/$dist_name',
'scripts': '$base/bin',
'data' : '$base',
},
'unix_home': {
'purelib': '$base/lib/python',
'platlib': '$base/lib/python',
'headers': '$base/include/python/$dist_name',
'scripts': '$base/bin',
'data' : '$base',
},
'unix_user': {
'purelib': '$usersite',
'platlib': '$usersite',
'headers': '$userbase/include/python$py_version_short/$dist_name',
'scripts': '$userbase/bin',
'data' : '$userbase',
},
'nt': WINDOWS_SCHEME,
'nt_user': {
'purelib': '$usersite',
'platlib': '$usersite',
'headers': '$userbase/Python$py_version_nodot/Include/$dist_name',
'scripts': '$userbase/Scripts',
'data' : '$userbase',
},
'os2': {
'purelib': '$base/Lib/site-packages',
'platlib': '$base/Lib/site-packages',
'headers': '$base/Include/$dist_name',
'scripts': '$base/Scripts',
'data' : '$base',
},
'os2_home': {
'purelib': '$usersite',
'platlib': '$usersite',
'headers': '$userbase/include/python$py_version_short/$dist_name',
'scripts': '$userbase/bin',
'data' : '$userbase',
},
}
# The keys to an installation scheme; if any new types of files are to be
# installed, be sure to add an entry to every installation scheme above,
# and to SCHEME_KEYS here.
SCHEME_KEYS = ('purelib', 'platlib', 'headers', 'scripts', 'data')
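# Minimal sketch (values assumed): under the 'unix_prefix' scheme with
# base=/usr/local, py_version_short=2.7 and dist_name=demo, the template
# strings above expand to:
#   purelib -> /usr/local/lib/python2.7/site-packages
#   headers -> /usr/local/include/python2.7/demo
#   scripts -> /usr/local/bin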
class install (Command):
description = "install everything from build directory"
user_options = [
# Select installation scheme and set base director(y|ies)
('prefix=', None,
"installation prefix"),
('exec-prefix=', None,
"(Unix only) prefix for platform-specific files"),
('home=', None,
"(Unix only) home directory to install under"),
('user', None,
"install in user site-package '%s'" % USER_SITE),
# Or, just set the base director(y|ies)
('install-base=', None,
"base installation directory (instead of --prefix or --home)"),
('install-platbase=', None,
"base installation directory for platform-specific files " +
"(instead of --exec-prefix or --home)"),
('root=', None,
"install everything relative to this alternate root directory"),
# Or, explicitly set the installation scheme
('install-purelib=', None,
"installation directory for pure Python module distributions"),
('install-platlib=', None,
"installation directory for non-pure module distributions"),
('install-lib=', None,
"installation directory for all module distributions " +
"(overrides --install-purelib and --install-platlib)"),
('install-headers=', None,
"installation directory for C/C++ headers"),
('install-scripts=', None,
"installation directory for Python scripts"),
('install-data=', None,
"installation directory for data files"),
# Byte-compilation options -- see install_lib.py for details, as
# these are duplicated from there (but only install_lib does
# anything with them).
('compile', 'c', "compile .py to .pyc [default]"),
('no-compile', None, "don't compile .py files"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
# Miscellaneous control options
('force', 'f',
"force installation (overwrite any existing files)"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
# Where to install documentation (eventually!)
#('doc-format=', None, "format of documentation to generate"),
#('install-man=', None, "directory for Unix man pages"),
#('install-html=', None, "directory for HTML documentation"),
#('install-info=', None, "directory for GNU info files"),
('record=', None,
"filename in which to record list of installed files"),
]
boolean_options = ['compile', 'force', 'skip-build', 'user']
negative_opt = {'no-compile' : 'compile'}
def initialize_options (self):
# High-level options: these select both an installation base
# and scheme.
self.prefix = None
self.exec_prefix = None
self.home = None
self.user = 0
# These select only the installation base; it's up to the user to
# specify the installation scheme (currently, that means supplying
# the --install-{platlib,purelib,scripts,data} options).
self.install_base = None
self.install_platbase = None
self.root = None
# These options are the actual installation directories; if not
# supplied by the user, they are filled in using the installation
# scheme implied by prefix/exec-prefix/home and the contents of
# that installation scheme.
self.install_purelib = None # for pure module distributions
self.install_platlib = None # non-pure (dists w/ extensions)
self.install_headers = None # for C/C++ headers
self.install_lib = None # set to either purelib or platlib
self.install_scripts = None
self.install_data = None
self.install_userbase = USER_BASE
self.install_usersite = USER_SITE
self.compile = None
self.optimize = None
# These two are for putting non-packagized distributions into their
# own directory and creating a .pth file if it makes sense.
# 'extra_path' comes from the setup file; 'install_path_file' can
# be turned off if it makes no sense to install a .pth file. (But
# better to install it uselessly than to guess wrong and not
# install it when it's necessary and would be used!) Currently,
# 'install_path_file' is always true unless some outsider meddles
# with it.
self.extra_path = None
self.install_path_file = 1
# 'force' forces installation, even if target files are not
# out-of-date. 'skip_build' skips running the "build" command,
# handy if you know it's not necessary. 'warn_dir' (which is *not*
# a user option, it's just there so the bdist_* commands can turn
# it off) determines whether we warn about installing to a
# directory not in sys.path.
self.force = 0
self.skip_build = 0
self.warn_dir = 1
# These are only here as a conduit from the 'build' command to the
# 'install_*' commands that do the real work. ('build_base' isn't
# actually used anywhere, but it might be useful in future.) They
# are not user options, because if the user told the install
# command where the build directory is, that wouldn't affect the
# build command.
self.build_base = None
self.build_lib = None
# Not defined yet because we don't know anything about
# documentation yet.
#self.install_man = None
#self.install_html = None
#self.install_info = None
self.record = None
# -- Option finalizing methods -------------------------------------
# (This is rather more involved than for most commands,
# because this is where the policy for installing third-
# party Python modules on various platforms given a wide
# array of user input is decided. Yes, it's quite complex!)
def finalize_options (self):
# This method (and its pliant slaves, like 'finalize_unix()',
# 'finalize_other()', and 'select_scheme()') is where the default
# installation directories for modules, extension modules, and
# anything else we care to install from a Python module
        # distribution are determined. Thus, this code makes a pretty important policy
# statement about how third-party stuff is added to a Python
# installation! Note that the actual work of installation is done
# by the relatively simple 'install_*' commands; they just take
# their orders from the installation directory options determined
# here.
# Check for errors/inconsistencies in the options; first, stuff
# that's wrong on any platform.
if ((self.prefix or self.exec_prefix or self.home) and
(self.install_base or self.install_platbase)):
raise DistutilsOptionError, \
("must supply either prefix/exec-prefix/home or " +
"install-base/install-platbase -- not both")
if self.home and (self.prefix or self.exec_prefix):
raise DistutilsOptionError, \
"must supply either home or prefix/exec-prefix -- not both"
if self.user and (self.prefix or self.exec_prefix or self.home or
self.install_base or self.install_platbase):
raise DistutilsOptionError("can't combine user with prefix, "
"exec_prefix/home, or install_(plat)base")
# Next, stuff that's wrong (or dubious) only on certain platforms.
if os.name != "posix":
if self.exec_prefix:
self.warn("exec-prefix option ignored on this platform")
self.exec_prefix = None
# Now the interesting logic -- so interesting that we farm it out
# to other methods. The goal of these methods is to set the final
# values for the install_{lib,scripts,data,...} options, using as
# input a heady brew of prefix, exec_prefix, home, install_base,
# install_platbase, user-supplied versions of
# install_{purelib,platlib,lib,scripts,data,...}, and the
# INSTALL_SCHEME dictionary above. Phew!
self.dump_dirs("pre-finalize_{unix,other}")
if os.name == 'posix':
self.finalize_unix()
else:
self.finalize_other()
self.dump_dirs("post-finalize_{unix,other}()")
# Expand configuration variables, tilde, etc. in self.install_base
# and self.install_platbase -- that way, we can use $base or
# $platbase in the other installation directories and not worry
# about needing recursive variable expansion (shudder).
py_version = (string.split(sys.version))[0]
(prefix, exec_prefix) = get_config_vars('prefix', 'exec_prefix')
self.config_vars = {'dist_name': self.distribution.get_name(),
'dist_version': self.distribution.get_version(),
'dist_fullname': self.distribution.get_fullname(),
'py_version': py_version,
'py_version_short': py_version[0:3],
'py_version_nodot': py_version[0] + py_version[2],
'sys_prefix': prefix,
'prefix': prefix,
'sys_exec_prefix': exec_prefix,
'exec_prefix': exec_prefix,
'userbase': self.install_userbase,
'usersite': self.install_usersite,
}
self.expand_basedirs()
self.dump_dirs("post-expand_basedirs()")
# Now define config vars for the base directories so we can expand
# everything else.
self.config_vars['base'] = self.install_base
self.config_vars['platbase'] = self.install_platbase
if DEBUG:
from pprint import pprint
print "config vars:"
pprint(self.config_vars)
# Expand "~" and configuration variables in the installation
# directories.
self.expand_dirs()
self.dump_dirs("post-expand_dirs()")
# Create directories in the home dir:
if self.user:
self.create_home_path()
# Pick the actual directory to install all modules to: either
# install_purelib or install_platlib, depending on whether this
# module distribution is pure or not. Of course, if the user
# already specified install_lib, use their selection.
if self.install_lib is None:
if self.distribution.ext_modules: # has extensions: non-pure
self.install_lib = self.install_platlib
else:
self.install_lib = self.install_purelib
# Convert directories from Unix /-separated syntax to the local
# convention.
self.convert_paths('lib', 'purelib', 'platlib',
'scripts', 'data', 'headers',
'userbase', 'usersite')
        # Well, we're not actually fully finalized yet: we still
# have to deal with 'extra_path', which is the hack for allowing
# non-packagized module distributions (hello, Numerical Python!) to
# get their own directories.
self.handle_extra_path()
self.install_libbase = self.install_lib # needed for .pth file
self.install_lib = os.path.join(self.install_lib, self.extra_dirs)
# If a new root directory was supplied, make all the installation
# dirs relative to it.
if self.root is not None:
self.change_roots('libbase', 'lib', 'purelib', 'platlib',
'scripts', 'data', 'headers')
self.dump_dirs("after prepending root")
# Find out the build directories, ie. where to install from.
self.set_undefined_options('build',
('build_base', 'build_base'),
('build_lib', 'build_lib'))
# Punt on doc directories for now -- after all, we're punting on
# documentation completely!
# finalize_options ()
def dump_dirs (self, msg):
if DEBUG:
from distutils.fancy_getopt import longopt_xlate
print msg + ":"
for opt in self.user_options:
opt_name = opt[0]
if opt_name[-1] == "=":
opt_name = opt_name[0:-1]
if opt_name in self.negative_opt:
opt_name = string.translate(self.negative_opt[opt_name],
longopt_xlate)
val = not getattr(self, opt_name)
else:
opt_name = string.translate(opt_name, longopt_xlate)
val = getattr(self, opt_name)
print " %s: %s" % (opt_name, val)
def finalize_unix (self):
if self.install_base is not None or self.install_platbase is not None:
if ((self.install_lib is None and
self.install_purelib is None and
self.install_platlib is None) or
self.install_headers is None or
self.install_scripts is None or
self.install_data is None):
raise DistutilsOptionError, \
("install-base or install-platbase supplied, but "
"installation scheme is incomplete")
return
if self.user:
if self.install_userbase is None:
raise DistutilsPlatformError(
"User base directory is not specified")
self.install_base = self.install_platbase = self.install_userbase
self.select_scheme("unix_user")
elif self.home is not None:
self.install_base = self.install_platbase = self.home
self.select_scheme("unix_home")
else:
if self.prefix is None:
if self.exec_prefix is not None:
raise DistutilsOptionError, \
"must not supply exec-prefix without prefix"
self.prefix = os.path.normpath(sys.prefix)
self.exec_prefix = os.path.normpath(sys.exec_prefix)
else:
if self.exec_prefix is None:
self.exec_prefix = self.prefix
self.install_base = self.prefix
self.install_platbase = self.exec_prefix
self.select_scheme("unix_prefix")
# finalize_unix ()
def finalize_other (self): # Windows and Mac OS for now
if self.user:
if self.install_userbase is None:
raise DistutilsPlatformError(
"User base directory is not specified")
self.install_base = self.install_platbase = self.install_userbase
self.select_scheme(os.name + "_user")
elif self.home is not None:
self.install_base = self.install_platbase = self.home
self.select_scheme("unix_home")
else:
if self.prefix is None:
self.prefix = os.path.normpath(sys.prefix)
self.install_base = self.install_platbase = self.prefix
try:
self.select_scheme(os.name)
except KeyError:
raise DistutilsPlatformError, \
"I don't know how to install stuff on '%s'" % os.name
# finalize_other ()
def select_scheme (self, name):
# it's the caller's problem if they supply a bad name!
scheme = INSTALL_SCHEMES[name]
for key in SCHEME_KEYS:
attrname = 'install_' + key
if getattr(self, attrname) is None:
setattr(self, attrname, scheme[key])
def _expand_attrs (self, attrs):
for attr in attrs:
val = getattr(self, attr)
if val is not None:
if os.name == 'posix' or os.name == 'nt':
val = os.path.expanduser(val)
val = subst_vars(val, self.config_vars)
setattr(self, attr, val)
def expand_basedirs (self):
self._expand_attrs(['install_base',
'install_platbase',
'root'])
def expand_dirs (self):
self._expand_attrs(['install_purelib',
'install_platlib',
'install_lib',
'install_headers',
'install_scripts',
'install_data',])
def convert_paths (self, *names):
for name in names:
attr = "install_" + name
setattr(self, attr, convert_path(getattr(self, attr)))
def handle_extra_path (self):
if self.extra_path is None:
self.extra_path = self.distribution.extra_path
if self.extra_path is not None:
if type(self.extra_path) is StringType:
self.extra_path = string.split(self.extra_path, ',')
if len(self.extra_path) == 1:
path_file = extra_dirs = self.extra_path[0]
elif len(self.extra_path) == 2:
(path_file, extra_dirs) = self.extra_path
else:
raise DistutilsOptionError, \
("'extra_path' option must be a list, tuple, or "
"comma-separated string with 1 or 2 elements")
# convert to local form in case Unix notation used (as it
# should be in setup scripts)
extra_dirs = convert_path(extra_dirs)
else:
path_file = None
extra_dirs = ''
# XXX should we warn if path_file and not extra_dirs? (in which
# case the path file would be harmless but pointless)
self.path_file = path_file
self.extra_dirs = extra_dirs
# handle_extra_path ()
def change_roots (self, *names):
for name in names:
attr = "install_" + name
setattr(self, attr, change_root(self.root, getattr(self, attr)))
def create_home_path(self):
"""Create directories under ~
"""
if not self.user:
return
home = convert_path(os.path.expanduser("~"))
for name, path in self.config_vars.iteritems():
if path.startswith(home) and not os.path.isdir(path):
self.debug_print("os.makedirs('%s', 0700)" % path)
os.makedirs(path, 0700)
# -- Command execution methods -------------------------------------
def run (self):
# Obviously have to build before we can install
if not self.skip_build:
self.run_command('build')
# If we built for any other platform, we can't install.
build_plat = self.distribution.get_command_obj('build').plat_name
# check warn_dir - it is a clue that the 'install' is happening
# internally, and not to sys.path, so we don't check the platform
# matches what we are running.
if self.warn_dir and build_plat != get_platform():
raise DistutilsPlatformError("Can't install when "
"cross-compiling")
# Run all sub-commands (at least those that need to be run)
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
if self.path_file:
self.create_path_file()
# write list of installed files, if requested.
if self.record:
outputs = self.get_outputs()
if self.root: # strip any package prefix
root_len = len(self.root)
for counter in xrange(len(outputs)):
outputs[counter] = outputs[counter][root_len:]
self.execute(write_file,
(self.record, outputs),
"writing list of installed files to '%s'" %
self.record)
sys_path = map(os.path.normpath, sys.path)
sys_path = map(os.path.normcase, sys_path)
install_lib = os.path.normcase(os.path.normpath(self.install_lib))
if (self.warn_dir and
not (self.path_file and self.install_path_file) and
install_lib not in sys_path):
log.debug(("modules installed to '%s', which is not in "
"Python's module search path (sys.path) -- "
"you'll have to change the search path yourself"),
self.install_lib)
# run ()
def create_path_file (self):
filename = os.path.join(self.install_libbase,
self.path_file + ".pth")
if self.install_path_file:
self.execute(write_file,
(filename, [self.extra_dirs]),
"creating %s" % filename)
else:
self.warn("path file '%s' not created" % filename)
# -- Reporting methods ---------------------------------------------
def get_outputs (self):
# Assemble the outputs of all the sub-commands.
outputs = []
for cmd_name in self.get_sub_commands():
cmd = self.get_finalized_command(cmd_name)
# Add the contents of cmd.get_outputs(), ensuring
# that outputs doesn't contain duplicate entries
for filename in cmd.get_outputs():
if filename not in outputs:
outputs.append(filename)
if self.path_file and self.install_path_file:
outputs.append(os.path.join(self.install_libbase,
self.path_file + ".pth"))
return outputs
def get_inputs (self):
# XXX gee, this looks familiar ;-(
inputs = []
for cmd_name in self.get_sub_commands():
cmd = self.get_finalized_command(cmd_name)
inputs.extend(cmd.get_inputs())
return inputs
# -- Predicates for sub-command list -------------------------------
def has_lib (self):
"""Return true if the current distribution has any Python
modules to install."""
return (self.distribution.has_pure_modules() or
self.distribution.has_ext_modules())
def has_headers (self):
return self.distribution.has_headers()
def has_scripts (self):
return self.distribution.has_scripts()
def has_data (self):
return self.distribution.has_data_files()
# 'sub_commands': a list of commands this command might have to run to
# get its work done. See cmd.py for more info.
sub_commands = [('install_lib', has_lib),
('install_headers', has_headers),
('install_scripts', has_scripts),
('install_data', has_data),
('install_egg_info', lambda self:True),
]
# class install
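# -- Example: config-variable expansion (illustrative sketch) --------
# A minimal sketch of the $base/$platbase substitution performed by
# 'expand_dirs()' above.  'subst_vars' comes from distutils.util; the
# mapping and path below are made-up stand-ins for the config_vars
# built in 'finalize_options()'.
from distutils.util import subst_vars
_demo_vars = {'base': '/usr/local', 'py_version_short': '2.7'}
print subst_vars('$base/lib/python$py_version_short/site-packages',
                 _demo_vars)
# prints: /usr/local/lib/python2.7/site-packages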
"""distutils.command.install_data
Implements the Distutils 'install_data' command, for installing
platform-independent data files."""
# contributed by Bastian Kleineidam
__revision__ = "$Id$"
import os
from distutils.core import Command
from distutils.util import change_root, convert_path
class install_data(Command):
description = "install data files"
user_options = [
('install-dir=', 'd',
"base directory for installing data files "
"(default: installation base dir)"),
('root=', None,
"install everything relative to this alternate root directory"),
('force', 'f', "force installation (overwrite existing files)"),
]
boolean_options = ['force']
def initialize_options(self):
self.install_dir = None
self.outfiles = []
self.root = None
self.force = 0
self.data_files = self.distribution.data_files
self.warn_dir = 1
def finalize_options(self):
self.set_undefined_options('install',
('install_data', 'install_dir'),
('root', 'root'),
('force', 'force'),
)
def run(self):
self.mkpath(self.install_dir)
for f in self.data_files:
if isinstance(f, str):
# it's a simple file, so copy it
f = convert_path(f)
if self.warn_dir:
self.warn("setup script did not provide a directory for "
"'%s' -- installing right in '%s'" %
(f, self.install_dir))
(out, _) = self.copy_file(f, self.install_dir)
self.outfiles.append(out)
else:
# it's a tuple with path to install to and a list of files
dir = convert_path(f[0])
if not os.path.isabs(dir):
dir = os.path.join(self.install_dir, dir)
elif self.root:
dir = change_root(self.root, dir)
self.mkpath(dir)
if f[1] == []:
# If there are no files listed, the user must be
# trying to create an empty directory, so add the
# directory to the list of output files.
self.outfiles.append(dir)
else:
# Copy files, adding them to the list of output files.
for data in f[1]:
data = convert_path(data)
(out, _) = self.copy_file(data, dir)
self.outfiles.append(out)
def get_inputs(self):
return self.data_files or []
def get_outputs(self):
return self.outfiles
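# -- Example: data_files handling (illustrative sketch) --------------
# run() above accepts two forms of entry: a bare string (copied into
# the install dir with a warning) and a (directory, [files]) tuple.
# The entries and paths below are made up; 'change_root' shows how an
# absolute target directory is re-rooted when --root is given
# (POSIX result).
from distutils.util import change_root
demo_data_files = ['README',
                   ('share/demo', ['data/demo.cfg'])]
print change_root('/tmp/stage', '/usr/share/demo')
# prints: /tmp/stage/usr/share/demo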
"""distutils.command.install_egg_info
Implements the Distutils 'install_egg_info' command, for installing
a package's PKG-INFO metadata."""
from distutils.cmd import Command
from distutils import log, dir_util
import os, sys, re
class install_egg_info(Command):
"""Install an .egg-info file for the package"""
description = "Install package's PKG-INFO metadata as an .egg-info file"
user_options = [
('install-dir=', 'd', "directory to install to"),
]
def initialize_options(self):
self.install_dir = None
def finalize_options(self):
self.set_undefined_options('install_lib',('install_dir','install_dir'))
basename = "%s-%s-py%s.egg-info" % (
to_filename(safe_name(self.distribution.get_name())),
to_filename(safe_version(self.distribution.get_version())),
sys.version[:3]
)
self.target = os.path.join(self.install_dir, basename)
self.outputs = [self.target]
def run(self):
target = self.target
if os.path.isdir(target) and not os.path.islink(target):
dir_util.remove_tree(target, dry_run=self.dry_run)
elif os.path.exists(target):
self.execute(os.unlink,(self.target,),"Removing "+target)
elif not os.path.isdir(self.install_dir):
self.execute(os.makedirs, (self.install_dir,),
"Creating "+self.install_dir)
log.info("Writing %s", target)
if not self.dry_run:
f = open(target, 'w')
self.distribution.metadata.write_pkg_file(f)
f.close()
def get_outputs(self):
return self.outputs
# The following routines are taken from setuptools' pkg_resources module and
# can be replaced by importing them from pkg_resources once it is included
# in the stdlib.
def safe_name(name):
"""Convert an arbitrary string to a standard distribution name
Any runs of non-alphanumeric/. characters are replaced with a single '-'.
"""
return re.sub('[^A-Za-z0-9.]+', '-', name)
def safe_version(version):
"""Convert an arbitrary string to a standard version string
Spaces become dots, and all other non-alphanumeric characters become
dashes, with runs of multiple dashes condensed to a single dash.
"""
version = version.replace(' ','.')
return re.sub('[^A-Za-z0-9.]+', '-', version)
def to_filename(name):
"""Convert a project or version name to its filename-escaped form
Any '-' characters are currently replaced with '_'.
"""
return name.replace('-','_')
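# -- Example: egg-info basename (illustrative sketch) ----------------
# Feeding made-up inputs through the helpers above shows the filename
# that install_egg_info.finalize_options() assembles:
print "%s-%s-py%s.egg-info" % (to_filename(safe_name('My Package')),
                               to_filename(safe_version('1.0 beta')),
                               '2.7')
# prints: My_Package-1.0.beta-py2.7.egg-info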
"""distutils.command.install_headers
Implements the Distutils 'install_headers' command, to install C/C++ header
files to the Python include directory."""
__revision__ = "$Id$"
from distutils.core import Command
# XXX force is never used
class install_headers(Command):
description = "install C/C++ header files"
user_options = [('install-dir=', 'd',
"directory to install header files to"),
('force', 'f',
"force installation (overwrite existing files)"),
]
boolean_options = ['force']
def initialize_options(self):
self.install_dir = None
self.force = 0
self.outfiles = []
def finalize_options(self):
self.set_undefined_options('install',
('install_headers', 'install_dir'),
('force', 'force'))
def run(self):
headers = self.distribution.headers
if not headers:
return
self.mkpath(self.install_dir)
for header in headers:
(out, _) = self.copy_file(header, self.install_dir)
self.outfiles.append(out)
def get_inputs(self):
return self.distribution.headers or []
def get_outputs(self):
return self.outfiles
# class install_headers
"""distutils.command.install_lib
Implements the Distutils 'install_lib' command
(install all Python modules)."""
__revision__ = "$Id$"
import os
import sys
from distutils.core import Command
from distutils.errors import DistutilsOptionError
# Extension for Python source files.
if hasattr(os, 'extsep'):
PYTHON_SOURCE_EXTENSION = os.extsep + "py"
else:
PYTHON_SOURCE_EXTENSION = ".py"
class install_lib(Command):
description = "install all Python modules (extensions and pure Python)"
# The byte-compilation options are a tad confusing. Here are the
# possible scenarios:
# 1) no compilation at all (--no-compile --no-optimize)
# 2) compile .pyc only (--compile --no-optimize; default)
# 3) compile .pyc and "level 1" .pyo (--compile --optimize)
# 4) compile "level 1" .pyo only (--no-compile --optimize)
# 5) compile .pyc and "level 2" .pyo (--compile --optimize-more)
# 6) compile "level 2" .pyo only (--no-compile --optimize-more)
#
    # The UI for this is two options, 'compile' and 'optimize'.
# 'compile' is strictly boolean, and only decides whether to
# generate .pyc files. 'optimize' is three-way (0, 1, or 2), and
# decides both whether to generate .pyo files and what level of
# optimization to use.
user_options = [
('install-dir=', 'd', "directory to install to"),
('build-dir=','b', "build directory (where to install from)"),
('force', 'f', "force installation (overwrite existing files)"),
('compile', 'c', "compile .py to .pyc [default]"),
('no-compile', None, "don't compile .py files"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
('skip-build', None, "skip the build steps"),
]
boolean_options = ['force', 'compile', 'skip-build']
negative_opt = {'no-compile' : 'compile'}
def initialize_options(self):
# let the 'install' command dictate our installation directory
self.install_dir = None
self.build_dir = None
self.force = 0
self.compile = None
self.optimize = None
self.skip_build = None
def finalize_options(self):
# Get all the information we need to install pure Python modules
# from the umbrella 'install' command -- build (source) directory,
# install (target) directory, and whether to compile .py files.
self.set_undefined_options('install',
('build_lib', 'build_dir'),
('install_lib', 'install_dir'),
('force', 'force'),
('compile', 'compile'),
('optimize', 'optimize'),
('skip_build', 'skip_build'),
)
if self.compile is None:
self.compile = 1
if self.optimize is None:
self.optimize = 0
if not isinstance(self.optimize, int):
try:
self.optimize = int(self.optimize)
if self.optimize not in (0, 1, 2):
raise AssertionError
except (ValueError, AssertionError):
raise DistutilsOptionError, "optimize must be 0, 1, or 2"
def run(self):
# Make sure we have built everything we need first
self.build()
# Install everything: simply dump the entire contents of the build
# directory to the installation directory (that's the beauty of
# having a build directory!)
outfiles = self.install()
# (Optionally) compile .py to .pyc
if outfiles is not None and self.distribution.has_pure_modules():
self.byte_compile(outfiles)
# -- Top-level worker functions ------------------------------------
# (called from 'run()')
def build(self):
if not self.skip_build:
if self.distribution.has_pure_modules():
self.run_command('build_py')
if self.distribution.has_ext_modules():
self.run_command('build_ext')
def install(self):
if os.path.isdir(self.build_dir):
outfiles = self.copy_tree(self.build_dir, self.install_dir)
else:
self.warn("'%s' does not exist -- no Python modules to install" %
self.build_dir)
return
return outfiles
def byte_compile(self, files):
if sys.dont_write_bytecode:
self.warn('byte-compiling is disabled, skipping.')
return
from distutils.util import byte_compile
# Get the "--root" directory supplied to the "install" command,
# and use it as a prefix to strip off the purported filename
# encoded in bytecode files. This is far from complete, but it
# should at least generate usable bytecode in RPM distributions.
install_root = self.get_finalized_command('install').root
if self.compile:
byte_compile(files, optimize=0,
force=self.force, prefix=install_root,
dry_run=self.dry_run)
if self.optimize > 0:
byte_compile(files, optimize=self.optimize,
force=self.force, prefix=install_root,
verbose=self.verbose, dry_run=self.dry_run)
# -- Utility methods -----------------------------------------------
def _mutate_outputs(self, has_any, build_cmd, cmd_option, output_dir):
if not has_any:
return []
build_cmd = self.get_finalized_command(build_cmd)
build_files = build_cmd.get_outputs()
build_dir = getattr(build_cmd, cmd_option)
prefix_len = len(build_dir) + len(os.sep)
outputs = []
for file in build_files:
outputs.append(os.path.join(output_dir, file[prefix_len:]))
return outputs
def _bytecode_filenames(self, py_filenames):
bytecode_files = []
for py_file in py_filenames:
# Since build_py handles package data installation, the
# list of outputs can contain more than just .py files.
# Make sure we only report bytecode for the .py files.
ext = os.path.splitext(os.path.normcase(py_file))[1]
if ext != PYTHON_SOURCE_EXTENSION:
continue
if self.compile:
bytecode_files.append(py_file + "c")
if self.optimize > 0:
bytecode_files.append(py_file + "o")
return bytecode_files
# -- External interface --------------------------------------------
# (called by outsiders)
def get_outputs(self):
"""Return the list of files that would be installed if this command
were actually run. Not affected by the "dry-run" flag or whether
modules have actually been built yet.
"""
pure_outputs = \
self._mutate_outputs(self.distribution.has_pure_modules(),
'build_py', 'build_lib',
self.install_dir)
if self.compile:
bytecode_outputs = self._bytecode_filenames(pure_outputs)
else:
bytecode_outputs = []
ext_outputs = \
self._mutate_outputs(self.distribution.has_ext_modules(),
'build_ext', 'build_lib',
self.install_dir)
return pure_outputs + bytecode_outputs + ext_outputs
def get_inputs(self):
"""Get the list of files that are input to this command, ie. the
files that get installed as they are named in the build tree.
The files in this list correspond one-to-one to the output
filenames returned by 'get_outputs()'.
"""
inputs = []
if self.distribution.has_pure_modules():
build_py = self.get_finalized_command('build_py')
inputs.extend(build_py.get_outputs())
if self.distribution.has_ext_modules():
build_ext = self.get_finalized_command('build_ext')
inputs.extend(build_ext.get_outputs())
return inputs
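# -- Example: bytecode filenames (illustrative sketch) ---------------
# '_bytecode_filenames()' above relies on the Python 2 convention that
# byte-compilation appends 'c' (or 'o' under -O) to the source name.
# The file list is made up; non-.py entries are filtered out just as
# the method does.
demo_sources = ['pkg/__init__.py', 'pkg/mod.py', 'pkg/data.txt']
print [f + 'c' for f in demo_sources if f.endswith('.py')]
# prints: ['pkg/__init__.pyc', 'pkg/mod.pyc']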
"""distutils.command.install_scripts
Implements the Distutils 'install_scripts' command, for installing
Python scripts."""
# contributed by Bastian Kleineidam
__revision__ = "$Id$"
import os
from distutils.core import Command
from distutils import log
from stat import ST_MODE
class install_scripts (Command):
description = "install scripts (Python or otherwise)"
user_options = [
('install-dir=', 'd', "directory to install scripts to"),
('build-dir=','b', "build directory (where to install from)"),
('force', 'f', "force installation (overwrite existing files)"),
('skip-build', None, "skip the build steps"),
]
boolean_options = ['force', 'skip-build']
def initialize_options (self):
self.install_dir = None
self.force = 0
self.build_dir = None
self.skip_build = None
def finalize_options (self):
self.set_undefined_options('build', ('build_scripts', 'build_dir'))
self.set_undefined_options('install',
('install_scripts', 'install_dir'),
('force', 'force'),
('skip_build', 'skip_build'),
)
def run (self):
if not self.skip_build:
self.run_command('build_scripts')
self.outfiles = self.copy_tree(self.build_dir, self.install_dir)
if os.name == 'posix':
# Set the executable bits (owner, group, and world) on
# all the scripts we just installed.
for file in self.get_outputs():
if self.dry_run:
log.info("changing mode of %s", file)
else:
mode = ((os.stat(file)[ST_MODE]) | 0555) & 07777
log.info("changing mode of %s to %o", file, mode)
os.chmod(file, mode)
def get_inputs (self):
return self.distribution.scripts or []
def get_outputs(self):
return self.outfiles or []
# class install_scripts
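# -- Example: script mode computation (illustrative sketch) ----------
# The chmod expression in run() above ORs in read+execute bits for
# owner, group and world, then masks the result down to the
# permission bits.  With a typical rw-r--r-- (0644) file:
print oct((0644 | 0555) & 07777)
# prints: 0755  (rwxr-xr-x)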
"""distutils.command.register
Implements the Distutils 'register' command (register with the repository).
"""
# created 2002/10/21, Richard Jones
__revision__ = "$Id$"
import urllib2
import getpass
import urlparse
from warnings import warn
from distutils.core import PyPIRCCommand
from distutils import log
class register(PyPIRCCommand):
description = ("register the distribution with the Python package index")
user_options = PyPIRCCommand.user_options + [
('list-classifiers', None,
'list the valid Trove classifiers'),
        ('strict', None,
'Will stop the registering if the meta-data are not fully compliant')
]
boolean_options = PyPIRCCommand.boolean_options + [
'verify', 'list-classifiers', 'strict']
sub_commands = [('check', lambda self: True)]
def initialize_options(self):
PyPIRCCommand.initialize_options(self)
self.list_classifiers = 0
self.strict = 0
def finalize_options(self):
PyPIRCCommand.finalize_options(self)
# setting options for the `check` subcommand
check_options = {'strict': ('register', self.strict),
'restructuredtext': ('register', 1)}
self.distribution.command_options['check'] = check_options
def run(self):
self.finalize_options()
self._set_config()
# Run sub commands
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
if self.dry_run:
self.verify_metadata()
elif self.list_classifiers:
self.classifiers()
else:
self.send_metadata()
def check_metadata(self):
"""Deprecated API."""
warn("distutils.command.register.check_metadata is deprecated, \
use the check command instead", PendingDeprecationWarning)
check = self.distribution.get_command_obj('check')
check.ensure_finalized()
check.strict = self.strict
check.restructuredtext = 1
check.run()
def _set_config(self):
        ''' Reads the configuration file and sets attributes.
'''
config = self._read_pypirc()
if config != {}:
self.username = config['username']
self.password = config['password']
self.repository = config['repository']
self.realm = config['realm']
self.has_config = True
else:
if self.repository not in ('pypi', self.DEFAULT_REPOSITORY):
raise ValueError('%s not found in .pypirc' % self.repository)
if self.repository == 'pypi':
self.repository = self.DEFAULT_REPOSITORY
self.has_config = False
def classifiers(self):
''' Fetch the list of classifiers from the server.
'''
response = urllib2.urlopen(self.repository+'?:action=list_classifiers')
log.info(response.read())
def verify_metadata(self):
''' Send the metadata to the package index server to be checked.
'''
# send the info to the server and report the result
(code, result) = self.post_to_server(self.build_post_data('verify'))
log.info('Server response (%s): %s' % (code, result))
def send_metadata(self):
''' Send the metadata to the package index server.
Well, do the following:
1. figure who the user is, and then
2. send the data as a Basic auth'ed POST.
First we try to read the username/password from $HOME/.pypirc,
which is a ConfigParser-formatted file with a section
[distutils] containing username and password entries (both
in clear text). Eg:
[distutils]
index-servers =
pypi
[pypi]
username: fred
password: sekrit
Otherwise, to figure who the user is, we offer the user three
choices:
1. use existing login,
2. register as a new user, or
3. set the password to a random string and email the user.
'''
# see if we can short-cut and get the username/password from the
# config
if self.has_config:
choice = '1'
username = self.username
password = self.password
else:
choice = 'x'
username = password = ''
# get the user's login info
choices = '1 2 3 4'.split()
while choice not in choices:
self.announce('''\
We need to know who you are, so please choose either:
1. use your existing login,
2. register as a new user,
3. have the server generate a new password for you (and email it to you), or
4. quit
Your selection [default 1]: ''', log.INFO)
choice = raw_input()
if not choice:
choice = '1'
elif choice not in choices:
print 'Please choose one of the four options!'
if choice == '1':
# get the username and password
while not username:
username = raw_input('Username: ')
while not password:
password = getpass.getpass('Password: ')
# set up the authentication
auth = urllib2.HTTPPasswordMgr()
host = urlparse.urlparse(self.repository)[1]
auth.add_password(self.realm, host, username, password)
# send the info to the server and report the result
code, result = self.post_to_server(self.build_post_data('submit'),
auth)
self.announce('Server response (%s): %s' % (code, result),
log.INFO)
# possibly save the login
if code == 200:
if self.has_config:
# sharing the password in the distribution instance
# so the upload command can reuse it
self.distribution.password = password
else:
self.announce(('I can store your PyPI login so future '
'submissions will be faster.'), log.INFO)
self.announce('(the login will be stored in %s)' % \
self._get_rc_file(), log.INFO)
choice = 'X'
while choice.lower() not in 'yn':
choice = raw_input('Save your login (y/N)?')
if not choice:
choice = 'n'
if choice.lower() == 'y':
self._store_pypirc(username, password)
elif choice == '2':
data = {':action': 'user'}
data['name'] = data['password'] = data['email'] = ''
data['confirm'] = None
while not data['name']:
data['name'] = raw_input('Username: ')
while data['password'] != data['confirm']:
while not data['password']:
data['password'] = getpass.getpass('Password: ')
while not data['confirm']:
data['confirm'] = getpass.getpass(' Confirm: ')
if data['password'] != data['confirm']:
data['password'] = ''
data['confirm'] = None
print "Password and confirm don't match!"
while not data['email']:
data['email'] = raw_input(' EMail: ')
code, result = self.post_to_server(data)
if code != 200:
log.info('Server response (%s): %s' % (code, result))
else:
log.info('You will receive an email shortly.')
log.info(('Follow the instructions in it to '
'complete registration.'))
elif choice == '3':
data = {':action': 'password_reset'}
data['email'] = ''
while not data['email']:
data['email'] = raw_input('Your email address: ')
code, result = self.post_to_server(data)
log.info('Server response (%s): %s' % (code, result))
def build_post_data(self, action):
# figure the data to send - the metadata plus some additional
# information used by the package server
meta = self.distribution.metadata
data = {
':action': action,
'metadata_version' : '1.0',
'name': meta.get_name(),
'version': meta.get_version(),
'summary': meta.get_description(),
'home_page': meta.get_url(),
'author': meta.get_contact(),
'author_email': meta.get_contact_email(),
'license': meta.get_licence(),
'description': meta.get_long_description(),
'keywords': meta.get_keywords(),
'platform': meta.get_platforms(),
'classifiers': meta.get_classifiers(),
'download_url': meta.get_download_url(),
# PEP 314
'provides': meta.get_provides(),
'requires': meta.get_requires(),
'obsoletes': meta.get_obsoletes(),
}
if data['provides'] or data['requires'] or data['obsoletes']:
data['metadata_version'] = '1.1'
return data
def post_to_server(self, data, auth=None):
''' Post a query to the server, and return a string response.
'''
if 'name' in data:
self.announce('Registering %s to %s' % (data['name'],
self.repository),
log.INFO)
# Build up the MIME payload for the urllib2 POST data
boundary = '--------------GHSKFJDLGDS7543FJKLFHRE75642756743254'
sep_boundary = '\n--' + boundary
end_boundary = sep_boundary + '--'
chunks = []
for key, value in data.items():
# handle multiple entries for the same name
if type(value) not in (type([]), type( () )):
value = [value]
for value in value:
chunks.append(sep_boundary)
chunks.append('\nContent-Disposition: form-data; name="%s"'%key)
chunks.append("\n\n")
chunks.append(value)
if value and value[-1] == '\r':
chunks.append('\n') # write an extra newline (lurve Macs)
chunks.append(end_boundary)
chunks.append("\n")
# chunks may be bytes (str) or unicode objects that we need to encode
body = []
for chunk in chunks:
if isinstance(chunk, unicode):
body.append(chunk.encode('utf-8'))
else:
body.append(chunk)
body = ''.join(body)
# build the Request
headers = {
'Content-type': 'multipart/form-data; boundary=%s; charset=utf-8'%boundary,
'Content-length': str(len(body))
}
req = urllib2.Request(self.repository, body, headers)
# handle HTTP and include the Basic Auth handler
opener = urllib2.build_opener(
urllib2.HTTPBasicAuthHandler(password_mgr=auth)
)
data = ''
try:
result = opener.open(req)
except urllib2.HTTPError, e:
if self.show_response:
data = e.fp.read()
result = e.code, e.msg
except urllib2.URLError, e:
result = 500, str(e)
else:
if self.show_response:
data = result.read()
result = 200, 'OK'
if self.show_response:
dashes = '-' * 75
self.announce('%s%s%s' % (dashes, data, dashes))
return result
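# -- Example: multipart body layout (illustrative sketch) ------------
# A minimal rebuild of the payload 'post_to_server()' assembles, for a
# single made-up field and a made-up boundary, using the same
# separator layout as the code above.
_demo_sep = '\n--' + 'DEMO-BOUNDARY'
print ''.join([_demo_sep,
               '\nContent-Disposition: form-data; name=":action"',
               '\n\n', 'verify',
               _demo_sep + '--', '\n'])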
"""distutils.command.sdist
Implements the Distutils 'sdist' command (create a source distribution)."""
__revision__ = "$Id$"
import os
import string
import sys
from glob import glob
from warnings import warn
from distutils.core import Command
from distutils import dir_util, dep_util, file_util, archive_util
from distutils.text_file import TextFile
from distutils.errors import (DistutilsPlatformError, DistutilsOptionError,
DistutilsTemplateError)
from distutils.filelist import FileList
from distutils import log
from distutils.util import convert_path
def show_formats():
"""Print all possible values for the 'formats' option (used by
the "--help-formats" command-line option).
"""
from distutils.fancy_getopt import FancyGetopt
from distutils.archive_util import ARCHIVE_FORMATS
formats = []
for format in ARCHIVE_FORMATS.keys():
formats.append(("formats=" + format, None,
ARCHIVE_FORMATS[format][2]))
formats.sort()
FancyGetopt(formats).print_help(
"List of available source distribution formats:")
class sdist(Command):
description = "create a source distribution (tarball, zip file, etc.)"
def checking_metadata(self):
"""Callable used for the check sub-command.
Placed here so user_options can view it"""
return self.metadata_check
user_options = [
('template=', 't',
"name of manifest template file [default: MANIFEST.in]"),
('manifest=', 'm',
"name of manifest file [default: MANIFEST]"),
('use-defaults', None,
"include the default file set in the manifest "
"[default; disable with --no-defaults]"),
('no-defaults', None,
"don't include the default file set"),
('prune', None,
"specifically exclude files/directories that should not be "
"distributed (build tree, RCS/CVS dirs, etc.) "
"[default; disable with --no-prune]"),
('no-prune', None,
"don't automatically exclude anything"),
('manifest-only', 'o',
"just regenerate the manifest and then stop "
"(implies --force-manifest)"),
('force-manifest', 'f',
"forcibly regenerate the manifest and carry on as usual. "
"Deprecated: now the manifest is always regenerated."),
('formats=', None,
"formats for source distribution (comma-separated list)"),
('keep-temp', 'k',
"keep the distribution tree around after creating " +
"archive file(s)"),
('dist-dir=', 'd',
"directory to put the source distribution archive(s) in "
"[default: dist]"),
('metadata-check', None,
"Ensure that all required elements of meta-data "
"are supplied. Warn if any missing. [default]"),
('owner=', 'u',
"Owner name used when creating a tar file [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file [default: current group]"),
]
boolean_options = ['use-defaults', 'prune',
'manifest-only', 'force-manifest',
'keep-temp', 'metadata-check']
help_options = [
('help-formats', None,
"list available distribution formats", show_formats),
]
negative_opt = {'no-defaults': 'use-defaults',
'no-prune': 'prune' }
default_format = {'posix': 'gztar',
'nt': 'zip' }
sub_commands = [('check', checking_metadata)]
def initialize_options(self):
# 'template' and 'manifest' are, respectively, the names of
# the manifest template and manifest file.
self.template = None
self.manifest = None
# 'use_defaults': if true, we will include the default file set
# in the manifest
self.use_defaults = 1
self.prune = 1
self.manifest_only = 0
self.force_manifest = 0
self.formats = None
self.keep_temp = 0
self.dist_dir = None
self.archive_files = None
self.metadata_check = 1
self.owner = None
self.group = None
def finalize_options(self):
if self.manifest is None:
self.manifest = "MANIFEST"
if self.template is None:
self.template = "MANIFEST.in"
self.ensure_string_list('formats')
if self.formats is None:
try:
self.formats = [self.default_format[os.name]]
except KeyError:
raise DistutilsPlatformError, \
"don't know how to create source distributions " + \
"on platform %s" % os.name
bad_format = archive_util.check_archive_formats(self.formats)
if bad_format:
raise DistutilsOptionError, \
"unknown archive format '%s'" % bad_format
if self.dist_dir is None:
self.dist_dir = "dist"
def run(self):
# 'filelist' contains the list of files that will make up the
# manifest
self.filelist = FileList()
# Run sub commands
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
# Do whatever it takes to get the list of files to process
# (process the manifest template, read an existing manifest,
# whatever). File list is accumulated in 'self.filelist'.
self.get_file_list()
# If user just wanted us to regenerate the manifest, stop now.
if self.manifest_only:
return
# Otherwise, go ahead and create the source distribution tarball,
# or zipfile, or whatever.
self.make_distribution()
def check_metadata(self):
"""Deprecated API."""
warn("distutils.command.sdist.check_metadata is deprecated, \
use the check command instead", PendingDeprecationWarning)
check = self.distribution.get_command_obj('check')
check.ensure_finalized()
check.run()
def get_file_list(self):
"""Figure out the list of files to include in the source
distribution, and put it in 'self.filelist'. This might involve
reading the manifest template (and writing the manifest), or just
reading the manifest, or just using the default file set -- it all
depends on the user's options.
"""
# new behavior when using a template:
        # the file list is recalculated every time because, even if
        # MANIFEST.in or setup.py have not changed, the user might
        # have added files to the tree that need to be included.
#
# This makes --force the default and only behavior with templates.
template_exists = os.path.isfile(self.template)
if not template_exists and self._manifest_is_not_generated():
self.read_manifest()
self.filelist.sort()
self.filelist.remove_duplicates()
return
if not template_exists:
self.warn(("manifest template '%s' does not exist " +
"(using default file list)") %
self.template)
self.filelist.findall()
if self.use_defaults:
self.add_defaults()
if template_exists:
self.read_template()
if self.prune:
self.prune_file_list()
self.filelist.sort()
self.filelist.remove_duplicates()
self.write_manifest()
def add_defaults(self):
"""Add all the default files to self.filelist:
- README or README.txt
- setup.py
- test/test*.py
- all pure Python modules mentioned in setup script
- all files pointed by package_data (build_py)
- all files defined in data_files.
- all files defined as scripts.
- all C sources listed as part of extensions or C libraries
in the setup script (doesn't catch C headers!)
Warns if (README or README.txt) or setup.py are missing; everything
else is optional.
"""
standards = [('README', 'README.txt'), self.distribution.script_name]
for fn in standards:
if isinstance(fn, tuple):
alts = fn
got_it = 0
for fn in alts:
if os.path.exists(fn):
got_it = 1
self.filelist.append(fn)
break
if not got_it:
self.warn("standard file not found: should have one of " +
string.join(alts, ', '))
else:
if os.path.exists(fn):
self.filelist.append(fn)
else:
self.warn("standard file '%s' not found" % fn)
optional = ['test/test*.py', 'setup.cfg']
for pattern in optional:
files = filter(os.path.isfile, glob(pattern))
if files:
self.filelist.extend(files)
# build_py is used to get:
# - python modules
# - files defined in package_data
build_py = self.get_finalized_command('build_py')
# getting python files
if self.distribution.has_pure_modules():
self.filelist.extend(build_py.get_source_files())
# getting package_data files
# (computed in build_py.data_files by build_py.finalize_options)
for pkg, src_dir, build_dir, filenames in build_py.data_files:
for filename in filenames:
self.filelist.append(os.path.join(src_dir, filename))
# getting distribution.data_files
if self.distribution.has_data_files():
for item in self.distribution.data_files:
if isinstance(item, str): # plain file
item = convert_path(item)
if os.path.isfile(item):
self.filelist.append(item)
else: # a (dirname, filenames) tuple
dirname, filenames = item
for f in filenames:
f = convert_path(f)
if os.path.isfile(f):
self.filelist.append(f)
if self.distribution.has_ext_modules():
build_ext = self.get_finalized_command('build_ext')
self.filelist.extend(build_ext.get_source_files())
if self.distribution.has_c_libraries():
build_clib = self.get_finalized_command('build_clib')
self.filelist.extend(build_clib.get_source_files())
if self.distribution.has_scripts():
build_scripts = self.get_finalized_command('build_scripts')
self.filelist.extend(build_scripts.get_source_files())
def read_template(self):
"""Read and parse manifest template file named by self.template.
(usually "MANIFEST.in") The parsing and processing is done by
'self.filelist', which updates itself accordingly.
"""
log.info("reading manifest template '%s'", self.template)
template = TextFile(self.template,
strip_comments=1,
skip_blanks=1,
join_lines=1,
lstrip_ws=1,
rstrip_ws=1,
collapse_join=1)
try:
while 1:
line = template.readline()
if line is None: # end of file
break
try:
self.filelist.process_template_line(line)
# the call above can raise a DistutilsTemplateError for
# malformed lines, or a ValueError from the lower-level
# convert_path function
except (DistutilsTemplateError, ValueError) as msg:
self.warn("%s, line %d: %s" % (template.filename,
template.current_line,
msg))
finally:
template.close()
def prune_file_list(self):
"""Prune off branches that might slip into the file list as created
by 'read_template()', but really don't belong there:
* the build tree (typically "build")
* the release tree itself (only an issue if we ran "sdist"
previously with --keep-temp, or it aborted)
* any RCS, CVS, .svn, .hg, .git, .bzr, _darcs directories
"""
build = self.get_finalized_command('build')
base_dir = self.distribution.get_fullname()
self.filelist.exclude_pattern(None, prefix=build.build_base)
self.filelist.exclude_pattern(None, prefix=base_dir)
# pruning out vcs directories
# both separators are used under win32
if sys.platform == 'win32':
seps = r'/|\\'
else:
seps = '/'
vcs_dirs = ['RCS', 'CVS', r'\.svn', r'\.hg', r'\.git', r'\.bzr',
'_darcs']
vcs_ptrn = r'(^|%s)(%s)(%s).*' % (seps, '|'.join(vcs_dirs), seps)
self.filelist.exclude_pattern(vcs_ptrn, is_regex=1)
def write_manifest(self):
"""Write the file list in 'self.filelist' (presumably as filled in
by 'add_defaults()' and 'read_template()') to the manifest file
named by 'self.manifest'.
"""
if self._manifest_is_not_generated():
log.info("not writing to manually maintained "
"manifest file '%s'" % self.manifest)
return
content = self.filelist.files[:]
content.insert(0, '# file GENERATED by distutils, do NOT edit')
self.execute(file_util.write_file, (self.manifest, content),
"writing manifest file '%s'" % self.manifest)
def _manifest_is_not_generated(self):
# check for special comment used in 2.7.1 and higher
if not os.path.isfile(self.manifest):
return False
fp = open(self.manifest, 'rU')
try:
first_line = fp.readline()
finally:
fp.close()
return first_line != '# file GENERATED by distutils, do NOT edit\n'
def read_manifest(self):
"""Read the manifest file (named by 'self.manifest') and use it to
fill in 'self.filelist', the list of files to include in the source
distribution.
"""
log.info("reading manifest file '%s'", self.manifest)
manifest = open(self.manifest)
for line in manifest:
# ignore comments and blank lines
line = line.strip()
if line.startswith('#') or not line:
continue
self.filelist.append(line)
manifest.close()
def make_release_tree(self, base_dir, files):
"""Create the directory tree that will become the source
distribution archive. All directories implied by the filenames in
'files' are created under 'base_dir', and then we hard link or copy
(if hard linking is unavailable) those files into place.
Essentially, this duplicates the developer's source tree, but in a
directory named after the distribution, containing only the files
to be distributed.
"""
# Create all the directories under 'base_dir' necessary to
# put 'files' there; the 'mkpath()' is just so we don't die
# if the manifest happens to be empty.
self.mkpath(base_dir)
dir_util.create_tree(base_dir, files, dry_run=self.dry_run)
# And walk over the list of files, either making a hard link (if
# os.link exists) to each one that doesn't already exist in its
# corresponding location under 'base_dir', or copying each file
# that's out-of-date in 'base_dir'. (Usually, all files will be
# out-of-date, because by default we blow away 'base_dir' when
# we're done making the distribution archives.)
if hasattr(os, 'link'): # can make hard links on this system
link = 'hard'
msg = "making hard links in %s..." % base_dir
else: # nope, have to copy
link = None
msg = "copying files to %s..." % base_dir
if not files:
log.warn("no files to distribute -- empty manifest?")
else:
log.info(msg)
for file in files:
if not os.path.isfile(file):
log.warn("'%s' not a regular file -- skipping" % file)
else:
dest = os.path.join(base_dir, file)
self.copy_file(file, dest, link=link)
self.distribution.metadata.write_pkg_info(base_dir)
def make_distribution(self):
"""Create the source distribution(s). First, we create the release
tree with 'make_release_tree()'; then, we create all required
archive files (according to 'self.formats') from the release tree.
Finally, we clean up by blowing away the release tree (unless
'self.keep_temp' is true). The list of archive files created is
stored so it can be retrieved later by 'get_archive_files()'.
"""
# Don't warn about missing meta-data here -- should be (and is!)
# done elsewhere.
base_dir = self.distribution.get_fullname()
base_name = os.path.join(self.dist_dir, base_dir)
self.make_release_tree(base_dir, self.filelist.files)
archive_files = [] # remember names of files we create
# tar archive must be created last to avoid overwrite and remove
if 'tar' in self.formats:
self.formats.append(self.formats.pop(self.formats.index('tar')))
for fmt in self.formats:
file = self.make_archive(base_name, fmt, base_dir=base_dir,
owner=self.owner, group=self.group)
archive_files.append(file)
self.distribution.dist_files.append(('sdist', '', file))
self.archive_files = archive_files
if not self.keep_temp:
dir_util.remove_tree(base_dir, dry_run=self.dry_run)
def get_archive_files(self):
"""Return the list of archive files created when the command
was run, or None if the command hasn't run yet.
"""
return self.archive_files
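# -- Example: manifest template (illustrative sketch) ----------------
# A small MANIFEST.in of the kind 'read_template()' above feeds to
# 'self.filelist.process_template_line()'; the file names are made up:
#
#     include README CHANGES
#     recursive-include demo *.py
#     prune build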
"""distutils.command.upload
Implements the Distutils 'upload' subcommand (upload package to PyPI)."""
import os
import socket
import platform
from urllib2 import urlopen, Request, HTTPError
from base64 import standard_b64encode
import urlparse
import cStringIO as StringIO
from hashlib import md5
from distutils.errors import DistutilsError, DistutilsOptionError
from distutils.core import PyPIRCCommand
from distutils.spawn import spawn
from distutils import log
class upload(PyPIRCCommand):
description = "upload binary package to PyPI"
user_options = PyPIRCCommand.user_options + [
('sign', 's',
'sign files to upload using gpg'),
('identity=', 'i', 'GPG identity used to sign files'),
]
boolean_options = PyPIRCCommand.boolean_options + ['sign']
def initialize_options(self):
PyPIRCCommand.initialize_options(self)
self.username = ''
self.password = ''
self.show_response = 0
self.sign = False
self.identity = None
def finalize_options(self):
PyPIRCCommand.finalize_options(self)
if self.identity and not self.sign:
raise DistutilsOptionError(
"Must use --sign for --identity to have meaning"
)
config = self._read_pypirc()
if config != {}:
self.username = config['username']
self.password = config['password']
self.repository = config['repository']
self.realm = config['realm']
# getting the password from the distribution
# if previously set by the register command
if not self.password and self.distribution.password:
self.password = self.distribution.password
def run(self):
if not self.distribution.dist_files:
msg = ("Must create and upload files in one command "
"(e.g. setup.py sdist upload)")
raise DistutilsOptionError(msg)
for command, pyversion, filename in self.distribution.dist_files:
self.upload_file(command, pyversion, filename)
def upload_file(self, command, pyversion, filename):
# Makes sure the repository URL is compliant
schema, netloc, url, params, query, fragments = \
urlparse.urlparse(self.repository)
if params or query or fragments:
raise AssertionError("Incompatible url %s" % self.repository)
if schema not in ('http', 'https'):
raise AssertionError("unsupported schema " + schema)
# Sign if requested
if self.sign:
gpg_args = ["gpg", "--detach-sign", "-a", filename]
if self.identity:
gpg_args[2:2] = ["--local-user", self.identity]
spawn(gpg_args,
dry_run=self.dry_run)
# Fill in the data - send all the meta-data in case we need to
# register a new release
f = open(filename,'rb')
try:
content = f.read()
finally:
f.close()
meta = self.distribution.metadata
data = {
# action
':action': 'file_upload',
            'protcol_version': '1',  # sic: key name is misspelled, but renaming it would change the request sent to the server
# identify release
'name': meta.get_name(),
'version': meta.get_version(),
# file content
'content': (os.path.basename(filename),content),
'filetype': command,
'pyversion': pyversion,
'md5_digest': md5(content).hexdigest(),
# additional meta-data
'metadata_version' : '1.0',
'summary': meta.get_description(),
'home_page': meta.get_url(),
'author': meta.get_contact(),
'author_email': meta.get_contact_email(),
'license': meta.get_licence(),
'description': meta.get_long_description(),
'keywords': meta.get_keywords(),
'platform': meta.get_platforms(),
'classifiers': meta.get_classifiers(),
'download_url': meta.get_download_url(),
# PEP 314
'provides': meta.get_provides(),
'requires': meta.get_requires(),
'obsoletes': meta.get_obsoletes(),
}
comment = ''
if command == 'bdist_rpm':
dist, version, id = platform.dist()
if dist:
comment = 'built for %s %s' % (dist, version)
elif command == 'bdist_dumb':
comment = 'built for %s' % platform.platform(terse=1)
data['comment'] = comment
if self.sign:
data['gpg_signature'] = (os.path.basename(filename) + ".asc",
open(filename+".asc").read())
# set up the authentication
auth = "Basic " + standard_b64encode(self.username + ":" +
self.password)
# Build up the MIME payload for the POST data
boundary = '--------------GHSKFJDLGDS7543FJKLFHRE75642756743254'
sep_boundary = '\r\n--' + boundary
end_boundary = sep_boundary + '--\r\n'
body = StringIO.StringIO()
for key, value in data.items():
# handle multiple entries for the same name
if not isinstance(value, list):
value = [value]
for value in value:
if isinstance(value, tuple):
fn = ';filename="%s"' % value[0]
value = value[1]
else:
fn = ""
body.write(sep_boundary)
body.write('\r\nContent-Disposition: form-data; name="%s"' % key)
body.write(fn)
body.write("\r\n\r\n")
body.write(value)
body.write(end_boundary)
body = body.getvalue()
self.announce("Submitting %s to %s" % (filename, self.repository), log.INFO)
# build the Request
headers = {'Content-type':
'multipart/form-data; boundary=%s' % boundary,
'Content-length': str(len(body)),
'Authorization': auth}
request = Request(self.repository, data=body,
headers=headers)
# send the data
try:
result = urlopen(request)
status = result.getcode()
reason = result.msg
if self.show_response:
msg = '\n'.join(('-' * 75, result.read(), '-' * 75))
self.announce(msg, log.INFO)
except socket.error, e:
self.announce(str(e), log.ERROR)
raise
except HTTPError, e:
status = e.code
reason = e.msg
if status == 200:
self.announce('Server response (%s): %s' % (status, reason),
log.INFO)
else:
msg = 'Upload failed (%s): %s' % (status, reason)
self.announce(msg, log.ERROR)
raise DistutilsError(msg)
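# -- Example: HTTP Basic credentials (illustrative sketch) -----------
# The Authorization header value computed in 'upload_file()' above,
# using the sample credentials from the register docstring earlier in
# this file.
from base64 import standard_b64encode
print "Basic " + standard_b64encode("fred:sekrit")
# prints: Basic ZnJlZDpzZWtyaXQ=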
"""distutils.command
Package containing implementation of all the standard Distutils
commands."""
__revision__ = "$Id$"
__all__ = ['build',
'build_py',
'build_ext',
'build_clib',
'build_scripts',
'clean',
'install',
'install_lib',
'install_headers',
'install_scripts',
'install_data',
'sdist',
'register',
'bdist',
'bdist_dumb',
'bdist_rpm',
'bdist_wininst',
'upload',
'check',
# These two are reserved for future use:
#'bdist_sdux',
#'bdist_pkgtool',
# Note:
# bdist_packager is not included because it only provides
# an abstract base class
]
"""distutils.pypirc
Provides the PyPIRCCommand class, the base class for the command classes
that uses .pypirc in the distutils.command package.
"""
import os
from ConfigParser import ConfigParser
from distutils.cmd import Command
DEFAULT_PYPIRC = """\
[distutils]
index-servers =
pypi
[pypi]
username:%s
password:%s
"""
class PyPIRCCommand(Command):
"""Base command that knows how to handle the .pypirc file
"""
DEFAULT_REPOSITORY = 'https://upload.pypi.org/legacy/'
DEFAULT_REALM = 'pypi'
repository = None
realm = None
user_options = [
('repository=', 'r',
"url of repository [default: %s]" % \
DEFAULT_REPOSITORY),
('show-response', None,
'display full response text from server')]
boolean_options = ['show-response']
def _get_rc_file(self):
"""Returns rc file path."""
return os.path.join(os.path.expanduser('~'), '.pypirc')
def _store_pypirc(self, username, password):
"""Creates a default .pypirc file."""
rc = self._get_rc_file()
f = os.fdopen(os.open(rc, os.O_CREAT | os.O_WRONLY, 0600), 'w')
try:
f.write(DEFAULT_PYPIRC % (username, password))
finally:
f.close()
def _read_pypirc(self):
"""Reads the .pypirc file."""
rc = self._get_rc_file()
if os.path.exists(rc):
self.announce('Using PyPI login from %s' % rc)
repository = self.repository or self.DEFAULT_REPOSITORY
config = ConfigParser()
config.read(rc)
sections = config.sections()
if 'distutils' in sections:
# let's get the list of servers
index_servers = config.get('distutils', 'index-servers')
_servers = [server.strip() for server in
index_servers.split('\n')
if server.strip() != '']
if _servers == []:
# nothing set, let's try to get the default pypi
if 'pypi' in sections:
_servers = ['pypi']
else:
# the file is not properly defined, returning
# an empty dict
return {}
for server in _servers:
current = {'server': server}
current['username'] = config.get(server, 'username')
# optional params
for key, default in (('repository',
self.DEFAULT_REPOSITORY),
('realm', self.DEFAULT_REALM),
('password', None)):
if config.has_option(server, key):
current[key] = config.get(server, key)
else:
current[key] = default
if (current['server'] == repository or
current['repository'] == repository):
return current
elif 'server-login' in sections:
# old format
server = 'server-login'
if config.has_option(server, 'repository'):
repository = config.get(server, 'repository')
else:
repository = self.DEFAULT_REPOSITORY
return {'username': config.get(server, 'username'),
'password': config.get(server, 'password'),
'repository': repository,
'server': server,
'realm': self.DEFAULT_REALM}
return {}
def initialize_options(self):
"""Initialize options."""
self.repository = None
self.realm = None
self.show_response = 0
def finalize_options(self):
"""Finalizes options."""
if self.repository is None:
self.repository = self.DEFAULT_REPOSITORY
if self.realm is None:
self.realm = self.DEFAULT_REALM
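# -- Example: multi-server .pypirc (illustrative sketch) -------------
# A ~/.pypirc of the kind '_read_pypirc()' above parses.  The
# 'internal' server name and URL are made up; 'repository', 'realm'
# and 'password' are the optional per-server keys:
#
#     [distutils]
#     index-servers =
#         pypi
#         internal
#
#     [pypi]
#     username: fred
#     password: sekrit
#
#     [internal]
#     username: fred
#     repository: https://pypi.example.com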
"""distutils.core
The only module that needs to be imported to use the Distutils; provides
the 'setup' function (which is to be called from the setup script). Also
indirectly provides the Distribution and Command classes, although they are
really defined in distutils.dist and distutils.cmd.
"""
__revision__ = "$Id$"
import sys
import os
from distutils.debug import DEBUG
from distutils.errors import (DistutilsSetupError, DistutilsArgError,
DistutilsError, CCompilerError)
# Mainly import these so setup scripts can "from distutils.core import" them.
from distutils.dist import Distribution
from distutils.cmd import Command
from distutils.config import PyPIRCCommand
from distutils.extension import Extension
# This is a barebones help message displayed when the user
# runs the setup script with no arguments at all. More useful help
# is generated with various --help options: global help, list commands,
# and per-command help.
USAGE = """\
usage: %(script)s [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: %(script)s --help [cmd1 cmd2 ...]
or: %(script)s --help-commands
or: %(script)s cmd --help
"""
def gen_usage(script_name):
script = os.path.basename(script_name)
return USAGE % {'script': script}
# Some mild magic to control the behaviour of 'setup()' from 'run_setup()'.
_setup_stop_after = None
_setup_distribution = None
# Legal keyword arguments for the setup() function
setup_keywords = ('distclass', 'script_name', 'script_args', 'options',
'name', 'version', 'author', 'author_email',
'maintainer', 'maintainer_email', 'url', 'license',
'description', 'long_description', 'keywords',
'platforms', 'classifiers', 'download_url',
'requires', 'provides', 'obsoletes',
)
# Legal keyword arguments for the Extension constructor
extension_keywords = ('name', 'sources', 'include_dirs',
'define_macros', 'undef_macros',
'library_dirs', 'libraries', 'runtime_library_dirs',
'extra_objects', 'extra_compile_args', 'extra_link_args',
'swig_opts', 'export_symbols', 'depends', 'language')
def setup(**attrs):
"""The gateway to the Distutils: do everything your setup script needs
to do, in a highly flexible and user-driven way. Briefly: create a
Distribution instance; find and parse config files; parse the command
line; run each Distutils command found there, customized by the options
supplied to 'setup()' (as keyword arguments), in config files, and on
the command line.
The Distribution instance might be an instance of a class supplied via
the 'distclass' keyword argument to 'setup'; if no such class is
supplied, then the Distribution class (in dist.py) is instantiated.
All other arguments to 'setup' (except for 'cmdclass') are used to set
attributes of the Distribution instance.
The 'cmdclass' argument, if supplied, is a dictionary mapping command
names to command classes. Each command encountered on the command line
will be turned into a command class, which is in turn instantiated; any
class found in 'cmdclass' is used in place of the default, which is
(for command 'foo_bar') class 'foo_bar' in module
'distutils.command.foo_bar'. The command class must provide a
'user_options' attribute which is a list of option specifiers for
'distutils.fancy_getopt'. Any command-line options between the current
and the next command are used to set attributes of the current command
object.
When the entire command-line has been successfully parsed, calls the
'run()' method on each command object in turn. This method will be
driven entirely by the Distribution object (which each command object
has a reference to, thanks to its constructor), and the
command-specific options that became attributes of each command
object.
"""
global _setup_stop_after, _setup_distribution
# Determine the distribution class -- either caller-supplied or
# our Distribution (see below).
klass = attrs.get('distclass')
if klass:
del attrs['distclass']
else:
klass = Distribution
if 'script_name' not in attrs:
attrs['script_name'] = os.path.basename(sys.argv[0])
if 'script_args' not in attrs:
attrs['script_args'] = sys.argv[1:]
# Create the Distribution instance, using the remaining arguments
# (ie. everything except distclass) to initialize it
try:
_setup_distribution = dist = klass(attrs)
except DistutilsSetupError, msg:
if 'name' in attrs:
raise SystemExit, "error in %s setup command: %s" % \
(attrs['name'], msg)
else:
raise SystemExit, "error in setup command: %s" % msg
if _setup_stop_after == "init":
return dist
# Find and parse the config file(s): they will override options from
# the setup script, but be overridden by the command line.
dist.parse_config_files()
if DEBUG:
print "options (after parsing config files):"
dist.dump_option_dicts()
if _setup_stop_after == "config":
return dist
# Parse the command line and override config files; any
# command-line errors are the end user's fault, so turn them into
# SystemExit to suppress tracebacks.
try:
ok = dist.parse_command_line()
except DistutilsArgError, msg:
raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
if DEBUG:
print "options (after parsing command line):"
dist.dump_option_dicts()
if _setup_stop_after == "commandline":
return dist
# And finally, run all the commands found on the command line.
if ok:
try:
dist.run_commands()
except KeyboardInterrupt:
raise SystemExit, "interrupted"
except (IOError, os.error), exc:
if DEBUG:
sys.stderr.write("error: %s\n" % (exc,))
raise
else:
raise SystemExit, "error: %s" % (exc,)
except (DistutilsError,
CCompilerError), msg:
if DEBUG:
raise
else:
raise SystemExit, "error: " + str(msg)
return dist
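# Editor's example (not part of the original source): a minimal setup
# script exercising 'setup()' with a few of the 'setup_keywords' above.
# All metadata values are hypothetical; passing 'script_args' explicitly
# sidesteps sys.argv, as the code above allows.
def _example_setup():
    from distutils.core import setup
    setup(name='demo',
          version='0.1',
          description='hypothetical demo package',
          py_modules=['demo'],
          script_args=['--dry-run', 'build'])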
def run_setup(script_name, script_args=None, stop_after="run"):
"""Run a setup script in a somewhat controlled environment, and
return the Distribution instance that drives things. This is useful
if you need to find out the distribution meta-data (passed as
keyword args from 'script' to 'setup()'), or the contents of the
config files or command-line.
'script_name' is a file that will be run with 'execfile()';
'sys.argv[0]' will be replaced with 'script' for the duration of the
call. 'script_args' is a list of strings; if supplied,
'sys.argv[1:]' will be replaced by 'script_args' for the duration of
the call.
'stop_after' tells 'setup()' when to stop processing; possible
values:
init
stop after the Distribution instance has been created and
populated with the keyword arguments to 'setup()'
config
stop after config files have been parsed (and their data
stored in the Distribution instance)
commandline
stop after the command-line ('sys.argv[1:]' or 'script_args')
have been parsed (and the data stored in the Distribution)
run [default]
stop after all commands have been run (the same as if 'setup()'
had been called in the usual way)
Returns the Distribution instance, which provides all information
used to drive the Distutils.
"""
if stop_after not in ('init', 'config', 'commandline', 'run'):
raise ValueError, "invalid value for 'stop_after': %r" % (stop_after,)
global _setup_stop_after, _setup_distribution
_setup_stop_after = stop_after
save_argv = sys.argv
g = {'__file__': script_name}
l = {}
try:
try:
sys.argv[0] = script_name
if script_args is not None:
sys.argv[1:] = script_args
f = open(script_name)
try:
exec f.read() in g, l
finally:
f.close()
finally:
sys.argv = save_argv
_setup_stop_after = None
except SystemExit:
# Hmm, should we do something if exiting with a non-zero code
# (ie. error)?
pass
except:
raise
if _setup_distribution is None:
raise RuntimeError, \
("'distutils.core.setup()' was never called -- "
"perhaps '%s' is not a Distutils setup script?") % \
script_name
# I wonder if the setup script's namespace -- g and l -- would be of
# any interest to callers?
return _setup_distribution
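# Editor's example (not part of the original source): using 'run_setup()'
# to read a setup script's metadata without executing any commands.  The
# filename 'setup.py' is an assumption.
def _example_run_setup():
    from distutils.core import run_setup
    dist = run_setup('setup.py', script_args=[], stop_after='init')
    print("%s %s" % (dist.get_name(), dist.get_version()))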
"""distutils.cygwinccompiler
Provides the CygwinCCompiler class, a subclass of UnixCCompiler that
handles the Cygwin port of the GNU C compiler to Windows. It also contains
the Mingw32CCompiler class which handles the mingw32 port of GCC (same as
cygwin in no-cygwin mode).
"""
# problems:
#
# * if you use a msvc compiled python version (1.5.2)
# 1. you have to insert a __GNUC__ section in its config.h
# 2. you have to generate an import library for its dll
# - create a def-file for python??.dll
# - create an import library using
# dlltool --dllname python15.dll --def python15.def \
# --output-lib libpython15.a
#
# see also http://starship.python.net/crew/kernr/mingw32/Notes.html
#
# * We put export_symbols in a def-file, and don't use
# --export-all-symbols because it doesn't work reliably in some
# tested configurations. And because other Windows compilers also
# need their symbols specified, this is no serious problem.
#
# tested configurations:
#
# * cygwin gcc 2.91.57/ld 2.9.4/dllwrap 0.2.4 works
# (after patching python's config.h and for C++ some other include files)
# see also http://starship.python.net/crew/kernr/mingw32/Notes.html
# * mingw32 gcc 2.95.2/ld 2.9.4/dllwrap 0.2.4 works
# (ld doesn't support -shared, so we use dllwrap)
# * cygwin gcc 2.95.2/ld 2.10.90/dllwrap 2.10.90 works now
# - its dllwrap doesn't work, there is a bug in binutils 2.10.90
# see also http://sources.redhat.com/ml/cygwin/2000-06/msg01274.html
# - using gcc -mdll instead of dllwrap doesn't work without -static,
# because it tries to link against dlls instead of their import
# libraries (if it finds the dll first). By specifying -static we
# force ld to link against the import libraries; this is the Windows
# standard, and the dlls normally don't contain the necessary
# symbols anyway.
# *** only the version of June 2000 shows these problems
# * cygwin gcc 3.2/ld 2.13.90 works
# (ld supports -shared)
# * mingw gcc 3.2/ld 2.13 works
# (ld supports -shared)
# This module should be kept compatible with Python 2.1.
__revision__ = "$Id$"
import os,sys,copy
from distutils.ccompiler import gen_preprocess_options, gen_lib_options
from distutils.unixccompiler import UnixCCompiler
from distutils.file_util import write_file
from distutils.errors import DistutilsExecError, CompileError, UnknownFileError
from distutils import log
def get_msvcr():
"""Include the appropriate MSVC runtime library if Python was built
with MSVC 7.0 or later.
"""
msc_pos = sys.version.find('MSC v.')
if msc_pos != -1:
msc_ver = sys.version[msc_pos+6:msc_pos+10]
if msc_ver == '1300':
# MSVC 7.0
return ['msvcr70']
elif msc_ver == '1310':
# MSVC 7.1
return ['msvcr71']
elif msc_ver == '1400':
# VS2005 / MSVC 8.0
return ['msvcr80']
elif msc_ver == '1500':
# VS2008 / MSVC 9.0
return ['msvcr90']
else:
raise ValueError("Unknown MS Compiler version %s " % msc_ver)
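# Editor's example (not part of the original source): how get_msvcr()
# extracts the "MSC v." tag from sys.version.  The banner below is a
# hypothetical MSVC 9.0 build string.
def _example_msc_ver():
    version = '2.7.11 (default, ...) [MSC v.1500 32 bit (Intel)]'
    msc_pos = version.find('MSC v.')
    print(version[msc_pos + 6:msc_pos + 10])   # '1500' -> ['msvcr90']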
class CygwinCCompiler (UnixCCompiler):
compiler_type = 'cygwin'
obj_extension = ".o"
static_lib_extension = ".a"
shared_lib_extension = ".dll"
static_lib_format = "lib%s%s"
shared_lib_format = "%s%s"
exe_extension = ".exe"
def __init__ (self, verbose=0, dry_run=0, force=0):
UnixCCompiler.__init__ (self, verbose, dry_run, force)
(status, details) = check_config_h()
self.debug_print("Python's GCC status: %s (details: %s)" %
(status, details))
if status is not CONFIG_H_OK:
self.warn(
"Python's pyconfig.h doesn't seem to support your compiler. "
"Reason: %s. "
"Compiling may fail because of undefined preprocessor macros."
% details)
self.gcc_version, self.ld_version, self.dllwrap_version = \
get_versions()
self.debug_print(self.compiler_type + ": gcc %s, ld %s, dllwrap %s\n" %
(self.gcc_version,
self.ld_version,
self.dllwrap_version) )
# ld_version >= "2.10.90" and < "2.13" should also be able to use
# gcc -mdll instead of dllwrap
# Older dllwraps had their own version numbers; newer ones use the
# same numbering as the rest of binutils (ld included).
# dllwrap 2.10.90 is buggy
if self.ld_version >= "2.10.90":
self.linker_dll = "gcc"
else:
self.linker_dll = "dllwrap"
# ld_version >= "2.13" support -shared so use it instead of
# -mdll -static
if self.ld_version >= "2.13":
shared_option = "-shared"
else:
shared_option = "-mdll -static"
# Hard-code GCC because that's what this is all about.
# XXX optimization, warnings etc. should be customizable.
self.set_executables(compiler='gcc -mcygwin -O -Wall',
compiler_so='gcc -mcygwin -mdll -O -Wall',
compiler_cxx='g++ -mcygwin -O -Wall',
linker_exe='gcc -mcygwin',
linker_so=('%s -mcygwin %s' %
(self.linker_dll, shared_option)))
# cygwin and mingw32 need different sets of libraries
if self.gcc_version == "2.91.57":
# cygwin shouldn't need msvcrt, but without it the dlls will crash
# (gcc version 2.91.57) -- perhaps something about initialization
self.dll_libraries=["msvcrt"]
self.warn(
"Consider upgrading to a newer version of gcc")
else:
# Include the appropriate MSVC runtime library if Python was built
# with MSVC 7.0 or later.
self.dll_libraries = get_msvcr()
# __init__ ()
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
if ext == '.rc' or ext == '.res':
# gcc needs '.res' and '.rc' compiled to object files !!!
try:
self.spawn(["windres", "-i", src, "-o", obj])
except DistutilsExecError, msg:
raise CompileError, msg
else: # for other files use the C-compiler
try:
self.spawn(self.compiler_so + cc_args + [src, '-o', obj] +
extra_postargs)
except DistutilsExecError, msg:
raise CompileError, msg
def link (self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
# use separate copies, so we can modify the lists
extra_preargs = copy.copy(extra_preargs or [])
libraries = copy.copy(libraries or [])
objects = copy.copy(objects or [])
# Additional libraries
libraries.extend(self.dll_libraries)
# handle export symbols by creating a def-file;
# for executables this only works with gcc/ld as the linker
if ((export_symbols is not None) and
(target_desc != self.EXECUTABLE or self.linker_dll == "gcc")):
# (The linker doesn't do anything if the output is up-to-date,
# so it would probably be better to check whether we really need
# this; but that would require duplicating some unchanged parts of
# UnixCCompiler, which is not what we want.)
# We want to put the helper files in the same directory as the
# object files; build_temp doesn't help much, so take the
# directory the object files live in.
temp_dir = os.path.dirname(objects[0])
# base name of the dll, so the helper files get the same base name
(dll_name, dll_extension) = os.path.splitext(
os.path.basename(output_filename))
# generate the filenames for these files
def_file = os.path.join(temp_dir, dll_name + ".def")
lib_file = os.path.join(temp_dir, 'lib' + dll_name + ".a")
# Generate .def file
contents = [
"LIBRARY %s" % os.path.basename(output_filename),
"EXPORTS"]
for sym in export_symbols:
contents.append(sym)
self.execute(write_file, (def_file, contents),
"writing %s" % def_file)
# next, add options for the def-file and for creating the import library
# dllwrap uses different options than gcc/ld
if self.linker_dll == "dllwrap":
extra_preargs.extend(["--output-lib", lib_file])
# for dllwrap we have to use a special option
extra_preargs.extend(["--def", def_file])
# we use gcc/ld here and can be sure ld is >= 2.9.10
else:
# doesn't work: bfd_close build\...\libfoo.a: Invalid operation
#extra_preargs.extend(["-Wl,--out-implib,%s" % lib_file])
# for gcc/ld the def-file is passed like any other object file
objects.append(def_file)
#end: if ((export_symbols is not None) and
# (target_desc != self.EXECUTABLE or self.linker_dll == "gcc")):
# Anyone who wants symbols and a many-times-larger output file
# should explicitly switch debug mode on; otherwise we let
# dllwrap/ld strip the output file.
# (On my machine: 10KB < stripped_file < ??100KB;
# unstripped_file = stripped_file + XXX KB,
# with XXX=254 for a typical python extension.)
if not debug:
extra_preargs.append("-s")
UnixCCompiler.link(self,
target_desc,
objects,
output_filename,
output_dir,
libraries,
library_dirs,
runtime_library_dirs,
None, # export_symbols, we do this in our def-file
debug,
extra_preargs,
extra_postargs,
build_temp,
target_lang)
# link ()
# -- Miscellaneous methods -----------------------------------------
# overrides the one from CCompiler to support .rc and .res files
def object_filenames (self,
source_filenames,
strip_dir=0,
output_dir=''):
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
# use normcase to make sure '.rc' is really '.rc' and not '.RC'
(base, ext) = os.path.splitext (os.path.normcase(src_name))
if ext not in (self.src_extensions + ['.rc','.res']):
raise UnknownFileError, \
"unknown file type '%s' (from '%s')" % \
(ext, src_name)
if strip_dir:
base = os.path.basename (base)
if ext == '.res' or ext == '.rc':
# these need to be compiled to object files
obj_names.append (os.path.join (output_dir,
base + ext + self.obj_extension))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
# object_filenames ()
# class CygwinCCompiler
# the same as cygwin plus some additional parameters
class Mingw32CCompiler (CygwinCCompiler):
compiler_type = 'mingw32'
def __init__ (self,
verbose=0,
dry_run=0,
force=0):
CygwinCCompiler.__init__ (self, verbose, dry_run, force)
# ld_version >= "2.13" support -shared so use it instead of
# -mdll -static
if self.ld_version >= "2.13":
shared_option = "-shared"
else:
shared_option = "-mdll -static"
# A real mingw32 doesn't need to specify a different entry point,
# but cygwin 2.91.57 in no-cygwin-mode needs it.
if self.gcc_version <= "2.91.57":
entry_point = '--entry _DllMain@12'
else:
entry_point = ''
if self.gcc_version < '4' or is_cygwingcc():
no_cygwin = ' -mno-cygwin'
else:
no_cygwin = ''
self.set_executables(compiler='gcc%s -O -Wall' % no_cygwin,
compiler_so='gcc%s -mdll -O -Wall' % no_cygwin,
compiler_cxx='g++%s -O -Wall' % no_cygwin,
linker_exe='gcc%s' % no_cygwin,
linker_so='%s%s %s %s'
% (self.linker_dll, no_cygwin,
shared_option, entry_point))
# Maybe we should also append -mthreads, but then the finished
# dlls need another dll (mingwm10.dll; see the Mingw32 docs)
# (-mthreads: Support thread-safe exception handling on `Mingw32')
# no additional libraries needed
self.dll_libraries=[]
# Include the appropriate MSVC runtime library if Python was built
# with MSVC 7.0 or later.
self.dll_libraries = get_msvcr()
# __init__ ()
# class Mingw32CCompiler
# Because these compilers aren't configured in Python's pyconfig.h file by
# default, we should at least warn the user if he is using an unmodified
# version.
CONFIG_H_OK = "ok"
CONFIG_H_NOTOK = "not ok"
CONFIG_H_UNCERTAIN = "uncertain"
def check_config_h():
"""Check if the current Python installation (specifically, pyconfig.h)
appears amenable to building extensions with GCC. Returns a tuple
(status, details), where 'status' is one of the following constants:
CONFIG_H_OK
all is well, go ahead and compile
CONFIG_H_NOTOK
doesn't look good
CONFIG_H_UNCERTAIN
not sure -- unable to read pyconfig.h
'details' is a human-readable string explaining the situation.
Note there are two ways to conclude "OK": either 'sys.version' contains
the string "GCC" (implying that this Python was built with GCC), or the
installed "pyconfig.h" contains the string "__GNUC__".
"""
# XXX since this function also checks sys.version, it's not strictly a
# "pyconfig.h" check -- should probably be renamed...
from distutils import sysconfig
import string
# if sys.version contains GCC then python was compiled with
# GCC, and the pyconfig.h file should be OK
if string.find(sys.version,"GCC") >= 0:
return (CONFIG_H_OK, "sys.version mentions 'GCC'")
fn = sysconfig.get_config_h_filename()
try:
# It would probably be better to read and search single lines,
# but we do this only once and it is fast enough.
f = open(fn)
try:
s = f.read()
finally:
f.close()
except IOError, exc:
# if we can't read this file, we cannot say it is wrong;
# the compiler will complain later that the file is missing
return (CONFIG_H_UNCERTAIN,
"couldn't read '%s': %s" % (fn, exc.strerror))
else:
# "pyconfig.h" contains an "#ifdef __GNUC__" or something similar
if string.find(s,"__GNUC__") >= 0:
return (CONFIG_H_OK, "'%s' mentions '__GNUC__'" % fn)
else:
return (CONFIG_H_NOTOK, "'%s' does not mention '__GNUC__'" % fn)
def get_versions():
""" Try to find out the versions of gcc, ld and dllwrap.
Returns None for any tool whose version cannot be determined.
"""
from distutils.version import LooseVersion
from distutils.spawn import find_executable
import re
gcc_exe = find_executable('gcc')
if gcc_exe:
out = os.popen(gcc_exe + ' -dumpversion','r')
out_string = out.read()
out.close()
result = re.search('(\d+\.\d+(\.\d+)*)',out_string)
if result:
gcc_version = LooseVersion(result.group(1))
else:
gcc_version = None
else:
gcc_version = None
ld_exe = find_executable('ld')
if ld_exe:
out = os.popen(ld_exe + ' -v','r')
out_string = out.read()
out.close()
result = re.search('(\d+\.\d+(\.\d+)*)',out_string)
if result:
ld_version = LooseVersion(result.group(1))
else:
ld_version = None
else:
ld_version = None
dllwrap_exe = find_executable('dllwrap')
if dllwrap_exe:
out = os.popen(dllwrap_exe + ' --version','r')
out_string = out.read()
out.close()
result = re.search(' (\d+\.\d+(\.\d+)*)',out_string)
if result:
dllwrap_version = LooseVersion(result.group(1))
else:
dllwrap_version = None
else:
dllwrap_version = None
return (gcc_version, ld_version, dllwrap_version)
def is_cygwingcc():
'''Try to determine if the gcc that would be used is from cygwin.'''
out = os.popen('gcc -dumpmachine', 'r')
out_string = out.read()
out.close()
# out_string is the target triplet cpu-vendor-os
# Cygwin's gcc sets the os to 'cygwin'
return out_string.strip().endswith('cygwin')
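# Editor's example (not part of the original source): probing the
# toolchain the same way CygwinCCompiler.__init__ does.  Requires gcc,
# ld and dllwrap on PATH; get_versions() returns None for missing tools.
def _example_toolchain():
    gcc, ld, dllwrap = get_versions()
    print("gcc=%s ld=%s dllwrap=%s cygwin-gcc=%s" %
          (gcc, ld, dllwrap, is_cygwingcc()))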
import os
__revision__ = "$Id$"
# If DISTUTILS_DEBUG is anything other than the empty string, we run in
# debug mode.
DEBUG = os.environ.get('DISTUTILS_DEBUG')
"""distutils.dep_util
Utility functions for simple, timestamp-based dependency of files
and groups of files; also, functions based entirely on such
timestamp dependency analysis."""
__revision__ = "$Id$"
import os
from stat import ST_MTIME
from distutils.errors import DistutilsFileError
def newer(source, target):
"""Tells if the target is newer than the source.
Return true if 'source' exists and is more recently modified than
'target', or if 'source' exists and 'target' doesn't.
Return false if both exist and 'target' is the same age or younger
than 'source'. Raise DistutilsFileError if 'source' does not exist.
Note that this test is not very accurate: files created in the same second
will have the same "age".
"""
if not os.path.exists(source):
raise DistutilsFileError("file '%s' does not exist" %
os.path.abspath(source))
if not os.path.exists(target):
return True
return os.stat(source)[ST_MTIME] > os.stat(target)[ST_MTIME]
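# Editor's example (not part of the original source): 'newer()' in use.
# The filenames are hypothetical; newer() raises DistutilsFileError if
# the source file does not exist.
def _example_newer():
    if newer('demo.c', 'demo.o'):
        print("demo.o is out of date; recompile demo.c")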
def newer_pairwise(sources, targets):
"""Walk two filename lists in parallel, testing if each source is newer
than its corresponding target. Return a pair of lists (sources,
targets) where source is newer than target, according to the semantics
of 'newer()'.
"""
if len(sources) != len(targets):
raise ValueError, "'sources' and 'targets' must be same length"
# build a pair of lists (sources, targets) where source is newer
n_sources = []
n_targets = []
for source, target in zip(sources, targets):
if newer(source, target):
n_sources.append(source)
n_targets.append(target)
return n_sources, n_targets
def newer_group(sources, target, missing='error'):
"""Return true if 'target' is out-of-date with respect to any file
listed in 'sources'.
In other words, if 'target' exists and is newer
than every file in 'sources', return false; otherwise return true.
'missing' controls what we do when a source file is missing; the
default ("error") is to blow up with an OSError from inside 'stat()';
if it is "ignore", we silently drop any missing source files; if it is
"newer", any missing source files make us assume that 'target' is
out-of-date (this is handy in "dry-run" mode: it'll make you pretend to
carry out commands that wouldn't work because inputs are missing, but
that doesn't matter because you're not actually going to run the
commands).
"""
# If the target doesn't even exist, then it's definitely out-of-date.
if not os.path.exists(target):
return True
# Otherwise we have to find out the hard way: if *any* source file
# is more recent than 'target', then 'target' is out-of-date and
# we can immediately return true. If we fall through to the end
# of the loop, then 'target' is up-to-date and we return false.
target_mtime = os.stat(target)[ST_MTIME]
for source in sources:
if not os.path.exists(source):
if missing == 'error': # blow up when we stat() the file
pass
elif missing == 'ignore': # missing source dropped from
continue # target's dependency list
elif missing == 'newer': # missing source means target is
return True # out-of-date
if os.stat(source)[ST_MTIME] > target_mtime:
return True
return False
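# Editor's example (not part of the original source): 'newer_pairwise()'
# and 'newer_group()' with hypothetical file lists.  missing='newer'
# treats absent sources as "target out-of-date", as described above.
def _example_dep_checks():
    sources, objects = ['a.c', 'b.c'], ['a.o', 'b.o']
    stale_sources, stale_objects = newer_pairwise(sources, objects)
    print("recompile: %s" % ', '.join(stale_sources))
    if newer_group(objects, 'demo.dll', missing='newer'):
        print("relink demo.dll")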
"""distutils.dir_util
Utility functions for manipulating directories and directory trees."""
__revision__ = "$Id$"
import os
import errno
from distutils.errors import DistutilsFileError, DistutilsInternalError
from distutils import log
# cache used by mkpath() -- in addition to cheapening redundant calls,
# eliminates redundant "creating /foo/bar/baz" messages in dry-run mode
_path_created = {}
# I don't use os.makedirs because a) it's new to Python 1.5.2, and
# b) it blows up if the directory already exists (I want to silently
# succeed in that case).
def mkpath(name, mode=0777, verbose=1, dry_run=0):
"""Create a directory and any missing ancestor directories.
If the directory already exists (or if 'name' is the empty string, which
means the current directory, which of course exists), then do nothing.
Raise DistutilsFileError if unable to create some directory along the way
(eg. some sub-path exists, but is a file rather than a directory).
If 'verbose' is true, print a one-line summary of each mkdir to stdout.
Return the list of directories actually created.
"""
global _path_created
# Detect a common bug -- name is None
if not isinstance(name, basestring):
raise DistutilsInternalError, \
"mkpath: 'name' must be a string (got %r)" % (name,)
# XXX what's the better way to handle verbosity? print as we create
# each directory in the path (the current behaviour), or only announce
# the creation of the whole path? (quite easy to do the latter since
# we're not using a recursive algorithm)
name = os.path.normpath(name)
created_dirs = []
if os.path.isdir(name) or name == '':
return created_dirs
if _path_created.get(os.path.abspath(name)):
return created_dirs
(head, tail) = os.path.split(name)
tails = [tail] # stack of lone dirs to create
while head and tail and not os.path.isdir(head):
(head, tail) = os.path.split(head)
tails.insert(0, tail) # push next higher dir onto stack
# now 'head' contains the deepest directory that already exists
# (that is, the child of 'head' in 'name' is the highest directory
# that does *not* exist)
for d in tails:
#print "head = %s, d = %s: " % (head, d),
head = os.path.join(head, d)
abs_head = os.path.abspath(head)
if _path_created.get(abs_head):
continue
if verbose >= 1:
log.info("creating %s", head)
if not dry_run:
try:
os.mkdir(head, mode)
except OSError, exc:
if not (exc.errno == errno.EEXIST and os.path.isdir(head)):
raise DistutilsFileError(
"could not create '%s': %s" % (head, exc.args[-1]))
created_dirs.append(head)
_path_created[abs_head] = 1
return created_dirs
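# Editor's example (not part of the original source): with dry_run=1,
# mkpath() only logs the directories it would create and returns an
# empty list; the path is hypothetical.
def _example_mkpath():
    print(mkpath('build/lib/demo', verbose=1, dry_run=1))   # -> []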
def create_tree(base_dir, files, mode=0777, verbose=1, dry_run=0):
"""Create all the empty directories under 'base_dir' needed to put 'files'
there.
'base_dir' is just the name of a directory which doesn't necessarily
exist yet; 'files' is a list of filenames to be interpreted relative to
'base_dir'. 'base_dir' + the directory portion of every file in 'files'
will be created if it doesn't already exist. 'mode', 'verbose' and
'dry_run' flags are as for 'mkpath()'.
"""
# First get the list of directories to create
need_dir = {}
for file in files:
need_dir[os.path.join(base_dir, os.path.dirname(file))] = 1
need_dirs = need_dir.keys()
need_dirs.sort()
# Now create them
for dir in need_dirs:
mkpath(dir, mode, verbose=verbose, dry_run=dry_run)
def copy_tree(src, dst, preserve_mode=1, preserve_times=1,
preserve_symlinks=0, update=0, verbose=1, dry_run=0):
"""Copy an entire directory tree 'src' to a new location 'dst'.
Both 'src' and 'dst' must be directory names. If 'src' is not a
directory, raise DistutilsFileError. If 'dst' does not exist, it is
created with 'mkpath()'. The end result of the copy is that every
file in 'src' is copied to 'dst', and directories under 'src' are
recursively copied to 'dst'. Return the list of files that were
copied or might have been copied, using their output name. The
return value is unaffected by 'update' or 'dry_run': it is simply
the list of all files under 'src', with the names changed to be
under 'dst'.
'preserve_mode' and 'preserve_times' are the same as for
'copy_file'; note that they only apply to regular files, not to
directories. If 'preserve_symlinks' is true, symlinks will be
copied as symlinks (on platforms that support them!); otherwise
(the default), the destination of the symlink will be copied.
'update' and 'verbose' are the same as for 'copy_file'.
"""
from distutils.file_util import copy_file
if not dry_run and not os.path.isdir(src):
raise DistutilsFileError, \
"cannot copy tree '%s': not a directory" % src
try:
names = os.listdir(src)
except os.error, (errno, errstr):
if dry_run:
names = []
else:
raise DistutilsFileError, \
"error listing files in '%s': %s" % (src, errstr)
if not dry_run:
mkpath(dst, verbose=verbose)
outputs = []
for n in names:
src_name = os.path.join(src, n)
dst_name = os.path.join(dst, n)
if n.startswith('.nfs'):
# skip NFS rename files
continue
if preserve_symlinks and os.path.islink(src_name):
link_dest = os.readlink(src_name)
if verbose >= 1:
log.info("linking %s -> %s", dst_name, link_dest)
if not dry_run:
os.symlink(link_dest, dst_name)
outputs.append(dst_name)
elif os.path.isdir(src_name):
outputs.extend(
copy_tree(src_name, dst_name, preserve_mode,
preserve_times, preserve_symlinks, update,
verbose=verbose, dry_run=dry_run))
else:
copy_file(src_name, dst_name, preserve_mode,
preserve_times, update, verbose=verbose,
dry_run=dry_run)
outputs.append(dst_name)
return outputs
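# Editor's example (not part of the original source): mirroring a source
# tree into a build directory; the directory names are hypothetical and
# the returned list names every file under the destination.
def _example_copy_tree():
    outputs = copy_tree('demo', 'build/demo',
                        update=1,             # skip files that are current
                        preserve_symlinks=0)  # copy symlink targets
    print("%d file(s) under build/demo" % len(outputs))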
def _build_cmdtuple(path, cmdtuples):
"""Helper for remove_tree()."""
for f in os.listdir(path):
real_f = os.path.join(path,f)
if os.path.isdir(real_f) and not os.path.islink(real_f):
_build_cmdtuple(real_f, cmdtuples)
else:
cmdtuples.append((os.remove, real_f))
cmdtuples.append((os.rmdir, path))
def remove_tree(directory, verbose=1, dry_run=0):
"""Recursively remove an entire directory tree.
Any errors are ignored (apart from being reported to stdout if 'verbose'
is true).
"""
global _path_created
if verbose >= 1:
log.info("removing '%s' (and everything under it)", directory)
if dry_run:
return
cmdtuples = []
_build_cmdtuple(directory, cmdtuples)
for cmd in cmdtuples:
try:
cmd[0](cmd[1])
# remove dir from cache if it's already there
abspath = os.path.abspath(cmd[1])
if abspath in _path_created:
del _path_created[abspath]
except (IOError, OSError), exc:
log.warn("error removing %s: %s", directory, exc)
def ensure_relative(path):
"""Take the full path 'path', and make it a relative path.
This is useful to make 'path' the second argument to os.path.join().
"""
drive, path = os.path.splitdrive(path)
if path[0:1] == os.sep:
path = drive + path[1:]
return path
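# Editor's example (not part of the original source): ensure_relative()
# drops the leading separator so the result can be re-rooted with
# os.path.join(); the paths are hypothetical.
def _example_ensure_relative():
    rel = ensure_relative('/usr/lib/demo')      # 'usr/lib/demo' on POSIX
    print(os.path.join('/tmp/stage', rel))      # '/tmp/stage/usr/lib/demo'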
"""distutils.dist
Provides the Distribution class, which represents the module distribution
being built/installed/distributed.
"""
__revision__ = "$Id$"
import sys, os, re
from email import message_from_file
try:
import warnings
except ImportError:
warnings = None
from distutils.errors import (DistutilsOptionError, DistutilsArgError,
DistutilsModuleError, DistutilsClassError)
from distutils.fancy_getopt import FancyGetopt, translate_longopt
from distutils.util import check_environ, strtobool, rfc822_escape
from distutils import log
from distutils.debug import DEBUG
# Encoding used for the PKG-INFO files
PKG_INFO_ENCODING = 'utf-8'
# Regex to define acceptable Distutils command names. This is not *quite*
# the same as a Python NAME -- I don't allow leading underscores. The fact
# that they're very similar is no coincidence; the default naming scheme is
# to look for a Python module named after the command.
command_re = re.compile (r'^[a-zA-Z]([a-zA-Z0-9_]*)$')
class Distribution:
"""The core of the Distutils. Most of the work hiding behind 'setup'
is really done within a Distribution instance, which farms the work out
to the Distutils commands specified on the command line.
Setup scripts will almost never instantiate Distribution directly,
unless the 'setup()' function is totally inadequate to their needs.
However, it is conceivable that a setup script might wish to subclass
Distribution for some specialized purpose, and then pass the subclass
to 'setup()' as the 'distclass' keyword argument. If so, it is
necessary to respect the expectations that 'setup' has of Distribution.
See the code for 'setup()', in core.py, for details.
"""
# 'global_options' describes the command-line options that may be
# supplied to the setup script prior to any actual commands.
# Eg. "./setup.py -n" or "./setup.py --quiet" both take advantage of
# these global options. This list should be kept to a bare minimum,
# since every global option is also valid as a command option -- and we
# don't want to pollute the commands with too many options that they
# have minimal control over.
# The fourth entry for verbose means that it can be repeated.
global_options = [('verbose', 'v', "run verbosely (default)", 1),
('quiet', 'q', "run quietly (turns verbosity off)"),
('dry-run', 'n', "don't actually do anything"),
('help', 'h', "show detailed help message"),
('no-user-cfg', None,
'ignore pydistutils.cfg in your home directory'),
]
# 'common_usage' is a short (2-3 line) string describing the common
# usage of the setup script.
common_usage = """\
Common commands: (see '--help-commands' for more)
setup.py build will build the package underneath 'build/'
setup.py install will install the package
"""
# options that are not propagated to the commands
display_options = [
('help-commands', None,
"list all available commands"),
('name', None,
"print package name"),
('version', 'V',
"print package version"),
('fullname', None,
"print <package name>-<version>"),
('author', None,
"print the author's name"),
('author-email', None,
"print the author's email address"),
('maintainer', None,
"print the maintainer's name"),
('maintainer-email', None,
"print the maintainer's email address"),
('contact', None,
"print the maintainer's name if known, else the author's"),
('contact-email', None,
"print the maintainer's email address if known, else the author's"),
('url', None,
"print the URL for this package"),
('license', None,
"print the license of the package"),
('licence', None,
"alias for --license"),
('description', None,
"print the package description"),
('long-description', None,
"print the long package description"),
('platforms', None,
"print the list of platforms"),
('classifiers', None,
"print the list of classifiers"),
('keywords', None,
"print the list of keywords"),
('provides', None,
"print the list of packages/modules provided"),
('requires', None,
"print the list of packages/modules required"),
('obsoletes', None,
"print the list of packages/modules made obsolete")
]
display_option_names = map(lambda x: translate_longopt(x[0]),
display_options)
# negative options are options that exclude other options
negative_opt = {'quiet': 'verbose'}
# -- Creation/initialization methods -------------------------------
def __init__ (self, attrs=None):
"""Construct a new Distribution instance: initialize all the
attributes of a Distribution, and then use 'attrs' (a dictionary
mapping attribute names to values) to assign some of those
attributes their "real" values. (Any attributes not mentioned in
'attrs' will be assigned to some null value: 0, None, an empty list
or dictionary, etc.) Most importantly, initialize the
'command_obj' attribute to the empty dictionary; this will be
filled in with real command objects by 'parse_command_line()'.
"""
# Default values for our command-line options
self.verbose = 1
self.dry_run = 0
self.help = 0
for attr in self.display_option_names:
setattr(self, attr, 0)
# Store the distribution meta-data (name, version, author, and so
# forth) in a separate object -- we're getting to have enough
# information here (and enough command-line options) that it's
# worth it. Also delegate 'get_XXX()' methods to the 'metadata'
# object in a sneaky and underhanded (but efficient!) way.
self.metadata = DistributionMetadata()
for basename in self.metadata._METHOD_BASENAMES:
method_name = "get_" + basename
setattr(self, method_name, getattr(self.metadata, method_name))
# 'cmdclass' maps command names to class objects, so we
# can 1) quickly figure out which class to instantiate when
# we need to create a new command object, and 2) have a way
# for the setup script to override command classes
self.cmdclass = {}
# 'command_packages' is a list of packages in which commands
# are searched for. The factory for command 'foo' is expected
# to be named 'foo' in the module 'foo' in one of the packages
# named here. This list is searched from the left; an error
# is raised if no named package provides the command being
# searched for. (Always access using get_command_packages().)
self.command_packages = None
# 'script_name' and 'script_args' are usually set to sys.argv[0]
# and sys.argv[1:], but they can be overridden when the caller is
# not necessarily a setup script run from the command-line.
self.script_name = None
self.script_args = None
# 'command_options' is where we store command options between
# parsing them (from config files, the command-line, etc.) and when
# they are actually needed -- ie. when the command in question is
# instantiated. It is a dictionary of dictionaries of 2-tuples:
# command_options = { command_name : { option : (source, value) } }
self.command_options = {}
# 'dist_files' is the list of (command, pyversion, file) that
# have been created by any dist commands run so far. This is
# filled regardless of whether the run is dry or not. pyversion
# gives sysconfig.get_python_version() if the dist file is
# specific to a Python version, 'any' if it is good for all
# Python versions on the target platform, and '' for a source
# file. pyversion should not be used to specify minimum or
# maximum required Python versions; use the metainfo for that
# instead.
self.dist_files = []
# These options are really the business of various commands, rather
# than of the Distribution itself. We provide aliases for them in
# Distribution as a convenience to the developer.
self.packages = None
self.package_data = {}
self.package_dir = None
self.py_modules = None
self.libraries = None
self.headers = None
self.ext_modules = None
self.ext_package = None
self.include_dirs = None
self.extra_path = None
self.scripts = None
self.data_files = None
self.password = ''
# And now initialize bookkeeping stuff that can't be supplied by
# the caller at all. 'command_obj' maps command names to
# Command instances -- that's how we enforce that every command
# class is a singleton.
self.command_obj = {}
# 'have_run' maps command names to boolean values; it keeps track
# of whether we have actually run a particular command, to make it
# cheap to "run" a command whenever we think we might need to -- if
# it's already been done, no need for expensive filesystem
# operations, we just check the 'have_run' dictionary and carry on.
# It's only safe to query 'have_run' for a command class that has
# been instantiated -- a false value will be inserted when the
# command object is created, and replaced with a true value when
# the command is successfully run. Thus it's probably best to use
# '.get()' rather than a straight lookup.
self.have_run = {}
# Now we'll use the attrs dictionary (ultimately, keyword args from
# the setup script) to possibly override any or all of these
# distribution options.
if attrs:
# Pull out the set of command options and work on them
# specifically. Note that this order guarantees that aliased
# command options will override any supplied redundantly
# through the general options dictionary.
options = attrs.get('options')
if options is not None:
del attrs['options']
for (command, cmd_options) in options.items():
opt_dict = self.get_option_dict(command)
for (opt, val) in cmd_options.items():
opt_dict[opt] = ("setup script", val)
if 'licence' in attrs:
attrs['license'] = attrs['licence']
del attrs['licence']
msg = "'licence' distribution option is deprecated; use 'license'"
if warnings is not None:
warnings.warn(msg)
else:
sys.stderr.write(msg + "\n")
# Now work on the rest of the attributes. Any attribute that's
# not already defined is invalid!
for (key, val) in attrs.items():
if hasattr(self.metadata, "set_" + key):
getattr(self.metadata, "set_" + key)(val)
elif hasattr(self.metadata, key):
setattr(self.metadata, key, val)
elif hasattr(self, key):
setattr(self, key, val)
else:
msg = "Unknown distribution option: %s" % repr(key)
if warnings is not None:
warnings.warn(msg)
else:
sys.stderr.write(msg + "\n")
# no-user-cfg is handled before other command line args
# because other args override the config files, and this
# one is needed before we can load the config files.
# If attrs['script_args'] wasn't passed, assume false.
#
# This also makes sure we only look at the global options
self.want_user_cfg = True
if self.script_args is not None:
for arg in self.script_args:
if not arg.startswith('-'):
break
if arg == '--no-user-cfg':
self.want_user_cfg = False
break
self.finalize_options()
def get_option_dict(self, command):
"""Get the option dictionary for a given command. If that
command's option dictionary hasn't been created yet, then create it
and return the new dictionary; otherwise, return the existing
option dictionary.
"""
dict = self.command_options.get(command)
if dict is None:
dict = self.command_options[command] = {}
return dict
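# Editor's example (not part of the original source): the nested shape
# that get_option_dict() maintains, with a hypothetical 'build' option:
#
#     dist = Distribution()
#     dist.get_option_dict('build')['force'] = ('setup script', '1')
#     # dist.command_options == {'build': {'force': ('setup script', '1')}}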
def dump_option_dicts(self, header=None, commands=None, indent=""):
from pprint import pformat
if commands is None: # dump all command option dicts
commands = self.command_options.keys()
commands.sort()
if header is not None:
self.announce(indent + header)
indent = indent + " "
if not commands:
self.announce(indent + "no commands known yet")
return
for cmd_name in commands:
opt_dict = self.command_options.get(cmd_name)
if opt_dict is None:
self.announce(indent +
"no option dict for '%s' command" % cmd_name)
else:
self.announce(indent +
"option dict for '%s' command:" % cmd_name)
out = pformat(opt_dict)
for line in out.split('\n'):
self.announce(indent + " " + line)
# -- Config file finding/parsing methods ---------------------------
def find_config_files(self):
"""Find as many configuration files as should be processed for this
platform, and return a list of filenames in the order in which they
should be parsed. The filenames returned are guaranteed to exist
(modulo nasty race conditions).
There are three possible config files: distutils.cfg in the
Distutils installation directory (ie. where the top-level
Distutils __inst__.py file lives), a file in the user's home
directory named .pydistutils.cfg on Unix and pydistutils.cfg
on Windows/Mac; and setup.cfg in the current directory.
The file in the user's home directory can be disabled with the
--no-user-cfg option.
"""
files = []
check_environ()
# Where to look for the system-wide Distutils config file
sys_dir = os.path.dirname(sys.modules['distutils'].__file__)
# Look for the system config file
sys_file = os.path.join(sys_dir, "distutils.cfg")
if os.path.isfile(sys_file):
files.append(sys_file)
# What to call the per-user config file
if os.name == 'posix':
user_filename = ".pydistutils.cfg"
else:
user_filename = "pydistutils.cfg"
# And look for the user config file
if self.want_user_cfg:
user_file = os.path.join(os.path.expanduser('~'), user_filename)
if os.path.isfile(user_file):
files.append(user_file)
# All platforms support local setup.cfg
local_file = "setup.cfg"
if os.path.isfile(local_file):
files.append(local_file)
if DEBUG:
self.announce("using config files: %s" % ', '.join(files))
return files
def parse_config_files(self, filenames=None):
from ConfigParser import ConfigParser
if filenames is None:
filenames = self.find_config_files()
if DEBUG:
self.announce("Distribution.parse_config_files():")
parser = ConfigParser()
for filename in filenames:
if DEBUG:
self.announce(" reading %s" % filename)
parser.read(filename)
for section in parser.sections():
options = parser.options(section)
opt_dict = self.get_option_dict(section)
for opt in options:
if opt != '__name__':
val = parser.get(section,opt)
opt = opt.replace('-', '_')
opt_dict[opt] = (filename, val)
# Make the ConfigParser forget everything (so we retain
# the original filenames that options come from)
parser.__init__()
# If there was a "global" section in the config file, use it
# to set Distribution options.
if 'global' in self.command_options:
for (opt, (src, val)) in self.command_options['global'].items():
alias = self.negative_opt.get(opt)
try:
if alias:
setattr(self, alias, not strtobool(val))
elif opt in ('verbose', 'dry_run'): # ugh!
setattr(self, opt, strtobool(val))
else:
setattr(self, opt, val)
except ValueError, msg:
raise DistutilsOptionError, msg
# -- Command-line parsing methods ----------------------------------
def parse_command_line(self):
"""Parse the setup script's command line, taken from the
'script_args' instance attribute (which defaults to 'sys.argv[1:]'
-- see 'setup()' in core.py). This list is first processed for
"global options" -- options that set attributes of the Distribution
instance. Then, it is alternately scanned for Distutils commands
and options for that command. Each new command terminates the
options for the previous command. The allowed options for a
command are determined by the 'user_options' attribute of the
command class -- thus, we have to be able to load command classes
in order to parse the command line. Any error in that 'options'
attribute raises DistutilsGetoptError; any error on the
command-line raises DistutilsArgError. If no Distutils commands
were found on the command line, raises DistutilsArgError. Return
true if command-line was successfully parsed and we should carry
on with executing commands; false if no errors but we shouldn't
execute commands (currently, this only happens if user asks for
help).
"""
#
# We now have enough information to show the Macintosh dialog
# that allows the user to interactively specify the "command line".
#
toplevel_options = self._get_toplevel_options()
# We have to parse the command line a bit at a time -- global
# options, then the first command, then its options, and so on --
# because each command will be handled by a different class, and
# the options that are valid for a particular class aren't known
# until we have loaded the command class, which doesn't happen
# until we know what the command is.
self.commands = []
parser = FancyGetopt(toplevel_options + self.display_options)
parser.set_negative_aliases(self.negative_opt)
parser.set_aliases({'licence': 'license'})
args = parser.getopt(args=self.script_args, object=self)
option_order = parser.get_option_order()
log.set_verbosity(self.verbose)
# for display options we return immediately
if self.handle_display_options(option_order):
return
while args:
args = self._parse_command_opts(parser, args)
if args is None: # user asked for help (and got it)
return
# Handle the cases of --help as a "global" option, ie.
# "setup.py --help" and "setup.py --help command ...". For the
# former, we show global options (--verbose, --dry-run, etc.)
# and display-only options (--name, --version, etc.); for the
# latter, we omit the display-only options and show help for
# each command listed on the command line.
if self.help:
self._show_help(parser,
display_options=len(self.commands) == 0,
commands=self.commands)
return
# Oops, no commands found -- an end-user error
if not self.commands:
raise DistutilsArgError, "no commands supplied"
# All is well: return true
return 1
def _get_toplevel_options(self):
"""Return the non-display options recognized at the top level.
This includes options that are recognized *only* at the top
level as well as options recognized for commands.
"""
return self.global_options + [
("command-packages=", None,
"list of packages that provide distutils commands"),
]
def _parse_command_opts(self, parser, args):
"""Parse the command-line options for a single command.
'parser' must be a FancyGetopt instance; 'args' must be the list
of arguments, starting with the current command (whose options
we are about to parse). Returns a new version of 'args' with
the next command at the front of the list; will be the empty
list if there are no more commands on the command line. Returns
None if the user asked for help on this command.
"""
# late import because of mutual dependence between these modules
from distutils.cmd import Command
# Pull the current command from the head of the command line
command = args[0]
if not command_re.match(command):
raise SystemExit, "invalid command name '%s'" % command
self.commands.append(command)
# Dig up the command class that implements this command, so we
# 1) know that it's a valid command, and 2) know which options
# it takes.
try:
cmd_class = self.get_command_class(command)
except DistutilsModuleError, msg:
raise DistutilsArgError, msg
# Require that the command class be derived from Command -- want
# to be sure that the basic "command" interface is implemented.
if not issubclass(cmd_class, Command):
raise DistutilsClassError, \
"command class %s must subclass Command" % cmd_class
# Also make sure that the command object provides a list of its
# known options.
if not (hasattr(cmd_class, 'user_options') and
isinstance(cmd_class.user_options, list)):
raise DistutilsClassError, \
("command class %s must provide " +
"'user_options' attribute (a list of tuples)") % \
cmd_class
# If the command class has a list of negative alias options,
# merge it in with the global negative aliases.
negative_opt = self.negative_opt
if hasattr(cmd_class, 'negative_opt'):
negative_opt = negative_opt.copy()
negative_opt.update(cmd_class.negative_opt)
# Check for help_options in command class. They have a different
# format (tuple of four) so we need to preprocess them here.
if (hasattr(cmd_class, 'help_options') and
isinstance(cmd_class.help_options, list)):
help_options = fix_help_options(cmd_class.help_options)
else:
help_options = []
# All commands support the global options too, just by adding
# in 'global_options'.
parser.set_option_table(self.global_options +
cmd_class.user_options +
help_options)
parser.set_negative_aliases(negative_opt)
(args, opts) = parser.getopt(args[1:])
if hasattr(opts, 'help') and opts.help:
self._show_help(parser, display_options=0, commands=[cmd_class])
return
if (hasattr(cmd_class, 'help_options') and
isinstance(cmd_class.help_options, list)):
help_option_found=0
for (help_option, short, desc, func) in cmd_class.help_options:
if hasattr(opts, parser.get_attr_name(help_option)):
help_option_found=1
if hasattr(func, '__call__'):
func()
else:
raise DistutilsClassError(
"invalid help function %r for help option '%s': "
"must be a callable object (function, etc.)"
% (func, help_option))
if help_option_found:
return
# Put the options from the command-line into their official
# holding pen, the 'command_options' dictionary.
opt_dict = self.get_option_dict(command)
for (name, value) in vars(opts).items():
opt_dict[name] = ("command line", value)
return args
def finalize_options(self):
"""Set final values for all the options on the Distribution
instance, analogous to the .finalize_options() method of Command
objects.
"""
for attr in ('keywords', 'platforms'):
value = getattr(self.metadata, attr)
if value is None:
continue
if isinstance(value, str):
value = [elm.strip() for elm in value.split(',')]
setattr(self.metadata, attr, value)
def _show_help(self, parser, global_options=1, display_options=1,
commands=[]):
"""Show help for the setup script command-line in the form of
several lists of command-line options. 'parser' should be a
FancyGetopt instance; do not expect it to be returned in the
same state, as its option table will be reset to make it
generate the correct help text.
If 'global_options' is true, lists the global options:
--verbose, --dry-run, etc. If 'display_options' is true, lists
the "display-only" options: --name, --version, etc. Finally,
lists per-command help for every command name or command class
in 'commands'.
"""
# late import because of mutual dependence between these modules
from distutils.core import gen_usage
from distutils.cmd import Command
if global_options:
if display_options:
options = self._get_toplevel_options()
else:
options = self.global_options
parser.set_option_table(options)
parser.print_help(self.common_usage + "\nGlobal options:")
print('')
if display_options:
parser.set_option_table(self.display_options)
parser.print_help(
"Information display options (just display " +
"information, ignore any commands)")
print('')
for command in self.commands:
if isinstance(command, type) and issubclass(command, Command):
klass = command
else:
klass = self.get_command_class(command)
if (hasattr(klass, 'help_options') and
isinstance(klass.help_options, list)):
parser.set_option_table(klass.user_options +
fix_help_options(klass.help_options))
else:
parser.set_option_table(klass.user_options)
parser.print_help("Options for '%s' command:" % klass.__name__)
print('')
print(gen_usage(self.script_name))
def handle_display_options(self, option_order):
"""If there were any non-global "display-only" options
(--help-commands or the metadata display options) on the command
line, display the requested info and return true; else return
false.
"""
from distutils.core import gen_usage
# User just wants a list of commands -- we'll print it out and stop
# processing now (ie. if they ran "setup --help-commands foo bar",
# we ignore "foo bar").
if self.help_commands:
self.print_commands()
print('')
print(gen_usage(self.script_name))
return 1
# If user supplied any of the "display metadata" options, then
# display that metadata in the order in which the user supplied the
# metadata options.
any_display_options = 0
is_display_option = {}
for option in self.display_options:
is_display_option[option[0]] = 1
for (opt, val) in option_order:
if val and is_display_option.get(opt):
opt = translate_longopt(opt)
value = getattr(self.metadata, "get_"+opt)()
if opt in ['keywords', 'platforms']:
print(','.join(value))
elif opt in ('classifiers', 'provides', 'requires',
'obsoletes'):
print('\n'.join(value))
else:
print(value)
any_display_options = 1
return any_display_options
def print_command_list(self, commands, header, max_length):
"""Print a subset of the list of all commands -- used by
'print_commands()'.
"""
print(header + ":")
for cmd in commands:
klass = self.cmdclass.get(cmd)
if not klass:
klass = self.get_command_class(cmd)
try:
description = klass.description
except AttributeError:
description = "(no description available)"
print(" %-*s %s" % (max_length, cmd, description))
def print_commands(self):
"""Print out a help message listing all available commands with a
description of each. The list is divided into "standard commands"
(listed in distutils.command.__all__) and "extra commands"
(mentioned in self.cmdclass, but not a standard command). The
descriptions come from the command class attribute
'description'.
"""
import distutils.command
std_commands = distutils.command.__all__
is_std = {}
for cmd in std_commands:
is_std[cmd] = 1
extra_commands = []
for cmd in self.cmdclass.keys():
if not is_std.get(cmd):
extra_commands.append(cmd)
max_length = 0
for cmd in (std_commands + extra_commands):
if len(cmd) > max_length:
max_length = len(cmd)
self.print_command_list(std_commands,
"Standard commands",
max_length)
if extra_commands:
print
self.print_command_list(extra_commands,
"Extra commands",
max_length)
def get_command_list(self):
"""Get a list of (command, description) tuples.
The list is divided into "standard commands" (listed in
distutils.command.__all__) and "extra commands" (mentioned in
self.cmdclass, but not a standard command). The descriptions come
from the command class attribute 'description'.
"""
# Currently this is only used on Mac OS, for the Mac-only GUI
# Distutils interface (by Jack Jansen)
import distutils.command
std_commands = distutils.command.__all__
is_std = {}
for cmd in std_commands:
is_std[cmd] = 1
extra_commands = []
for cmd in self.cmdclass.keys():
if not is_std.get(cmd):
extra_commands.append(cmd)
rv = []
for cmd in (std_commands + extra_commands):
klass = self.cmdclass.get(cmd)
if not klass:
klass = self.get_command_class(cmd)
try:
description = klass.description
except AttributeError:
description = "(no description available)"
rv.append((cmd, description))
return rv
# -- Command class/object methods ----------------------------------
def get_command_packages(self):
"""Return a list of packages from which commands are loaded."""
pkgs = self.command_packages
if not isinstance(pkgs, list):
if pkgs is None:
pkgs = ''
pkgs = [pkg.strip() for pkg in pkgs.split(',') if pkg != '']
if "distutils.command" not in pkgs:
pkgs.insert(0, "distutils.command")
self.command_packages = pkgs
return pkgs
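# Editor's example (not part of the original source): how a comma-
# separated 'command_packages' string is normalized; the package name
# is hypothetical:
#
#     dist = Distribution()
#     dist.command_packages = 'mypkg.commands'
#     dist.get_command_packages()
#     # -> ['distutils.command', 'mypkg.commands']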
def get_command_class(self, command):
"""Return the class that implements the Distutils command named by
'command'. First we check the 'cmdclass' dictionary; if the
command is mentioned there, we fetch the class object from the
dictionary and return it. Otherwise we load the command module
("distutils.command." + command) and fetch the command class from
the module. The loaded class is also stored in 'cmdclass'
to speed future calls to 'get_command_class()'.
Raises DistutilsModuleError if the expected module could not be
found, or if that module does not define the expected class.
"""
klass = self.cmdclass.get(command)
if klass:
return klass
for pkgname in self.get_command_packages():
module_name = "%s.%s" % (pkgname, command)
klass_name = command
try:
__import__ (module_name)
module = sys.modules[module_name]
except ImportError:
continue
try:
klass = getattr(module, klass_name)
except AttributeError:
raise DistutilsModuleError, \
"invalid command '%s' (no class '%s' in module '%s')" \
% (command, klass_name, module_name)
self.cmdclass[command] = klass
return klass
raise DistutilsModuleError("invalid command '%s'" % command)
def get_command_obj(self, command, create=1):
"""Return the command object for 'command'. Normally this object
is cached on a previous call to 'get_command_obj()'; if no command
object for 'command' is in the cache, then we either create and
return it (if 'create' is true) or return None.
"""
cmd_obj = self.command_obj.get(command)
if not cmd_obj and create:
if DEBUG:
self.announce("Distribution.get_command_obj(): " \
"creating '%s' command object" % command)
klass = self.get_command_class(command)
cmd_obj = self.command_obj[command] = klass(self)
self.have_run[command] = 0
# Set any options that were supplied in config files
# or on the command line. (NB. support for error
# reporting is lame here: any errors aren't reported
# until 'finalize_options()' is called, which means
# we won't report the source of the error.)
options = self.command_options.get(command)
if options:
self._set_command_options(cmd_obj, options)
return cmd_obj
def _set_command_options(self, command_obj, option_dict=None):
"""Set the options for 'command_obj' from 'option_dict'. Basically
this means copying elements of a dictionary ('option_dict') to
attributes of an instance ('command').
'command_obj' must be a Command instance. If 'option_dict' is not
supplied, uses the standard option dictionary for this command
(from 'self.command_options').
"""
command_name = command_obj.get_command_name()
if option_dict is None:
option_dict = self.get_option_dict(command_name)
if DEBUG:
self.announce(" setting options for '%s' command:" % command_name)
for (option, (source, value)) in option_dict.items():
if DEBUG:
self.announce(" %s = %s (from %s)" % (option, value,
source))
try:
bool_opts = map(translate_longopt, command_obj.boolean_options)
except AttributeError:
bool_opts = []
try:
neg_opt = command_obj.negative_opt
except AttributeError:
neg_opt = {}
try:
is_string = isinstance(value, str)
if option in neg_opt and is_string:
setattr(command_obj, neg_opt[option], not strtobool(value))
elif option in bool_opts and is_string:
setattr(command_obj, option, strtobool(value))
elif hasattr(command_obj, option):
setattr(command_obj, option, value)
else:
raise DistutilsOptionError, \
("error in %s: command '%s' has no such option '%s'"
% (source, command_name, option))
except ValueError, msg:
raise DistutilsOptionError, msg
def reinitialize_command(self, command, reinit_subcommands=0):
"""Reinitializes a command to the state it was in when first
returned by 'get_command_obj()': ie., initialized but not yet
finalized. This provides the opportunity to sneak option
values in programmatically, overriding or supplementing
user-supplied values from the config files and command line.
You'll have to re-finalize the command object (by calling
'finalize_options()' or 'ensure_finalized()') before using it for
real.
'command' should be a command name (string) or command object. If
'reinit_subcommands' is true, also reinitializes the command's
sub-commands, as declared by the 'sub_commands' class attribute (if
it has one). See the "install" command for an example. Only
reinitializes the sub-commands that actually matter, ie. those
whose test predicates return true.
Returns the reinitialized command object.
"""
from distutils.cmd import Command
if not isinstance(command, Command):
command_name = command
command = self.get_command_obj(command_name)
else:
command_name = command.get_command_name()
if not command.finalized:
return command
command.initialize_options()
command.finalized = 0
self.have_run[command_name] = 0
self._set_command_options(command)
if reinit_subcommands:
for sub in command.get_sub_commands():
self.reinitialize_command(sub, reinit_subcommands)
return command
# -- Methods that operate on the Distribution ----------------------
def announce(self, msg, level=log.INFO):
log.log(level, msg)
def run_commands(self):
"""Run each command that was seen on the setup script command line.
Uses the list of commands found and cache of command objects
created by 'get_command_obj()'.
"""
for cmd in self.commands:
self.run_command(cmd)
# -- Methods that operate on its Commands --------------------------
def run_command(self, command):
"""Do whatever it takes to run a command (including nothing at all,
if the command has already been run). Specifically: if we have
already created and run the command named by 'command', return
silently without doing anything. If the command named by 'command'
doesn't even have a command object yet, create one. Then invoke
'run()' on that command object (or an existing one).
"""
# Already been here, done that? then return silently.
if self.have_run.get(command):
return
log.info("running %s", command)
cmd_obj = self.get_command_obj(command)
cmd_obj.ensure_finalized()
cmd_obj.run()
self.have_run[command] = 1
# -- Distribution query methods ------------------------------------
def has_pure_modules(self):
return len(self.packages or self.py_modules or []) > 0
def has_ext_modules(self):
return self.ext_modules and len(self.ext_modules) > 0
def has_c_libraries(self):
return self.libraries and len(self.libraries) > 0
def has_modules(self):
return self.has_pure_modules() or self.has_ext_modules()
def has_headers(self):
return self.headers and len(self.headers) > 0
def has_scripts(self):
return self.scripts and len(self.scripts) > 0
def has_data_files(self):
return self.data_files and len(self.data_files) > 0
def is_pure(self):
return (self.has_pure_modules() and
not self.has_ext_modules() and
not self.has_c_libraries())
# -- Metadata query methods ----------------------------------------
# If you're looking for 'get_name()', 'get_version()', and so forth,
# they are defined in a sneaky way: the constructor binds self.get_XXX
# to self.metadata.get_XXX. The actual code is in the
# DistributionMetadata class, below.
class DistributionMetadata:
"""Dummy class to hold the distribution meta-data: name, version,
author, and so forth.
"""
_METHOD_BASENAMES = ("name", "version", "author", "author_email",
"maintainer", "maintainer_email", "url",
"license", "description", "long_description",
"keywords", "platforms", "fullname", "contact",
"contact_email", "license", "classifiers",
"download_url",
# PEP 314
"provides", "requires", "obsoletes",
)
def __init__(self, path=None):
if path is not None:
self.read_pkg_file(open(path))
else:
self.name = None
self.version = None
self.author = None
self.author_email = None
self.maintainer = None
self.maintainer_email = None
self.url = None
self.license = None
self.description = None
self.long_description = None
self.keywords = None
self.platforms = None
self.classifiers = None
self.download_url = None
# PEP 314
self.provides = None
self.requires = None
self.obsoletes = None
def read_pkg_file(self, file):
"""Reads the metadata values from a file object."""
msg = message_from_file(file)
def _read_field(name):
value = msg[name]
if value == 'UNKNOWN':
return None
return value
def _read_list(name):
values = msg.get_all(name, None)
if values == []:
return None
return values
metadata_version = msg['metadata-version']
self.name = _read_field('name')
self.version = _read_field('version')
self.description = _read_field('summary')
# we are filling author only.
self.author = _read_field('author')
self.maintainer = None
self.author_email = _read_field('author-email')
self.maintainer_email = None
self.url = _read_field('home-page')
self.license = _read_field('license')
if 'download-url' in msg:
self.download_url = _read_field('download-url')
else:
self.download_url = None
self.long_description = _read_field('description')
self.description = _read_field('summary')
if 'keywords' in msg:
self.keywords = _read_field('keywords').split(',')
self.platforms = _read_list('platform')
self.classifiers = _read_list('classifier')
# PEP 314 - these fields only exist in 1.1
if metadata_version == '1.1':
self.requires = _read_list('requires')
self.provides = _read_list('provides')
self.obsoletes = _read_list('obsoletes')
else:
self.requires = None
self.provides = None
self.obsoletes = None
def write_pkg_info(self, base_dir):
"""Write the PKG-INFO file into the release tree.
"""
pkg_info = open(os.path.join(base_dir, 'PKG-INFO'), 'w')
try:
self.write_pkg_file(pkg_info)
finally:
pkg_info.close()
def write_pkg_file(self, file):
"""Write the PKG-INFO format data to a file object.
"""
version = '1.0'
if (self.provides or self.requires or self.obsoletes or
self.classifiers or self.download_url):
version = '1.1'
self._write_field(file, 'Metadata-Version', version)
self._write_field(file, 'Name', self.get_name())
self._write_field(file, 'Version', self.get_version())
self._write_field(file, 'Summary', self.get_description())
self._write_field(file, 'Home-page', self.get_url())
self._write_field(file, 'Author', self.get_contact())
self._write_field(file, 'Author-email', self.get_contact_email())
self._write_field(file, 'License', self.get_license())
if self.download_url:
self._write_field(file, 'Download-URL', self.download_url)
long_desc = rfc822_escape(self.get_long_description())
self._write_field(file, 'Description', long_desc)
keywords = ','.join(self.get_keywords())
if keywords:
self._write_field(file, 'Keywords', keywords)
self._write_list(file, 'Platform', self.get_platforms())
self._write_list(file, 'Classifier', self.get_classifiers())
# PEP 314
self._write_list(file, 'Requires', self.get_requires())
self._write_list(file, 'Provides', self.get_provides())
self._write_list(file, 'Obsoletes', self.get_obsoletes())
def _write_field(self, file, name, value):
file.write('%s: %s\n' % (name, self._encode_field(value)))
def _write_list (self, file, name, values):
for value in values:
self._write_field(file, name, value)
def _encode_field(self, value):
if value is None:
return None
if isinstance(value, unicode):
return value.encode(PKG_INFO_ENCODING)
return str(value)
# -- Metadata query methods ----------------------------------------
def get_name(self):
return self.name or "UNKNOWN"
def get_version(self):
return self.version or "0.0.0"
def get_fullname(self):
return "%s-%s" % (self.get_name(), self.get_version())
def get_author(self):
return self._encode_field(self.author) or "UNKNOWN"
def get_author_email(self):
return self.author_email or "UNKNOWN"
def get_maintainer(self):
return self._encode_field(self.maintainer) or "UNKNOWN"
def get_maintainer_email(self):
return self.maintainer_email or "UNKNOWN"
def get_contact(self):
return (self._encode_field(self.maintainer) or
self._encode_field(self.author) or "UNKNOWN")
def get_contact_email(self):
return self.maintainer_email or self.author_email or "UNKNOWN"
def get_url(self):
return self.url or "UNKNOWN"
def get_license(self):
return self.license or "UNKNOWN"
get_licence = get_license
def get_description(self):
return self._encode_field(self.description) or "UNKNOWN"
def get_long_description(self):
return self._encode_field(self.long_description) or "UNKNOWN"
def get_keywords(self):
return self.keywords or []
def get_platforms(self):
return self.platforms or ["UNKNOWN"]
def get_classifiers(self):
return self.classifiers or []
def get_download_url(self):
return self.download_url or "UNKNOWN"
# PEP 314
def get_requires(self):
return self.requires or []
def set_requires(self, value):
import distutils.versionpredicate
for v in value:
distutils.versionpredicate.VersionPredicate(v)
self.requires = value
def get_provides(self):
return self.provides or []
def set_provides(self, value):
value = [v.strip() for v in value]
for v in value:
import distutils.versionpredicate
distutils.versionpredicate.split_provision(v)
self.provides = value
def get_obsoletes(self):
return self.obsoletes or []
def set_obsoletes(self, value):
import distutils.versionpredicate
for v in value:
distutils.versionpredicate.VersionPredicate(v)
self.obsoletes = value
def fix_help_options(options):
"""Convert a 4-tuple 'help_options' list as found in various command
classes to the 3-tuple form required by FancyGetopt.
"""
new_options = []
for help_tuple in options:
new_options.append(help_tuple[0:3])
return new_options
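# A minimal sketch (not part of distutils) of what 'fix_help_options()'
# does; the 4-tuple below is hypothetical, in the shape the bdist
# commands use, where the fourth element is a callable that FancyGetopt
# cannot digest.
if __name__ == '__main__':
    help_options = [('help-formats', None,
                     "list available distribution formats", lambda: None)]
    print fix_help_options(help_options)
    # -> [('help-formats', None, 'list available distribution formats')]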
"""distutils.emxccompiler
Provides the EMXCCompiler class, a subclass of UnixCCompiler that
handles the EMX port of the GNU C compiler to OS/2.
"""
# issues:
#
# * OS/2 insists that DLLs can have names no longer than 8 characters
# We put export_symbols in a def-file, as though the DLL can have
# an arbitrary length name, but truncate the output filename.
#
# * only use OMF objects and use LINK386 as the linker (-Zomf)
#
# * always build for multithreading (-Zmt) as the accompanying OS/2 port
# of Python is only distributed with threads enabled.
#
# tested configurations:
#
# * EMX gcc 2.81/EMX 0.9d fix03
__revision__ = "$Id$"
import os,sys,copy
from distutils.ccompiler import gen_preprocess_options, gen_lib_options
from distutils.unixccompiler import UnixCCompiler
from distutils.file_util import write_file
from distutils.errors import DistutilsExecError, CompileError, UnknownFileError
from distutils import log
class EMXCCompiler (UnixCCompiler):
compiler_type = 'emx'
obj_extension = ".obj"
static_lib_extension = ".lib"
shared_lib_extension = ".dll"
static_lib_format = "%s%s"
shared_lib_format = "%s%s"
res_extension = ".res" # compiled resource file
exe_extension = ".exe"
def __init__ (self,
verbose=0,
dry_run=0,
force=0):
UnixCCompiler.__init__ (self, verbose, dry_run, force)
(status, details) = check_config_h()
self.debug_print("Python's GCC status: %s (details: %s)" %
(status, details))
if status is not CONFIG_H_OK:
self.warn(
"Python's pyconfig.h doesn't seem to support your compiler. " +
("Reason: %s." % details) +
"Compiling may fail because of undefined preprocessor macros.")
(self.gcc_version, self.ld_version) = \
get_versions()
self.debug_print(self.compiler_type + ": gcc %s, ld %s\n" %
(self.gcc_version,
self.ld_version) )
# Hard-code GCC because that's what this is all about.
# XXX optimization, warnings etc. should be customizable.
self.set_executables(compiler='gcc -Zomf -Zmt -O3 -fomit-frame-pointer -mprobe -Wall',
compiler_so='gcc -Zomf -Zmt -O3 -fomit-frame-pointer -mprobe -Wall',
linker_exe='gcc -Zomf -Zmt -Zcrtdll',
linker_so='gcc -Zomf -Zmt -Zcrtdll -Zdll')
# want the gcc library statically linked (so that we don't have
# to distribute a version dependent on the compiler we have)
self.dll_libraries=["gcc"]
# __init__ ()
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
if ext == '.rc':
# gcc requires '.rc' compiled to binary ('.res') files !!!
try:
self.spawn(["rc", "-r", src])
except DistutilsExecError, msg:
raise CompileError, msg
else: # for other files use the C-compiler
try:
self.spawn(self.compiler_so + cc_args + [src, '-o', obj] +
extra_postargs)
except DistutilsExecError, msg:
raise CompileError, msg
def link (self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
# use separate copies, so we can modify the lists
extra_preargs = copy.copy(extra_preargs or [])
libraries = copy.copy(libraries or [])
objects = copy.copy(objects or [])
# Additional libraries
libraries.extend(self.dll_libraries)
# handle export symbols by creating a def-file
# with executables this only works with gcc/ld as linker
if ((export_symbols is not None) and
(target_desc != self.EXECUTABLE)):
# (The linker doesn't do anything if output is up-to-date.
# So it would probably better to check if we really need this,
# but for this we had to insert some unchanged parts of
# UnixCCompiler, and this is not what we want.)
# we want to put some files in the same directory as the
# object files are, build_temp doesn't help much
# where are the object files
temp_dir = os.path.dirname(objects[0])
# name of dll to give the helper files the same base name
(dll_name, dll_extension) = os.path.splitext(
os.path.basename(output_filename))
# generate the filenames for these files
def_file = os.path.join(temp_dir, dll_name + ".def")
# Generate .def file
contents = [
"LIBRARY %s INITINSTANCE TERMINSTANCE" % \
os.path.splitext(os.path.basename(output_filename))[0],
"DATA MULTIPLE NONSHARED",
"EXPORTS"]
for sym in export_symbols:
contents.append(' "%s"' % sym)
self.execute(write_file, (def_file, contents),
"writing %s" % def_file)
            # next, add options for the def-file and for creating import
            # libraries; for gcc/ld the def-file is treated like any other
            # object file
objects.append(def_file)
#end: if ((export_symbols is not None) and
# (target_desc != self.EXECUTABLE or self.linker_dll == "gcc")):
        # anyone who wants symbols (and a many-times-larger output file)
        # should explicitly switch debug mode on;
        # otherwise we let dllwrap/ld strip the output file
# (On my machine: 10KB < stripped_file < ??100KB
# unstripped_file = stripped_file + XXX KB
# ( XXX=254 for a typical python extension))
if not debug:
extra_preargs.append("-s")
UnixCCompiler.link(self,
target_desc,
objects,
output_filename,
output_dir,
libraries,
library_dirs,
runtime_library_dirs,
None, # export_symbols, we do this in our def-file
debug,
extra_preargs,
extra_postargs,
build_temp,
target_lang)
# link ()
# -- Miscellaneous methods -----------------------------------------
# override the object_filenames method from CCompiler to
# support rc and res-files
def object_filenames (self,
source_filenames,
strip_dir=0,
output_dir=''):
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
# use normcase to make sure '.rc' is really '.rc' and not '.RC'
(base, ext) = os.path.splitext (os.path.normcase(src_name))
if ext not in (self.src_extensions + ['.rc']):
raise UnknownFileError, \
"unknown file type '%s' (from '%s')" % \
(ext, src_name)
if strip_dir:
base = os.path.basename (base)
if ext == '.rc':
# these need to be compiled to object files
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
# object_filenames ()
# override the find_library_file method from UnixCCompiler
# to deal with file naming/searching differences
def find_library_file(self, dirs, lib, debug=0):
shortlib = '%s.lib' % lib
longlib = 'lib%s.lib' % lib # this form very rare
# get EMX's default library directory search path
try:
emx_dirs = os.environ['LIBRARY_PATH'].split(';')
except KeyError:
emx_dirs = []
for dir in dirs + emx_dirs:
shortlibp = os.path.join(dir, shortlib)
longlibp = os.path.join(dir, longlib)
if os.path.exists(shortlibp):
return shortlibp
elif os.path.exists(longlibp):
return longlibp
# Oops, didn't find it in *any* of 'dirs'
return None
# class EMXCCompiler
# Because these compilers aren't configured in Python's pyconfig.h file by
# default, we should at least warn the user if he is using an unmodified
# version.
CONFIG_H_OK = "ok"
CONFIG_H_NOTOK = "not ok"
CONFIG_H_UNCERTAIN = "uncertain"
def check_config_h():
"""Check if the current Python installation (specifically, pyconfig.h)
appears amenable to building extensions with GCC. Returns a tuple
(status, details), where 'status' is one of the following constants:
CONFIG_H_OK
all is well, go ahead and compile
CONFIG_H_NOTOK
doesn't look good
CONFIG_H_UNCERTAIN
not sure -- unable to read pyconfig.h
'details' is a human-readable string explaining the situation.
Note there are two ways to conclude "OK": either 'sys.version' contains
the string "GCC" (implying that this Python was built with GCC), or the
installed "pyconfig.h" contains the string "__GNUC__".
"""
# XXX since this function also checks sys.version, it's not strictly a
# "pyconfig.h" check -- should probably be renamed...
from distutils import sysconfig
import string
# if sys.version contains GCC then python was compiled with
# GCC, and the pyconfig.h file should be OK
if string.find(sys.version,"GCC") >= 0:
return (CONFIG_H_OK, "sys.version mentions 'GCC'")
fn = sysconfig.get_config_h_filename()
try:
# It would probably better to read single lines to search.
# But we do this only once, and it is fast enough
f = open(fn)
try:
s = f.read()
finally:
f.close()
except IOError, exc:
# if we can't read this file, we cannot say it is wrong
# the compiler will complain later about this file as missing
return (CONFIG_H_UNCERTAIN,
"couldn't read '%s': %s" % (fn, exc.strerror))
else:
# "pyconfig.h" contains an "#ifdef __GNUC__" or something similar
if string.find(s,"__GNUC__") >= 0:
return (CONFIG_H_OK, "'%s' mentions '__GNUC__'" % fn)
else:
return (CONFIG_H_NOTOK, "'%s' does not mention '__GNUC__'" % fn)
def get_versions():
""" Try to find out the versions of gcc and ld.
If not possible it returns None for it.
"""
from distutils.version import StrictVersion
from distutils.spawn import find_executable
import re
gcc_exe = find_executable('gcc')
if gcc_exe:
out = os.popen(gcc_exe + ' -dumpversion','r')
try:
out_string = out.read()
finally:
out.close()
        result = re.search(r'(\d+\.\d+\.\d+)', out_string)
if result:
gcc_version = StrictVersion(result.group(1))
else:
gcc_version = None
else:
gcc_version = None
# EMX ld has no way of reporting version number, and we use GCC
# anyway - so we can link OMF DLLs
ld_version = None
return (gcc_version, ld_version)
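# A minimal sketch (not part of distutils): probing the build environment
# the same way EMXCCompiler.__init__() does.  The output naturally depends
# on the local Python installation and whether gcc is on the path.
if __name__ == '__main__':
    status, details = check_config_h()
    print "pyconfig.h check: %s (%s)" % (status, details)
    gcc_version, ld_version = get_versions()
    print "gcc version:", gcc_version      # None if gcc wasn't found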
"""distutils.errors
Provides exceptions used by the Distutils modules. Note that Distutils
modules may raise standard exceptions; in particular, SystemExit is
usually raised for errors that are obviously the end-user's fault
(eg. bad command-line arguments).
This module is safe to use in "from ... import *" mode; it only exports
symbols whose names start with "Distutils" and end with "Error"."""
__revision__ = "$Id$"
class DistutilsError(Exception):
"""The root of all Distutils evil."""
class DistutilsModuleError(DistutilsError):
"""Unable to load an expected module, or to find an expected class
within some module (in particular, command modules and classes)."""
class DistutilsClassError(DistutilsError):
"""Some command class (or possibly distribution class, if anyone
feels a need to subclass Distribution) is found not to be holding
up its end of the bargain, ie. implementing some part of the
"command "interface."""
class DistutilsGetoptError(DistutilsError):
"""The option table provided to 'fancy_getopt()' is bogus."""
class DistutilsArgError(DistutilsError):
"""Raised by fancy_getopt in response to getopt.error -- ie. an
error in the command line usage."""
class DistutilsFileError(DistutilsError):
"""Any problems in the filesystem: expected file not found, etc.
Typically this is for problems that we detect before IOError or
OSError could be raised."""
class DistutilsOptionError(DistutilsError):
"""Syntactic/semantic errors in command options, such as use of
mutually conflicting options, or inconsistent options,
badly-spelled values, etc. No distinction is made between option
values originating in the setup script, the command line, config
files, or what-have-you -- but if we *know* something originated in
the setup script, we'll raise DistutilsSetupError instead."""
class DistutilsSetupError(DistutilsError):
"""For errors that can be definitely blamed on the setup script,
such as invalid keyword arguments to 'setup()'."""
class DistutilsPlatformError(DistutilsError):
"""We don't know how to do something on the current platform (but
we do know how to do it on some platform) -- eg. trying to compile
C files on a platform not supported by a CCompiler subclass."""
class DistutilsExecError(DistutilsError):
"""Any problems executing an external program (such as the C
compiler, when compiling C files)."""
class DistutilsInternalError(DistutilsError):
"""Internal inconsistencies or impossibilities (obviously, this
should never be seen if the code is working!)."""
class DistutilsTemplateError(DistutilsError):
"""Syntax error in a file list template."""
class DistutilsByteCompileError(DistutilsError):
"""Byte compile error."""
# Exception classes used by the CCompiler implementation classes
class CCompilerError(Exception):
"""Some compile/link operation failed."""
class PreprocessError(CCompilerError):
"""Failure to preprocess one or more C/C++ files."""
class CompileError(CCompilerError):
"""Failure to compile one or more C/C++ source files."""
class LibError(CCompilerError):
"""Failure to create a static library from one or more C/C++ object
files."""
class LinkError(CCompilerError):
"""Failure to link one or more C/C++ object files into an executable
or shared library file."""
class UnknownFileError(CCompilerError):
"""Attempt to process an unknown file type."""
"""distutils.extension
Provides the Extension class, used to describe C/C++ extension
modules in setup scripts."""
__revision__ = "$Id$"
import os, string, sys
from types import *
try:
import warnings
except ImportError:
warnings = None
# This class is really only used by the "build_ext" command, so it might
# make sense to put it in distutils.command.build_ext. However, that
# module is already big enough, and I want to make this class a bit more
# complex to simplify some common cases ("foo" module in "foo.c") and do
# better error-checking ("foo.c" actually exists).
#
# Also, putting this in build_ext.py means every setup script would have to
# import that large-ish module (indirectly, through distutils.core) in
# order to do anything.
class Extension:
"""Just a collection of attributes that describes an extension
module and everything needed to build it (hopefully in a portable
way, but there are hooks that let you be as unportable as you need).
Instance attributes:
name : string
the full name of the extension, including any packages -- ie.
*not* a filename or pathname, but Python dotted name
sources : [string]
list of source filenames, relative to the distribution root
(where the setup script lives), in Unix form (slash-separated)
for portability. Source files may be C, C++, SWIG (.i),
platform-specific resource files, or whatever else is recognized
by the "build_ext" command as source for a Python extension.
include_dirs : [string]
list of directories to search for C/C++ header files (in Unix
form for portability)
define_macros : [(name : string, value : string|None)]
list of macros to define; each macro is defined using a 2-tuple,
where 'value' is either the string to define it to or None to
define it without a particular value (equivalent of "#define
FOO" in source or -DFOO on Unix C compiler command line)
undef_macros : [string]
list of macros to undefine explicitly
library_dirs : [string]
list of directories to search for C/C++ libraries at link time
libraries : [string]
list of library names (not filenames or paths) to link against
runtime_library_dirs : [string]
list of directories to search for C/C++ libraries at run time
(for shared extensions, this is when the extension is loaded)
extra_objects : [string]
list of extra files to link with (eg. object files not implied
by 'sources', static library that must be explicitly specified,
binary resource files, etc.)
extra_compile_args : [string]
any extra platform- and compiler-specific information to use
when compiling the source files in 'sources'. For platforms and
compilers where "command line" makes sense, this is typically a
list of command-line arguments, but for other platforms it could
be anything.
extra_link_args : [string]
any extra platform- and compiler-specific information to use
when linking object files together to create the extension (or
to create a new static Python interpreter). Similar
interpretation as for 'extra_compile_args'.
export_symbols : [string]
list of symbols to be exported from a shared extension. Not
used on all platforms, and not generally necessary for Python
extensions, which typically export exactly one symbol: "init" +
extension_name.
swig_opts : [string]
any extra options to pass to SWIG if a source file has the .i
extension.
depends : [string]
list of files that the extension depends on
language : string
extension language (i.e. "c", "c++", "objc"). Will be detected
from the source extensions if not provided.
"""
# When adding arguments to this constructor, be sure to update
# setup_keywords in core.py.
def __init__ (self, name, sources,
include_dirs=None,
define_macros=None,
undef_macros=None,
library_dirs=None,
libraries=None,
runtime_library_dirs=None,
extra_objects=None,
extra_compile_args=None,
extra_link_args=None,
export_symbols=None,
swig_opts = None,
depends=None,
language=None,
**kw # To catch unknown keywords
):
assert type(name) is StringType, "'name' must be a string"
assert (type(sources) is ListType and
map(type, sources) == [StringType]*len(sources)), \
"'sources' must be a list of strings"
self.name = name
self.sources = sources
self.include_dirs = include_dirs or []
self.define_macros = define_macros or []
self.undef_macros = undef_macros or []
self.library_dirs = library_dirs or []
self.libraries = libraries or []
self.runtime_library_dirs = runtime_library_dirs or []
self.extra_objects = extra_objects or []
self.extra_compile_args = extra_compile_args or []
self.extra_link_args = extra_link_args or []
self.export_symbols = export_symbols or []
self.swig_opts = swig_opts or []
self.depends = depends or []
self.language = language
# If there are unknown keyword options, warn about them
if len(kw):
L = kw.keys() ; L.sort()
L = map(repr, L)
msg = "Unknown Extension options: " + string.join(L, ', ')
if warnings is not None:
warnings.warn(msg)
else:
sys.stderr.write(msg + '\n')
# class Extension
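# A minimal sketch (not part of distutils) of a typical Extension, built
# the way a setup script would; the module and file names are hypothetical.
if __name__ == '__main__':
    ext = Extension('demo', ['demo.c'],
                    include_dirs=['include'],
                    define_macros=[('NDEBUG', None)],
                    libraries=['m'])
    print ext.name, ext.sources, ext.define_macros, ext.libraries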
def read_setup_file (filename):
from distutils.sysconfig import \
parse_makefile, expand_makefile_vars, _variable_rx
from distutils.text_file import TextFile
from distutils.util import split_quoted
# First pass over the file to gather "VAR = VALUE" assignments.
vars = parse_makefile(filename)
# Second pass to gobble up the real content: lines of the form
# <module> ... [<sourcefile> ...] [<cpparg> ...] [<library> ...]
file = TextFile(filename,
strip_comments=1, skip_blanks=1, join_lines=1,
lstrip_ws=1, rstrip_ws=1)
try:
extensions = []
while 1:
line = file.readline()
if line is None: # eof
break
if _variable_rx.match(line): # VAR=VALUE, handled in first pass
continue
if line[0] == line[-1] == "*":
file.warn("'%s' lines not handled yet" % line)
continue
#print "original line: " + line
line = expand_makefile_vars(line, vars)
words = split_quoted(line)
#print "expanded line: " + line
# NB. this parses a slightly different syntax than the old
# makesetup script: here, there must be exactly one extension per
# line, and it must be the first word of the line. I have no idea
# why the old syntax supported multiple extensions per line, as
# they all wind up being the same.
module = words[0]
ext = Extension(module, [])
append_next_word = None
for word in words[1:]:
if append_next_word is not None:
append_next_word.append(word)
append_next_word = None
continue
suffix = os.path.splitext(word)[1]
switch = word[0:2] ; value = word[2:]
if suffix in (".c", ".cc", ".cpp", ".cxx", ".c++", ".m", ".mm"):
# hmm, should we do something about C vs. C++ sources?
# or leave it up to the CCompiler implementation to
# worry about?
ext.sources.append(word)
elif switch == "-I":
ext.include_dirs.append(value)
elif switch == "-D":
equals = string.find(value, "=")
if equals == -1: # bare "-DFOO" -- no value
ext.define_macros.append((value, None))
else: # "-DFOO=blah"
                        ext.define_macros.append((value[0:equals],
                                                  value[equals+1:]))
elif switch == "-U":
ext.undef_macros.append(value)
elif switch == "-C": # only here 'cause makesetup has it!
ext.extra_compile_args.append(word)
elif switch == "-l":
ext.libraries.append(value)
elif switch == "-L":
ext.library_dirs.append(value)
elif switch == "-R":
ext.runtime_library_dirs.append(value)
elif word == "-rpath":
append_next_word = ext.runtime_library_dirs
elif word == "-Xlinker":
append_next_word = ext.extra_link_args
elif word == "-Xcompiler":
append_next_word = ext.extra_compile_args
elif switch == "-u":
ext.extra_link_args.append(word)
if not value:
append_next_word = ext.extra_link_args
elif word == "-Xcompiler":
append_next_word = ext.extra_compile_args
elif switch == "-u":
ext.extra_link_args.append(word)
if not value:
append_next_word = ext.extra_link_args
elif suffix in (".a", ".so", ".sl", ".o", ".dylib"):
# NB. a really faithful emulation of makesetup would
# append a .o file to extra_objects only if it
# had a slash in it; otherwise, it would s/.o/.c/
# and append it to sources. Hmmmm.
ext.extra_objects.append(word)
else:
file.warn("unrecognized argument '%s'" % word)
extensions.append(ext)
finally:
file.close()
#print "module:", module
#print "source files:", source_files
#print "cpp args:", cpp_args
#print "lib args:", library_args
#extensions[module] = { 'sources': source_files,
# 'cpp_args': cpp_args,
# 'lib_args': library_args }
return extensions
# read_setup_file ()
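# A minimal sketch (not part of distutils) of 'read_setup_file()' on a
# hypothetical one-line makesetup-style Setup file.
if __name__ == '__main__':
    import tempfile
    fd, setup_path = tempfile.mkstemp()
    os.write(fd, "demo demo.c -Iinclude -DDEBUG=1 -lm\n")
    os.close(fd)
    for ext in read_setup_file(setup_path):
        print ext.name, ext.sources, ext.include_dirs, \
              ext.define_macros, ext.libraries
        # -> demo ['demo.c'] ['include'] [('DEBUG', '1')] ['m']
    os.remove(setup_path)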
"""distutils.fancy_getopt
Wrapper around the standard getopt module that provides the following
additional features:
* short and long options are tied together
* options have help strings, so fancy_getopt could potentially
create a complete usage summary
* options set attributes of a passed-in object
"""
__revision__ = "$Id$"
import sys
import string
import re
import getopt
from distutils.errors import DistutilsGetoptError, DistutilsArgError
# Much like command_re in distutils.core, this is close to but not quite
# the same as a Python NAME -- except, in the spirit of most GNU
# utilities, we use '-' in place of '_'. (The spirit of LISP lives on!)
# The similarities to NAME are again not a coincidence...
longopt_pat = r'[a-zA-Z](?:[a-zA-Z0-9-]*)'
longopt_re = re.compile(r'^%s$' % longopt_pat)
# For recognizing "negative alias" options, eg. "quiet=!verbose"
neg_alias_re = re.compile("^(%s)=!(%s)$" % (longopt_pat, longopt_pat))
# This is used to translate long options to legitimate Python identifiers
# (for use as attributes of some object).
longopt_xlate = string.maketrans('-', '_')
class FancyGetopt:
"""Wrapper around the standard 'getopt()' module that provides some
handy extra functionality:
* short and long options are tied together
* options have help strings, and help text can be assembled
from them
* options set attributes of a passed-in object
* boolean options can have "negative aliases" -- eg. if
--quiet is the "negative alias" of --verbose, then "--quiet"
on the command line sets 'verbose' to false
"""
def __init__ (self, option_table=None):
# The option table is (currently) a list of tuples. The
        # tuples may have three or four values:
# (long_option, short_option, help_string [, repeatable])
# if an option takes an argument, its long_option should have '='
# appended; short_option should just be a single character, no ':'
# in any case. If a long_option doesn't have a corresponding
# short_option, short_option should be None. All option tuples
# must have long options.
self.option_table = option_table
# 'option_index' maps long option names to entries in the option
# table (ie. those 3-tuples).
self.option_index = {}
if self.option_table:
self._build_index()
# 'alias' records (duh) alias options; {'foo': 'bar'} means
# --foo is an alias for --bar
self.alias = {}
# 'negative_alias' keeps track of options that are the boolean
# opposite of some other option
self.negative_alias = {}
# These keep track of the information in the option table. We
# don't actually populate these structures until we're ready to
# parse the command-line, since the 'option_table' passed in here
# isn't necessarily the final word.
self.short_opts = []
self.long_opts = []
self.short2long = {}
self.attr_name = {}
self.takes_arg = {}
# And 'option_order' is filled up in 'getopt()'; it records the
# original order of options (and their values) on the command-line,
# but expands short options, converts aliases, etc.
self.option_order = []
# __init__ ()
def _build_index (self):
self.option_index.clear()
for option in self.option_table:
self.option_index[option[0]] = option
def set_option_table (self, option_table):
self.option_table = option_table
self._build_index()
def add_option (self, long_option, short_option=None, help_string=None):
if long_option in self.option_index:
raise DistutilsGetoptError, \
"option conflict: already an option '%s'" % long_option
else:
option = (long_option, short_option, help_string)
self.option_table.append(option)
self.option_index[long_option] = option
def has_option (self, long_option):
"""Return true if the option table for this parser has an
option with long name 'long_option'."""
return long_option in self.option_index
def get_attr_name (self, long_option):
"""Translate long option name 'long_option' to the form it
has as an attribute of some object: ie., translate hyphens
to underscores."""
return string.translate(long_option, longopt_xlate)
def _check_alias_dict (self, aliases, what):
assert isinstance(aliases, dict)
for (alias, opt) in aliases.items():
if alias not in self.option_index:
raise DistutilsGetoptError, \
("invalid %s '%s': "
"option '%s' not defined") % (what, alias, alias)
if opt not in self.option_index:
raise DistutilsGetoptError, \
("invalid %s '%s': "
"aliased option '%s' not defined") % (what, alias, opt)
def set_aliases (self, alias):
"""Set the aliases for this option parser."""
self._check_alias_dict(alias, "alias")
self.alias = alias
def set_negative_aliases (self, negative_alias):
"""Set the negative aliases for this option parser.
'negative_alias' should be a dictionary mapping option names to
option names, both the key and value must already be defined
in the option table."""
self._check_alias_dict(negative_alias, "negative alias")
self.negative_alias = negative_alias
def _grok_option_table (self):
"""Populate the various data structures that keep tabs on the
option table. Called by 'getopt()' before it can do anything
worthwhile.
"""
self.long_opts = []
self.short_opts = []
self.short2long.clear()
self.repeat = {}
for option in self.option_table:
if len(option) == 3:
long, short, help = option
repeat = 0
elif len(option) == 4:
long, short, help, repeat = option
else:
# the option table is part of the code, so simply
# assert that it is correct
raise ValueError, "invalid option tuple: %r" % (option,)
# Type- and value-check the option names
if not isinstance(long, str) or len(long) < 2:
raise DistutilsGetoptError, \
("invalid long option '%s': "
"must be a string of length >= 2") % long
if (not ((short is None) or
(isinstance(short, str) and len(short) == 1))):
raise DistutilsGetoptError, \
("invalid short option '%s': "
"must a single character or None") % short
self.repeat[long] = repeat
self.long_opts.append(long)
if long[-1] == '=': # option takes an argument?
if short: short = short + ':'
long = long[0:-1]
self.takes_arg[long] = 1
else:
                # Is this option a "negative alias" for some other option (eg.
# "quiet" == "!verbose")?
alias_to = self.negative_alias.get(long)
if alias_to is not None:
if self.takes_arg[alias_to]:
raise DistutilsGetoptError, \
("invalid negative alias '%s': "
"aliased option '%s' takes a value") % \
(long, alias_to)
self.long_opts[-1] = long # XXX redundant?!
self.takes_arg[long] = 0
else:
self.takes_arg[long] = 0
# If this is an alias option, make sure its "takes arg" flag is
# the same as the option it's aliased to.
alias_to = self.alias.get(long)
if alias_to is not None:
if self.takes_arg[long] != self.takes_arg[alias_to]:
raise DistutilsGetoptError, \
("invalid alias '%s': inconsistent with "
"aliased option '%s' (one of them takes a value, "
"the other doesn't") % (long, alias_to)
# Now enforce some bondage on the long option name, so we can
# later translate it to an attribute name on some object. Have
# to do this a bit late to make sure we've removed any trailing
# '='.
if not longopt_re.match(long):
raise DistutilsGetoptError, \
("invalid long option name '%s' " +
"(must be letters, numbers, hyphens only") % long
self.attr_name[long] = self.get_attr_name(long)
if short:
self.short_opts.append(short)
self.short2long[short[0]] = long
# for option_table
# _grok_option_table()
def getopt (self, args=None, object=None):
"""Parse command-line options in args. Store as attributes on object.
If 'args' is None or not supplied, uses 'sys.argv[1:]'. If
'object' is None or not supplied, creates a new OptionDummy
object, stores option values there, and returns a tuple (args,
object). If 'object' is supplied, it is modified in place and
'getopt()' just returns 'args'; in both cases, the returned
'args' is a modified copy of the passed-in 'args' list, which
is left untouched.
"""
if args is None:
args = sys.argv[1:]
if object is None:
object = OptionDummy()
created_object = 1
else:
created_object = 0
self._grok_option_table()
short_opts = string.join(self.short_opts)
try:
opts, args = getopt.getopt(args, short_opts, self.long_opts)
except getopt.error, msg:
raise DistutilsArgError, msg
for opt, val in opts:
if len(opt) == 2 and opt[0] == '-': # it's a short option
opt = self.short2long[opt[1]]
else:
assert len(opt) > 2 and opt[:2] == '--'
opt = opt[2:]
alias = self.alias.get(opt)
if alias:
opt = alias
if not self.takes_arg[opt]: # boolean option?
assert val == '', "boolean option can't have value"
alias = self.negative_alias.get(opt)
if alias:
opt = alias
val = 0
else:
val = 1
attr = self.attr_name[opt]
# The only repeating option at the moment is 'verbose'.
# It has a negative option -q quiet, which should set verbose = 0.
if val and self.repeat.get(attr) is not None:
val = getattr(object, attr, 0) + 1
setattr(object, attr, val)
self.option_order.append((opt, val))
# for opts
if created_object:
return args, object
else:
return args
# getopt()
def get_option_order (self):
"""Returns the list of (option, value) tuples processed by the
previous run of 'getopt()'. Raises RuntimeError if
'getopt()' hasn't been called yet.
"""
if self.option_order is None:
raise RuntimeError, "'getopt()' hasn't been called yet"
else:
return self.option_order
def generate_help (self, header=None):
"""Generate help text (a list of strings, one per suggested line of
output) from the option table for this FancyGetopt object.
"""
# Blithely assume the option table is good: probably wouldn't call
# 'generate_help()' unless you've already called 'getopt()'.
# First pass: determine maximum length of long option names
max_opt = 0
for option in self.option_table:
long = option[0]
short = option[1]
l = len(long)
if long[-1] == '=':
l = l - 1
if short is not None:
l = l + 5 # " (-x)" where short == 'x'
if l > max_opt:
max_opt = l
opt_width = max_opt + 2 + 2 + 2 # room for indent + dashes + gutter
# Typical help block looks like this:
# --foo controls foonabulation
# Help block for longest option looks like this:
# --flimflam set the flim-flam level
# and with wrapped text:
# --flimflam set the flim-flam level (must be between
# 0 and 100, except on Tuesdays)
# Options with short names will have the short name shown (but
# it doesn't contribute to max_opt):
# --foo (-f) controls foonabulation
# If adding the short option would make the left column too wide,
# we push the explanation off to the next line
# --flimflam (-l)
# set the flim-flam level
# Important parameters:
# - 2 spaces before option block start lines
# - 2 dashes for each long option name
# - min. 2 spaces between option and explanation (gutter)
# - 5 characters (incl. space) for short option name
# Now generate lines of help text. (If 80 columns were good enough
# for Jesus, then 78 columns are good enough for me!)
line_width = 78
text_width = line_width - opt_width
big_indent = ' ' * opt_width
if header:
lines = [header]
else:
lines = ['Option summary:']
for option in self.option_table:
long, short, help = option[:3]
text = wrap_text(help, text_width)
if long[-1] == '=':
long = long[0:-1]
# Case 1: no short option at all (makes life easy)
if short is None:
if text:
lines.append(" --%-*s %s" % (max_opt, long, text[0]))
else:
lines.append(" --%-*s " % (max_opt, long))
# Case 2: we have a short option, so we have to include it
# just after the long option
else:
opt_names = "%s (-%s)" % (long, short)
if text:
lines.append(" --%-*s %s" %
(max_opt, opt_names, text[0]))
else:
lines.append(" --%-*s" % opt_names)
for l in text[1:]:
lines.append(big_indent + l)
# for self.option_table
return lines
# generate_help ()
def print_help (self, header=None, file=None):
if file is None:
file = sys.stdout
for line in self.generate_help(header):
file.write(line + "\n")
# class FancyGetopt
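# A minimal usage sketch (not part of distutils); the option table and
# argument list below are hypothetical.
if __name__ == '__main__':
    table = [('verbose', 'v', "run verbosely"),
             ('output=', 'o', "write results to FILE")]
    parser = FancyGetopt(table)
    args, opts = parser.getopt(['-v', '--output=out.txt', 'leftover'])
    print opts.verbose, opts.output, args
    # -> 1 out.txt ['leftover']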
def fancy_getopt (options, negative_opt, object, args):
parser = FancyGetopt(options)
parser.set_negative_aliases(negative_opt)
return parser.getopt(args, object)
WS_TRANS = string.maketrans(string.whitespace, ' ' * len(string.whitespace))
def wrap_text (text, width):
"""wrap_text(text : string, width : int) -> [string]
Split 'text' into multiple lines of no more than 'width' characters
each, and return the list of strings that results.
"""
if text is None:
return []
if len(text) <= width:
return [text]
text = string.expandtabs(text)
text = string.translate(text, WS_TRANS)
chunks = re.split(r'( +|-+)', text)
chunks = filter(None, chunks) # ' - ' results in empty strings
lines = []
while chunks:
cur_line = [] # list of chunks (to-be-joined)
cur_len = 0 # length of current line
while chunks:
l = len(chunks[0])
if cur_len + l <= width: # can squeeze (at least) this chunk in
cur_line.append(chunks[0])
del chunks[0]
cur_len = cur_len + l
else: # this line is full
# drop last chunk if all space
if cur_line and cur_line[-1][0] == ' ':
del cur_line[-1]
break
if chunks: # any chunks left to process?
# if the current line is still empty, then we had a single
            # chunk that's too big to fit on a line -- so we break
# down and break it up at the line width
if cur_len == 0:
cur_line.append(chunks[0][0:width])
chunks[0] = chunks[0][width:]
# all-whitespace chunks at the end of a line can be discarded
# (and we know from the re.split above that if a chunk has
# *any* whitespace, it is *all* whitespace)
if chunks[0][0] == ' ':
del chunks[0]
# and store this line in the list-of-all-lines -- as a single
# string, of course!
lines.append(string.join(cur_line, ''))
# while chunks
return lines
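# A quick illustration (not part of distutils) of 'wrap_text()' on the
# flim-flam help text quoted in the generate_help() comments above:
if __name__ == '__main__':
    for wrapped in wrap_text("set the flim-flam level (must be between "
                             "0 and 100, except on Tuesdays)", 40):
        print wrapped
    # -> set the flim-flam level (must be between
    #    0 and 100, except on Tuesdays)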
def translate_longopt(opt):
"""Convert a long option name to a valid Python identifier by
changing "-" to "_".
"""
return string.translate(opt, longopt_xlate)
class OptionDummy:
"""Dummy class just used as a place to hold command-line option
values as instance attributes."""
def __init__ (self, options=[]):
"""Create a new OptionDummy instance. The attributes listed in
'options' will be initialized to None."""
for opt in options:
setattr(self, opt, None)
"""distutils.filelist
Provides the FileList class, used for poking about the filesystem
and building lists of files.
"""
__revision__ = "$Id$"
import os, re
import fnmatch
from distutils.util import convert_path
from distutils.errors import DistutilsTemplateError, DistutilsInternalError
from distutils import log
class FileList:
"""A list of files built by on exploring the filesystem and filtered by
applying various patterns to what we find there.
Instance attributes:
dir
directory from which files will be taken -- only used if
'allfiles' not supplied to constructor
files
list of filenames currently being built/filtered/manipulated
allfiles
complete list of files under consideration (ie. without any
filtering applied)
"""
def __init__(self, warn=None, debug_print=None):
        # ignore the arguments to FileList, but keep them for backwards
# compatibility
self.allfiles = None
self.files = []
def set_allfiles(self, allfiles):
self.allfiles = allfiles
def findall(self, dir=os.curdir):
self.allfiles = findall(dir)
def debug_print(self, msg):
"""Print 'msg' to stdout if the global DEBUG (taken from the
DISTUTILS_DEBUG environment variable) flag is true.
"""
from distutils.debug import DEBUG
if DEBUG:
print msg
# -- List-like methods ---------------------------------------------
def append(self, item):
self.files.append(item)
def extend(self, items):
self.files.extend(items)
def sort(self):
# Not a strict lexical sort!
sortable_files = map(os.path.split, self.files)
sortable_files.sort()
self.files = []
for sort_tuple in sortable_files:
self.files.append(os.path.join(*sort_tuple))
# -- Other miscellaneous utility methods ---------------------------
def remove_duplicates(self):
# Assumes list has been sorted!
for i in range(len(self.files) - 1, 0, -1):
if self.files[i] == self.files[i - 1]:
del self.files[i]
# -- "File template" methods ---------------------------------------
def _parse_template_line(self, line):
words = line.split()
action = words[0]
patterns = dir = dir_pattern = None
if action in ('include', 'exclude',
'global-include', 'global-exclude'):
if len(words) < 2:
raise DistutilsTemplateError, \
"'%s' expects <pattern1> <pattern2> ..." % action
patterns = map(convert_path, words[1:])
elif action in ('recursive-include', 'recursive-exclude'):
if len(words) < 3:
raise DistutilsTemplateError, \
"'%s' expects <dir> <pattern1> <pattern2> ..." % action
dir = convert_path(words[1])
patterns = map(convert_path, words[2:])
elif action in ('graft', 'prune'):
if len(words) != 2:
raise DistutilsTemplateError, \
"'%s' expects a single <dir_pattern>" % action
dir_pattern = convert_path(words[1])
else:
raise DistutilsTemplateError, "unknown action '%s'" % action
return (action, patterns, dir, dir_pattern)
def process_template_line(self, line):
# Parse the line: split it up, make sure the right number of words
# is there, and return the relevant words. 'action' is always
# defined: it's the first word of the line. Which of the other
# three are defined depends on the action; it'll be either
# patterns, (dir and patterns), or (dir_pattern).
action, patterns, dir, dir_pattern = self._parse_template_line(line)
# OK, now we know that the action is valid and we have the
# right number of words on the line for that action -- so we
# can proceed with minimal error-checking.
if action == 'include':
self.debug_print("include " + ' '.join(patterns))
for pattern in patterns:
if not self.include_pattern(pattern, anchor=1):
log.warn("warning: no files found matching '%s'",
pattern)
elif action == 'exclude':
self.debug_print("exclude " + ' '.join(patterns))
for pattern in patterns:
if not self.exclude_pattern(pattern, anchor=1):
log.warn(("warning: no previously-included files "
"found matching '%s'"), pattern)
elif action == 'global-include':
self.debug_print("global-include " + ' '.join(patterns))
for pattern in patterns:
if not self.include_pattern(pattern, anchor=0):
log.warn(("warning: no files found matching '%s' " +
"anywhere in distribution"), pattern)
elif action == 'global-exclude':
self.debug_print("global-exclude " + ' '.join(patterns))
for pattern in patterns:
if not self.exclude_pattern(pattern, anchor=0):
log.warn(("warning: no previously-included files matching "
"'%s' found anywhere in distribution"),
pattern)
elif action == 'recursive-include':
self.debug_print("recursive-include %s %s" %
(dir, ' '.join(patterns)))
for pattern in patterns:
if not self.include_pattern(pattern, prefix=dir):
log.warn(("warning: no files found matching '%s' " +
"under directory '%s'"),
pattern, dir)
elif action == 'recursive-exclude':
self.debug_print("recursive-exclude %s %s" %
(dir, ' '.join(patterns)))
for pattern in patterns:
if not self.exclude_pattern(pattern, prefix=dir):
log.warn(("warning: no previously-included files matching "
"'%s' found under directory '%s'"),
pattern, dir)
elif action == 'graft':
self.debug_print("graft " + dir_pattern)
if not self.include_pattern(None, prefix=dir_pattern):
log.warn("warning: no directories found matching '%s'",
dir_pattern)
elif action == 'prune':
self.debug_print("prune " + dir_pattern)
if not self.exclude_pattern(None, prefix=dir_pattern):
log.warn(("no previously-included directories found " +
"matching '%s'"), dir_pattern)
else:
raise DistutilsInternalError, \
"this cannot happen: invalid action '%s'" % action
# -- Filtering/selection methods -----------------------------------
def include_pattern(self, pattern, anchor=1, prefix=None, is_regex=0):
"""Select strings (presumably filenames) from 'self.files' that
match 'pattern', a Unix-style wildcard (glob) pattern.
Patterns are not quite the same as implemented by the 'fnmatch'
module: '*' and '?' match non-special characters, where "special"
is platform-dependent: slash on Unix; colon, slash, and backslash on
DOS/Windows; and colon on Mac OS.
If 'anchor' is true (the default), then the pattern match is more
stringent: "*.py" will match "foo.py" but not "foo/bar.py". If
'anchor' is false, both of these will match.
If 'prefix' is supplied, then only filenames starting with 'prefix'
(itself a pattern) and ending with 'pattern', with anything in between
them, will match. 'anchor' is ignored in this case.
If 'is_regex' is true, 'anchor' and 'prefix' are ignored, and
'pattern' is assumed to be either a string containing a regex or a
regex object -- no translation is done, the regex is just compiled
and used as-is.
Selected strings will be added to self.files.
Return 1 if files are found.
"""
# XXX docstring lying about what the special chars are?
files_found = 0
pattern_re = translate_pattern(pattern, anchor, prefix, is_regex)
self.debug_print("include_pattern: applying regex r'%s'" %
pattern_re.pattern)
# delayed loading of allfiles list
if self.allfiles is None:
self.findall()
for name in self.allfiles:
if pattern_re.search(name):
self.debug_print(" adding " + name)
self.files.append(name)
files_found = 1
return files_found
def exclude_pattern(self, pattern, anchor=1, prefix=None, is_regex=0):
"""Remove strings (presumably filenames) from 'files' that match
'pattern'.
Other parameters are the same as for 'include_pattern()', above.
The list 'self.files' is modified in place. Return 1 if files are
found.
"""
files_found = 0
pattern_re = translate_pattern(pattern, anchor, prefix, is_regex)
self.debug_print("exclude_pattern: applying regex r'%s'" %
pattern_re.pattern)
for i in range(len(self.files)-1, -1, -1):
if pattern_re.search(self.files[i]):
self.debug_print(" removing " + self.files[i])
del self.files[i]
files_found = 1
return files_found
# ----------------------------------------------------------------------
# Utility functions
def findall(dir = os.curdir):
"""Find all files under 'dir' and return the list of full filenames
(relative to 'dir').
"""
from stat import ST_MODE, S_ISREG, S_ISDIR, S_ISLNK
list = []
stack = [dir]
pop = stack.pop
push = stack.append
while stack:
dir = pop()
names = os.listdir(dir)
for name in names:
if dir != os.curdir: # avoid the dreaded "./" syndrome
fullname = os.path.join(dir, name)
else:
fullname = name
# Avoid excess stat calls -- just one will do, thank you!
stat = os.stat(fullname)
mode = stat[ST_MODE]
if S_ISREG(mode):
list.append(fullname)
elif S_ISDIR(mode) and not S_ISLNK(mode):
push(fullname)
return list
def glob_to_re(pattern):
"""Translate a shell-like glob pattern to a regular expression.
Return a string containing the regex. Differs from
'fnmatch.translate()' in that '*' does not match "special characters"
(which are platform-specific).
"""
pattern_re = fnmatch.translate(pattern)
# '?' and '*' in the glob pattern become '.' and '.*' in the RE, which
# IMHO is wrong -- '?' and '*' aren't supposed to match slash in Unix,
# and by extension they shouldn't match such "special characters" under
# any OS. So change all non-escaped dots in the RE to match any
# character except the special characters (currently: just os.sep).
sep = os.sep
if os.sep == '\\':
# we're using a regex to manipulate a regex, so we need
# to escape the backslash twice
sep = r'\\\\'
escaped = r'\1[^%s]' % sep
pattern_re = re.sub(r'((?<!\\)(\\\\)*)\.', escaped, pattern_re)
return pattern_re
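# A quick illustration (not part of distutils): unlike fnmatch.translate(),
# the '*' produced here refuses to cross os.sep (shown for Unix, where
# os.sep is '/').
if __name__ == '__main__':
    pattern_re = re.compile(glob_to_re('*.py'))
    print bool(pattern_re.match('foo.py'))       # True
    print bool(pattern_re.match('foo/bar.py'))   # False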
def translate_pattern(pattern, anchor=1, prefix=None, is_regex=0):
"""Translate a shell-like wildcard pattern to a compiled regular
expression.
Return the compiled regex. If 'is_regex' true,
then 'pattern' is directly compiled to a regex (if it's a string)
or just returned as-is (assumes it's a regex object).
"""
if is_regex:
if isinstance(pattern, str):
return re.compile(pattern)
else:
return pattern
if pattern:
pattern_re = glob_to_re(pattern)
else:
pattern_re = ''
if prefix is not None:
# ditch end of pattern character
empty_pattern = glob_to_re('')
prefix_re = glob_to_re(prefix)[:-len(empty_pattern)]
sep = os.sep
if os.sep == '\\':
sep = r'\\'
pattern_re = "^" + sep.join((prefix_re, ".*" + pattern_re))
else: # no prefix -- respect anchor flag
if anchor:
pattern_re = "^" + pattern_re
return re.compile(pattern_re)
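# A minimal sketch (not part of distutils) of the template machinery,
# driven by a hypothetical in-memory file list instead of a real tree
# (paths are written with '/', i.e. assuming a Unix os.sep).
if __name__ == '__main__':
    file_list = FileList()
    file_list.set_allfiles(['README', 'src/foo.c', 'src/foo.h',
                            'doc/index.txt'])
    file_list.process_template_line('include README')
    file_list.process_template_line('recursive-include src *.c')
    print file_list.files       # -> ['README', 'src/foo.c']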
"""distutils.file_util
Utility functions for operating on single files.
"""
__revision__ = "$Id$"
import os
from distutils.errors import DistutilsFileError
from distutils import log
# for generating verbose output in 'copy_file()'
_copy_action = {None: 'copying',
'hard': 'hard linking',
'sym': 'symbolically linking'}
def _copy_file_contents(src, dst, buffer_size=16*1024):
"""Copy the file 'src' to 'dst'.
Both must be filenames. Any error opening either file, reading from
'src', or writing to 'dst', raises DistutilsFileError. Data is
read/written in chunks of 'buffer_size' bytes (default 16k). No attempt
is made to handle anything apart from regular files.
"""
# Stolen from shutil module in the standard library, but with
# custom error-handling added.
fsrc = None
fdst = None
try:
try:
fsrc = open(src, 'rb')
except os.error, (errno, errstr):
raise DistutilsFileError("could not open '%s': %s" % (src, errstr))
if os.path.exists(dst):
try:
os.unlink(dst)
except os.error, (errno, errstr):
raise DistutilsFileError(
"could not delete '%s': %s" % (dst, errstr))
try:
fdst = open(dst, 'wb')
except os.error, (errno, errstr):
raise DistutilsFileError(
"could not create '%s': %s" % (dst, errstr))
while 1:
try:
buf = fsrc.read(buffer_size)
except os.error, (errno, errstr):
raise DistutilsFileError(
"could not read from '%s': %s" % (src, errstr))
if not buf:
break
try:
fdst.write(buf)
except os.error, (errno, errstr):
raise DistutilsFileError(
"could not write to '%s': %s" % (dst, errstr))
finally:
if fdst:
fdst.close()
if fsrc:
fsrc.close()
def copy_file(src, dst, preserve_mode=1, preserve_times=1, update=0,
link=None, verbose=1, dry_run=0):
"""Copy a file 'src' to 'dst'.
If 'dst' is a directory, then 'src' is copied there with the same name;
otherwise, it must be a filename. (If the file exists, it will be
ruthlessly clobbered.) If 'preserve_mode' is true (the default),
the file's mode (type and permission bits, or whatever is analogous on
the current platform) is copied. If 'preserve_times' is true (the
default), the last-modified and last-access times are copied as well.
If 'update' is true, 'src' will only be copied if 'dst' does not exist,
or if 'dst' does exist but is older than 'src'.
'link' allows you to make hard links (os.link) or symbolic links
(os.symlink) instead of copying: set it to "hard" or "sym"; if it is
None (the default), files are copied. Don't set 'link' on systems that
don't support it: 'copy_file()' doesn't check if hard or symbolic
linking is available. If hardlink fails, falls back to
_copy_file_contents().
Under Mac OS, uses the native file copy function in macostools; on
other systems, uses '_copy_file_contents()' to copy file contents.
Return a tuple (dest_name, copied): 'dest_name' is the actual name of
the output file, and 'copied' is true if the file was copied (or would
have been copied, if 'dry_run' true).
"""
# XXX if the destination file already exists, we clobber it if
# copying, but blow up if linking. Hmmm. And I don't know what
# macostools.copyfile() does. Should definitely be consistent, and
# should probably blow up if destination exists and we would be
# changing it (ie. it's not already a hard/soft link to src OR
# (not update) and (src newer than dst)).
from distutils.dep_util import newer
from stat import ST_ATIME, ST_MTIME, ST_MODE, S_IMODE
if not os.path.isfile(src):
raise DistutilsFileError(
"can't copy '%s': doesn't exist or not a regular file" % src)
if os.path.isdir(dst):
dir = dst
dst = os.path.join(dst, os.path.basename(src))
else:
dir = os.path.dirname(dst)
if update and not newer(src, dst):
if verbose >= 1:
log.debug("not copying %s (output up-to-date)", src)
return dst, 0
try:
action = _copy_action[link]
except KeyError:
raise ValueError("invalid value '%s' for 'link' argument" % link)
if verbose >= 1:
if os.path.basename(dst) == os.path.basename(src):
log.info("%s %s -> %s", action, src, dir)
else:
log.info("%s %s -> %s", action, src, dst)
if dry_run:
return (dst, 1)
# If linking (hard or symbolic), use the appropriate system call
# (Unix only, of course, but that's the caller's responsibility)
if link == 'hard':
if not (os.path.exists(dst) and os.path.samefile(src, dst)):
try:
os.link(src, dst)
return (dst, 1)
except OSError:
# If hard linking fails, fall back on copying file
# (some special filesystems don't support hard linking
# even under Unix, see issue #8876).
pass
elif link == 'sym':
if not (os.path.exists(dst) and os.path.samefile(src, dst)):
os.symlink(src, dst)
return (dst, 1)
# Otherwise (non-Mac, not linking), copy the file contents and
# (optionally) copy the times and mode.
_copy_file_contents(src, dst)
if preserve_mode or preserve_times:
st = os.stat(src)
# According to David Ascher, utime() should be done
# before chmod() (at least under NT).
if preserve_times:
os.utime(dst, (st[ST_ATIME], st[ST_MTIME]))
if preserve_mode:
os.chmod(dst, S_IMODE(st[ST_MODE]))
return (dst, 1)
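# Illustrative sketch (not part of distutils): typical copy_file() usage.
# 'demo_src.txt' and 'demo_dist' are hypothetical names created on the fly.
import os
from distutils.file_util import copy_file
open('demo_src.txt', 'w').write('payload\n')
os.mkdir('demo_dist')
dest, copied = copy_file('demo_src.txt', 'demo_dist')
print dest, copied        # demo_dist/demo_src.txt 1
# with update=1 an up-to-date destination is skipped (preserve_times kept
# the source mtime, so the source is not newer than the copy)
dest, copied = copy_file('demo_src.txt', 'demo_dist', update=1)
print copied              # 0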
# XXX I suspect this is Unix-specific -- need porting help!
def move_file (src, dst, verbose=1, dry_run=0):
"""Move a file 'src' to 'dst'.
If 'dst' is a directory, the file will be moved into it with the same
name; otherwise, 'src' is just renamed to 'dst'. Return the new
full name of the file.
Handles cross-device moves on Unix using 'copy_file()'. What about
other systems???
"""
from os.path import exists, isfile, isdir, basename, dirname
import errno
if verbose >= 1:
log.info("moving %s -> %s", src, dst)
if dry_run:
return dst
if not isfile(src):
raise DistutilsFileError("can't move '%s': not a regular file" % src)
if isdir(dst):
dst = os.path.join(dst, basename(src))
elif exists(dst):
raise DistutilsFileError(
"can't move '%s': destination '%s' already exists" %
(src, dst))
if not isdir(dirname(dst)):
raise DistutilsFileError(
"can't move '%s': destination '%s' not a valid path" % \
(src, dst))
copy_it = 0
try:
os.rename(src, dst)
except os.error, (num, msg):
if num == errno.EXDEV:
copy_it = 1
else:
raise DistutilsFileError(
"couldn't move '%s' to '%s': %s" % (src, dst, msg))
if copy_it:
copy_file(src, dst, verbose=verbose)
try:
os.unlink(src)
except os.error, (num, msg):
try:
os.unlink(dst)
except os.error:
pass
raise DistutilsFileError(
("couldn't move '%s' to '%s' by copy/delete: " +
"delete '%s' failed: %s") %
(src, dst, src, msg))
return dst
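# Illustrative sketch (not part of distutils): move_file() renames in place
# when it can, and transparently switches to copy-then-delete when rename
# fails with EXDEV (a cross-device move). 'demo_move.txt' is hypothetical.
from distutils.file_util import move_file
open('demo_move.txt', 'w').write('payload\n')
print move_file('demo_move.txt', 'demo_moved.txt')   # demo_moved.txt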
def write_file (filename, contents):
"""Create a file with the specified name and write 'contents' (a
sequence of strings without line terminators) to it.
"""
f = open(filename, "w")
try:
for line in contents:
f.write(line + "\n")
finally:
f.close()
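# Illustrative sketch (not part of distutils): write_file() appends the
# newline itself, so callers pass lines without terminators.
from distutils.file_util import write_file
write_file('demo_manifest.txt', ['include README', 'recursive-include src *.py'])
print open('demo_manifest.txt').read()
# include README
# recursive-include src *.py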
"""A simple log mechanism styled after PEP 282."""
# The class here is styled after PEP 282 so that it could later be
# replaced with a standard Python logging implementation.
DEBUG = 1
INFO = 2
WARN = 3
ERROR = 4
FATAL = 5
import sys
class Log:
def __init__(self, threshold=WARN):
self.threshold = threshold
def _log(self, level, msg, args):
if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
raise ValueError('%s wrong log level' % str(level))
if level >= self.threshold:
if args:
msg = msg % args
if level in (WARN, ERROR, FATAL):
stream = sys.stderr
else:
stream = sys.stdout
stream.write('%s\n' % msg)
stream.flush()
def log(self, level, msg, *args):
self._log(level, msg, args)
def debug(self, msg, *args):
self._log(DEBUG, msg, args)
def info(self, msg, *args):
self._log(INFO, msg, args)
def warn(self, msg, *args):
self._log(WARN, msg, args)
def error(self, msg, *args):
self._log(ERROR, msg, args)
def fatal(self, msg, *args):
self._log(FATAL, msg, args)
_global_log = Log()
log = _global_log.log
debug = _global_log.debug
info = _global_log.info
warn = _global_log.warn
error = _global_log.error
fatal = _global_log.fatal
def set_threshold(level):
# return the old threshold for use from tests
old = _global_log.threshold
_global_log.threshold = level
return old
def set_verbosity(v):
if v <= 0:
set_threshold(WARN)
elif v == 1:
set_threshold(INFO)
elif v >= 2:
set_threshold(DEBUG)
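# Illustrative sketch (not part of distutils): using the module-level
# helpers bound above. set_threshold() returns the old level so callers
# can restore it, and set_verbosity() maps -v counts onto thresholds.
from distutils import log
old = log.set_threshold(log.INFO)
log.info("building %s", "demo")     # emitted: INFO >= threshold
log.debug("resolving paths")        # suppressed: DEBUG < threshold
log.warn("no manifest found")       # emitted, and routed to stderr
log.set_threshold(old)              # restore for other callers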
"""distutils.msvccompiler
Contains MSVCCompiler, an implementation of the abstract CCompiler class
for the Microsoft Visual Studio.
"""
# Written by Perry Stoll
# hacked by Robin Becker and Thomas Heller to do a better job of
# finding DevStudio (through the registry)
__revision__ = "$Id$"
import sys
import os
import string
from distutils.errors import (DistutilsExecError, DistutilsPlatformError,
CompileError, LibError, LinkError)
from distutils.ccompiler import CCompiler, gen_lib_options
from distutils import log
_can_read_reg = 0
try:
import _winreg
_can_read_reg = 1
hkey_mod = _winreg
RegOpenKeyEx = _winreg.OpenKeyEx
RegEnumKey = _winreg.EnumKey
RegEnumValue = _winreg.EnumValue
RegError = _winreg.error
except ImportError:
try:
import win32api
import win32con
_can_read_reg = 1
hkey_mod = win32con
RegOpenKeyEx = win32api.RegOpenKeyEx
RegEnumKey = win32api.RegEnumKey
RegEnumValue = win32api.RegEnumValue
RegError = win32api.error
except ImportError:
log.info("Warning: Can't read registry to find the "
"necessary compiler setting\n"
"Make sure that Python modules _winreg, "
"win32api or win32con are installed.")
if _can_read_reg:
HKEYS = (hkey_mod.HKEY_USERS,
hkey_mod.HKEY_CURRENT_USER,
hkey_mod.HKEY_LOCAL_MACHINE,
hkey_mod.HKEY_CLASSES_ROOT)
def read_keys(base, key):
"""Return list of registry keys."""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
L = []
i = 0
while 1:
try:
k = RegEnumKey(handle, i)
except RegError:
break
L.append(k)
i = i + 1
return L
def read_values(base, key):
"""Return dict of registry keys and values.
All names are converted to lowercase.
"""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
d = {}
i = 0
while 1:
try:
name, value, type = RegEnumValue(handle, i)
except RegError:
break
name = name.lower()
d[convert_mbcs(name)] = convert_mbcs(value)
i = i + 1
return d
def convert_mbcs(s):
enc = getattr(s, "encode", None)
if enc is not None:
try:
s = enc("mbcs")
except UnicodeError:
pass
return s
class MacroExpander:
def __init__(self, version):
self.macros = {}
self.load_macros(version)
def set_macro(self, macro, path, key):
for base in HKEYS:
d = read_values(base, path)
if d:
self.macros["$(%s)" % macro] = d[key]
break
def load_macros(self, version):
vsbase = r"Software\Microsoft\VisualStudio\%0.1f" % version
self.set_macro("VCInstallDir", vsbase + r"\Setup\VC", "productdir")
self.set_macro("VSInstallDir", vsbase + r"\Setup\VS", "productdir")
net = r"Software\Microsoft\.NETFramework"
self.set_macro("FrameworkDir", net, "installroot")
try:
if version > 7.0:
self.set_macro("FrameworkSDKDir", net, "sdkinstallrootv1.1")
else:
self.set_macro("FrameworkSDKDir", net, "sdkinstallroot")
except KeyError:
raise DistutilsPlatformError, \
("""Python was built with Visual Studio 2003;
extensions must be built with a compiler that can generate compatible binaries.
Visual Studio 2003 was not found on this system. If you have Cygwin installed,
you can try compiling with MinGW32, by passing "-c mingw32" to setup.py.""")
p = r"Software\Microsoft\NET Framework Setup\Product"
for base in HKEYS:
try:
h = RegOpenKeyEx(base, p)
except RegError:
continue
key = RegEnumKey(h, 0)
d = read_values(base, r"%s\%s" % (p, key))
self.macros["$(FrameworkVersion)"] = d["version"]
def sub(self, s):
for k, v in self.macros.items():
s = string.replace(s, k, v)
return s
def get_build_version():
"""Return the version of MSVC that was used to build Python.
For Python 2.3 and up, the version number is included in
sys.version. For earlier versions, assume the compiler is MSVC 6.
"""
prefix = "MSC v."
i = string.find(sys.version, prefix)
if i == -1:
return 6
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0
# I don't think paths are affected by minor version in version 6
if majorVersion == 6:
minorVersion = 0
if majorVersion >= 6:
return majorVersion + minorVersion
# else we don't know what version of the compiler this is
return None
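# Illustrative sketch (not part of distutils): the arithmetic above applied
# to a made-up but representative Windows sys.version banner.
sample = "2.7.18 (v2.7.18:8d21aa2, Apr 20 2020) [MSC v.1500 32 bit (Intel)]"
i = sample.find("MSC v.") + len("MSC v.")
s = sample[i:].split(" ", 1)[0]     # "1500"
majorVersion = int(s[:-2]) - 6      # 15 - 6 = 9
minorVersion = int(s[2:3]) / 10.0   # 0 / 10.0 = 0.0
print majorVersion + minorVersion   # 9.0, i.e. the MSVC 9.0 (VS 2008) toolchain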
def get_build_architecture():
"""Return the processor architecture.
Possible results are "Intel", "Itanium", or "AMD64".
"""
prefix = " bit ("
i = string.find(sys.version, prefix)
if i == -1:
return "Intel"
j = string.find(sys.version, ")", i)
return sys.version[i+len(prefix):j]
def normalize_and_reduce_paths(paths):
"""Return a list of normalized paths with duplicates removed.
The current order of paths is maintained.
"""
# Paths are normalized so things like: /a and /a/ aren't both preserved.
reduced_paths = []
for p in paths:
np = os.path.normpath(p)
# XXX(nnorwitz): O(n**2), if reduced_paths gets long perhaps use a set.
if np not in reduced_paths:
reduced_paths.append(np)
return reduced_paths
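# Illustrative sketch (not part of distutils): normalization makes '/a' and
# '/a/' compare equal, and the first occurrence wins.
from distutils.msvccompiler import normalize_and_reduce_paths
print normalize_and_reduce_paths(['/a', '/a/', '/b', '/a/../a'])
# ['/a', '/b'] with POSIX path rules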
class MSVCCompiler (CCompiler) :
"""Concrete class that implements an interface to Microsoft Visual C++,
as defined by the CCompiler abstract class."""
compiler_type = 'msvc'
# Just set this so CCompiler's constructor doesn't barf. We currently
# don't use the 'set_executables()' bureaucracy provided by CCompiler,
# as it really isn't necessary for this sort of single-compiler class.
# Would be nice to have a consistent interface with UnixCCompiler,
# though, so it's worth thinking about.
executables = {}
# Private class data (need to distinguish C from C++ source for compiler)
_c_extensions = ['.c']
_cpp_extensions = ['.cc', '.cpp', '.cxx']
_rc_extensions = ['.rc']
_mc_extensions = ['.mc']
# Needed for the filename generation methods provided by the
# base class, CCompiler.
src_extensions = (_c_extensions + _cpp_extensions +
_rc_extensions + _mc_extensions)
res_extension = '.res'
obj_extension = '.obj'
static_lib_extension = '.lib'
shared_lib_extension = '.dll'
static_lib_format = shared_lib_format = '%s%s'
exe_extension = '.exe'
def __init__ (self, verbose=0, dry_run=0, force=0):
CCompiler.__init__ (self, verbose, dry_run, force)
self.__version = get_build_version()
self.__arch = get_build_architecture()
if self.__arch == "Intel":
# x86
if self.__version >= 7:
self.__root = r"Software\Microsoft\VisualStudio"
self.__macros = MacroExpander(self.__version)
else:
self.__root = r"Software\Microsoft\Devstudio"
self.__product = "Visual Studio version %s" % self.__version
else:
# Win64. Assume this was built with the platform SDK
self.__product = "Microsoft SDK compiler %s" % (self.__version + 6)
self.initialized = False
def initialize(self):
self.__paths = []
if "DISTUTILS_USE_SDK" in os.environ and "MSSdk" in os.environ and self.find_exe("cl.exe"):
# Assume that the SDK set up everything alright; don't try to be
# smarter
self.cc = "cl.exe"
self.linker = "link.exe"
self.lib = "lib.exe"
self.rc = "rc.exe"
self.mc = "mc.exe"
else:
self.__paths = self.get_msvc_paths("path")
if len (self.__paths) == 0:
raise DistutilsPlatformError, \
("Python was built with %s, "
"and extensions need to be built with the same "
"version of the compiler, but it isn't installed." % self.__product)
self.cc = self.find_exe("cl.exe")
self.linker = self.find_exe("link.exe")
self.lib = self.find_exe("lib.exe")
self.rc = self.find_exe("rc.exe") # resource compiler
self.mc = self.find_exe("mc.exe") # message compiler
self.set_path_env_var('lib')
self.set_path_env_var('include')
# extend the MSVC path with the current path
try:
for p in string.split(os.environ['path'], ';'):
self.__paths.append(p)
except KeyError:
pass
self.__paths = normalize_and_reduce_paths(self.__paths)
os.environ['path'] = string.join(self.__paths, ';')
self.preprocess_options = None
if self.__arch == "Intel":
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3', '/GX' ,
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3', '/GX',
'/Z7', '/D_DEBUG']
else:
# Win64
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3', '/GS-' ,
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3', '/GS-',
'/Z7', '/D_DEBUG']
self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:NO']
if self.__version >= 7:
self.ldflags_shared_debug = [
'/DLL', '/nologo', '/INCREMENTAL:no', '/DEBUG'
]
else:
self.ldflags_shared_debug = [
'/DLL', '/nologo', '/INCREMENTAL:no', '/pdb:None', '/DEBUG'
]
self.ldflags_static = [ '/nologo']
self.initialized = True
# -- Worker methods ------------------------------------------------
def object_filenames (self,
source_filenames,
strip_dir=0,
output_dir=''):
# Copied from ccompiler.py, extended to return .res as 'object'-file
# for .rc input file
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
(base, ext) = os.path.splitext (src_name)
base = os.path.splitdrive(base)[1] # Chop off the drive
base = base[os.path.isabs(base):] # If abs, chop off leading /
if ext not in self.src_extensions:
# Better to raise an exception instead of silently continuing
# and later complain about sources and targets having
# different lengths
raise CompileError ("Don't know how to compile %s" % src_name)
if strip_dir:
base = os.path.basename (base)
if ext in self._rc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
elif ext in self._mc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
# object_filenames ()
def compile(self, sources,
output_dir=None, macros=None, include_dirs=None, debug=0,
extra_preargs=None, extra_postargs=None, depends=None):
if not self.initialized: self.initialize()
macros, objects, extra_postargs, pp_opts, build = \
self._setup_compile(output_dir, macros, include_dirs, sources,
depends, extra_postargs)
compile_opts = extra_preargs or []
compile_opts.append ('/c')
if debug:
compile_opts.extend(self.compile_options_debug)
else:
compile_opts.extend(self.compile_options)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
if debug:
# pass the full pathname to MSVC in debug mode,
# this allows the debugger to find the source file
# without asking the user to browse for it
src = os.path.abspath(src)
if ext in self._c_extensions:
input_opt = "/Tc" + src
elif ext in self._cpp_extensions:
input_opt = "/Tp" + src
elif ext in self._rc_extensions:
# compile .RC to .RES file
input_opt = src
output_opt = "/fo" + obj
try:
self.spawn ([self.rc] + pp_opts +
[output_opt] + [input_opt])
except DistutilsExecError, msg:
raise CompileError, msg
continue
elif ext in self._mc_extensions:
# Compile .MC to .RC file to .RES file.
# * '-h dir' specifies the directory for the
# generated include file
# * '-r dir' specifies the target directory of the
# generated RC file and the binary message resource
# it includes
#
# For now (since there are no options to change this),
# we use the source-directory for the include file and
# the build directory for the RC file and message
# resources. This works at least for win32all.
h_dir = os.path.dirname (src)
rc_dir = os.path.dirname (obj)
try:
# first compile .MC to .RC and .H file
self.spawn ([self.mc] +
['-h', h_dir, '-r', rc_dir] + [src])
base, _ = os.path.splitext (os.path.basename (src))
rc_file = os.path.join (rc_dir, base + '.rc')
# then compile .RC to .RES file
self.spawn ([self.rc] +
["/fo" + obj] + [rc_file])
except DistutilsExecError, msg:
raise CompileError, msg
continue
else:
# how to handle this file?
raise CompileError (
"Don't know how to compile %s to %s" % \
(src, obj))
output_opt = "/Fo" + obj
try:
self.spawn ([self.cc] + compile_opts + pp_opts +
[input_opt, output_opt] +
extra_postargs)
except DistutilsExecError, msg:
raise CompileError, msg
return objects
# compile ()
def create_static_lib (self,
objects,
output_libname,
output_dir=None,
debug=0,
target_lang=None):
if not self.initialized: self.initialize()
(objects, output_dir) = self._fix_object_args (objects, output_dir)
output_filename = \
self.library_filename (output_libname, output_dir=output_dir)
if self._need_link (objects, output_filename):
lib_args = objects + ['/OUT:' + output_filename]
if debug:
pass # XXX what goes here?
try:
self.spawn ([self.lib] + lib_args)
except DistutilsExecError, msg:
raise LibError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
# create_static_lib ()
def link (self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
if not self.initialized: self.initialize()
(objects, output_dir) = self._fix_object_args (objects, output_dir)
(libraries, library_dirs, runtime_library_dirs) = \
self._fix_lib_args (libraries, library_dirs, runtime_library_dirs)
if runtime_library_dirs:
self.warn ("I don't know what to do with 'runtime_library_dirs': "
+ str (runtime_library_dirs))
lib_opts = gen_lib_options (self,
library_dirs, runtime_library_dirs,
libraries)
if output_dir is not None:
output_filename = os.path.join (output_dir, output_filename)
if self._need_link (objects, output_filename):
if target_desc == CCompiler.EXECUTABLE:
if debug:
ldflags = self.ldflags_shared_debug[1:]
else:
ldflags = self.ldflags_shared[1:]
else:
if debug:
ldflags = self.ldflags_shared_debug
else:
ldflags = self.ldflags_shared
export_opts = []
for sym in (export_symbols or []):
export_opts.append("/EXPORT:" + sym)
ld_args = (ldflags + lib_opts + export_opts +
objects + ['/OUT:' + output_filename])
# The MSVC linker generates .lib and .exp files, which cannot be
# suppressed by any linker switches. The .lib files may even be
# needed! Make sure they are generated in the temporary build
# directory. Since they have different names for debug and release
# builds, they can go into the same directory.
if export_symbols is not None:
(dll_name, dll_ext) = os.path.splitext(
os.path.basename(output_filename))
implib_file = os.path.join(
os.path.dirname(objects[0]),
self.library_filename(dll_name))
ld_args.append ('/IMPLIB:' + implib_file)
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath (os.path.dirname (output_filename))
try:
self.spawn ([self.linker] + ld_args)
except DistutilsExecError, msg:
raise LinkError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
# link ()
# -- Miscellaneous methods -----------------------------------------
# These are all used by the 'gen_lib_options()' function in
# ccompiler.py.
def library_dir_option (self, dir):
return "/LIBPATH:" + dir
def runtime_library_dir_option (self, dir):
raise DistutilsPlatformError, \
"don't know how to set runtime library search path for MSVC++"
def library_option (self, lib):
return self.library_filename (lib)
def find_library_file (self, dirs, lib, debug=0):
# Prefer a debugging library if found (and requested), but deal
# with it if we don't have one.
if debug:
try_names = [lib + "_d", lib]
else:
try_names = [lib]
for dir in dirs:
for name in try_names:
libfile = os.path.join(dir, self.library_filename (name))
if os.path.exists(libfile):
return libfile
else:
# Oops, didn't find it in *any* of 'dirs'
return None
# find_library_file ()
# Helper methods for using the MSVC registry settings
def find_exe(self, exe):
"""Return path to an MSVC executable program.
Tries to find the program in several places: first, one of the
MSVC program search paths from the registry; next, the directories
in the PATH environment variable. If any of those work, return an
absolute path that is known to exist. If none of them work, just
return the original program name, 'exe'.
"""
for p in self.__paths:
fn = os.path.join(os.path.abspath(p), exe)
if os.path.isfile(fn):
return fn
# didn't find it; try existing path
for p in string.split(os.environ['Path'],';'):
fn = os.path.join(os.path.abspath(p),exe)
if os.path.isfile(fn):
return fn
return exe
def get_msvc_paths(self, path, platform='x86'):
"""Get a list of devstudio directories (include, lib or path).
Return a list of strings. The list will be empty if unable to
access the registry or appropriate registry keys not found.
"""
if not _can_read_reg:
return []
path = path + " dirs"
if self.__version >= 7:
key = (r"%s\%0.1f\VC\VC_OBJECTS_PLATFORM_INFO\Win32\Directories"
% (self.__root, self.__version))
else:
key = (r"%s\6.0\Build System\Components\Platforms"
r"\Win32 (%s)\Directories" % (self.__root, platform))
for base in HKEYS:
d = read_values(base, key)
if d:
if self.__version >= 7:
return string.split(self.__macros.sub(d[path]), ";")
else:
return string.split(d[path], ";")
# MSVC 6 seems to create the registry entries we need only when
# the GUI is run.
if self.__version == 6:
for base in HKEYS:
if read_values(base, r"%s\6.0" % self.__root) is not None:
self.warn("It seems you have Visual Studio 6 installed, "
"but the expected registry settings are not present.\n"
"You must at least run the Visual Studio GUI once "
"so that these entries are created.")
break
return []
def set_path_env_var(self, name):
"""Set environment variable 'name' to an MSVC path type value.
This is equivalent to a SET command prior to execution of spawned
commands.
"""
if name == "lib":
p = self.get_msvc_paths("library")
else:
p = self.get_msvc_paths(name)
if p:
os.environ[name] = string.join(p, ';')
if get_build_version() >= 8.0:
log.debug("Importing new compiler from distutils.msvc9compiler")
OldMSVCCompiler = MSVCCompiler
from distutils.msvc9compiler import MSVCCompiler
# get_build_architecture not really relevant now that we support cross-compile
from distutils.msvc9compiler import MacroExpander
"""distutils.spawn
Provides the 'spawn()' function, a front-end to various platform-
specific functions for launching another program in a sub-process.
Also provides the 'find_executable()' to search the path for a given
executable name.
"""
__revision__ = "$Id$"
import sys
import os
from distutils.errors import DistutilsPlatformError, DistutilsExecError
from distutils.debug import DEBUG
from distutils import log
def spawn(cmd, search_path=1, verbose=0, dry_run=0):
"""Run another program, specified as a command list 'cmd', in a new process.
'cmd' is just the argument list for the new process, ie.
cmd[0] is the program to run and cmd[1:] are the rest of its arguments.
There is no way to run a program with a name different from that of its
executable.
If 'search_path' is true (the default), the system's executable
search path will be used to find the program; otherwise, cmd[0]
must be the exact path to the executable. If 'dry_run' is true,
the command will not actually be run.
Raise DistutilsExecError if running the program fails in any way; just
return on success.
"""
# cmd is documented as a list, but just in case some code passes a tuple
# in, protect our %-formatting code against horrible death
cmd = list(cmd)
if os.name == 'posix':
_spawn_posix(cmd, search_path, dry_run=dry_run)
elif os.name == 'nt':
_spawn_nt(cmd, search_path, dry_run=dry_run)
elif os.name == 'os2':
_spawn_os2(cmd, search_path, dry_run=dry_run)
else:
raise DistutilsPlatformError, \
"don't know how to spawn programs on platform '%s'" % os.name
def _nt_quote_args(args):
"""Quote command-line arguments for DOS/Windows conventions.
Just wraps every argument which contains blanks in double quotes, and
returns a new argument list.
"""
# XXX this doesn't seem very robust to me -- but if the Windows guys
# say it'll work, I guess I'll have to accept it. (What if an arg
# contains quotes? What other magic characters, other than spaces,
# have to be escaped? Is there an escaping mechanism other than
# quoting?)
for i, arg in enumerate(args):
if ' ' in arg:
args[i] = '"%s"' % arg
return args
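# Illustrative sketch (not part of distutils): the quoting rule above in
# isolation -- only arguments containing a blank get wrapped in quotes.
def nt_quote_args_sketch(args):
    return ['"%s"' % arg if ' ' in arg else arg for arg in args]
print nt_quote_args_sketch(['link.exe', '/OUT:my prog.exe', '/nologo'])
# ['link.exe', '"/OUT:my prog.exe"', '/nologo']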
def _spawn_nt(cmd, search_path=1, verbose=0, dry_run=0):
executable = cmd[0]
cmd = _nt_quote_args(cmd)
if search_path:
# either we find one or it stays the same
executable = find_executable(executable) or executable
log.info(' '.join([executable] + cmd[1:]))
if not dry_run:
# spawn for NT requires a full path to the .exe
try:
rc = os.spawnv(os.P_WAIT, executable, cmd)
except OSError, exc:
# this seems to happen when the command isn't found
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r failed: %s" % (cmd, exc[-1])
if rc != 0:
# and this reflects the command running but failing
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r failed with exit status %d" % (cmd, rc)
def _spawn_os2(cmd, search_path=1, verbose=0, dry_run=0):
executable = cmd[0]
if search_path:
# either we find one or it stays the same
executable = find_executable(executable) or executable
log.info(' '.join([executable] + cmd[1:]))
if not dry_run:
# spawnv for OS/2 EMX requires a full path to the .exe
try:
rc = os.spawnv(os.P_WAIT, executable, cmd)
except OSError, exc:
# this seems to happen when the command isn't found
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r failed: %s" % (cmd, exc[-1])
if rc != 0:
# and this reflects the command running but failing
if not DEBUG:
cmd = executable
log.debug("command %r failed with exit status %d" % (cmd, rc))
raise DistutilsExecError, \
"command %r failed with exit status %d" % (cmd, rc)
if sys.platform == 'darwin':
from distutils import sysconfig
_cfg_target = None
_cfg_target_split = None
def _spawn_posix(cmd, search_path=1, verbose=0, dry_run=0):
log.info(' '.join(cmd))
if dry_run:
return
executable = cmd[0]
exec_fn = search_path and os.execvp or os.execv
env = None
if sys.platform == 'darwin':
global _cfg_target, _cfg_target_split
if _cfg_target is None:
_cfg_target = sysconfig.get_config_var(
'MACOSX_DEPLOYMENT_TARGET') or ''
if _cfg_target:
_cfg_target_split = [int(x) for x in _cfg_target.split('.')]
if _cfg_target:
# ensure that the deployment target of build process is not less
# than that used when the interpreter was built. This ensures
# extension modules are built with correct compatibility values
cur_target = os.environ.get('MACOSX_DEPLOYMENT_TARGET', _cfg_target)
if _cfg_target_split > [int(x) for x in cur_target.split('.')]:
my_msg = ('$MACOSX_DEPLOYMENT_TARGET mismatch: '
'now "%s" but "%s" during configure'
% (cur_target, _cfg_target))
raise DistutilsPlatformError(my_msg)
env = dict(os.environ,
MACOSX_DEPLOYMENT_TARGET=cur_target)
exec_fn = search_path and os.execvpe or os.execve
pid = os.fork()
if pid == 0: # in the child
try:
if env is None:
exec_fn(executable, cmd)
else:
exec_fn(executable, cmd, env)
except OSError, e:
if not DEBUG:
cmd = executable
sys.stderr.write("unable to execute %r: %s\n" %
(cmd, e.strerror))
os._exit(1)
if not DEBUG:
cmd = executable
sys.stderr.write("unable to execute %r for unknown reasons" % cmd)
os._exit(1)
else: # in the parent
# Loop until the child either exits or is terminated by a signal
# (ie. keep waiting if it's merely stopped)
while 1:
try:
pid, status = os.waitpid(pid, 0)
except OSError, exc:
import errno
if exc.errno == errno.EINTR:
continue
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r failed: %s" % (cmd, exc[-1])
if os.WIFSIGNALED(status):
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r terminated by signal %d" % \
(cmd, os.WTERMSIG(status))
elif os.WIFEXITED(status):
exit_status = os.WEXITSTATUS(status)
if exit_status == 0:
return # hey, it succeeded!
else:
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"command %r failed with exit status %d" % \
(cmd, exit_status)
elif os.WIFSTOPPED(status):
continue
else:
if not DEBUG:
cmd = executable
raise DistutilsExecError, \
"unknown error executing %r: termination status %d" % \
(cmd, status)
def find_executable(executable, path=None):
"""Tries to find 'executable' in the directories listed in 'path'.
'path' is a string listing directories separated by 'os.pathsep'; it defaults
to os.environ['PATH']. Returns the complete filename, or None if not found.
"""
if path is None:
path = os.environ.get('PATH', os.defpath)
paths = path.split(os.pathsep)
base, ext = os.path.splitext(executable)
if (sys.platform == 'win32' or os.name == 'os2') and (ext != '.exe'):
executable = executable + '.exe'
if not os.path.isfile(executable):
for p in paths:
f = os.path.join(p, executable)
if os.path.isfile(f):
# the file exists, we have a shot at spawn working
return f
return None
else:
return executable
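# Illustrative sketch (not part of distutils): find_executable() is useful
# on its own for probing the PATH before spawning anything.
from distutils.spawn import find_executable
print find_executable('python')             # e.g. /usr/bin/python
print find_executable('no-such-tool-xyz')   # None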
"""Provide access to Python's configuration information. The specific
configuration variables available depend heavily on the platform and
configuration. The values may be retrieved using
get_config_var(name), and the list of variables is available via
get_config_vars().keys(). Additional convenience functions are also
available.
Written by: Fred L. Drake, Jr.
"""
__revision__ = "$Id$"
import os
import re
import string
import sys
from distutils.errors import DistutilsPlatformError
# These are needed in a couple of spots, so just compute them once.
PREFIX = os.path.normpath(sys.prefix)
EXEC_PREFIX = os.path.normpath(sys.exec_prefix)
# Path to the base directory of the project. On Windows the binary may
# live in project/PCBuild9. If we're dealing with an x64 Windows build,
# it'll live in project/PCbuild/amd64.
if sys.executable:
project_base = os.path.dirname(os.path.abspath(sys.executable))
else:
# sys.executable can be empty if argv[0] has been changed and Python is
# unable to retrieve the real program name
project_base = os.getcwd()
if os.name == "nt" and "pcbuild" in project_base[-8:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir))
# PC/VS7.1
if os.name == "nt" and "\\pc\\v" in project_base[-10:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir,
os.path.pardir))
# PC/AMD64
if os.name == "nt" and "\\pcbuild\\amd64" in project_base[-14:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir,
os.path.pardir))
# set for cross builds
if "_PYTHON_PROJECT_BASE" in os.environ:
# this is the build directory, at least for posix
project_base = os.path.normpath(os.environ["_PYTHON_PROJECT_BASE"])
# python_build: (Boolean) if true, we're either building Python or
# building an extension with an un-installed Python, so we use
# different (hard-wired) directories.
# Setup.local is available for Makefile builds including VPATH builds,
# Setup.dist is available on Windows
def _python_build():
for fn in ("Setup.dist", "Setup.local"):
if os.path.isfile(os.path.join(project_base, "Modules", fn)):
return True
return False
python_build = _python_build()
def get_python_version():
"""Return a string containing the major and minor Python version,
leaving off the patchlevel. Sample return values could be '1.5'
or '2.2'.
"""
return sys.version[:3]
def get_python_inc(plat_specific=0, prefix=None):
"""Return the directory containing installed Python header files.
If 'plat_specific' is false (the default), this is the path to the
non-platform-specific header files, i.e. Python.h and so on;
otherwise, this is the path to platform-specific header files
(namely pyconfig.h).
If 'prefix' is supplied, use it instead of sys.prefix or
sys.exec_prefix -- i.e., ignore 'plat_specific'.
"""
if prefix is None:
prefix = plat_specific and EXEC_PREFIX or PREFIX
if os.name == "posix":
if python_build:
if sys.executable:
buildir = os.path.dirname(sys.executable)
else:
# sys.executable can be empty if argv[0] has been changed
# and Python is unable to retrieve the real program name
buildir = os.getcwd()
if plat_specific:
# python.h is located in the buildir
inc_dir = buildir
else:
# the source dir is relative to the buildir
srcdir = os.path.abspath(os.path.join(buildir,
get_config_var('srcdir')))
# Include is located in the srcdir
inc_dir = os.path.join(srcdir, "Include")
return inc_dir
return os.path.join(prefix, "include", "python" + get_python_version())
elif os.name == "nt":
return os.path.join(prefix, "include")
elif os.name == "os2":
return os.path.join(prefix, "Include")
else:
raise DistutilsPlatformError(
"I don't know where Python installs its C header files "
"on platform '%s'" % os.name)
def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
"""Return the directory containing the Python library (standard or
site additions).
If 'plat_specific' is true, return the directory containing
platform-specific modules, i.e. any module from a non-pure-Python
module distribution; otherwise, return the platform-shared library
directory. If 'standard_lib' is true, return the directory
containing standard Python library modules; otherwise, return the
directory for site-specific modules.
If 'prefix' is supplied, use it instead of sys.prefix or
sys.exec_prefix -- i.e., ignore 'plat_specific'.
"""
if prefix is None:
prefix = plat_specific and EXEC_PREFIX or PREFIX
if os.name == "posix":
libpython = os.path.join(prefix,
"lib", "python" + get_python_version())
if standard_lib:
return libpython
else:
return os.path.join(libpython, "site-packages")
elif os.name == "nt":
if standard_lib:
return os.path.join(prefix, "Lib")
else:
if get_python_version() < "2.2":
return prefix
else:
return os.path.join(prefix, "Lib", "site-packages")
elif os.name == "os2":
if standard_lib:
return os.path.join(prefix, "Lib")
else:
return os.path.join(prefix, "Lib", "site-packages")
else:
raise DistutilsPlatformError(
"I don't know where Python installs its library "
"on platform '%s'" % os.name)
def customize_compiler(compiler):
"""Do any platform-specific customization of a CCompiler instance.
Mainly needed on Unix, so we can plug in the information that
varies across Unices and is stored in Python's Makefile.
"""
if compiler.compiler_type == "unix":
if sys.platform == "darwin":
# Perform first-time customization of compiler-related
# config vars on OS X now that we know we need a compiler.
# This is primarily to support Pythons from binary
# installers. The kind and paths to build tools on
# the user system may vary significantly from the system
# that Python itself was built on. Also the user OS
# version and build tools may not support the same set
# of CPU architectures for universal builds.
global _config_vars
# Use get_config_var() to ensure _config_vars is initialized.
if not get_config_var('CUSTOMIZED_OSX_COMPILER'):
import _osx_support
_osx_support.customize_compiler(_config_vars)
_config_vars['CUSTOMIZED_OSX_COMPILER'] = 'True'
(cc, cxx, cflags, ccshared, ldshared, so_ext, ar, ar_flags) = \
get_config_vars('CC', 'CXX', 'CFLAGS',
'CCSHARED', 'LDSHARED', 'SO', 'AR',
'ARFLAGS')
if 'CC' in os.environ:
newcc = os.environ['CC']
if (sys.platform == 'darwin'
and 'LDSHARED' not in os.environ
and ldshared.startswith(cc)):
# On OS X, if CC is overridden, use that as the default
# command for LDSHARED as well
ldshared = newcc + ldshared[len(cc):]
cc = newcc
if 'CXX' in os.environ:
cxx = os.environ['CXX']
if 'LDSHARED' in os.environ:
ldshared = os.environ['LDSHARED']
if 'CPP' in os.environ:
cpp = os.environ['CPP']
else:
cpp = cc + " -E" # not always
if 'LDFLAGS' in os.environ:
ldshared = ldshared + ' ' + os.environ['LDFLAGS']
if 'CFLAGS' in os.environ:
cflags = cflags + ' ' + os.environ['CFLAGS']
ldshared = ldshared + ' ' + os.environ['CFLAGS']
if 'CPPFLAGS' in os.environ:
cpp = cpp + ' ' + os.environ['CPPFLAGS']
cflags = cflags + ' ' + os.environ['CPPFLAGS']
ldshared = ldshared + ' ' + os.environ['CPPFLAGS']
if 'AR' in os.environ:
ar = os.environ['AR']
if 'ARFLAGS' in os.environ:
archiver = ar + ' ' + os.environ['ARFLAGS']
else:
archiver = ar + ' ' + ar_flags
cc_cmd = cc + ' ' + cflags
compiler.set_executables(
preprocessor=cpp,
compiler=cc_cmd,
compiler_so=cc_cmd + ' ' + ccshared,
compiler_cxx=cxx,
linker_so=ldshared,
linker_exe=cc,
archiver=archiver)
compiler.shared_lib_extension = so_ext
def get_config_h_filename():
"""Return full pathname of installed pyconfig.h file."""
if python_build:
if os.name == "nt":
inc_dir = os.path.join(project_base, "PC")
else:
inc_dir = project_base
else:
inc_dir = get_python_inc(plat_specific=1)
if get_python_version() < '2.2':
config_h = 'config.h'
else:
# The name of the config.h file changed in 2.2
config_h = 'pyconfig.h'
return os.path.join(inc_dir, config_h)
def get_makefile_filename():
"""Return full pathname of installed Makefile from the Python build."""
if python_build:
return os.path.join(project_base, "Makefile")
lib_dir = get_python_lib(plat_specific=1, standard_lib=1)
return os.path.join(lib_dir, "config", "Makefile")
def parse_config_h(fp, g=None):
"""Parse a config.h-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
if g is None:
g = {}
define_rx = re.compile("#define ([A-Z][A-Za-z0-9_]+) (.*)\n")
undef_rx = re.compile("/[*] #undef ([A-Z][A-Za-z0-9_]+) [*]/\n")
#
while 1:
line = fp.readline()
if not line:
break
m = define_rx.match(line)
if m:
n, v = m.group(1, 2)
try: v = int(v)
except ValueError: pass
g[n] = v
else:
m = undef_rx.match(line)
if m:
g[m.group(1)] = 0
return g
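# Illustrative sketch (not part of distutils): parse_config_h() only needs
# an object with readline(), so an in-memory StringIO demonstrates it.
from StringIO import StringIO
from distutils.sysconfig import parse_config_h
sample = StringIO('#define HAVE_FOO 1\n/* #undef HAVE_BAR */\n')
print parse_config_h(sample)
# {'HAVE_FOO': 1, 'HAVE_BAR': 0}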
# Regexes needed for parsing Makefile (and similar syntaxes,
# like old-style Setup files).
_variable_rx = re.compile(r"([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)")
_findvar1_rx = re.compile(r"\$\(([A-Za-z][A-Za-z0-9_]*)\)")
_findvar2_rx = re.compile(r"\${([A-Za-z][A-Za-z0-9_]*)}")
def parse_makefile(fn, g=None):
"""Parse a Makefile-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
from distutils.text_file import TextFile
fp = TextFile(fn, strip_comments=1, skip_blanks=1, join_lines=1)
if g is None:
g = {}
done = {}
notdone = {}
while 1:
line = fp.readline()
if line is None: # eof
break
m = _variable_rx.match(line)
if m:
n, v = m.group(1, 2)
v = v.strip()
# `$$' is a literal `$' in make
tmpv = v.replace('$$', '')
if "$" in tmpv:
notdone[n] = v
else:
try:
v = int(v)
except ValueError:
# insert literal `$'
done[n] = v.replace('$$', '$')
else:
done[n] = v
# do variable interpolation here
while notdone:
for name in notdone.keys():
value = notdone[name]
m = _findvar1_rx.search(value) or _findvar2_rx.search(value)
if m:
n = m.group(1)
found = True
if n in done:
item = str(done[n])
elif n in notdone:
# get it on a subsequent round
found = False
elif n in os.environ:
# do it like make: fall back to environment
item = os.environ[n]
else:
done[n] = item = ""
if found:
after = value[m.end():]
value = value[:m.start()] + item + after
if "$" in after:
notdone[name] = value
else:
try: value = int(value)
except ValueError:
done[name] = value.strip()
else:
done[name] = value
del notdone[name]
else:
# bogus variable reference; just drop it since we can't deal
del notdone[name]
fp.close()
# strip spurious spaces
for k, v in done.items():
if isinstance(v, str):
done[k] = v.strip()
# save the results in the global dictionary
g.update(done)
return g
def expand_makefile_vars(s, vars):
"""Expand Makefile-style variables -- "${foo}" or "$(foo)" -- in
'string' according to 'vars' (a dictionary mapping variable names to
values). Variables not present in 'vars' are silently expanded to the
empty string. The variable values in 'vars' should not contain further
variable expansions; if 'vars' is the output of 'parse_makefile()',
you're fine. Returns a variable-expanded version of 's'.
"""
# This algorithm does multiple expansion, so if vars['foo'] contains
# "${bar}", it will expand ${foo} to ${bar}, and then expand
# ${bar}... and so forth. This is fine as long as 'vars' comes from
# 'parse_makefile()', which takes care of such expansions eagerly,
# according to make's variable expansion semantics.
while 1:
m = _findvar1_rx.search(s) or _findvar2_rx.search(s)
if m:
(beg, end) = m.span()
s = s[0:beg] + vars.get(m.group(1)) + s[end:]
else:
break
return s
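# Illustrative sketch (not part of distutils): both "$(name)" and "${name}"
# spellings expand against the supplied mapping.
from distutils.sysconfig import expand_makefile_vars
vars = {'CC': 'gcc', 'CFLAGS': '-O2 -Wall'}
print expand_makefile_vars('$(CC) ${CFLAGS} -c foo.c', vars)
# gcc -O2 -Wall -c foo.c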
_config_vars = None
def _init_posix():
"""Initialize the module as appropriate for POSIX systems."""
# _sysconfigdata is generated at build time, see the sysconfig module
from _sysconfigdata import build_time_vars
global _config_vars
_config_vars = {}
_config_vars.update(build_time_vars)
def _init_nt():
"""Initialize the module as appropriate for NT"""
g = {}
# set basic install directories
g['LIBDEST'] = get_python_lib(plat_specific=0, standard_lib=1)
g['BINLIBDEST'] = get_python_lib(plat_specific=1, standard_lib=1)
# XXX hmmm.. a normal install puts include files here
g['INCLUDEPY'] = get_python_inc(plat_specific=0)
g['SO'] = '.pyd'
g['EXE'] = ".exe"
g['VERSION'] = get_python_version().replace(".", "")
g['BINDIR'] = os.path.dirname(os.path.abspath(sys.executable))
global _config_vars
_config_vars = g
def _init_os2():
"""Initialize the module as appropriate for OS/2"""
g = {}
# set basic install directories
g['LIBDEST'] = get_python_lib(plat_specific=0, standard_lib=1)
g['BINLIBDEST'] = get_python_lib(plat_specific=1, standard_lib=1)
# XXX hmmm.. a normal install puts include files here
g['INCLUDEPY'] = get_python_inc(plat_specific=0)
g['SO'] = '.pyd'
g['EXE'] = ".exe"
global _config_vars
_config_vars = g
def get_config_vars(*args):
"""With no arguments, return a dictionary of all configuration
variables relevant for the current platform. Generally this includes
everything needed to build extensions and install both pure modules and
extensions. On Unix, this means every variable defined in Python's
installed Makefile; on Windows and Mac OS it's a much smaller set.
With arguments, return a list of values that result from looking up
each argument in the configuration variable dictionary.
"""
global _config_vars
if _config_vars is None:
func = globals().get("_init_" + os.name)
if func:
func()
else:
_config_vars = {}
# Normalized versions of prefix and exec_prefix are handy to have;
# in fact, these are the standard versions used most places in the
# Distutils.
_config_vars['prefix'] = PREFIX
_config_vars['exec_prefix'] = EXEC_PREFIX
# OS X platforms require special customization to handle
# multi-architecture, multi-os-version installers
if sys.platform == 'darwin':
import _osx_support
_osx_support.customize_config_vars(_config_vars)
if args:
vals = []
for name in args:
vals.append(_config_vars.get(name))
return vals
else:
return _config_vars
def get_config_var(name):
"""Return the value of a single variable using the dictionary
returned by 'get_config_vars()'. Equivalent to
get_config_vars().get(name)
"""
return get_config_vars().get(name)
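# Illustrative sketch (not part of distutils): the sysconfig accessors in
# typical use. Values shown in comments are representative POSIX results;
# Windows returns e.g. '.pyd' for 'SO'.
from distutils.sysconfig import (get_python_inc, get_python_lib,
                                 get_config_var, get_config_vars)
print get_python_inc()                  # e.g. /usr/include/python2.7
print get_python_lib()                  # e.g. .../python2.7/site-packages
print get_config_var('SO')              # e.g. '.so'
print get_config_vars('prefix', 'exec_prefix')  # e.g. ['/usr', '/usr']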
"""text_file
provides the TextFile class, which gives an interface to text files
that (optionally) takes care of stripping comments, ignoring blank
lines, and joining lines with backslashes."""
__revision__ = "$Id$"
import sys
class TextFile:
"""Provides a file-like object that takes care of all the things you
commonly want to do when processing a text file that has some
line-by-line syntax: strip comments (as long as "#" is your
comment character), skip blank lines, join adjacent lines by
escaping the newline (ie. backslash at end of line), strip
leading and/or trailing whitespace. All of these are optional
and independently controllable.
Provides a 'warn()' method so you can generate warning messages that
report physical line number, even if the logical line in question
spans multiple physical lines. Also provides 'unreadline()' for
implementing line-at-a-time lookahead.
Constructor is called as:
TextFile (filename=None, file=None, **options)
It bombs (RuntimeError) if both 'filename' and 'file' are None;
'filename' should be a string, and 'file' a file object (or
something that provides 'readline()' and 'close()' methods). It is
recommended that you supply at least 'filename', so that TextFile
can include it in warning messages. If 'file' is not supplied,
TextFile creates its own using the 'open()' builtin.
The options are all boolean, and affect the value returned by
'readline()':
strip_comments [default: true]
strip from "#" to end-of-line, as well as any whitespace
leading up to the "#" -- unless it is escaped by a backslash
lstrip_ws [default: false]
strip leading whitespace from each line before returning it
rstrip_ws [default: true]
strip trailing whitespace (including line terminator!) from
each line before returning it
skip_blanks [default: true]
skip lines that are empty *after* stripping comments and
whitespace. (If both lstrip_ws and rstrip_ws are false,
then some lines may consist of solely whitespace: these will
*not* be skipped, even if 'skip_blanks' is true.)
join_lines [default: false]
if a backslash is the last non-newline character on a line
after stripping comments and whitespace, join the following line
to it to form one "logical line"; if N consecutive lines end
with a backslash, then N+1 physical lines will be joined to
form one logical line.
collapse_join [default: false]
strip leading whitespace from lines that are joined to their
predecessor; only matters if (join_lines and not lstrip_ws)
Note that since 'rstrip_ws' can strip the trailing newline, the
semantics of 'readline()' must differ from those of the builtin file
object's 'readline()' method! In particular, 'readline()' returns
None for end-of-file: an empty string might just be a blank line (or
an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is
not."""
default_options = { 'strip_comments': 1,
'skip_blanks': 1,
'lstrip_ws': 0,
'rstrip_ws': 1,
'join_lines': 0,
'collapse_join': 0,
}
def __init__ (self, filename=None, file=None, **options):
"""Construct a new TextFile object. At least one of 'filename'
(a string) and 'file' (a file-like object) must be supplied.
The keyword argument options are described above and affect
the values returned by 'readline()'."""
if filename is None and file is None:
raise RuntimeError, \
"you must supply either or both of 'filename' and 'file'"
# set values for all options -- either from client option hash
# or fallback to default_options
for opt in self.default_options.keys():
if opt in options:
setattr (self, opt, options[opt])
else:
setattr (self, opt, self.default_options[opt])
# sanity check client option hash
for opt in options.keys():
if opt not in self.default_options:
raise KeyError, "invalid TextFile option '%s'" % opt
if file is None:
self.open (filename)
else:
self.filename = filename
self.file = file
self.current_line = 0 # assuming that file is at BOF!
# 'linebuf' is a stack of lines that will be emptied before we
# actually read from the file; it's only populated by an
# 'unreadline()' operation
self.linebuf = []
def open (self, filename):
"""Open a new file named 'filename'. This overrides both the
'filename' and 'file' arguments to the constructor."""
self.filename = filename
self.file = open (self.filename, 'r')
self.current_line = 0
def close (self):
"""Close the current file and forget everything we know about it
(filename, current line number)."""
file = self.file
self.file = None
self.filename = None
self.current_line = None
file.close()
def gen_error (self, msg, line=None):
outmsg = []
if line is None:
line = self.current_line
outmsg.append(self.filename + ", ")
if isinstance(line, (list, tuple)):
outmsg.append("lines %d-%d: " % tuple (line))
else:
outmsg.append("line %d: " % line)
outmsg.append(str(msg))
return ''.join(outmsg)
def error (self, msg, line=None):
raise ValueError, "error: " + self.gen_error(msg, line)
def warn (self, msg, line=None):
"""Print (to stderr) a warning message tied to the current logical
line in the current file. If the current logical line in the
file spans multiple physical lines, the warning refers to the
whole range, eg. "lines 3-5". If 'line' supplied, it overrides
the current line number; it may be a list or tuple to indicate a
range of physical lines, or an integer for a single physical
line."""
sys.stderr.write("warning: " + self.gen_error(msg, line) + "\n")
def readline (self):
"""Read and return a single logical line from the current file (or
from an internal buffer if lines have previously been "unread"
with 'unreadline()'). If the 'join_lines' option is true, this
may involve reading multiple physical lines concatenated into a
single string. Updates the current line number, so calling
'warn()' after 'readline()' emits a warning about the physical
line(s) just read. Returns None on end-of-file, since the empty
string can occur if 'rstrip_ws' is true but 'skip_blanks' is
not."""
# If any "unread" lines waiting in 'linebuf', return the top
# one. (We don't actually buffer read-ahead data -- lines only
# get put in 'linebuf' if the client explicitly does an
# 'unreadline()'.
if self.linebuf:
line = self.linebuf[-1]
del self.linebuf[-1]
return line
buildup_line = ''
while 1:
# read the line, make it None if EOF
line = self.file.readline()
if line == '': line = None
if self.strip_comments and line:
# Look for the first "#" in the line. If none, never
# mind. If we find one and it's the first character, or
# is not preceded by "\", then it starts a comment --
# strip the comment, strip whitespace before it, and
# carry on. Otherwise, it's just an escaped "#", so
# unescape it (and any other escaped "#"'s that might be
# lurking in there) and otherwise leave the line alone.
pos = line.find("#")
if pos == -1: # no "#" -- no comments
pass
# It's definitely a comment -- either "#" is the first
# character, or it's elsewhere and unescaped.
elif pos == 0 or line[pos-1] != "\\":
# Have to preserve the trailing newline, because it's
# the job of a later step (rstrip_ws) to remove it --
# and if rstrip_ws is false, we'd better preserve it!
# (NB. this means that if the final line is all comment
# and has no trailing newline, we will think that it's
# EOF; I think that's OK.)
eol = (line[-1] == '\n') and '\n' or ''
line = line[0:pos] + eol
# If all that's left is whitespace, then skip line
# *now*, before we try to join it to 'buildup_line' --
# that way constructs like
# hello \\
# # comment that should be ignored
# there
# result in "hello there".
if line.strip() == "":
continue
else: # it's an escaped "#"
line = line.replace("\\#", "#")
# did previous line end with a backslash? then accumulate
if self.join_lines and buildup_line:
# oops: end of file
if line is None:
self.warn ("continuation line immediately precedes "
"end-of-file")
return buildup_line
if self.collapse_join:
line = line.lstrip()
line = buildup_line + line
# careful: pay attention to line number when incrementing it
if isinstance(self.current_line, list):
self.current_line[1] = self.current_line[1] + 1
else:
self.current_line = [self.current_line,
self.current_line+1]
# just an ordinary line, read it as usual
else:
if line is None: # eof
return None
# still have to be careful about incrementing the line number!
if isinstance(self.current_line, list):
self.current_line = self.current_line[1] + 1
else:
self.current_line = self.current_line + 1
# strip whitespace however the client wants (leading and
# trailing, or one or the other, or neither)
if self.lstrip_ws and self.rstrip_ws:
line = line.strip()
elif self.lstrip_ws:
line = line.lstrip()
elif self.rstrip_ws:
line = line.rstrip()
# blank line (whether we rstrip'ed or not)? skip to next line
# if appropriate
if (line == '' or line == '\n') and self.skip_blanks:
continue
if self.join_lines:
if line[-1] == '\\':
buildup_line = line[:-1]
continue
if line[-2:] == '\\\n':
buildup_line = line[0:-2] + '\n'
continue
# well, I guess there's some actual content there: return it
return line
# readline ()
def readlines (self):
"""Read and return the list of all logical lines remaining in the
current file."""
lines = []
while 1:
line = self.readline()
if line is None:
return lines
lines.append (line)
def unreadline (self, line):
"""Push 'line' (a string) onto an internal buffer that will be
checked by future 'readline()' calls. Handy for implementing
a parser with line-at-a-time lookahead."""
self.linebuf.append (line)
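# Illustrative sketch (not part of distutils): TextFile over an in-memory
# file, showing comment stripping and backslash line joining.
from StringIO import StringIO
from distutils.text_file import TextFile
data = StringIO("alpha   # a comment\n"
                "beta \\\n"
                "gamma\n")
tf = TextFile(file=data, filename="<demo>", join_lines=1)
print tf.readlines()
# ['alpha', 'beta gamma']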
"""distutils.unixccompiler
Contains the UnixCCompiler class, a subclass of CCompiler that handles
the "typical" Unix-style command-line C compiler:
* macros defined with -Dname[=value]
* macros undefined with -Uname
* include search directories specified with -Idir
* libraries specified with -llib
* library search directories specified with -Ldir
* compile handled by 'cc' (or similar) executable with -c option:
compiles .c to .o
* link static library handled by 'ar' command (possibly with 'ranlib')
* link shared library handled by 'cc -shared'
"""
__revision__ = "$Id$"
import os, sys, re
from types import StringType, NoneType
from distutils import sysconfig
from distutils.dep_util import newer
from distutils.ccompiler import \
CCompiler, gen_preprocess_options, gen_lib_options
from distutils.errors import \
DistutilsExecError, CompileError, LibError, LinkError
from distutils import log
if sys.platform == 'darwin':
import _osx_support
# XXX Things not currently handled:
# * optimization/debug/warning flags; we just use whatever's in Python's
# Makefile and live with it. Is this adequate? If not, we might
# have to have a bunch of subclasses GNUCCompiler, SGICCompiler,
# SunCCompiler, and I suspect down that road lies madness.
# * even if we don't know a warning flag from an optimization flag,
# we need some way for outsiders to feed preprocessor/compiler/linker
# flags in to us -- eg. a sysadmin might want to mandate certain flags
# via a site config file, or a user might want to set something for
# compiling this module distribution only via the setup.py command
# line, whatever. As long as these options come from something on the
# current system, they can be as system-dependent as they like, and we
# should just happily stuff them into the preprocessor/compiler/linker
# options and carry on.
class UnixCCompiler(CCompiler):
compiler_type = 'unix'
# These are used by CCompiler in two places: the constructor sets
# instance attributes 'preprocessor', 'compiler', etc. from them, and
# 'set_executable()' allows any of these to be set. The defaults here
# are pretty generic; they will probably have to be set by an outsider
# (eg. using information discovered by the sysconfig about building
# Python extensions).
executables = {'preprocessor' : None,
'compiler' : ["cc"],
'compiler_so' : ["cc"],
'compiler_cxx' : ["cc"],
'linker_so' : ["cc", "-shared"],
'linker_exe' : ["cc"],
'archiver' : ["ar", "-cr"],
'ranlib' : None,
}
if sys.platform[:6] == "darwin":
executables['ranlib'] = ["ranlib"]
# Needed for the filename generation methods provided by the base
# class, CCompiler. NB. whoever instantiates/uses a particular
# UnixCCompiler instance should set 'shared_lib_ext' -- we set a
# reasonable common default here, but it's not necessarily used on all
# Unices!
src_extensions = [".c",".C",".cc",".cxx",".cpp",".m"]
obj_extension = ".o"
static_lib_extension = ".a"
shared_lib_extension = ".so"
dylib_lib_extension = ".dylib"
xcode_stub_lib_extension = ".tbd"
static_lib_format = shared_lib_format = dylib_lib_format = "lib%s%s"
xcode_stub_lib_format = dylib_lib_format
if sys.platform == "cygwin":
exe_extension = ".exe"
def preprocess(self, source,
output_file=None, macros=None, include_dirs=None,
extra_preargs=None, extra_postargs=None):
ignore, macros, include_dirs = \
self._fix_compile_args(None, macros, include_dirs)
pp_opts = gen_preprocess_options(macros, include_dirs)
pp_args = self.preprocessor + pp_opts
if output_file:
pp_args.extend(['-o', output_file])
if extra_preargs:
pp_args[:0] = extra_preargs
if extra_postargs:
pp_args.extend(extra_postargs)
pp_args.append(source)
# We need to preprocess: either we're being forced to, or we're
# generating output to stdout, or there's a target output file and
# the source file is newer than the target (or the target doesn't
# exist).
if self.force or output_file is None or newer(source, output_file):
if output_file:
self.mkpath(os.path.dirname(output_file))
try:
self.spawn(pp_args)
except DistutilsExecError, msg:
raise CompileError, msg
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
compiler_so = self.compiler_so
if sys.platform == 'darwin':
compiler_so = _osx_support.compiler_fixup(compiler_so,
cc_args + extra_postargs)
try:
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
extra_postargs)
except DistutilsExecError, msg:
raise CompileError, msg
def create_static_lib(self, objects, output_libname,
output_dir=None, debug=0, target_lang=None):
objects, output_dir = self._fix_object_args(objects, output_dir)
output_filename = \
self.library_filename(output_libname, output_dir=output_dir)
if self._need_link(objects, output_filename):
self.mkpath(os.path.dirname(output_filename))
self.spawn(self.archiver +
[output_filename] +
objects + self.objects)
# Not many Unices require ranlib anymore -- SunOS 4.x is, I
# think, the only major Unix that still does. Maybe we need some
# platform intelligence here to skip ranlib if it's not
# needed -- or maybe Python's configure script took care of
# it for us, hence the check for leading colon.
if self.ranlib:
try:
self.spawn(self.ranlib + [output_filename])
except DistutilsExecError, msg:
raise LibError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
def link(self, target_desc, objects,
output_filename, output_dir=None, libraries=None,
library_dirs=None, runtime_library_dirs=None,
export_symbols=None, debug=0, extra_preargs=None,
extra_postargs=None, build_temp=None, target_lang=None):
objects, output_dir = self._fix_object_args(objects, output_dir)
libraries, library_dirs, runtime_library_dirs = \
self._fix_lib_args(libraries, library_dirs, runtime_library_dirs)
lib_opts = gen_lib_options(self, library_dirs, runtime_library_dirs,
libraries)
if type(output_dir) not in (StringType, NoneType):
raise TypeError, "'output_dir' must be a string or None"
if output_dir is not None:
output_filename = os.path.join(output_dir, output_filename)
if self._need_link(objects, output_filename):
ld_args = (objects + self.objects +
lib_opts + ['-o', output_filename])
if debug:
ld_args[:0] = ['-g']
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath(os.path.dirname(output_filename))
try:
if target_desc == CCompiler.EXECUTABLE:
linker = self.linker_exe[:]
else:
linker = self.linker_so[:]
if target_lang == "c++" and self.compiler_cxx:
# skip over environment variable settings if /usr/bin/env
# is used to set up the linker's environment.
# This is needed on OSX. Note: this assumes that the
# normal and C++ compiler have the same environment
# settings.
i = 0
if os.path.basename(linker[0]) == "env":
i = 1
while '=' in linker[i]:
i = i + 1
linker[i] = self.compiler_cxx[i]
if sys.platform == 'darwin':
linker = _osx_support.compiler_fixup(linker, ld_args)
self.spawn(linker + ld_args)
except DistutilsExecError, msg:
raise LinkError, msg
else:
log.debug("skipping %s (up-to-date)", output_filename)
# -- Miscellaneous methods -----------------------------------------
# These are all used by the 'gen_lib_options()' function in
# ccompiler.py.
def library_dir_option(self, dir):
return "-L" + dir
def _is_gcc(self, compiler_name):
return "gcc" in compiler_name or "g++" in compiler_name
def runtime_library_dir_option(self, dir):
# XXX Hackish, at the very least. See Python bug #445902:
# http://sourceforge.net/tracker/index.php
# ?func=detail&aid=445902&group_id=5470&atid=105470
# Linkers on different platforms need different options to
# specify that directories need to be added to the list of
# directories searched for dependencies when a dynamic library
# is sought. GCC has to be told to pass the -R option through
# to the linker, whereas other compilers just know this.
# Other compilers may need something slightly different. At
# this time, there's no way to determine this information from
# the configuration data stored in the Python installation, so
# we use this hack.
compiler = os.path.basename(sysconfig.get_config_var("CC"))
if sys.platform[:6] == "darwin":
# MacOSX's linker doesn't understand the -R flag at all
return "-L" + dir
elif sys.platform[:7] == "freebsd":
return "-Wl,-rpath=" + dir
elif sys.platform[:5] == "hp-ux":
if self._is_gcc(compiler):
return ["-Wl,+s", "-L" + dir]
return ["+s", "-L" + dir]
elif sys.platform[:7] == "irix646" or sys.platform[:6] == "osf1V5":
return ["-rpath", dir]
elif self._is_gcc(compiler):
return "-Wl,-R" + dir
else:
return "-R" + dir
def library_option(self, lib):
return "-l" + lib
def find_library_file(self, dirs, lib, debug=0):
shared_f = self.library_filename(lib, lib_type='shared')
dylib_f = self.library_filename(lib, lib_type='dylib')
xcode_stub_f = self.library_filename(lib, lib_type='xcode_stub')
static_f = self.library_filename(lib, lib_type='static')
if sys.platform == 'darwin':
# On OSX users can specify an alternate SDK using
# '-isysroot', calculate the SDK root if it is specified
# (and use it further on)
#
# Note that, as of Xcode 7, Apple SDKs may contain textual stub
# libraries with .tbd extensions rather than the normal .dylib
# shared libraries installed in /. The Apple compiler tool
# chain handles this transparently but it can cause problems
# for programs that are being built with an SDK and searching
# for specific libraries. Callers of find_library_file need to
# keep in mind that the base filename of the returned SDK library
# file might have a different extension from that of the library
# file installed on the running system, for example:
# /Applications/Xcode.app/Contents/Developer/Platforms/
# MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/
# usr/lib/libedit.tbd
# vs
# /usr/lib/libedit.dylib
cflags = sysconfig.get_config_var('CFLAGS')
m = re.search(r'-isysroot\s+(\S+)', cflags)
if m is None:
sysroot = '/'
else:
sysroot = m.group(1)
for dir in dirs:
shared = os.path.join(dir, shared_f)
dylib = os.path.join(dir, dylib_f)
static = os.path.join(dir, static_f)
xcode_stub = os.path.join(dir, xcode_stub_f)
if sys.platform == 'darwin' and (
dir.startswith('/System/') or (
dir.startswith('/usr/') and not dir.startswith('/usr/local/'))):
shared = os.path.join(sysroot, dir[1:], shared_f)
dylib = os.path.join(sysroot, dir[1:], dylib_f)
static = os.path.join(sysroot, dir[1:], static_f)
xcode_stub = os.path.join(sysroot, dir[1:], xcode_stub_f)
# We're second-guessing the linker here, with not much hard
# data to go on: GCC seems to prefer the shared library, so I'm
# assuming that *all* Unix C compilers do. And of course I'm
# ignoring even GCC's "-static" option. So sue me.
if os.path.exists(dylib):
return dylib
elif os.path.exists(xcode_stub):
return xcode_stub
elif os.path.exists(shared):
return shared
elif os.path.exists(static):
return static
# Oops, didn't find it in *any* of 'dirs'
return None
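# A short interactive sketch of using UnixCCompiler directly (the
# executables and library name below are hypothetical; the result of
# find_library_file depends on what is installed):
#
# >>> from distutils.unixccompiler import UnixCCompiler
# >>> cc = UnixCCompiler()
# >>> cc.set_executables(compiler=['gcc', '-O2'], linker_so=['gcc', '-shared'])
# >>> cc.library_option('edit')
# '-ledit'
# >>> cc.find_library_file(['/usr/lib', '/usr/local/lib'], 'edit')  # doctest: +SKIP
# '/usr/lib/libedit.so'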
"""distutils.util
Miscellaneous utility functions -- anything that doesn't fit into
one of the other *util.py modules.
"""
__revision__ = "$Id$"
import sys, os, string, re
from distutils.errors import DistutilsPlatformError
from distutils.dep_util import newer
from distutils.spawn import spawn
from distutils import log
from distutils.errors import DistutilsByteCompileError
def get_platform ():
"""Return a string that identifies the current platform. This is used
mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name
and version and the architecture (as supplied by 'os.uname()'),
although the exact information included depends on the OS; eg. for IRIX
the architecture isn't particularly important (IRIX only runs on SGI
hardware), but for Linux the kernel version isn't particularly
important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
win-amd64 (64bit Windows on AMD64, aka x86_64, Intel64, EM64T, etc.)
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
For other non-POSIX platforms, currently just returns 'sys.platform'.
"""
if os.name == 'nt':
# sniff sys.version for architecture.
prefix = " bit ("
i = string.find(sys.version, prefix)
if i == -1:
return sys.platform
j = string.find(sys.version, ")", i)
look = sys.version[i+len(prefix):j].lower()
if look=='amd64':
return 'win-amd64'
if look=='itanium':
return 'win-ia64'
return sys.platform
# Set for cross builds explicitly
if "_PYTHON_HOST_PLATFORM" in os.environ:
return os.environ["_PYTHON_HOST_PLATFORM"]
if os.name != "posix" or not hasattr(os, 'uname'):
# XXX what about the architecture? NT is Intel or Alpha,
# Mac OS is M68k or PPC, etc.
return sys.platform
# Try to distinguish various flavours of Unix
(osname, host, release, version, machine) = os.uname()
# Convert the OS name to lowercase, remove '/' characters
# (to accommodate BSD/OS), and translate spaces (for "Power Macintosh")
osname = string.lower(osname)
osname = string.replace(osname, '/', '')
machine = string.replace(machine, ' ', '_')
machine = string.replace(machine, '/', '-')
if osname[:5] == "linux":
# At least on Linux/Intel, 'machine' is the processor --
# i386, etc.
# XXX what about Alpha, SPARC, etc?
return "%s-%s" % (osname, machine)
elif osname[:5] == "sunos":
if release[0] >= "5": # SunOS 5 == Solaris 2
osname = "solaris"
release = "%d.%s" % (int(release[0]) - 3, release[2:])
# We can't use "platform.architecture()[0]" because of a
# bootstrap problem. We use a dict to get an error
# if something suspicious happens.
bitness = {2147483647:"32bit", 9223372036854775807:"64bit"}
machine += ".%s" % bitness[sys.maxint]
# fall through to standard osname-release-machine representation
elif osname[:4] == "irix": # could be "irix64"!
return "%s-%s" % (osname, release)
elif osname[:3] == "aix":
return "%s-%s.%s" % (osname, version, release)
elif osname[:6] == "cygwin":
osname = "cygwin"
rel_re = re.compile (r'[\d.]+')
m = rel_re.match(release)
if m:
release = m.group()
elif osname[:6] == "darwin":
import _osx_support, distutils.sysconfig
osname, release, machine = _osx_support.get_platform_osx(
distutils.sysconfig.get_config_vars(),
osname, release, machine)
return "%s-%s-%s" % (osname, release, machine)
# get_platform ()
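# Example usage (the value shown is illustrative; the result is
# machine-dependent):
#
# >>> from distutils.util import get_platform
# >>> get_platform()  # doctest: +SKIP
# 'linux-x86_64'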
def convert_path (pathname):
"""Return 'pathname' as a name that will work on the native filesystem,
i.e. split it on '/' and put it back together again using the current
directory separator. Needed because filenames in the setup script are
always supplied in Unix style, and have to be converted to the local
convention before we can actually use them in the filesystem. Raises
ValueError on non-Unix-ish systems if 'pathname' either starts or
ends with a slash.
"""
if os.sep == '/':
return pathname
if not pathname:
return pathname
if pathname[0] == '/':
raise ValueError, "path '%s' cannot be absolute" % pathname
if pathname[-1] == '/':
raise ValueError, "path '%s' cannot end with '/'" % pathname
paths = string.split(pathname, '/')
while '.' in paths:
paths.remove('.')
if not paths:
return os.curdir
return os.path.join(*paths)
# convert_path ()
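# Example (on a Windows host, where os.sep is '\\'; on POSIX the
# pathname comes back unchanged):
#
# >>> from distutils.util import convert_path
# >>> convert_path('lib/compat/mod.py')  # doctest: +SKIP
# 'lib\\compat\\mod.py'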
def change_root (new_root, pathname):
"""Return 'pathname' with 'new_root' prepended. If 'pathname' is
relative, this is equivalent to "os.path.join(new_root,pathname)".
Otherwise, it requires making 'pathname' relative and then joining the
two, which is tricky on DOS/Windows and Mac OS.
"""
if os.name == 'posix':
if not os.path.isabs(pathname):
return os.path.join(new_root, pathname)
else:
return os.path.join(new_root, pathname[1:])
elif os.name == 'nt':
(drive, path) = os.path.splitdrive(pathname)
if path[0] == '\\':
path = path[1:]
return os.path.join(new_root, path)
elif os.name == 'os2':
(drive, path) = os.path.splitdrive(pathname)
if path[0] == os.sep:
path = path[1:]
return os.path.join(new_root, path)
else:
raise DistutilsPlatformError, \
"nothing known about platform '%s'" % os.name
_environ_checked = 0
def check_environ ():
"""Ensure that 'os.environ' has all the environment variables we
guarantee that users can use in config files, command-line options,
etc. Currently this includes:
HOME - user's home directory (Unix only)
PLAT - description of the current platform, including hardware
and OS (see 'get_platform()')
"""
global _environ_checked
if _environ_checked:
return
if os.name == 'posix' and 'HOME' not in os.environ:
try:
import pwd
os.environ['HOME'] = pwd.getpwuid(os.getuid())[5]
except (ImportError, KeyError):
# bpo-10496: if the current user identifier doesn't exist in the
# password database, do nothing
pass
if 'PLAT' not in os.environ:
os.environ['PLAT'] = get_platform()
_environ_checked = 1
def subst_vars (s, local_vars):
"""Perform shell/Perl-style variable substitution on 'string'. Every
occurrence of '$' followed by a name is considered a variable, and
variable is substituted by the value found in the 'local_vars'
dictionary, or in 'os.environ' if it's not in 'local_vars'.
'os.environ' is first checked/augmented to guarantee that it contains
certain values: see 'check_environ()'. Raise ValueError for any
variables not found in either 'local_vars' or 'os.environ'.
"""
check_environ()
def _subst (match, local_vars=local_vars):
var_name = match.group(1)
if var_name in local_vars:
return str(local_vars[var_name])
else:
return os.environ[var_name]
try:
return re.sub(r'\$([a-zA-Z_][a-zA-Z_0-9]*)', _subst, s)
except KeyError, var:
raise ValueError, "invalid variable '$%s'" % var
# subst_vars ()
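# Example ('name' is a hypothetical local variable; $PLAT is resolved
# via check_environ(), so the result is machine-dependent):
#
# >>> from distutils.util import subst_vars
# >>> subst_vars('build/$name-$PLAT', {'name': 'foo'})  # doctest: +SKIP
# 'build/foo-linux-x86_64'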
def grok_environment_error (exc, prefix="error: "):
# Function kept for backward compatibility.
# Used to try clever things with EnvironmentErrors,
# but nowadays str(exception) produces good messages.
return prefix + str(exc)
# Needed by 'split_quoted()'
_wordchars_re = _squote_re = _dquote_re = None
def _init_regex():
global _wordchars_re, _squote_re, _dquote_re
_wordchars_re = re.compile(r'[^\\\'\"%s ]*' % string.whitespace)
_squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
_dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')
def split_quoted (s):
"""Split a string up according to Unix shell-like rules for quotes and
backslashes. In short: words are delimited by spaces, as long as those
spaces are not escaped by a backslash, or inside a quoted string.
Single and double quotes are equivalent, and the quote characters can
be backslash-escaped. The backslash is stripped from any two-character
escape sequence, leaving only the escaped character. The quote
characters are stripped from any quoted string. Returns a list of
words.
"""
# This is a nice algorithm for splitting up a single string, since it
# doesn't require character-by-character examination. It was a little
# bit of a brain-bender to get it working right, though...
if _wordchars_re is None: _init_regex()
s = string.strip(s)
words = []
pos = 0
while s:
m = _wordchars_re.match(s, pos)
end = m.end()
if end == len(s):
words.append(s[:end])
break
if s[end] in string.whitespace: # unescaped, unquoted whitespace: now
words.append(s[:end]) # we definitely have a word delimiter
s = string.lstrip(s[end:])
pos = 0
elif s[end] == '\\': # preserve whatever is being escaped;
# will become part of the current word
s = s[:end] + s[end+1:]
pos = end+1
else:
if s[end] == "'": # slurp singly-quoted string
m = _squote_re.match(s, end)
elif s[end] == '"': # slurp doubly-quoted string
m = _dquote_re.match(s, end)
else:
raise RuntimeError, \
"this can't happen (bad char '%c')" % s[end]
if m is None:
raise ValueError, \
"bad string (mismatched %s quotes?)" % s[end]
(beg, end) = m.span()
s = s[:beg] + s[beg+1:end-1] + s[end:]
pos = m.end() - 2
if pos >= len(s):
words.append(s)
break
return words
# split_quoted ()
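# Example of the quoting rules (quotes are stripped, and a
# backslash-escaped space stays inside the word):
#
# >>> from distutils.util import split_quoted
# >>> split_quoted('cc -DFOO -I"/my dir/include" one\\ word')
# ['cc', '-DFOO', '-I/my dir/include', 'one word']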
def execute (func, args, msg=None, verbose=0, dry_run=0):
"""Perform some action that affects the outside world (eg. by
writing to the filesystem). Such actions are special because they
are disabled by the 'dry_run' flag. This method takes care of all
that bureaucracy for you; all you have to do is supply the
function to call and an argument tuple for it (to embody the
"external action" being performed), and an optional message to
print.
"""
if msg is None:
msg = "%s%r" % (func.__name__, args)
if msg[-2:] == ',)': # correct for singleton tuple
msg = msg[0:-2] + ')'
log.info(msg)
if not dry_run:
func(*args)
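# Example: with dry_run true, the action is only announced through
# distutils.log and never performed (the filename is hypothetical):
#
# >>> from distutils.util import execute
# >>> import os
# >>> execute(os.remove, ('stale.pyc',), msg='removing stale.pyc', dry_run=1)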
def strtobool (val):
"""Convert a string representation of truth to true (1) or false (0).
True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values
are 'n', 'no', 'f', 'false', 'off', and '0'. Raises ValueError if
'val' is anything else.
"""
val = string.lower(val)
if val in ('y', 'yes', 't', 'true', 'on', '1'):
return 1
elif val in ('n', 'no', 'f', 'false', 'off', '0'):
return 0
else:
raise ValueError, "invalid truth value %r" % (val,)
def byte_compile (py_files,
optimize=0, force=0,
prefix=None, base_dir=None,
verbose=1, dry_run=0,
direct=None):
"""Byte-compile a collection of Python source files to either .pyc
or .pyo files in the same directory. 'py_files' is a list of files
to compile; any files that don't end in ".py" are silently skipped.
'optimize' must be one of the following:
0 - don't optimize (generate .pyc)
1 - normal optimization (like "python -O")
2 - extra optimization (like "python -OO")
If 'force' is true, all files are recompiled regardless of
timestamps.
The source filename encoded in each bytecode file defaults to the
filenames listed in 'py_files'; you can modify these with 'prefix' and
'base_dir'. 'prefix' is a string that will be stripped off of each
source filename, and 'base_dir' is a directory name that will be
prepended (after 'prefix' is stripped). You can supply either or both
(or neither) of 'prefix' and 'base_dir', as you wish.
If 'dry_run' is true, doesn't actually do anything that would
affect the filesystem.
Byte-compilation is either done directly in this interpreter process
with the standard py_compile module, or indirectly by writing a
temporary script and executing it. Normally, you should let
'byte_compile()' figure out whether to use direct compilation or not (see
the source for details). The 'direct' flag is used by the script
generated in indirect mode; unless you know what you're doing, leave
it set to None.
"""
# nothing is done if sys.dont_write_bytecode is True
if sys.dont_write_bytecode:
raise DistutilsByteCompileError('byte-compiling is disabled.')
# First, if the caller didn't force us into direct or indirect mode,
# figure out which mode we should be in. We take a conservative
# approach: choose direct mode *only* if the current interpreter is
# in debug mode and optimize is 0. If we're not in debug mode (-O
# or -OO), we don't know which level of optimization this
# interpreter is running with, so we can't do direct
# byte-compilation and be certain that it's the right thing. Thus,
# always compile indirectly if the current interpreter is in either
# optimize mode, or if either optimization level was requested by
# the caller.
if direct is None:
direct = (__debug__ and optimize == 0)
# "Indirect" byte-compilation: write a temporary script and then
# run it with the appropriate flags.
if not direct:
try:
from tempfile import mkstemp
(script_fd, script_name) = mkstemp(".py")
except ImportError:
from tempfile import mktemp
(script_fd, script_name) = None, mktemp(".py")
log.info("writing byte-compilation script '%s'", script_name)
if not dry_run:
if script_fd is not None:
script = os.fdopen(script_fd, "w")
else:
script = open(script_name, "w")
script.write("""\
from distutils.util import byte_compile
files = [
""")
# XXX would be nice to write absolute filenames, just for
# safety's sake (script should be more robust in the face of
# chdir'ing before running it). But this requires abspath'ing
# 'prefix' as well, and that breaks the hack in build_lib's
# 'byte_compile()' method that carefully tacks on a trailing
# slash (os.sep really) to make sure the prefix here is "just
# right". This whole prefix business is rather delicate -- the
# problem is that it's really a directory, but I'm treating it
# as a dumb string, so trailing slashes and so forth matter.
#py_files = map(os.path.abspath, py_files)
#if prefix:
# prefix = os.path.abspath(prefix)
script.write(string.join(map(repr, py_files), ",\n") + "]\n")
script.write("""
byte_compile(files, optimize=%r, force=%r,
prefix=%r, base_dir=%r,
verbose=%r, dry_run=0,
direct=1)
""" % (optimize, force, prefix, base_dir, verbose))
script.close()
cmd = [sys.executable, script_name]
if optimize == 1:
cmd.insert(1, "-O")
elif optimize == 2:
cmd.insert(1, "-OO")
spawn(cmd, dry_run=dry_run)
execute(os.remove, (script_name,), "removing %s" % script_name,
dry_run=dry_run)
# "Direct" byte-compilation: use the py_compile module to compile
# right here, right now. Note that the script generated in indirect
# mode simply calls 'byte_compile()' in direct mode, a weird sort of
# cross-process recursion. Hey, it works!
else:
from py_compile import compile
for file in py_files:
if file[-3:] != ".py":
# This lets us be lazy and not filter filenames in
# the "install_lib" command.
continue
# Terminology from the py_compile module:
# cfile - byte-compiled file
# dfile - purported source filename (same as 'file' by default)
cfile = file + (__debug__ and "c" or "o")
dfile = file
if prefix:
if file[:len(prefix)] != prefix:
raise ValueError, \
("invalid prefix: filename %r doesn't start with %r"
% (file, prefix))
dfile = dfile[len(prefix):]
if base_dir:
dfile = os.path.join(base_dir, dfile)
cfile_base = os.path.basename(cfile)
if direct:
if force or newer(file, cfile):
log.info("byte-compiling %s to %s", file, cfile_base)
if not dry_run:
compile(file, cfile, dfile)
else:
log.debug("skipping byte-compilation of %s to %s",
file, cfile_base)
# byte_compile ()
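# Example (hypothetical paths; 'prefix' is stripped from, and 'base_dir'
# prepended to, the source filename recorded in each .pyc):
#
# >>> from distutils.util import byte_compile
# >>> byte_compile(['build/lib/pkg/mod.py'],
# ...              prefix='build/lib/',
# ...              base_dir='/usr/lib/python2.7/site-packages')  # doctest: +SKIP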
def rfc822_escape (header):
"""Return a version of the string escaped for inclusion in an
RFC-822 header, by ensuring there are eight spaces after each newline.
"""
lines = string.split(header, '\n')
header = string.join(lines, '\n' + 8*' ')
return header
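# Example (continuation lines gain eight spaces of indentation):
#
# >>> from distutils.util import rfc822_escape
# >>> rfc822_escape('Summary line\nsecond line')
# 'Summary line\n        second line'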
#
# distutils/version.py
#
# Implements multiple version numbering conventions for the
# Python Module Distribution Utilities.
#
# $Id$
#
"""Provides classes to represent module version numbers (one class for
each style of version numbering). There are currently two such classes
implemented: StrictVersion and LooseVersion.
Every version number class implements the following interface:
* the 'parse' method takes a string and parses it to some internal
representation; if the string is an invalid version number,
'parse' raises a ValueError exception
* the class constructor takes an optional string argument which,
if supplied, is passed to 'parse'
* __str__ reconstructs the string that was passed to 'parse' (or
an equivalent string -- ie. one that will generate an equivalent
version number instance)
* __repr__ generates Python code to recreate the version number instance
* __cmp__ compares the current instance with either another instance
of the same class or a string (which will be parsed to an instance
of the same class, thus must follow the same rules)
"""
import string, re
from types import StringType
class Version:
"""Abstract base class for version numbering classes. Just provides
constructor (__init__) and reproducer (__repr__), because those
seem to be the same for all version numbering classes.
"""
def __init__ (self, vstring=None):
if vstring:
self.parse(vstring)
def __repr__ (self):
return "%s ('%s')" % (self.__class__.__name__, str(self))
# Interface for version-number classes -- must be implemented
# by the following classes (the concrete ones -- Version should
# be treated as an abstract class).
# __init__ (string) - create and take same action as 'parse'
# (string parameter is optional)
# parse (string) - convert a string representation to whatever
# internal representation is appropriate for
# this style of version numbering
# __str__ (self) - convert back to a string; should be very similar
# (if not identical to) the string supplied to parse
# __repr__ (self) - generate Python code to recreate
# the instance
# __cmp__ (self, other) - compare two version numbers ('other' may
# be an unparsed version string, or another
# instance of your version class)
class StrictVersion (Version):
"""Version numbering for anal retentives and software idealists.
Implements the standard interface for version number classes as
described above. A version number consists of two or three
dot-separated numeric components, with an optional "pre-release" tag
on the end. The pre-release tag consists of the letter 'a' or 'b'
followed by a number. If the numeric components of two version
numbers are equal, then one with a pre-release tag will always
be deemed earlier (lesser) than one without.
The following are valid version numbers (shown in the order that
would be obtained by sorting according to the supplied cmp function):
0.4 0.4.0 (these two are equivalent)
0.4.1
0.5a1
0.5b3
0.5
0.9.6
1.0
1.0.4a3
1.0.4b1
1.0.4
The following are examples of invalid version numbers:
1
2.7.2.2
1.3.a4
1.3pl1
1.3c4
The rationale for this version numbering system will be explained
in the distutils documentation.
"""
version_re = re.compile(r'^(\d+) \. (\d+) (\. (\d+))? ([ab](\d+))?$',
re.VERBOSE)
def parse (self, vstring):
match = self.version_re.match(vstring)
if not match:
raise ValueError, "invalid version number '%s'" % vstring
(major, minor, patch, prerelease, prerelease_num) = \
match.group(1, 2, 4, 5, 6)
if patch:
self.version = tuple(map(string.atoi, [major, minor, patch]))
else:
self.version = tuple(map(string.atoi, [major, minor]) + [0])
if prerelease:
self.prerelease = (prerelease[0], string.atoi(prerelease_num))
else:
self.prerelease = None
def __str__ (self):
if self.version[2] == 0:
vstring = string.join(map(str, self.version[0:2]), '.')
else:
vstring = string.join(map(str, self.version), '.')
if self.prerelease:
vstring = vstring + self.prerelease[0] + str(self.prerelease[1])
return vstring
def __cmp__ (self, other):
if isinstance(other, StringType):
other = StrictVersion(other)
compare = cmp(self.version, other.version)
if (compare == 0): # have to compare prerelease
# case 1: neither has prerelease; they're equal
# case 2: self has prerelease, other doesn't; other is greater
# case 3: self doesn't have prerelease, other does: self is greater
# case 4: both have prerelease: must compare them!
if (not self.prerelease and not other.prerelease):
return 0
elif (self.prerelease and not other.prerelease):
return -1
elif (not self.prerelease and other.prerelease):
return 1
elif (self.prerelease and other.prerelease):
return cmp(self.prerelease, other.prerelease)
else: # numeric versions don't match --
return compare # prerelease stuff doesn't matter
# end class StrictVersion
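# Example (a pre-release tag sorts before the corresponding final
# release, and a missing patch level is treated as zero):
#
# >>> from distutils.version import StrictVersion
# >>> StrictVersion('1.0.4a3') < StrictVersion('1.0.4')
# True
# >>> StrictVersion('0.4') == StrictVersion('0.4.0')
# True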
# The rules according to Greg Stein:
# 1) a version number has 1 or more numbers separated by a period or by
# sequences of letters. If only periods, then these are compared
# left-to-right to determine an ordering.
# 2) sequences of letters are part of the tuple for comparison and are
# compared lexicographically
# 3) recognize that numeric components may have leading zeroes
#
# The LooseVersion class below implements these rules: a version number
# string is split up into a tuple of integer and string components, and
# comparison is a simple tuple comparison. This means that version
# numbers behave in a predictable and obvious way, but a way that might
# not necessarily be how people *want* version numbers to behave. There
# wouldn't be a problem if people could stick to purely numeric version
# numbers: just split on period and compare the numbers as tuples.
# However, people insist on putting letters into their version numbers;
# the most common purpose seems to be:
# - indicating a "pre-release" version
# ('alpha', 'beta', 'a', 'b', 'pre', 'p')
# - indicating a post-release patch ('p', 'pl', 'patch')
# but of course this can't cover all version number schemes, and there's
# no way to know what a programmer means without asking him.
#
# The problem is what to do with letters (and other non-numeric
# characters) in a version number. The current implementation does the
# obvious and predictable thing: keep them as strings and compare
# lexically within a tuple comparison. This has the desired effect if
# an appended letter sequence implies something "post-release":
# eg. "0.99" < "0.99pl14" < "1.0", and "5.001" < "5.001m" < "5.002".
#
# However, if letters in a version number imply a pre-release version,
# the "obvious" thing isn't correct. Eg. you would expect that
# "1.5.1" < "1.5.2a2" < "1.5.2", but under the tuple/lexical comparison
# implemented here, this just isn't so.
#
# Two possible solutions come to mind. The first is to tie the
# comparison algorithm to a particular set of semantic rules, as has
# been done in the StrictVersion class above. This works great as long
# as everyone can go along with bondage and discipline. Hopefully a
# (large) subset of Python module programmers will agree that the
# particular flavour of bondage and discipline provided by StrictVersion
# provides enough benefit to be worth using, and will submit their
# version numbering scheme to its domination. The free-thinking
# anarchists in the lot will never give in, though, and something needs
# to be done to accommodate them.
#
# Perhaps a "moderately strict" version class could be implemented that
# lets almost anything slide (syntactically), and makes some heuristic
# assumptions about non-digits in version number strings. This could
# sink into special-case-hell, though; if I were as talented and
# idiosyncratic as Larry Wall, I'd go ahead and implement a class that
# somehow knows that "1.2.1" < "1.2.2a2" < "1.2.2" < "1.2.2pl3", and is
# just as happy dealing with things like "2g6" and "1.13++". I don't
# think I'm smart enough to do it right though.
#
# In any case, I've coded the test suite for this module (see
# ../test/test_version.py) specifically to fail on things like comparing
# "1.2a2" and "1.2". That's not because the *code* is doing anything
# wrong, it's because the simple, obvious design doesn't match my
# complicated, hairy expectations for real-world version numbers. It
# would be a snap to fix the test suite to say, "Yep, LooseVersion does
# the Right Thing" (ie. the code matches the conception). But I'd rather
# have a conception that matches common notions about version numbers.
class LooseVersion (Version):
"""Version numbering for anarchists and software realists.
Implements the standard interface for version number classes as
described above. A version number consists of a series of numbers,
separated by either periods or strings of letters. When comparing
version numbers, the numeric components will be compared
numerically, and the alphabetic components lexically. The following
are all valid version numbers, in no particular order:
1.5.1
1.5.2b2
161
3.10a
8.02
3.4j
1996.07.12
3.2.pl0
3.1.1.6
2g6
11g
0.960923
2.2beta29
1.13++
5.5.kw
2.0b1pl0
In fact, there is no such thing as an invalid version number under
this scheme; the rules for comparison are simple and predictable,
but may not always give the results you want (for some definition
of "want").
"""
component_re = re.compile(r'(\d+ | [a-z]+ | \.)', re.VERBOSE)
def __init__ (self, vstring=None):
if vstring:
self.parse(vstring)
def parse (self, vstring):
# I've given up on thinking I can reconstruct the version string
# from the parsed tuple -- so I just store the string here for
# use by __str__
self.vstring = vstring
components = filter(lambda x: x and x != '.',
self.component_re.split(vstring))
for i in range(len(components)):
try:
components[i] = int(components[i])
except ValueError:
pass
self.version = components
def __str__ (self):
return self.vstring
def __repr__ (self):
return "LooseVersion ('%s')" % str(self)
def __cmp__ (self, other):
if isinstance(other, StringType):
other = LooseVersion(other)
return cmp(self.version, other.version)
# end class LooseVersion
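# Example, including the caveat discussed above: a trailing letter
# component makes the parsed tuple longer, hence *greater*, so
# pre-release tags sort after the final release under LooseVersion:
#
# >>> from distutils.version import LooseVersion
# >>> LooseVersion('1.5.1') < LooseVersion('1.5.2b2')
# True
# >>> LooseVersion('1.5.2a2') < LooseVersion('1.5.2')  # not what you'd want
# False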
"""Module for parsing and testing package version predicate strings.
"""
import re
import distutils.version
import operator
re_validPackage = re.compile(r"(?i)^\s*([a-z_]\w*(?:\.[a-z_]\w*)*)(.*)")
# (package) (rest)
re_paren = re.compile(r"^\s*\((.*)\)\s*$") # (list) inside of parentheses
re_splitComparison = re.compile(r"^\s*(<=|>=|<|>|!=|==)\s*([^\s,]+)\s*$")
# (comp) (version)
def splitUp(pred):
"""Parse a single version comparison.
Return (comparison string, StrictVersion)
"""
res = re_splitComparison.match(pred)
if not res:
raise ValueError("bad package restriction syntax: %r" % pred)
comp, verStr = res.groups()
return (comp, distutils.version.StrictVersion(verStr))
compmap = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
">": operator.gt, ">=": operator.ge, "!=": operator.ne}
class VersionPredicate:
"""Parse and test package version predicates.
>>> v = VersionPredicate('pyepat.abc (>1.0, <3333.3a1, !=1555.1b3)')
The `name` attribute provides the full dotted name that is given::
>>> v.name
'pyepat.abc'
The str() of a `VersionPredicate` provides a normalized
human-readable version of the expression::
>>> print v
pyepat.abc (> 1.0, < 3333.3a1, != 1555.1b3)
The `satisfied_by()` method can be used to determine whether a given
version number is included in the set described by the version
restrictions::
>>> v.satisfied_by('1.1')
True
>>> v.satisfied_by('1.4')
True
>>> v.satisfied_by('1.0')
False
>>> v.satisfied_by('4444.4')
False
>>> v.satisfied_by('1555.1b3')
False
`VersionPredicate` is flexible in accepting extra whitespace::
>>> v = VersionPredicate(' pat( == 0.1 ) ')
>>> v.name
'pat'
>>> v.satisfied_by('0.1')
True
>>> v.satisfied_by('0.2')
False
If any version numbers passed in do not conform to the
restrictions of `StrictVersion`, a `ValueError` is raised::
>>> v = VersionPredicate('p1.p2.p3.p4(>=1.0, <=1.3a1, !=1.2zb3)')
Traceback (most recent call last):
...
ValueError: invalid version number '1.2zb3'
If the module or package name given does not conform to what's
allowed as a legal module or package name, `ValueError` is
raised::
>>> v = VersionPredicate('foo-bar')
Traceback (most recent call last):
...
ValueError: expected parenthesized list: '-bar'
>>> v = VersionPredicate('foo bar (12.21)')
Traceback (most recent call last):
...
ValueError: expected parenthesized list: 'bar (12.21)'
"""
def __init__(self, versionPredicateStr):
"""Parse a version predicate string.
"""
# Fields:
# name: package name
# pred: list of (comparison string, StrictVersion)
versionPredicateStr = versionPredicateStr.strip()
if not versionPredicateStr:
raise ValueError("empty package restriction")
match = re_validPackage.match(versionPredicateStr)
if not match:
raise ValueError("bad package name in %r" % versionPredicateStr)
self.name, paren = match.groups()
paren = paren.strip()
if paren:
match = re_paren.match(paren)
if not match:
raise ValueError("expected parenthesized list: %r" % paren)
str = match.groups()[0]
self.pred = [splitUp(aPred) for aPred in str.split(",")]
if not self.pred:
raise ValueError("empty parenthesized list in %r"
% versionPredicateStr)
else:
self.pred = []
def __str__(self):
if self.pred:
seq = [cond + " " + str(ver) for cond, ver in self.pred]
return self.name + " (" + ", ".join(seq) + ")"
else:
return self.name
def satisfied_by(self, version):
"""True if version is compatible with all the predicates in self.
The parameter version must be acceptable to the StrictVersion
constructor. It may be either a string or StrictVersion.
"""
for cond, ver in self.pred:
if not compmap[cond](version, ver):
return False
return True
_provision_rx = None
def split_provision(value):
"""Return the name and optional version number of a provision.
The version number, if given, will be returned as a `StrictVersion`
instance, otherwise it will be `None`.
>>> split_provision('mypkg')
('mypkg', None)
>>> split_provision(' mypkg( 1.2 ) ')
('mypkg', StrictVersion ('1.2'))
"""
global _provision_rx
if _provision_rx is None:
_provision_rx = re.compile(
"([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)(?:\s*\(\s*([^)\s]+)\s*\))?$")
value = value.strip()
m = _provision_rx.match(value)
if not m:
raise ValueError("illegal provides specification: %r" % value)
ver = m.group(2) or None
if ver:
ver = distutils.version.StrictVersion(ver)
return m.group(1), ver
"""distutils
The main package for the Python Module Distribution Utilities. Normally
used from a setup script as
from distutils.core import setup
setup (...)
"""
import sys
__version__ = sys.version[:sys.version.index(' ')]
# Module doctest.
# Released to the public domain 16-Jan-2001, by Tim Peters ([email protected]).
# Major enhancements and refactoring by:
# Jim Fulton
# Edward Loper
# Provided as-is; use at your own risk; no warranty; no promises; enjoy!
r"""Module doctest -- a framework for running examples in docstrings.
In simplest use, end each module M to be tested with:
def _test():
import doctest
doctest.testmod()
if __name__ == "__main__":
_test()
Then running the module as a script will cause the examples in the
docstrings to get executed and verified:
python M.py
This won't display anything unless an example fails, in which case the
failing example(s) and the cause(s) of the failure(s) are printed to stdout
(why not stderr? because stderr is a lame hack <0.2 wink>), and the final
line of output is "Test failed.".
Run it with the -v switch instead:
python M.py -v
and a detailed report of all examples tried is printed to stdout, along
with assorted summaries at the end.
You can force verbose mode by passing "verbose=True" to testmod, or prohibit
it by passing "verbose=False". In either of those cases, sys.argv is not
examined by testmod.
There are a variety of other ways to run doctests, including integration
with the unittest framework, and support for running non-Python text
files containing doctests. There are also many ways to override parts
of doctest's default behaviors. See the Library Reference Manual for
details.
"""
__docformat__ = 'reStructuredText en'
__all__ = [
# 0. Option Flags
'register_optionflag',
'DONT_ACCEPT_TRUE_FOR_1',
'DONT_ACCEPT_BLANKLINE',
'NORMALIZE_WHITESPACE',
'ELLIPSIS',
'SKIP',
'IGNORE_EXCEPTION_DETAIL',
'COMPARISON_FLAGS',
'REPORT_UDIFF',
'REPORT_CDIFF',
'REPORT_NDIFF',
'REPORT_ONLY_FIRST_FAILURE',
'REPORTING_FLAGS',
# 1. Utility Functions
# 2. Example & DocTest
'Example',
'DocTest',
# 3. Doctest Parser
'DocTestParser',
# 4. Doctest Finder
'DocTestFinder',
# 5. Doctest Runner
'DocTestRunner',
'OutputChecker',
'DocTestFailure',
'UnexpectedException',
'DebugRunner',
# 6. Test Functions
'testmod',
'testfile',
'run_docstring_examples',
# 7. Tester
'Tester',
# 8. Unittest Support
'DocTestSuite',
'DocFileSuite',
'set_unittest_reportflags',
# 9. Debugging Support
'script_from_examples',
'testsource',
'debug_src',
'debug',
]
import __future__
import sys, traceback, inspect, linecache, os, re
import unittest, difflib, pdb, tempfile
import warnings
from StringIO import StringIO
from collections import namedtuple
TestResults = namedtuple('TestResults', 'failed attempted')
# There are 4 basic classes:
# - Example: a <source, want> pair, plus an intra-docstring line number.
# - DocTest: a collection of examples, parsed from a docstring, plus
# info about where the docstring came from (name, filename, lineno).
# - DocTestFinder: extracts DocTests from a given object's docstring and
# its contained objects' docstrings.
# - DocTestRunner: runs DocTest cases, and accumulates statistics.
#
# So the basic picture is:
#
# list of:
# +------+ +---------+ +-------+
# |object| --DocTestFinder-> | DocTest | --DocTestRunner-> |results|
# +------+ +---------+ +-------+
# | Example |
# | ... |
# | Example |
# +---------+
# Option constants.
OPTIONFLAGS_BY_NAME = {}
def register_optionflag(name):
# Create a new flag unless `name` is already known.
return OPTIONFLAGS_BY_NAME.setdefault(name, 1 << len(OPTIONFLAGS_BY_NAME))
DONT_ACCEPT_TRUE_FOR_1 = register_optionflag('DONT_ACCEPT_TRUE_FOR_1')
DONT_ACCEPT_BLANKLINE = register_optionflag('DONT_ACCEPT_BLANKLINE')
NORMALIZE_WHITESPACE = register_optionflag('NORMALIZE_WHITESPACE')
ELLIPSIS = register_optionflag('ELLIPSIS')
SKIP = register_optionflag('SKIP')
IGNORE_EXCEPTION_DETAIL = register_optionflag('IGNORE_EXCEPTION_DETAIL')
COMPARISON_FLAGS = (DONT_ACCEPT_TRUE_FOR_1 |
DONT_ACCEPT_BLANKLINE |
NORMALIZE_WHITESPACE |
ELLIPSIS |
SKIP |
IGNORE_EXCEPTION_DETAIL)
REPORT_UDIFF = register_optionflag('REPORT_UDIFF')
REPORT_CDIFF = register_optionflag('REPORT_CDIFF')
REPORT_NDIFF = register_optionflag('REPORT_NDIFF')
REPORT_ONLY_FIRST_FAILURE = register_optionflag('REPORT_ONLY_FIRST_FAILURE')
REPORTING_FLAGS = (REPORT_UDIFF |
REPORT_CDIFF |
REPORT_NDIFF |
REPORT_ONLY_FIRST_FAILURE)
# Special string markers for use in `want` strings:
BLANKLINE_MARKER = '<BLANKLINE>'
ELLIPSIS_MARKER = '...'
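# Option flags are independent bits, so they compose with "|"; they can
# be handed to a runner via 'optionflags', or toggled per-example with a
# "#doctest:" directive. A small sketch:
#
# >>> import doctest
# >>> flags = doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
# >>> doctest.testmod(optionflags=flags)  # doctest: +SKIP
#
# And inline, on an example's source line:
#
# >>> print range(20)  # doctest: +ELLIPSIS
# [0, 1, ..., 19]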
######################################################################
## Table of Contents
######################################################################
# 1. Utility Functions
# 2. Example & DocTest -- store test cases
# 3. DocTest Parser -- extracts examples from strings
# 4. DocTest Finder -- extracts test cases from objects
# 5. DocTest Runner -- runs test cases
# 6. Test Functions -- convenient wrappers for testing
# 7. Tester Class -- for backwards compatibility
# 8. Unittest Support
# 9. Debugging Support
# 10. Example Usage
######################################################################
## 1. Utility Functions
######################################################################
def _extract_future_flags(globs):
"""
Return the compiler-flags associated with the future features that
have been imported into the given namespace (globs).
"""
flags = 0
for fname in __future__.all_feature_names:
feature = globs.get(fname, None)
if feature is getattr(__future__, fname):
flags |= feature.compiler_flag
return flags
def _normalize_module(module, depth=2):
"""
Return the module specified by `module`. In particular:
- If `module` is a module, then return module.
- If `module` is a string, then import and return the
module with that name.
- If `module` is None, then return the calling module.
The calling module is assumed to be the module of
the stack frame at the given depth in the call stack.
"""
if inspect.ismodule(module):
return module
elif isinstance(module, (str, unicode)):
return __import__(module, globals(), locals(), ["*"])
elif module is None:
return sys.modules[sys._getframe(depth).f_globals['__name__']]
else:
raise TypeError("Expected a module, string, or None")
def _load_testfile(filename, package, module_relative):
if module_relative:
package = _normalize_module(package, 3)
filename = _module_relative_path(package, filename)
if hasattr(package, '__loader__'):
if hasattr(package.__loader__, 'get_data'):
file_contents = package.__loader__.get_data(filename)
# get_data() opens files as 'rb', so we must do the same newline
# conversion that universal-newline mode would do.
return file_contents.replace(os.linesep, '\n'), filename
with open(filename, 'U') as f:
return f.read(), filename
# Use sys.stdout encoding for output.
_encoding = getattr(sys.__stdout__, 'encoding', None) or 'utf-8'
def _indent(s, indent=4):
"""
Add the given number of space characters to the beginning of
every non-blank line in `s`, and return the result.
If the string `s` is Unicode, it is encoded using the stdout
encoding and the `backslashreplace` error handler.
"""
if isinstance(s, unicode):
s = s.encode(_encoding, 'backslashreplace')
# This regexp matches the start of non-blank lines:
return re.sub('(?m)^(?!$)', indent*' ', s)
def _exception_traceback(exc_info):
"""
Return a string containing a traceback message for the given
exc_info tuple (as returned by sys.exc_info()).
"""
# Get a traceback message.
excout = StringIO()
exc_type, exc_val, exc_tb = exc_info
traceback.print_exception(exc_type, exc_val, exc_tb, file=excout)
return excout.getvalue()
# Override some StringIO methods.
class _SpoofOut(StringIO):
def getvalue(self):
result = StringIO.getvalue(self)
# If anything at all was written, make sure there's a trailing
# newline. There's no way for the expected output to indicate
# that a trailing newline is missing.
if result and not result.endswith("\n"):
result += "\n"
# Prevent softspace from screwing up the next test case, in
# case they used print with a trailing comma in an example.
if hasattr(self, "softspace"):
del self.softspace
return result
def truncate(self, size=None):
StringIO.truncate(self, size)
if hasattr(self, "softspace"):
del self.softspace
if not self.buf:
# Reset it to an empty string, to make sure it's not unicode.
self.buf = ''
# Worst-case linear-time ellipsis matching.
def _ellipsis_match(want, got):
"""
Essentially the only subtle case:
>>> _ellipsis_match('aa...aa', 'aaa')
False
"""
if ELLIPSIS_MARKER not in want:
return want == got
# Find "the real" strings.
ws = want.split(ELLIPSIS_MARKER)
assert len(ws) >= 2
# Deal with exact matches possibly needed at one or both ends.
startpos, endpos = 0, len(got)
w = ws[0]
if w: # starts with exact match
if got.startswith(w):
startpos = len(w)
del ws[0]
else:
return False
w = ws[-1]
if w: # ends with exact match
if got.endswith(w):
endpos -= len(w)
del ws[-1]
else:
return False
if startpos > endpos:
# Exact end matches required more characters than we have, as in
# _ellipsis_match('aa...aa', 'aaa')
return False
# For the rest, we only need to find the leftmost non-overlapping
# match for each piece. If there's no overall match that way alone,
# there's no overall match period.
for w in ws:
# w may be '' at times, if there are consecutive ellipses, or
# due to an ellipsis at the start or end of `want`. That's OK.
# A search for an empty string succeeds, and doesn't change startpos.
startpos = got.find(w, startpos, endpos)
if startpos < 0:
return False
startpos += len(w)
return True
def _comment_line(line):
"Return a commented form of the given line"
line = line.rstrip()
if line:
return '# '+line
else:
return '#'
def _strip_exception_details(msg):
# Support for IGNORE_EXCEPTION_DETAIL.
# Get rid of everything except the exception name; in particular, drop
# the possibly dotted module path (if any) and the exception message (if
# any). We assume that a colon is never part of a dotted name, or of an
# exception name.
# E.g., given
# "foo.bar.MyError: la di da"
# return "MyError"
# Or for "abc.def" or "abc.def:\n" return "def".
start, end = 0, len(msg)
# The exception name must appear on the first line.
i = msg.find("\n")
if i >= 0:
end = i
# retain up to the first colon (if any)
i = msg.find(':', 0, end)
if i >= 0:
end = i
# retain just the exception name
i = msg.rfind('.', 0, end)
if i >= 0:
start = i+1
return msg[start: end]
class _OutputRedirectingPdb(pdb.Pdb):
"""
A specialized version of the python debugger that redirects stdout
to a given stream when interacting with the user. Stdout is *not*
redirected when traced code is executed.
"""
def __init__(self, out):
self.__out = out
self.__debugger_used = False
pdb.Pdb.__init__(self, stdout=out)
# still use input() to get user input
self.use_rawinput = 1
def set_trace(self, frame=None):
self.__debugger_used = True
if frame is None:
frame = sys._getframe().f_back
pdb.Pdb.set_trace(self, frame)
def set_continue(self):
# Calling set_continue unconditionally would break unit test
# coverage reporting, as Bdb.set_continue calls sys.settrace(None).
if self.__debugger_used:
pdb.Pdb.set_continue(self)
def trace_dispatch(self, *args):
# Redirect stdout to the given stream.
save_stdout = sys.stdout
sys.stdout = self.__out
# Call Pdb's trace dispatch method.
try:
return pdb.Pdb.trace_dispatch(self, *args)
finally:
sys.stdout = save_stdout
# [XX] Normalize with respect to os.path.pardir?
def _module_relative_path(module, path):
if not inspect.ismodule(module):
raise TypeError, 'Expected a module: %r' % module
if path.startswith('/'):
raise ValueError, 'Module-relative files may not have absolute paths'
# Find the base directory for the path.
if hasattr(module, '__file__'):
# A normal module/package
basedir = os.path.split(module.__file__)[0]
elif module.__name__ == '__main__':
# An interactive session.
if len(sys.argv)>0 and sys.argv[0] != '':
basedir = os.path.split(sys.argv[0])[0]
else:
basedir = os.curdir
else:
# A module w/o __file__ (this includes builtins)
raise ValueError("Can't resolve paths relative to the module " +
module + " (it has no __file__)")
# Combine the base directory and the path.
return os.path.join(basedir, *(path.split('/')))
######################################################################
## 2. Example & DocTest
######################################################################
## - An "example" is a <source, want> pair, where "source" is a
## fragment of source code, and "want" is the expected output for
## "source." The Example class also includes information about
## where the example was extracted from.
##
## - A "doctest" is a collection of examples, typically extracted from
## a string (such as an object's docstring). The DocTest class also
## includes information about where the string was extracted from.
class Example:
"""
A single doctest example, consisting of source code and expected
output. `Example` defines the following attributes:
- source: A single Python statement, always ending with a newline.
The constructor adds a newline if needed.
- want: The expected output from running the source code (either
from stdout, or a traceback in case of exception). `want` ends
with a newline unless it's empty, in which case it's an empty
string. The constructor adds a newline if needed.
- exc_msg: The exception message generated by the example, if
the example is expected to generate an exception; or `None` if
it is not expected to generate an exception. This exception
message is compared against the return value of
`traceback.format_exception_only()`. `exc_msg` ends with a
newline unless it's `None`. The constructor adds a newline
if needed.
- lineno: The line number within the DocTest string containing
this Example where the Example begins. This line number is
zero-based, with respect to the beginning of the DocTest.
- indent: The example's indentation in the DocTest string.
I.e., the number of space characters that precede the
example's first prompt.
- options: A dictionary mapping from option flags to True or
False, which is used to override default options for this
example. Any option flags not contained in this dictionary
are left at their default value (as specified by the
DocTestRunner's optionflags). By default, no options are set.
"""
def __init__(self, source, want, exc_msg=None, lineno=0, indent=0,
options=None):
# Normalize inputs.
if not source.endswith('\n'):
source += '\n'
if want and not want.endswith('\n'):
want += '\n'
if exc_msg is not None and not exc_msg.endswith('\n'):
exc_msg += '\n'
# Store properties.
self.source = source
self.want = want
self.lineno = lineno
self.indent = indent
if options is None: options = {}
self.options = options
self.exc_msg = exc_msg
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self.source == other.source and \
self.want == other.want and \
self.lineno == other.lineno and \
self.indent == other.indent and \
self.options == other.options and \
self.exc_msg == other.exc_msg
def __ne__(self, other):
return not self == other
def __hash__(self):
return hash((self.source, self.want, self.lineno, self.indent,
self.exc_msg))
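# Example of constructing an Example by hand (normally DocTestParser
# does this); note that the constructor normalizes trailing newlines:
#
# >>> from doctest import Example
# >>> ex = Example('1 + 1', '2')
# >>> ex.source, ex.want
# ('1 + 1\n', '2\n')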
class DocTest:
"""
A collection of doctest examples that should be run in a single
namespace. Each `DocTest` defines the following attributes:
- examples: the list of examples.
- globs: The namespace (aka globals) that the examples should
be run in.
- name: A name identifying the DocTest (typically, the name of
the object whose docstring this DocTest was extracted from).
- filename: The name of the file that this DocTest was extracted
from, or `None` if the filename is unknown.
- lineno: The line number within filename where this DocTest
begins, or `None` if the line number is unavailable. This
line number is zero-based, with respect to the beginning of
the file.
- docstring: The string that the examples were extracted from,
or `None` if the string is unavailable.
"""
def __init__(self, examples, globs, name, filename, lineno, docstring):
"""
Create a new DocTest containing the given examples. The
DocTest's globals are initialized with a copy of `globs`.
"""
assert not isinstance(examples, basestring), \
"DocTest no longer accepts str; use DocTestParser instead"
self.examples = examples
self.docstring = docstring
self.globs = globs.copy()
self.name = name
self.filename = filename
self.lineno = lineno
def __repr__(self):
if len(self.examples) == 0:
examples = 'no examples'
elif len(self.examples) == 1:
examples = '1 example'
else:
examples = '%d examples' % len(self.examples)
return ('<DocTest %s from %s:%s (%s)>' %
(self.name, self.filename, self.lineno, examples))
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self.examples == other.examples and \
self.docstring == other.docstring and \
self.globs == other.globs and \
self.name == other.name and \
self.filename == other.filename and \
self.lineno == other.lineno
def __ne__(self, other):
return not self == other
def __hash__(self):
return hash((self.docstring, self.name, self.filename, self.lineno))
# This lets us sort tests by name:
def __cmp__(self, other):
if not isinstance(other, DocTest):
return -1
return cmp((self.name, self.filename, self.lineno, id(self)),
(other.name, other.filename, other.lineno, id(other)))
######################################################################
## 3. DocTestParser
######################################################################
class DocTestParser:
"""
A class used to parse strings containing doctest examples.
"""
# This regular expression is used to find doctest examples in a
# string. It defines three groups: `source` is the source code
# (including leading indentation and prompts); `indent` is the
# indentation of the first (PS1) line of the source code; and
# `want` is the expected output (including leading indentation).
_EXAMPLE_RE = re.compile(r'''
# Source consists of a PS1 line followed by zero or more PS2 lines.
(?P<source>
(?:^(?P<indent> [ ]*) >>> .*) # PS1 line
(?:\n [ ]* \.\.\. .*)*) # PS2 lines
\n?
# Want consists of any non-blank lines that do not start with PS1.
(?P<want> (?:(?![ ]*$) # Not a blank line
(?![ ]*>>>) # Not a line starting with PS1
.+$\n? # But any other line
)*)
''', re.MULTILINE | re.VERBOSE)
# A regular expression for handling `want` strings that contain
# expected exceptions. It divides `want` into three pieces:
# - the traceback header line (`hdr`)
# - the traceback stack (`stack`)
# - the exception message (`msg`), as generated by
# traceback.format_exception_only()
# `msg` may have multiple lines. We assume/require that the
# exception message is the first non-indented line starting with a word
# character following the traceback header line.
_EXCEPTION_RE = re.compile(r"""
# Grab the traceback header. Different versions of Python have
# said different things on the first traceback line.
^(?P<hdr> Traceback\ \(
(?: most\ recent\ call\ last
| innermost\ last
) \) :
)
\s* $ # toss trailing whitespace on the header.
(?P<stack> .*?) # don't blink: absorb stuff until...
^ (?P<msg> \w+ .*) # a line *starts* with alphanum.
""", re.VERBOSE | re.MULTILINE | re.DOTALL)
# A callable returning a true value iff its argument is a blank line
# or contains a single comment.
_IS_BLANK_OR_COMMENT = re.compile(r'^[ ]*(#.*)?$').match
def parse(self, string, name='<string>'):
"""
Divide the given string into examples and intervening text,
and return them as a list of alternating Examples and strings.
Line numbers for the Examples are 0-based. The optional
argument `name` is a name identifying this string, and is only
used for error messages.
"""
string = string.expandtabs()
# If all lines begin with the same indentation, then strip it.
min_indent = self._min_indent(string)
if min_indent > 0:
string = '\n'.join([l[min_indent:] for l in string.split('\n')])
output = []
charno, lineno = 0, 0
# Find all doctest examples in the string:
for m in self._EXAMPLE_RE.finditer(string):
# Add the pre-example text to `output`.
output.append(string[charno:m.start()])
# Update lineno (lines before this example)
lineno += string.count('\n', charno, m.start())
# Extract info from the regexp match.
(source, options, want, exc_msg) = \
self._parse_example(m, name, lineno)
# Create an Example, and add it to the list.
if not self._IS_BLANK_OR_COMMENT(source):
output.append( Example(source, want, exc_msg,
lineno=lineno,
indent=min_indent+len(m.group('indent')),
options=options) )
# Update lineno (lines inside this example)
lineno += string.count('\n', m.start(), m.end())
# Update charno.
charno = m.end()
# Add any remaining post-example text to `output`.
output.append(string[charno:])
return output
def get_doctest(self, string, globs, name, filename, lineno):
"""
Extract all doctest examples from the given string, and
collect them into a `DocTest` object.
`globs`, `name`, `filename`, and `lineno` are attributes for
the new `DocTest` object. See the documentation for `DocTest`
for more information.
"""
return DocTest(self.get_examples(string, name), globs,
name, filename, lineno, string)
def get_examples(self, string, name='<string>'):
"""
Extract all doctest examples from the given string, and return
them as a list of `Example` objects. Line numbers are
0-based, because it's most common in doctests that nothing
interesting appears on the same line as opening triple-quote,
and so the first interesting line is called \"line 1\" then.
The optional argument `name` is a name identifying this
string, and is only used for error messages.
"""
return [x for x in self.parse(string, name)
if isinstance(x, Example)]
def _parse_example(self, m, name, lineno):
"""
Given a regular expression match from `_EXAMPLE_RE` (`m`),
return a tuple `(source, options, want, exc_msg)`, where
`source` is the matched example's source code (with prompts and
indentation stripped); `options` is a dictionary of option
overrides extracted from the source; `want` is the example's
expected output (with indentation stripped); and `exc_msg` is
the expected exception message, or None if no exception is
expected.
`name` is the string's name, and `lineno` is the line number
where the example starts; both are used for error messages.
"""
# Get the example's indentation level.
indent = len(m.group('indent'))
# Divide source into lines; check that they're properly
# indented; and then strip their indentation & prompts.
source_lines = m.group('source').split('\n')
self._check_prompt_blank(source_lines, indent, name, lineno)
self._check_prefix(source_lines[1:], ' '*indent + '.', name, lineno)
source = '\n'.join([sl[indent+4:] for sl in source_lines])
# Divide want into lines; check that it's properly indented; and
# then strip the indentation. Spaces before the last newline should
# be preserved, so plain rstrip() isn't good enough.
want = m.group('want')
want_lines = want.split('\n')
if len(want_lines) > 1 and re.match(r' *$', want_lines[-1]):
del want_lines[-1] # forget final newline & spaces after it
self._check_prefix(want_lines, ' '*indent, name,
lineno + len(source_lines))
want = '\n'.join([wl[indent:] for wl in want_lines])
# If `want` contains a traceback message, then extract it.
m = self._EXCEPTION_RE.match(want)
if m:
exc_msg = m.group('msg')
else:
exc_msg = None
# Extract options from the source.
options = self._find_options(source, name, lineno)
return source, options, want, exc_msg
# This regular expression looks for option directives in the
# source code of an example. Option directives are comments
# starting with "doctest:". Warning: this may give false
# positives for string-literals that contain the string
# "#doctest:". Eliminating these false positives would require
# actually parsing the string; but we limit them by ignoring any
# line containing "#doctest:" that is *followed* by a quote mark.
_OPTION_DIRECTIVE_RE = re.compile(r'#\s*doctest:\s*([^\n\'"]*)$',
re.MULTILINE)
def _find_options(self, source, name, lineno):
"""
Return a dictionary containing option overrides extracted from
option directives in the given source string.
`name` is the string's name, and `lineno` is the line number
where the example starts; both are used for error messages.
"""
options = {}
# (note: with the current regexp, this will match at most once:)
for m in self._OPTION_DIRECTIVE_RE.finditer(source):
option_strings = m.group(1).replace(',', ' ').split()
for option in option_strings:
if (option[0] not in '+-' or
option[1:] not in OPTIONFLAGS_BY_NAME):
raise ValueError('line %r of the doctest for %s '
'has an invalid option: %r' %
(lineno+1, name, option))
flag = OPTIONFLAGS_BY_NAME[option[1:]]
options[flag] = (option[0] == '+')
if options and self._IS_BLANK_OR_COMMENT(source):
raise ValueError('line %r of the doctest for %s has an option '
'directive on a line with no example: %r' %
(lineno, name, source))
return options
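# Illustrative sketch (editor's note, not part of the original module):
# for an example whose source reads
#     print range(20) # doctest: +ELLIPSIS, -NORMALIZE_WHITESPACE
# _find_options returns {ELLIPSIS: True, NORMALIZE_WHITESPACE: False},
# keyed by the corresponding option flag constants.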
# This regular expression finds the indentation of every non-blank
# line in a string.
_INDENT_RE = re.compile('^([ ]*)(?=\S)', re.MULTILINE)
def _min_indent(self, s):
"Return the minimum indentation of any non-blank line in `s`"
indents = [len(indent) for indent in self._INDENT_RE.findall(s)]
if len(indents) > 0:
return min(indents)
else:
return 0
def _check_prompt_blank(self, lines, indent, name, lineno):
"""
Given the lines of a source string (including prompts and
leading indentation), check that every prompt is followed by a
space character. If any prompt is not followed by a space,
raise ValueError.
"""
for i, line in enumerate(lines):
if len(line) >= indent+4 and line[indent+3] != ' ':
raise ValueError('line %r of the docstring for %s '
'lacks blank after %s: %r' %
(lineno+i+1, name,
line[indent:indent+3], line))
def _check_prefix(self, lines, prefix, name, lineno):
"""
Check that every line in the given list starts with the given
prefix; if any line does not, then raise a ValueError.
"""
for i, line in enumerate(lines):
if line and not line.startswith(prefix):
raise ValueError('line %r of the docstring for %s has '
'inconsistent leading whitespace: %r' %
(lineno+i+1, name, line))
######################################################################
## 4. DocTest Finder
######################################################################
class DocTestFinder:
"""
A class used to extract the DocTests that are relevant to a given
object, from its docstring and the docstrings of its contained
objects. Doctests can currently be extracted from the following
object types: modules, functions, classes, methods, staticmethods,
classmethods, and properties.
"""
def __init__(self, verbose=False, parser=DocTestParser(),
recurse=True, exclude_empty=True):
"""
Create a new doctest finder.
The optional argument `parser` specifies the `DocTestParser`
object (or a drop-in replacement) that is used to extract
doctests from docstrings.
If the optional argument `recurse` is false, then `find` will
only examine the given object, and not any contained objects.
If the optional argument `exclude_empty` is false, then `find`
will include tests for objects with empty docstrings.
"""
self._parser = parser
self._verbose = verbose
self._recurse = recurse
self._exclude_empty = exclude_empty
def find(self, obj, name=None, module=None, globs=None, extraglobs=None):
"""
Return a list of the DocTests that are defined by the given
object's docstring, or by any of its contained objects'
docstrings.
The optional parameter `module` is the module that contains
the given object. If the module is not specified or is None, then
the test finder will attempt to automatically determine the
correct module. The object's module is used:
- As a default namespace, if `globs` is not specified.
- To prevent the DocTestFinder from extracting DocTests
from objects that are imported from other modules.
- To find the name of the file containing the object.
- To help find the line number of the object within its
file.
Contained objects whose module does not match `module` are ignored.
If `module` is False, no attempt to find the module will be made.
This is obscure, of use mostly in tests: if `module` is False, or
is None but cannot be found automatically, then all objects are
considered to belong to the (non-existent) module, so all contained
objects will (recursively) be searched for doctests.
The globals for each DocTest are formed by combining `globs`
and `extraglobs` (bindings in `extraglobs` override bindings
in `globs`). A new copy of the globals dictionary is created
for each DocTest. If `globs` is not specified, then it
defaults to the module's `__dict__`, if specified, or {}
otherwise. If `extraglobs` is not specified, then it defaults
to {}.
"""
# If name was not specified, then extract it from the object.
if name is None:
name = getattr(obj, '__name__', None)
if name is None:
raise ValueError("DocTestFinder.find: name must be given "
"when obj.__name__ doesn't exist: %r" %
(type(obj),))
# Find the module that contains the given object (if obj is
# a module, then module=obj.). Note: this may fail, in which
# case module will be None.
if module is False:
module = None
elif module is None:
module = inspect.getmodule(obj)
# Read the module's source code. This is used by
# DocTestFinder._find_lineno to find the line number for a
# given object's docstring.
try:
file = inspect.getsourcefile(obj) or inspect.getfile(obj)
if module is not None:
# Supply the module globals in case the module was
# originally loaded via a PEP 302 loader and
# file is not a valid filesystem path
source_lines = linecache.getlines(file, module.__dict__)
else:
# No access to a loader, so assume it's a normal
# filesystem path
source_lines = linecache.getlines(file)
if not source_lines:
source_lines = None
except TypeError:
source_lines = None
# Initialize globals, and merge in extraglobs.
if globs is None:
if module is None:
globs = {}
else:
globs = module.__dict__.copy()
else:
globs = globs.copy()
if extraglobs is not None:
globs.update(extraglobs)
if '__name__' not in globs:
globs['__name__'] = '__main__' # provide a default module name
# Recursively explore `obj`, extracting DocTests.
tests = []
self._find(tests, obj, name, module, source_lines, globs, {})
# Sort the tests by alpha order of names, for consistency in
# verbose-mode output. This was a feature of doctest in Pythons
# <= 2.3 that got lost by accident in 2.4. It was repaired in
# 2.4.4 and 2.5.
tests.sort()
return tests
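# Illustrative sketch (editor's note, not part of the original module),
# assuming a hypothetical module `mymodule` with docstring examples:
#     import mymodule
#     for t in DocTestFinder().find(mymodule):
#         print t.name, len(t.examples)
# Each item is a DocTest whose globals are a fresh copy of
# mymodule.__dict__.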
def _from_module(self, module, object):
"""
Return true if the given object is defined in the given
module.
"""
if module is None:
return True
elif inspect.getmodule(object) is not None:
return module is inspect.getmodule(object)
elif inspect.isfunction(object):
return module.__dict__ is object.func_globals
elif inspect.isclass(object):
return module.__name__ == object.__module__
elif hasattr(object, '__module__'):
return module.__name__ == object.__module__
elif isinstance(object, property):
return True # [XX] no way to be sure.
else:
raise ValueError("object must be a class or function")
def _find(self, tests, obj, name, module, source_lines, globs, seen):
"""
Find tests for the given object and any contained objects, and
add them to `tests`.
"""
if self._verbose:
print 'Finding tests in %s' % name
# If we've already processed this object, then ignore it.
if id(obj) in seen:
return
seen[id(obj)] = 1
# Find a test for this object, and add it to the list of tests.
test = self._get_test(obj, name, module, globs, source_lines)
if test is not None:
tests.append(test)
# Look for tests in a module's contained objects.
if inspect.ismodule(obj) and self._recurse:
for valname, val in obj.__dict__.items():
valname = '%s.%s' % (name, valname)
# Recurse to functions & classes.
if ((inspect.isfunction(val) or inspect.isclass(val)) and
self._from_module(module, val)):
self._find(tests, val, valname, module, source_lines,
globs, seen)
# Look for tests in a module's __test__ dictionary.
if inspect.ismodule(obj) and self._recurse:
for valname, val in getattr(obj, '__test__', {}).items():
if not isinstance(valname, basestring):
raise ValueError("DocTestFinder.find: __test__ keys "
"must be strings: %r" %
(type(valname),))
if not (inspect.isfunction(val) or inspect.isclass(val) or
inspect.ismethod(val) or inspect.ismodule(val) or
isinstance(val, basestring)):
raise ValueError("DocTestFinder.find: __test__ values "
"must be strings, functions, methods, "
"classes, or modules: %r" %
(type(val),))
valname = '%s.__test__.%s' % (name, valname)
self._find(tests, val, valname, module, source_lines,
globs, seen)
# Look for tests in a class's contained objects.
if inspect.isclass(obj) and self._recurse:
for valname, val in obj.__dict__.items():
# Special handling for staticmethod/classmethod.
if isinstance(val, staticmethod):
val = getattr(obj, valname)
if isinstance(val, classmethod):
val = getattr(obj, valname).im_func
# Recurse to methods, properties, and nested classes.
if ((inspect.isfunction(val) or inspect.isclass(val) or
isinstance(val, property)) and
self._from_module(module, val)):
valname = '%s.%s' % (name, valname)
self._find(tests, val, valname, module, source_lines,
globs, seen)
def _get_test(self, obj, name, module, globs, source_lines):
"""
Return a DocTest for the given object, if it defines a docstring;
otherwise, return None.
"""
# Extract the object's docstring. If it doesn't have one,
# then return None (no test for this object).
if isinstance(obj, basestring):
docstring = obj
else:
try:
if obj.__doc__ is None:
docstring = ''
else:
docstring = obj.__doc__
if not isinstance(docstring, basestring):
docstring = str(docstring)
except (TypeError, AttributeError):
docstring = ''
# Find the docstring's location in the file.
lineno = self._find_lineno(obj, source_lines)
# Don't bother if the docstring is empty.
if self._exclude_empty and not docstring:
return None
# Return a DocTest for this object.
if module is None:
filename = None
else:
filename = getattr(module, '__file__', module.__name__)
if filename[-4:] in (".pyc", ".pyo"):
filename = filename[:-1]
return self._parser.get_doctest(docstring, globs, name,
filename, lineno)
def _find_lineno(self, obj, source_lines):
"""
Return a line number of the given object's docstring. Note:
this method assumes that the object has a docstring.
"""
lineno = None
# Find the line number for modules.
if inspect.ismodule(obj):
lineno = 0
# Find the line number for classes.
# Note: this could be fooled if a class is defined multiple
# times in a single file.
if inspect.isclass(obj):
if source_lines is None:
return None
pat = re.compile(r'^\s*class\s*%s\b' %
getattr(obj, '__name__', '-'))
for i, line in enumerate(source_lines):
if pat.match(line):
lineno = i
break
# Find the line number for functions & methods.
if inspect.ismethod(obj): obj = obj.im_func
if inspect.isfunction(obj): obj = obj.func_code
if inspect.istraceback(obj): obj = obj.tb_frame
if inspect.isframe(obj): obj = obj.f_code
if inspect.iscode(obj):
lineno = getattr(obj, 'co_firstlineno', None)-1
# Find the line number where the docstring starts. Assume
# that it's the first line that begins with a quote mark.
# Note: this could be fooled by a multiline function
# signature, where a continuation line begins with a quote
# mark.
if lineno is not None:
if source_lines is None:
return lineno+1
pat = re.compile('(^|.*:)\s*\w*("|\')')
for lineno in range(lineno, len(source_lines)):
if pat.match(source_lines[lineno]):
return lineno
# We couldn't find the line number.
return None
######################################################################
## 5. DocTest Runner
######################################################################
class DocTestRunner:
"""
A class used to run DocTest test cases, and accumulate statistics.
The `run` method is used to process a single DocTest case. It
returns a tuple `(f, t)`, where `t` is the number of test cases
tried, and `f` is the number of test cases that failed.
>>> tests = DocTestFinder().find(_TestClass)
>>> runner = DocTestRunner(verbose=False)
>>> tests.sort(key = lambda test: test.name)
>>> for test in tests:
...     print test.name, '->', runner.run(test)
_TestClass -> TestResults(failed=0, attempted=2)
_TestClass.__init__ -> TestResults(failed=0, attempted=2)
_TestClass.get -> TestResults(failed=0, attempted=2)
_TestClass.square -> TestResults(failed=0, attempted=1)
The `summarize` method prints a summary of all the test cases that
have been run by the runner, and returns an aggregated `(f, t)`
tuple:
>>> runner.summarize(verbose=1)
4 items passed all tests:
2 tests in _TestClass
2 tests in _TestClass.__init__
2 tests in _TestClass.get
1 tests in _TestClass.square
7 tests in 4 items.
7 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=7)
The aggregated number of tried examples and failed examples is
also available via the `tries` and `failures` attributes:
>>> runner.tries
7
>>> runner.failures
0
The comparison between expected outputs and actual outputs is done
by an `OutputChecker`. This comparison may be customized with a
number of option flags; see the documentation for `testmod` for
more information. If the option flags are insufficient, then the
comparison may also be customized by passing a subclass of
`OutputChecker` to the constructor.
The test runner's display output can be controlled in two ways.
First, an output function (`out`) can be passed to
`DocTestRunner.run`; this function will be called with strings
that should be displayed. It defaults to `sys.stdout.write`. If
capturing the output is not sufficient, then the display output
can be also customized by subclassing DocTestRunner, and
overriding the methods `report_start`, `report_success`,
`report_unexpected_exception`, and `report_failure`.
"""
# This divider string is used to separate failure messages, and to
# separate sections of the summary.
DIVIDER = "*" * 70
def __init__(self, checker=None, verbose=None, optionflags=0):
"""
Create a new test runner.
Optional keyword arg `checker` is the `OutputChecker` that
should be used to compare the expected outputs and actual
outputs of doctest examples.
Optional keyword arg 'verbose' prints lots of stuff if true,
only failures if false; by default, it's true iff '-v' is in
sys.argv.
Optional argument `optionflags` can be used to control how the
test runner compares expected output to actual output, and how
it displays failures. See the documentation for `testmod` for
more information.
"""
self._checker = checker or OutputChecker()
if verbose is None:
verbose = '-v' in sys.argv
self._verbose = verbose
self.optionflags = optionflags
self.original_optionflags = optionflags
# Keep track of the examples we've run.
self.tries = 0
self.failures = 0
self._name2ft = {}
# Create a fake output target for capturing doctest output.
self._fakeout = _SpoofOut()
#/////////////////////////////////////////////////////////////////
# Reporting methods
#/////////////////////////////////////////////////////////////////
def report_start(self, out, test, example):
"""
Report that the test runner is about to process the given
example. (Only displays a message if verbose=True)
"""
if self._verbose:
if example.want:
out('Trying:\n' + _indent(example.source) +
'Expecting:\n' + _indent(example.want))
else:
out('Trying:\n' + _indent(example.source) +
'Expecting nothing\n')
def report_success(self, out, test, example, got):
"""
Report that the given example ran successfully. (Only
displays a message if verbose=True)
"""
if self._verbose:
out("ok\n")
def report_failure(self, out, test, example, got):
"""
Report that the given example failed.
"""
out(self._failure_header(test, example) +
self._checker.output_difference(example, got, self.optionflags))
def report_unexpected_exception(self, out, test, example, exc_info):
"""
Report that the given example raised an unexpected exception.
"""
out(self._failure_header(test, example) +
'Exception raised:\n' + _indent(_exception_traceback(exc_info)))
def _failure_header(self, test, example):
out = [self.DIVIDER]
if test.filename:
if test.lineno is not None and example.lineno is not None:
lineno = test.lineno + example.lineno + 1
else:
lineno = '?'
out.append('File "%s", line %s, in %s' %
(test.filename, lineno, test.name))
else:
out.append('Line %s, in %s' % (example.lineno+1, test.name))
out.append('Failed example:')
source = example.source
out.append(_indent(source))
return '\n'.join(out)
#/////////////////////////////////////////////////////////////////
# DocTest Running
#/////////////////////////////////////////////////////////////////
def __run(self, test, compileflags, out):
"""
Run the examples in `test`. Write the outcome of each example
with one of the `DocTestRunner.report_*` methods, using the
writer function `out`. `compileflags` is the set of compiler
flags that should be used to execute examples. Return a tuple
`(f, t)`, where `t` is the number of examples tried, and `f`
is the number of examples that failed. The examples are run
in the namespace `test.globs`.
"""
# Keep track of the number of failures and tries.
failures = tries = 0
# Save the option flags (since option directives can be used
# to modify them).
original_optionflags = self.optionflags
SUCCESS, FAILURE, BOOM = range(3) # `outcome` state
check = self._checker.check_output
# Process each example.
for examplenum, example in enumerate(test.examples):
# If REPORT_ONLY_FIRST_FAILURE is set, then suppress
# reporting after the first failure.
quiet = (self.optionflags & REPORT_ONLY_FIRST_FAILURE and
failures > 0)
# Merge in the example's options.
self.optionflags = original_optionflags
if example.options:
for (optionflag, val) in example.options.items():
if val:
self.optionflags |= optionflag
else:
self.optionflags &= ~optionflag
# If 'SKIP' is set, then skip this example.
if self.optionflags & SKIP:
continue
# Record that we started this example.
tries += 1
if not quiet:
self.report_start(out, test, example)
# Use a special filename for compile(), so we can retrieve
# the source code during interactive debugging (see
# __patched_linecache_getlines).
filename = '<doctest %s[%d]>' % (test.name, examplenum)
# Run the example in the given context (globs), and record
# any exception that gets raised. (But don't intercept
# keyboard interrupts.)
try:
# Don't blink! This is where the user's code gets run.
exec compile(example.source, filename, "single",
compileflags, 1) in test.globs
self.debugger.set_continue() # ==== Example Finished ====
exception = None
except KeyboardInterrupt:
raise
except:
exception = sys.exc_info()
self.debugger.set_continue() # ==== Example Finished ====
got = self._fakeout.getvalue() # the actual output
self._fakeout.truncate(0)
outcome = FAILURE # guilty until proved innocent or insane
# If the example executed without raising any exceptions,
# verify its output.
if exception is None:
if check(example.want, got, self.optionflags):
outcome = SUCCESS
# The example raised an exception: check if it was expected.
else:
exc_info = sys.exc_info()
exc_msg = traceback.format_exception_only(*exc_info[:2])[-1]
if not quiet:
got += _exception_traceback(exc_info)
# If `example.exc_msg` is None, then we weren't expecting
# an exception.
if example.exc_msg is None:
outcome = BOOM
# We expected an exception: see whether it matches.
elif check(example.exc_msg, exc_msg, self.optionflags):
outcome = SUCCESS
# Another chance if they didn't care about the detail.
elif self.optionflags & IGNORE_EXCEPTION_DETAIL:
if check(_strip_exception_details(example.exc_msg),
_strip_exception_details(exc_msg),
self.optionflags):
outcome = SUCCESS
# Report the outcome.
if outcome is SUCCESS:
if not quiet:
self.report_success(out, test, example, got)
elif outcome is FAILURE:
if not quiet:
self.report_failure(out, test, example, got)
failures += 1
elif outcome is BOOM:
if not quiet:
self.report_unexpected_exception(out, test, example,
exc_info)
failures += 1
else:
assert False, ("unknown outcome", outcome)
# Restore the option flags (in case they were modified)
self.optionflags = original_optionflags
# Record and return the number of failures and tries.
self.__record_outcome(test, failures, tries)
return TestResults(failures, tries)
def __record_outcome(self, test, f, t):
"""
Record the fact that the given DocTest (`test`) generated `f`
failures out of `t` tried examples.
"""
f2, t2 = self._name2ft.get(test.name, (0,0))
self._name2ft[test.name] = (f+f2, t+t2)
self.failures += f
self.tries += t
__LINECACHE_FILENAME_RE = re.compile(r'<doctest '
r'(?P<name>.+)'
r'\[(?P<examplenum>\d+)\]>$')
def __patched_linecache_getlines(self, filename, module_globals=None):
m = self.__LINECACHE_FILENAME_RE.match(filename)
if m and m.group('name') == self.test.name:
example = self.test.examples[int(m.group('examplenum'))]
source = example.source
if isinstance(source, unicode):
source = source.encode('ascii', 'backslashreplace')
return source.splitlines(True)
else:
return self.save_linecache_getlines(filename, module_globals)
def run(self, test, compileflags=None, out=None, clear_globs=True):
"""
Run the examples in `test`, and display the results using the
writer function `out`.
The examples are run in the namespace `test.globs`. If
`clear_globs` is true (the default), then this namespace will
be cleared after the test runs, to help with garbage
collection. If you would like to examine the namespace after
the test completes, then use `clear_globs=False`.
`compileflags` gives the set of flags that should be used by
the Python compiler when running the examples. If not
specified, then it will default to the set of future-import
flags that apply to `globs`.
The output of each example is checked by the output checker's
`check_output` method, and the results are formatted by the
`DocTestRunner.report_*` methods.
"""
self.test = test
if compileflags is None:
compileflags = _extract_future_flags(test.globs)
save_stdout = sys.stdout
if out is None:
out = save_stdout.write
sys.stdout = self._fakeout
# Patch pdb.set_trace to restore sys.stdout during interactive
# debugging (so it's not still redirected to self._fakeout).
# Note that the interactive output will go to *our*
# save_stdout, even if that's not the real sys.stdout; this
# allows us to write test cases for the set_trace behavior.
save_set_trace = pdb.set_trace
self.debugger = _OutputRedirectingPdb(save_stdout)
self.debugger.reset()
pdb.set_trace = self.debugger.set_trace
# Patch linecache.getlines, so we can see the example's source
# when we're inside the debugger.
self.save_linecache_getlines = linecache.getlines
linecache.getlines = self.__patched_linecache_getlines
# Make sure sys.displayhook just prints the value to stdout
save_displayhook = sys.displayhook
sys.displayhook = sys.__displayhook__
try:
return self.__run(test, compileflags, out)
finally:
sys.stdout = save_stdout
pdb.set_trace = save_set_trace
linecache.getlines = self.save_linecache_getlines
sys.displayhook = save_displayhook
if clear_globs:
test.globs.clear()
#/////////////////////////////////////////////////////////////////
# Summarization
#/////////////////////////////////////////////////////////////////
def summarize(self, verbose=None):
"""
Print a summary of all the test cases that have been run by
this DocTestRunner, and return a tuple `(f, t)`, where `f` is
the total number of failed examples, and `t` is the total
number of tried examples.
The optional `verbose` argument controls how detailed the
summary is. If the verbosity is not specified, then the
DocTestRunner's verbosity is used.
"""
if verbose is None:
verbose = self._verbose
notests = []
passed = []
failed = []
totalt = totalf = 0
for x in self._name2ft.items():
name, (f, t) = x
assert f <= t
totalt += t
totalf += f
if t == 0:
notests.append(name)
elif f == 0:
passed.append( (name, t) )
else:
failed.append(x)
if verbose:
if notests:
print len(notests), "items had no tests:"
notests.sort()
for thing in notests:
print " ", thing
if passed:
print len(passed), "items passed all tests:"
passed.sort()
for thing, count in passed:
print " %3d tests in %s" % (count, thing)
if failed:
print self.DIVIDER
print len(failed), "items had failures:"
failed.sort()
for thing, (f, t) in failed:
print " %3d of %3d in %s" % (f, t, thing)
if verbose:
print totalt, "tests in", len(self._name2ft), "items."
print totalt - totalf, "passed and", totalf, "failed."
if totalf:
print "***Test Failed***", totalf, "failures."
elif verbose:
print "Test passed."
return TestResults(totalf, totalt)
#/////////////////////////////////////////////////////////////////
# Backward compatibility cruft to maintain doctest.master.
#/////////////////////////////////////////////////////////////////
def merge(self, other):
d = self._name2ft
for name, (f, t) in other._name2ft.items():
if name in d:
# Don't print here by default, since doing
# so breaks some of the buildbots
#print "*** DocTestRunner.merge: '" + name + "' in both" \
# " testers; summing outcomes."
f2, t2 = d[name]
f = f + f2
t = t + t2
d[name] = f, t
class OutputChecker:
"""
A class used to check whether the actual output from a doctest
example matches the expected output. `OutputChecker` defines two
methods: `check_output`, which compares a given pair of outputs,
and returns true if they match; and `output_difference`, which
returns a string describing the differences between two outputs.
"""
def check_output(self, want, got, optionflags):
"""
Return True iff the actual output from an example (`got`)
matches the expected output (`want`). These strings are
always considered to match if they are identical; but
depending on what option flags the test runner is using,
several non-exact match types are also possible. See the
documentation for `DocTestRunner` for more information about
option flags.
"""
# Handle the common case first, for efficiency:
# if they're string-identical, always return true.
if got == want:
return True
# The values True and False replaced 1 and 0 as the return
# value for boolean comparisons in Python 2.3.
if not (optionflags & DONT_ACCEPT_TRUE_FOR_1):
if (got,want) == ("True\n", "1\n"):
return True
if (got,want) == ("False\n", "0\n"):
return True
# <BLANKLINE> can be used as a special sequence to signify a
# blank line, unless the DONT_ACCEPT_BLANKLINE flag is used.
if not (optionflags & DONT_ACCEPT_BLANKLINE):
# Replace <BLANKLINE> in want with a blank line.
want = re.sub('(?m)^%s\s*?$' % re.escape(BLANKLINE_MARKER),
'', want)
# If a line in got contains only spaces, then remove the
# spaces.
got = re.sub('(?m)^\s*?$', '', got)
if got == want:
return True
# This flag causes doctest to ignore any differences in the
# contents of whitespace strings. Note that this can be used
# in conjunction with the ELLIPSIS flag.
if optionflags & NORMALIZE_WHITESPACE:
got = ' '.join(got.split())
want = ' '.join(want.split())
if got == want:
return True
# The ELLIPSIS flag says to let the sequence "..." in `want`
# match any substring in `got`.
if optionflags & ELLIPSIS:
if _ellipsis_match(want, got):
return True
# We didn't find any match; return false.
return False
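# Illustrative sketch (editor's note, not part of the original module):
#     checker = OutputChecker()
#     checker.check_output('1\n', 'True\n', 0)             # -> True
#     checker.check_output('[0, 1, ..., 9]\n',
#                          '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n',
#                          ELLIPSIS)                        # -> True
# The first match uses the True-for-1 rule; the second needs ELLIPSIS.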
# Should we do a fancy diff?
def _do_a_fancy_diff(self, want, got, optionflags):
# Not unless they asked for a fancy diff.
if not optionflags & (REPORT_UDIFF |
REPORT_CDIFF |
REPORT_NDIFF):
return False
# If expected output uses ellipsis, a meaningful fancy diff is
# too hard ... or maybe not. In two real-life failures Tim saw,
# a diff was a major help anyway, so this is commented out.
# [todo] _ellipsis_match() knows which pieces do and don't match,
# and could be the basis for a kick-ass diff in this case.
##if optionflags & ELLIPSIS and ELLIPSIS_MARKER in want:
## return False
# ndiff does intraline difference marking, so can be useful even
# for 1-line differences.
if optionflags & REPORT_NDIFF:
return True
# The other diff types need at least a few lines to be helpful.
return want.count('\n') > 2 and got.count('\n') > 2
def output_difference(self, example, got, optionflags):
"""
Return a string describing the differences between the
expected output for a given example (`example`) and the actual
output (`got`). `optionflags` is the set of option flags used
to compare `want` and `got`.
"""
want = example.want
# If <BLANKLINE>s are being used, then replace blank lines
# with <BLANKLINE> in the actual output string.
if not (optionflags & DONT_ACCEPT_BLANKLINE):
got = re.sub('(?m)^[ ]*(?=\n)', BLANKLINE_MARKER, got)
# Check if we should use diff.
if self._do_a_fancy_diff(want, got, optionflags):
# Split want & got into lines.
want_lines = want.splitlines(True) # True == keep line ends
got_lines = got.splitlines(True)
# Use difflib to find their differences.
if optionflags & REPORT_UDIFF:
diff = difflib.unified_diff(want_lines, got_lines, n=2)
diff = list(diff)[2:] # strip the diff header
kind = 'unified diff with -expected +actual'
elif optionflags & REPORT_CDIFF:
diff = difflib.context_diff(want_lines, got_lines, n=2)
diff = list(diff)[2:] # strip the diff header
kind = 'context diff with expected followed by actual'
elif optionflags & REPORT_NDIFF:
engine = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
diff = list(engine.compare(want_lines, got_lines))
kind = 'ndiff with -expected +actual'
else:
assert 0, 'Bad diff option'
return 'Differences (%s):\n' % kind + _indent(''.join(diff))
# If we're not using diff, then simply list the expected
# output followed by the actual output.
if want and got:
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
elif want:
return 'Expected:\n%sGot nothing\n' % _indent(want)
elif got:
return 'Expected nothing\nGot:\n%s' % _indent(got)
else:
return 'Expected nothing\nGot nothing\n'
class DocTestFailure(Exception):
"""A DocTest example has failed in debugging mode.
The exception instance has variables:
- test: the DocTest object being run
- example: the Example object that failed
- got: the actual output
"""
def __init__(self, test, example, got):
self.test = test
self.example = example
self.got = got
def __str__(self):
return str(self.test)
class UnexpectedException(Exception):
"""A DocTest example has encountered an unexpected exception
The exception instance has variables:
- test: the DocTest object being run
- example: the Example object that failed
- exc_info: the exception info
"""
def __init__(self, test, example, exc_info):
self.test = test
self.example = example
self.exc_info = exc_info
def __str__(self):
return str(self.test)
class DebugRunner(DocTestRunner):
r"""Run doc tests but raise an exception as soon as there is a failure.
If an unexpected exception occurs, an UnexpectedException is raised.
It contains the test, the example, and the original exception:
>>> runner = DebugRunner(verbose=False)
>>> test = DocTestParser().get_doctest('>>> raise KeyError\n42',
... {}, 'foo', 'foo.py', 0)
>>> try:
... runner.run(test)
... except UnexpectedException, failure:
... pass
>>> failure.test is test
True
>>> failure.example.want
'42\n'
>>> exc_info = failure.exc_info
>>> raise exc_info[0], exc_info[1], exc_info[2]
Traceback (most recent call last):
...
KeyError
We wrap the original exception to give the calling application
access to the test and example information.
If the output doesn't match, then a DocTestFailure is raised:
>>> test = DocTestParser().get_doctest('''
... >>> x = 1
... >>> x
... 2
... ''', {}, 'foo', 'foo.py', 0)
>>> try:
... runner.run(test)
... except DocTestFailure, failure:
... pass
DocTestFailure objects provide access to the test:
>>> failure.test is test
True
As well as to the example:
>>> failure.example.want
'2\n'
and the actual output:
>>> failure.got
'1\n'
If a failure or error occurs, the globals are left intact:
>>> del test.globs['__builtins__']
>>> test.globs
{'x': 1}
>>> test = DocTestParser().get_doctest('''
... >>> x = 2
... >>> raise KeyError
... ''', {}, 'foo', 'foo.py', 0)
>>> runner.run(test)
Traceback (most recent call last):
...
UnexpectedException: <DocTest foo from foo.py:0 (2 examples)>
>>> del test.globs['__builtins__']
>>> test.globs
{'x': 2}
But the globals are cleared if there is no error:
>>> test = DocTestParser().get_doctest('''
... >>> x = 2
... ''', {}, 'foo', 'foo.py', 0)
>>> runner.run(test)
TestResults(failed=0, attempted=1)
>>> test.globs
{}
"""
def run(self, test, compileflags=None, out=None, clear_globs=True):
r = DocTestRunner.run(self, test, compileflags, out, False)
if clear_globs:
test.globs.clear()
return r
def report_unexpected_exception(self, out, test, example, exc_info):
raise UnexpectedException(test, example, exc_info)
def report_failure(self, out, test, example, got):
raise DocTestFailure(test, example, got)
######################################################################
## 6. Test Functions
######################################################################
# These should be backwards compatible.
# For backward compatibility, a global instance of a DocTestRunner
# class, updated by testmod.
master = None
def testmod(m=None, name=None, globs=None, verbose=None,
report=True, optionflags=0, extraglobs=None,
raise_on_error=False, exclude_empty=False):
"""m=None, name=None, globs=None, verbose=None, report=True,
optionflags=0, extraglobs=None, raise_on_error=False,
exclude_empty=False
Test examples in docstrings in functions and classes reachable
from module m (or the current module if m is not supplied), starting
with m.__doc__.
Also test examples reachable from dict m.__test__ if it exists and is
not None. m.__test__ maps names to functions, classes and strings;
function and class docstrings are tested even if the name is private;
strings are tested directly, as if they were docstrings.
Return (#failures, #tests).
See help(doctest) for an overview.
Optional keyword arg "name" gives the name of the module; by default
use m.__name__.
Optional keyword arg "globs" gives a dict to be used as the globals
when executing examples; by default, use m.__dict__. A copy of this
dict is actually used for each docstring, so that each docstring's
examples start with a clean slate.
Optional keyword arg "extraglobs" gives a dictionary that should be
merged into the globals that are used to execute examples. By
default, no extra globals are used. This is new in 2.4.
Optional keyword arg "verbose" prints lots of stuff if true, prints
only failures if false; by default, it's true iff "-v" is in sys.argv.
Optional keyword arg "report" prints a summary at the end when true,
else prints nothing at the end. In verbose mode, the summary is
detailed, else very brief (in fact, empty if all tests passed).
Optional keyword arg "optionflags" or's together module constants,
and defaults to 0. This is new in 2.3. Possible values (see the
docs for details):
DONT_ACCEPT_TRUE_FOR_1
DONT_ACCEPT_BLANKLINE
NORMALIZE_WHITESPACE
ELLIPSIS
SKIP
IGNORE_EXCEPTION_DETAIL
REPORT_UDIFF
REPORT_CDIFF
REPORT_NDIFF
REPORT_ONLY_FIRST_FAILURE
Optional keyword arg "raise_on_error" raises an exception on the
first unexpected exception or failure. This allows failures to be
post-mortem debugged.
Advanced tomfoolery: testmod runs methods of a local instance of
class doctest.Tester, then merges the results into (or creates)
global Tester instance doctest.master. Methods of doctest.master
can be called directly too, if you want to do something unusual.
Passing report=0 to testmod is especially useful then, to delay
displaying a summary. Invoke doctest.master.summarize(verbose)
when you're done fiddling.
"""
global master
# If no module was given, then use __main__.
if m is None:
# DWA - m will still be None if this wasn't invoked from the command
# line, in which case the following TypeError is about as good an error
# as we should expect
m = sys.modules.get('__main__')
# Check that we were actually given a module.
if not inspect.ismodule(m):
raise TypeError("testmod: module required; %r" % (m,))
# If no name was given, then use the module's name.
if name is None:
name = m.__name__
# Find, parse, and run all tests in the given module.
finder = DocTestFinder(exclude_empty=exclude_empty)
if raise_on_error:
runner = DebugRunner(verbose=verbose, optionflags=optionflags)
else:
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
runner.run(test)
if report:
runner.summarize()
if master is None:
master = runner
else:
master.merge(runner)
return TestResults(runner.failures, runner.tries)
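# Illustrative sketch (editor's note, not part of the original module):
# the usual self-test idiom for a module is
#     if __name__ == '__main__':
#         import doctest
#         doctest.testmod()
# which runs every docstring example reachable from __main__ and, by
# default, prints a summary only when something fails.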
def testfile(filename, module_relative=True, name=None, package=None,
globs=None, verbose=None, report=True, optionflags=0,
extraglobs=None, raise_on_error=False, parser=DocTestParser(),
encoding=None):
"""
Test examples in the given file. Return (#failures, #tests).
Optional keyword arg "module_relative" specifies how filenames
should be interpreted:
- If "module_relative" is True (the default), then "filename"
specifies a module-relative path. By default, this path is
relative to the calling module's directory; but if the
"package" argument is specified, then it is relative to that
package. To ensure os-independence, "filename" should use
"/" characters to separate path segments, and should not
be an absolute path (i.e., it may not begin with "/").
- If "module_relative" is False, then "filename" specifies an
os-specific path. The path may be absolute or relative (to
the current working directory).
Optional keyword arg "name" gives the name of the test; by default
use the file's basename.
Optional keyword argument "package" is a Python package or the
name of a Python package whose directory should be used as the
base directory for a module relative filename. If no package is
specified, then the calling module's directory is used as the base
directory for module relative filenames. It is an error to
specify "package" if "module_relative" is False.
Optional keyword arg "globs" gives a dict to be used as the globals
when executing examples; by default, use {}. A copy of this dict
is actually used for each docstring, so that each docstring's
examples start with a clean slate.
Optional keyword arg "extraglobs" gives a dictionary that should be
merged into the globals that are used to execute examples. By
default, no extra globals are used.
Optional keyword arg "verbose" prints lots of stuff if true, prints
only failures if false; by default, it's true iff "-v" is in sys.argv.
Optional keyword arg "report" prints a summary at the end when true,
else prints nothing at the end. In verbose mode, the summary is
detailed, else very brief (in fact, empty if all tests passed).
Optional keyword arg "optionflags" or's together module constants,
and defaults to 0. Possible values (see the docs for details):
DONT_ACCEPT_TRUE_FOR_1
DONT_ACCEPT_BLANKLINE
NORMALIZE_WHITESPACE
ELLIPSIS
SKIP
IGNORE_EXCEPTION_DETAIL
REPORT_UDIFF
REPORT_CDIFF
REPORT_NDIFF
REPORT_ONLY_FIRST_FAILURE
Optional keyword arg "raise_on_error" raises an exception on the
first unexpected exception or failure. This allows failures to be
post-mortem debugged.
Optional keyword arg "parser" specifies a DocTestParser (or
subclass) that should be used to extract tests from the files.
Optional keyword arg "encoding" specifies an encoding that should
be used to convert the file to unicode.
Advanced tomfoolery: testfile runs methods of a local instance of
class doctest.Tester, then merges the results into (or creates)
global Tester instance doctest.master. Methods of doctest.master
can be called directly too, if you want to do something unusual.
Passing report=0 to testmod is especially useful then, to delay
displaying a summary. Invoke doctest.master.summarize(verbose)
when you're done fiddling.
"""
global master
if package and not module_relative:
raise ValueError("Package may only be specified for module-"
"relative paths.")
# Relativize the path
text, filename = _load_testfile(filename, package, module_relative)
# If no name was given, then use the file's name.
if name is None:
name = os.path.basename(filename)
# Assemble the globals.
if globs is None:
globs = {}
else:
globs = globs.copy()
if extraglobs is not None:
globs.update(extraglobs)
if '__name__' not in globs:
globs['__name__'] = '__main__'
if raise_on_error:
runner = DebugRunner(verbose=verbose, optionflags=optionflags)
else:
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
if encoding is not None:
text = text.decode(encoding)
# Read the file, convert it to a test, and run it.
test = parser.get_doctest(text, globs, name, filename, 0)
runner.run(test)
if report:
runner.summarize()
if master is None:
master = runner
else:
master.merge(runner)
return TestResults(runner.failures, runner.tries)
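# Illustrative sketch (editor's note, not part of the original module),
# assuming a hypothetical file example.txt of doctest examples:
#     import doctest
#     doctest.testfile('example.txt')                 # module-relative
#     doctest.testfile('/tmp/example.txt',
#                      module_relative=False)         # os-specific path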
def run_docstring_examples(f, globs, verbose=False, name="NoName",
compileflags=None, optionflags=0):
"""
Test examples in the given object's docstring (`f`), using `globs`
as globals. Optional argument `name` is used in failure messages.
If the optional argument `verbose` is true, then generate output
even if there are no failures.
`compileflags` gives the set of flags that should be used by the
Python compiler when running the examples. If not specified, then
it will default to the set of future-import flags that apply to
`globs`.
Optional keyword arg `optionflags` specifies options for the
testing and output. See the documentation for `testmod` for more
information.
"""
# Find, parse, and run all tests in the given module.
finder = DocTestFinder(verbose=verbose, recurse=False)
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
for test in finder.find(f, name, globs=globs):
runner.run(test, compileflags=compileflags)
######################################################################
## 7. Tester
######################################################################
# This is provided only for backwards compatibility. It's not
# actually used in any way.
class Tester:
def __init__(self, mod=None, globs=None, verbose=None, optionflags=0):
warnings.warn("class Tester is deprecated; "
"use class doctest.DocTestRunner instead",
DeprecationWarning, stacklevel=2)
if mod is None and globs is None:
raise TypeError("Tester.__init__: must specify mod or globs")
if mod is not None and not inspect.ismodule(mod):
raise TypeError("Tester.__init__: mod must be a module; %r" %
(mod,))
if globs is None:
globs = mod.__dict__
self.globs = globs
self.verbose = verbose
self.optionflags = optionflags
self.testfinder = DocTestFinder()
self.testrunner = DocTestRunner(verbose=verbose,
optionflags=optionflags)
def runstring(self, s, name):
test = DocTestParser().get_doctest(s, self.globs, name, None, None)
if self.verbose:
print "Running string", name
(f,t) = self.testrunner.run(test)
if self.verbose:
print f, "of", t, "examples failed in string", name
return TestResults(f,t)
def rundoc(self, object, name=None, module=None):
f = t = 0
tests = self.testfinder.find(object, name, module=module,
globs=self.globs)
for test in tests:
(f2, t2) = self.testrunner.run(test)
(f,t) = (f+f2, t+t2)
return TestResults(f,t)
def rundict(self, d, name, module=None):
import types
m = types.ModuleType(name)
m.__dict__.update(d)
if module is None:
module = False
return self.rundoc(m, name, module)
def run__test__(self, d, name):
import types
m = types.ModuleType(name)
m.__test__ = d
return self.rundoc(m, name)
def summarize(self, verbose=None):
return self.testrunner.summarize(verbose)
def merge(self, other):
self.testrunner.merge(other.testrunner)
######################################################################
## 8. Unittest Support
######################################################################
_unittest_reportflags = 0
def set_unittest_reportflags(flags):
"""Sets the unittest option flags.
The old flag is returned so that a runner could restore the old
value if it wished to:
>>> import doctest
>>> old = doctest._unittest_reportflags
>>> doctest.set_unittest_reportflags(REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE) == old
True
>>> doctest._unittest_reportflags == (REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE)
True
Only reporting flags can be set:
>>> doctest.set_unittest_reportflags(ELLIPSIS)
Traceback (most recent call last):
...
ValueError: ('Only reporting flags allowed', 8)
>>> doctest.set_unittest_reportflags(old) == (REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE)
True
"""
global _unittest_reportflags
if (flags & REPORTING_FLAGS) != flags:
raise ValueError("Only reporting flags allowed", flags)
old = _unittest_reportflags
_unittest_reportflags = flags
return old
class DocTestCase(unittest.TestCase):
def __init__(self, test, optionflags=0, setUp=None, tearDown=None,
checker=None):
unittest.TestCase.__init__(self)
self._dt_optionflags = optionflags
self._dt_checker = checker
self._dt_test = test
self._dt_setUp = setUp
self._dt_tearDown = tearDown
def setUp(self):
test = self._dt_test
if self._dt_setUp is not None:
self._dt_setUp(test)
def tearDown(self):
test = self._dt_test
if self._dt_tearDown is not None:
self._dt_tearDown(test)
test.globs.clear()
def runTest(self):
test = self._dt_test
old = sys.stdout
new = StringIO()
optionflags = self._dt_optionflags
if not (optionflags & REPORTING_FLAGS):
# The option flags don't include any reporting flags,
# so add the default reporting flags
optionflags |= _unittest_reportflags
runner = DocTestRunner(optionflags=optionflags,
checker=self._dt_checker, verbose=False)
try:
runner.DIVIDER = "-"*70
failures, tries = runner.run(
test, out=new.write, clear_globs=False)
finally:
sys.stdout = old
if failures:
raise self.failureException(self.format_failure(new.getvalue()))
def format_failure(self, err):
test = self._dt_test
if test.lineno is None:
lineno = 'unknown line number'
else:
lineno = '%s' % test.lineno
lname = '.'.join(test.name.split('.')[-1:])
return ('Failed doctest test for %s\n'
' File "%s", line %s, in %s\n\n%s'
% (test.name, test.filename, lineno, lname, err)
)
def debug(self):
r"""Run the test case without results and without catching exceptions
The unit test framework includes a debug method on test cases
and test suites to support post-mortem debugging. The test code
is run in such a way that errors are not caught. This way a
caller can catch the errors and initiate post-mortem debugging.
The DocTestCase provides a debug method that raises
UnexpectedException errors if there is an unexpected
exception:
>>> test = DocTestParser().get_doctest('>>> raise KeyError\n42',
... {}, 'foo', 'foo.py', 0)
>>> case = DocTestCase(test)
>>> try:
... case.debug()
... except UnexpectedException, failure:
... pass
The UnexpectedException contains the test, the example, and
the original exception:
>>> failure.test is test
True
>>> failure.example.want
'42\n'
>>> exc_info = failure.exc_info
>>> raise exc_info[0], exc_info[1], exc_info[2]
Traceback (most recent call last):
...
KeyError
If the output doesn't match, then a DocTestFailure is raised:
>>> test = DocTestParser().get_doctest('''
... >>> x = 1
... >>> x
... 2
... ''', {}, 'foo', 'foo.py', 0)
>>> case = DocTestCase(test)
>>> try:
... case.debug()
... except DocTestFailure, failure:
... pass
DocTestFailure objects provide access to the test:
>>> failure.test is test
True
As well as to the example:
>>> failure.example.want
'2\n'
and the actual output:
>>> failure.got
'1\n'
"""
self.setUp()
runner = DebugRunner(optionflags=self._dt_optionflags,
checker=self._dt_checker, verbose=False)
runner.run(self._dt_test, clear_globs=False)
self.tearDown()
def id(self):
return self._dt_test.name
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self._dt_test == other._dt_test and \
self._dt_optionflags == other._dt_optionflags and \
self._dt_setUp == other._dt_setUp and \
self._dt_tearDown == other._dt_tearDown and \
self._dt_checker == other._dt_checker
def __ne__(self, other):
return not self == other
def __hash__(self):
return hash((self._dt_optionflags, self._dt_setUp, self._dt_tearDown,
self._dt_checker))
def __repr__(self):
name = self._dt_test.name.split('.')
return "%s (%s)" % (name[-1], '.'.join(name[:-1]))
__str__ = __repr__
def shortDescription(self):
return "Doctest: " + self._dt_test.name
class SkipDocTestCase(DocTestCase):
def __init__(self, module):
self.module = module
DocTestCase.__init__(self, None)
def setUp(self):
self.skipTest("DocTestSuite will not work with -O2 and above")
def test_skip(self):
pass
def shortDescription(self):
return "Skipping tests from %s" % self.module.__name__
__str__ = shortDescription
def DocTestSuite(module=None, globs=None, extraglobs=None, test_finder=None,
**options):
"""
Convert doctest tests for a module to a unittest test suite.
This converts each documentation string in a module that
contains doctest tests to a unittest test case. If any of the
tests in a doc string fail, then the test case fails. An exception
is raised showing the name of the file containing the test and a
(sometimes approximate) line number.
The `module` argument provides the module to be tested. The argument
can be either a module or a module name.
If no argument is given, the calling module is used.
A number of options may be provided as keyword arguments:
setUp
A set-up function. This is called before running the
tests in each file. The setUp function will be passed a DocTest
object. The setUp function can access the test globals as the
globs attribute of the test passed.
tearDown
A tear-down function. This is called after running the
tests in each file. The tearDown function will be passed a DocTest
object. The tearDown function can access the test globals as the
globs attribute of the test passed.
globs
A dictionary containing initial global variables for the tests.
optionflags
A set of doctest option flags expressed as an integer.
"""
if test_finder is None:
test_finder = DocTestFinder()
module = _normalize_module(module)
tests = test_finder.find(module, globs=globs, extraglobs=extraglobs)
if not tests and sys.flags.optimize >=2:
# Skip doctests when running with -O2
suite = unittest.TestSuite()
suite.addTest(SkipDocTestCase(module))
return suite
elif not tests:
# Why do we want to do this? Because it reveals a bug that might
# otherwise be hidden.
# It is probably a bug that this exception is not also raised if the
# number of doctest examples in tests is zero (i.e. if no doctest
# examples were found). However, we should probably not be raising
# an exception at all here, though it is too late to make this change
# for a maintenance release. See also issue #14649.
raise ValueError(module, "has no docstrings")
tests.sort()
suite = unittest.TestSuite()
for test in tests:
if len(test.examples) == 0:
continue
if not test.filename:
filename = module.__file__
if filename[-4:] in (".pyc", ".pyo"):
filename = filename[:-1]
test.filename = filename
suite.addTest(DocTestCase(test, **options))
return suite
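# Illustrative sketch (editor's note, not part of the original module),
# assuming a hypothetical module `mymodule` with docstring examples:
#     import unittest, doctest, mymodule
#     suite = doctest.DocTestSuite(mymodule)
#     unittest.TextTestRunner(verbosity=2).run(suite)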
class DocFileCase(DocTestCase):
def id(self):
return '_'.join(self._dt_test.name.split('.'))
def __repr__(self):
return self._dt_test.filename
__str__ = __repr__
def format_failure(self, err):
return ('Failed doctest test for %s\n File "%s", line 0\n\n%s'
% (self._dt_test.name, self._dt_test.filename, err)
)
def DocFileTest(path, module_relative=True, package=None,
globs=None, parser=DocTestParser(),
encoding=None, **options):
if globs is None:
globs = {}
else:
globs = globs.copy()
if package and not module_relative:
raise ValueError("Package may only be specified for module-"
"relative paths.")
# Relativize the path.
doc, path = _load_testfile(path, package, module_relative)
if "__file__" not in globs:
globs["__file__"] = path
# Find the file and read it.
name = os.path.basename(path)
# If an encoding is specified, use it to convert the file to unicode
if encoding is not None:
doc = doc.decode(encoding)
# Convert it to a test, and wrap it in a DocFileCase.
test = parser.get_doctest(doc, globs, name, path, 0)
return DocFileCase(test, **options)
def DocFileSuite(*paths, **kw):
"""A unittest suite for one or more doctest files.
The path to each doctest file is given as a string; the
interpretation of that string depends on the keyword argument
"module_relative".
A number of options may be provided as keyword arguments:
module_relative
If "module_relative" is True, then the given file paths are
interpreted as os-independent module-relative paths. By
default, these paths are relative to the calling module's
directory; but if the "package" argument is specified, then
they are relative to that package. To ensure os-independence,
"filename" should use "/" characters to separate path
segments, and may not be an absolute path (i.e., it may not
begin with "/").
If "module_relative" is False, then the given file paths are
interpreted as os-specific paths. These paths may be absolute
or relative (to the current working directory).
package
A Python package or the name of a Python package whose directory
should be used as the base directory for module relative paths.
If "package" is not specified, then the calling module's
directory is used as the base directory for module relative
filenames. It is an error to specify "package" if
"module_relative" is False.
setUp
A set-up function. This is called before running the
tests in each file. The setUp function will be passed a DocTest
object. The setUp function can access the test globals as the
globs attribute of the test passed.
tearDown
A tear-down function. This is called after running the
tests in each file. The tearDown function will be passed a DocTest
object. The tearDown function can access the test globals as the
globs attribute of the test passed.
globs
A dictionary containing initial global variables for the tests.
optionflags
A set of doctest option flags expressed as an integer.
parser
A DocTestParser (or subclass) that should be used to extract
tests from the files.
encoding
An encoding that will be used to convert the files to unicode.
"""
suite = unittest.TestSuite()
# We do this here so that _normalize_module is called at the right
# level. If it were called in DocFileTest, then this function
# would be the caller and we might guess the package incorrectly.
if kw.get('module_relative', True):
kw['package'] = _normalize_module(kw.get('package'))
for path in paths:
suite.addTest(DocFileTest(path, **kw))
return suite
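# Illustrative sketch (editor's note, not part of the original module),
# assuming hypothetical doctest files shipped with the calling module:
#     import unittest, doctest
#     suite = doctest.DocFileSuite('intro.txt', 'advanced.txt',
#                                  optionflags=doctest.ELLIPSIS)
#     unittest.TextTestRunner().run(suite)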
######################################################################
## 9. Debugging Support
######################################################################
def script_from_examples(s):
r"""Extract script from text with examples.
Converts text with examples to a Python script. Example input is
converted to regular code. Example output and all other words
are converted to comments:
>>> text = '''
... Here are examples of simple math.
...
... Python has super accurate integer addition
...
... >>> 2 + 2
... 5
...
... And very friendly error messages:
...
... >>> 1/0
... To Infinity
... And
... Beyond
...
... You can use logic if you want:
...
... >>> if 0:
... ... blah
... ... blah
... ...
...
... Ho hum
... '''
>>> print script_from_examples(text)
# Here are examples of simple math.
#
# Python has super accurate integer addition
#
2 + 2
# Expected:
## 5
#
# And very friendly error messages:
#
1/0
# Expected:
## To Infinity
## And
## Beyond
#
# You can use logic if you want:
#
if 0:
blah
blah
#
# Ho hum
<BLANKLINE>
"""
output = []
for piece in DocTestParser().parse(s):
if isinstance(piece, Example):
# Add the example's source code (strip trailing NL)
output.append(piece.source[:-1])
# Add the expected output:
want = piece.want
if want:
output.append('# Expected:')
output += ['## '+l for l in want.split('\n')[:-1]]
else:
# Add non-example text.
output += [_comment_line(l)
for l in piece.split('\n')[:-1]]
# Trim junk on both ends.
while output and output[-1] == '#':
output.pop()
while output and output[0] == '#':
output.pop(0)
# Combine the output, and return it.
# Add a courtesy newline to prevent exec from choking (see bug #1172785)
return '\n'.join(output) + '\n'
def testsource(module, name):
"""Extract the test sources from a doctest docstring as a script.
Provide the module (or dotted name of the module) containing the
test to be debugged and the name (within the module) of the object
with the doc string with tests to be debugged.
"""
module = _normalize_module(module)
tests = DocTestFinder().find(module)
test = [t for t in tests if t.name == name]
if not test:
raise ValueError(name, "not found in tests")
test = test[0]
testsrc = script_from_examples(test.docstring)
return testsrc
def debug_src(src, pm=False, globs=None):
"""Debug a single doctest docstring, in argument `src`'"""
testsrc = script_from_examples(src)
debug_script(testsrc, pm, globs)
def debug_script(src, pm=False, globs=None):
"Debug a test script. `src` is the script, as a string."
import pdb
    # Note that tempfile.NamedTemporaryFile() cannot be used. As the
# docs say, a file so created cannot be opened by name a second time
# on modern Windows boxes, and execfile() needs to open it.
srcfilename = tempfile.mktemp(".py", "doctestdebug")
f = open(srcfilename, 'w')
f.write(src)
f.close()
try:
if globs:
globs = globs.copy()
else:
globs = {}
if pm:
try:
execfile(srcfilename, globs, globs)
except:
print sys.exc_info()[1]
pdb.post_mortem(sys.exc_info()[2])
else:
# Note that %r is vital here. '%s' instead can, e.g., cause
# backslashes to get treated as metacharacters on Windows.
pdb.run("execfile(%r)" % srcfilename, globs, globs)
finally:
os.remove(srcfilename)
def debug(module, name, pm=False):
"""Debug a single doctest docstring.
Provide the module (or dotted name of the module) containing the
test to be debugged and the name (within the module) of the object
with the docstring with tests to be debugged.
"""
module = _normalize_module(module)
testsrc = testsource(module, name)
debug_script(testsrc, pm, module.__dict__)
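# --- Example (sketch): exercising the debugging helpers above against the
# sample _TestClass defined later in this file.  Note the name passed in is
# the full dotted test name, including the module prefix.
def _debug_example():
    print testsource('doctest', 'doctest._TestClass.get')  # extracted script
    debug('doctest', 'doctest._TestClass.get', pm=True)    # run it under pdb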
######################################################################
## 10. Example Usage
######################################################################
class _TestClass:
"""
A pointless class, for sanity-checking of docstring testing.
Methods:
square()
get()
>>> _TestClass(13).get() + _TestClass(-12).get()
1
>>> hex(_TestClass(13).square().get())
'0xa9'
"""
def __init__(self, val):
"""val -> _TestClass object with associated value val.
>>> t = _TestClass(123)
>>> print t.get()
123
"""
self.val = val
def square(self):
"""square() -> square TestClass's associated value
>>> _TestClass(13).square().get()
169
"""
self.val = self.val ** 2
return self
def get(self):
"""get() -> return TestClass's associated value.
>>> x = _TestClass(-42)
>>> print x.get()
-42
"""
return self.val
__test__ = {"_TestClass": _TestClass,
"string": r"""
Example of a string object, searched as-is.
>>> x = 1; y = 2
>>> x + y, x * y
(3, 2)
""",
"bool-int equivalence": r"""
In 2.2, boolean expressions displayed
0 or 1. By default, we still accept
them. This can be disabled by passing
DONT_ACCEPT_TRUE_FOR_1 to the new
optionflags argument.
>>> 4 == 4
1
>>> 4 == 4
True
>>> 4 > 4
0
>>> 4 > 4
False
""",
"blank lines": r"""
Blank lines can be marked with <BLANKLINE>:
>>> print 'foo\n\nbar\n'
foo
<BLANKLINE>
bar
<BLANKLINE>
""",
"ellipsis": r"""
If the ellipsis flag is used, then '...' can be used to
elide substrings in the desired output:
>>> print range(1000) #doctest: +ELLIPSIS
[0, 1, 2, ..., 999]
""",
"whitespace normalization": r"""
If the whitespace normalization flag is used, then
differences in whitespace are ignored.
>>> print range(30) #doctest: +NORMALIZE_WHITESPACE
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29]
""",
}
def _test():
testfiles = [arg for arg in sys.argv[1:] if arg and arg[0] != '-']
if not testfiles:
name = os.path.basename(sys.argv[0])
if '__loader__' in globals(): # python -m
name, _ = os.path.splitext(name)
print("usage: {0} [-v] file ...".format(name))
return 2
for filename in testfiles:
if filename.endswith(".py"):
# It is a module -- insert its dir into sys.path and try to
# import it. If it is part of a package, that possibly
# won't work because of package imports.
dirname, filename = os.path.split(filename)
sys.path.insert(0, dirname)
m = __import__(filename[:-3])
del sys.path[0]
failures, _ = testmod(m)
else:
failures, _ = testfile(filename, module_relative=False)
if failures:
return 1
return 0
if __name__ == "__main__":
sys.exit(_test())
"""Self documenting XML-RPC Server.
This module can be used to create XML-RPC servers that
serve pydoc-style documentation in response to HTTP
GET requests. This documentation is dynamically generated
based on the functions and methods registered with the
server.
This module is built upon the pydoc and SimpleXMLRPCServer
modules.
"""
import pydoc
import inspect
import re
import sys
from SimpleXMLRPCServer import (SimpleXMLRPCServer,
SimpleXMLRPCRequestHandler,
CGIXMLRPCRequestHandler,
resolve_dotted_attribute)
def _html_escape_quote(s):
s = s.replace("&", "&") # Must be done first!
s = s.replace("<", "<")
s = s.replace(">", ">")
s = s.replace('"', """)
s = s.replace('\'', "'")
return s
class ServerHTMLDoc(pydoc.HTMLDoc):
"""Class used to generate pydoc HTML document for a server"""
def markup(self, text, escape=None, funcs={}, classes={}, methods={}):
"""Mark up some plain text, given a context of symbols to look for.
Each context dictionary maps object names to anchor names."""
escape = escape or self.escape
results = []
here = 0
# XXX Note that this regular expression does not allow for the
# hyperlinking of arbitrary strings being used as method
# names. Only methods with names consisting of word characters
# and '.'s are hyperlinked.
pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
r'RFC[- ]?(\d+)|'
r'PEP[- ]?(\d+)|'
r'(self\.)?((?:\w|\.)+))\b')
while 1:
match = pattern.search(text, here)
if not match: break
start, end = match.span()
results.append(escape(text[here:start]))
all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
url = escape(all).replace('"', '"')
results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
url = 'http://www.python.org/dev/peps/pep-%04d/' % int(pep)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif text[end:end+1] == '(':
results.append(self.namelink(name, methods, funcs, classes))
elif selfdot:
results.append('self.<strong>%s</strong>' % name)
else:
results.append(self.namelink(name, classes))
here = end
results.append(escape(text[here:]))
return ''.join(results)
def docroutine(self, object, name, mod=None,
funcs={}, classes={}, methods={}, cl=None):
"""Produce HTML documentation for a function or method object."""
anchor = (cl and cl.__name__ or '') + '-' + name
note = ''
title = '<a name="%s"><strong>%s</strong></a>' % (
self.escape(anchor), self.escape(name))
if inspect.ismethod(object):
args, varargs, varkw, defaults = inspect.getargspec(object.im_func)
# exclude the argument bound to the instance, it will be
# confusing to the non-Python user
argspec = inspect.formatargspec (
args[1:],
varargs,
varkw,
defaults,
formatvalue=self.formatvalue
)
elif inspect.isfunction(object):
args, varargs, varkw, defaults = inspect.getargspec(object)
argspec = inspect.formatargspec(
args, varargs, varkw, defaults, formatvalue=self.formatvalue)
else:
argspec = '(...)'
if isinstance(object, tuple):
argspec = object[0] or argspec
docstring = object[1] or ""
else:
docstring = pydoc.getdoc(object)
decl = title + argspec + (note and self.grey(
'<font face="helvetica, arial">%s</font>' % note))
doc = self.markup(
docstring, self.preformat, funcs, classes, methods)
doc = doc and '<dd><tt>%s</tt></dd>' % doc
return '<dl><dt>%s</dt>%s</dl>\n' % (decl, doc)
def docserver(self, server_name, package_documentation, methods):
"""Produce HTML documentation for an XML-RPC server."""
fdict = {}
for key, value in methods.items():
fdict[key] = '#-' + key
fdict[value] = fdict[key]
server_name = self.escape(server_name)
head = '<big><big><strong>%s</strong></big></big>' % server_name
result = self.heading(head, '#ffffff', '#7799ee')
doc = self.markup(package_documentation, self.preformat, fdict)
doc = doc and '<tt>%s</tt>' % doc
result = result + '<p>%s</p>\n' % doc
contents = []
method_items = sorted(methods.items())
for key, value in method_items:
contents.append(self.docroutine(value, key, funcs=fdict))
result = result + self.bigsection(
'Methods', '#ffffff', '#eeaa77', pydoc.join(contents))
return result
class XMLRPCDocGenerator:
"""Generates documentation for an XML-RPC server.
This class is designed as mix-in and should not
be constructed directly.
"""
def __init__(self):
# setup variables used for HTML documentation
self.server_name = 'XML-RPC Server Documentation'
self.server_documentation = \
"This server exports the following methods through the XML-RPC "\
"protocol."
self.server_title = 'XML-RPC Server Documentation'
def set_server_title(self, server_title):
"""Set the HTML title of the generated server documentation"""
self.server_title = server_title
def set_server_name(self, server_name):
"""Set the name of the generated HTML server documentation"""
self.server_name = server_name
def set_server_documentation(self, server_documentation):
"""Set the documentation string for the entire server."""
self.server_documentation = server_documentation
def generate_html_documentation(self):
"""generate_html_documentation() => html documentation for the server
Generates HTML documentation for the server using introspection for
installed functions and instances that do not implement the
_dispatch method. Alternatively, instances can choose to implement
the _get_method_argstring(method_name) method to provide the
argument string used in the documentation and the
_methodHelp(method_name) method to provide the help text used
in the documentation."""
methods = {}
for method_name in self.system_listMethods():
if method_name in self.funcs:
method = self.funcs[method_name]
elif self.instance is not None:
method_info = [None, None] # argspec, documentation
if hasattr(self.instance, '_get_method_argstring'):
method_info[0] = self.instance._get_method_argstring(method_name)
if hasattr(self.instance, '_methodHelp'):
method_info[1] = self.instance._methodHelp(method_name)
method_info = tuple(method_info)
if method_info != (None, None):
method = method_info
elif not hasattr(self.instance, '_dispatch'):
try:
method = resolve_dotted_attribute(
self.instance,
method_name
)
except AttributeError:
method = method_info
else:
method = method_info
else:
assert 0, "Could not find method in self.functions and no "\
"instance installed"
methods[method_name] = method
documenter = ServerHTMLDoc()
documentation = documenter.docserver(
self.server_name,
self.server_documentation,
methods
)
title = _html_escape_quote(self.server_title)
return documenter.page(title, documentation)
class DocXMLRPCRequestHandler(SimpleXMLRPCRequestHandler):
"""XML-RPC and documentation request handler class.
Handles all HTTP POST requests and attempts to decode them as
XML-RPC requests.
Handles all HTTP GET requests and interprets them as requests
for documentation.
"""
def do_GET(self):
"""Handles the HTTP GET request.
Interpret all HTTP GET requests as requests for server
documentation.
"""
# Check that the path is legal
if not self.is_rpc_path_valid():
self.report_404()
return
response = self.server.generate_html_documentation()
self.send_response(200)
self.send_header("Content-type", "text/html")
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
class DocXMLRPCServer( SimpleXMLRPCServer,
XMLRPCDocGenerator):
"""XML-RPC and HTML documentation server.
Adds the ability to serve server documentation to the capabilities
of SimpleXMLRPCServer.
"""
def __init__(self, addr, requestHandler=DocXMLRPCRequestHandler,
logRequests=1, allow_none=False, encoding=None,
bind_and_activate=True):
SimpleXMLRPCServer.__init__(self, addr, requestHandler, logRequests,
allow_none, encoding, bind_and_activate)
XMLRPCDocGenerator.__init__(self)
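# --- Example (sketch): a minimal self-documenting server.  The address,
# title, and registered function are illustrative choices; browsing
# http://localhost:8000/ then shows the generated pydoc-style page.
def _docxmlrpcserver_example():
    server = DocXMLRPCServer(('localhost', 8000), logRequests=0)
    server.set_server_title('Demo XML-RPC service')
    server.set_server_name('demo')
    server.register_introspection_functions()
    server.register_function(pow)
    server.serve_forever()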
class DocCGIXMLRPCRequestHandler( CGIXMLRPCRequestHandler,
XMLRPCDocGenerator):
"""Handler for XML-RPC data and documentation requests passed through
CGI"""
def handle_get(self):
"""Handles the HTTP GET request.
Interpret all HTTP GET requests as requests for server
documentation.
"""
response = self.generate_html_documentation()
print 'Content-Type: text/html'
print 'Content-Length: %d' % len(response)
print
sys.stdout.write(response)
def __init__(self):
CGIXMLRPCRequestHandler.__init__(self)
XMLRPCDocGenerator.__init__(self)
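# --- Example (sketch): the CGI flavour.  Inside a CGI script,
# handle_request() answers XML-RPC POSTs and serves the HTML docs for GETs.
def _doccgi_example():
    handler = DocCGIXMLRPCRequestHandler()
    handler.set_server_title('Demo CGI XML-RPC service')
    handler.register_function(pow)
    handler.handle_request()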
"""A dumb and slow but simple dbm clone.
For database spam, spam.dir contains the index (a text file),
spam.bak *may* contain a backup of the index (also a text file),
while spam.dat contains the data (a binary file).
XXX TO DO:
- seems to contain a bug when updating...
- reclaim free space (currently, space once occupied by deleted or expanded
items is never reused)
- support concurrent access (currently, if two processes take turns making
updates, they can mess up the index)
- support efficient access to large databases (currently, the whole index
is read when the database is opened, and some updates rewrite the whole index)
- support opening for read-only (flag = 'm')
"""
import ast as _ast
import os as _os
import __builtin__
import UserDict
_open = __builtin__.open
_BLOCKSIZE = 512
error = IOError # For anydbm
class _Database(UserDict.DictMixin):
# The on-disk directory and data files can remain in mutually
# inconsistent states for an arbitrarily long time (see comments
# at the end of __setitem__). This is only repaired when _commit()
# gets called. One place _commit() gets called is from __del__(),
# and if that occurs at program shutdown time, module globals may
# already have gotten rebound to None. Since it's crucial that
# _commit() finish successfully, we can't ignore shutdown races
# here, and _commit() must not reference any globals.
_os = _os # for _commit()
_open = _open # for _commit()
def __init__(self, filebasename, mode, flag='c'):
self._mode = mode
self._readonly = (flag == 'r')
# The directory file is a text file. Each line looks like
# "%r, (%d, %d)\n" % (key, pos, siz)
# where key is the string key, pos is the offset into the dat
# file of the associated value's first byte, and siz is the number
# of bytes in the associated value.
self._dirfile = filebasename + _os.extsep + 'dir'
# The data file is a binary file pointed into by the directory
# file, and holds the values associated with keys. Each value
# begins at a _BLOCKSIZE-aligned byte offset, and is a raw
# binary 8-bit string value.
self._datfile = filebasename + _os.extsep + 'dat'
self._bakfile = filebasename + _os.extsep + 'bak'
# The index is an in-memory dict, mirroring the directory file.
self._index = None # maps keys to (pos, siz) pairs
# Mod by Jack: create data file if needed
try:
f = _open(self._datfile, 'r')
except IOError:
with _open(self._datfile, 'w') as f:
self._chmod(self._datfile)
else:
f.close()
self._update()
# Read directory file into the in-memory index dict.
def _update(self):
self._index = {}
try:
f = _open(self._dirfile)
except IOError:
self._modified = not self._readonly
else:
self._modified = False
with f:
for line in f:
line = line.rstrip()
key, pos_and_siz_pair = _ast.literal_eval(line)
self._index[key] = pos_and_siz_pair
# Write the index dict to the directory file. The original directory
# file (if any) is renamed with a .bak extension first. If a .bak
# file currently exists, it's deleted.
def _commit(self):
# CAUTION: It's vital that _commit() succeed, and _commit() can
# be called from __del__(). Therefore we must never reference a
# global in this routine.
if self._index is None or not self._modified:
return # nothing to do
try:
self._os.unlink(self._bakfile)
except self._os.error:
pass
try:
self._os.rename(self._dirfile, self._bakfile)
except self._os.error:
pass
with self._open(self._dirfile, 'w') as f:
self._chmod(self._dirfile)
for key, pos_and_siz_pair in self._index.iteritems():
f.write("%r, %r\n" % (key, pos_and_siz_pair))
sync = _commit
def __getitem__(self, key):
pos, siz = self._index[key] # may raise KeyError
with _open(self._datfile, 'rb') as f:
f.seek(pos)
dat = f.read(siz)
return dat
# Append val to the data file, starting at a _BLOCKSIZE-aligned
# offset. The data file is first padded with NUL bytes (if needed)
# to get to an aligned offset. Return pair
# (starting offset of val, len(val))
def _addval(self, val):
with _open(self._datfile, 'rb+') as f:
f.seek(0, 2)
pos = int(f.tell())
npos = ((pos + _BLOCKSIZE - 1) // _BLOCKSIZE) * _BLOCKSIZE
f.write('\0'*(npos-pos))
pos = npos
f.write(val)
return (pos, len(val))
# Write val to the data file, starting at offset pos. The caller
# is responsible for ensuring that there's enough room starting at
# pos to hold val, without overwriting some other value. Return
# pair (pos, len(val)).
def _setval(self, pos, val):
with _open(self._datfile, 'rb+') as f:
f.seek(pos)
f.write(val)
return (pos, len(val))
# key is a new key whose associated value starts in the data file
# at offset pos and with length siz. Add an index record to
# the in-memory index dict, and append one to the directory file.
def _addkey(self, key, pos_and_siz_pair):
self._index[key] = pos_and_siz_pair
with _open(self._dirfile, 'a') as f:
self._chmod(self._dirfile)
f.write("%r, %r\n" % (key, pos_and_siz_pair))
def __setitem__(self, key, val):
if not type(key) == type('') == type(val):
raise TypeError, "keys and values must be strings"
self._modified = True
if key not in self._index:
self._addkey(key, self._addval(val))
else:
# See whether the new value is small enough to fit in the
# (padded) space currently occupied by the old value.
pos, siz = self._index[key]
oldblocks = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
newblocks = (len(val) + _BLOCKSIZE - 1) // _BLOCKSIZE
if newblocks <= oldblocks:
self._index[key] = self._setval(pos, val)
else:
# The new value doesn't fit in the (padded) space used
# by the old value. The blocks used by the old value are
# forever lost.
self._index[key] = self._addval(val)
# Note that _index may be out of synch with the directory
# file now: _setval() and _addval() don't update the directory
# file. This also means that the on-disk directory and data
# files are in a mutually inconsistent state, and they'll
# remain that way until _commit() is called. Note that this
# is a disaster (for the database) if the program crashes
# (so that _commit() never gets called).
def __delitem__(self, key):
self._modified = True
# The blocks used by the associated value are lost.
del self._index[key]
# XXX It's unclear why we do a _commit() here (the code always
# XXX has, so I'm not changing it). _setitem__ doesn't try to
# XXX keep the directory file in synch. Why should we? Or
# XXX why shouldn't __setitem__?
self._commit()
def keys(self):
return self._index.keys()
def has_key(self, key):
return key in self._index
def __contains__(self, key):
return key in self._index
def iterkeys(self):
return self._index.iterkeys()
__iter__ = iterkeys
def __len__(self):
return len(self._index)
def close(self):
try:
self._commit()
finally:
self._index = self._datfile = self._dirfile = self._bakfile = None
__del__ = close
def _chmod (self, file):
if hasattr(self._os, 'chmod'):
self._os.chmod(file, self._mode)
def open(file, flag=None, mode=0666):
"""Open the database file, filename, and return corresponding object.
The flag argument, used to control how the database is opened in the
other DBM implementations, is ignored in the dumbdbm module; the
database is always opened for update, and will be created if it does
not exist.
The optional mode argument is the UNIX mode of the file, used only when
the database has to be created. It defaults to octal code 0666 (and
will be modified by the prevailing umask).
"""
# flag argument is currently ignored
# Modify mode depending on the umask
try:
um = _os.umask(0)
_os.umask(um)
except AttributeError:
pass
else:
# Turn off any bits that are set in the umask
mode = mode & (~um)
return _Database(file, mode, flag)
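# --- Example (sketch): basic use of this module.  The path is illustrative;
# note that open() here is dumbdbm's open, not the builtin.
def _dumbdbm_example():
    db = open('/tmp/demo_db')   # creates demo_db.dir / demo_db.dat if needed
    db['spam'] = 'eggs'         # keys and values must be str
    print db['spam']
    print db.keys()
    db.close()                  # flushes the in-memory index via _commit()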
"""Drop-in replacement for the thread module.
Meant to be used as a brain-dead substitute so that threaded code does
not need to be rewritten when the thread module is not present.
Suggested usage is::
try:
import thread
except ImportError:
import dummy_thread as thread
"""
# Exports only things specified by thread documentation;
# skipping obsolete synonyms allocate(), start_new(), exit_thread().
__all__ = ['error', 'start_new_thread', 'exit', 'get_ident', 'allocate_lock',
'interrupt_main', 'LockType']
import traceback as _traceback
class error(Exception):
"""Dummy implementation of thread.error."""
def __init__(self, *args):
self.args = args
def start_new_thread(function, args, kwargs={}):
"""Dummy implementation of thread.start_new_thread().
Compatibility is maintained by making sure that ``args`` is a
tuple and ``kwargs`` is a dictionary. If an exception is raised
and it is SystemExit (which can be done by thread.exit()) it is
caught and nothing is done; all other exceptions are printed out
by using traceback.print_exc().
If the executed function calls interrupt_main the KeyboardInterrupt will be
raised when the function returns.
"""
if type(args) != type(tuple()):
raise TypeError("2nd arg must be a tuple")
if type(kwargs) != type(dict()):
raise TypeError("3rd arg must be a dict")
global _main
_main = False
try:
function(*args, **kwargs)
except SystemExit:
pass
except:
_traceback.print_exc()
_main = True
global _interrupt
if _interrupt:
_interrupt = False
raise KeyboardInterrupt
def exit():
"""Dummy implementation of thread.exit()."""
raise SystemExit
def get_ident():
"""Dummy implementation of thread.get_ident().
Since this module should only be used when threadmodule is not
available, it is safe to assume that the current process is the
only thread. Thus a constant can be safely returned.
"""
return -1
def allocate_lock():
"""Dummy implementation of thread.allocate_lock()."""
return LockType()
def stack_size(size=None):
"""Dummy implementation of thread.stack_size()."""
if size is not None:
raise error("setting thread stack size not supported")
return 0
class LockType(object):
"""Class implementing dummy implementation of thread.LockType.
Compatibility is maintained by maintaining self.locked_status
which is a boolean that stores the state of the lock. Pickling of
the lock, though, should not be done since if the thread module is
then used with an unpickled ``lock()`` from here problems could
occur from this class not having atomic methods.
"""
def __init__(self):
self.locked_status = False
def acquire(self, waitflag=None):
"""Dummy implementation of acquire().
For blocking calls, self.locked_status is automatically set to
True and returned appropriately based on value of
``waitflag``. If it is non-blocking, then the value is
actually checked and not set if it is already acquired. This
is all done so that threading.Condition's assert statements
aren't triggered and throw a little fit.
"""
if waitflag is None or waitflag:
self.locked_status = True
return True
else:
if not self.locked_status:
self.locked_status = True
return True
else:
return False
__enter__ = acquire
def __exit__(self, typ, val, tb):
self.release()
def release(self):
"""Release the dummy lock."""
# XXX Perhaps shouldn't actually bother to test? Could lead
# to problems for complex, threaded code.
if not self.locked_status:
raise error
self.locked_status = False
return True
def locked(self):
return self.locked_status
# Used to signal that interrupt_main was called in a "thread"
_interrupt = False
# True when not executing in a "thread"
_main = True
def interrupt_main():
"""Set _interrupt flag to True to have start_new_thread raise
KeyboardInterrupt upon exiting."""
if _main:
raise KeyboardInterrupt
else:
global _interrupt
_interrupt = True
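# --- Example (sketch): with this module standing in for thread, "spawning"
# simply runs the function synchronously in the caller.
def _dummy_thread_example():
    def work(a, b):
        print a + b
    start_new_thread(work, (1, 2))   # prints 3 before returning
    lock = allocate_lock()
    lock.acquire()
    lock.release()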
"""Faux ``threading`` version using ``dummy_thread`` instead of ``thread``.
The module ``_dummy_threading`` is added to ``sys.modules`` in order
to not have ``threading`` considered imported. Had ``threading`` been
directly imported it would have made all subsequent imports succeed
regardless of whether ``thread`` was available which is not desired.
"""
from sys import modules as sys_modules
import dummy_thread
# Declaring now so as to not have to nest ``try``s to get proper clean-up.
holding_thread = False
holding_threading = False
holding__threading_local = False
try:
# Could have checked if ``thread`` was not in sys.modules and gone
# a different route, but decided to mirror technique used with
# ``threading`` below.
if 'thread' in sys_modules:
held_thread = sys_modules['thread']
holding_thread = True
# Must have some module named ``thread`` that implements its API
# in order to initially import ``threading``.
sys_modules['thread'] = sys_modules['dummy_thread']
if 'threading' in sys_modules:
# If ``threading`` is already imported, might as well prevent
# trying to import it more than needed by saving it if it is
# already imported before deleting it.
held_threading = sys_modules['threading']
holding_threading = True
del sys_modules['threading']
if '_threading_local' in sys_modules:
# If ``_threading_local`` is already imported, might as well prevent
# trying to import it more than needed by saving it if it is
# already imported before deleting it.
held__threading_local = sys_modules['_threading_local']
holding__threading_local = True
del sys_modules['_threading_local']
import threading
# Need a copy of the code kept somewhere...
sys_modules['_dummy_threading'] = sys_modules['threading']
del sys_modules['threading']
sys_modules['_dummy__threading_local'] = sys_modules['_threading_local']
del sys_modules['_threading_local']
from _dummy_threading import *
from _dummy_threading import __all__
finally:
# Put back ``threading`` if we overwrote earlier
if holding_threading:
sys_modules['threading'] = held_threading
del held_threading
del holding_threading
# Put back ``_threading_local`` if we overwrote earlier
if holding__threading_local:
sys_modules['_threading_local'] = held__threading_local
del held__threading_local
del holding__threading_local
# Put back ``thread`` if we overwrote, else del the entry we made
if holding_thread:
sys_modules['thread'] = held_thread
del held_thread
else:
del sys_modules['thread']
del holding_thread
del dummy_thread
del sys_modules
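# --- Example (sketch): the faux threading API imported above runs
# everything synchronously, so start() returns only after run() completes.
def _dummy_threading_example():
    t = Thread(target=lambda: None)
    t.start()
    t.join()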
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Ben Gertzfield
# Contact: [email protected]
"""Base64 content transfer encoding per RFCs 2045-2047.
This module handles the content transfer encoding method defined in RFC 2045
to encode arbitrary 8-bit data using the encoding known as Base64, which
maps each group of three 8-bit bytes into four 7-bit characters.
It is used in the MIME standards for email to attach images, audio, and text
using some 8-bit character sets to messages.
This module provides an interface to encode and decode both headers and bodies
with Base64 encoding.
RFC 2045 defines a method for including character set information in an
`encoded-word' in a header. This method is commonly used for 8-bit real names
in To:, From:, Cc:, etc. fields, as well as Subject: lines.
This module does not do the line wrapping or end-of-line character conversion
necessary for proper internationalized headers; it only does dumb encoding and
decoding. To deal with the various line wrapping issues, use the email.header
module.
"""
__all__ = [
'base64_len',
'body_decode',
'body_encode',
'decode',
'decodestring',
'encode',
'encodestring',
'header_encode',
]
from binascii import b2a_base64, a2b_base64
from email.utils import fix_eols
CRLF = '\r\n'
NL = '\n'
EMPTYSTRING = ''
# See also Charset.py
MISC_LEN = 7
# Helpers
def base64_len(s):
"""Return the length of s when it is encoded with base64."""
groups_of_3, leftover = divmod(len(s), 3)
# 4 bytes out for each 3 bytes (or nonzero fraction thereof) in.
# Thanks, Tim!
n = groups_of_3 * 4
if leftover:
n += 4
return n
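# For example, base64_len('ab') == 4 and base64_len('abcd') == 8: every
# complete or partial group of three input bytes costs four output characters.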
def header_encode(header, charset='iso-8859-1', keep_eols=False,
maxlinelen=76, eol=NL):
"""Encode a single header line with Base64 encoding in a given charset.
Defined in RFC 2045, this Base64 encoding is identical to normal Base64
encoding, except that each line must be intelligently wrapped (respecting
the Base64 encoding), and subsequent lines must start with a space.
charset names the character set to use to encode the header. It defaults
to iso-8859-1.
End-of-line characters (\\r, \\n, \\r\\n) will be automatically converted
to the canonical email line separator \\r\\n unless the keep_eols
parameter is True (the default is False).
Each line of the header will be terminated in the value of eol, which
defaults to "\\n". Set this to "\\r\\n" if you are using the result of
this function directly in email.
The resulting string will be in the form:
"=?charset?b?WW/5ciBtYXp66XLrIHf8eiBhIGhhbXBzdGHuciBBIFlv+XIgbWF6euly?=\\n
=?charset?b?6yB3/HogYSBoYW1wc3Rh7nIgQkMgWW/5ciBtYXp66XLrIHf8eiBhIGhh?="
with each line wrapped at, at most, maxlinelen characters (defaults to 76
characters).
"""
# Return empty headers unchanged
if not header:
return header
if not keep_eols:
header = fix_eols(header)
# Base64 encode each line, in encoded chunks no greater than maxlinelen in
# length, after the RFC chrome is added in.
base64ed = []
max_encoded = maxlinelen - len(charset) - MISC_LEN
max_unencoded = max_encoded * 3 // 4
for i in range(0, len(header), max_unencoded):
base64ed.append(b2a_base64(header[i:i+max_unencoded]))
# Now add the RFC chrome to each encoded chunk
lines = []
for line in base64ed:
# Ignore the last character of each line if it is a newline
if line.endswith(NL):
line = line[:-1]
# Add the chrome
lines.append('=?%s?b?%s?=' % (charset, line))
# Glue the lines together and return it. BAW: should we be able to
# specify the leading whitespace in the joiner?
joiner = eol + ' '
return joiner.join(lines)
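# --- Example (sketch): encoding a header containing Latin-1 bytes; the
# sample text is illustrative.  The result has the =?charset?b?...?= shape
# described above.
def _header_encode_example():
    print header_encode('Caf\xe9 menu', charset='iso-8859-1')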
def encode(s, binary=True, maxlinelen=76, eol=NL):
"""Encode a string with base64.
Each line will be wrapped at, at most, maxlinelen characters (defaults to
76 characters).
If binary is False, end-of-line characters will be converted to the
canonical email end-of-line sequence \\r\\n. Otherwise they will be left
verbatim (this is the default).
Each line of encoded text will end with eol, which defaults to "\\n". Set
this to "\\r\\n" if you will be using the result of this function directly
in an email.
"""
if not s:
return s
if not binary:
s = fix_eols(s)
encvec = []
max_unencoded = maxlinelen * 3 // 4
for i in range(0, len(s), max_unencoded):
# BAW: should encode() inherit b2a_base64()'s dubious behavior in
# adding a newline to the encoded string?
enc = b2a_base64(s[i:i + max_unencoded])
if enc.endswith(NL) and eol != NL:
enc = enc[:-1] + eol
encvec.append(enc)
return EMPTYSTRING.join(encvec)
# For convenience and backwards compatibility w/ standard base64 module
body_encode = encode
encodestring = encode
def decode(s, convert_eols=None):
"""Decode a raw base64 string.
If convert_eols is set to a string value, all canonical email linefeeds,
e.g. "\\r\\n", in the decoded text will be converted to the value of
convert_eols. os.linesep is a good choice for convert_eols if you are
decoding a text attachment.
This function does not parse a full MIME header value encoded with
base64 (like =?iso-8859-1?b?bmloISBuaWgh?=) -- please use the high
level email.header class for that functionality.
"""
if not s:
return s
dec = a2b_base64(s)
if convert_eols:
return dec.replace(CRLF, convert_eols)
return dec
# For convenience and backwards compatibility w/ standard base64 module
body_decode = decode
decodestring = decode
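# --- Example (sketch): a body-encoding round trip; the sample text is
# illustrative.
def _body_roundtrip_example():
    enc = body_encode('hello\r\nworld\r\n')
    print enc                                   # base64, wrapped per maxlinelen
    print body_decode(enc, convert_eols='\n')   # 'hello\nworld\n'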
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Ben Gertzfield, Barry Warsaw
# Contact: [email protected]
__all__ = [
'Charset',
'add_alias',
'add_charset',
'add_codec',
]
import codecs
import email.base64mime
import email.quoprimime
from email import errors
from email.encoders import encode_7or8bit
# Flags for types of header encodings
QP = 1 # Quoted-Printable
BASE64 = 2 # Base64
SHORTEST = 3 # the shorter of QP and base64, but only for headers
# In "=?charset?q?hello_world?=", the =?, ?q?, and ?= add up to 7
MISC_LEN = 7
DEFAULT_CHARSET = 'us-ascii'
# Defaults
CHARSETS = {
# input header enc body enc output conv
'iso-8859-1': (QP, QP, None),
'iso-8859-2': (QP, QP, None),
'iso-8859-3': (QP, QP, None),
'iso-8859-4': (QP, QP, None),
# iso-8859-5 is Cyrillic, and not especially used
# iso-8859-6 is Arabic, also not particularly used
# iso-8859-7 is Greek, QP will not make it readable
# iso-8859-8 is Hebrew, QP will not make it readable
'iso-8859-9': (QP, QP, None),
'iso-8859-10': (QP, QP, None),
# iso-8859-11 is Thai, QP will not make it readable
'iso-8859-13': (QP, QP, None),
'iso-8859-14': (QP, QP, None),
'iso-8859-15': (QP, QP, None),
'iso-8859-16': (QP, QP, None),
'windows-1252':(QP, QP, None),
'viscii': (QP, QP, None),
'us-ascii': (None, None, None),
'big5': (BASE64, BASE64, None),
'gb2312': (BASE64, BASE64, None),
'euc-jp': (BASE64, None, 'iso-2022-jp'),
'shift_jis': (BASE64, None, 'iso-2022-jp'),
'iso-2022-jp': (BASE64, None, None),
'koi8-r': (BASE64, BASE64, None),
'utf-8': (SHORTEST, BASE64, 'utf-8'),
# We're making this one up to represent raw unencoded 8-bit
'8bit': (None, BASE64, 'utf-8'),
}
# Aliases for other commonly-used names for character sets. Map
# them to the real ones used in email.
ALIASES = {
'latin_1': 'iso-8859-1',
'latin-1': 'iso-8859-1',
'latin_2': 'iso-8859-2',
'latin-2': 'iso-8859-2',
'latin_3': 'iso-8859-3',
'latin-3': 'iso-8859-3',
'latin_4': 'iso-8859-4',
'latin-4': 'iso-8859-4',
'latin_5': 'iso-8859-9',
'latin-5': 'iso-8859-9',
'latin_6': 'iso-8859-10',
'latin-6': 'iso-8859-10',
'latin_7': 'iso-8859-13',
'latin-7': 'iso-8859-13',
'latin_8': 'iso-8859-14',
'latin-8': 'iso-8859-14',
'latin_9': 'iso-8859-15',
'latin-9': 'iso-8859-15',
'latin_10':'iso-8859-16',
'latin-10':'iso-8859-16',
'cp949': 'ks_c_5601-1987',
'euc_jp': 'euc-jp',
'euc_kr': 'euc-kr',
'ascii': 'us-ascii',
}
# Map charsets to their Unicode codec strings.
CODEC_MAP = {
'gb2312': 'eucgb2312_cn',
'big5': 'big5_tw',
# Hack: We don't want *any* conversion for stuff marked us-ascii, as all
# sorts of garbage might be sent to us in the guise of 7-bit us-ascii.
# Let that stuff pass through without conversion to/from Unicode.
'us-ascii': None,
}
# Convenience functions for extending the above mappings
def add_charset(charset, header_enc=None, body_enc=None, output_charset=None):
"""Add character set properties to the global registry.
charset is the input character set, and must be the canonical name of a
character set.
    Optional header_enc and body_enc are each either Charset.QP for
quoted-printable, Charset.BASE64 for base64 encoding, Charset.SHORTEST for
the shortest of qp or base64 encoding, or None for no encoding. SHORTEST
is only valid for header_enc. It describes how message headers and
message bodies in the input charset are to be encoded. Default is no
encoding.
Optional output_charset is the character set that the output should be
in. Conversions will proceed from input charset, to Unicode, to the
output charset when the method Charset.convert() is called. The default
is to output in the same character set as the input.
Both input_charset and output_charset must have Unicode codec entries in
the module's charset-to-codec mapping; use add_codec(charset, codecname)
to add codecs the module does not know about. See the codecs module's
documentation for more information.
"""
if body_enc == SHORTEST:
raise ValueError('SHORTEST not allowed for body_enc')
CHARSETS[charset] = (header_enc, body_enc, output_charset)
def add_alias(alias, canonical):
"""Add a character set alias.
alias is the alias name, e.g. latin-1
canonical is the character set's canonical name, e.g. iso-8859-1
"""
ALIASES[alias] = canonical
def add_codec(charset, codecname):
"""Add a codec that map characters in the given charset to/from Unicode.
charset is the canonical name of a character set. codecname is the name
of a Python codec, as appropriate for the second argument to the unicode()
built-in, or to the encode() method of a Unicode string.
"""
CODEC_MAP[charset] = codecname
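# --- Example (sketch): registering a made-up character set; the names
# 'x-demo' and 'demo' are hypothetical.
def _registry_example():
    add_charset('x-demo', QP, QP, None)   # QP headers and bodies, no conversion
    add_alias('demo', 'x-demo')
    add_codec('x-demo', 'latin_1')        # convert via the latin_1 codec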
class Charset:
"""Map character sets to their email properties.
This class provides information about the requirements imposed on email
for a specific character set. It also provides convenience routines for
converting between character sets, given the availability of the
applicable codecs. Given a character set, it will do its best to provide
information on how to use that character set in an email in an
RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64
when used in email headers or bodies. Certain character sets must be
converted outright, and are not allowed in email. Instances of this
module expose the following information about a character set:
input_charset: The initial character set specified. Common aliases
are converted to their `official' email names (e.g. latin_1
is converted to iso-8859-1). Defaults to 7-bit us-ascii.
header_encoding: If the character set must be encoded before it can be
used in an email header, this attribute will be set to
Charset.QP (for quoted-printable), Charset.BASE64 (for
base64 encoding), or Charset.SHORTEST for the shortest of
QP or BASE64 encoding. Otherwise, it will be None.
body_encoding: Same as header_encoding, but describes the encoding for the
mail message's body, which indeed may be different than the
header encoding. Charset.SHORTEST is not allowed for
body_encoding.
output_charset: Some character sets must be converted before they can be
used in email headers or bodies. If the input_charset is
one of them, this attribute will contain the name of the
charset output will be converted to. Otherwise, it will
be None.
input_codec: The name of the Python codec used to convert the
input_charset to Unicode. If no conversion codec is
necessary, this attribute will be None.
output_codec: The name of the Python codec used to convert Unicode
to the output_charset. If no conversion codec is necessary,
this attribute will have the same value as the input_codec.
"""
def __init__(self, input_charset=DEFAULT_CHARSET):
# RFC 2046, $4.1.2 says charsets are not case sensitive. We coerce to
# unicode because its .lower() is locale insensitive. If the argument
# is already a unicode, we leave it at that, but ensure that the
# charset is ASCII, as the standard (RFC XXX) requires.
try:
if isinstance(input_charset, unicode):
input_charset.encode('ascii')
else:
input_charset = unicode(input_charset, 'ascii')
except UnicodeError:
raise errors.CharsetError(input_charset)
input_charset = input_charset.lower().encode('ascii')
# Set the input charset after filtering through the aliases and/or codecs
if not (input_charset in ALIASES or input_charset in CHARSETS):
try:
input_charset = codecs.lookup(input_charset).name
except LookupError:
pass
self.input_charset = ALIASES.get(input_charset, input_charset)
# We can try to guess which encoding and conversion to use by the
# charset_map dictionary. Try that first, but let the user override
# it.
henc, benc, conv = CHARSETS.get(self.input_charset,
(SHORTEST, BASE64, None))
if not conv:
conv = self.input_charset
# Set the attributes, allowing the arguments to override the default.
self.header_encoding = henc
self.body_encoding = benc
self.output_charset = ALIASES.get(conv, conv)
# Now set the codecs. If one isn't defined for input_charset,
# guess and try a Unicode codec with the same name as input_codec.
self.input_codec = CODEC_MAP.get(self.input_charset,
self.input_charset)
self.output_codec = CODEC_MAP.get(self.output_charset,
self.output_charset)
def __str__(self):
return self.input_charset.lower()
__repr__ = __str__
def __eq__(self, other):
return str(self) == str(other).lower()
def __ne__(self, other):
return not self.__eq__(other)
def get_body_encoding(self):
"""Return the content-transfer-encoding used for body encoding.
This is either the string `quoted-printable' or `base64' depending on
the encoding used, or it is a function in which case you should call
the function with a single argument, the Message object being
encoded. The function should then set the Content-Transfer-Encoding
header itself to whatever is appropriate.
Returns "quoted-printable" if self.body_encoding is QP.
Returns "base64" if self.body_encoding is BASE64.
Returns "7bit" otherwise.
"""
assert self.body_encoding != SHORTEST
if self.body_encoding == QP:
return 'quoted-printable'
elif self.body_encoding == BASE64:
return 'base64'
else:
return encode_7or8bit
def convert(self, s):
"""Convert a string from the input_codec to the output_codec."""
if self.input_codec != self.output_codec:
return unicode(s, self.input_codec).encode(self.output_codec)
else:
return s
def to_splittable(self, s):
"""Convert a possibly multibyte string to a safely splittable format.
Uses the input_codec to try and convert the string to Unicode, so it
can be safely split on character boundaries (even for multibyte
characters).
Returns the string as-is if it isn't known how to convert it to
Unicode with the input_charset.
Characters that could not be converted to Unicode will be replaced
with the Unicode replacement character U+FFFD.
"""
if isinstance(s, unicode) or self.input_codec is None:
return s
try:
return unicode(s, self.input_codec, 'replace')
except LookupError:
# Input codec not installed on system, so return the original
# string unchanged.
return s
def from_splittable(self, ustr, to_output=True):
"""Convert a splittable string back into an encoded string.
Uses the proper codec to try and convert the string from Unicode back
into an encoded format. Return the string as-is if it is not Unicode,
or if it could not be converted from Unicode.
Characters that could not be converted from Unicode will be replaced
with an appropriate character (usually '?').
If to_output is True (the default), uses output_codec to convert to an
encoded format. If to_output is False, uses input_codec.
"""
if to_output:
codec = self.output_codec
else:
codec = self.input_codec
if not isinstance(ustr, unicode) or codec is None:
return ustr
try:
return ustr.encode(codec, 'replace')
except LookupError:
# Output codec not installed
return ustr
def get_output_charset(self):
"""Return the output character set.
This is self.output_charset if that is not None, otherwise it is
self.input_charset.
"""
return self.output_charset or self.input_charset
def encoded_header_len(self, s):
"""Return the length of the encoded header string."""
cset = self.get_output_charset()
# The len(s) of a 7bit encoding is len(s)
if self.header_encoding == BASE64:
return email.base64mime.base64_len(s) + len(cset) + MISC_LEN
elif self.header_encoding == QP:
return email.quoprimime.header_quopri_len(s) + len(cset) + MISC_LEN
elif self.header_encoding == SHORTEST:
lenb64 = email.base64mime.base64_len(s)
lenqp = email.quoprimime.header_quopri_len(s)
return min(lenb64, lenqp) + len(cset) + MISC_LEN
else:
return len(s)
def header_encode(self, s, convert=False):
"""Header-encode a string, optionally converting it to output_charset.
If convert is True, the string will be converted from the input
charset to the output charset automatically. This is not useful for
multibyte character sets, which have line length issues (multibyte
characters must be split on a character, not a byte boundary); use the
high-level Header class to deal with these issues. convert defaults
to False.
The type of encoding (base64 or quoted-printable) will be based on
self.header_encoding.
"""
cset = self.get_output_charset()
if convert:
s = self.convert(s)
# 7bit/8bit encodings return the string unchanged (modulo conversions)
if self.header_encoding == BASE64:
return email.base64mime.header_encode(s, cset)
elif self.header_encoding == QP:
return email.quoprimime.header_encode(s, cset, maxlinelen=None)
elif self.header_encoding == SHORTEST:
lenb64 = email.base64mime.base64_len(s)
lenqp = email.quoprimime.header_quopri_len(s)
if lenb64 < lenqp:
return email.base64mime.header_encode(s, cset)
else:
return email.quoprimime.header_encode(s, cset, maxlinelen=None)
else:
return s
def body_encode(self, s, convert=True):
"""Body-encode a string and convert it to output_charset.
If convert is True (the default), the string will be converted from
the input charset to output charset automatically. Unlike
header_encode(), there are no issues with byte boundaries and
multibyte charsets in email bodies, so this is usually pretty safe.
The type of encoding (base64 or quoted-printable) will be based on
self.body_encoding.
"""
if convert:
s = self.convert(s)
        # 7bit/8bit encodings return the string unchanged (modulo conversions)
if self.body_encoding is BASE64:
return email.base64mime.body_encode(s)
elif self.body_encoding is QP:
return email.quoprimime.body_encode(s)
else:
return s
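# --- Example (sketch): the lookup behaviour described above.
def _charset_example():
    cs = Charset('latin-1')          # alias resolves to iso-8859-1
    print str(cs)                    # iso-8859-1
    print cs.get_body_encoding()     # quoted-printable
    print cs.get_output_charset()    # iso-8859-1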
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Encodings and related functions."""
__all__ = [
'encode_7or8bit',
'encode_base64',
'encode_noop',
'encode_quopri',
]
import base64
from quopri import encodestring as _encodestring
def _qencode(s):
enc = _encodestring(s, quotetabs=True)
# Must encode spaces, which quopri.encodestring() doesn't do
return enc.replace(' ', '=20')
def _bencode(s):
# We can't quite use base64.encodestring() since it tacks on a "courtesy
# newline". Blech!
if not s:
return s
hasnewline = (s[-1] == '\n')
value = base64.encodestring(s)
if not hasnewline and value[-1] == '\n':
return value[:-1]
return value
def encode_base64(msg):
"""Encode the message's payload in Base64.
Also, add an appropriate Content-Transfer-Encoding header.
"""
orig = msg.get_payload()
encdata = _bencode(orig)
msg.set_payload(encdata)
msg['Content-Transfer-Encoding'] = 'base64'
def encode_quopri(msg):
"""Encode the message's payload in quoted-printable.
Also, add an appropriate Content-Transfer-Encoding header.
"""
orig = msg.get_payload()
encdata = _qencode(orig)
msg.set_payload(encdata)
msg['Content-Transfer-Encoding'] = 'quoted-printable'
def encode_7or8bit(msg):
"""Set the Content-Transfer-Encoding header to 7bit or 8bit."""
orig = msg.get_payload()
if orig is None:
# There's no payload. For backwards compatibility we use 7bit
msg['Content-Transfer-Encoding'] = '7bit'
return
# We play a trick to make this go fast. If encoding to ASCII succeeds, we
# know the data must be 7bit, otherwise treat it as 8bit.
try:
orig.encode('ascii')
except UnicodeError:
msg['Content-Transfer-Encoding'] = '8bit'
else:
msg['Content-Transfer-Encoding'] = '7bit'
def encode_noop(msg):
"""Do nothing."""
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""email package exception classes."""
class MessageError(Exception):
"""Base class for errors in the email package."""
class MessageParseError(MessageError):
"""Base class for message parsing errors."""
class HeaderParseError(MessageParseError):
"""Error while parsing headers."""
class BoundaryError(MessageParseError):
"""Couldn't find terminating boundary."""
class MultipartConversionError(MessageError, TypeError):
"""Conversion to a multipart is prohibited."""
class CharsetError(MessageError):
"""An illegal charset was given."""
# These are parsing defects which the parser was able to work around.
class MessageDefect:
"""Base class for a message defect."""
def __init__(self, line=None):
self.line = line
class NoBoundaryInMultipartDefect(MessageDefect):
"""A message claimed to be a multipart but had no boundary parameter."""
class StartBoundaryNotFoundDefect(MessageDefect):
"""The claimed start boundary was never found."""
class FirstHeaderLineIsContinuationDefect(MessageDefect):
"""A message had a continuation line as its first header line."""
class MisplacedEnvelopeHeaderDefect(MessageDefect):
"""A 'Unix-from' header was found in the middle of a header block."""
class MalformedHeaderDefect(MessageDefect):
"""Found a header that was missing a colon, or was otherwise malformed."""
class MultipartInvariantViolationDefect(MessageDefect):
"""A message claimed to be a multipart but no subparts were found."""
# Copyright (C) 2004-2006 Python Software Foundation
# Authors: Baxter, Wouters and Warsaw
# Contact: [email protected]
"""FeedParser - An email feed parser.
The feed parser implements an interface for incrementally parsing an email
message, line by line. This has advantages for certain applications, such as
those reading email messages off a socket.
FeedParser.feed() is the primary interface for pushing new data into the
parser. It returns when there's nothing more it can do with the available
data. When you have no more data to push into the parser, call .close().
This completes the parsing and returns the root message object.
The other advantage of this parser is that it will never raise a parsing
exception. Instead, when it finds something unexpected, it adds a 'defect' to
the current message. Defects are just instances that live on the message
object's .defects attribute.
"""
__all__ = ['FeedParser']
import re
from email import errors
from email import message
NLCRE = re.compile('\r\n|\r|\n')
NLCRE_bol = re.compile('(\r\n|\r|\n)')
NLCRE_eol = re.compile('(\r\n|\r|\n)\Z')
NLCRE_crack = re.compile('(\r\n|\r|\n)')
# RFC 2822 $3.6.8 Optional fields. ftext is %d33-57 / %d59-126, Any character
# except controls, SP, and ":".
headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:|[\t ])')
EMPTYSTRING = ''
NL = '\n'
NeedMoreData = object()
class BufferedSubFile(object):
"""A file-ish object that can have new data loaded into it.
You can also push and pop line-matching predicates onto a stack. When the
current predicate matches the current line, a false EOF response
(i.e. empty string) is returned instead. This lets the parser adhere to a
simple abstraction -- it parses until EOF closes the current message.
"""
def __init__(self):
# Chunks of the last partial line pushed into this object.
self._partial = []
# The list of full, pushed lines, in reverse order
self._lines = []
# The stack of false-EOF checking predicates.
self._eofstack = []
# A flag indicating whether the file has been closed or not.
self._closed = False
def push_eof_matcher(self, pred):
self._eofstack.append(pred)
def pop_eof_matcher(self):
return self._eofstack.pop()
def close(self):
# Don't forget any trailing partial line.
self.pushlines(''.join(self._partial).splitlines(True))
self._partial = []
self._closed = True
def readline(self):
if not self._lines:
if self._closed:
return ''
return NeedMoreData
# Pop the line off the stack and see if it matches the current
# false-EOF predicate.
line = self._lines.pop()
# RFC 2046, section 5.1.2 requires us to recognize outer level
# boundaries at any level of inner nesting. Do this, but be sure it's
# in the order of most to least nested.
for ateof in self._eofstack[::-1]:
if ateof(line):
# We're at the false EOF. But push the last line back first.
self._lines.append(line)
return ''
return line
def unreadline(self, line):
# Let the consumer push a line back into the buffer.
assert line is not NeedMoreData
self._lines.append(line)
def push(self, data):
"""Push some new data into this object."""
        # Crack into lines, but preserve the linesep characters on the end of each line
parts = data.splitlines(True)
if not parts or not parts[0].endswith(('\n', '\r')):
# No new complete lines, so just accumulate partials
self._partial += parts
return
if self._partial:
# If there are previous leftovers, complete them now
self._partial.append(parts[0])
parts[0:1] = ''.join(self._partial).splitlines(True)
del self._partial[:]
# If the last element of the list does not end in a newline, then treat
# it as a partial line. We only check for '\n' here because a line
# ending with '\r' might be a line that was split in the middle of a
# '\r\n' sequence (see bugs 1555570 and 1721862).
if not parts[-1].endswith('\n'):
self._partial = [parts.pop()]
self.pushlines(parts)
def pushlines(self, lines):
# Reverse and insert at the front of the lines.
self._lines[:0] = lines[::-1]
def is_closed(self):
return self._closed
def __iter__(self):
return self
def next(self):
line = self.readline()
if line == '':
raise StopIteration
return line
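# --- Example (sketch): incremental parsing with FeedParser, defined next;
# the message text is illustrative.
def _feedparser_example():
    p = FeedParser()
    p.feed('From: [email protected]\r\nSubject: hi\r\n')
    p.feed('\r\nbody text\r\n')
    msg = p.close()
    print msg['subject']   # hi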
class FeedParser:
"""A feed-style parser of email."""
def __init__(self, _factory=message.Message):
"""_factory is called with no arguments to create a new message obj"""
self._factory = _factory
self._input = BufferedSubFile()
self._msgstack = []
self._parse = self._parsegen().next
self._cur = None
self._last = None
self._headersonly = False
# Non-public interface for supporting Parser's headersonly flag
def _set_headersonly(self):
self._headersonly = True
def feed(self, data):
"""Push more data into the parser."""
self._input.push(data)
self._call_parse()
def _call_parse(self):
try:
self._parse()
except StopIteration:
pass
def close(self):
"""Parse all remaining data and return the root message object."""
self._input.close()
self._call_parse()
root = self._pop_message()
assert not self._msgstack
# Look for final set of defects
if root.get_content_maintype() == 'multipart' \
and not root.is_multipart():
root.defects.append(errors.MultipartInvariantViolationDefect())
return root
def _new_message(self):
msg = self._factory()
if self._cur and self._cur.get_content_type() == 'multipart/digest':
msg.set_default_type('message/rfc822')
if self._msgstack:
self._msgstack[-1].attach(msg)
self._msgstack.append(msg)
self._cur = msg
self._last = msg
def _pop_message(self):
retval = self._msgstack.pop()
if self._msgstack:
self._cur = self._msgstack[-1]
else:
self._cur = None
return retval
def _parsegen(self):
# Create a new message and start by parsing headers.
self._new_message()
headers = []
# Collect the headers, searching for a line that doesn't match the RFC
# 2822 header or continuation pattern (including an empty line).
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
if not headerRE.match(line):
# If we saw the RFC defined header/body separator
# (i.e. newline), just throw it away. Otherwise the line is
# part of the body so push it back.
if not NLCRE.match(line):
self._input.unreadline(line)
break
headers.append(line)
# Done with the headers, so parse them and figure out what we're
# supposed to see in the body of the message.
self._parse_headers(headers)
# Headers-only parsing is a backwards compatibility hack, which was
# necessary in the older parser, which could raise errors. All
# remaining lines in the input are thrown into the message body.
if self._headersonly:
lines = []
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
if line == '':
break
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
return
if self._cur.get_content_type() == 'message/delivery-status':
# message/delivery-status contains blocks of headers separated by
# a blank line. We'll represent each header block as a separate
# nested message object, but the processing is a bit different
# than standard message/* types because there is no body for the
# nested messages. A blank line separates the subparts.
while True:
self._input.push_eof_matcher(NLCRE.match)
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
msg = self._pop_message()
# We need to pop the EOF matcher in order to tell if we're at
# the end of the current file, not the end of the last block
# of message headers.
self._input.pop_eof_matcher()
# The input stream must be sitting at the newline or at the
# EOF. We want to see if we're at the end of this subpart, so
# first consume the blank line, then test the next line to see
# if we're at this subpart's EOF.
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
break
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
break
if line == '':
break
# Not at EOF so this is a line we're going to need.
self._input.unreadline(line)
return
if self._cur.get_content_maintype() == 'message':
# If the message claims to be a message/* type, then what follows is
# another RFC 2822 message.
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
self._pop_message()
return
if self._cur.get_content_maintype() == 'multipart':
boundary = self._cur.get_boundary()
if boundary is None:
# The message /claims/ to be a multipart but it has not
# defined a boundary. That's a problem which we'll handle by
# reading everything until the EOF and marking the message as
# defective.
self._cur.defects.append(errors.NoBoundaryInMultipartDefect())
lines = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
return
# Create a line match predicate which matches the inter-part
# boundary as well as the end-of-multipart boundary. Don't push
# this onto the input stream until we've scanned past the
# preamble.
separator = '--' + boundary
boundaryre = re.compile(
'(?P<sep>' + re.escape(separator) +
r')(?P<end>--)?(?P<ws>[ \t]*)(?P<linesep>\r\n|\r|\n)?$')
capturing_preamble = True
preamble = []
linesep = False
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
if line == '':
break
mo = boundaryre.match(line)
if mo:
# If we're looking at the end boundary, we're done with
# this multipart. If there was a newline at the end of
# the closing boundary, then we need to initialize the
# epilogue with the empty string (see below).
if mo.group('end'):
linesep = mo.group('linesep')
break
# We saw an inter-part boundary. Were we in the preamble?
if capturing_preamble:
if preamble:
# According to RFC 2046, the last newline belongs
# to the boundary.
lastline = preamble[-1]
eolmo = NLCRE_eol.search(lastline)
if eolmo:
preamble[-1] = lastline[:-len(eolmo.group(0))]
self._cur.preamble = EMPTYSTRING.join(preamble)
capturing_preamble = False
self._input.unreadline(line)
continue
# We saw a boundary separating two parts. Consume any
# multiple boundary lines that may be following. Our
# interpretation of RFC 2046 BNF grammar does not produce
# body parts within such double boundaries.
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
mo = boundaryre.match(line)
if not mo:
self._input.unreadline(line)
break
# Recurse to parse this subpart; the input stream points
# at the subpart's first line.
self._input.push_eof_matcher(boundaryre.match)
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
# Because of RFC 2046, the newline preceding the boundary
# separator actually belongs to the boundary, not the
# previous subpart's payload (or epilogue if the previous
# part is a multipart).
if self._last.get_content_maintype() == 'multipart':
epilogue = self._last.epilogue
if epilogue == '':
self._last.epilogue = None
elif epilogue is not None:
mo = NLCRE_eol.search(epilogue)
if mo:
end = len(mo.group(0))
self._last.epilogue = epilogue[:-end]
else:
payload = self._last.get_payload()
if isinstance(payload, basestring):
mo = NLCRE_eol.search(payload)
if mo:
payload = payload[:-len(mo.group(0))]
self._last.set_payload(payload)
self._input.pop_eof_matcher()
self._pop_message()
# Set the multipart up for newline cleansing, which will
# happen if we're in a nested multipart.
self._last = self._cur
else:
# I think we must be in the preamble
assert capturing_preamble
preamble.append(line)
# We've seen either the EOF or the end boundary. If we're still
# capturing the preamble, we never saw the start boundary. Note
# that as a defect and store the captured text as the payload.
# Everything from here to the EOF is epilogue.
if capturing_preamble:
self._cur.defects.append(errors.StartBoundaryNotFoundDefect())
self._cur.set_payload(EMPTYSTRING.join(preamble))
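# Note: the loop below drains any remaining input but never appends
# to epilogue, so the epilogue stays empty in this error path; this
# matches the upstream CPython behaviour.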
epilogue = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
self._cur.epilogue = EMPTYSTRING.join(epilogue)
return
# If the end boundary ended in a newline, we'll need to make sure
# the epilogue isn't None
if linesep:
epilogue = ['']
else:
epilogue = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
epilogue.append(line)
# Any CRLF at the front of the epilogue is not technically part of
# the epilogue. Also, watch out for an empty string epilogue,
# which means a single newline.
if epilogue:
firstline = epilogue[0]
bolmo = NLCRE_bol.match(firstline)
if bolmo:
epilogue[0] = firstline[len(bolmo.group(0)):]
self._cur.epilogue = EMPTYSTRING.join(epilogue)
return
# Otherwise, it's some non-multipart type, so the entire rest of the
# file contents becomes the payload.
lines = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
def _parse_headers(self, lines):
# Passed a list of lines that make up the headers for the current msg
lastheader = ''
lastvalue = []
for lineno, line in enumerate(lines):
# Check for continuation
if line[0] in ' \t':
if not lastheader:
# The first line of the headers was a continuation. This
# is illegal, so let's note the defect, store the illegal
# line, and ignore it for purposes of headers.
defect = errors.FirstHeaderLineIsContinuationDefect(line)
self._cur.defects.append(defect)
continue
lastvalue.append(line)
continue
if lastheader:
# XXX reconsider the joining of folded lines
lhdr = EMPTYSTRING.join(lastvalue)[:-1].rstrip('\r\n')
self._cur[lastheader] = lhdr
lastheader, lastvalue = '', []
# Check for envelope header, i.e. unix-from
if line.startswith('From '):
if lineno == 0:
# Strip off the trailing newline
mo = NLCRE_eol.search(line)
if mo:
line = line[:-len(mo.group(0))]
self._cur.set_unixfrom(line)
continue
elif lineno == len(lines) - 1:
# Something looking like a unix-from at the end - it's
# probably the first line of the body, so push back the
# line and stop.
self._input.unreadline(line)
return
else:
# Weirdly placed unix-from line. Note this as a defect
# and ignore it.
defect = errors.MisplacedEnvelopeHeaderDefect(line)
self._cur.defects.append(defect)
continue
# Split the line on the colon separating field name from value.
i = line.find(':')
if i < 0:
defect = errors.MalformedHeaderDefect(line)
self._cur.defects.append(defect)
continue
lastheader = line[:i]
lastvalue = [line[i+1:].lstrip()]
# Done with all the lines, so handle the last header.
if lastheader:
# XXX reconsider the joining of folded lines
self._cur[lastheader] = EMPTYSTRING.join(lastvalue).rstrip('\r\n')
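# --- Illustrative usage (added sketch; not part of the original module) ---
# Driving the FeedParser incrementally: data can be fed in arbitrary chunks
# and close() returns the root Message object. The sample text is made up:
#
#     >>> from email.feedparser import FeedParser
#     >>> p = FeedParser()
#     >>> p.feed('Subject: test\r\n')
#     >>> p.feed('\r\nbody line\r\n')
#     >>> msg = p.close()
#     >>> msg['subject']
#     'test'
#     >>> msg.get_payload()
#     'body line\r\n'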
# Copyright (C) 2001-2010 Python Software Foundation
# Contact: email-sig@python.org
"""Classes to generate plain text from a message object tree."""
__all__ = ['Generator', 'DecodedGenerator']
import re
import sys
import time
import random
import warnings
from cStringIO import StringIO
from email.header import Header
UNDERSCORE = '_'
NL = '\n'
fcre = re.compile(r'^From ', re.MULTILINE)
def _is8bitstring(s):
if isinstance(s, str):
try:
unicode(s, 'us-ascii')
except UnicodeError:
return True
return False
class Generator:
"""Generates output from a Message object tree.
This basic generator writes the message to the given file object as plain
text.
"""
#
# Public interface
#
def __init__(self, outfp, mangle_from_=True, maxheaderlen=78):
"""Create the generator for message flattening.
outfp is the output file-like object for writing the message to. It
must have a write() method.
Optional mangle_from_ is a flag that, when True (the default), escapes
From_ lines in the body of the message by putting a `>' in front of
them.
Optional maxheaderlen specifies the longest length for a non-continued
header. When a header line is longer (in characters, with tabs
expanded to 8 spaces) than maxheaderlen, the header will split as
defined in the Header class. Set maxheaderlen to zero to disable
header wrapping. The default is 78, as recommended (but not required)
by RFC 2822.
"""
self._fp = outfp
self._mangle_from_ = mangle_from_
self._maxheaderlen = maxheaderlen
def write(self, s):
# Just delegate to the file object
self._fp.write(s)
def flatten(self, msg, unixfrom=False):
"""Print the message object tree rooted at msg to the output file
specified when the Generator instance was created.
unixfrom is a flag that forces the printing of a Unix From_ delimiter
before the first object in the message tree. If the original message
has no From_ delimiter, a `standard' one is crafted. By default, this
is False to inhibit the printing of any From_ delimiter.
Note that for subobjects, no From_ line is printed.
"""
if unixfrom:
ufrom = msg.get_unixfrom()
if not ufrom:
ufrom = 'From nobody ' + time.ctime(time.time())
print >> self._fp, ufrom
self._write(msg)
def clone(self, fp):
"""Clone this generator with the exact same options."""
return self.__class__(fp, self._mangle_from_, self._maxheaderlen)
#
# Protected interface - undocumented ;/
#
def _write(self, msg):
# We can't write the headers yet because of the following scenario:
# say a multipart message includes the boundary string somewhere in
# its body. We'd have to calculate the new boundary /before/ we write
# the headers so that we can write the correct Content-Type:
# parameter.
#
# The way we do this, so as to make the _handle_*() methods simpler,
# is to cache any subpart writes into a StringIO. Then we write the
# headers and the StringIO contents. That way, subpart handlers can
# Do The Right Thing, and can still modify the Content-Type: header if
# necessary.
oldfp = self._fp
try:
self._fp = sfp = StringIO()
self._dispatch(msg)
finally:
self._fp = oldfp
# Write the headers. First we see if the message object wants to
# handle that itself. If not, we'll do it generically.
meth = getattr(msg, '_write_headers', None)
if meth is None:
self._write_headers(msg)
else:
meth(self)
self._fp.write(sfp.getvalue())
def _dispatch(self, msg):
# Get the Content-Type: for the message, then try to dispatch to
# self._handle_<maintype>_<subtype>(). If there's no handler for the
# full MIME type, then dispatch to self._handle_<maintype>(). If
# that's missing too, then dispatch to self._writeBody().
main = msg.get_content_maintype()
sub = msg.get_content_subtype()
specific = UNDERSCORE.join((main, sub)).replace('-', '_')
meth = getattr(self, '_handle_' + specific, None)
if meth is None:
generic = main.replace('-', '_')
meth = getattr(self, '_handle_' + generic, None)
if meth is None:
meth = self._writeBody
meth(msg)
#
# Default handlers
#
def _write_headers(self, msg):
for h, v in msg.items():
print >> self._fp, '%s:' % h,
if self._maxheaderlen == 0:
# Explicit no-wrapping
print >> self._fp, v
elif isinstance(v, Header):
# Header instances know what to do
print >> self._fp, v.encode()
elif _is8bitstring(v):
# If we have raw 8bit data in a byte string, we have no idea
# what the encoding is. There is no safe way to split this
# string. If it's ascii-subset, then we could do a normal
# ascii split, but if it's multibyte then we could break the
# string. There's no way to know so the least harm seems to
# be to not split the string and risk it being too long.
print >> self._fp, v
else:
# Header's got lots of smarts, so use it. Note that this is
# fundamentally broken though because we lose idempotency when
# the header string is continued with tabs. It will now be
# continued with spaces. The reverse was broken before we
# fixed bug 1974. Either way, we lose.
print >> self._fp, Header(
v, maxlinelen=self._maxheaderlen, header_name=h).encode()
# A blank line always separates headers from body
print >> self._fp
#
# Handlers for writing types and subtypes
#
def _handle_text(self, msg):
payload = msg.get_payload()
if payload is None:
return
if not isinstance(payload, basestring):
raise TypeError('string payload expected: %s' % type(payload))
if self._mangle_from_:
payload = fcre.sub('>From ', payload)
self._fp.write(payload)
# Default body handler
_writeBody = _handle_text
def _handle_multipart(self, msg):
# The trick here is to write out each part separately, merge them all
# together, and then make sure that the boundary we've chosen isn't
# present in the payload.
msgtexts = []
subparts = msg.get_payload()
if subparts is None:
subparts = []
elif isinstance(subparts, basestring):
# e.g. a non-strict parse of a message with no starting boundary.
self._fp.write(subparts)
return
elif not isinstance(subparts, list):
# Scalar payload
subparts = [subparts]
for part in subparts:
s = StringIO()
g = self.clone(s)
g.flatten(part, unixfrom=False)
msgtexts.append(s.getvalue())
# BAW: What about boundaries that are wrapped in double-quotes?
boundary = msg.get_boundary()
if not boundary:
# Create a boundary that doesn't appear in any of the
# message texts.
alltext = NL.join(msgtexts)
boundary = _make_boundary(alltext)
msg.set_boundary(boundary)
# If there's a preamble, write it out, with a trailing CRLF
if msg.preamble is not None:
if self._mangle_from_:
preamble = fcre.sub('>From ', msg.preamble)
else:
preamble = msg.preamble
print >> self._fp, preamble
# dash-boundary transport-padding CRLF
print >> self._fp, '--' + boundary
# body-part
if msgtexts:
self._fp.write(msgtexts.pop(0))
# *encapsulation
# --> delimiter transport-padding
# --> CRLF body-part
for body_part in msgtexts:
# delimiter transport-padding CRLF
print >> self._fp, '\n--' + boundary
# body-part
self._fp.write(body_part)
# close-delimiter transport-padding
self._fp.write('\n--' + boundary + '--' + NL)
if msg.epilogue is not None:
if self._mangle_from_:
epilogue = fcre.sub('>From ', msg.epilogue)
else:
epilogue = msg.epilogue
self._fp.write(epilogue)
def _handle_multipart_signed(self, msg):
# The contents of signed parts has to stay unmodified in order to keep
# the signature intact per RFC1847 2.1, so we disable header wrapping.
# RDM: This isn't enough to completely preserve the part, but it helps.
old_maxheaderlen = self._maxheaderlen
try:
self._maxheaderlen = 0
self._handle_multipart(msg)
finally:
self._maxheaderlen = old_maxheaderlen
def _handle_message_delivery_status(self, msg):
# We can't just write the headers directly to self's file object
# because this will leave an extra newline between the last header
# block and the boundary. Sigh.
blocks = []
for part in msg.get_payload():
s = StringIO()
g = self.clone(s)
g.flatten(part, unixfrom=False)
text = s.getvalue()
lines = text.split('\n')
# Strip off the unnecessary trailing empty line
if lines and lines[-1] == '':
blocks.append(NL.join(lines[:-1]))
else:
blocks.append(text)
# Now join all the blocks with an empty line. This has the lovely
# effect of separating each block with an empty line, but not adding
# an extra one after the last one.
self._fp.write(NL.join(blocks))
def _handle_message(self, msg):
s = StringIO()
g = self.clone(s)
# The payload of a message/rfc822 part should be a multipart sequence
# of length 1. The zeroth element of the list should be the Message
# object for the subpart. Extract that object, stringify it, and
# write it out.
# Except, it turns out, when it's a string instead, which happens when
# and only when HeaderParser is used on a message of mime type
# message/rfc822. Such messages are generated by, for example,
# Groupwise when forwarding unadorned messages. (Issue 7970.) So
# in that case we just emit the string body.
payload = msg.get_payload()
if isinstance(payload, list):
g.flatten(msg.get_payload(0), unixfrom=False)
payload = s.getvalue()
self._fp.write(payload)
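# --- Illustrative usage (added sketch; not part of the original module) ---
# Flattening a message tree back to text with a Generator. maxheaderlen=0
# disables header wrapping so the round trip is byte-for-byte here:
#
#     >>> import email
#     >>> from cStringIO import StringIO
#     >>> msg = email.message_from_string('Subject: hi\n\nbody\n')
#     >>> fp = StringIO()
#     >>> g = Generator(fp, mangle_from_=False, maxheaderlen=0)
#     >>> g.flatten(msg)
#     >>> fp.getvalue()
#     'Subject: hi\n\nbody\n'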
_FMT = '[Non-text (%(type)s) part of message omitted, filename %(filename)s]'
class DecodedGenerator(Generator):
"""Generates a text representation of a message.
Like the Generator base class, except that non-text parts are substituted
with a format string representing the part.
"""
def __init__(self, outfp, mangle_from_=True, maxheaderlen=78, fmt=None):
"""Like Generator.__init__() except that an additional optional
argument is allowed.
Walks through all subparts of a message. If the subpart is of main
type `text', then it prints the decoded payload of the subpart.
Otherwise, fmt is a format string that is used instead of the message
payload. fmt is expanded with the following keywords (in
%(keyword)s format):
type : Full MIME type of the non-text part
maintype : Main MIME type of the non-text part
subtype : Sub-MIME type of the non-text part
filename : Filename of the non-text part
description: Description associated with the non-text part
encoding : Content transfer encoding of the non-text part
The default value for fmt is None, meaning
[Non-text (%(type)s) part of message omitted, filename %(filename)s]
"""
Generator.__init__(self, outfp, mangle_from_, maxheaderlen)
if fmt is None:
self._fmt = _FMT
else:
self._fmt = fmt
def _dispatch(self, msg):
for part in msg.walk():
maintype = part.get_content_maintype()
if maintype == 'text':
print >> self, part.get_payload(decode=True)
elif maintype == 'multipart':
# Just skip this
pass
else:
print >> self, self._fmt % {
'type' : part.get_content_type(),
'maintype' : part.get_content_maintype(),
'subtype' : part.get_content_subtype(),
'filename' : part.get_filename('[no filename]'),
'description': part.get('Content-Description',
'[no description]'),
'encoding' : part.get('Content-Transfer-Encoding',
'[no encoding]'),
}
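# --- Illustrative usage (added sketch; not part of the original module) ---
# DecodedGenerator writes decoded text parts and substitutes the fmt string
# for everything else. Output shape depends on the message, so no exact
# result is asserted here:
#
#     >>> import email
#     >>> from cStringIO import StringIO
#     >>> msg = email.message_from_string('Subject: hi\n\nbody\n')
#     >>> fp = StringIO()
#     >>> DecodedGenerator(fp, fmt='[skipped a %(type)s part]').flatten(msg)
#     >>> text = fp.getvalue()   # headers, then decoded text payloads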
# Helper
_width = len(repr(sys.maxint-1))
_fmt = '%%0%dd' % _width
def _make_boundary(text=None):
# Craft a random boundary. If text is given, ensure that the chosen
# boundary doesn't appear in the text.
token = random.randrange(sys.maxint)
boundary = ('=' * 15) + (_fmt % token) + '=='
if text is None:
return boundary
b = boundary
counter = 0
while True:
cre = re.compile('^--' + re.escape(b) + '(--)?$', re.MULTILINE)
if not cre.search(text):
break
b = boundary + '.' + str(counter)
counter += 1
return b
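# --- Illustrative note (added sketch; not part of the original module) ---
# _make_boundary retries with a ".<counter>" suffix until the candidate no
# longer occurs in the given text:
#
#     >>> b = _make_boundary()
#     >>> b.startswith('=' * 15)
#     True
#     >>> _make_boundary('--' + b) != b   # forced collision gets a suffix
#     True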
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Ben Gertzfield, Barry Warsaw
# Contact: email-sig@python.org
"""Header encoding and decoding functionality."""
__all__ = [
'Header',
'decode_header',
'make_header',
]
import re
import binascii
import email.quoprimime
import email.base64mime
from email.errors import HeaderParseError
from email.charset import Charset
NL = '\n'
SPACE = ' '
USPACE = u' '
SPACE8 = ' ' * 8
UEMPTYSTRING = u''
MAXLINELEN = 76
USASCII = Charset('us-ascii')
UTF8 = Charset('utf-8')
# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
=\? # literal =?
(?P<charset>[^?]*?) # non-greedy up to the next ? is the charset
\? # literal ?
(?P<encoding>[qb]) # either a "q" or a "b", case insensitive
\? # literal ?
(?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string
\?= # literal ?=
(?=[ \t]|$) # whitespace or the end of the string
''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822. Character range is from tilde to exclamation mark.
# For use with .match()
fcre = re.compile(r'[\041-\176]+:$')
# Find a header embedded in a putative header value. Used to check for
# header injection attack.
_embeded_header = re.compile(r'\n[^ \t]+:')
# Helpers
_max_append = email.quoprimime._max_append
def decode_header(header):
"""Decode a message header value without converting charset.
Returns a list of (decoded_string, charset) pairs containing each of the
decoded parts of the header. Charset is None for non-encoded parts of the
header, otherwise a lower-case string containing the name of the character
set specified in the encoded string.
An email.errors.HeaderParseError may be raised when certain decoding
errors occur (e.g. a base64 decoding exception).
"""
# If no encoding, just return the header
header = str(header)
if not ecre.search(header):
return [(header, None)]
decoded = []
dec = ''
for line in header.splitlines():
# This line might not have an encoding in it
if not ecre.search(line):
decoded.append((line, None))
continue
parts = ecre.split(line)
while parts:
unenc = parts.pop(0).strip()
if unenc:
# Should we continue a long line?
if decoded and decoded[-1][1] is None:
decoded[-1] = (decoded[-1][0] + SPACE + unenc, None)
else:
decoded.append((unenc, None))
if parts:
charset, encoding = [s.lower() for s in parts[0:2]]
encoded = parts[2]
dec = None
if encoding == 'q':
dec = email.quoprimime.header_decode(encoded)
elif encoding == 'b':
paderr = len(encoded) % 4 # Postel's law: add missing padding
if paderr:
encoded += '==='[:4 - paderr]
try:
dec = email.base64mime.decode(encoded)
except binascii.Error:
# Turn this into a higher level exception. BAW: Right
# now we throw the lower level exception away but
# when/if we get exception chaining, we'll preserve it.
raise HeaderParseError
if dec is None:
dec = encoded
if decoded and decoded[-1][1] == charset:
decoded[-1] = (decoded[-1][0] + dec, decoded[-1][1])
else:
decoded.append((dec, charset))
del parts[0:3]
return decoded
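# --- Illustrative usage (added sketch; not part of the original module) ---
# Decoding an RFC 2047 encoded-word into (string, charset) pairs:
#
#     >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
#     [('p\xf6stal', 'iso-8859-1')]
#     >>> decode_header('plain text')
#     [('plain text', None)]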
def make_header(decoded_seq, maxlinelen=None, header_name=None,
continuation_ws=' '):
"""Create a Header from a sequence of pairs as returned by decode_header()
decode_header() takes a header value string and returns a sequence of
pairs of the format (decoded_string, charset) where charset is the string
name of the character set.
This function takes one of those sequence of pairs and returns a Header
instance. Optional maxlinelen, header_name, and continuation_ws are as in
the Header constructor.
"""
h = Header(maxlinelen=maxlinelen, header_name=header_name,
continuation_ws=continuation_ws)
for s, charset in decoded_seq:
# None means us-ascii but we can simply pass it on to h.append()
if charset is not None and not isinstance(charset, Charset):
charset = Charset(charset)
h.append(s, charset)
return h
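# --- Illustrative usage (added sketch; not part of the original module) ---
# Round-tripping a header value through decode_header() and make_header()
# should reproduce the encoded form:
#
#     >>> h = make_header(decode_header('=?iso-8859-1?q?p=F6stal?='))
#     >>> h.encode()
#     '=?iso-8859-1?q?p=F6stal?='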
class Header:
def __init__(self, s=None, charset=None,
maxlinelen=None, header_name=None,
continuation_ws=' ', errors='strict'):
"""Create a MIME-compliant header that can contain many character sets.
Optional s is the initial header value. If None, the initial header
value is not set. You can later append to the header with .append()
method calls. s may be a byte string or a Unicode string, but see the
.append() documentation for semantics.
Optional charset serves two purposes: it has the same meaning as the
charset argument to the .append() method. It also sets the default
character set for all subsequent .append() calls that omit the charset
argument. If charset is not provided in the constructor, the us-ascii
charset is used both as s's initial charset and as the default for
subsequent .append() calls.
The maximum line length can be specified explicitly via maxlinelen. For
splitting the first line to a shorter value (to account for the field
header which isn't included in s, e.g. `Subject') pass in the name of
the field in header_name. The default maxlinelen is 76.
continuation_ws must be RFC 2822 compliant folding whitespace (usually
either a space or a hard tab) which will be prepended to continuation
lines.
errors is passed through to the .append() call.
"""
if charset is None:
charset = USASCII
if not isinstance(charset, Charset):
charset = Charset(charset)
self._charset = charset
self._continuation_ws = continuation_ws
cws_expanded_len = len(continuation_ws.replace('\t', SPACE8))
# BAW: I believe `chunks' and `maxlinelen' should be non-public.
self._chunks = []
if s is not None:
self.append(s, charset, errors)
if maxlinelen is None:
maxlinelen = MAXLINELEN
if header_name is None:
# We don't know anything about the field header so the first line
# is the same length as subsequent lines.
self._firstlinelen = maxlinelen
else:
# The first line should be shorter to take into account the field
# header. Also subtract off 2 extra for the colon and space.
self._firstlinelen = maxlinelen - len(header_name) - 2
# Second and subsequent lines should subtract off the length in
# columns of the continuation whitespace prefix.
self._maxlinelen = maxlinelen - cws_expanded_len
def __str__(self):
"""A synonym for self.encode()."""
return self.encode()
def __unicode__(self):
"""Helper for the built-in unicode function."""
uchunks = []
lastcs = None
for s, charset in self._chunks:
# We must preserve spaces between encoded and non-encoded word
# boundaries, which means for us we need to add a space when we go
# from a charset to None/us-ascii, or from None/us-ascii to a
# charset. Only do this for the second and subsequent chunks.
nextcs = charset
if uchunks:
if lastcs not in (None, 'us-ascii'):
if nextcs in (None, 'us-ascii'):
uchunks.append(USPACE)
nextcs = None
elif nextcs not in (None, 'us-ascii'):
uchunks.append(USPACE)
lastcs = nextcs
uchunks.append(unicode(s, str(charset)))
return UEMPTYSTRING.join(uchunks)
# Rich comparison operators for equality only. BAW: does it make sense to
# have or explicitly disable <, <=, >, >= operators?
def __eq__(self, other):
# other may be a Header or a string. Both are fine so coerce
# ourselves to a string, swap the args and do another comparison.
return other == self.encode()
def __ne__(self, other):
return not self == other
def append(self, s, charset=None, errors='strict'):
"""Append a string to the MIME header.
Optional charset, if given, should be a Charset instance or the name
of a character set (which will be converted to a Charset instance). A
value of None (the default) means that the charset given in the
constructor is used.
s may be a byte string or a Unicode string. If it is a byte string
(i.e. isinstance(s, str) is true), then charset is the encoding of
that byte string, and a UnicodeError will be raised if the string
cannot be decoded with that charset. If s is a Unicode string, then
charset is a hint specifying the character set of the characters in
the string. In this case, when producing an RFC 2822 compliant header
using RFC 2047 rules, the Unicode string will be encoded using the
following charsets in order: us-ascii, the charset hint, utf-8. The
first character set not to provoke a UnicodeError is used.
Optional `errors' is passed as the third argument to any unicode() or
ustr.encode() call.
"""
if charset is None:
charset = self._charset
elif not isinstance(charset, Charset):
charset = Charset(charset)
# If the charset is our faux 8bit charset, leave the string unchanged
if charset != '8bit':
# We need to test that the string can be converted to unicode and
# back to a byte string, given the input and output codecs of the
# charset.
if isinstance(s, str):
# Possibly raise UnicodeError if the byte string can't be
# converted to a unicode with the input codec of the charset.
incodec = charset.input_codec or 'us-ascii'
ustr = unicode(s, incodec, errors)
# Now make sure that the unicode could be converted back to a
# byte string with the output codec, which may be different
# from the input codec. Still, use the original byte string.
outcodec = charset.output_codec or 'us-ascii'
ustr.encode(outcodec, errors)
elif isinstance(s, unicode):
# Now we have to be sure the unicode string can be converted
# to a byte string with a reasonable output codec. We want to
# use the byte string in the chunk.
for charset in USASCII, charset, UTF8:
try:
outcodec = charset.output_codec or 'us-ascii'
s = s.encode(outcodec, errors)
break
except UnicodeError:
pass
else:
assert False, 'utf-8 conversion failed'
self._chunks.append((s, charset))
def _split(self, s, charset, maxlinelen, splitchars):
# Split up a header safely for use with encode_chunks.
splittable = charset.to_splittable(s)
encoded = charset.from_splittable(splittable, True)
elen = charset.encoded_header_len(encoded)
# If the line's encoded length already fits, just return it
if elen <= maxlinelen:
return [(encoded, charset)]
# If we have undetermined raw 8bit characters sitting in a byte
# string, we really don't know what the right thing to do is. We
# can't really split it because it might be multibyte data which we
# could break if we split it between pairs. The least harm seems to
# be to not split the header at all, but that means they could go out
# longer than maxlinelen.
if charset == '8bit':
return [(s, charset)]
# BAW: I'm not sure what the right test here is. What we're trying to
# do is be faithful to RFC 2822's recommendation that ($2.2.3):
#
# "Note: Though structured field bodies are defined in such a way that
# folding can take place between many of the lexical tokens (and even
# within some of the lexical tokens), folding SHOULD be limited to
# placing the CRLF at higher-level syntactic breaks."
#
# For now, I can only imagine doing this when the charset is us-ascii,
# although it's possible that other charsets may also benefit from the
# higher-level syntactic breaks.
elif charset == 'us-ascii':
return self._split_ascii(s, charset, maxlinelen, splitchars)
# BAW: should we use encoded?
elif elen == len(s):
# We can split on _maxlinelen boundaries because we know that the
# encoding won't change the size of the string
splitpnt = maxlinelen
first = charset.from_splittable(splittable[:splitpnt], False)
last = charset.from_splittable(splittable[splitpnt:], False)
else:
# Binary search for split point
first, last = _binsplit(splittable, charset, maxlinelen)
# first is of the proper length so just wrap it in the appropriate
# chrome. last must be recursively split.
fsplittable = charset.to_splittable(first)
fencoded = charset.from_splittable(fsplittable, True)
chunk = [(fencoded, charset)]
return chunk + self._split(last, charset, self._maxlinelen, splitchars)
def _split_ascii(self, s, charset, firstlen, splitchars):
chunks = _split_ascii(s, firstlen, self._maxlinelen,
self._continuation_ws, splitchars)
return zip(chunks, [charset]*len(chunks))
def _encode_chunks(self, newchunks, maxlinelen):
# MIME-encode a header with many different charsets and/or encodings.
#
# Given a list of pairs (string, charset), return a MIME-encoded
# string suitable for use in a header field. Each pair may have
# different charsets and/or encodings, and the resulting header will
# accurately reflect each setting.
#
# Each encoding can be email.utils.QP (quoted-printable, for
# ASCII-like character sets like iso-8859-1), email.utils.BASE64
# (Base64, for non-ASCII like character sets like KOI8-R and
# iso-2022-jp), or None (no encoding).
#
# Each pair will be represented on a separate line; the resulting
# string will be in the format:
#
# =?charset1?q?Mar=EDa_Gonz=E1lez_Alonso?=\n
# =?charset2?b?SvxyZ2VuIEL2aW5n?=
chunks = []
for header, charset in newchunks:
if not header:
continue
if charset is None or charset.header_encoding is None:
s = header
else:
s = charset.header_encode(header)
# Don't add more folding whitespace than necessary
if chunks and chunks[-1].endswith(' '):
extra = ''
else:
extra = ' '
_max_append(chunks, s, maxlinelen, extra)
joiner = NL + self._continuation_ws
return joiner.join(chunks)
def encode(self, splitchars=';, '):
"""Encode a message header into an RFC-compliant format.
There are many issues involved in converting a given string for use in
an email header. Only certain character sets are readable in most
email clients, and as header strings can only contain a subset of
7-bit ASCII, care must be taken to properly convert and encode (with
Base64 or quoted-printable) header strings. In addition, there is a
75-character length limit on any given encoded header field, so
line-wrapping must be performed, even with double-byte character sets.
This method will do its best to convert the string to the correct
character set used in email, and encode and line wrap it safely with
the appropriate scheme for that character set.
If the given charset is not known or an error occurs during
conversion, this function will return the header untouched.
Optional splitchars is a string containing characters to split long
ASCII lines on, in rough support of RFC 2822's `highest level
syntactic breaks'. This doesn't affect RFC 2047 encoded lines.
"""
newchunks = []
maxlinelen = self._firstlinelen
lastlen = 0
for s, charset in self._chunks:
# The first bit of the next chunk should be just long enough to
# fill the next line. Don't forget the space separating the
# encoded words.
targetlen = maxlinelen - lastlen - 1
if targetlen < charset.encoded_header_len(''):
# Stick it on the next line
targetlen = maxlinelen
newchunks += self._split(s, charset, targetlen, splitchars)
lastchunk, lastcharset = newchunks[-1]
lastlen = lastcharset.encoded_header_len(lastchunk)
value = self._encode_chunks(newchunks, maxlinelen)
if _embeded_header.search(value):
raise HeaderParseError("header value appears to contain "
"an embedded header: {!r}".format(value))
return value
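# --- Illustrative usage (added sketch; not part of the original module) ---
# Mixing charsets in one header: ASCII chunks pass through unencoded while
# the iso-8859-1 chunk is emitted as an RFC 2047 encoded-word (the exact
# folding depends on maxlinelen):
#
#     >>> h = Header('Hello', header_name='Subject')
#     >>> h.append(u'W\xf6rld', 'iso-8859-1')
#     >>> h.encode()
#     'Hello =?iso-8859-1?q?W=F6rld?='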
def _split_ascii(s, firstlen, restlen, continuation_ws, splitchars):
lines = []
maxlen = firstlen
for line in s.splitlines():
# Ignore any leading whitespace (i.e. continuation whitespace) already
# on the line, since we'll be adding our own.
line = line.lstrip()
if len(line) < maxlen:
lines.append(line)
maxlen = restlen
continue
# Attempt to split the line at the highest-level syntactic break
# possible. Note that we don't have a lot of smarts about field
# syntax; we just try to break on semi-colons, then commas, then
# whitespace.
for ch in splitchars:
if ch in line:
break
else:
# There's nothing useful to split the line on, not even spaces, so
# just append this line unchanged
lines.append(line)
maxlen = restlen
continue
# Now split the line on the character plus trailing whitespace
cre = re.compile(r'%s\s*' % ch)
if ch in ';,':
eol = ch
else:
eol = ''
joiner = eol + ' '
joinlen = len(joiner)
wslen = len(continuation_ws.replace('\t', SPACE8))
this = []
linelen = 0
for part in cre.split(line):
curlen = linelen + max(0, len(this)-1) * joinlen
partlen = len(part)
onfirstline = not lines
# We don't want to split after the field name, if we're on the
# first line and the field name is present in the header string.
if ch == ' ' and onfirstline and \
len(this) == 1 and fcre.match(this[0]):
this.append(part)
linelen += partlen
elif curlen + partlen > maxlen:
if this:
lines.append(joiner.join(this) + eol)
# If this part is longer than maxlen and we aren't already
# splitting on whitespace, try to recursively split this line
# on whitespace.
if partlen > maxlen and ch != ' ':
subl = _split_ascii(part, maxlen, restlen,
continuation_ws, ' ')
lines.extend(subl[:-1])
this = [subl[-1]]
else:
this = [part]
linelen = wslen + len(this[-1])
maxlen = restlen
else:
this.append(part)
linelen += partlen
# Put any left over parts on a line by themselves
if this:
lines.append(joiner.join(this))
return lines
def _binsplit(splittable, charset, maxlinelen):
i = 0
j = len(splittable)
while i < j:
# Invariants:
# 1. splittable[:k] fits for all k <= i (note that we *assume*,
# at the start, that splittable[:0] fits).
# 2. splittable[:k] does not fit for any k > j (at the start,
# this means we shouldn't look at any k > len(splittable)).
# 3. We don't know about splittable[:k] for k in i+1..j.
# 4. We want to set i to the largest k that fits, with i <= k <= j.
#
m = (i+j+1) >> 1 # ceiling((i+j)/2); i < m <= j
chunk = charset.from_splittable(splittable[:m], True)
chunklen = charset.encoded_header_len(chunk)
if chunklen <= maxlinelen:
# m is acceptable, so is a new lower bound.
i = m
else:
# m is not acceptable, so final i must be < m.
j = m - 1
# i == j. Invariant #1 implies that splittable[:i] fits, and
# invariant #2 implies that splittable[:i+1] does not fit, so i
# is what we're looking for.
first = charset.from_splittable(splittable[:i], False)
last = charset.from_splittable(splittable[i:], False)
return first, last
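# --- Illustrative note (added sketch; not part of the original module) ---
# The same invariant-driven search, restated with a plain len() in place of
# Charset.encoded_header_len() (hypothetical helper, illustration only):
#
#     def _longest_fitting_prefix(s, maxlen):
#         i, j = 0, len(s)
#         while i < j:
#             m = (i + j + 1) >> 1       # candidate split point; i < m <= j
#             if len(s[:m]) <= maxlen:   # prefix fits: raise the lower bound
#                 i = m
#             else:                      # prefix too long: shrink upper bound
#                 j = m - 1
#         return s[:i], s[i:]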
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Various types of useful iterators and generators."""
__all__ = [
'body_line_iterator',
'typed_subpart_iterator',
'walk',
# Do not include _structure() since it's part of the debugging API.
]
import sys
from cStringIO import StringIO
# This function will become a method of the Message class
def walk(self):
"""Walk over the message tree, yielding each subpart.
The walk is performed in depth-first order. This method is a
generator.
"""
yield self
if self.is_multipart():
for subpart in self.get_payload():
for subsubpart in subpart.walk():
yield subsubpart
# These two functions are imported into the Iterators.py interface module.
def body_line_iterator(msg, decode=False):
"""Iterate over the parts, returning string payloads line-by-line.
Optional decode (default False) is passed through to .get_payload().
"""
for subpart in msg.walk():
payload = subpart.get_payload(decode=decode)
if isinstance(payload, basestring):
for line in StringIO(payload):
yield line
def typed_subpart_iterator(msg, maintype='text', subtype=None):
"""Iterate over the subparts with a given MIME type.
Use `maintype' as the main MIME type to match against; this defaults to
"text". Optional `subtype' is the MIME subtype to match against; if
omitted, only the main type is matched.
"""
for subpart in msg.walk():
if subpart.get_content_maintype() == maintype:
if subtype is None or subpart.get_content_subtype() == subtype:
yield subpart
def _structure(msg, fp=None, level=0, include_default=False):
"""A handy debugging aid"""
if fp is None:
fp = sys.stdout
tab = ' ' * (level * 4)
print >> fp, tab + msg.get_content_type(),
if include_default:
print >> fp, '[%s]' % msg.get_default_type()
else:
print >> fp
if msg.is_multipart():
for subpart in msg.get_payload():
_structure(subpart, fp, level+1, include_default)
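# --- Illustrative usage (added sketch; not part of the original module) ---
# Iterating over a parsed message, assuming msg came from
# email.message_from_string() or a FeedParser:
#
#     >>> from email.iterators import typed_subpart_iterator, body_line_iterator
#     >>> for part in typed_subpart_iterator(msg, 'text', 'plain'):
#     ...     print part.get_content_type()
#     >>> for line in body_line_iterator(msg):
#     ...     pass   # each string payload, line by line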
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Keith Dart
# Contact: email-sig@python.org
"""Class representing application/* type MIME documents."""
__all__ = ["MIMEApplication"]
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
class MIMEApplication(MIMENonMultipart):
"""Class for generating application/* MIME documents."""
def __init__(self, _data, _subtype='octet-stream',
_encoder=encoders.encode_base64, **_params):
"""Create an application/* type MIME document.
_data is a string containing the raw application data.
_subtype is the MIME content type subtype, defaulting to
'octet-stream'.
_encoder is a function which will perform the actual encoding for
transport of the application data, defaulting to base64 encoding.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
raise TypeError('Invalid application MIME subtype')
MIMENonMultipart.__init__(self, 'application', _subtype, **_params)
self.set_payload(_data)
_encoder(self)
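# --- Illustrative usage (added sketch; not part of the original module) ---
# Wrapping raw bytes; the payload is base64-encoded by the default encoder:
#
#     >>> app = MIMEApplication('\x00\x01\x02raw bytes')
#     >>> app['Content-Type']
#     'application/octet-stream'
#     >>> app['Content-Transfer-Encoding']
#     'base64'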
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Anthony Baxter
# Contact: email-sig@python.org
"""Class representing audio/* type MIME documents."""
__all__ = ['MIMEAudio']
import sndhdr
from cStringIO import StringIO
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
_sndhdr_MIMEmap = {'au' : 'basic',
'wav' :'x-wav',
'aiff':'x-aiff',
'aifc':'x-aiff',
}
# There are others in sndhdr that don't have MIME types. :(
# Additional ones to be added to sndhdr? midi, mp3, realaudio, wma??
def _whatsnd(data):
"""Try to identify a sound file type.
sndhdr.what() has a pretty cruddy interface, unfortunately. This is why
we re-do it here. It would be easier to reverse engineer the Unix 'file'
command and use the standard 'magic' file, as shipped with a modern Unix.
"""
hdr = data[:512]
fakefile = StringIO(hdr)
for testfn in sndhdr.tests:
res = testfn(hdr, fakefile)
if res is not None:
return _sndhdr_MIMEmap.get(res[0])
return None
class MIMEAudio(MIMENonMultipart):
"""Class for generating audio/* MIME documents."""
def __init__(self, _audiodata, _subtype=None,
_encoder=encoders.encode_base64, **_params):
"""Create an audio/* type MIME document.
_audiodata is a string containing the raw audio data. If this data
can be decoded by the standard Python `sndhdr' module, then the
subtype will be automatically included in the Content-Type header.
Otherwise, you can specify the specific audio subtype via the
_subtype parameter. If _subtype is not given, and no subtype can be
guessed, a TypeError is raised.
_encoder is a function which will perform the actual encoding for
transport of the audio data. It takes one argument, which is this
MIMEAudio instance. It should use get_payload() and set_payload() to
change the payload to the encoded form. It should also add any
Content-Transfer-Encoding or other headers to the message as
necessary. The default encoding is Base64.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
_subtype = _whatsnd(_audiodata)
if _subtype is None:
raise TypeError('Could not find audio MIME subtype')
MIMENonMultipart.__init__(self, 'audio', _subtype, **_params)
self.set_payload(_audiodata)
_encoder(self)
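# --- Illustrative usage (added sketch; not part of the original module) ---
# Passing _subtype explicitly sidesteps sndhdr detection entirely
# (audio_bytes is a hypothetical string of raw audio data):
#
#     >>> snd = MIMEAudio(audio_bytes, _subtype='mpeg')
#     >>> snd['Content-Type']
#     'audio/mpeg'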
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Base class for MIME specializations."""
__all__ = ['MIMEBase']
from email import message
class MIMEBase(message.Message):
"""Base class for MIME specializations."""
def __init__(self, _maintype, _subtype, **_params):
"""This constructor adds a Content-Type: and a MIME-Version: header.
The Content-Type: header is taken from the _maintype and _subtype
arguments. Additional parameters for this header are taken from the
keyword arguments.
"""
message.Message.__init__(self)
ctype = '%s/%s' % (_maintype, _subtype)
self.add_header('Content-Type', ctype, **_params)
self['MIME-Version'] = '1.0'
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Class representing image/* type MIME documents."""
__all__ = ['MIMEImage']
import imghdr
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
class MIMEImage(MIMENonMultipart):
"""Class for generating image/* type MIME documents."""
def __init__(self, _imagedata, _subtype=None,
_encoder=encoders.encode_base64, **_params):
"""Create an image/* type MIME document.
_imagedata is a string containing the raw image data. If this data
can be decoded by the standard Python `imghdr' module, then the
subtype will be automatically included in the Content-Type header.
Otherwise, you can specify the specific image subtype via the _subtype
parameter.
_encoder is a function which will perform the actual encoding for
transport of the image data. It takes one argument, which is this
Image instance. It should use get_payload() and set_payload() to
change the payload to the encoded form. It should also add any
Content-Transfer-Encoding or other headers to the message as
necessary. The default encoding is Base64.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
_subtype = imghdr.what(None, _imagedata)
if _subtype is None:
raise TypeError('Could not guess image MIME subtype')
MIMENonMultipart.__init__(self, 'image', _subtype, **_params)
self.set_payload(_imagedata)
_encoder(self)
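# --- Illustrative usage (added sketch; not part of the original module) ---
# With real PNG data imghdr can guess the subtype; here it is given
# explicitly (image_bytes is a hypothetical string of raw image data):
#
#     >>> img = MIMEImage(image_bytes, _subtype='png')
#     >>> img['Content-Type']
#     'image/png'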
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Base class for MIME multipart/* type messages."""
__all__ = ['MIMEMultipart']
from email.mime.base import MIMEBase
class MIMEMultipart(MIMEBase):
"""Base class for MIME multipart/* type messages."""
def __init__(self, _subtype='mixed', boundary=None, _subparts=None,
**_params):
"""Creates a multipart/* type message.
By default, creates a multipart/mixed message, with proper
Content-Type and MIME-Version headers.
_subtype is the subtype of the multipart content type, defaulting to
`mixed'.
boundary is the multipart boundary string. By default it is
calculated as needed.
_subparts is a sequence of initial subparts for the payload. It
must be an iterable object, such as a list. You can always
attach new subparts to the message by using the attach() method.
Additional parameters for the Content-Type header are taken from the
keyword arguments (or passed into the _params argument).
"""
MIMEBase.__init__(self, 'multipart', _subtype, **_params)
# Initialise _payload to an empty list as the Message superclass's
# implementation of is_multipart assumes that _payload is a list for
# multipart messages.
self._payload = []
if _subparts:
for p in _subparts:
self.attach(p)
if boundary:
self.set_boundary(boundary)
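# --- Illustrative usage (added sketch; not part of the original module) ---
# Building a multipart container; MIMEText is defined further below in this
# package:
#
#     >>> from email.mime.text import MIMEText
#     >>> outer = MIMEMultipart('alternative', boundary='BOUNDARY')
#     >>> outer.get_boundary()
#     'BOUNDARY'
#     >>> outer.attach(MIMEText('plain version'))
#     >>> len(outer.get_payload())
#     1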
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Base class for MIME type messages that are not multipart."""
__all__ = ['MIMENonMultipart']
from email import errors
from email.mime.base import MIMEBase
class MIMENonMultipart(MIMEBase):
"""Base class for MIME non-multipart type messages."""
def attach(self, payload):
# The public API prohibits attaching multiple subparts to MIMEBase
# derived subtypes since none of them are, by definition, of content
# type multipart/*
raise errors.MultipartConversionError(
'Cannot attach additional subparts to non-multipart/*')
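# --- Illustrative note (added sketch; not part of the original module) ---
# Any attempt to attach a subpart to a non-multipart type fails:
#
#     >>> MIMENonMultipart('text', 'plain').attach('anything')
#     Traceback (most recent call last):
#         ...
#     MultipartConversionError: Cannot attach additional subparts to non-multipart/*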
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: email-sig@python.org
"""Class representing text/* type MIME documents."""
__all__ = ['MIMEText']
from email.encoders import encode_7or8bit
from email.mime.nonmultipart import MIMENonMultipart
class MIMEText(MIMENonMultipart):
"""Class for generating text/* type MIME documents."""
def __init__(self, _text, _subtype='plain', _charset='us-ascii'):
"""Create a text/* type MIME document.
_text is the string for this message object.
_subtype is the MIME sub content type, defaulting to "plain".
_charset is the character set parameter added to the Content-Type
header. This defaults to "us-ascii". Note that as a side-effect, the
Content-Transfer-Encoding header will also be set.
"""
MIMENonMultipart.__init__(self, 'text', _subtype,
**{'charset': _charset})
self.set_payload(_text, _charset)
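# --- Illustrative usage (added sketch; not part of the original module) ---
# The charset side-effect: set_payload() with a charset also sets the
# Content-Transfer-Encoding header:
#
#     >>> t = MIMEText('hello')
#     >>> t['Content-Type']
#     'text/plain; charset="us-ascii"'
#     >>> t['Content-Transfer-Encoding']
#     '7bit'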
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw, Thomas Wouters, Anthony Baxter
# Contact: email-sig@python.org
"""A parser of RFC 2822 and MIME email messages."""
__all__ = ['Parser', 'HeaderParser']
import warnings
from cStringIO import StringIO
from email.feedparser import FeedParser
from email.message import Message
class Parser:
def __init__(self, *args, **kws):
"""Parser of RFC 2822 and MIME email messages.
Creates an in-memory object tree representing the email message, which
can then be manipulated and turned over to a Generator to return the
textual representation of the message.
The string must be formatted as a block of RFC 2822 headers and header
continuation lines, optionally preceded by a `Unix-from' header. The
header block is terminated either by the end of the string or by a
blank line.
_class is the class to instantiate for new message objects when they
must be created. This class must have a constructor that can take
zero arguments. Default is Message.Message.
"""
if len(args) >= 1:
if '_class' in kws:
raise TypeError("Multiple values for keyword arg '_class'")
kws['_class'] = args[0]
if len(args) == 2:
if 'strict' in kws:
raise TypeError("Multiple values for keyword arg 'strict'")
kws['strict'] = args[1]
if len(args) > 2:
raise TypeError('Too many arguments')
if '_class' in kws:
self._class = kws['_class']
del kws['_class']
else:
self._class = Message
if 'strict' in kws:
warnings.warn("'strict' argument is deprecated (and ignored)",
DeprecationWarning, 2)
del kws['strict']
if kws:
raise TypeError('Unexpected keyword arguments')
def parse(self, fp, headersonly=False):
"""Create a message structure from the data in a file.
Reads all the data from the file and returns the root of the message
structure. Optional headersonly is a flag specifying whether to stop
parsing after reading the headers or not. The default is False,
meaning it parses the entire contents of the file.
"""
feedparser = FeedParser(self._class)
if headersonly:
feedparser._set_headersonly()
while True:
data = fp.read(8192)
if not data:
break
feedparser.feed(data)
return feedparser.close()
def parsestr(self, text, headersonly=False):
"""Create a message structure from a string.
Returns the root of the message structure. Optional headersonly is a
flag specifying whether to stop parsing after reading the headers or
not. The default is False, meaning it parses the entire contents of
the file.
"""
return self.parse(StringIO(text), headersonly=headersonly)
class HeaderParser(Parser):
def parse(self, fp, headersonly=True):
return Parser.parse(self, fp, True)
def parsestr(self, text, headersonly=True):
return Parser.parsestr(self, text, True)
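# --- Illustrative usage (added sketch; not part of the original module) ---
# Parser builds the full tree; HeaderParser stops at the headers and leaves
# the rest as a flat string payload:
#
#     >>> msg = Parser().parsestr('Subject: test\n\nbody\n')
#     >>> msg['subject']
#     'test'
#     >>> HeaderParser().parsestr('Subject: test\n\nbody\n').get_payload()
#     'body\n'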
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Ben Gertzfield
# Contact: email-sig@python.org
"""Quoted-printable content transfer encoding per RFCs 2045-2047.
This module handles the content transfer encoding method defined in RFC 2045
to encode US ASCII-like 8-bit data called `quoted-printable'. It is used to
safely encode text that is in a character set similar to the 7-bit US ASCII
character set, but that includes some 8-bit characters that are normally not
allowed in email bodies or headers.
Quoted-printable is very space-inefficient for encoding binary files; use the
email.base64mime module for that instead.
This module provides an interface to encode and decode both headers and bodies
with quoted-printable encoding.
RFC 2045 defines a method for including character set information in an
`encoded-word' in a header. This method is commonly used for 8-bit real names
in To:/From:/Cc: etc. fields, as well as Subject: lines.
This module does not do the line wrapping or end-of-line character
conversion necessary for proper internationalized headers; it only
does dumb encoding and decoding. To deal with the various line
wrapping issues, use the email.header module.
"""
__all__ = [
'body_decode',
'body_encode',
'body_quopri_check',
'body_quopri_len',
'decode',
'decodestring',
'encode',
'encodestring',
'header_decode',
'header_encode',
'header_quopri_check',
'header_quopri_len',
'quote',
'unquote',
]
import re
from string import hexdigits
from email.utils import fix_eols
CRLF = '\r\n'
NL = '\n'
# See also Charset.py
MISC_LEN = 7
hqre = re.compile(r'[^-a-zA-Z0-9!*+/ ]')
bqre = re.compile(r'[^ !-<>-~\t]')
# Helpers
def header_quopri_check(c):
"""Return True if the character should be escaped with header quopri."""
return bool(hqre.match(c))
def body_quopri_check(c):
"""Return True if the character should be escaped with body quopri."""
return bool(bqre.match(c))
def header_quopri_len(s):
"""Return the length of str when it is encoded with header quopri."""
count = 0
for c in s:
if hqre.match(c):
count += 3
else:
count += 1
return count
def body_quopri_len(str):
"""Return the length of str when it is encoded with body quopri."""
count = 0
for c in str:
if bqre.match(c):
count += 3
else:
count += 1
return count
def _max_append(L, s, maxlen, extra=''):
if not L:
L.append(s.lstrip())
elif len(L[-1]) + len(s) <= maxlen:
L[-1] += extra + s
else:
L.append(s.lstrip())
def unquote(s):
"""Turn a string in the form =AB to the ASCII character with value 0xab"""
return chr(int(s[1:3], 16))
def quote(c):
return "=%02X" % ord(c)
def header_encode(header, charset="iso-8859-1", keep_eols=False,
maxlinelen=76, eol=NL):
"""Encode a single header line with quoted-printable (like) encoding.
Defined in RFC 2045, this `Q' encoding is similar to quoted-printable, but
used specifically for email header fields to allow charsets with mostly 7
bit characters (and some 8 bit) to remain more or less readable in non-RFC
2045 aware mail clients.
charset names the character set to use to encode the header. It defaults
to iso-8859-1.
The resulting string will be in the form:
"=?charset?q?I_f=E2rt_in_your_g=E8n=E8ral_dire=E7tion?\\n
=?charset?q?Silly_=C8nglish_Kn=EEghts?="
with each line wrapped safely at, at most, maxlinelen characters (defaults
to 76 characters). If maxlinelen is None, the entire string is encoded in
one chunk with no splitting.
End-of-line characters (\\r, \\n, \\r\\n) will be automatically converted
to the canonical email line separator \\r\\n unless the keep_eols
parameter is True (the default is False).
Each line of the header will be terminated in the value of eol, which
defaults to "\\n". Set this to "\\r\\n" if you are using the result of
this function directly in email.
"""
# Return empty headers unchanged
if not header:
return header
if not keep_eols:
header = fix_eols(header)
# Quopri encode each line, in encoded chunks no greater than maxlinelen in
# length, after the RFC chrome is added in.
quoted = []
if maxlinelen is None:
# An obnoxiously large number that's good enough
max_encoded = 100000
else:
max_encoded = maxlinelen - len(charset) - MISC_LEN - 1
for c in header:
# Space may be represented as _ instead of =20 for readability
if c == ' ':
_max_append(quoted, '_', max_encoded)
# These characters can be included verbatim
elif not hqre.match(c):
_max_append(quoted, c, max_encoded)
# Otherwise, replace with hex value like =E2
else:
_max_append(quoted, "=%02X" % ord(c), max_encoded)
# Now add the RFC chrome to each encoded chunk and glue the chunks
# together. BAW: should we be able to specify the leading whitespace in
# the joiner?
joiner = eol + ' '
return joiner.join(['=?%s?q?%s?=' % (charset, line) for line in quoted])
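# --- Illustrative usage (added sketch; not part of the original module) ---
# Q-encoding a header value: spaces become underscores and 8-bit characters
# become =XX escapes:
#
#     >>> header_encode('hello w\xf6rld', charset='iso-8859-1')
#     '=?iso-8859-1?q?hello_w=F6rld?='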
def encode(body, binary=False, maxlinelen=76, eol=NL):
"""Encode with quoted-printable, wrapping at maxlinelen characters.
If binary is False (the default), end-of-line characters will be converted
to the canonical email end-of-line sequence \\r\\n. Otherwise they will
be left verbatim.
Each line of encoded text will end with eol, which defaults to "\\n". Set
this to "\\r\\n" if you will be using the result of this function directly
in an email.
Each line will be wrapped at, at most, maxlinelen characters (defaults to
76 characters). Long lines will have the `soft linefeed' quoted-printable
character "=" appended to them, so the decoded text will be identical to
the original text.
"""
if not body:
return body
if not binary:
body = fix_eols(body)
# BAW: We're accumulating the body text by string concatenation. That
# can't be very efficient, but I don't have time now to rewrite it. It
# just feels like this algorithm could be more efficient.
encoded_body = ''
lineno = -1
# Preserve line endings here so we can check later whether an eol needs
# to be added to the output.
lines = body.splitlines(1)
for line in lines:
# But strip off line-endings for processing this line.
if line.endswith(CRLF):
line = line[:-2]
elif line[-1] in CRLF:
line = line[:-1]
lineno += 1
encoded_line = ''
prev = None
linelen = len(line)
# Now we need to examine every character to see if it needs to be
# quopri encoded. BAW: again, string concatenation is inefficient.
for j in range(linelen):
c = line[j]
prev = c
if bqre.match(c):
c = quote(c)
elif j+1 == linelen:
# Check for whitespace at end of line; special case
if c not in ' \t':
encoded_line += c
prev = c
continue
# Check to see if the line has reached its maximum length
if len(encoded_line) + len(c) >= maxlinelen:
encoded_body += encoded_line + '=' + eol
encoded_line = ''
encoded_line += c
# Now at end of line.
if prev and prev in ' \t':
# Special case for whitespace at end of file
if lineno + 1 == len(lines):
prev = quote(prev)
if len(encoded_line) + len(prev) > maxlinelen:
encoded_body += encoded_line + '=' + eol + prev
else:
encoded_body += encoded_line + prev
# Just normal whitespace at end of line
else:
encoded_body += encoded_line + prev + '=' + eol
encoded_line = ''
# Now look at the line we just finished; if it has a line ending, we
# need to add eol to the end of the line.
if lines[lineno].endswith(CRLF) or lines[lineno][-1] in CRLF:
encoded_body += encoded_line + eol
else:
encoded_body += encoded_line
encoded_line = ''
return encoded_body
# For convenience and backwards compatibility w/ standard base64 module
body_encode = encode
encodestring = encode
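# --- Illustrative usage (added sketch; not part of the original module) ---
# Body encoding normalizes line endings first (binary=False), then escapes
# anything outside the printable set:
#
#     >>> encode('hello\nw\xf6rld\n')
#     'hello\nw=F6rld\n'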
# BAW: I'm not sure if the intent was for the signature of this function to be
# the same as base64MIME.decode() or not...
def decode(encoded, eol=NL):
"""Decode a quoted-printable string.
Lines are separated with eol, which defaults to \\n.
"""
if not encoded:
return encoded
# BAW: see comment in encode() above. Again, we're building up the
# decoded string with string concatenation, which could be done much more
# efficiently.
decoded = ''
for line in encoded.splitlines():
line = line.rstrip()
if not line:
decoded += eol
continue
i = 0
n = len(line)
while i < n:
c = line[i]
if c != '=':
decoded += c
i += 1
# Otherwise, c == "=". Are we at the end of the line? If so, add
# a soft line break.
elif i+1 == n:
i += 1
continue
# Decode if in form =AB
elif i+2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
decoded += unquote(line[i:i+3])
i += 3
# Otherwise, not in form =AB, pass literally
else:
decoded += c
i += 1
if i == n:
decoded += eol
# Special case if original string did not end with eol
if not encoded.endswith(eol) and decoded.endswith(eol):
decoded = decoded[:-len(eol)]  # strip the full eol; it may be more than one character
return decoded
# For convenience and backwards compatibility w/ standard base64 module
body_decode = decode
decodestring = decode
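# Editor's sketch (hypothetical demo, not part of the original module): a
# minimal round trip through the body codec above.  The input bytes are
# illustrative; "=E9" is the quoted-printable escape for the byte 0xE9.
def _demo_quopri_body_roundtrip():
    encoded = encode('caf\xe9, s\xe9ance\n')
    assert 'caf=E9' in encoded
    assert decode(encoded) == 'caf\xe9, s\xe9ance\n'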
def _unquote_match(match):
"""Turn a match in the form =AB to the ASCII character with value 0xab"""
s = match.group(0)
return unquote(s)
# Header decoding is done a bit differently
def header_decode(s):
"""Decode a string encoded with RFC 2045 MIME header `Q' encoding.
This function does not parse a full MIME header value encoded with
quoted-printable (like =?iso-8859-1?q?Hello_World?=) -- please use
the high level email.header class for that functionality.
"""
s = s.replace('_', ' ')
return re.sub(r'=[a-fA-F0-9]{2}', _unquote_match, s)
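# Editor's sketch (hypothetical demo, not part of the original module):
# header_decode handles only the Q-encoded payload of an RFC 2047 encoded
# word -- underscores become spaces and =XX escapes become bytes.
def _demo_header_decode():
    assert header_decode('Hello_World') == 'Hello World'
    assert header_decode('caf=E9') == 'caf\xe9'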
# Copyright (C) 2001-2010 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Miscellaneous utilities."""
__all__ = [
'collapse_rfc2231_value',
'decode_params',
'decode_rfc2231',
'encode_rfc2231',
'formataddr',
'formatdate',
'getaddresses',
'make_msgid',
'mktime_tz',
'parseaddr',
'parsedate',
'parsedate_tz',
'unquote',
]
import os
import re
import time
import base64
import random
import socket
import urllib
import warnings
from email._parseaddr import quote
from email._parseaddr import AddressList as _AddressList
from email._parseaddr import mktime_tz
# We need wormarounds for bugs in these methods in older Pythons (see below)
from email._parseaddr import parsedate as _parsedate
from email._parseaddr import parsedate_tz as _parsedate_tz
from quopri import decodestring as _qdecode
# Intrapackage imports
from email.encoders import _bencode, _qencode
COMMASPACE = ', '
EMPTYSTRING = ''
UEMPTYSTRING = u''
CRLF = '\r\n'
TICK = "'"
specialsre = re.compile(r'[][\\()<>@,:;".]')
escapesre = re.compile(r'[][\\()"]')
# Helpers
def _identity(s):
return s
def _bdecode(s):
"""Decodes a base64 string.
This function is equivalent to base64.decodestring and it's retained only
for backward compatibility. It used to remove the last \\n of the decoded
string, if it had any (see issue 7143).
"""
if not s:
return s
return base64.decodestring(s)
def fix_eols(s):
"""Replace all line-ending characters with \\r\\n."""
# Fix newlines with no preceding carriage return
s = re.sub(r'(?<!\r)\n', CRLF, s)
# Fix carriage returns with no following newline
s = re.sub(r'\r(?!\n)', CRLF, s)
return s
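# Editor's sketch (hypothetical demo, not part of the original module):
# fix_eols canonicalizes bare LF and bare CR alike, leaving existing CRLF
# pairs untouched.
def _demo_fix_eols():
    assert fix_eols('a\nb\rc\r\n') == 'a\r\nb\r\nc\r\n'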
def formataddr(pair):
"""The inverse of parseaddr(), this takes a 2-tuple of the form
(realname, email_address) and returns the string value suitable
for an RFC 2822 From, To or Cc header.
If the first element of pair is false, then the second element is
returned unmodified.
"""
name, address = pair
if name:
quotes = ''
if specialsre.search(name):
quotes = '"'
name = escapesre.sub(r'\\\g<0>', name)
return '%s%s%s <%s>' % (quotes, name, quotes, address)
return address
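# Editor's sketch (hypothetical demo; the addresses are made up): a realname
# containing one of the specials above is double-quoted on output.
def _demo_formataddr():
    assert formataddr(('Jane Doe', 'jane@example.com')) == 'Jane Doe <jane@example.com>'
    assert formataddr(('Doe, Jane', 'jane@example.com')) == '"Doe, Jane" <jane@example.com>'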
def getaddresses(fieldvalues):
"""Return a list of (REALNAME, EMAIL) for each fieldvalue."""
all = COMMASPACE.join(fieldvalues)
a = _AddressList(all)
return a.addresslist
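# Editor's sketch (hypothetical demo; the addresses are made up): bare
# addrspecs come back with an empty realname.
def _demo_getaddresses():
    pairs = getaddresses(['Jane Doe <jane@example.com>, bob@example.com'])
    assert pairs == [('Jane Doe', 'jane@example.com'), ('', 'bob@example.com')]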
ecre = re.compile(r'''
=\? # literal =?
(?P<charset>[^?]*?) # non-greedy up to the next ? is the charset
\? # literal ?
(?P<encoding>[qb]) # either a "q" or a "b", case insensitive
\? # literal ?
(?P<atom>.*?) # non-greedy up to the next ?= is the atom
\?= # literal ?=
''', re.VERBOSE | re.IGNORECASE)
def formatdate(timeval=None, localtime=False, usegmt=False):
"""Returns a date string as specified by RFC 2822, e.g.:
Fri, 09 Nov 2001 01:08:47 -0000
Optional timeval if given is a floating point time value as accepted by
gmtime() and localtime(), otherwise the current time is used.
Optional localtime is a flag that, when True, interprets timeval as a
local time and returns a date relative to the local timezone instead of
UTC, properly taking daylight savings time into account.
Optional argument usegmt means that the timezone is written out as
an ascii string, not numeric one (so "GMT" instead of "+0000"). This
is needed for HTTP, and is only used when localtime==False.
"""
# Note: we cannot use strftime() because that honors the locale and RFC
# 2822 requires that day and month names be the English abbreviations.
if timeval is None:
timeval = time.time()
if localtime:
now = time.localtime(timeval)
# Calculate timezone offset, based on whether the local zone has
# daylight savings time, and whether DST is in effect.
if time.daylight and now[-1]:
offset = time.altzone
else:
offset = time.timezone
hours, minutes = divmod(abs(offset), 3600)
# Remember offset is in seconds west of UTC, but the timezone is in
# minutes east of UTC, so the signs differ.
if offset > 0:
sign = '-'
else:
sign = '+'
zone = '%s%02d%02d' % (sign, hours, minutes // 60)
else:
now = time.gmtime(timeval)
# Timezone offset is always -0000
if usegmt:
zone = 'GMT'
else:
zone = '-0000'
return '%s, %02d %s %04d %02d:%02d:%02d %s' % (
['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'][now[6]],
now[2],
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'][now[1] - 1],
now[0], now[3], now[4], now[5],
zone)
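# Editor's sketch (hypothetical demo, not part of the original module): the
# POSIX epoch rendered per RFC 2822, with usegmt writing "GMT" for "-0000".
def _demo_formatdate():
    assert formatdate(0.0, usegmt=True) == 'Thu, 01 Jan 1970 00:00:00 GMT'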
def make_msgid(idstring=None):
"""Returns a string suitable for RFC 2822 compliant Message-ID, e.g:
<142480216486.20800.16526388040877946887@nightshade.la.mastaler.com>
Optional idstring if given is a string used to strengthen the
uniqueness of the message id.
"""
timeval = int(time.time()*100)
pid = os.getpid()
randint = random.getrandbits(64)
if idstring is None:
idstring = ''
else:
idstring = '.' + idstring
idhost = socket.getfqdn()
msgid = '<%d.%d.%d%s@%s>' % (timeval, pid, randint, idstring, idhost)
return msgid
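# Editor's sketch (hypothetical demo, not part of the original module): the
# exact id is time-, pid- and host-dependent, so only its shape is checked.
def _demo_make_msgid():
    mid = make_msgid('demo')
    assert mid.startswith('<') and mid.endswith('>') and '.demo@' in mid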
# These functions are in the standalone mimelib version only because they've
# subsequently been fixed in the latest Python versions. We use this to worm
# around broken older Pythons.
def parsedate(data):
if not data:
return None
return _parsedate(data)
def parsedate_tz(data):
if not data:
return None
return _parsedate_tz(data)
def parseaddr(addr):
"""
Parse addr into its constituent realname and email address parts.
Return a tuple of realname and email address, unless the parse fails, in
which case return a 2-tuple of ('', '').
"""
addrs = _AddressList(addr).addresslist
if not addrs:
return '', ''
return addrs[0]
# rfc822.unquote() doesn't properly de-backslash-ify in Python pre-2.3.
def unquote(str):
"""Remove quotes from a string."""
if len(str) > 1:
if str.startswith('"') and str.endswith('"'):
return str[1:-1].replace('\\\\', '\\').replace('\\"', '"')
if str.startswith('<') and str.endswith('>'):
return str[1:-1]
return str
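# Editor's sketch (hypothetical demo, not part of the original module): both
# quoting styles handled by unquote above.
def _demo_unquote():
    assert unquote('"say \\"hi\\""') == 'say "hi"'
    assert unquote('<jane@example.com>') == 'jane@example.com'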
# RFC2231-related functions - parameter encoding and decoding
def decode_rfc2231(s):
"""Decode string according to RFC 2231"""
parts = s.split(TICK, 2)
if len(parts) <= 2:
return None, None, s
return parts
def encode_rfc2231(s, charset=None, language=None):
"""Encode string according to RFC 2231.
If neither charset nor language is given, then s is returned as-is. If
charset is given but not language, the string is encoded using the empty
string for language.
"""
import urllib
s = urllib.quote(s, safe='')
if charset is None and language is None:
return s
if language is None:
language = ''
return "%s'%s'%s" % (charset, language, s)
rfc2231_continuation = re.compile(r'^(?P<name>\w+)\*((?P<num>[0-9]+)\*?)?$')
def decode_params(params):
"""Decode parameters list according to RFC 2231.
params is a sequence of 2-tuples containing (param name, string value).
"""
# Copy params so we don't mess with the original
params = params[:]
new_params = []
# Map parameter's name to a list of continuations. The values are a
# 3-tuple of the continuation number, the string value, and a flag
# specifying whether a particular segment is %-encoded.
rfc2231_params = {}
name, value = params.pop(0)
new_params.append((name, value))
while params:
name, value = params.pop(0)
if name.endswith('*'):
encoded = True
else:
encoded = False
value = unquote(value)
mo = rfc2231_continuation.match(name)
if mo:
name, num = mo.group('name', 'num')
if num is not None:
num = int(num)
rfc2231_params.setdefault(name, []).append((num, value, encoded))
else:
new_params.append((name, '"%s"' % quote(value)))
if rfc2231_params:
for name, continuations in rfc2231_params.items():
value = []
extended = False
# Sort by number
continuations.sort()
# And now append all values in numerical order, converting
# %-encodings for the encoded segments. If any of the
# continuation names ends in a *, then the entire string, after
# decoding segments and concatenating, must have the charset and
# language specifiers at the beginning of the string.
for num, s, encoded in continuations:
if encoded:
s = urllib.unquote(s)
extended = True
value.append(s)
value = quote(EMPTYSTRING.join(value))
if extended:
charset, language, value = decode_rfc2231(value)
new_params.append((name, (charset, language, '"%s"' % value)))
else:
new_params.append((name, '"%s"' % value))
return new_params
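# Editor's sketch (hypothetical demo, not part of the original module): an
# RFC 2231 parameter split across two %-encoded continuations collapses to a
# single (charset, language, quoted-value) triple.
def _demo_decode_params():
    params = [('form-data', ''),
              ('name*0*', "us-ascii''hello%20"),
              ('name*1*', 'world')]
    assert decode_params(params)[1] == ('name', ('us-ascii', '', '"hello world"'))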
def collapse_rfc2231_value(value, errors='replace',
fallback_charset='us-ascii'):
if isinstance(value, tuple):
rawval = unquote(value[2])
charset = value[0] or 'us-ascii'
try:
return unicode(rawval, charset, errors)
except LookupError:
# XXX charset is unknown to Python.
return unicode(rawval, fallback_charset, errors)
else:
return unquote(value)
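# Editor's sketch (hypothetical demo, not part of the original module): the
# triple produced by decode_params above collapses back to a unicode string.
def _demo_collapse():
    assert collapse_rfc2231_value(('iso-8859-1', '', '"caf\xe9"')) == u'caf\xe9'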
# Copyright (C) 2002-2007 Python Software Foundation
# Contact: [email protected]
"""Email address parsing code.
Lifted directly from rfc822.py. This should eventually be rewritten.
"""
__all__ = [
'mktime_tz',
'parsedate',
'parsedate_tz',
'quote',
]
import time, calendar
SPACE = ' '
EMPTYSTRING = ''
COMMASPACE = ', '
# Parse a date field
_monthnames = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul',
'aug', 'sep', 'oct', 'nov', 'dec',
'january', 'february', 'march', 'april', 'may', 'june', 'july',
'august', 'september', 'october', 'november', 'december']
_daynames = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
# The timezone table does not include the military time zones defined
# in RFC822, other than Z. According to RFC1123, the description in
# RFC822 gets the signs wrong, so we can't rely on any such time
# zones. RFC1123 recommends that numeric timezone indicators be used
# instead of timezone names.
_timezones = {'UT':0, 'UTC':0, 'GMT':0, 'Z':0,
'AST': -400, 'ADT': -300, # Atlantic (used in Canada)
'EST': -500, 'EDT': -400, # Eastern
'CST': -600, 'CDT': -500, # Central
'MST': -700, 'MDT': -600, # Mountain
'PST': -800, 'PDT': -700 # Pacific
}
def parsedate_tz(data):
"""Convert a date string to a time tuple.
Accounts for military timezones.
"""
data = data.split()
# The FWS after the comma after the day-of-week is optional, so search and
# adjust for this.
if data[0].endswith(',') or data[0].lower() in _daynames:
# There's a dayname here. Skip it
del data[0]
else:
i = data[0].rfind(',')
if i >= 0:
data[0] = data[0][i+1:]
if len(data) == 3: # RFC 850 date, deprecated
stuff = data[0].split('-')
if len(stuff) == 3:
data = stuff + data[1:]
if len(data) == 4:
s = data[3]
i = s.find('+')
if i > 0:
data[3:] = [s[:i], s[i+1:]]
else:
data.append('') # Dummy tz
if len(data) < 5:
return None
data = data[:5]
[dd, mm, yy, tm, tz] = data
mm = mm.lower()
if mm not in _monthnames:
dd, mm = mm, dd.lower()
if mm not in _monthnames:
return None
mm = _monthnames.index(mm) + 1
if mm > 12:
mm -= 12
if dd[-1] == ',':
dd = dd[:-1]
i = yy.find(':')
if i > 0:
yy, tm = tm, yy
if yy[-1] == ',':
yy = yy[:-1]
if not yy[0].isdigit():
yy, tz = tz, yy
if tm[-1] == ',':
tm = tm[:-1]
tm = tm.split(':')
if len(tm) == 2:
[thh, tmm] = tm
tss = '0'
elif len(tm) == 3:
[thh, tmm, tss] = tm
else:
return None
try:
yy = int(yy)
dd = int(dd)
thh = int(thh)
tmm = int(tmm)
tss = int(tss)
except ValueError:
return None
# Check for a yy specified in two-digit format, then convert it to the
# appropriate four-digit format, according to the POSIX standard. RFC 822
# calls for a two-digit yy, but RFC 2822 (which obsoletes RFC 822)
# mandates a 4-digit yy. For more information, see the documentation for
# the time module.
if yy < 100:
# The year is between 1969 and 1999 (inclusive).
if yy > 68:
yy += 1900
# The year is between 2000 and 2068 (inclusive).
else:
yy += 2000
tzoffset = None
tz = tz.upper()
if tz in _timezones:
tzoffset = _timezones[tz]
else:
try:
tzoffset = int(tz)
except ValueError:
pass
# Convert a timezone offset into seconds; -0500 -> -18000
if tzoffset:
if tzoffset < 0:
tzsign = -1
tzoffset = -tzoffset
else:
tzsign = 1
tzoffset = tzsign * ( (tzoffset//100)*3600 + (tzoffset % 100)*60)
# Daylight Saving Time flag is set to -1, since DST is unknown.
return yy, mm, dd, thh, tmm, tss, 0, 1, -1, tzoffset
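# Editor's sketch (hypothetical demo, not part of the original module): a
# numeric zone is converted from +-HHMM into seconds in the tenth slot.
def _demo_parsedate_tz():
    t = parsedate_tz('Fri, 09 Nov 2001 01:08:47 -0500')
    assert t[:6] == (2001, 11, 9, 1, 8, 47)
    assert t[9] == -18000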
def parsedate(data):
"""Convert a time string to a time tuple."""
t = parsedate_tz(data)
if isinstance(t, tuple):
return t[:9]
else:
return t
def mktime_tz(data):
"""Turn a 10-tuple as returned by parsedate_tz() into a POSIX timestamp."""
if data[9] is None:
# No zone info, so localtime is a better assumption than GMT
return time.mktime(data[:8] + (-1,))
else:
t = calendar.timegm(data)
return t - data[9]
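# Editor's sketch (hypothetical demo, not part of the original module): a UTC
# date string round-trips through parsedate_tz/mktime_tz to a POSIX timestamp.
def _demo_mktime_tz():
    assert mktime_tz(parsedate_tz('Thu, 01 Jan 1970 00:00:00 +0000')) == 0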
def quote(str):
"""Prepare string to be used in a quoted string.
Turns backslash and double quote characters into quoted pairs. These
are the only characters that need to be quoted inside a quoted string.
Does not add the surrounding double quotes.
"""
return str.replace('\\', '\\\\').replace('"', '\\"')
class AddrlistClass:
"""Address parser class by Ben Escoto.
To understand what this class does, it helps to have a copy of RFC 2822 in
front of you.
Note: this class interface is deprecated and may be removed in the future.
Use rfc822.AddressList instead.
"""
def __init__(self, field):
"""Initialize a new instance.
`field' is an unparsed address header field, containing
one or more addresses.
"""
self.specials = '()<>@,:;.\"[]'
self.pos = 0
self.LWS = ' \t'
self.CR = '\r\n'
self.FWS = self.LWS + self.CR
self.atomends = self.specials + self.LWS + self.CR
# Note that RFC 2822 now specifies `.' as obs-phrase, meaning that it
# is obsolete syntax. RFC 2822 requires that we recognize obsolete
# syntax, so allow dots in phrases.
self.phraseends = self.atomends.replace('.', '')
self.field = field
self.commentlist = []
def gotonext(self):
"""Parse up to the start of the next address."""
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS + '\n\r':
self.pos += 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
else:
break
def getaddrlist(self):
"""Parse all addresses.
Returns a list containing all of the addresses.
"""
result = []
while self.pos < len(self.field):
ad = self.getaddress()
if ad:
result += ad
else:
result.append(('', ''))
return result
def getaddress(self):
"""Parse the next address."""
self.commentlist = []
self.gotonext()
oldpos = self.pos
oldcl = self.commentlist
plist = self.getphraselist()
self.gotonext()
returnlist = []
if self.pos >= len(self.field):
# Bad email address technically, no domain.
if plist:
returnlist = [(SPACE.join(self.commentlist), plist[0])]
elif self.field[self.pos] in '.@':
# email address is just an addrspec
# this isn't very efficient since we start over
self.pos = oldpos
self.commentlist = oldcl
addrspec = self.getaddrspec()
returnlist = [(SPACE.join(self.commentlist), addrspec)]
elif self.field[self.pos] == ':':
# address is a group
returnlist = []
fieldlen = len(self.field)
self.pos += 1
while self.pos < len(self.field):
self.gotonext()
if self.pos < fieldlen and self.field[self.pos] == ';':
self.pos += 1
break
returnlist = returnlist + self.getaddress()
elif self.field[self.pos] == '<':
# Address is a phrase then a route addr
routeaddr = self.getrouteaddr()
if self.commentlist:
returnlist = [(SPACE.join(plist) + ' (' +
' '.join(self.commentlist) + ')', routeaddr)]
else:
returnlist = [(SPACE.join(plist), routeaddr)]
else:
if plist:
returnlist = [(SPACE.join(self.commentlist), plist[0])]
elif self.field[self.pos] in self.specials:
self.pos += 1
self.gotonext()
if self.pos < len(self.field) and self.field[self.pos] == ',':
self.pos += 1
return returnlist
def getrouteaddr(self):
"""Parse a route address (Return-path value).
This method just skips all the route stuff and returns the addrspec.
"""
if self.field[self.pos] != '<':
return
expectroute = False
self.pos += 1
self.gotonext()
adlist = ''
while self.pos < len(self.field):
if expectroute:
self.getdomain()
expectroute = False
elif self.field[self.pos] == '>':
self.pos += 1
break
elif self.field[self.pos] == '@':
self.pos += 1
expectroute = True
elif self.field[self.pos] == ':':
self.pos += 1
else:
adlist = self.getaddrspec()
self.pos += 1
break
self.gotonext()
return adlist
def getaddrspec(self):
"""Parse an RFC 2822 addr-spec."""
aslist = []
self.gotonext()
while self.pos < len(self.field):
if self.field[self.pos] == '.':
aslist.append('.')
self.pos += 1
elif self.field[self.pos] == '"':
aslist.append('"%s"' % quote(self.getquote()))
elif self.field[self.pos] in self.atomends:
break
else:
aslist.append(self.getatom())
self.gotonext()
if self.pos >= len(self.field) or self.field[self.pos] != '@':
return EMPTYSTRING.join(aslist)
aslist.append('@')
self.pos += 1
self.gotonext()
domain = self.getdomain()
if not domain:
# Invalid domain, return an empty address instead of returning a
# local part to denote failed parsing.
return EMPTYSTRING
return EMPTYSTRING.join(aslist) + domain
def getdomain(self):
"""Get the complete domain name from an address."""
sdlist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS:
self.pos += 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] == '[':
sdlist.append(self.getdomainliteral())
elif self.field[self.pos] == '.':
self.pos += 1
sdlist.append('.')
elif self.field[self.pos] == '@':
# bpo-34155: Don't parse domains with two `@` like
# `[email protected]@important.com`.
return EMPTYSTRING
elif self.field[self.pos] in self.atomends:
break
else:
sdlist.append(self.getatom())
return EMPTYSTRING.join(sdlist)
def getdelimited(self, beginchar, endchars, allowcomments=True):
"""Parse a header fragment delimited by special characters.
`beginchar' is the start character for the fragment.
If self is not looking at an instance of `beginchar' then
getdelimited returns the empty string.
`endchars' is a sequence of allowable end-delimiting characters.
Parsing stops when one of these is encountered.
If `allowcomments' is non-zero, embedded RFC 2822 comments are allowed
within the parsed fragment.
"""
if self.field[self.pos] != beginchar:
return ''
slist = ['']
quote = False
self.pos += 1
while self.pos < len(self.field):
if quote:
slist.append(self.field[self.pos])
quote = False
elif self.field[self.pos] in endchars:
self.pos += 1
break
elif allowcomments and self.field[self.pos] == '(':
slist.append(self.getcomment())
continue # have already advanced pos from getcomment
elif self.field[self.pos] == '\\':
quote = True
else:
slist.append(self.field[self.pos])
self.pos += 1
return EMPTYSTRING.join(slist)
def getquote(self):
"""Get a quote-delimited fragment from self's field."""
return self.getdelimited('"', '"\r', False)
def getcomment(self):
"""Get a parenthesis-delimited fragment from self's field."""
return self.getdelimited('(', ')\r', True)
def getdomainliteral(self):
"""Parse an RFC 2822 domain-literal."""
return '[%s]' % self.getdelimited('[', ']\r', False)
def getatom(self, atomends=None):
"""Parse an RFC 2822 atom.
Optional atomends specifies a different set of end token delimiters
(the default is to use self.atomends). This is used e.g. in
getphraselist() since phrase endings must not include the `.' (which
is legal in phrases)."""
atomlist = ['']
if atomends is None:
atomends = self.atomends
while self.pos < len(self.field):
if self.field[self.pos] in atomends:
break
else:
atomlist.append(self.field[self.pos])
self.pos += 1
return EMPTYSTRING.join(atomlist)
def getphraselist(self):
"""Parse a sequence of RFC 2822 phrases.
A phrase is a sequence of words, which are in turn either RFC 2822
atoms or quoted-strings. Phrases are canonicalized by squeezing all
runs of continuous whitespace into one space.
"""
plist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.FWS:
self.pos += 1
elif self.field[self.pos] == '"':
plist.append(self.getquote())
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] in self.phraseends:
break
else:
plist.append(self.getatom(self.phraseends))
return plist
class AddressList(AddrlistClass):
"""An AddressList encapsulates a list of parsed RFC 2822 addresses."""
def __init__(self, field):
AddrlistClass.__init__(self, field)
if field:
self.addresslist = self.getaddrlist()
else:
self.addresslist = []
def __len__(self):
return len(self.addresslist)
def __add__(self, other):
# Set union
newaddr = AddressList(None)
newaddr.addresslist = self.addresslist[:]
for x in other.addresslist:
if not x in self.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __iadd__(self, other):
# Set union, in-place
for x in other.addresslist:
if not x in self.addresslist:
self.addresslist.append(x)
return self
def __sub__(self, other):
# Set difference
newaddr = AddressList(None)
for x in self.addresslist:
if not x in other.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __isub__(self, other):
# Set difference, in-place
for x in other.addresslist:
if x in self.addresslist:
self.addresslist.remove(x)
return self
def __getitem__(self, index):
# Make indexing, slices, and 'in' work
return self.addresslist[index]
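# Editor's sketch (hypothetical demo; the addresses are made up): AddressList
# supports set-style arithmetic over parsed (realname, email) pairs, and the
# bpo-34155 fix above makes the first parsed address come back empty when the
# domain contains a second '@'.
def _demo_addresslist():
    a = AddressList('Jane <jane@example.com>, Bob <bob@example.com>')
    b = AddressList('Bob <bob@example.com>')
    assert (a - b).addresslist == [('Jane', 'jane@example.com')]
    assert AddressList('pwned@evil.org@important.com').addresslist[0] == ('', '')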
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""A package for parsing, handling, and generating email messages."""
__version__ = '4.0.3'
__all__ = [
# Old names
'base64MIME',
'Charset',
'Encoders',
'Errors',
'Generator',
'Header',
'Iterators',
'Message',
'MIMEAudio',
'MIMEBase',
'MIMEImage',
'MIMEMessage',
'MIMEMultipart',
'MIMENonMultipart',
'MIMEText',
'Parser',
'quopriMIME',
'Utils',
'message_from_string',
'message_from_file',
# new names
'base64mime',
'charset',
'encoders',
'errors',
'generator',
'header',
'iterators',
'message',
'mime',
'parser',
'quoprimime',
'utils',
]
# Some convenience routines. Don't import Parser and Message as side-effects
# of importing email since those cascadingly import most of the rest of the
# email package.
def message_from_string(s, *args, **kws):
"""Parse a string into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parsestr(s)
def message_from_file(fp, *args, **kws):
"""Read a file and parse its contents into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parse(fp)
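# Editor's sketch (hypothetical demo, not part of the original module): the
# convenience parser at work on a trivial single-part message.
def _demo_message_from_string():
    msg = message_from_string('Subject: hi\n\nbody\n')
    assert msg['Subject'] == 'hi'
    assert msg.get_payload() == 'body\n'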
# Lazy loading to provide name mapping from new-style names (PEP 8 compatible
# email 4.0 module names), to old-style names (email 3.0 module names).
import sys
class LazyImporter(object):
def __init__(self, module_name):
self.__name__ = 'email.' + module_name
def __getattr__(self, name):
__import__(self.__name__)
mod = sys.modules[self.__name__]
self.__dict__.update(mod.__dict__)
return getattr(mod, name)
_LOWERNAMES = [
# email.<old name> -> email.<new name is lowercased old name>
'Charset',
'Encoders',
'Errors',
'FeedParser',
'Generator',
'Header',
'Iterators',
'Message',
'Parser',
'Utils',
'base64MIME',
'quopriMIME',
]
_MIMENAMES = [
# email.MIME<old name> -> email.mime.<new name is lowercased old name>
'Audio',
'Base',
'Image',
'Message',
'Multipart',
'NonMultipart',
'Text',
]
for _name in _LOWERNAMES:
importer = LazyImporter(_name.lower())
sys.modules['email.' + _name] = importer
setattr(sys.modules['email'], _name, importer)
import email.mime
for _name in _MIMENAMES:
importer = LazyImporter('mime.' + _name.lower())
sys.modules['email.MIME' + _name] = importer
setattr(sys.modules['email'], 'MIME' + _name, importer)
setattr(sys.modules['email.mime'], _name, importer)
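# Editor's sketch (hypothetical demo, assuming the email package is importable
# in this environment): after the wiring above, an old-style attribute access
# triggers the import of the new-style module, and both names end up bound to
# the same objects.
def _demo_lazy_importer():
    import email
    assert email.Utils.formatdate is email.utils.formatdate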
""" Encoding Aliases Support
This module is used by the encodings package search function to
map encoding names to module names.
Note that the search function normalizes the encoding names before
doing the lookup, so the mapping will have to map normalized
encoding names to module names.
Contents:
The following aliases dictionary contains mappings of all IANA
character set names for which the Python core library provides
codecs. In addition to these, a few Python specific codec
aliases have also been added.
"""
aliases = {
# Please keep this list sorted alphabetically by value!
# ascii codec
'646' : 'ascii',
'ansi_x3.4_1968' : 'ascii',
'ansi_x3_4_1968' : 'ascii', # some email headers use this non-standard name
'ansi_x3.4_1986' : 'ascii',
'cp367' : 'ascii',
'csascii' : 'ascii',
'ibm367' : 'ascii',
'iso646_us' : 'ascii',
'iso_646.irv_1991' : 'ascii',
'iso_ir_6' : 'ascii',
'us' : 'ascii',
'us_ascii' : 'ascii',
# base64_codec codec
'base64' : 'base64_codec',
'base_64' : 'base64_codec',
# big5 codec
'big5_tw' : 'big5',
'csbig5' : 'big5',
# big5hkscs codec
'big5_hkscs' : 'big5hkscs',
'hkscs' : 'big5hkscs',
# bz2_codec codec
'bz2' : 'bz2_codec',
# cp037 codec
'037' : 'cp037',
'csibm037' : 'cp037',
'ebcdic_cp_ca' : 'cp037',
'ebcdic_cp_nl' : 'cp037',
'ebcdic_cp_us' : 'cp037',
'ebcdic_cp_wt' : 'cp037',
'ibm037' : 'cp037',
'ibm039' : 'cp037',
# cp1026 codec
'1026' : 'cp1026',
'csibm1026' : 'cp1026',
'ibm1026' : 'cp1026',
# cp1140 codec
'1140' : 'cp1140',
'ibm1140' : 'cp1140',
# cp1250 codec
'1250' : 'cp1250',
'windows_1250' : 'cp1250',
# cp1251 codec
'1251' : 'cp1251',
'windows_1251' : 'cp1251',
# cp1252 codec
'1252' : 'cp1252',
'windows_1252' : 'cp1252',
# cp1253 codec
'1253' : 'cp1253',
'windows_1253' : 'cp1253',
# cp1254 codec
'1254' : 'cp1254',
'windows_1254' : 'cp1254',
# cp1255 codec
'1255' : 'cp1255',
'windows_1255' : 'cp1255',
# cp1256 codec
'1256' : 'cp1256',
'windows_1256' : 'cp1256',
# cp1257 codec
'1257' : 'cp1257',
'windows_1257' : 'cp1257',
# cp1258 codec
'1258' : 'cp1258',
'windows_1258' : 'cp1258',
# cp424 codec
'424' : 'cp424',
'csibm424' : 'cp424',
'ebcdic_cp_he' : 'cp424',
'ibm424' : 'cp424',
# cp437 codec
'437' : 'cp437',
'cspc8codepage437' : 'cp437',
'ibm437' : 'cp437',
# cp500 codec
'500' : 'cp500',
'csibm500' : 'cp500',
'ebcdic_cp_be' : 'cp500',
'ebcdic_cp_ch' : 'cp500',
'ibm500' : 'cp500',
# cp775 codec
'775' : 'cp775',
'cspc775baltic' : 'cp775',
'ibm775' : 'cp775',
# cp850 codec
'850' : 'cp850',
'cspc850multilingual' : 'cp850',
'ibm850' : 'cp850',
# cp852 codec
'852' : 'cp852',
'cspcp852' : 'cp852',
'ibm852' : 'cp852',
# cp855 codec
'855' : 'cp855',
'csibm855' : 'cp855',
'ibm855' : 'cp855',
# cp857 codec
'857' : 'cp857',
'csibm857' : 'cp857',
'ibm857' : 'cp857',
# cp858 codec
'858' : 'cp858',
'csibm858' : 'cp858',
'ibm858' : 'cp858',
# cp860 codec
'860' : 'cp860',
'csibm860' : 'cp860',
'ibm860' : 'cp860',
# cp861 codec
'861' : 'cp861',
'cp_is' : 'cp861',
'csibm861' : 'cp861',
'ibm861' : 'cp861',
# cp862 codec
'862' : 'cp862',
'cspc862latinhebrew' : 'cp862',
'ibm862' : 'cp862',
# cp863 codec
'863' : 'cp863',
'csibm863' : 'cp863',
'ibm863' : 'cp863',
# cp864 codec
'864' : 'cp864',
'csibm864' : 'cp864',
'ibm864' : 'cp864',
# cp865 codec
'865' : 'cp865',
'csibm865' : 'cp865',
'ibm865' : 'cp865',
# cp866 codec
'866' : 'cp866',
'csibm866' : 'cp866',
'ibm866' : 'cp866',
# cp869 codec
'869' : 'cp869',
'cp_gr' : 'cp869',
'csibm869' : 'cp869',
'ibm869' : 'cp869',
# cp932 codec
'932' : 'cp932',
'ms932' : 'cp932',
'mskanji' : 'cp932',
'ms_kanji' : 'cp932',
# cp949 codec
'949' : 'cp949',
'ms949' : 'cp949',
'uhc' : 'cp949',
# cp950 codec
'950' : 'cp950',
'ms950' : 'cp950',
# euc_jis_2004 codec
'jisx0213' : 'euc_jis_2004',
'eucjis2004' : 'euc_jis_2004',
'euc_jis2004' : 'euc_jis_2004',
# euc_jisx0213 codec
'eucjisx0213' : 'euc_jisx0213',
# euc_jp codec
'eucjp' : 'euc_jp',
'ujis' : 'euc_jp',
'u_jis' : 'euc_jp',
# euc_kr codec
'euckr' : 'euc_kr',
'korean' : 'euc_kr',
'ksc5601' : 'euc_kr',
'ks_c_5601' : 'euc_kr',
'ks_c_5601_1987' : 'euc_kr',
'ksx1001' : 'euc_kr',
'ks_x_1001' : 'euc_kr',
# gb18030 codec
'gb18030_2000' : 'gb18030',
# gb2312 codec
'chinese' : 'gb2312',
'csiso58gb231280' : 'gb2312',
'euc_cn' : 'gb2312',
'euccn' : 'gb2312',
'eucgb2312_cn' : 'gb2312',
'gb2312_1980' : 'gb2312',
'gb2312_80' : 'gb2312',
'iso_ir_58' : 'gb2312',
# gbk codec
'936' : 'gbk',
'cp936' : 'gbk',
'ms936' : 'gbk',
# hex_codec codec
'hex' : 'hex_codec',
# hp_roman8 codec
'roman8' : 'hp_roman8',
'r8' : 'hp_roman8',
'csHPRoman8' : 'hp_roman8',
# hz codec
'hzgb' : 'hz',
'hz_gb' : 'hz',
'hz_gb_2312' : 'hz',
# iso2022_jp codec
'csiso2022jp' : 'iso2022_jp',
'iso2022jp' : 'iso2022_jp',
'iso_2022_jp' : 'iso2022_jp',
# iso2022_jp_1 codec
'iso2022jp_1' : 'iso2022_jp_1',
'iso_2022_jp_1' : 'iso2022_jp_1',
# iso2022_jp_2 codec
'iso2022jp_2' : 'iso2022_jp_2',
'iso_2022_jp_2' : 'iso2022_jp_2',
# iso2022_jp_2004 codec
'iso_2022_jp_2004' : 'iso2022_jp_2004',
'iso2022jp_2004' : 'iso2022_jp_2004',
# iso2022_jp_3 codec
'iso2022jp_3' : 'iso2022_jp_3',
'iso_2022_jp_3' : 'iso2022_jp_3',
# iso2022_jp_ext codec
'iso2022jp_ext' : 'iso2022_jp_ext',
'iso_2022_jp_ext' : 'iso2022_jp_ext',
# iso2022_kr codec
'csiso2022kr' : 'iso2022_kr',
'iso2022kr' : 'iso2022_kr',
'iso_2022_kr' : 'iso2022_kr',
# iso8859_10 codec
'csisolatin6' : 'iso8859_10',
'iso_8859_10' : 'iso8859_10',
'iso_8859_10_1992' : 'iso8859_10',
'iso_ir_157' : 'iso8859_10',
'l6' : 'iso8859_10',
'latin6' : 'iso8859_10',
# iso8859_11 codec
'thai' : 'iso8859_11',
'iso_8859_11' : 'iso8859_11',
'iso_8859_11_2001' : 'iso8859_11',
# iso8859_13 codec
'iso_8859_13' : 'iso8859_13',
'l7' : 'iso8859_13',
'latin7' : 'iso8859_13',
# iso8859_14 codec
'iso_8859_14' : 'iso8859_14',
'iso_8859_14_1998' : 'iso8859_14',
'iso_celtic' : 'iso8859_14',
'iso_ir_199' : 'iso8859_14',
'l8' : 'iso8859_14',
'latin8' : 'iso8859_14',
# iso8859_15 codec
'iso_8859_15' : 'iso8859_15',
'l9' : 'iso8859_15',
'latin9' : 'iso8859_15',
# iso8859_16 codec
'iso_8859_16' : 'iso8859_16',
'iso_8859_16_2001' : 'iso8859_16',
'iso_ir_226' : 'iso8859_16',
'l10' : 'iso8859_16',
'latin10' : 'iso8859_16',
# iso8859_2 codec
'csisolatin2' : 'iso8859_2',
'iso_8859_2' : 'iso8859_2',
'iso_8859_2_1987' : 'iso8859_2',
'iso_ir_101' : 'iso8859_2',
'l2' : 'iso8859_2',
'latin2' : 'iso8859_2',
# iso8859_3 codec
'csisolatin3' : 'iso8859_3',
'iso_8859_3' : 'iso8859_3',
'iso_8859_3_1988' : 'iso8859_3',
'iso_ir_109' : 'iso8859_3',
'l3' : 'iso8859_3',
'latin3' : 'iso8859_3',
# iso8859_4 codec
'csisolatin4' : 'iso8859_4',
'iso_8859_4' : 'iso8859_4',
'iso_8859_4_1988' : 'iso8859_4',
'iso_ir_110' : 'iso8859_4',
'l4' : 'iso8859_4',
'latin4' : 'iso8859_4',
# iso8859_5 codec
'csisolatincyrillic' : 'iso8859_5',
'cyrillic' : 'iso8859_5',
'iso_8859_5' : 'iso8859_5',
'iso_8859_5_1988' : 'iso8859_5',
'iso_ir_144' : 'iso8859_5',
# iso8859_6 codec
'arabic' : 'iso8859_6',
'asmo_708' : 'iso8859_6',
'csisolatinarabic' : 'iso8859_6',
'ecma_114' : 'iso8859_6',
'iso_8859_6' : 'iso8859_6',
'iso_8859_6_1987' : 'iso8859_6',
'iso_ir_127' : 'iso8859_6',
# iso8859_7 codec
'csisolatingreek' : 'iso8859_7',
'ecma_118' : 'iso8859_7',
'elot_928' : 'iso8859_7',
'greek' : 'iso8859_7',
'greek8' : 'iso8859_7',
'iso_8859_7' : 'iso8859_7',
'iso_8859_7_1987' : 'iso8859_7',
'iso_ir_126' : 'iso8859_7',
# iso8859_8 codec
'csisolatinhebrew' : 'iso8859_8',
'hebrew' : 'iso8859_8',
'iso_8859_8' : 'iso8859_8',
'iso_8859_8_1988' : 'iso8859_8',
'iso_ir_138' : 'iso8859_8',
# iso8859_9 codec
'csisolatin5' : 'iso8859_9',
'iso_8859_9' : 'iso8859_9',
'iso_8859_9_1989' : 'iso8859_9',
'iso_ir_148' : 'iso8859_9',
'l5' : 'iso8859_9',
'latin5' : 'iso8859_9',
# johab codec
'cp1361' : 'johab',
'ms1361' : 'johab',
# koi8_r codec
'cskoi8r' : 'koi8_r',
# latin_1 codec
#
# Note that the latin_1 codec is implemented internally in C and a
# lot faster than the charmap codec iso8859_1 which uses the same
# encoding. This is why we discourage the use of the iso8859_1
# codec and alias it to latin_1 instead.
#
'8859' : 'latin_1',
'cp819' : 'latin_1',
'csisolatin1' : 'latin_1',
'ibm819' : 'latin_1',
'iso8859' : 'latin_1',
'iso8859_1' : 'latin_1',
'iso_8859_1' : 'latin_1',
'iso_8859_1_1987' : 'latin_1',
'iso_ir_100' : 'latin_1',
'l1' : 'latin_1',
'latin' : 'latin_1',
'latin1' : 'latin_1',
# mac_cyrillic codec
'maccyrillic' : 'mac_cyrillic',
# mac_greek codec
'macgreek' : 'mac_greek',
# mac_iceland codec
'maciceland' : 'mac_iceland',
# mac_latin2 codec
'maccentraleurope' : 'mac_latin2',
'maclatin2' : 'mac_latin2',
# mac_roman codec
'macroman' : 'mac_roman',
# mac_turkish codec
'macturkish' : 'mac_turkish',
# mbcs codec
'dbcs' : 'mbcs',
# ptcp154 codec
'csptcp154' : 'ptcp154',
'pt154' : 'ptcp154',
'cp154' : 'ptcp154',
'cyrillic_asian' : 'ptcp154',
# quopri_codec codec
'quopri' : 'quopri_codec',
'quoted_printable' : 'quopri_codec',
'quotedprintable' : 'quopri_codec',
# rot_13 codec
'rot13' : 'rot_13',
# shift_jis codec
'csshiftjis' : 'shift_jis',
'shiftjis' : 'shift_jis',
'sjis' : 'shift_jis',
's_jis' : 'shift_jis',
# shift_jis_2004 codec
'shiftjis2004' : 'shift_jis_2004',
'sjis_2004' : 'shift_jis_2004',
's_jis_2004' : 'shift_jis_2004',
# shift_jisx0213 codec
'shiftjisx0213' : 'shift_jisx0213',
'sjisx0213' : 'shift_jisx0213',
's_jisx0213' : 'shift_jisx0213',
# tactis codec
'tis260' : 'tactis',
# tis_620 codec
'tis620' : 'tis_620',
'tis_620_0' : 'tis_620',
'tis_620_2529_0' : 'tis_620',
'tis_620_2529_1' : 'tis_620',
'iso_ir_166' : 'tis_620',
# utf_16 codec
'u16' : 'utf_16',
'utf16' : 'utf_16',
# utf_16_be codec
'unicodebigunmarked' : 'utf_16_be',
'utf_16be' : 'utf_16_be',
# utf_16_le codec
'unicodelittleunmarked' : 'utf_16_le',
'utf_16le' : 'utf_16_le',
# utf_32 codec
'u32' : 'utf_32',
'utf32' : 'utf_32',
# utf_32_be codec
'utf_32be' : 'utf_32_be',
# utf_32_le codec
'utf_32le' : 'utf_32_le',
# utf_7 codec
'u7' : 'utf_7',
'utf7' : 'utf_7',
'unicode_1_1_utf_7' : 'utf_7',
# utf_8 codec
'u8' : 'utf_8',
'utf' : 'utf_8',
'utf8' : 'utf_8',
'utf8_ucs2' : 'utf_8',
'utf8_ucs4' : 'utf_8',
# uu_codec codec
'uu' : 'uu_codec',
# zlib_codec codec
'zip' : 'zlib_codec',
'zlib' : 'zlib_codec',
}
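# Editor's sketch (hypothetical demo, not part of the original module):
# codecs.lookup() normalizes the requested name and consults the table above,
# so a terse alias resolves to the canonical codec.
def _demo_aliases():
    import codecs
    assert codecs.lookup('u8').name == 'utf-8'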
""" Python 'ascii' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.ascii_encode
decode = codecs.ascii_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.ascii_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.ascii_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
class StreamConverter(StreamWriter,StreamReader):
encode = codecs.ascii_decode
decode = codecs.ascii_encode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='ascii',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
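# Editor's sketch (hypothetical demo, not part of the original module): the C
# helpers bound above return (result, length_consumed) pairs.
def _demo_ascii_codec():
    assert codecs.ascii_encode(u'abc') == ('abc', 3)
    assert codecs.ascii_decode('abc') == (u'abc', 3)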
""" Python 'base64_codec' Codec - base64 content transfer encoding
Unlike most of the other codecs which target Unicode, this codec
will return Python string objects for both encode and decode.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs, base64
### Codec APIs
def base64_encode(input,errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = base64.encodestring(input)
return (output, len(input))
def base64_decode(input,errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = base64.decodestring(input)
return (output, len(input))
class Codec(codecs.Codec):
def encode(self, input,errors='strict'):
return base64_encode(input,errors)
def decode(self, input,errors='strict'):
return base64_decode(input,errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
assert self.errors == 'strict'
return base64.encodestring(input)
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
assert self.errors == 'strict'
return base64.decodestring(input)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='base64',
encode=base64_encode,
decode=base64_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
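# Editor's sketch (hypothetical demo, not part of the original module): this
# codec maps bytes to bytes, and encodestring appends a trailing newline.
def _demo_base64_codec():
    assert base64_encode('abc') == ('YWJj\n', 3)
    assert base64_decode('YWJj\n')[0] == 'abc'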
""" Python 'bz2_codec' Codec - bz2 compression encoding
Unlike most of the other codecs which target Unicode, this codec
will return Python string objects for both encode and decode.
Adapted by Raymond Hettinger from zlib_codec.py which was written
by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import bz2  # this codec needs the optional bz2 module!
### Codec APIs
def bz2_encode(input,errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = bz2.compress(input)
return (output, len(input))
def bz2_decode(input,errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = bz2.decompress(input)
return (output, len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return bz2_encode(input, errors)
def decode(self, input, errors='strict'):
return bz2_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.compressobj = bz2.BZ2Compressor()
def encode(self, input, final=False):
if final:
c = self.compressobj.compress(input)
return c + self.compressobj.flush()
else:
return self.compressobj.compress(input)
def reset(self):
self.compressobj = bz2.BZ2Compressor()
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.decompressobj = bz2.BZ2Decompressor()
def decode(self, input, final=False):
try:
return self.decompressobj.decompress(input)
except EOFError:
return ''
def reset(self):
self.decompressobj = bz2.BZ2Decompressor()
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name="bz2",
encode=bz2_encode,
decode=bz2_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
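# Editor's sketch (hypothetical demo, not part of the original module): a
# bytes-to-bytes round trip through the bz2 codec functions defined above.
def _demo_bz2_codec():
    data = 'hello' * 20
    out, consumed = bz2_encode(data)
    assert consumed == len(data)
    assert bz2_decode(out)[0] == data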
""" Generic Python Character Mapping Codec.
Use this codec directly rather than through the automatic
conversion mechanisms supplied by unicode() and .encode().
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.charmap_encode
decode = codecs.charmap_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict', mapping=None):
codecs.IncrementalEncoder.__init__(self, errors)
self.mapping = mapping
def encode(self, input, final=False):
return codecs.charmap_encode(input, self.errors, self.mapping)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict', mapping=None):
codecs.IncrementalDecoder.__init__(self, errors)
self.mapping = mapping
def decode(self, input, final=False):
return codecs.charmap_decode(input, self.errors, self.mapping)[0]
class StreamWriter(Codec,codecs.StreamWriter):
def __init__(self,stream,errors='strict',mapping=None):
codecs.StreamWriter.__init__(self,stream,errors)
self.mapping = mapping
def encode(self,input,errors='strict'):
return Codec.encode(input,errors,self.mapping)
class StreamReader(Codec,codecs.StreamReader):
def __init__(self,stream,errors='strict',mapping=None):
codecs.StreamReader.__init__(self,stream,errors)
self.mapping = mapping
def decode(self,input,errors='strict'):
return Codec.decode(input,errors,self.mapping)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='charmap',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
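# Editor's sketch (hypothetical demo, not part of the original module): a
# decoding table is just a unicode string indexed by byte value, and
# charmap_build inverts it into the mapping that charmap_encode expects.
def _demo_charmap():
    table = codecs.charmap_build(u'abc')  # 0x00 -> u'a', 0x01 -> u'b', 0x02 -> u'c'
    assert codecs.charmap_encode(u'cab', 'strict', table) == ('\x02\x00\x01', 3)
    assert codecs.charmap_decode('\x02\x00\x01', 'strict', u'abc') == (u'cab', 3)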
""" Python Character Mapping Codec cp037 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp037',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> CONTROL
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> CONTROL
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> CONTROL
u'\x8d' # 0x09 -> CONTROL
u'\x8e' # 0x0A -> CONTROL
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> CONTROL
u'\x85' # 0x15 -> CONTROL
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> CONTROL
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> CONTROL
u'\x8f' # 0x1B -> CONTROL
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> CONTROL
u'\x81' # 0x21 -> CONTROL
u'\x82' # 0x22 -> CONTROL
u'\x83' # 0x23 -> CONTROL
u'\x84' # 0x24 -> CONTROL
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> CONTROL
u'\x89' # 0x29 -> CONTROL
u'\x8a' # 0x2A -> CONTROL
u'\x8b' # 0x2B -> CONTROL
u'\x8c' # 0x2C -> CONTROL
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> CONTROL
u'\x91' # 0x31 -> CONTROL
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> CONTROL
u'\x94' # 0x34 -> CONTROL
u'\x95' # 0x35 -> CONTROL
u'\x96' # 0x36 -> CONTROL
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> CONTROL
u'\x99' # 0x39 -> CONTROL
u'\x9a' # 0x3A -> CONTROL
u'\x9b' # 0x3B -> CONTROL
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> CONTROL
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\xa0' # 0x41 -> NO-BREAK SPACE
u'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
u'\xa2' # 0x4A -> CENT SIGN
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'|' # 0x4F -> VERTICAL LINE
u'&' # 0x50 -> AMPERSAND
u'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
u'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
u'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
u'!' # 0x5A -> EXCLAMATION MARK
u'$' # 0x5B -> DOLLAR SIGN
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'\xac' # 0x5F -> NOT SIGN
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xa6' # 0x6A -> BROKEN BAR
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
u'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
u'`' # 0x79 -> GRAVE ACCENT
u':' # 0x7A -> COLON
u'#' # 0x7B -> NUMBER SIGN
u'@' # 0x7C -> COMMERCIAL AT
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'"' # 0x7F -> QUOTATION MARK
u'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
u'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
u'\xb1' # 0x8F -> PLUS-MINUS SIGN
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
u'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
u'\xb8' # 0x9D -> CEDILLA
u'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
u'\xa4' # 0x9F -> CURRENCY SIGN
u'\xb5' # 0xA0 -> MICRO SIGN
u'~' # 0xA1 -> TILDE
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
u'\xbf' # 0xAB -> INVERTED QUESTION MARK
u'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
u'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
u'\xae' # 0xAF -> REGISTERED SIGN
u'^' # 0xB0 -> CIRCUMFLEX ACCENT
u'\xa3' # 0xB1 -> POUND SIGN
u'\xa5' # 0xB2 -> YEN SIGN
u'\xb7' # 0xB3 -> MIDDLE DOT
u'\xa9' # 0xB4 -> COPYRIGHT SIGN
u'\xa7' # 0xB5 -> SECTION SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
u'[' # 0xBA -> LEFT SQUARE BRACKET
u']' # 0xBB -> RIGHT SQUARE BRACKET
u'\xaf' # 0xBC -> MACRON
u'\xa8' # 0xBD -> DIAERESIS
u'\xb4' # 0xBE -> ACUTE ACCENT
u'\xd7' # 0xBF -> MULTIPLICATION SIGN
u'{' # 0xC0 -> LEFT CURLY BRACKET
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
u'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
u'}' # 0xD0 -> RIGHT CURLY BRACKET
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb9' # 0xDA -> SUPERSCRIPT ONE
u'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
u'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\\' # 0xE0 -> REVERSE SOLIDUS
u'\xf7' # 0xE1 -> DIVISION SIGN
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
u'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
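# Editor's sketch (hypothetical demo, not part of the original module): per
# the decoding table above, the EBCDIC byte 0x81 is LATIN SMALL LETTER A.
def _demo_cp037():
    assert Codec().decode('\x81') == (u'a', 1)
    assert Codec().encode(u'a') == ('\x81', 1)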
""" Python Character Mapping Codec cp1006 generated from 'MAPPINGS/VENDORS/MISC/CP1006.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1006',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u06f0' # 0xA1 -> EXTENDED ARABIC-INDIC DIGIT ZERO
u'\u06f1' # 0xA2 -> EXTENDED ARABIC-INDIC DIGIT ONE
u'\u06f2' # 0xA3 -> EXTENDED ARABIC-INDIC DIGIT TWO
u'\u06f3' # 0xA4 -> EXTENDED ARABIC-INDIC DIGIT THREE
u'\u06f4' # 0xA5 -> EXTENDED ARABIC-INDIC DIGIT FOUR
u'\u06f5' # 0xA6 -> EXTENDED ARABIC-INDIC DIGIT FIVE
u'\u06f6' # 0xA7 -> EXTENDED ARABIC-INDIC DIGIT SIX
u'\u06f7' # 0xA8 -> EXTENDED ARABIC-INDIC DIGIT SEVEN
u'\u06f8' # 0xA9 -> EXTENDED ARABIC-INDIC DIGIT EIGHT
u'\u06f9' # 0xAA -> EXTENDED ARABIC-INDIC DIGIT NINE
u'\u060c' # 0xAB -> ARABIC COMMA
u'\u061b' # 0xAC -> ARABIC SEMICOLON
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u061f' # 0xAE -> ARABIC QUESTION MARK
u'\ufe81' # 0xAF -> ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
u'\ufe8d' # 0xB0 -> ARABIC LETTER ALEF ISOLATED FORM
u'\ufe8e' # 0xB1 -> ARABIC LETTER ALEF FINAL FORM
u'\ufe8e' # 0xB2 -> ARABIC LETTER ALEF FINAL FORM
u'\ufe8f' # 0xB3 -> ARABIC LETTER BEH ISOLATED FORM
u'\ufe91' # 0xB4 -> ARABIC LETTER BEH INITIAL FORM
u'\ufb56' # 0xB5 -> ARABIC LETTER PEH ISOLATED FORM
u'\ufb58' # 0xB6 -> ARABIC LETTER PEH INITIAL FORM
u'\ufe93' # 0xB7 -> ARABIC LETTER TEH MARBUTA ISOLATED FORM
u'\ufe95' # 0xB8 -> ARABIC LETTER TEH ISOLATED FORM
u'\ufe97' # 0xB9 -> ARABIC LETTER TEH INITIAL FORM
u'\ufb66' # 0xBA -> ARABIC LETTER TTEH ISOLATED FORM
u'\ufb68' # 0xBB -> ARABIC LETTER TTEH INITIAL FORM
u'\ufe99' # 0xBC -> ARABIC LETTER THEH ISOLATED FORM
u'\ufe9b' # 0xBD -> ARABIC LETTER THEH INITIAL FORM
u'\ufe9d' # 0xBE -> ARABIC LETTER JEEM ISOLATED FORM
u'\ufe9f' # 0xBF -> ARABIC LETTER JEEM INITIAL FORM
u'\ufb7a' # 0xC0 -> ARABIC LETTER TCHEH ISOLATED FORM
u'\ufb7c' # 0xC1 -> ARABIC LETTER TCHEH INITIAL FORM
u'\ufea1' # 0xC2 -> ARABIC LETTER HAH ISOLATED FORM
u'\ufea3' # 0xC3 -> ARABIC LETTER HAH INITIAL FORM
u'\ufea5' # 0xC4 -> ARABIC LETTER KHAH ISOLATED FORM
u'\ufea7' # 0xC5 -> ARABIC LETTER KHAH INITIAL FORM
u'\ufea9' # 0xC6 -> ARABIC LETTER DAL ISOLATED FORM
u'\ufb84' # 0xC7 -> ARABIC LETTER DAHAL ISOLATED FORM
u'\ufeab' # 0xC8 -> ARABIC LETTER THAL ISOLATED FORM
u'\ufead' # 0xC9 -> ARABIC LETTER REH ISOLATED FORM
u'\ufb8c' # 0xCA -> ARABIC LETTER RREH ISOLATED FORM
u'\ufeaf' # 0xCB -> ARABIC LETTER ZAIN ISOLATED FORM
u'\ufb8a' # 0xCC -> ARABIC LETTER JEH ISOLATED FORM
u'\ufeb1' # 0xCD -> ARABIC LETTER SEEN ISOLATED FORM
u'\ufeb3' # 0xCE -> ARABIC LETTER SEEN INITIAL FORM
u'\ufeb5' # 0xCF -> ARABIC LETTER SHEEN ISOLATED FORM
u'\ufeb7' # 0xD0 -> ARABIC LETTER SHEEN INITIAL FORM
u'\ufeb9' # 0xD1 -> ARABIC LETTER SAD ISOLATED FORM
u'\ufebb' # 0xD2 -> ARABIC LETTER SAD INITIAL FORM
u'\ufebd' # 0xD3 -> ARABIC LETTER DAD ISOLATED FORM
u'\ufebf' # 0xD4 -> ARABIC LETTER DAD INITIAL FORM
u'\ufec1' # 0xD5 -> ARABIC LETTER TAH ISOLATED FORM
u'\ufec5' # 0xD6 -> ARABIC LETTER ZAH ISOLATED FORM
u'\ufec9' # 0xD7 -> ARABIC LETTER AIN ISOLATED FORM
u'\ufeca' # 0xD8 -> ARABIC LETTER AIN FINAL FORM
u'\ufecb' # 0xD9 -> ARABIC LETTER AIN INITIAL FORM
u'\ufecc' # 0xDA -> ARABIC LETTER AIN MEDIAL FORM
u'\ufecd' # 0xDB -> ARABIC LETTER GHAIN ISOLATED FORM
u'\ufece' # 0xDC -> ARABIC LETTER GHAIN FINAL FORM
u'\ufecf' # 0xDD -> ARABIC LETTER GHAIN INITIAL FORM
u'\ufed0' # 0xDE -> ARABIC LETTER GHAIN MEDIAL FORM
u'\ufed1' # 0xDF -> ARABIC LETTER FEH ISOLATED FORM
u'\ufed3' # 0xE0 -> ARABIC LETTER FEH INITIAL FORM
u'\ufed5' # 0xE1 -> ARABIC LETTER QAF ISOLATED FORM
u'\ufed7' # 0xE2 -> ARABIC LETTER QAF INITIAL FORM
u'\ufed9' # 0xE3 -> ARABIC LETTER KAF ISOLATED FORM
u'\ufedb' # 0xE4 -> ARABIC LETTER KAF INITIAL FORM
u'\ufb92' # 0xE5 -> ARABIC LETTER GAF ISOLATED FORM
u'\ufb94' # 0xE6 -> ARABIC LETTER GAF INITIAL FORM
u'\ufedd' # 0xE7 -> ARABIC LETTER LAM ISOLATED FORM
u'\ufedf' # 0xE8 -> ARABIC LETTER LAM INITIAL FORM
u'\ufee0' # 0xE9 -> ARABIC LETTER LAM MEDIAL FORM
u'\ufee1' # 0xEA -> ARABIC LETTER MEEM ISOLATED FORM
u'\ufee3' # 0xEB -> ARABIC LETTER MEEM INITIAL FORM
u'\ufb9e' # 0xEC -> ARABIC LETTER NOON GHUNNA ISOLATED FORM
u'\ufee5' # 0xED -> ARABIC LETTER NOON ISOLATED FORM
u'\ufee7' # 0xEE -> ARABIC LETTER NOON INITIAL FORM
u'\ufe85' # 0xEF -> ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
u'\ufeed' # 0xF0 -> ARABIC LETTER WAW ISOLATED FORM
u'\ufba6' # 0xF1 -> ARABIC LETTER HEH GOAL ISOLATED FORM
u'\ufba8' # 0xF2 -> ARABIC LETTER HEH GOAL INITIAL FORM
u'\ufba9' # 0xF3 -> ARABIC LETTER HEH GOAL MEDIAL FORM
u'\ufbaa' # 0xF4 -> ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM
u'\ufe80' # 0xF5 -> ARABIC LETTER HAMZA ISOLATED FORM
u'\ufe89' # 0xF6 -> ARABIC LETTER YEH WITH HAMZA ABOVE ISOLATED FORM
u'\ufe8a' # 0xF7 -> ARABIC LETTER YEH WITH HAMZA ABOVE FINAL FORM
u'\ufe8b' # 0xF8 -> ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
u'\ufef1' # 0xF9 -> ARABIC LETTER YEH ISOLATED FORM
u'\ufef2' # 0xFA -> ARABIC LETTER YEH FINAL FORM
u'\ufef3' # 0xFB -> ARABIC LETTER YEH INITIAL FORM
u'\ufbb0' # 0xFC -> ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM
u'\ufbae' # 0xFD -> ARABIC LETTER YEH BARREE ISOLATED FORM
u'\ufe7c' # 0xFE -> ARABIC SHADDA ISOLATED FORM
u'\ufe7d' # 0xFF -> ARABIC SHADDA MEDIAL FORM
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
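### Usage sketch
# A minimal round-trip example, assuming this module is registered under
# the standard library name 'cp1006'. Each byte indexes decoding_table;
# encoding_table, built above by codecs.charmap_build, inverts that map.
# Note that 0xB1 and 0xB2 both decode to U+FE8E, so only uniquely mapped
# bytes round-trip exactly.
raw = '\xb0\xb3'                       # bytes 0xB0, 0xB3 in CP1006
text = raw.decode('cp1006')           # -> ALEF and BEH isolated forms
assert text == u'\ufe8d\ufe8f'
assert text.encode('cp1006') == raw   # round-trips for uniquely mapped bytes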
""" Python Character Mapping Codec cp1026 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP1026.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1026',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> CONTROL
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> CONTROL
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> CONTROL
u'\x8d' # 0x09 -> CONTROL
u'\x8e' # 0x0A -> CONTROL
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> CONTROL
u'\x85' # 0x15 -> CONTROL
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> CONTROL
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> CONTROL
u'\x8f' # 0x1B -> CONTROL
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> CONTROL
u'\x81' # 0x21 -> CONTROL
u'\x82' # 0x22 -> CONTROL
u'\x83' # 0x23 -> CONTROL
u'\x84' # 0x24 -> CONTROL
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> CONTROL
u'\x89' # 0x29 -> CONTROL
u'\x8a' # 0x2A -> CONTROL
u'\x8b' # 0x2B -> CONTROL
u'\x8c' # 0x2C -> CONTROL
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> CONTROL
u'\x91' # 0x31 -> CONTROL
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> CONTROL
u'\x94' # 0x34 -> CONTROL
u'\x95' # 0x35 -> CONTROL
u'\x96' # 0x36 -> CONTROL
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> CONTROL
u'\x99' # 0x39 -> CONTROL
u'\x9a' # 0x3A -> CONTROL
u'\x9b' # 0x3B -> CONTROL
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> CONTROL
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\xa0' # 0x41 -> NO-BREAK SPACE
u'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
u'{' # 0x48 -> LEFT CURLY BRACKET
u'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
u'\xc7' # 0x4A -> LATIN CAPITAL LETTER C WITH CEDILLA
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'!' # 0x4F -> EXCLAMATION MARK
u'&' # 0x50 -> AMPERSAND
u'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
u'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
u'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
u'\u011e' # 0x5A -> LATIN CAPITAL LETTER G WITH BREVE
u'\u0130' # 0x5B -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'^' # 0x5F -> CIRCUMFLEX ACCENT
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'[' # 0x68 -> LEFT SQUARE BRACKET
u'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
u'\u015f' # 0x6A -> LATIN SMALL LETTER S WITH CEDILLA
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
u'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
u'\u0131' # 0x79 -> LATIN SMALL LETTER DOTLESS I
u':' # 0x7A -> COLON
u'\xd6' # 0x7B -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\u015e' # 0x7C -> LATIN CAPITAL LETTER S WITH CEDILLA
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'\xdc' # 0x7F -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'}' # 0x8C -> RIGHT CURLY BRACKET
u'`' # 0x8D -> GRAVE ACCENT
u'\xa6' # 0x8E -> BROKEN BAR
u'\xb1' # 0x8F -> PLUS-MINUS SIGN
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
u'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
u'\xb8' # 0x9D -> CEDILLA
u'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
u'\xa4' # 0x9F -> CURRENCY SIGN
u'\xb5' # 0xA0 -> MICRO SIGN
u'\xf6' # 0xA1 -> LATIN SMALL LETTER O WITH DIAERESIS
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
u'\xbf' # 0xAB -> INVERTED QUESTION MARK
u']' # 0xAC -> RIGHT SQUARE BRACKET
u'$' # 0xAD -> DOLLAR SIGN
u'@' # 0xAE -> COMMERCIAL AT
u'\xae' # 0xAF -> REGISTERED SIGN
u'\xa2' # 0xB0 -> CENT SIGN
u'\xa3' # 0xB1 -> POUND SIGN
u'\xa5' # 0xB2 -> YEN SIGN
u'\xb7' # 0xB3 -> MIDDLE DOT
u'\xa9' # 0xB4 -> COPYRIGHT SIGN
u'\xa7' # 0xB5 -> SECTION SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
u'\xac' # 0xBA -> NOT SIGN
u'|' # 0xBB -> VERTICAL LINE
u'\xaf' # 0xBC -> MACRON
u'\xa8' # 0xBD -> DIAERESIS
u'\xb4' # 0xBE -> ACUTE ACCENT
u'\xd7' # 0xBF -> MULTIPLICATION SIGN
u'\xe7' # 0xC0 -> LATIN SMALL LETTER C WITH CEDILLA
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'~' # 0xCC -> TILDE
u'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
u'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
u'\u011f' # 0xD0 -> LATIN SMALL LETTER G WITH BREVE
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb9' # 0xDA -> SUPERSCRIPT ONE
u'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\\' # 0xDC -> REVERSE SOLIDUS
u'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
u'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xfc' # 0xE0 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xf7' # 0xE1 -> DIVISION SIGN
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'#' # 0xEC -> NUMBER SIGN
u'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'"' # 0xFC -> QUOTATION MARK
u'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
u'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
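### Usage sketch
# cp1026 is an EBCDIC code page (IBM Turkey), so letters and digits do not
# sit at their ASCII positions. A short check, assuming the codec is
# registered under the standard name 'cp1026':
assert u'A'.encode('cp1026') == '\xc1'        # 0xC1 -> LATIN CAPITAL LETTER A
assert u'0'.encode('cp1026') == '\xf0'        # digits start at 0xF0
assert '\x5a'.decode('cp1026') == u'\u011e'   # 0x5A is G WITH BREVE, not 'Z'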
""" Python Character Mapping Codec cp1140 generated from 'python-mappings/CP1140.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1140',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> CONTROL
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> CONTROL
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> CONTROL
u'\x8d' # 0x09 -> CONTROL
u'\x8e' # 0x0A -> CONTROL
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> CONTROL
u'\x85' # 0x15 -> CONTROL
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> CONTROL
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> CONTROL
u'\x8f' # 0x1B -> CONTROL
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> CONTROL
u'\x81' # 0x21 -> CONTROL
u'\x82' # 0x22 -> CONTROL
u'\x83' # 0x23 -> CONTROL
u'\x84' # 0x24 -> CONTROL
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> CONTROL
u'\x89' # 0x29 -> CONTROL
u'\x8a' # 0x2A -> CONTROL
u'\x8b' # 0x2B -> CONTROL
u'\x8c' # 0x2C -> CONTROL
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> CONTROL
u'\x91' # 0x31 -> CONTROL
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> CONTROL
u'\x94' # 0x34 -> CONTROL
u'\x95' # 0x35 -> CONTROL
u'\x96' # 0x36 -> CONTROL
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> CONTROL
u'\x99' # 0x39 -> CONTROL
u'\x9a' # 0x3A -> CONTROL
u'\x9b' # 0x3B -> CONTROL
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> CONTROL
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\xa0' # 0x41 -> NO-BREAK SPACE
u'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
u'\xa2' # 0x4A -> CENT SIGN
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'|' # 0x4F -> VERTICAL LINE
u'&' # 0x50 -> AMPERSAND
u'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
u'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
u'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
u'!' # 0x5A -> EXCLAMATION MARK
u'$' # 0x5B -> DOLLAR SIGN
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'\xac' # 0x5F -> NOT SIGN
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xa6' # 0x6A -> BROKEN BAR
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
u'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
u'`' # 0x79 -> GRAVE ACCENT
u':' # 0x7A -> COLON
u'#' # 0x7B -> NUMBER SIGN
u'@' # 0x7C -> COMMERCIAL AT
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'"' # 0x7F -> QUOTATION MARK
u'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
u'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
u'\xb1' # 0x8F -> PLUS-MINUS SIGN
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
u'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
u'\xb8' # 0x9D -> CEDILLA
u'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
u'\u20ac' # 0x9F -> EURO SIGN
u'\xb5' # 0xA0 -> MICRO SIGN
u'~' # 0xA1 -> TILDE
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
u'\xbf' # 0xAB -> INVERTED QUESTION MARK
u'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
u'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
u'\xae' # 0xAF -> REGISTERED SIGN
u'^' # 0xB0 -> CIRCUMFLEX ACCENT
u'\xa3' # 0xB1 -> POUND SIGN
u'\xa5' # 0xB2 -> YEN SIGN
u'\xb7' # 0xB3 -> MIDDLE DOT
u'\xa9' # 0xB4 -> COPYRIGHT SIGN
u'\xa7' # 0xB5 -> SECTION SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
u'[' # 0xBA -> LEFT SQUARE BRACKET
u']' # 0xBB -> RIGHT SQUARE BRACKET
u'\xaf' # 0xBC -> MACRON
u'\xa8' # 0xBD -> DIAERESIS
u'\xb4' # 0xBE -> ACUTE ACCENT
u'\xd7' # 0xBF -> MULTIPLICATION SIGN
u'{' # 0xC0 -> LEFT CURLY BRACKET
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
u'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
u'}' # 0xD0 -> RIGHT CURLY BRACKET
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb9' # 0xDA -> SUPERSCRIPT ONE
u'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
u'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\\' # 0xE0 -> REVERSE SOLIDUS
u'\xf7' # 0xE1 -> DIVISION SIGN
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
u'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
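### Usage sketch
# cp1140 is IBM's euro-enabled update of cp037: byte 0x9F maps to the
# EURO SIGN where cp037 has the CURRENCY SIGN. A quick check, assuming
# the codec is registered under the standard name 'cp1140':
assert '\x9f'.decode('cp1140') == u'\u20ac'
assert u'\u20ac'.encode('cp1140') == '\x9f'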
""" Python Character Mapping Codec cp1250 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1250',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\ufffe' # 0x83 -> UNDEFINED
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\ufffe' # 0x88 -> UNDEFINED
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u015a' # 0x8C -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u0164' # 0x8D -> LATIN CAPITAL LETTER T WITH CARON
u'\u017d' # 0x8E -> LATIN CAPITAL LETTER Z WITH CARON
u'\u0179' # 0x8F -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\ufffe' # 0x98 -> UNDEFINED
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u015b' # 0x9C -> LATIN SMALL LETTER S WITH ACUTE
u'\u0165' # 0x9D -> LATIN SMALL LETTER T WITH CARON
u'\u017e' # 0x9E -> LATIN SMALL LETTER Z WITH CARON
u'\u017a' # 0x9F -> LATIN SMALL LETTER Z WITH ACUTE
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u02c7' # 0xA1 -> CARON
u'\u02d8' # 0xA2 -> BREVE
u'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\u0104' # 0xA5 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u02db' # 0xB2 -> OGONEK
u'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\u0105' # 0xB9 -> LATIN SMALL LETTER A WITH OGONEK
u'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u013d' # 0xBC -> LATIN CAPITAL LETTER L WITH CARON
u'\u02dd' # 0xBD -> DOUBLE ACUTE ACCENT
u'\u013e' # 0xBE -> LATIN SMALL LETTER L WITH CARON
u'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u0154' # 0xC0 -> LATIN CAPITAL LETTER R WITH ACUTE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u0139' # 0xC5 -> LATIN CAPITAL LETTER L WITH ACUTE
u'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u011a' # 0xCC -> LATIN CAPITAL LETTER E WITH CARON
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\u010e' # 0xCF -> LATIN CAPITAL LETTER D WITH CARON
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\u0147' # 0xD2 -> LATIN CAPITAL LETTER N WITH CARON
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u0158' # 0xD8 -> LATIN CAPITAL LETTER R WITH CARON
u'\u016e' # 0xD9 -> LATIN CAPITAL LETTER U WITH RING ABOVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\u0170' # 0xDB -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\u0162' # 0xDE -> LATIN CAPITAL LETTER T WITH CEDILLA
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\u0155' # 0xE0 -> LATIN SMALL LETTER R WITH ACUTE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u013a' # 0xE5 -> LATIN SMALL LETTER L WITH ACUTE
u'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u011b' # 0xEC -> LATIN SMALL LETTER E WITH CARON
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u010f' # 0xEF -> LATIN SMALL LETTER D WITH CARON
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0148' # 0xF2 -> LATIN SMALL LETTER N WITH CARON
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u0159' # 0xF8 -> LATIN SMALL LETTER R WITH CARON
u'\u016f' # 0xF9 -> LATIN SMALL LETTER U WITH RING ABOVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\u0171' # 0xFB -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\u0163' # 0xFE -> LATIN SMALL LETTER T WITH CEDILLA
u'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
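### Usage sketch
# The IncrementalDecoder above supports feeding input in arbitrary chunks,
# which is how codecs.iterdecode and stream readers consume data. A small
# example, assuming the codec is registered under the standard name 'cp1250':
import codecs
dec = codecs.getincrementaldecoder('cp1250')()
chunks = ['\x8a', '\xe1rka']                   # 0x8A -> S WITH CARON
text = u''.join(dec.decode(chunk) for chunk in chunks)
assert text == u'\u0160\xe1rka'                # u'Šárka'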
""" Python Character Mapping Codec cp1251 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1251',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u0402' # 0x80 -> CYRILLIC CAPITAL LETTER DJE
u'\u0403' # 0x81 -> CYRILLIC CAPITAL LETTER GJE
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0453' # 0x83 -> CYRILLIC SMALL LETTER GJE
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u20ac' # 0x88 -> EURO SIGN
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\u0409' # 0x8A -> CYRILLIC CAPITAL LETTER LJE
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u040a' # 0x8C -> CYRILLIC CAPITAL LETTER NJE
u'\u040c' # 0x8D -> CYRILLIC CAPITAL LETTER KJE
u'\u040b' # 0x8E -> CYRILLIC CAPITAL LETTER TSHE
u'\u040f' # 0x8F -> CYRILLIC CAPITAL LETTER DZHE
u'\u0452' # 0x90 -> CYRILLIC SMALL LETTER DJE
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\ufffe' # 0x98 -> UNDEFINED
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\u0459' # 0x9A -> CYRILLIC SMALL LETTER LJE
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u045a' # 0x9C -> CYRILLIC SMALL LETTER NJE
u'\u045c' # 0x9D -> CYRILLIC SMALL LETTER KJE
u'\u045b' # 0x9E -> CYRILLIC SMALL LETTER TSHE
u'\u045f' # 0x9F -> CYRILLIC SMALL LETTER DZHE
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u040e' # 0xA1 -> CYRILLIC CAPITAL LETTER SHORT U
u'\u045e' # 0xA2 -> CYRILLIC SMALL LETTER SHORT U
u'\u0408' # 0xA3 -> CYRILLIC CAPITAL LETTER JE
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\u0490' # 0xA5 -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\u0401' # 0xA8 -> CYRILLIC CAPITAL LETTER IO
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u0404' # 0xAA -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\u0407' # 0xAF -> CYRILLIC CAPITAL LETTER YI
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u0406' # 0xB2 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0456' # 0xB3 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0491' # 0xB4 -> CYRILLIC SMALL LETTER GHE WITH UPTURN
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u0451' # 0xB8 -> CYRILLIC SMALL LETTER IO
u'\u2116' # 0xB9 -> NUMERO SIGN
u'\u0454' # 0xBA -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u0458' # 0xBC -> CYRILLIC SMALL LETTER JE
u'\u0405' # 0xBD -> CYRILLIC CAPITAL LETTER DZE
u'\u0455' # 0xBE -> CYRILLIC SMALL LETTER DZE
u'\u0457' # 0xBF -> CYRILLIC SMALL LETTER YI
u'\u0410' # 0xC0 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0xC1 -> CYRILLIC CAPITAL LETTER BE
u'\u0412' # 0xC2 -> CYRILLIC CAPITAL LETTER VE
u'\u0413' # 0xC3 -> CYRILLIC CAPITAL LETTER GHE
u'\u0414' # 0xC4 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0xC5 -> CYRILLIC CAPITAL LETTER IE
u'\u0416' # 0xC6 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0417' # 0xC7 -> CYRILLIC CAPITAL LETTER ZE
u'\u0418' # 0xC8 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0xC9 -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0xCA -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0xCB -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0xCC -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0xCD -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0xCE -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0xCF -> CYRILLIC CAPITAL LETTER PE
u'\u0420' # 0xD0 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0xD1 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0xD2 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0xD3 -> CYRILLIC CAPITAL LETTER U
u'\u0424' # 0xD4 -> CYRILLIC CAPITAL LETTER EF
u'\u0425' # 0xD5 -> CYRILLIC CAPITAL LETTER HA
u'\u0426' # 0xD6 -> CYRILLIC CAPITAL LETTER TSE
u'\u0427' # 0xD7 -> CYRILLIC CAPITAL LETTER CHE
u'\u0428' # 0xD8 -> CYRILLIC CAPITAL LETTER SHA
u'\u0429' # 0xD9 -> CYRILLIC CAPITAL LETTER SHCHA
u'\u042a' # 0xDA -> CYRILLIC CAPITAL LETTER HARD SIGN
u'\u042b' # 0xDB -> CYRILLIC CAPITAL LETTER YERU
u'\u042c' # 0xDC -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042d' # 0xDD -> CYRILLIC CAPITAL LETTER E
u'\u042e' # 0xDE -> CYRILLIC CAPITAL LETTER YU
u'\u042f' # 0xDF -> CYRILLIC CAPITAL LETTER YA
u'\u0430' # 0xE0 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0xE1 -> CYRILLIC SMALL LETTER BE
u'\u0432' # 0xE2 -> CYRILLIC SMALL LETTER VE
u'\u0433' # 0xE3 -> CYRILLIC SMALL LETTER GHE
u'\u0434' # 0xE4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0xE5 -> CYRILLIC SMALL LETTER IE
u'\u0436' # 0xE6 -> CYRILLIC SMALL LETTER ZHE
u'\u0437' # 0xE7 -> CYRILLIC SMALL LETTER ZE
u'\u0438' # 0xE8 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0xE9 -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0xEA -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0xEB -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0xEC -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0xED -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0xEE -> CYRILLIC SMALL LETTER O
u'\u043f' # 0xEF -> CYRILLIC SMALL LETTER PE
u'\u0440' # 0xF0 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0xF1 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0xF2 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0xF3 -> CYRILLIC SMALL LETTER U
u'\u0444' # 0xF4 -> CYRILLIC SMALL LETTER EF
u'\u0445' # 0xF5 -> CYRILLIC SMALL LETTER HA
u'\u0446' # 0xF6 -> CYRILLIC SMALL LETTER TSE
u'\u0447' # 0xF7 -> CYRILLIC SMALL LETTER CHE
u'\u0448' # 0xF8 -> CYRILLIC SMALL LETTER SHA
u'\u0449' # 0xF9 -> CYRILLIC SMALL LETTER SHCHA
u'\u044a' # 0xFA -> CYRILLIC SMALL LETTER HARD SIGN
u'\u044b' # 0xFB -> CYRILLIC SMALL LETTER YERU
u'\u044c' # 0xFC -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044d' # 0xFD -> CYRILLIC SMALL LETTER E
u'\u044e' # 0xFE -> CYRILLIC SMALL LETTER YU
u'\u044f' # 0xFF -> CYRILLIC SMALL LETTER YA
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
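### Usage sketch
# getregentry() above is what the encodings package calls when the codec is
# first looked up. A sketch, assuming the module is importable as
# encodings.cp1251 (as in the standard library):
import codecs
from encodings import cp1251
info = cp1251.getregentry()
assert isinstance(info, codecs.CodecInfo) and info.name == 'cp1251'
# u'Привет' in CP1251 bytes:
assert '\xcf\xf0\xe8\xe2\xe5\xf2'.decode('cp1251') == u'\u041f\u0440\u0438\u0432\u0435\u0442'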
""" Python Character Mapping Codec cp1252 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1252',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
u'\ufffe' # 0x8D -> UNDEFINED
u'\u017d' # 0x8E -> LATIN CAPITAL LETTER Z WITH CARON
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\u02dc' # 0x98 -> SMALL TILDE
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
u'\ufffe' # 0x9D -> UNDEFINED
u'\u017e' # 0x9E -> LATIN SMALL LETTER Z WITH CARON
u'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0xFE -> LATIN SMALL LETTER THORN
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
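### Usage sketch (illustrative, not part of the generated module): the
### encoding_table built by codecs.charmap_build is the inverse of the
### 256-entry decoding_table above (Windows-1252, by its table contents),
### so a round trip through the two charmap primitives reproduces the
### original byte:
#
#   raw = '\x80'                                           # Windows-1252 byte
#   text, consumed = codecs.charmap_decode(raw, 'strict', decoding_table)
#   assert text == u'\u20ac' and consumed == 1             # EURO SIGN
#   back, _ = codecs.charmap_encode(text, 'strict', encoding_table)
#   assert back == raw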
""" Python Character Mapping Codec cp1253 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1253',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\ufffe' # 0x88 -> UNDEFINED
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\ufffe' # 0x8A -> UNDEFINED
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x8C -> UNDEFINED
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\ufffe' # 0x98 -> UNDEFINED
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\ufffe' # 0x9A -> UNDEFINED
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x9C -> UNDEFINED
u'\ufffe' # 0x9D -> UNDEFINED
u'\ufffe' # 0x9E -> UNDEFINED
u'\ufffe' # 0x9F -> UNDEFINED
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0385' # 0xA1 -> GREEK DIALYTIKA TONOS
u'\u0386' # 0xA2 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\ufffe' # 0xAA -> UNDEFINED
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\u2015' # 0xAF -> HORIZONTAL BAR
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\u0384' # 0xB4 -> GREEK TONOS
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u0388' # 0xB8 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u0389' # 0xB9 -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\u038a' # 0xBA -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u038c' # 0xBC -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\u038e' # 0xBE -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u038f' # 0xBF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'\u0390' # 0xC0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
u'\u0391' # 0xC1 -> GREEK CAPITAL LETTER ALPHA
u'\u0392' # 0xC2 -> GREEK CAPITAL LETTER BETA
u'\u0393' # 0xC3 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0xC4 -> GREEK CAPITAL LETTER DELTA
u'\u0395' # 0xC5 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0xC6 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0xC7 -> GREEK CAPITAL LETTER ETA
u'\u0398' # 0xC8 -> GREEK CAPITAL LETTER THETA
u'\u0399' # 0xC9 -> GREEK CAPITAL LETTER IOTA
u'\u039a' # 0xCA -> GREEK CAPITAL LETTER KAPPA
u'\u039b' # 0xCB -> GREEK CAPITAL LETTER LAMDA
u'\u039c' # 0xCC -> GREEK CAPITAL LETTER MU
u'\u039d' # 0xCD -> GREEK CAPITAL LETTER NU
u'\u039e' # 0xCE -> GREEK CAPITAL LETTER XI
u'\u039f' # 0xCF -> GREEK CAPITAL LETTER OMICRON
u'\u03a0' # 0xD0 -> GREEK CAPITAL LETTER PI
u'\u03a1' # 0xD1 -> GREEK CAPITAL LETTER RHO
u'\ufffe' # 0xD2 -> UNDEFINED
u'\u03a3' # 0xD3 -> GREEK CAPITAL LETTER SIGMA
u'\u03a4' # 0xD4 -> GREEK CAPITAL LETTER TAU
u'\u03a5' # 0xD5 -> GREEK CAPITAL LETTER UPSILON
u'\u03a6' # 0xD6 -> GREEK CAPITAL LETTER PHI
u'\u03a7' # 0xD7 -> GREEK CAPITAL LETTER CHI
u'\u03a8' # 0xD8 -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0xD9 -> GREEK CAPITAL LETTER OMEGA
u'\u03aa' # 0xDA -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\u03ab' # 0xDB -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'\u03ac' # 0xDC -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\u03ad' # 0xDD -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0xDE -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03af' # 0xDF -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03b0' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
u'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
u'\u03b3' # 0xE3 -> GREEK SMALL LETTER GAMMA
u'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
u'\u03b6' # 0xE6 -> GREEK SMALL LETTER ZETA
u'\u03b7' # 0xE7 -> GREEK SMALL LETTER ETA
u'\u03b8' # 0xE8 -> GREEK SMALL LETTER THETA
u'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
u'\u03ba' # 0xEA -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0xEB -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0xEC -> GREEK SMALL LETTER MU
u'\u03bd' # 0xED -> GREEK SMALL LETTER NU
u'\u03be' # 0xEE -> GREEK SMALL LETTER XI
u'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
u'\u03c1' # 0xF1 -> GREEK SMALL LETTER RHO
u'\u03c2' # 0xF2 -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
u'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
u'\u03c5' # 0xF5 -> GREEK SMALL LETTER UPSILON
u'\u03c6' # 0xF6 -> GREEK SMALL LETTER PHI
u'\u03c7' # 0xF7 -> GREEK SMALL LETTER CHI
u'\u03c8' # 0xF8 -> GREEK SMALL LETTER PSI
u'\u03c9' # 0xF9 -> GREEK SMALL LETTER OMEGA
u'\u03ca' # 0xFA -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u03cb' # 0xFB -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u03cc' # 0xFC -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u03cd' # 0xFD -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u03ce' # 0xFE -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
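### Usage sketch (illustrative, not part of the generated module): once the
### encodings package registers this module under the name 'cp1253', Greek
### text round-trips through the standard codec machinery, while bytes the
### table marks UNDEFINED (e.g. 0xAA) are rejected under errors='strict':
#
#   s = u'\u0391\u03b8\u03ae\u03bd\u03b1'        # "Athina" in Greek letters
#   assert s.encode('cp1253').decode('cp1253') == s
#   try:
#       '\xaa'.decode('cp1253')                  # 0xAA -> UNDEFINED in cp1253
#   except UnicodeDecodeError:
#       pass                                     # strict decoding rejects it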
""" Python Character Mapping Codec cp1254 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1254.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1254',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\u02dc' # 0x98 -> SMALL TILDE
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
u'\ufffe' # 0x9D -> UNDEFINED
u'\ufffe' # 0x9E -> UNDEFINED
u'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u011e' # 0xD0 -> LATIN CAPITAL LETTER G WITH BREVE
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0130' # 0xDD -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u011f' # 0xF0 -> LATIN SMALL LETTER G WITH BREVE
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u0131' # 0xFD -> LATIN SMALL LETTER DOTLESS I
u'\u015f' # 0xFE -> LATIN SMALL LETTER S WITH CEDILLA
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
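### Usage sketch (illustrative, not part of the generated module): cp1254 is
### Latin-1 with the Turkish letters spliced into slots such as 0xD0, 0xDD
### and 0xFD, which the table above makes visible directly:
#
#   text, _ = codecs.charmap_decode('\xd0\xdd\xfd', 'strict', decoding_table)
#   assert text == u'\u011e\u0130\u0131'   # G WITH BREVE, dotted I, dotless i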
""" Python Character Mapping Codec cp1255 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1255',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\ufffe' # 0x8A -> UNDEFINED
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x8C -> UNDEFINED
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\u02dc' # 0x98 -> SMALL TILDE
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\ufffe' # 0x9A -> UNDEFINED
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x9C -> UNDEFINED
u'\ufffe' # 0x9D -> UNDEFINED
u'\ufffe' # 0x9E -> UNDEFINED
u'\ufffe' # 0x9F -> UNDEFINED
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\u20aa' # 0xA4 -> NEW SHEQEL SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xd7' # 0xAA -> MULTIPLICATION SIGN
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xf7' # 0xBA -> DIVISION SIGN
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\u05b0' # 0xC0 -> HEBREW POINT SHEVA
u'\u05b1' # 0xC1 -> HEBREW POINT HATAF SEGOL
u'\u05b2' # 0xC2 -> HEBREW POINT HATAF PATAH
u'\u05b3' # 0xC3 -> HEBREW POINT HATAF QAMATS
u'\u05b4' # 0xC4 -> HEBREW POINT HIRIQ
u'\u05b5' # 0xC5 -> HEBREW POINT TSERE
u'\u05b6' # 0xC6 -> HEBREW POINT SEGOL
u'\u05b7' # 0xC7 -> HEBREW POINT PATAH
u'\u05b8' # 0xC8 -> HEBREW POINT QAMATS
u'\u05b9' # 0xC9 -> HEBREW POINT HOLAM
u'\ufffe' # 0xCA -> UNDEFINED
u'\u05bb' # 0xCB -> HEBREW POINT QUBUTS
u'\u05bc' # 0xCC -> HEBREW POINT DAGESH OR MAPIQ
u'\u05bd' # 0xCD -> HEBREW POINT METEG
u'\u05be' # 0xCE -> HEBREW PUNCTUATION MAQAF
u'\u05bf' # 0xCF -> HEBREW POINT RAFE
u'\u05c0' # 0xD0 -> HEBREW PUNCTUATION PASEQ
u'\u05c1' # 0xD1 -> HEBREW POINT SHIN DOT
u'\u05c2' # 0xD2 -> HEBREW POINT SIN DOT
u'\u05c3' # 0xD3 -> HEBREW PUNCTUATION SOF PASUQ
u'\u05f0' # 0xD4 -> HEBREW LIGATURE YIDDISH DOUBLE VAV
u'\u05f1' # 0xD5 -> HEBREW LIGATURE YIDDISH VAV YOD
u'\u05f2' # 0xD6 -> HEBREW LIGATURE YIDDISH DOUBLE YOD
u'\u05f3' # 0xD7 -> HEBREW PUNCTUATION GERESH
u'\u05f4' # 0xD8 -> HEBREW PUNCTUATION GERSHAYIM
u'\ufffe' # 0xD9 -> UNDEFINED
u'\ufffe' # 0xDA -> UNDEFINED
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\ufffe' # 0xDF -> UNDEFINED
u'\u05d0' # 0xE0 -> HEBREW LETTER ALEF
u'\u05d1' # 0xE1 -> HEBREW LETTER BET
u'\u05d2' # 0xE2 -> HEBREW LETTER GIMEL
u'\u05d3' # 0xE3 -> HEBREW LETTER DALET
u'\u05d4' # 0xE4 -> HEBREW LETTER HE
u'\u05d5' # 0xE5 -> HEBREW LETTER VAV
u'\u05d6' # 0xE6 -> HEBREW LETTER ZAYIN
u'\u05d7' # 0xE7 -> HEBREW LETTER HET
u'\u05d8' # 0xE8 -> HEBREW LETTER TET
u'\u05d9' # 0xE9 -> HEBREW LETTER YOD
u'\u05da' # 0xEA -> HEBREW LETTER FINAL KAF
u'\u05db' # 0xEB -> HEBREW LETTER KAF
u'\u05dc' # 0xEC -> HEBREW LETTER LAMED
u'\u05dd' # 0xED -> HEBREW LETTER FINAL MEM
u'\u05de' # 0xEE -> HEBREW LETTER MEM
u'\u05df' # 0xEF -> HEBREW LETTER FINAL NUN
u'\u05e0' # 0xF0 -> HEBREW LETTER NUN
u'\u05e1' # 0xF1 -> HEBREW LETTER SAMEKH
u'\u05e2' # 0xF2 -> HEBREW LETTER AYIN
u'\u05e3' # 0xF3 -> HEBREW LETTER FINAL PE
u'\u05e4' # 0xF4 -> HEBREW LETTER PE
u'\u05e5' # 0xF5 -> HEBREW LETTER FINAL TSADI
u'\u05e6' # 0xF6 -> HEBREW LETTER TSADI
u'\u05e7' # 0xF7 -> HEBREW LETTER QOF
u'\u05e8' # 0xF8 -> HEBREW LETTER RESH
u'\u05e9' # 0xF9 -> HEBREW LETTER SHIN
u'\u05ea' # 0xFA -> HEBREW LETTER TAV
u'\ufffe' # 0xFB -> UNDEFINED
u'\ufffe' # 0xFC -> UNDEFINED
u'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
u'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
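### Usage sketch (illustrative, not part of the generated module): cp1255
### stores Hebrew in logical order, and the 0xFD/0xFE slots decode to the
### Unicode directional marks rather than printable letters:
#
#   text, _ = codecs.charmap_decode('\xe9\xfd\xfe', 'strict', decoding_table)
#   assert text == u'\u05d9\u200e\u200f'   # YOD, then LRM and RLM marks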
""" Python Character Mapping Codec cp1256 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1256',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\u067e' # 0x81 -> ARABIC LETTER PEH
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\u0679' # 0x8A -> ARABIC LETTER TTEH
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
u'\u0686' # 0x8D -> ARABIC LETTER TCHEH
u'\u0698' # 0x8E -> ARABIC LETTER JEH
u'\u0688' # 0x8F -> ARABIC LETTER DDAL
u'\u06af' # 0x90 -> ARABIC LETTER GAF
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\u06a9' # 0x98 -> ARABIC LETTER KEHEH
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\u0691' # 0x9A -> ARABIC LETTER RREH
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
u'\u200c' # 0x9D -> ZERO WIDTH NON-JOINER
u'\u200d' # 0x9E -> ZERO WIDTH JOINER
u'\u06ba' # 0x9F -> ARABIC LETTER NOON GHUNNA
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u060c' # 0xA1 -> ARABIC COMMA
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u06be' # 0xAA -> ARABIC LETTER HEH DOACHASHMEE
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\u061b' # 0xBA -> ARABIC SEMICOLON
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\u061f' # 0xBF -> ARABIC QUESTION MARK
u'\u06c1' # 0xC0 -> ARABIC LETTER HEH GOAL
u'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
u'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
u'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
u'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
u'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
u'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
u'\u0627' # 0xC7 -> ARABIC LETTER ALEF
u'\u0628' # 0xC8 -> ARABIC LETTER BEH
u'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
u'\u062a' # 0xCA -> ARABIC LETTER TEH
u'\u062b' # 0xCB -> ARABIC LETTER THEH
u'\u062c' # 0xCC -> ARABIC LETTER JEEM
u'\u062d' # 0xCD -> ARABIC LETTER HAH
u'\u062e' # 0xCE -> ARABIC LETTER KHAH
u'\u062f' # 0xCF -> ARABIC LETTER DAL
u'\u0630' # 0xD0 -> ARABIC LETTER THAL
u'\u0631' # 0xD1 -> ARABIC LETTER REH
u'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
u'\u0633' # 0xD3 -> ARABIC LETTER SEEN
u'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
u'\u0635' # 0xD5 -> ARABIC LETTER SAD
u'\u0636' # 0xD6 -> ARABIC LETTER DAD
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u0637' # 0xD8 -> ARABIC LETTER TAH
u'\u0638' # 0xD9 -> ARABIC LETTER ZAH
u'\u0639' # 0xDA -> ARABIC LETTER AIN
u'\u063a' # 0xDB -> ARABIC LETTER GHAIN
u'\u0640' # 0xDC -> ARABIC TATWEEL
u'\u0641' # 0xDD -> ARABIC LETTER FEH
u'\u0642' # 0xDE -> ARABIC LETTER QAF
u'\u0643' # 0xDF -> ARABIC LETTER KAF
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\u0644' # 0xE1 -> ARABIC LETTER LAM
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\u0645' # 0xE3 -> ARABIC LETTER MEEM
u'\u0646' # 0xE4 -> ARABIC LETTER NOON
u'\u0647' # 0xE5 -> ARABIC LETTER HEH
u'\u0648' # 0xE6 -> ARABIC LETTER WAW
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u0649' # 0xEC -> ARABIC LETTER ALEF MAKSURA
u'\u064a' # 0xED -> ARABIC LETTER YEH
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u064b' # 0xF0 -> ARABIC FATHATAN
u'\u064c' # 0xF1 -> ARABIC DAMMATAN
u'\u064d' # 0xF2 -> ARABIC KASRATAN
u'\u064e' # 0xF3 -> ARABIC FATHA
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u064f' # 0xF5 -> ARABIC DAMMA
u'\u0650' # 0xF6 -> ARABIC KASRA
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u0651' # 0xF8 -> ARABIC SHADDA
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\u0652' # 0xFA -> ARABIC SUKUN
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
u'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
u'\u06d2' # 0xFF -> ARABIC LETTER YEH BARREE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
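### Usage sketch (illustrative, not part of the generated module): cp1256 is
### unusual in interleaving Arabic letters with a subset of Latin accented
### letters in the 0xE0-0xFF range, so adjacent byte values can land in
### different scripts:
#
#   text, _ = codecs.charmap_decode('\xe0\xe1', 'strict', decoding_table)
#   assert text == u'\xe0\u0644'   # LATIN a WITH GRAVE, then ARABIC LETTER LAM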
""" Python Character Mapping Codec cp1257 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1257.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1257',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\ufffe' # 0x83 -> UNDEFINED
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\ufffe' # 0x88 -> UNDEFINED
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\ufffe' # 0x8A -> UNDEFINED
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x8C -> UNDEFINED
u'\xa8' # 0x8D -> DIAERESIS
u'\u02c7' # 0x8E -> CARON
u'\xb8' # 0x8F -> CEDILLA
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\ufffe' # 0x98 -> UNDEFINED
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\ufffe' # 0x9A -> UNDEFINED
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\ufffe' # 0x9C -> UNDEFINED
u'\xaf' # 0x9D -> MACRON
u'\u02db' # 0x9E -> OGONEK
u'\ufffe' # 0x9F -> UNDEFINED
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\ufffe' # 0xA1 -> UNDEFINED
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\ufffe' # 0xA5 -> UNDEFINED
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xd8' # 0xA8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u0156' # 0xAA -> LATIN CAPITAL LETTER R WITH CEDILLA
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xc6' # 0xAF -> LATIN CAPITAL LETTER AE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xf8' # 0xB8 -> LATIN SMALL LETTER O WITH STROKE
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\u0157' # 0xBA -> LATIN SMALL LETTER R WITH CEDILLA
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xe6' # 0xBF -> LATIN SMALL LETTER AE
u'\u0104' # 0xC0 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u012e' # 0xC1 -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u0100' # 0xC2 -> LATIN CAPITAL LETTER A WITH MACRON
u'\u0106' # 0xC3 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\u0118' # 0xC6 -> LATIN CAPITAL LETTER E WITH OGONEK
u'\u0112' # 0xC7 -> LATIN CAPITAL LETTER E WITH MACRON
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0179' # 0xCA -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\u0116' # 0xCB -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\u0122' # 0xCC -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\u0136' # 0xCD -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\u012a' # 0xCE -> LATIN CAPITAL LETTER I WITH MACRON
u'\u013b' # 0xCF -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\u0160' # 0xD0 -> LATIN CAPITAL LETTER S WITH CARON
u'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\u0145' # 0xD2 -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\u014c' # 0xD4 -> LATIN CAPITAL LETTER O WITH MACRON
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u0172' # 0xD8 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\u0141' # 0xD9 -> LATIN CAPITAL LETTER L WITH STROKE
u'\u015a' # 0xDA -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u016a' # 0xDB -> LATIN CAPITAL LETTER U WITH MACRON
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u017b' # 0xDD -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\u017d' # 0xDE -> LATIN CAPITAL LETTER Z WITH CARON
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\u0105' # 0xE0 -> LATIN SMALL LETTER A WITH OGONEK
u'\u012f' # 0xE1 -> LATIN SMALL LETTER I WITH OGONEK
u'\u0101' # 0xE2 -> LATIN SMALL LETTER A WITH MACRON
u'\u0107' # 0xE3 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\u0119' # 0xE6 -> LATIN SMALL LETTER E WITH OGONEK
u'\u0113' # 0xE7 -> LATIN SMALL LETTER E WITH MACRON
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u017a' # 0xEA -> LATIN SMALL LETTER Z WITH ACUTE
u'\u0117' # 0xEB -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\u0123' # 0xEC -> LATIN SMALL LETTER G WITH CEDILLA
u'\u0137' # 0xED -> LATIN SMALL LETTER K WITH CEDILLA
u'\u012b' # 0xEE -> LATIN SMALL LETTER I WITH MACRON
u'\u013c' # 0xEF -> LATIN SMALL LETTER L WITH CEDILLA
u'\u0161' # 0xF0 -> LATIN SMALL LETTER S WITH CARON
u'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0146' # 0xF2 -> LATIN SMALL LETTER N WITH CEDILLA
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\u014d' # 0xF4 -> LATIN SMALL LETTER O WITH MACRON
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u0173' # 0xF8 -> LATIN SMALL LETTER U WITH OGONEK
u'\u0142' # 0xF9 -> LATIN SMALL LETTER L WITH STROKE
u'\u015b' # 0xFA -> LATIN SMALL LETTER S WITH ACUTE
u'\u016b' # 0xFB -> LATIN SMALL LETTER U WITH MACRON
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u017e' # 0xFE -> LATIN SMALL LETTER Z WITH CARON
u'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
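# --- Editor's illustrative sketch (not part of the generated codec module) ---
# charmap_build() inverts the 256-entry decoding_table above into the lookup
# dict consumed by charmap_encode(), so both directions are table-driven, one
# byte per character.  For cp1257, byte 0xD0 maps to U+0160 (LATIN CAPITAL
# LETTER S WITH CARON):
assert codecs.charmap_decode('\xd0', 'strict', decoding_table)[0] == u'\u0160'
assert codecs.charmap_encode(u'\u0160', 'strict', encoding_table)[0] == '\xd0'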
""" Python Character Mapping Codec cp1258 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1258.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp1258',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
u'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
u'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\u2020' # 0x86 -> DAGGER
u'\u2021' # 0x87 -> DOUBLE DAGGER
u'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u2030' # 0x89 -> PER MILLE SIGN
u'\ufffe' # 0x8A -> UNDEFINED
u'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\u02dc' # 0x98 -> SMALL TILDE
u'\u2122' # 0x99 -> TRADE MARK SIGN
u'\ufffe' # 0x9A -> UNDEFINED
u'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
u'\ufffe' # 0x9D -> UNDEFINED
u'\ufffe' # 0x9E -> UNDEFINED
u'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u0300' # 0xCC -> COMBINING GRAVE ACCENT
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\u0309' # 0xD2 -> COMBINING HOOK ABOVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u01a0' # 0xD5 -> LATIN CAPITAL LETTER O WITH HORN
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u01af' # 0xDD -> LATIN CAPITAL LETTER U WITH HORN
u'\u0303' # 0xDE -> COMBINING TILDE
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u0301' # 0xEC -> COMBINING ACUTE ACCENT
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\u0323' # 0xF2 -> COMBINING DOT BELOW
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u01a1' # 0xF5 -> LATIN SMALL LETTER O WITH HORN
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u01b0' # 0xFD -> LATIN SMALL LETTER U WITH HORN
u'\u20ab' # 0xFE -> DONG SIGN
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
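# --- Editor's illustrative sketch (not part of the generated codec module) ---
# Unlike most Windows code pages, cp1258 carries combining marks (0xCC, 0xD2,
# 0xDE, 0xEC and 0xF2 in the table above), so Vietnamese text decodes to base
# letter plus combining accent.  Callers expecting precomposed characters can
# normalize the result afterwards:
import unicodedata
_s = codecs.charmap_decode('a\xec', 'strict', decoding_table)[0]  # u'a\u0301'
assert unicodedata.normalize('NFC', _s) == u'\xe1'  # LATIN SMALL LETTER A WITH ACUTE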
""" Python Character Mapping Codec cp424 generated from 'MAPPINGS/VENDORS/MISC/CP424.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp424',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> SELECT
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> REQUIRED NEW LINE
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> GRAPHIC ESCAPE
u'\x8d' # 0x09 -> SUPERSCRIPT
u'\x8e' # 0x0A -> REPEAT
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> RESTORE/ENABLE PRESENTATION
u'\x85' # 0x15 -> NEW LINE
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> PROGRAM OPERATOR COMMUNICATION
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> UNIT BACK SPACE
u'\x8f' # 0x1B -> CUSTOMER USE ONE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> DIGIT SELECT
u'\x81' # 0x21 -> START OF SIGNIFICANCE
u'\x82' # 0x22 -> FIELD SEPARATOR
u'\x83' # 0x23 -> WORD UNDERSCORE
u'\x84' # 0x24 -> BYPASS OR INHIBIT PRESENTATION
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> SET ATTRIBUTE
u'\x89' # 0x29 -> START FIELD EXTENDED
u'\x8a' # 0x2A -> SET MODE OR SWITCH
u'\x8b' # 0x2B -> CONTROL SEQUENCE PREFIX
u'\x8c' # 0x2C -> MODIFY FIELD ATTRIBUTE
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> <reserved>
u'\x91' # 0x31 -> <reserved>
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> INDEX RETURN
u'\x94' # 0x34 -> PRESENTATION POSITION
u'\x95' # 0x35 -> TRANSPARENT
u'\x96' # 0x36 -> NUMERIC BACKSPACE
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> SUBSCRIPT
u'\x99' # 0x39 -> INDENT TABULATION
u'\x9a' # 0x3A -> REVERSE FORM FEED
u'\x9b' # 0x3B -> CUSTOMER USE THREE
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> <reserved>
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\u05d0' # 0x41 -> HEBREW LETTER ALEF
u'\u05d1' # 0x42 -> HEBREW LETTER BET
u'\u05d2' # 0x43 -> HEBREW LETTER GIMEL
u'\u05d3' # 0x44 -> HEBREW LETTER DALET
u'\u05d4' # 0x45 -> HEBREW LETTER HE
u'\u05d5' # 0x46 -> HEBREW LETTER VAV
u'\u05d6' # 0x47 -> HEBREW LETTER ZAYIN
u'\u05d7' # 0x48 -> HEBREW LETTER HET
u'\u05d8' # 0x49 -> HEBREW LETTER TET
u'\xa2' # 0x4A -> CENT SIGN
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'|' # 0x4F -> VERTICAL LINE
u'&' # 0x50 -> AMPERSAND
u'\u05d9' # 0x51 -> HEBREW LETTER YOD
u'\u05da' # 0x52 -> HEBREW LETTER FINAL KAF
u'\u05db' # 0x53 -> HEBREW LETTER KAF
u'\u05dc' # 0x54 -> HEBREW LETTER LAMED
u'\u05dd' # 0x55 -> HEBREW LETTER FINAL MEM
u'\u05de' # 0x56 -> HEBREW LETTER MEM
u'\u05df' # 0x57 -> HEBREW LETTER FINAL NUN
u'\u05e0' # 0x58 -> HEBREW LETTER NUN
u'\u05e1' # 0x59 -> HEBREW LETTER SAMEKH
u'!' # 0x5A -> EXCLAMATION MARK
u'$' # 0x5B -> DOLLAR SIGN
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'\xac' # 0x5F -> NOT SIGN
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\u05e2' # 0x62 -> HEBREW LETTER AYIN
u'\u05e3' # 0x63 -> HEBREW LETTER FINAL PE
u'\u05e4' # 0x64 -> HEBREW LETTER PE
u'\u05e5' # 0x65 -> HEBREW LETTER FINAL TSADI
u'\u05e6' # 0x66 -> HEBREW LETTER TSADI
u'\u05e7' # 0x67 -> HEBREW LETTER QOF
u'\u05e8' # 0x68 -> HEBREW LETTER RESH
u'\u05e9' # 0x69 -> HEBREW LETTER SHIN
u'\xa6' # 0x6A -> BROKEN BAR
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\ufffe' # 0x70 -> UNDEFINED
u'\u05ea' # 0x71 -> HEBREW LETTER TAV
u'\ufffe' # 0x72 -> UNDEFINED
u'\ufffe' # 0x73 -> UNDEFINED
u'\xa0' # 0x74 -> NO-BREAK SPACE
u'\ufffe' # 0x75 -> UNDEFINED
u'\ufffe' # 0x76 -> UNDEFINED
u'\ufffe' # 0x77 -> UNDEFINED
u'\u2017' # 0x78 -> DOUBLE LOW LINE
u'`' # 0x79 -> GRAVE ACCENT
u':' # 0x7A -> COLON
u'#' # 0x7B -> NUMBER SIGN
u'@' # 0x7C -> COMMERCIAL AT
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'"' # 0x7F -> QUOTATION MARK
u'\ufffe' # 0x80 -> UNDEFINED
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\ufffe' # 0x8C -> UNDEFINED
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\xb1' # 0x8F -> PLUS-MINUS SIGN
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\ufffe' # 0x9A -> UNDEFINED
u'\ufffe' # 0x9B -> UNDEFINED
u'\ufffe' # 0x9C -> UNDEFINED
u'\xb8' # 0x9D -> CEDILLA
u'\ufffe' # 0x9E -> UNDEFINED
u'\xa4' # 0x9F -> CURRENCY SIGN
u'\xb5' # 0xA0 -> MICRO SIGN
u'~' # 0xA1 -> TILDE
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\ufffe' # 0xAA -> UNDEFINED
u'\ufffe' # 0xAB -> UNDEFINED
u'\ufffe' # 0xAC -> UNDEFINED
u'\ufffe' # 0xAD -> UNDEFINED
u'\ufffe' # 0xAE -> UNDEFINED
u'\xae' # 0xAF -> REGISTERED SIGN
u'^' # 0xB0 -> CIRCUMFLEX ACCENT
u'\xa3' # 0xB1 -> POUND SIGN
u'\xa5' # 0xB2 -> YEN SIGN
u'\xb7' # 0xB3 -> MIDDLE DOT
u'\xa9' # 0xB4 -> COPYRIGHT SIGN
u'\xa7' # 0xB5 -> SECTION SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
u'[' # 0xBA -> LEFT SQUARE BRACKET
u']' # 0xBB -> RIGHT SQUARE BRACKET
u'\xaf' # 0xBC -> MACRON
u'\xa8' # 0xBD -> DIAERESIS
u'\xb4' # 0xBE -> ACUTE ACCENT
u'\xd7' # 0xBF -> MULTIPLICATION SIGN
u'{' # 0xC0 -> LEFT CURLY BRACKET
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\ufffe' # 0xCB -> UNDEFINED
u'\ufffe' # 0xCC -> UNDEFINED
u'\ufffe' # 0xCD -> UNDEFINED
u'\ufffe' # 0xCE -> UNDEFINED
u'\ufffe' # 0xCF -> UNDEFINED
u'}' # 0xD0 -> RIGHT CURLY BRACKET
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb9' # 0xDA -> SUPERSCRIPT ONE
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\ufffe' # 0xDF -> UNDEFINED
u'\\' # 0xE0 -> REVERSE SOLIDUS
u'\xf7' # 0xE1 -> DIVISION SIGN
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\ufffe' # 0xEB -> UNDEFINED
u'\ufffe' # 0xEC -> UNDEFINED
u'\ufffe' # 0xED -> UNDEFINED
u'\ufffe' # 0xEE -> UNDEFINED
u'\ufffe' # 0xEF -> UNDEFINED
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\ufffe' # 0xFB -> UNDEFINED
u'\ufffe' # 0xFC -> UNDEFINED
u'\ufffe' # 0xFD -> UNDEFINED
u'\ufffe' # 0xFE -> UNDEFINED
u'\x9f' # 0xFF -> EIGHT ONES
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
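# --- Editor's illustrative sketch (not part of the generated codec module) ---
# cp424 is an EBCDIC code page (Hebrew), so letters and digits do not sit at
# their ASCII positions: 'A' is byte 0xC1, SPACE is 0x40, DIGIT ZERO is 0xF0,
# and HEBREW LETTER ALEF is 0x41.
assert codecs.charmap_decode('\xc1\x40\xf0\x41', 'strict', decoding_table)[0] == u'A 0\u05d0'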
""" Python Character Mapping Codec cp437 generated from 'VENDORS/MICSFT/PC/CP437.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp437',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
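# --- Editor's illustrative sketch (not part of the generated codec module) ---
# The stdlib 'encodings' package locates this module by name and calls
# getregentry(); a standalone copy could be wired up the same way with a
# search function:
#
#     import codecs
#     codecs.register(lambda name: getregentry() if name == 'cp437' else None)
#     u'\u2588'.encode('cp437')   # -> '\xdb' (FULL BLOCK)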
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00a5, # YEN SIGN
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
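# --- Editor's note (illustrative, not part of the generated module) ---
# cp437 keeps both representations: the dict-based decoding_map above
# (identity for 0x00-0x7F plus the overrides) and the string-based
# decoding_table below.  They agree position for position:
#
#     assert all(decoding_table[i] == unichr(decoding_map[i]) for i in range(256))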
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xa2' # 0x009b -> CENT SIGN
u'\xa3' # 0x009c -> POUND SIGN
u'\xa5' # 0x009d -> YEN SIGN
u'\u20a7' # 0x009e -> PESETA SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\u2310' # 0x00a9 -> REVERSED NOT SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a5: 0x009d, # YEN SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
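# --- Editor's illustrative sketch (not part of the generated codec module) ---
# encoding_map has no entry for characters outside cp437, so errors='strict'
# raises UnicodeEncodeError while errors='replace' substitutes '?':
assert codecs.charmap_encode(u'\u2248', 'strict', encoding_map)[0] == '\xf7'  # ALMOST EQUAL TO
assert codecs.charmap_encode(u'\u20ac', 'replace', encoding_map)[0] == '?'    # EURO SIGN: unmapped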
""" Python Character Mapping Codec cp500 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp500',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
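# --- Editor's note (illustrative, not part of the generated module) ---
# cp500 is EBCDIC as well: byte values are laid out in EBCDIC bands, e.g.
# SPACE at 0x40, 'A'-'I' at 0xC1-0xC9 and 'J' starting a new band at 0xD1
# (see the table below), and LINE FEED lives at 0x25 rather than 0x0A.
#
#     assert codecs.charmap_decode('\xc1\x40\xd1', 'strict', decoding_table)[0] == u'A J'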
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> CONTROL
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> CONTROL
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> CONTROL
u'\x8d' # 0x09 -> CONTROL
u'\x8e' # 0x0A -> CONTROL
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> CONTROL
u'\x85' # 0x15 -> CONTROL
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> CONTROL
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> CONTROL
u'\x8f' # 0x1B -> CONTROL
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> CONTROL
u'\x81' # 0x21 -> CONTROL
u'\x82' # 0x22 -> CONTROL
u'\x83' # 0x23 -> CONTROL
u'\x84' # 0x24 -> CONTROL
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> CONTROL
u'\x89' # 0x29 -> CONTROL
u'\x8a' # 0x2A -> CONTROL
u'\x8b' # 0x2B -> CONTROL
u'\x8c' # 0x2C -> CONTROL
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> CONTROL
u'\x91' # 0x31 -> CONTROL
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> CONTROL
u'\x94' # 0x34 -> CONTROL
u'\x95' # 0x35 -> CONTROL
u'\x96' # 0x36 -> CONTROL
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> CONTROL
u'\x99' # 0x39 -> CONTROL
u'\x9a' # 0x3A -> CONTROL
u'\x9b' # 0x3B -> CONTROL
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> CONTROL
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\xa0' # 0x41 -> NO-BREAK SPACE
u'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
u'[' # 0x4A -> LEFT SQUARE BRACKET
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'!' # 0x4F -> EXCLAMATION MARK
u'&' # 0x50 -> AMPERSAND
u'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
u'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
u'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
u']' # 0x5A -> RIGHT SQUARE BRACKET
u'$' # 0x5B -> DOLLAR SIGN
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'^' # 0x5F -> CIRCUMFLEX ACCENT
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xa6' # 0x6A -> BROKEN BAR
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
u'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
u'`' # 0x79 -> GRAVE ACCENT
u':' # 0x7A -> COLON
u'#' # 0x7B -> NUMBER SIGN
u'@' # 0x7C -> COMMERCIAL AT
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'"' # 0x7F -> QUOTATION MARK
u'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
u'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
u'\xb1' # 0x8F -> PLUS-MINUS SIGN
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
u'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
u'\xb8' # 0x9D -> CEDILLA
u'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
u'\xa4' # 0x9F -> CURRENCY SIGN
u'\xb5' # 0xA0 -> MICRO SIGN
u'~' # 0xA1 -> TILDE
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
u'\xbf' # 0xAB -> INVERTED QUESTION MARK
u'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
u'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
u'\xae' # 0xAF -> REGISTERED SIGN
u'\xa2' # 0xB0 -> CENT SIGN
u'\xa3' # 0xB1 -> POUND SIGN
u'\xa5' # 0xB2 -> YEN SIGN
u'\xb7' # 0xB3 -> MIDDLE DOT
u'\xa9' # 0xB4 -> COPYRIGHT SIGN
u'\xa7' # 0xB5 -> SECTION SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
u'\xac' # 0xBA -> NOT SIGN
u'|' # 0xBB -> VERTICAL LINE
u'\xaf' # 0xBC -> MACRON
u'\xa8' # 0xBD -> DIAERESIS
u'\xb4' # 0xBE -> ACUTE ACCENT
u'\xd7' # 0xBF -> MULTIPLICATION SIGN
u'{' # 0xC0 -> LEFT CURLY BRACKET
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
u'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
u'}' # 0xD0 -> RIGHT CURLY BRACKET
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb9' # 0xDA -> SUPERSCRIPT ONE
u'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
u'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\\' # 0xE0 -> REVERSE SOLIDUS
u'\xf7' # 0xE1 -> DIVISION SIGN
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
u'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
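The module above is one of the generated charmap codecs bundled with IronPython's standard library; its table follows an EBCDIC layout matching cp500. As a minimal sketch (assuming the module has been executed so that decoding_table and encoding_table are in scope), the charmap helpers can be exercised directly:

import codecs

# Byte 0xC1 decodes to LATIN CAPITAL LETTER A in this EBCDIC-style table
# (Python 2 / IronPython 2.7 string syntax, as in the modules above).
text, consumed = codecs.charmap_decode('\xc1', 'strict', decoding_table)
assert text == u'A' and consumed == 1

# Encoding runs back through the table built by charmap_build() above.
raw, _ = codecs.charmap_encode(text, 'strict', encoding_table)
assert raw == '\xc1'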
"""Python Character Mapping Codec cp720 generated on Windows:
Vista 6.0.6002 SP2 Multiprocessor Free with the command:
python Tools/unicode/genwincodec.py 720
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp720',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
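# Note: codecs like this are normally not imported directly. The stdlib
# encodings package calls getregentry() the first time the codec is looked
# up by name, after which e.g. u'\u0627'.encode('cp720') goes through the
# Codec defined above.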
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\x80'
u'\x81'
u'\xe9' # 0x82 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x83 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\x84'
u'\xe0' # 0x85 -> LATIN SMALL LETTER A WITH GRAVE
u'\x86'
u'\xe7' # 0x87 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x88 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x89 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x8A -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x8B -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x8C -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\x8d'
u'\x8e'
u'\x8f'
u'\x90'
u'\u0651' # 0x91 -> ARABIC SHADDA
u'\u0652' # 0x92 -> ARABIC SUKUN
u'\xf4' # 0x93 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xa4' # 0x94 -> CURRENCY SIGN
u'\u0640' # 0x95 -> ARABIC TATWEEL
u'\xfb' # 0x96 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x97 -> LATIN SMALL LETTER U WITH GRAVE
u'\u0621' # 0x98 -> ARABIC LETTER HAMZA
u'\u0622' # 0x99 -> ARABIC LETTER ALEF WITH MADDA ABOVE
u'\u0623' # 0x9A -> ARABIC LETTER ALEF WITH HAMZA ABOVE
u'\u0624' # 0x9B -> ARABIC LETTER WAW WITH HAMZA ABOVE
u'\xa3' # 0x9C -> POUND SIGN
u'\u0625' # 0x9D -> ARABIC LETTER ALEF WITH HAMZA BELOW
u'\u0626' # 0x9E -> ARABIC LETTER YEH WITH HAMZA ABOVE
u'\u0627' # 0x9F -> ARABIC LETTER ALEF
u'\u0628' # 0xA0 -> ARABIC LETTER BEH
u'\u0629' # 0xA1 -> ARABIC LETTER TEH MARBUTA
u'\u062a' # 0xA2 -> ARABIC LETTER TEH
u'\u062b' # 0xA3 -> ARABIC LETTER THEH
u'\u062c' # 0xA4 -> ARABIC LETTER JEEM
u'\u062d' # 0xA5 -> ARABIC LETTER HAH
u'\u062e' # 0xA6 -> ARABIC LETTER KHAH
u'\u062f' # 0xA7 -> ARABIC LETTER DAL
u'\u0630' # 0xA8 -> ARABIC LETTER THAL
u'\u0631' # 0xA9 -> ARABIC LETTER REH
u'\u0632' # 0xAA -> ARABIC LETTER ZAIN
u'\u0633' # 0xAB -> ARABIC LETTER SEEN
u'\u0634' # 0xAC -> ARABIC LETTER SHEEN
u'\u0635' # 0xAD -> ARABIC LETTER SAD
u'\xab' # 0xAE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xAF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0xB0 -> LIGHT SHADE
u'\u2592' # 0xB1 -> MEDIUM SHADE
u'\u2593' # 0xB2 -> DARK SHADE
u'\u2502' # 0xB3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0xB4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0xB5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0xB6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0xB7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0xB8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0xB9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0xBA -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0xBB -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0xBC -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0xBD -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0xBE -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0xBF -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0xC0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0xC1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0xC2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0xC3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0xC4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0xC5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0xC6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0xC7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0xC8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0xC9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0xCA -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0xCB -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0xCC -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0xCD -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0xCE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0xCF -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0xD0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0xD1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0xD2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0xD3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0xD4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0xD5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0xD6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0xD7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0xD8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0xD9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0xDA -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0xDB -> FULL BLOCK
u'\u2584' # 0xDC -> LOWER HALF BLOCK
u'\u258c' # 0xDD -> LEFT HALF BLOCK
u'\u2590' # 0xDE -> RIGHT HALF BLOCK
u'\u2580' # 0xDF -> UPPER HALF BLOCK
u'\u0636' # 0xE0 -> ARABIC LETTER DAD
u'\u0637' # 0xE1 -> ARABIC LETTER TAH
u'\u0638' # 0xE2 -> ARABIC LETTER ZAH
u'\u0639' # 0xE3 -> ARABIC LETTER AIN
u'\u063a' # 0xE4 -> ARABIC LETTER GHAIN
u'\u0641' # 0xE5 -> ARABIC LETTER FEH
u'\xb5' # 0xE6 -> MICRO SIGN
u'\u0642' # 0xE7 -> ARABIC LETTER QAF
u'\u0643' # 0xE8 -> ARABIC LETTER KAF
u'\u0644' # 0xE9 -> ARABIC LETTER LAM
u'\u0645' # 0xEA -> ARABIC LETTER MEEM
u'\u0646' # 0xEB -> ARABIC LETTER NOON
u'\u0647' # 0xEC -> ARABIC LETTER HEH
u'\u0648' # 0xED -> ARABIC LETTER WAW
u'\u0649' # 0xEE -> ARABIC LETTER ALEF MAKSURA
u'\u064a' # 0xEF -> ARABIC LETTER YEH
u'\u2261' # 0xF0 -> IDENTICAL TO
u'\u064b' # 0xF1 -> ARABIC FATHATAN
u'\u064c' # 0xF2 -> ARABIC DAMMATAN
u'\u064d' # 0xF3 -> ARABIC KASRATAN
u'\u064e' # 0xF4 -> ARABIC FATHA
u'\u064f' # 0xF5 -> ARABIC DAMMA
u'\u0650' # 0xF6 -> ARABIC KASRA
u'\u2248' # 0xF7 -> ALMOST EQUAL TO
u'\xb0' # 0xF8 -> DEGREE SIGN
u'\u2219' # 0xF9 -> BULLET OPERATOR
u'\xb7' # 0xFA -> MIDDLE DOT
u'\u221a' # 0xFB -> SQUARE ROOT
u'\u207f' # 0xFC -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0xFD -> SUPERSCRIPT TWO
u'\u25a0' # 0xFE -> BLACK SQUARE
u'\xa0' # 0xFF -> NO-BREAK SPACE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
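cp720 is the DOS Arabic code page, machine-generated on Windows as its docstring records. A brief illustrative check against the cp720 table above (same assumption as before: the module's decoding_table is in scope):

import codecs

# 0x98 and 0x9F are ARABIC LETTER HAMZA and ARABIC LETTER ALEF in cp720.
text, _ = codecs.charmap_decode('\x98\x9f', 'strict', decoding_table)
assert text == u'\u0621\u0627'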
""" Python Character Mapping Codec cp737 generated from 'VENDORS/MICSFT/PC/CP737.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp737',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0391, # GREEK CAPITAL LETTER ALPHA
0x0081: 0x0392, # GREEK CAPITAL LETTER BETA
0x0082: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x0083: 0x0394, # GREEK CAPITAL LETTER DELTA
0x0084: 0x0395, # GREEK CAPITAL LETTER EPSILON
0x0085: 0x0396, # GREEK CAPITAL LETTER ZETA
0x0086: 0x0397, # GREEK CAPITAL LETTER ETA
0x0087: 0x0398, # GREEK CAPITAL LETTER THETA
0x0088: 0x0399, # GREEK CAPITAL LETTER IOTA
0x0089: 0x039a, # GREEK CAPITAL LETTER KAPPA
0x008a: 0x039b, # GREEK CAPITAL LETTER LAMDA
0x008b: 0x039c, # GREEK CAPITAL LETTER MU
0x008c: 0x039d, # GREEK CAPITAL LETTER NU
0x008d: 0x039e, # GREEK CAPITAL LETTER XI
0x008e: 0x039f, # GREEK CAPITAL LETTER OMICRON
0x008f: 0x03a0, # GREEK CAPITAL LETTER PI
0x0090: 0x03a1, # GREEK CAPITAL LETTER RHO
0x0091: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x0092: 0x03a4, # GREEK CAPITAL LETTER TAU
0x0093: 0x03a5, # GREEK CAPITAL LETTER UPSILON
0x0094: 0x03a6, # GREEK CAPITAL LETTER PHI
0x0095: 0x03a7, # GREEK CAPITAL LETTER CHI
0x0096: 0x03a8, # GREEK CAPITAL LETTER PSI
0x0097: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x0098: 0x03b1, # GREEK SMALL LETTER ALPHA
0x0099: 0x03b2, # GREEK SMALL LETTER BETA
0x009a: 0x03b3, # GREEK SMALL LETTER GAMMA
0x009b: 0x03b4, # GREEK SMALL LETTER DELTA
0x009c: 0x03b5, # GREEK SMALL LETTER EPSILON
0x009d: 0x03b6, # GREEK SMALL LETTER ZETA
0x009e: 0x03b7, # GREEK SMALL LETTER ETA
0x009f: 0x03b8, # GREEK SMALL LETTER THETA
0x00a0: 0x03b9, # GREEK SMALL LETTER IOTA
0x00a1: 0x03ba, # GREEK SMALL LETTER KAPPA
0x00a2: 0x03bb, # GREEK SMALL LETTER LAMDA
0x00a3: 0x03bc, # GREEK SMALL LETTER MU
0x00a4: 0x03bd, # GREEK SMALL LETTER NU
0x00a5: 0x03be, # GREEK SMALL LETTER XI
0x00a6: 0x03bf, # GREEK SMALL LETTER OMICRON
0x00a7: 0x03c0, # GREEK SMALL LETTER PI
0x00a8: 0x03c1, # GREEK SMALL LETTER RHO
0x00a9: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00aa: 0x03c2, # GREEK SMALL LETTER FINAL SIGMA
0x00ab: 0x03c4, # GREEK SMALL LETTER TAU
0x00ac: 0x03c5, # GREEK SMALL LETTER UPSILON
0x00ad: 0x03c6, # GREEK SMALL LETTER PHI
0x00ae: 0x03c7, # GREEK SMALL LETTER CHI
0x00af: 0x03c8, # GREEK SMALL LETTER PSI
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03c9, # GREEK SMALL LETTER OMEGA
0x00e1: 0x03ac, # GREEK SMALL LETTER ALPHA WITH TONOS
0x00e2: 0x03ad, # GREEK SMALL LETTER EPSILON WITH TONOS
0x00e3: 0x03ae, # GREEK SMALL LETTER ETA WITH TONOS
0x00e4: 0x03ca, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x00e5: 0x03af, # GREEK SMALL LETTER IOTA WITH TONOS
0x00e6: 0x03cc, # GREEK SMALL LETTER OMICRON WITH TONOS
0x00e7: 0x03cd, # GREEK SMALL LETTER UPSILON WITH TONOS
0x00e8: 0x03cb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x00e9: 0x03ce, # GREEK SMALL LETTER OMEGA WITH TONOS
0x00ea: 0x0386, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x00eb: 0x0388, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x00ec: 0x0389, # GREEK CAPITAL LETTER ETA WITH TONOS
0x00ed: 0x038a, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x00ee: 0x038c, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x00ef: 0x038e, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x00f0: 0x038f, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x03aa, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x00f5: 0x03ab, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\u0391' # 0x0080 -> GREEK CAPITAL LETTER ALPHA
u'\u0392' # 0x0081 -> GREEK CAPITAL LETTER BETA
u'\u0393' # 0x0082 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0x0083 -> GREEK CAPITAL LETTER DELTA
u'\u0395' # 0x0084 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0x0085 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0x0086 -> GREEK CAPITAL LETTER ETA
u'\u0398' # 0x0087 -> GREEK CAPITAL LETTER THETA
u'\u0399' # 0x0088 -> GREEK CAPITAL LETTER IOTA
u'\u039a' # 0x0089 -> GREEK CAPITAL LETTER KAPPA
u'\u039b' # 0x008a -> GREEK CAPITAL LETTER LAMDA
u'\u039c' # 0x008b -> GREEK CAPITAL LETTER MU
u'\u039d' # 0x008c -> GREEK CAPITAL LETTER NU
u'\u039e' # 0x008d -> GREEK CAPITAL LETTER XI
u'\u039f' # 0x008e -> GREEK CAPITAL LETTER OMICRON
u'\u03a0' # 0x008f -> GREEK CAPITAL LETTER PI
u'\u03a1' # 0x0090 -> GREEK CAPITAL LETTER RHO
u'\u03a3' # 0x0091 -> GREEK CAPITAL LETTER SIGMA
u'\u03a4' # 0x0092 -> GREEK CAPITAL LETTER TAU
u'\u03a5' # 0x0093 -> GREEK CAPITAL LETTER UPSILON
u'\u03a6' # 0x0094 -> GREEK CAPITAL LETTER PHI
u'\u03a7' # 0x0095 -> GREEK CAPITAL LETTER CHI
u'\u03a8' # 0x0096 -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0x0097 -> GREEK CAPITAL LETTER OMEGA
u'\u03b1' # 0x0098 -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0x0099 -> GREEK SMALL LETTER BETA
u'\u03b3' # 0x009a -> GREEK SMALL LETTER GAMMA
u'\u03b4' # 0x009b -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0x009c -> GREEK SMALL LETTER EPSILON
u'\u03b6' # 0x009d -> GREEK SMALL LETTER ZETA
u'\u03b7' # 0x009e -> GREEK SMALL LETTER ETA
u'\u03b8' # 0x009f -> GREEK SMALL LETTER THETA
u'\u03b9' # 0x00a0 -> GREEK SMALL LETTER IOTA
u'\u03ba' # 0x00a1 -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0x00a2 -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0x00a3 -> GREEK SMALL LETTER MU
u'\u03bd' # 0x00a4 -> GREEK SMALL LETTER NU
u'\u03be' # 0x00a5 -> GREEK SMALL LETTER XI
u'\u03bf' # 0x00a6 -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0x00a7 -> GREEK SMALL LETTER PI
u'\u03c1' # 0x00a8 -> GREEK SMALL LETTER RHO
u'\u03c3' # 0x00a9 -> GREEK SMALL LETTER SIGMA
u'\u03c2' # 0x00aa -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c4' # 0x00ab -> GREEK SMALL LETTER TAU
u'\u03c5' # 0x00ac -> GREEK SMALL LETTER UPSILON
u'\u03c6' # 0x00ad -> GREEK SMALL LETTER PHI
u'\u03c7' # 0x00ae -> GREEK SMALL LETTER CHI
u'\u03c8' # 0x00af -> GREEK SMALL LETTER PSI
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03c9' # 0x00e0 -> GREEK SMALL LETTER OMEGA
u'\u03ac' # 0x00e1 -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\u03ad' # 0x00e2 -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0x00e3 -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03ca' # 0x00e4 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u03af' # 0x00e5 -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03cc' # 0x00e6 -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u03cd' # 0x00e7 -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u03cb' # 0x00e8 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u03ce' # 0x00e9 -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\u0386' # 0x00ea -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\u0388' # 0x00eb -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u0389' # 0x00ec -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\u038a' # 0x00ed -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\u038c' # 0x00ee -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\u038e' # 0x00ef -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u038f' # 0x00f0 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u03aa' # 0x00f4 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\u03ab' # 0x00f5 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b7: 0x00fa, # MIDDLE DOT
0x00f7: 0x00f6, # DIVISION SIGN
0x0386: 0x00ea, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0388: 0x00eb, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x0389: 0x00ec, # GREEK CAPITAL LETTER ETA WITH TONOS
0x038a: 0x00ed, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x038c: 0x00ee, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x038e: 0x00ef, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x038f: 0x00f0, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0391: 0x0080, # GREEK CAPITAL LETTER ALPHA
0x0392: 0x0081, # GREEK CAPITAL LETTER BETA
0x0393: 0x0082, # GREEK CAPITAL LETTER GAMMA
0x0394: 0x0083, # GREEK CAPITAL LETTER DELTA
0x0395: 0x0084, # GREEK CAPITAL LETTER EPSILON
0x0396: 0x0085, # GREEK CAPITAL LETTER ZETA
0x0397: 0x0086, # GREEK CAPITAL LETTER ETA
0x0398: 0x0087, # GREEK CAPITAL LETTER THETA
0x0399: 0x0088, # GREEK CAPITAL LETTER IOTA
0x039a: 0x0089, # GREEK CAPITAL LETTER KAPPA
0x039b: 0x008a, # GREEK CAPITAL LETTER LAMDA
0x039c: 0x008b, # GREEK CAPITAL LETTER MU
0x039d: 0x008c, # GREEK CAPITAL LETTER NU
0x039e: 0x008d, # GREEK CAPITAL LETTER XI
0x039f: 0x008e, # GREEK CAPITAL LETTER OMICRON
0x03a0: 0x008f, # GREEK CAPITAL LETTER PI
0x03a1: 0x0090, # GREEK CAPITAL LETTER RHO
0x03a3: 0x0091, # GREEK CAPITAL LETTER SIGMA
0x03a4: 0x0092, # GREEK CAPITAL LETTER TAU
0x03a5: 0x0093, # GREEK CAPITAL LETTER UPSILON
0x03a6: 0x0094, # GREEK CAPITAL LETTER PHI
0x03a7: 0x0095, # GREEK CAPITAL LETTER CHI
0x03a8: 0x0096, # GREEK CAPITAL LETTER PSI
0x03a9: 0x0097, # GREEK CAPITAL LETTER OMEGA
0x03aa: 0x00f4, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x03ab: 0x00f5, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x03ac: 0x00e1, # GREEK SMALL LETTER ALPHA WITH TONOS
0x03ad: 0x00e2, # GREEK SMALL LETTER EPSILON WITH TONOS
0x03ae: 0x00e3, # GREEK SMALL LETTER ETA WITH TONOS
0x03af: 0x00e5, # GREEK SMALL LETTER IOTA WITH TONOS
0x03b1: 0x0098, # GREEK SMALL LETTER ALPHA
0x03b2: 0x0099, # GREEK SMALL LETTER BETA
0x03b3: 0x009a, # GREEK SMALL LETTER GAMMA
0x03b4: 0x009b, # GREEK SMALL LETTER DELTA
0x03b5: 0x009c, # GREEK SMALL LETTER EPSILON
0x03b6: 0x009d, # GREEK SMALL LETTER ZETA
0x03b7: 0x009e, # GREEK SMALL LETTER ETA
0x03b8: 0x009f, # GREEK SMALL LETTER THETA
0x03b9: 0x00a0, # GREEK SMALL LETTER IOTA
0x03ba: 0x00a1, # GREEK SMALL LETTER KAPPA
0x03bb: 0x00a2, # GREEK SMALL LETTER LAMDA
0x03bc: 0x00a3, # GREEK SMALL LETTER MU
0x03bd: 0x00a4, # GREEK SMALL LETTER NU
0x03be: 0x00a5, # GREEK SMALL LETTER XI
0x03bf: 0x00a6, # GREEK SMALL LETTER OMICRON
0x03c0: 0x00a7, # GREEK SMALL LETTER PI
0x03c1: 0x00a8, # GREEK SMALL LETTER RHO
0x03c2: 0x00aa, # GREEK SMALL LETTER FINAL SIGMA
0x03c3: 0x00a9, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00ab, # GREEK SMALL LETTER TAU
0x03c5: 0x00ac, # GREEK SMALL LETTER UPSILON
0x03c6: 0x00ad, # GREEK SMALL LETTER PHI
0x03c7: 0x00ae, # GREEK SMALL LETTER CHI
0x03c8: 0x00af, # GREEK SMALL LETTER PSI
0x03c9: 0x00e0, # GREEK SMALL LETTER OMEGA
0x03ca: 0x00e4, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x03cb: 0x00e8, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x03cc: 0x00e6, # GREEK SMALL LETTER OMICRON WITH TONOS
0x03cd: 0x00e7, # GREEK SMALL LETTER UPSILON WITH TONOS
0x03ce: 0x00e9, # GREEK SMALL LETTER OMEGA WITH TONOS
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
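Unlike the cp720 module, this cp737 (DOS Greek) module encodes through the explicit encoding_map dict: older gencodec output carries both the dict-based maps and the newer string decoding_table, and here decode() uses the table while encode() uses the dict. A hedged sketch, assuming the module's encoding_map is in scope:

import codecs

# GREEK CAPITAL LETTER ALPHA and GREEK SMALL LETTER ALPHA map through the
# dict-based encoding_map to bytes 0x80 and 0x98 per the table above.
raw, _ = codecs.charmap_encode(u'\u0391\u03b1', 'strict', encoding_map)
assert raw == '\x80\x98'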
""" Python Character Mapping Codec cp775 generated from 'VENDORS/MICSFT/PC/CP775.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp775',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x0101, # LATIN SMALL LETTER A WITH MACRON
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x0123, # LATIN SMALL LETTER G WITH CEDILLA
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
0x0088: 0x0142, # LATIN SMALL LETTER L WITH STROKE
0x0089: 0x0113, # LATIN SMALL LETTER E WITH MACRON
0x008a: 0x0156, # LATIN CAPITAL LETTER R WITH CEDILLA
0x008b: 0x0157, # LATIN SMALL LETTER R WITH CEDILLA
0x008c: 0x012b, # LATIN SMALL LETTER I WITH MACRON
0x008d: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x014d, # LATIN SMALL LETTER O WITH MACRON
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x0122, # LATIN CAPITAL LETTER G WITH CEDILLA
0x0096: 0x00a2, # CENT SIGN
0x0097: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
0x0098: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x00a4, # CURRENCY SIGN
0x00a0: 0x0100, # LATIN CAPITAL LETTER A WITH MACRON
0x00a1: 0x012a, # LATIN CAPITAL LETTER I WITH MACRON
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x00a4: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x00a5: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
0x00a6: 0x201d, # RIGHT DOUBLE QUOTATION MARK
0x00a7: 0x00a6, # BROKEN BAR
0x00a8: 0x00a9, # COPYRIGHT SIGN
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
0x00b6: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
0x00b7: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
0x00b8: 0x0116, # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x012e, # LATIN CAPITAL LETTER I WITH OGONEK
0x00be: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x0172, # LATIN CAPITAL LETTER U WITH OGONEK
0x00c7: 0x016a, # LATIN CAPITAL LETTER U WITH MACRON
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
0x00d0: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
0x00d1: 0x010d, # LATIN SMALL LETTER C WITH CARON
0x00d2: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
0x00d3: 0x0117, # LATIN SMALL LETTER E WITH DOT ABOVE
0x00d4: 0x012f, # LATIN SMALL LETTER I WITH OGONEK
0x00d5: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00d6: 0x0173, # LATIN SMALL LETTER U WITH OGONEK
0x00d7: 0x016b, # LATIN SMALL LETTER U WITH MACRON
0x00d8: 0x017e, # LATIN SMALL LETTER Z WITH CARON
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e2: 0x014c, # LATIN CAPITAL LETTER O WITH MACRON
0x00e3: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
0x00e8: 0x0136, # LATIN CAPITAL LETTER K WITH CEDILLA
0x00e9: 0x0137, # LATIN SMALL LETTER K WITH CEDILLA
0x00ea: 0x013b, # LATIN CAPITAL LETTER L WITH CEDILLA
0x00eb: 0x013c, # LATIN SMALL LETTER L WITH CEDILLA
0x00ec: 0x0146, # LATIN SMALL LETTER N WITH CEDILLA
0x00ed: 0x0112, # LATIN CAPITAL LETTER E WITH MACRON
0x00ee: 0x0145, # LATIN CAPITAL LETTER N WITH CEDILLA
0x00ef: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x201c, # LEFT DOUBLE QUOTATION MARK
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\u0106' # 0x0080 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\u0101' # 0x0083 -> LATIN SMALL LETTER A WITH MACRON
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u0123' # 0x0085 -> LATIN SMALL LETTER G WITH CEDILLA
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\u0107' # 0x0087 -> LATIN SMALL LETTER C WITH ACUTE
u'\u0142' # 0x0088 -> LATIN SMALL LETTER L WITH STROKE
u'\u0113' # 0x0089 -> LATIN SMALL LETTER E WITH MACRON
u'\u0156' # 0x008a -> LATIN CAPITAL LETTER R WITH CEDILLA
u'\u0157' # 0x008b -> LATIN SMALL LETTER R WITH CEDILLA
u'\u012b' # 0x008c -> LATIN SMALL LETTER I WITH MACRON
u'\u0179' # 0x008d -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\u014d' # 0x0093 -> LATIN SMALL LETTER O WITH MACRON
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\u0122' # 0x0095 -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\xa2' # 0x0096 -> CENT SIGN
u'\u015a' # 0x0097 -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u015b' # 0x0098 -> LATIN SMALL LETTER S WITH ACUTE
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd7' # 0x009e -> MULTIPLICATION SIGN
u'\xa4' # 0x009f -> CURRENCY SIGN
u'\u0100' # 0x00a0 -> LATIN CAPITAL LETTER A WITH MACRON
u'\u012a' # 0x00a1 -> LATIN CAPITAL LETTER I WITH MACRON
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\u017b' # 0x00a3 -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\u017c' # 0x00a4 -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u017a' # 0x00a5 -> LATIN SMALL LETTER Z WITH ACUTE
u'\u201d' # 0x00a6 -> RIGHT DOUBLE QUOTATION MARK
u'\xa6' # 0x00a7 -> BROKEN BAR
u'\xa9' # 0x00a8 -> COPYRIGHT SIGN
u'\xae' # 0x00a9 -> REGISTERED SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\u0141' # 0x00ad -> LATIN CAPITAL LETTER L WITH STROKE
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u0104' # 0x00b5 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u010c' # 0x00b6 -> LATIN CAPITAL LETTER C WITH CARON
u'\u0118' # 0x00b7 -> LATIN CAPITAL LETTER E WITH OGONEK
u'\u0116' # 0x00b8 -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u012e' # 0x00bd -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u0160' # 0x00be -> LATIN CAPITAL LETTER S WITH CARON
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u0172' # 0x00c6 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\u016a' # 0x00c7 -> LATIN CAPITAL LETTER U WITH MACRON
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u017d' # 0x00cf -> LATIN CAPITAL LETTER Z WITH CARON
u'\u0105' # 0x00d0 -> LATIN SMALL LETTER A WITH OGONEK
u'\u010d' # 0x00d1 -> LATIN SMALL LETTER C WITH CARON
u'\u0119' # 0x00d2 -> LATIN SMALL LETTER E WITH OGONEK
u'\u0117' # 0x00d3 -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\u012f' # 0x00d4 -> LATIN SMALL LETTER I WITH OGONEK
u'\u0161' # 0x00d5 -> LATIN SMALL LETTER S WITH CARON
u'\u0173' # 0x00d6 -> LATIN SMALL LETTER U WITH OGONEK
u'\u016b' # 0x00d7 -> LATIN SMALL LETTER U WITH MACRON
u'\u017e' # 0x00d8 -> LATIN SMALL LETTER Z WITH CARON
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S (GERMAN)
u'\u014c' # 0x00e2 -> LATIN CAPITAL LETTER O WITH MACRON
u'\u0143' # 0x00e3 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
u'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u0144' # 0x00e7 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0136' # 0x00e8 -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\u0137' # 0x00e9 -> LATIN SMALL LETTER K WITH CEDILLA
u'\u013b' # 0x00ea -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\u013c' # 0x00eb -> LATIN SMALL LETTER L WITH CEDILLA
u'\u0146' # 0x00ec -> LATIN SMALL LETTER N WITH CEDILLA
u'\u0112' # 0x00ed -> LATIN CAPITAL LETTER E WITH MACRON
u'\u0145' # 0x00ee -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\u2019' # 0x00ef -> RIGHT SINGLE QUOTATION MARK
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u201c' # 0x00f2 -> LEFT DOUBLE QUOTATION MARK
u'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
u'\xb6' # 0x00f4 -> PILCROW SIGN
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u201e' # 0x00f7 -> DOUBLE LOW-9 QUOTATION MARK
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\xb9' # 0x00fb -> SUPERSCRIPT ONE
u'\xb3' # 0x00fc -> SUPERSCRIPT THREE
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a2: 0x0096, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x009f, # CURRENCY SIGN
0x00a6: 0x00a7, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a9: 0x00a8, # COPYRIGHT SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0100: 0x00a0, # LATIN CAPITAL LETTER A WITH MACRON
0x0101: 0x0083, # LATIN SMALL LETTER A WITH MACRON
0x0104: 0x00b5, # LATIN CAPITAL LETTER A WITH OGONEK
0x0105: 0x00d0, # LATIN SMALL LETTER A WITH OGONEK
0x0106: 0x0080, # LATIN CAPITAL LETTER C WITH ACUTE
0x0107: 0x0087, # LATIN SMALL LETTER C WITH ACUTE
0x010c: 0x00b6, # LATIN CAPITAL LETTER C WITH CARON
0x010d: 0x00d1, # LATIN SMALL LETTER C WITH CARON
0x0112: 0x00ed, # LATIN CAPITAL LETTER E WITH MACRON
0x0113: 0x0089, # LATIN SMALL LETTER E WITH MACRON
0x0116: 0x00b8, # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x0117: 0x00d3, # LATIN SMALL LETTER E WITH DOT ABOVE
0x0118: 0x00b7, # LATIN CAPITAL LETTER E WITH OGONEK
0x0119: 0x00d2, # LATIN SMALL LETTER E WITH OGONEK
0x0122: 0x0095, # LATIN CAPITAL LETTER G WITH CEDILLA
0x0123: 0x0085, # LATIN SMALL LETTER G WITH CEDILLA
0x012a: 0x00a1, # LATIN CAPITAL LETTER I WITH MACRON
0x012b: 0x008c, # LATIN SMALL LETTER I WITH MACRON
0x012e: 0x00bd, # LATIN CAPITAL LETTER I WITH OGONEK
0x012f: 0x00d4, # LATIN SMALL LETTER I WITH OGONEK
0x0136: 0x00e8, # LATIN CAPITAL LETTER K WITH CEDILLA
0x0137: 0x00e9, # LATIN SMALL LETTER K WITH CEDILLA
0x013b: 0x00ea, # LATIN CAPITAL LETTER L WITH CEDILLA
0x013c: 0x00eb, # LATIN SMALL LETTER L WITH CEDILLA
0x0141: 0x00ad, # LATIN CAPITAL LETTER L WITH STROKE
0x0142: 0x0088, # LATIN SMALL LETTER L WITH STROKE
0x0143: 0x00e3, # LATIN CAPITAL LETTER N WITH ACUTE
0x0144: 0x00e7, # LATIN SMALL LETTER N WITH ACUTE
0x0145: 0x00ee, # LATIN CAPITAL LETTER N WITH CEDILLA
0x0146: 0x00ec, # LATIN SMALL LETTER N WITH CEDILLA
0x014c: 0x00e2, # LATIN CAPITAL LETTER O WITH MACRON
0x014d: 0x0093, # LATIN SMALL LETTER O WITH MACRON
0x0156: 0x008a, # LATIN CAPITAL LETTER R WITH CEDILLA
0x0157: 0x008b, # LATIN SMALL LETTER R WITH CEDILLA
0x015a: 0x0097, # LATIN CAPITAL LETTER S WITH ACUTE
0x015b: 0x0098, # LATIN SMALL LETTER S WITH ACUTE
0x0160: 0x00be, # LATIN CAPITAL LETTER S WITH CARON
0x0161: 0x00d5, # LATIN SMALL LETTER S WITH CARON
0x016a: 0x00c7, # LATIN CAPITAL LETTER U WITH MACRON
0x016b: 0x00d7, # LATIN SMALL LETTER U WITH MACRON
0x0172: 0x00c6, # LATIN CAPITAL LETTER U WITH OGONEK
0x0173: 0x00d6, # LATIN SMALL LETTER U WITH OGONEK
0x0179: 0x008d, # LATIN CAPITAL LETTER Z WITH ACUTE
0x017a: 0x00a5, # LATIN SMALL LETTER Z WITH ACUTE
0x017b: 0x00a3, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x017c: 0x00a4, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x017d: 0x00cf, # LATIN CAPITAL LETTER Z WITH CARON
0x017e: 0x00d8, # LATIN SMALL LETTER Z WITH CARON
0x2019: 0x00ef, # RIGHT SINGLE QUOTATION MARK
0x201c: 0x00f2, # LEFT DOUBLE QUOTATION MARK
0x201d: 0x00a6, # RIGHT DOUBLE QUOTATION MARK
0x201e: 0x00f7, # DOUBLE LOW-9 QUOTATION MARK
0x2219: 0x00f9, # BULLET OPERATOR
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP850.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp850',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
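### Usage sketch (illustrative, not part of the generated file)
# Once the codec registry resolves 'cp850' to this module (via getregentry
# above), the table-driven round trip looks like this under Python 2.x
# string semantics:
#
#     >>> u'caf\xe9'.encode('cp850')
#     'caf\x82'
#     >>> 'caf\x82'.decode('cp850')
#     u'caf\xe9'
#
# The Codec class only delegates to the C-level charmap helpers, which
# return a (result, length_consumed) pair:
#
#     >>> codecs.charmap_encode(u'\xe9', 'strict', encoding_map)
#     ('\x82', 1)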
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00f0, # LATIN SMALL LETTER ETH
0x00d1: 0x00d0, # LATIN CAPITAL LETTER ETH
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: 0x0131, # LATIN SMALL LETTER DOTLESS I
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x00fe, # LATIN SMALL LETTER THORN
0x00e8: 0x00de, # LATIN CAPITAL LETTER THORN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2017, # DOUBLE LOW LINE
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
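### Note on map construction
# Illustrative, not part of the generated file: cp850 is a bijective
# single-byte charset, so the encoding_map defined further below is exactly
# the inverse of this decoding_map; gencodec.py derives both from the same
# vendor mapping file. A hand-rolled equivalent:
#
#     >>> dict((uni, byte) for (byte, uni) in decoding_map.items()) == encoding_map
#     True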
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd7' # 0x009e -> MULTIPLICATION SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\xae' # 0x00a9 -> REGISTERED SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xa9' # 0x00b8 -> COPYRIGHT SIGN
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\xa2' # 0x00bd -> CENT SIGN
u'\xa5' # 0x00be -> YEN SIGN
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
u'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0x00cf -> CURRENCY SIGN
u'\xf0' # 0x00d0 -> LATIN SMALL LETTER ETH
u'\xd0' # 0x00d1 -> LATIN CAPITAL LETTER ETH
u'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\u0131' # 0x00d5 -> LATIN SMALL LETTER DOTLESS I
u'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\xa6' # 0x00dd -> BROKEN BAR
u'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
u'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\xfe' # 0x00e7 -> LATIN SMALL LETTER THORN
u'\xde' # 0x00e8 -> LATIN CAPITAL LETTER THORN
u'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
u'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xaf' # 0x00ee -> MACRON
u'\xb4' # 0x00ef -> ACUTE ACCENT
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2017' # 0x00f2 -> DOUBLE LOW LINE
u'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
u'\xb6' # 0x00f4 -> PILCROW SIGN
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\xb8' # 0x00f7 -> CEDILLA
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\xa8' # 0x00f9 -> DIAERESIS
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\xb9' # 0x00fb -> SUPERSCRIPT ONE
u'\xb3' # 0x00fc -> SUPERSCRIPT THREE
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d0: 0x00d1, # LATIN CAPITAL LETTER ETH
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x00e8, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f0: 0x00d0, # LATIN SMALL LETTER ETH
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x00e7, # LATIN SMALL LETTER THORN
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0131: 0x00d5, # LATIN SMALL LETTER DOTLESS I
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x2017: 0x00f2, # DOUBLE LOW LINE
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP852.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp852',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
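### Usage sketch (illustrative, not part of the generated file)
# cp852 covers Central European text; assuming this module is registered as
# encodings.cp852, the Polish sample u'\u0142\xf3d\u017a' ("lodz" with
# diacritics) round-trips as follows under Python 2.x:
#
#     >>> u'\u0142\xf3d\u017a'.encode('cp852')
#     '\x88\xa2d\xab'
#     >>> '\x88\xa2d\xab'.decode('cp852')
#     u'\u0142\xf3d\u017a'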
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x016f, # LATIN SMALL LETTER U WITH RING ABOVE
0x0086: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x0142, # LATIN SMALL LETTER L WITH STROKE
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x0150, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x008b: 0x0151, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x0139, # LATIN CAPITAL LETTER L WITH ACUTE
0x0092: 0x013a, # LATIN SMALL LETTER L WITH ACUTE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x013d, # LATIN CAPITAL LETTER L WITH CARON
0x0096: 0x013e, # LATIN SMALL LETTER L WITH CARON
0x0097: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
0x0098: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x0164, # LATIN CAPITAL LETTER T WITH CARON
0x009c: 0x0165, # LATIN SMALL LETTER T WITH CARON
0x009d: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x010d, # LATIN SMALL LETTER C WITH CARON
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
0x00a5: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
0x00a6: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
0x00a7: 0x017e, # LATIN SMALL LETTER Z WITH CARON
0x00a8: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
0x00a9: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
0x00ac: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
0x00ad: 0x015f, # LATIN SMALL LETTER S WITH CEDILLA
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x011a, # LATIN CAPITAL LETTER E WITH CARON
0x00b8: 0x015e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x00be: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x0102, # LATIN CAPITAL LETTER A WITH BREVE
0x00c7: 0x0103, # LATIN SMALL LETTER A WITH BREVE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x0111, # LATIN SMALL LETTER D WITH STROKE
0x00d1: 0x0110, # LATIN CAPITAL LETTER D WITH STROKE
0x00d2: 0x010e, # LATIN CAPITAL LETTER D WITH CARON
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x010f, # LATIN SMALL LETTER D WITH CARON
0x00d5: 0x0147, # LATIN CAPITAL LETTER N WITH CARON
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x011b, # LATIN SMALL LETTER E WITH CARON
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x0162, # LATIN CAPITAL LETTER T WITH CEDILLA
0x00de: 0x016e, # LATIN CAPITAL LETTER U WITH RING ABOVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
0x00e4: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
0x00e5: 0x0148, # LATIN SMALL LETTER N WITH CARON
0x00e6: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00e7: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00e8: 0x0154, # LATIN CAPITAL LETTER R WITH ACUTE
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x0155, # LATIN SMALL LETTER R WITH ACUTE
0x00eb: 0x0170, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x0163, # LATIN SMALL LETTER T WITH CEDILLA
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x02dd, # DOUBLE ACUTE ACCENT
0x00f2: 0x02db, # OGONEK
0x00f3: 0x02c7, # CARON
0x00f4: 0x02d8, # BREVE
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x02d9, # DOT ABOVE
0x00fb: 0x0171, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x00fc: 0x0158, # LATIN CAPITAL LETTER R WITH CARON
0x00fd: 0x0159, # LATIN SMALL LETTER R WITH CARON
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u016f' # 0x0085 -> LATIN SMALL LETTER U WITH RING ABOVE
u'\u0107' # 0x0086 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\u0142' # 0x0088 -> LATIN SMALL LETTER L WITH STROKE
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u0150' # 0x008a -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u'\u0151' # 0x008b -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u0179' # 0x008d -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u0106' # 0x008f -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0139' # 0x0091 -> LATIN CAPITAL LETTER L WITH ACUTE
u'\u013a' # 0x0092 -> LATIN SMALL LETTER L WITH ACUTE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\u013d' # 0x0095 -> LATIN CAPITAL LETTER L WITH CARON
u'\u013e' # 0x0096 -> LATIN SMALL LETTER L WITH CARON
u'\u015a' # 0x0097 -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u015b' # 0x0098 -> LATIN SMALL LETTER S WITH ACUTE
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0164' # 0x009b -> LATIN CAPITAL LETTER T WITH CARON
u'\u0165' # 0x009c -> LATIN SMALL LETTER T WITH CARON
u'\u0141' # 0x009d -> LATIN CAPITAL LETTER L WITH STROKE
u'\xd7' # 0x009e -> MULTIPLICATION SIGN
u'\u010d' # 0x009f -> LATIN SMALL LETTER C WITH CARON
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\u0104' # 0x00a4 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u0105' # 0x00a5 -> LATIN SMALL LETTER A WITH OGONEK
u'\u017d' # 0x00a6 -> LATIN CAPITAL LETTER Z WITH CARON
u'\u017e' # 0x00a7 -> LATIN SMALL LETTER Z WITH CARON
u'\u0118' # 0x00a8 -> LATIN CAPITAL LETTER E WITH OGONEK
u'\u0119' # 0x00a9 -> LATIN SMALL LETTER E WITH OGONEK
u'\xac' # 0x00aa -> NOT SIGN
u'\u017a' # 0x00ab -> LATIN SMALL LETTER Z WITH ACUTE
u'\u010c' # 0x00ac -> LATIN CAPITAL LETTER C WITH CARON
u'\u015f' # 0x00ad -> LATIN SMALL LETTER S WITH CEDILLA
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u011a' # 0x00b7 -> LATIN CAPITAL LETTER E WITH CARON
u'\u015e' # 0x00b8 -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u017b' # 0x00bd -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\u017c' # 0x00be -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u0102' # 0x00c6 -> LATIN CAPITAL LETTER A WITH BREVE
u'\u0103' # 0x00c7 -> LATIN SMALL LETTER A WITH BREVE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0x00cf -> CURRENCY SIGN
u'\u0111' # 0x00d0 -> LATIN SMALL LETTER D WITH STROKE
u'\u0110' # 0x00d1 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u010e' # 0x00d2 -> LATIN CAPITAL LETTER D WITH CARON
u'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u010f' # 0x00d4 -> LATIN SMALL LETTER D WITH CARON
u'\u0147' # 0x00d5 -> LATIN CAPITAL LETTER N WITH CARON
u'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\u011b' # 0x00d8 -> LATIN SMALL LETTER E WITH CARON
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u0162' # 0x00dd -> LATIN CAPITAL LETTER T WITH CEDILLA
u'\u016e' # 0x00de -> LATIN CAPITAL LETTER U WITH RING ABOVE
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0143' # 0x00e3 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\u0144' # 0x00e4 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0148' # 0x00e5 -> LATIN SMALL LETTER N WITH CARON
u'\u0160' # 0x00e6 -> LATIN CAPITAL LETTER S WITH CARON
u'\u0161' # 0x00e7 -> LATIN SMALL LETTER S WITH CARON
u'\u0154' # 0x00e8 -> LATIN CAPITAL LETTER R WITH ACUTE
u'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\u0155' # 0x00ea -> LATIN SMALL LETTER R WITH ACUTE
u'\u0170' # 0x00eb -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
u'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
u'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\u0163' # 0x00ee -> LATIN SMALL LETTER T WITH CEDILLA
u'\xb4' # 0x00ef -> ACUTE ACCENT
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\u02dd' # 0x00f1 -> DOUBLE ACUTE ACCENT
u'\u02db' # 0x00f2 -> OGONEK
u'\u02c7' # 0x00f3 -> CARON
u'\u02d8' # 0x00f4 -> BREVE
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\xb8' # 0x00f7 -> CEDILLA
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\xa8' # 0x00f9 -> DIAERESIS
u'\u02d9' # 0x00fa -> DOT ABOVE
u'\u0171' # 0x00fb -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
u'\u0158' # 0x00fc -> LATIN CAPITAL LETTER R WITH CARON
u'\u0159' # 0x00fd -> LATIN SMALL LETTER R WITH CARON
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b8: 0x00f7, # CEDILLA
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x0102: 0x00c6, # LATIN CAPITAL LETTER A WITH BREVE
0x0103: 0x00c7, # LATIN SMALL LETTER A WITH BREVE
0x0104: 0x00a4, # LATIN CAPITAL LETTER A WITH OGONEK
0x0105: 0x00a5, # LATIN SMALL LETTER A WITH OGONEK
0x0106: 0x008f, # LATIN CAPITAL LETTER C WITH ACUTE
0x0107: 0x0086, # LATIN SMALL LETTER C WITH ACUTE
0x010c: 0x00ac, # LATIN CAPITAL LETTER C WITH CARON
0x010d: 0x009f, # LATIN SMALL LETTER C WITH CARON
0x010e: 0x00d2, # LATIN CAPITAL LETTER D WITH CARON
0x010f: 0x00d4, # LATIN SMALL LETTER D WITH CARON
0x0110: 0x00d1, # LATIN CAPITAL LETTER D WITH STROKE
0x0111: 0x00d0, # LATIN SMALL LETTER D WITH STROKE
0x0118: 0x00a8, # LATIN CAPITAL LETTER E WITH OGONEK
0x0119: 0x00a9, # LATIN SMALL LETTER E WITH OGONEK
0x011a: 0x00b7, # LATIN CAPITAL LETTER E WITH CARON
0x011b: 0x00d8, # LATIN SMALL LETTER E WITH CARON
0x0139: 0x0091, # LATIN CAPITAL LETTER L WITH ACUTE
0x013a: 0x0092, # LATIN SMALL LETTER L WITH ACUTE
0x013d: 0x0095, # LATIN CAPITAL LETTER L WITH CARON
0x013e: 0x0096, # LATIN SMALL LETTER L WITH CARON
0x0141: 0x009d, # LATIN CAPITAL LETTER L WITH STROKE
0x0142: 0x0088, # LATIN SMALL LETTER L WITH STROKE
0x0143: 0x00e3, # LATIN CAPITAL LETTER N WITH ACUTE
0x0144: 0x00e4, # LATIN SMALL LETTER N WITH ACUTE
0x0147: 0x00d5, # LATIN CAPITAL LETTER N WITH CARON
0x0148: 0x00e5, # LATIN SMALL LETTER N WITH CARON
0x0150: 0x008a, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x0151: 0x008b, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x0154: 0x00e8, # LATIN CAPITAL LETTER R WITH ACUTE
0x0155: 0x00ea, # LATIN SMALL LETTER R WITH ACUTE
0x0158: 0x00fc, # LATIN CAPITAL LETTER R WITH CARON
0x0159: 0x00fd, # LATIN SMALL LETTER R WITH CARON
0x015a: 0x0097, # LATIN CAPITAL LETTER S WITH ACUTE
0x015b: 0x0098, # LATIN SMALL LETTER S WITH ACUTE
0x015e: 0x00b8, # LATIN CAPITAL LETTER S WITH CEDILLA
0x015f: 0x00ad, # LATIN SMALL LETTER S WITH CEDILLA
0x0160: 0x00e6, # LATIN CAPITAL LETTER S WITH CARON
0x0161: 0x00e7, # LATIN SMALL LETTER S WITH CARON
0x0162: 0x00dd, # LATIN CAPITAL LETTER T WITH CEDILLA
0x0163: 0x00ee, # LATIN SMALL LETTER T WITH CEDILLA
0x0164: 0x009b, # LATIN CAPITAL LETTER T WITH CARON
0x0165: 0x009c, # LATIN SMALL LETTER T WITH CARON
0x016e: 0x00de, # LATIN CAPITAL LETTER U WITH RING ABOVE
0x016f: 0x0085, # LATIN SMALL LETTER U WITH RING ABOVE
0x0170: 0x00eb, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x0171: 0x00fb, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x0179: 0x008d, # LATIN CAPITAL LETTER Z WITH ACUTE
0x017a: 0x00ab, # LATIN SMALL LETTER Z WITH ACUTE
0x017b: 0x00bd, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x017c: 0x00be, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x017d: 0x00a6, # LATIN CAPITAL LETTER Z WITH CARON
0x017e: 0x00a7, # LATIN SMALL LETTER Z WITH CARON
0x02c7: 0x00f3, # CARON
0x02d8: 0x00f4, # BREVE
0x02d9: 0x00fa, # DOT ABOVE
0x02db: 0x00f2, # OGONEK
0x02dd: 0x00f1, # DOUBLE ACUTE ACCENT
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
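# --- Illustration (not part of the generated module above) ---
# The encoding_map dict just closed is the structure codecs.charmap_encode
# consumes, and the decoding_table string above it is the inverse mapping.
# These tables appear to correspond to the stdlib cp852 (Latin-2) codec,
# so a minimal round-trip sketch under that assumption looks like this:
import codecs

text = u'\u0141\xf3d\u017a'                  # 'Lodz' with Polish letters; all exist in cp852
raw = codecs.encode(text, 'cp852')           # each character becomes one byte via encoding_map
assert codecs.decode(raw, 'cp852') == text   # decoding_table reverses the mapping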
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP855.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp855',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
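# --- Illustration (not part of the generated module) ---
# The IncrementalDecoder defined above is stateless: charmap decoding maps
# exactly one byte to one character, so input can be fed in arbitrary
# chunks.  A minimal sketch using the registered 'cp855' codec name:
import codecs

dec = codecs.getincrementaldecoder('cp855')()
chunks = [b'\xa0\xa2', b'\xa4']                  # cp855 bytes for three Cyrillic letters
out = u''.join(dec.decode(chunk) for chunk in chunks)
assert out == u'\u0430\u0431\u0446'              # CYRILLIC SMALL LETTER A, BE, TSE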
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0452, # CYRILLIC SMALL LETTER DJE
0x0081: 0x0402, # CYRILLIC CAPITAL LETTER DJE
0x0082: 0x0453, # CYRILLIC SMALL LETTER GJE
0x0083: 0x0403, # CYRILLIC CAPITAL LETTER GJE
0x0084: 0x0451, # CYRILLIC SMALL LETTER IO
0x0085: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x0086: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0087: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0088: 0x0455, # CYRILLIC SMALL LETTER DZE
0x0089: 0x0405, # CYRILLIC CAPITAL LETTER DZE
0x008a: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x008b: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x008c: 0x0457, # CYRILLIC SMALL LETTER YI
0x008d: 0x0407, # CYRILLIC CAPITAL LETTER YI
0x008e: 0x0458, # CYRILLIC SMALL LETTER JE
0x008f: 0x0408, # CYRILLIC CAPITAL LETTER JE
0x0090: 0x0459, # CYRILLIC SMALL LETTER LJE
0x0091: 0x0409, # CYRILLIC CAPITAL LETTER LJE
0x0092: 0x045a, # CYRILLIC SMALL LETTER NJE
0x0093: 0x040a, # CYRILLIC CAPITAL LETTER NJE
0x0094: 0x045b, # CYRILLIC SMALL LETTER TSHE
0x0095: 0x040b, # CYRILLIC CAPITAL LETTER TSHE
0x0096: 0x045c, # CYRILLIC SMALL LETTER KJE
0x0097: 0x040c, # CYRILLIC CAPITAL LETTER KJE
0x0098: 0x045e, # CYRILLIC SMALL LETTER SHORT U
0x0099: 0x040e, # CYRILLIC CAPITAL LETTER SHORT U
0x009a: 0x045f, # CYRILLIC SMALL LETTER DZHE
0x009b: 0x040f, # CYRILLIC CAPITAL LETTER DZHE
0x009c: 0x044e, # CYRILLIC SMALL LETTER YU
0x009d: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x009e: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x009f: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x00a0: 0x0430, # CYRILLIC SMALL LETTER A
0x00a1: 0x0410, # CYRILLIC CAPITAL LETTER A
0x00a2: 0x0431, # CYRILLIC SMALL LETTER BE
0x00a3: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x00a4: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00a5: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x00a6: 0x0434, # CYRILLIC SMALL LETTER DE
0x00a7: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x00a8: 0x0435, # CYRILLIC SMALL LETTER IE
0x00a9: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x00aa: 0x0444, # CYRILLIC SMALL LETTER EF
0x00ab: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x00ac: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00ad: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00b6: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x00b7: 0x0438, # CYRILLIC SMALL LETTER I
0x00b8: 0x0418, # CYRILLIC CAPITAL LETTER I
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00be: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x043a, # CYRILLIC SMALL LETTER KA
0x00c7: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x043b, # CYRILLIC SMALL LETTER EL
0x00d1: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x00d2: 0x043c, # CYRILLIC SMALL LETTER EM
0x00d3: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x00d4: 0x043d, # CYRILLIC SMALL LETTER EN
0x00d5: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x00d6: 0x043e, # CYRILLIC SMALL LETTER O
0x00d7: 0x041e, # CYRILLIC CAPITAL LETTER O
0x00d8: 0x043f, # CYRILLIC SMALL LETTER PE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x00de: 0x044f, # CYRILLIC SMALL LETTER YA
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00e1: 0x0440, # CYRILLIC SMALL LETTER ER
0x00e2: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x00e3: 0x0441, # CYRILLIC SMALL LETTER ES
0x00e4: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x00e5: 0x0442, # CYRILLIC SMALL LETTER TE
0x00e6: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x00e7: 0x0443, # CYRILLIC SMALL LETTER U
0x00e8: 0x0423, # CYRILLIC CAPITAL LETTER U
0x00e9: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00ea: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x00eb: 0x0432, # CYRILLIC SMALL LETTER VE
0x00ec: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x00ed: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00ee: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x00ef: 0x2116, # NUMERO SIGN
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00f2: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x00f3: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00f4: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x00f5: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00f6: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x00f7: 0x044d, # CYRILLIC SMALL LETTER E
0x00f8: 0x042d, # CYRILLIC CAPITAL LETTER E
0x00f9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00fa: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x00fb: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00fc: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x00fd: 0x00a7, # SECTION SIGN
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\u0452' # 0x0080 -> CYRILLIC SMALL LETTER DJE
u'\u0402' # 0x0081 -> CYRILLIC CAPITAL LETTER DJE
u'\u0453' # 0x0082 -> CYRILLIC SMALL LETTER GJE
u'\u0403' # 0x0083 -> CYRILLIC CAPITAL LETTER GJE
u'\u0451' # 0x0084 -> CYRILLIC SMALL LETTER IO
u'\u0401' # 0x0085 -> CYRILLIC CAPITAL LETTER IO
u'\u0454' # 0x0086 -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\u0404' # 0x0087 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\u0455' # 0x0088 -> CYRILLIC SMALL LETTER DZE
u'\u0405' # 0x0089 -> CYRILLIC CAPITAL LETTER DZE
u'\u0456' # 0x008a -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0406' # 0x008b -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0457' # 0x008c -> CYRILLIC SMALL LETTER YI
u'\u0407' # 0x008d -> CYRILLIC CAPITAL LETTER YI
u'\u0458' # 0x008e -> CYRILLIC SMALL LETTER JE
u'\u0408' # 0x008f -> CYRILLIC CAPITAL LETTER JE
u'\u0459' # 0x0090 -> CYRILLIC SMALL LETTER LJE
u'\u0409' # 0x0091 -> CYRILLIC CAPITAL LETTER LJE
u'\u045a' # 0x0092 -> CYRILLIC SMALL LETTER NJE
u'\u040a' # 0x0093 -> CYRILLIC CAPITAL LETTER NJE
u'\u045b' # 0x0094 -> CYRILLIC SMALL LETTER TSHE
u'\u040b' # 0x0095 -> CYRILLIC CAPITAL LETTER TSHE
u'\u045c' # 0x0096 -> CYRILLIC SMALL LETTER KJE
u'\u040c' # 0x0097 -> CYRILLIC CAPITAL LETTER KJE
u'\u045e' # 0x0098 -> CYRILLIC SMALL LETTER SHORT U
u'\u040e' # 0x0099 -> CYRILLIC CAPITAL LETTER SHORT U
u'\u045f' # 0x009a -> CYRILLIC SMALL LETTER DZHE
u'\u040f' # 0x009b -> CYRILLIC CAPITAL LETTER DZHE
u'\u044e' # 0x009c -> CYRILLIC SMALL LETTER YU
u'\u042e' # 0x009d -> CYRILLIC CAPITAL LETTER YU
u'\u044a' # 0x009e -> CYRILLIC SMALL LETTER HARD SIGN
u'\u042a' # 0x009f -> CYRILLIC CAPITAL LETTER HARD SIGN
u'\u0430' # 0x00a0 -> CYRILLIC SMALL LETTER A
u'\u0410' # 0x00a1 -> CYRILLIC CAPITAL LETTER A
u'\u0431' # 0x00a2 -> CYRILLIC SMALL LETTER BE
u'\u0411' # 0x00a3 -> CYRILLIC CAPITAL LETTER BE
u'\u0446' # 0x00a4 -> CYRILLIC SMALL LETTER TSE
u'\u0426' # 0x00a5 -> CYRILLIC CAPITAL LETTER TSE
u'\u0434' # 0x00a6 -> CYRILLIC SMALL LETTER DE
u'\u0414' # 0x00a7 -> CYRILLIC CAPITAL LETTER DE
u'\u0435' # 0x00a8 -> CYRILLIC SMALL LETTER IE
u'\u0415' # 0x00a9 -> CYRILLIC CAPITAL LETTER IE
u'\u0444' # 0x00aa -> CYRILLIC SMALL LETTER EF
u'\u0424' # 0x00ab -> CYRILLIC CAPITAL LETTER EF
u'\u0433' # 0x00ac -> CYRILLIC SMALL LETTER GHE
u'\u0413' # 0x00ad -> CYRILLIC CAPITAL LETTER GHE
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u0445' # 0x00b5 -> CYRILLIC SMALL LETTER HA
u'\u0425' # 0x00b6 -> CYRILLIC CAPITAL LETTER HA
u'\u0438' # 0x00b7 -> CYRILLIC SMALL LETTER I
u'\u0418' # 0x00b8 -> CYRILLIC CAPITAL LETTER I
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u0439' # 0x00bd -> CYRILLIC SMALL LETTER SHORT I
u'\u0419' # 0x00be -> CYRILLIC CAPITAL LETTER SHORT I
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u043a' # 0x00c6 -> CYRILLIC SMALL LETTER KA
u'\u041a' # 0x00c7 -> CYRILLIC CAPITAL LETTER KA
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0x00cf -> CURRENCY SIGN
u'\u043b' # 0x00d0 -> CYRILLIC SMALL LETTER EL
u'\u041b' # 0x00d1 -> CYRILLIC CAPITAL LETTER EL
u'\u043c' # 0x00d2 -> CYRILLIC SMALL LETTER EM
u'\u041c' # 0x00d3 -> CYRILLIC CAPITAL LETTER EM
u'\u043d' # 0x00d4 -> CYRILLIC SMALL LETTER EN
u'\u041d' # 0x00d5 -> CYRILLIC CAPITAL LETTER EN
u'\u043e' # 0x00d6 -> CYRILLIC SMALL LETTER O
u'\u041e' # 0x00d7 -> CYRILLIC CAPITAL LETTER O
u'\u043f' # 0x00d8 -> CYRILLIC SMALL LETTER PE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u041f' # 0x00dd -> CYRILLIC CAPITAL LETTER PE
u'\u044f' # 0x00de -> CYRILLIC SMALL LETTER YA
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u042f' # 0x00e0 -> CYRILLIC CAPITAL LETTER YA
u'\u0440' # 0x00e1 -> CYRILLIC SMALL LETTER ER
u'\u0420' # 0x00e2 -> CYRILLIC CAPITAL LETTER ER
u'\u0441' # 0x00e3 -> CYRILLIC SMALL LETTER ES
u'\u0421' # 0x00e4 -> CYRILLIC CAPITAL LETTER ES
u'\u0442' # 0x00e5 -> CYRILLIC SMALL LETTER TE
u'\u0422' # 0x00e6 -> CYRILLIC CAPITAL LETTER TE
u'\u0443' # 0x00e7 -> CYRILLIC SMALL LETTER U
u'\u0423' # 0x00e8 -> CYRILLIC CAPITAL LETTER U
u'\u0436' # 0x00e9 -> CYRILLIC SMALL LETTER ZHE
u'\u0416' # 0x00ea -> CYRILLIC CAPITAL LETTER ZHE
u'\u0432' # 0x00eb -> CYRILLIC SMALL LETTER VE
u'\u0412' # 0x00ec -> CYRILLIC CAPITAL LETTER VE
u'\u044c' # 0x00ed -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u042c' # 0x00ee -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u2116' # 0x00ef -> NUMERO SIGN
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\u044b' # 0x00f1 -> CYRILLIC SMALL LETTER YERU
u'\u042b' # 0x00f2 -> CYRILLIC CAPITAL LETTER YERU
u'\u0437' # 0x00f3 -> CYRILLIC SMALL LETTER ZE
u'\u0417' # 0x00f4 -> CYRILLIC CAPITAL LETTER ZE
u'\u0448' # 0x00f5 -> CYRILLIC SMALL LETTER SHA
u'\u0428' # 0x00f6 -> CYRILLIC CAPITAL LETTER SHA
u'\u044d' # 0x00f7 -> CYRILLIC SMALL LETTER E
u'\u042d' # 0x00f8 -> CYRILLIC CAPITAL LETTER E
u'\u0449' # 0x00f9 -> CYRILLIC SMALL LETTER SHCHA
u'\u0429' # 0x00fa -> CYRILLIC CAPITAL LETTER SHCHA
u'\u0447' # 0x00fb -> CYRILLIC SMALL LETTER CHE
u'\u0427' # 0x00fc -> CYRILLIC CAPITAL LETTER CHE
u'\xa7' # 0x00fd -> SECTION SIGN
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
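# --- Illustration (not part of the generated module) ---
# gencodec.py emits the same mapping twice: decoding_map (the legacy dict
# form) and decoding_table (the compact 256-character string form).
# codecs.charmap_decode accepts either, and the two should agree for every
# byte value -- a minimal consistency sketch:
import codecs

all_bytes = bytes(bytearray(range(256)))
via_map = codecs.charmap_decode(all_bytes, 'strict', decoding_map)[0]
via_table = codecs.charmap_decode(all_bytes, 'strict', decoding_table)[0]
assert via_map == via_table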
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a7: 0x00fd, # SECTION SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ad: 0x00f0, # SOFT HYPHEN
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x0401: 0x0085, # CYRILLIC CAPITAL LETTER IO
0x0402: 0x0081, # CYRILLIC CAPITAL LETTER DJE
0x0403: 0x0083, # CYRILLIC CAPITAL LETTER GJE
0x0404: 0x0087, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0405: 0x0089, # CYRILLIC CAPITAL LETTER DZE
0x0406: 0x008b, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x0407: 0x008d, # CYRILLIC CAPITAL LETTER YI
0x0408: 0x008f, # CYRILLIC CAPITAL LETTER JE
0x0409: 0x0091, # CYRILLIC CAPITAL LETTER LJE
0x040a: 0x0093, # CYRILLIC CAPITAL LETTER NJE
0x040b: 0x0095, # CYRILLIC CAPITAL LETTER TSHE
0x040c: 0x0097, # CYRILLIC CAPITAL LETTER KJE
0x040e: 0x0099, # CYRILLIC CAPITAL LETTER SHORT U
0x040f: 0x009b, # CYRILLIC CAPITAL LETTER DZHE
0x0410: 0x00a1, # CYRILLIC CAPITAL LETTER A
0x0411: 0x00a3, # CYRILLIC CAPITAL LETTER BE
0x0412: 0x00ec, # CYRILLIC CAPITAL LETTER VE
0x0413: 0x00ad, # CYRILLIC CAPITAL LETTER GHE
0x0414: 0x00a7, # CYRILLIC CAPITAL LETTER DE
0x0415: 0x00a9, # CYRILLIC CAPITAL LETTER IE
0x0416: 0x00ea, # CYRILLIC CAPITAL LETTER ZHE
0x0417: 0x00f4, # CYRILLIC CAPITAL LETTER ZE
0x0418: 0x00b8, # CYRILLIC CAPITAL LETTER I
0x0419: 0x00be, # CYRILLIC CAPITAL LETTER SHORT I
0x041a: 0x00c7, # CYRILLIC CAPITAL LETTER KA
0x041b: 0x00d1, # CYRILLIC CAPITAL LETTER EL
0x041c: 0x00d3, # CYRILLIC CAPITAL LETTER EM
0x041d: 0x00d5, # CYRILLIC CAPITAL LETTER EN
0x041e: 0x00d7, # CYRILLIC CAPITAL LETTER O
0x041f: 0x00dd, # CYRILLIC CAPITAL LETTER PE
0x0420: 0x00e2, # CYRILLIC CAPITAL LETTER ER
0x0421: 0x00e4, # CYRILLIC CAPITAL LETTER ES
0x0422: 0x00e6, # CYRILLIC CAPITAL LETTER TE
0x0423: 0x00e8, # CYRILLIC CAPITAL LETTER U
0x0424: 0x00ab, # CYRILLIC CAPITAL LETTER EF
0x0425: 0x00b6, # CYRILLIC CAPITAL LETTER HA
0x0426: 0x00a5, # CYRILLIC CAPITAL LETTER TSE
0x0427: 0x00fc, # CYRILLIC CAPITAL LETTER CHE
0x0428: 0x00f6, # CYRILLIC CAPITAL LETTER SHA
0x0429: 0x00fa, # CYRILLIC CAPITAL LETTER SHCHA
0x042a: 0x009f, # CYRILLIC CAPITAL LETTER HARD SIGN
0x042b: 0x00f2, # CYRILLIC CAPITAL LETTER YERU
0x042c: 0x00ee, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x042d: 0x00f8, # CYRILLIC CAPITAL LETTER E
0x042e: 0x009d, # CYRILLIC CAPITAL LETTER YU
0x042f: 0x00e0, # CYRILLIC CAPITAL LETTER YA
0x0430: 0x00a0, # CYRILLIC SMALL LETTER A
0x0431: 0x00a2, # CYRILLIC SMALL LETTER BE
0x0432: 0x00eb, # CYRILLIC SMALL LETTER VE
0x0433: 0x00ac, # CYRILLIC SMALL LETTER GHE
0x0434: 0x00a6, # CYRILLIC SMALL LETTER DE
0x0435: 0x00a8, # CYRILLIC SMALL LETTER IE
0x0436: 0x00e9, # CYRILLIC SMALL LETTER ZHE
0x0437: 0x00f3, # CYRILLIC SMALL LETTER ZE
0x0438: 0x00b7, # CYRILLIC SMALL LETTER I
0x0439: 0x00bd, # CYRILLIC SMALL LETTER SHORT I
0x043a: 0x00c6, # CYRILLIC SMALL LETTER KA
0x043b: 0x00d0, # CYRILLIC SMALL LETTER EL
0x043c: 0x00d2, # CYRILLIC SMALL LETTER EM
0x043d: 0x00d4, # CYRILLIC SMALL LETTER EN
0x043e: 0x00d6, # CYRILLIC SMALL LETTER O
0x043f: 0x00d8, # CYRILLIC SMALL LETTER PE
0x0440: 0x00e1, # CYRILLIC SMALL LETTER ER
0x0441: 0x00e3, # CYRILLIC SMALL LETTER ES
0x0442: 0x00e5, # CYRILLIC SMALL LETTER TE
0x0443: 0x00e7, # CYRILLIC SMALL LETTER U
0x0444: 0x00aa, # CYRILLIC SMALL LETTER EF
0x0445: 0x00b5, # CYRILLIC SMALL LETTER HA
0x0446: 0x00a4, # CYRILLIC SMALL LETTER TSE
0x0447: 0x00fb, # CYRILLIC SMALL LETTER CHE
0x0448: 0x00f5, # CYRILLIC SMALL LETTER SHA
0x0449: 0x00f9, # CYRILLIC SMALL LETTER SHCHA
0x044a: 0x009e, # CYRILLIC SMALL LETTER HARD SIGN
0x044b: 0x00f1, # CYRILLIC SMALL LETTER YERU
0x044c: 0x00ed, # CYRILLIC SMALL LETTER SOFT SIGN
0x044d: 0x00f7, # CYRILLIC SMALL LETTER E
0x044e: 0x009c, # CYRILLIC SMALL LETTER YU
0x044f: 0x00de, # CYRILLIC SMALL LETTER YA
0x0451: 0x0084, # CYRILLIC SMALL LETTER IO
0x0452: 0x0080, # CYRILLIC SMALL LETTER DJE
0x0453: 0x0082, # CYRILLIC SMALL LETTER GJE
0x0454: 0x0086, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0455: 0x0088, # CYRILLIC SMALL LETTER DZE
0x0456: 0x008a, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x0457: 0x008c, # CYRILLIC SMALL LETTER YI
0x0458: 0x008e, # CYRILLIC SMALL LETTER JE
0x0459: 0x0090, # CYRILLIC SMALL LETTER LJE
0x045a: 0x0092, # CYRILLIC SMALL LETTER NJE
0x045b: 0x0094, # CYRILLIC SMALL LETTER TSHE
0x045c: 0x0096, # CYRILLIC SMALL LETTER KJE
0x045e: 0x0098, # CYRILLIC SMALL LETTER SHORT U
0x045f: 0x009a, # CYRILLIC SMALL LETTER DZHE
0x2116: 0x00ef, # NUMERO SIGN
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
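# --- Illustration (not part of the generated module) ---
# Every byte position in cp855 is defined, so the encoding_map just closed
# is an exact inverse of decoding_table.  A minimal sketch checking the
# round trip over all 256 positions:
for byte, ch in enumerate(decoding_table):
    assert encoding_map[ord(ch)] == byte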
""" Python Character Mapping Codec cp856 generated from 'MAPPINGS/VENDORS/MISC/CP856.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp856',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u05d0' # 0x80 -> HEBREW LETTER ALEF
u'\u05d1' # 0x81 -> HEBREW LETTER BET
u'\u05d2' # 0x82 -> HEBREW LETTER GIMEL
u'\u05d3' # 0x83 -> HEBREW LETTER DALET
u'\u05d4' # 0x84 -> HEBREW LETTER HE
u'\u05d5' # 0x85 -> HEBREW LETTER VAV
u'\u05d6' # 0x86 -> HEBREW LETTER ZAYIN
u'\u05d7' # 0x87 -> HEBREW LETTER HET
u'\u05d8' # 0x88 -> HEBREW LETTER TET
u'\u05d9' # 0x89 -> HEBREW LETTER YOD
u'\u05da' # 0x8A -> HEBREW LETTER FINAL KAF
u'\u05db' # 0x8B -> HEBREW LETTER KAF
u'\u05dc' # 0x8C -> HEBREW LETTER LAMED
u'\u05dd' # 0x8D -> HEBREW LETTER FINAL MEM
u'\u05de' # 0x8E -> HEBREW LETTER MEM
u'\u05df' # 0x8F -> HEBREW LETTER FINAL NUN
u'\u05e0' # 0x90 -> HEBREW LETTER NUN
u'\u05e1' # 0x91 -> HEBREW LETTER SAMEKH
u'\u05e2' # 0x92 -> HEBREW LETTER AYIN
u'\u05e3' # 0x93 -> HEBREW LETTER FINAL PE
u'\u05e4' # 0x94 -> HEBREW LETTER PE
u'\u05e5' # 0x95 -> HEBREW LETTER FINAL TSADI
u'\u05e6' # 0x96 -> HEBREW LETTER TSADI
u'\u05e7' # 0x97 -> HEBREW LETTER QOF
u'\u05e8' # 0x98 -> HEBREW LETTER RESH
u'\u05e9' # 0x99 -> HEBREW LETTER SHIN
u'\u05ea' # 0x9A -> HEBREW LETTER TAV
u'\ufffe' # 0x9B -> UNDEFINED
u'\xa3' # 0x9C -> POUND SIGN
u'\ufffe' # 0x9D -> UNDEFINED
u'\xd7' # 0x9E -> MULTIPLICATION SIGN
u'\ufffe' # 0x9F -> UNDEFINED
u'\ufffe' # 0xA0 -> UNDEFINED
u'\ufffe' # 0xA1 -> UNDEFINED
u'\ufffe' # 0xA2 -> UNDEFINED
u'\ufffe' # 0xA3 -> UNDEFINED
u'\ufffe' # 0xA4 -> UNDEFINED
u'\ufffe' # 0xA5 -> UNDEFINED
u'\ufffe' # 0xA6 -> UNDEFINED
u'\ufffe' # 0xA7 -> UNDEFINED
u'\ufffe' # 0xA8 -> UNDEFINED
u'\xae' # 0xA9 -> REGISTERED SIGN
u'\xac' # 0xAA -> NOT SIGN
u'\xbd' # 0xAB -> VULGAR FRACTION ONE HALF
u'\xbc' # 0xAC -> VULGAR FRACTION ONE QUARTER
u'\ufffe' # 0xAD -> UNDEFINED
u'\xab' # 0xAE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xAF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0xB0 -> LIGHT SHADE
u'\u2592' # 0xB1 -> MEDIUM SHADE
u'\u2593' # 0xB2 -> DARK SHADE
u'\u2502' # 0xB3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0xB4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\ufffe' # 0xB5 -> UNDEFINED
u'\ufffe' # 0xB6 -> UNDEFINED
u'\ufffe' # 0xB7 -> UNDEFINED
u'\xa9' # 0xB8 -> COPYRIGHT SIGN
u'\u2563' # 0xB9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0xBA -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0xBB -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0xBC -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\xa2' # 0xBD -> CENT SIGN
u'\xa5' # 0xBE -> YEN SIGN
u'\u2510' # 0xBF -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0xC0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0xC1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0xC2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0xC3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0xC4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0xC5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\ufffe' # 0xC6 -> UNDEFINED
u'\ufffe' # 0xC7 -> UNDEFINED
u'\u255a' # 0xC8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0xC9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0xCA -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0xCB -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0xCC -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0xCD -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0xCE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0xCF -> CURRENCY SIGN
u'\ufffe' # 0xD0 -> UNDEFINED
u'\ufffe' # 0xD1 -> UNDEFINED
u'\ufffe' # 0xD2 -> UNDEFINED
u'\ufffe' # 0xD3 -> UNDEFINED
u'\ufffe' # 0xD4 -> UNDEFINED
u'\ufffe' # 0xD5 -> UNDEFINED
u'\ufffe' # 0xD6 -> UNDEFINED
u'\ufffe' # 0xD7 -> UNDEFINED
u'\ufffe' # 0xD8 -> UNDEFINED
u'\u2518' # 0xD9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0xDA -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0xDB -> FULL BLOCK
u'\u2584' # 0xDC -> LOWER HALF BLOCK
u'\xa6' # 0xDD -> BROKEN BAR
u'\ufffe' # 0xDE -> UNDEFINED
u'\u2580' # 0xDF -> UPPER HALF BLOCK
u'\ufffe' # 0xE0 -> UNDEFINED
u'\ufffe' # 0xE1 -> UNDEFINED
u'\ufffe' # 0xE2 -> UNDEFINED
u'\ufffe' # 0xE3 -> UNDEFINED
u'\ufffe' # 0xE4 -> UNDEFINED
u'\ufffe' # 0xE5 -> UNDEFINED
u'\xb5' # 0xE6 -> MICRO SIGN
u'\ufffe' # 0xE7 -> UNDEFINED
u'\ufffe' # 0xE8 -> UNDEFINED
u'\ufffe' # 0xE9 -> UNDEFINED
u'\ufffe' # 0xEA -> UNDEFINED
u'\ufffe' # 0xEB -> UNDEFINED
u'\ufffe' # 0xEC -> UNDEFINED
u'\ufffe' # 0xED -> UNDEFINED
u'\xaf' # 0xEE -> MACRON
u'\xb4' # 0xEF -> ACUTE ACCENT
u'\xad' # 0xF0 -> SOFT HYPHEN
u'\xb1' # 0xF1 -> PLUS-MINUS SIGN
u'\u2017' # 0xF2 -> DOUBLE LOW LINE
u'\xbe' # 0xF3 -> VULGAR FRACTION THREE QUARTERS
u'\xb6' # 0xF4 -> PILCROW SIGN
u'\xa7' # 0xF5 -> SECTION SIGN
u'\xf7' # 0xF6 -> DIVISION SIGN
u'\xb8' # 0xF7 -> CEDILLA
u'\xb0' # 0xF8 -> DEGREE SIGN
u'\xa8' # 0xF9 -> DIAERESIS
u'\xb7' # 0xFA -> MIDDLE DOT
u'\xb9' # 0xFB -> SUPERSCRIPT ONE
u'\xb3' # 0xFC -> SUPERSCRIPT THREE
u'\xb2' # 0xFD -> SUPERSCRIPT TWO
u'\u25a0' # 0xFE -> BLACK SQUARE
u'\xa0' # 0xFF -> NO-BREAK SPACE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
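# --- Illustration (not part of the generated module) ---
# Newer gencodec.py output (like this cp856 module) replaces the explicit
# encoding_map dict with codecs.charmap_build, which compiles the
# 256-character decoding_table into a fast lookup object for
# charmap_encode.  Positions marked UNDEFINED hold u'\ufffe', which the
# decoder rejects under 'strict' and substitutes under 'replace':
import codecs

try:
    codecs.charmap_decode(b'\x9b', 'strict', decoding_table)    # 0x9B is UNDEFINED above
except UnicodeDecodeError:
    pass
u, consumed = codecs.charmap_decode(b'\x9b', 'replace', decoding_table)
assert u == u'\ufffd' and consumed == 1                         # REPLACEMENT CHARACTER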
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP857.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp857',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
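# --- Sketch of how a codec lookup reaches getregentry() (an assumption:
# this mirrors the search-function protocol the encodings package uses;
# the demo registration below is illustrative only).
def _search(name):
    if name == 'cp857':
        return getregentry()
    return None
codecs.register(_search)
assert codecs.lookup('cp857').name == 'cp857'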
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x0131, # LATIN SMALL LETTER DOTLESS I
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x0130, # LATIN CAPITAL LETTER I WITH DOT ABOVE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x015e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x009f: 0x015f, # LATIN SMALL LETTER S WITH CEDILLA
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x011e, # LATIN CAPITAL LETTER G WITH BREVE
0x00a7: 0x011f, # LATIN SMALL LETTER G WITH BREVE
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00d1: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: None, # UNDEFINED
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: None, # UNDEFINED
0x00e8: 0x00d7, # MULTIPLICATION SIGN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ed: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: None, # UNDEFINED
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
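# --- The dict form (decoding_map) and the string form (decoding_table,
# below) are interchangeable arguments to charmap_decode: a None value in
# the map and u'\ufffe' in the table both mark a byte as unmapped, so
# strict decoding of such a byte raises UnicodeDecodeError. A short sketch:
assert codecs.charmap_decode(b'\x9e', 'strict', decoding_map)[0] == u'\u015e'
try:
    codecs.charmap_decode(b'\xe7', 'strict', decoding_map)  # 0xE7 is unmapped
except UnicodeDecodeError:
    pass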
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u0131' # 0x008d -> LATIN SMALL LETTER DOTLESS I
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\u0130' # 0x0098 -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\u015e' # 0x009e -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\u015f' # 0x009f -> LATIN SMALL LETTER S WITH CEDILLA
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\u011e' # 0x00a6 -> LATIN CAPITAL LETTER G WITH BREVE
u'\u011f' # 0x00a7 -> LATIN SMALL LETTER G WITH BREVE
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\xae' # 0x00a9 -> REGISTERED SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xa9' # 0x00b8 -> COPYRIGHT SIGN
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\xa2' # 0x00bd -> CENT SIGN
u'\xa5' # 0x00be -> YEN SIGN
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
u'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0x00cf -> CURRENCY SIGN
u'\xba' # 0x00d0 -> MASCULINE ORDINAL INDICATOR
u'\xaa' # 0x00d1 -> FEMININE ORDINAL INDICATOR
u'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\ufffe' # 0x00d5 -> UNDEFINED
u'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\xa6' # 0x00dd -> BROKEN BAR
u'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
u'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\ufffe' # 0x00e7 -> UNDEFINED
u'\xd7' # 0x00e8 -> MULTIPLICATION SIGN
u'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xec' # 0x00ec -> LATIN SMALL LETTER I WITH GRAVE
u'\xff' # 0x00ed -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xaf' # 0x00ee -> MACRON
u'\xb4' # 0x00ef -> ACUTE ACCENT
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\ufffe' # 0x00f2 -> UNDEFINED
u'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
u'\xb6' # 0x00f4 -> PILCROW SIGN
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\xb8' # 0x00f7 -> CEDILLA
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\xa8' # 0x00f9 -> DIAERESIS
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\xb9' # 0x00fb -> SUPERSCRIPT ONE
u'\xb3' # 0x00fc -> SUPERSCRIPT THREE
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00d1, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00d0, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x00e8, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x00ed, # LATIN SMALL LETTER Y WITH DIAERESIS
0x011e: 0x00a6, # LATIN CAPITAL LETTER G WITH BREVE
0x011f: 0x00a7, # LATIN SMALL LETTER G WITH BREVE
0x0130: 0x0098, # LATIN CAPITAL LETTER I WITH DOT ABOVE
0x0131: 0x008d, # LATIN SMALL LETTER DOTLESS I
0x015e: 0x009e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x015f: 0x009f, # LATIN SMALL LETTER S WITH CEDILLA
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
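# --- Quick consistency check for the cp857 tables above (a sketch,
# assuming this module is registered as 'cp857' by the encodings package
# in the usual way): Turkish dotless/dotted I round-trip through the
# single bytes 0x8D and 0x98.
_data = u'\u0131\u0130'.encode('cp857')
assert _data == b'\x8d\x98'
assert _data.decode('cp857') == u'\u0131\u0130'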
""" Python Character Mapping Codec for CP858, modified from cp850.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp858',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00f0, # LATIN SMALL LETTER ETH
0x00d1: 0x00d0, # LATIN CAPITAL LETTER ETH
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: 0x20ac, # EURO SIGN
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x00fe, # LATIN SMALL LETTER THORN
0x00e8: 0x00de, # LATIN CAPITAL LETTER THORN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2017, # DOUBLE LOW LINE
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd7' # 0x009e -> MULTIPLICATION SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\xae' # 0x00a9 -> REGISTERED SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xa9' # 0x00b8 -> COPYRIGHT SIGN
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\xa2' # 0x00bd -> CENT SIGN
u'\xa5' # 0x00be -> YEN SIGN
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
u'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa4' # 0x00cf -> CURRENCY SIGN
u'\xf0' # 0x00d0 -> LATIN SMALL LETTER ETH
u'\xd0' # 0x00d1 -> LATIN CAPITAL LETTER ETH
u'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\u20ac' # 0x00d5 -> EURO SIGN
u'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\xa6' # 0x00dd -> BROKEN BAR
u'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
u'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\xfe' # 0x00e7 -> LATIN SMALL LETTER THORN
u'\xde' # 0x00e8 -> LATIN CAPITAL LETTER THORN
u'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
u'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xaf' # 0x00ee -> MACRON
u'\xb4' # 0x00ef -> ACUTE ACCENT
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2017' # 0x00f2 -> DOUBLE LOW LINE
u'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
u'\xb6' # 0x00f4 -> PILCROW SIGN
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\xb8' # 0x00f7 -> CEDILLA
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\xa8' # 0x00f9 -> DIAERESIS
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\xb9' # 0x00fb -> SUPERSCRIPT ONE
u'\xb3' # 0x00fc -> SUPERSCRIPT THREE
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d0: 0x00d1, # LATIN CAPITAL LETTER ETH
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x00e8, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f0: 0x00d0, # LATIN SMALL LETTER ETH
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x00e7, # LATIN SMALL LETTER THORN
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x2017: 0x00f2, # DOUBLE LOW LINE
0x20ac: 0x00d5, # EURO SIGN
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
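# --- cp858 exists for a single change relative to cp850: byte 0xD5
# decodes to the EURO SIGN instead of LATIN SMALL LETTER DOTLESS I. A
# sketch (assuming the codec is registered as 'cp858'):
assert b'\xd5'.decode('cp858') == u'\u20ac'
assert u'\u20ac'.encode('cp858') == b'\xd5'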
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP860.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp860',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
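# --- Incremental decoding sketch: for a single-byte charmap codec each
# byte decodes independently, so chunk boundaries can never split a
# character. (Illustrative; relies on the cp860 decoding_table defined
# further below.)
_dec = IncrementalDecoder('strict')
_out = _dec.decode(b'\x86') + _dec.decode(b'\x87', final=True)
assert _out == u'\xc1\xe7'   # 0x86 -> A WITH ACUTE, 0x87 -> c WITH CEDILLA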
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x008c: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x008f: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x0092: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x0099: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0x0084 -> LATIN SMALL LETTER A WITH TILDE
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xc1' # 0x0086 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xca' # 0x0089 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xcd' # 0x008b -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xd4' # 0x008c -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
u'\xc3' # 0x008e -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc2' # 0x008f -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xc0' # 0x0091 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc8' # 0x0092 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0x0094 -> LATIN SMALL LETTER O WITH TILDE
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xda' # 0x0096 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xcc' # 0x0098 -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd5' # 0x0099 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xa2' # 0x009b -> CENT SIGN
u'\xa3' # 0x009c -> POUND SIGN
u'\xd9' # 0x009d -> LATIN CAPITAL LETTER U WITH GRAVE
u'\u20a7' # 0x009e -> PESETA SIGN
u'\xd3' # 0x009f -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\xd2' # 0x00a9 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
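# Usage note: decoding_table is a 256-character unicode string indexed by
# byte value; codecs.charmap_decode looks each input byte up by position,
# which is why generated codecs prefer this table form over the dict-based
# decoding_map. A minimal sketch using the CP860 table above:
#
#     >>> codecs.charmap_decode('S\x84o', 'strict', decoding_table)
#     (u'S\xe3o', 3)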
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x0091, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x0086, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x008f, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x008e, # LATIN CAPITAL LETTER A WITH TILDE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x0092, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x0089, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cc: 0x0098, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x008b, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00a9, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x009f, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x008c, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x0099, # LATIN CAPITAL LETTER O WITH TILDE
0x00d9: 0x009d, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x0096, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x0084, # LATIN SMALL LETTER A WITH TILDE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x0094, # LATIN SMALL LETTER O WITH TILDE
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
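# Round-trip sketch: everything the CP860 codec needs is defined above, so
# running the module directly can verify that encode/decode through the
# tables are mutual inverses for mapped characters. The Portuguese sample
# text is an illustrative assumption.
if __name__ == '__main__':
    text = u'S\xe3o Tom\xe9'              # a-tilde, e-acute
    raw = Codec().encode(text)[0]         # unicode -> CP860 byte string
    assert raw == 'S\x84o Tom\x82'        # 0x84 = a-tilde, 0x82 = e-acute
    assert Codec().decode(raw)[0] == text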
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP861.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp861',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00d0, # LATIN CAPITAL LETTER ETH
0x008c: 0x00f0, # LATIN SMALL LETTER ETH
0x008d: 0x00de, # LATIN CAPITAL LETTER THORN
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00fe, # LATIN SMALL LETTER THORN
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x0098: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00a5: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00a6: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00a7: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xd0' # 0x008b -> LATIN CAPITAL LETTER ETH
u'\xf0' # 0x008c -> LATIN SMALL LETTER ETH
u'\xde' # 0x008d -> LATIN CAPITAL LETTER THORN
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xfe' # 0x0095 -> LATIN SMALL LETTER THORN
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xdd' # 0x0097 -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xfd' # 0x0098 -> LATIN SMALL LETTER Y WITH ACUTE
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\u20a7' # 0x009e -> PESETA SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xc1' # 0x00a4 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xcd' # 0x00a5 -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xd3' # 0x00a6 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xda' # 0x00a7 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\u2310' # 0x00a9 -> REVERSED NOT SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a3: 0x009c, # POUND SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c1: 0x00a4, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00cd: 0x00a5, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d0: 0x008b, # LATIN CAPITAL LETTER ETH
0x00d3: 0x00a6, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00da: 0x00a7, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x0097, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x008d, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f0: 0x008c, # LATIN SMALL LETTER ETH
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x0098, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x0095, # LATIN SMALL LETTER THORN
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
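# Incremental sketch: the charmap codec is stateless, so feeding CP861
# bytes to IncrementalDecoder one at a time must agree with a single
# whole-buffer decode. The two-byte sample is an illustrative assumption:
# per the tables above, 0x8e and 0x8d decode to A-diaeresis and capital
# thorn.
if __name__ == '__main__':
    data = '\x8e\x8d'
    dec = IncrementalDecoder('strict')
    assert u''.join(dec.decode(b) for b in data) == u'\xc4\xde'
    assert Codec().decode(data)[0] == u'\xc4\xde'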
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP862.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp862',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x05d0, # HEBREW LETTER ALEF
0x0081: 0x05d1, # HEBREW LETTER BET
0x0082: 0x05d2, # HEBREW LETTER GIMEL
0x0083: 0x05d3, # HEBREW LETTER DALET
0x0084: 0x05d4, # HEBREW LETTER HE
0x0085: 0x05d5, # HEBREW LETTER VAV
0x0086: 0x05d6, # HEBREW LETTER ZAYIN
0x0087: 0x05d7, # HEBREW LETTER HET
0x0088: 0x05d8, # HEBREW LETTER TET
0x0089: 0x05d9, # HEBREW LETTER YOD
0x008a: 0x05da, # HEBREW LETTER FINAL KAF
0x008b: 0x05db, # HEBREW LETTER KAF
0x008c: 0x05dc, # HEBREW LETTER LAMED
0x008d: 0x05dd, # HEBREW LETTER FINAL MEM
0x008e: 0x05de, # HEBREW LETTER MEM
0x008f: 0x05df, # HEBREW LETTER FINAL NUN
0x0090: 0x05e0, # HEBREW LETTER NUN
0x0091: 0x05e1, # HEBREW LETTER SAMEKH
0x0092: 0x05e2, # HEBREW LETTER AYIN
0x0093: 0x05e3, # HEBREW LETTER FINAL PE
0x0094: 0x05e4, # HEBREW LETTER PE
0x0095: 0x05e5, # HEBREW LETTER FINAL TSADI
0x0096: 0x05e6, # HEBREW LETTER TSADI
0x0097: 0x05e7, # HEBREW LETTER QOF
0x0098: 0x05e8, # HEBREW LETTER RESH
0x0099: 0x05e9, # HEBREW LETTER SHIN
0x009a: 0x05ea, # HEBREW LETTER TAV
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00a5, # YEN SIGN
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\u05d0' # 0x0080 -> HEBREW LETTER ALEF
u'\u05d1' # 0x0081 -> HEBREW LETTER BET
u'\u05d2' # 0x0082 -> HEBREW LETTER GIMEL
u'\u05d3' # 0x0083 -> HEBREW LETTER DALET
u'\u05d4' # 0x0084 -> HEBREW LETTER HE
u'\u05d5' # 0x0085 -> HEBREW LETTER VAV
u'\u05d6' # 0x0086 -> HEBREW LETTER ZAYIN
u'\u05d7' # 0x0087 -> HEBREW LETTER HET
u'\u05d8' # 0x0088 -> HEBREW LETTER TET
u'\u05d9' # 0x0089 -> HEBREW LETTER YOD
u'\u05da' # 0x008a -> HEBREW LETTER FINAL KAF
u'\u05db' # 0x008b -> HEBREW LETTER KAF
u'\u05dc' # 0x008c -> HEBREW LETTER LAMED
u'\u05dd' # 0x008d -> HEBREW LETTER FINAL MEM
u'\u05de' # 0x008e -> HEBREW LETTER MEM
u'\u05df' # 0x008f -> HEBREW LETTER FINAL NUN
u'\u05e0' # 0x0090 -> HEBREW LETTER NUN
u'\u05e1' # 0x0091 -> HEBREW LETTER SAMEKH
u'\u05e2' # 0x0092 -> HEBREW LETTER AYIN
u'\u05e3' # 0x0093 -> HEBREW LETTER FINAL PE
u'\u05e4' # 0x0094 -> HEBREW LETTER PE
u'\u05e5' # 0x0095 -> HEBREW LETTER FINAL TSADI
u'\u05e6' # 0x0096 -> HEBREW LETTER TSADI
u'\u05e7' # 0x0097 -> HEBREW LETTER QOF
u'\u05e8' # 0x0098 -> HEBREW LETTER RESH
u'\u05e9' # 0x0099 -> HEBREW LETTER SHIN
u'\u05ea' # 0x009a -> HEBREW LETTER TAV
u'\xa2' # 0x009b -> CENT SIGN
u'\xa3' # 0x009c -> POUND SIGN
u'\xa5' # 0x009d -> YEN SIGN
u'\u20a7' # 0x009e -> PESETA SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\u2310' # 0x00a9 -> REVERSED NOT SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S (GERMAN)
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a5: 0x009d, # YEN SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f7: 0x00f6, # DIVISION SIGN
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x05d0: 0x0080, # HEBREW LETTER ALEF
0x05d1: 0x0081, # HEBREW LETTER BET
0x05d2: 0x0082, # HEBREW LETTER GIMEL
0x05d3: 0x0083, # HEBREW LETTER DALET
0x05d4: 0x0084, # HEBREW LETTER HE
0x05d5: 0x0085, # HEBREW LETTER VAV
0x05d6: 0x0086, # HEBREW LETTER ZAYIN
0x05d7: 0x0087, # HEBREW LETTER HET
0x05d8: 0x0088, # HEBREW LETTER TET
0x05d9: 0x0089, # HEBREW LETTER YOD
0x05da: 0x008a, # HEBREW LETTER FINAL KAF
0x05db: 0x008b, # HEBREW LETTER KAF
0x05dc: 0x008c, # HEBREW LETTER LAMED
0x05dd: 0x008d, # HEBREW LETTER FINAL MEM
0x05de: 0x008e, # HEBREW LETTER MEM
0x05df: 0x008f, # HEBREW LETTER FINAL NUN
0x05e0: 0x0090, # HEBREW LETTER NUN
0x05e1: 0x0091, # HEBREW LETTER SAMEKH
0x05e2: 0x0092, # HEBREW LETTER AYIN
0x05e3: 0x0093, # HEBREW LETTER FINAL PE
0x05e4: 0x0094, # HEBREW LETTER PE
0x05e5: 0x0095, # HEBREW LETTER FINAL TSADI
0x05e6: 0x0096, # HEBREW LETTER TSADI
0x05e7: 0x0097, # HEBREW LETTER QOF
0x05e8: 0x0098, # HEBREW LETTER RESH
0x05e9: 0x0099, # HEBREW LETTER SHIN
0x05ea: 0x009a, # HEBREW LETTER TAV
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
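### Usage sketch for the cp862 codec above (assumes the module is installed
### as encodings.cp862 so the codec registry resolves the name 'cp862'):
assert u'\u05d0'.encode('cp862') == '\x80'  # HEBREW LETTER ALEF -> 0x80
assert '\x80'.decode('cp862') == u'\u05d0'  # and back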
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP863.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp863',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00b6, # PILCROW SIGN
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x2017, # DOUBLE LOW LINE
0x008e: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x008f: 0x00a7, # SECTION SIGN
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x0092: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x0095: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00a4, # CURRENCY SIGN
0x0099: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x009e: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00a6, # BROKEN BAR
0x00a1: 0x00b4, # ACUTE ACCENT
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00a8, # DIAERESIS
0x00a5: 0x00b8, # CEDILLA
0x00a6: 0x00b3, # SUPERSCRIPT THREE
0x00a7: 0x00af, # MACRON
0x00a8: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xc2' # 0x0084 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xb6' # 0x0086 -> PILCROW SIGN
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u2017' # 0x008d -> DOUBLE LOW LINE
u'\xc0' # 0x008e -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xa7' # 0x008f -> SECTION SIGN
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xc8' # 0x0091 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xca' # 0x0092 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xcb' # 0x0094 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcf' # 0x0095 -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xa4' # 0x0098 -> CURRENCY SIGN
u'\xd4' # 0x0099 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xa2' # 0x009b -> CENT SIGN
u'\xa3' # 0x009c -> POUND SIGN
u'\xd9' # 0x009d -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xdb' # 0x009e -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xa6' # 0x00a0 -> BROKEN BAR
u'\xb4' # 0x00a1 -> ACUTE ACCENT
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xa8' # 0x00a4 -> DIAERESIS
u'\xb8' # 0x00a5 -> CEDILLA
u'\xb3' # 0x00a6 -> SUPERSCRIPT THREE
u'\xaf' # 0x00a7 -> MACRON
u'\xce' # 0x00a8 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\u2310' # 0x00a9 -> REVERSED NOT SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xbe' # 0x00ad -> VULGAR FRACTION THREE QUARTERS
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x0098, # CURRENCY SIGN
0x00a6: 0x00a0, # BROKEN BAR
0x00a7: 0x008f, # SECTION SIGN
0x00a8: 0x00a4, # DIAERESIS
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00af: 0x00a7, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00a6, # SUPERSCRIPT THREE
0x00b4: 0x00a1, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x0086, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00a5, # CEDILLA
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00ad, # VULGAR FRACTION THREE QUARTERS
0x00c0: 0x008e, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c2: 0x0084, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x0091, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x0092, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x0094, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00ce: 0x00a8, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x0095, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d4: 0x0099, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d9: 0x009d, # LATIN CAPITAL LETTER U WITH GRAVE
0x00db: 0x009e, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x2017: 0x008d, # DOUBLE LOW LINE
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
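### Incremental usage sketch for the cp863 codec above (assumes the name
### 'cp863' resolves via the codec registry); the incremental encoder keeps
### its error state across calls:
import codecs
enc = codecs.getincrementalencoder('cp863')()
assert enc.encode(u'\xc7') + enc.encode(u'\xe9') == '\x80\x82'  # C-cedilla, e-acute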
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP864.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp864',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
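### Error-handling sketch: cp864 positions marked UNDEFINED below (e.g. 0x9b)
### have no Unicode mapping, so strict decoding raises UnicodeDecodeError
### while errors='replace' substitutes U+FFFD (assumes 'cp864' is registered):
assert '\x9b'.decode('cp864', 'replace') == u'\ufffd'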
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0025: 0x066a, # ARABIC PERCENT SIGN
0x0080: 0x00b0, # DEGREE SIGN
0x0081: 0x00b7, # MIDDLE DOT
0x0082: 0x2219, # BULLET OPERATOR
0x0083: 0x221a, # SQUARE ROOT
0x0084: 0x2592, # MEDIUM SHADE
0x0085: 0x2500, # FORMS LIGHT HORIZONTAL
0x0086: 0x2502, # FORMS LIGHT VERTICAL
0x0087: 0x253c, # FORMS LIGHT VERTICAL AND HORIZONTAL
0x0088: 0x2524, # FORMS LIGHT VERTICAL AND LEFT
0x0089: 0x252c, # FORMS LIGHT DOWN AND HORIZONTAL
0x008a: 0x251c, # FORMS LIGHT VERTICAL AND RIGHT
0x008b: 0x2534, # FORMS LIGHT UP AND HORIZONTAL
0x008c: 0x2510, # FORMS LIGHT DOWN AND LEFT
0x008d: 0x250c, # FORMS LIGHT DOWN AND RIGHT
0x008e: 0x2514, # FORMS LIGHT UP AND RIGHT
0x008f: 0x2518, # FORMS LIGHT UP AND LEFT
0x0090: 0x03b2, # GREEK SMALL BETA
0x0091: 0x221e, # INFINITY
0x0092: 0x03c6, # GREEK SMALL PHI
0x0093: 0x00b1, # PLUS-OR-MINUS SIGN
0x0094: 0x00bd, # FRACTION 1/2
0x0095: 0x00bc, # FRACTION 1/4
0x0096: 0x2248, # ALMOST EQUAL TO
0x0097: 0x00ab, # LEFT POINTING GUILLEMET
0x0098: 0x00bb, # RIGHT POINTING GUILLEMET
0x0099: 0xfef7, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
0x009a: 0xfef8, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
0x009b: None, # UNDEFINED
0x009c: None, # UNDEFINED
0x009d: 0xfefb, # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
0x009e: 0xfefc, # ARABIC LIGATURE LAM WITH ALEF FINAL FORM
0x009f: None, # UNDEFINED
0x00a1: 0x00ad, # SOFT HYPHEN
0x00a2: 0xfe82, # ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
0x00a5: 0xfe84, # ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
0x00a6: None, # UNDEFINED
0x00a7: None, # UNDEFINED
0x00a8: 0xfe8e, # ARABIC LETTER ALEF FINAL FORM
0x00a9: 0xfe8f, # ARABIC LETTER BEH ISOLATED FORM
0x00aa: 0xfe95, # ARABIC LETTER TEH ISOLATED FORM
0x00ab: 0xfe99, # ARABIC LETTER THEH ISOLATED FORM
0x00ac: 0x060c, # ARABIC COMMA
0x00ad: 0xfe9d, # ARABIC LETTER JEEM ISOLATED FORM
0x00ae: 0xfea1, # ARABIC LETTER HAH ISOLATED FORM
0x00af: 0xfea5, # ARABIC LETTER KHAH ISOLATED FORM
0x00b0: 0x0660, # ARABIC-INDIC DIGIT ZERO
0x00b1: 0x0661, # ARABIC-INDIC DIGIT ONE
0x00b2: 0x0662, # ARABIC-INDIC DIGIT TWO
0x00b3: 0x0663, # ARABIC-INDIC DIGIT THREE
0x00b4: 0x0664, # ARABIC-INDIC DIGIT FOUR
0x00b5: 0x0665, # ARABIC-INDIC DIGIT FIVE
0x00b6: 0x0666, # ARABIC-INDIC DIGIT SIX
0x00b7: 0x0667, # ARABIC-INDIC DIGIT SEVEN
0x00b8: 0x0668, # ARABIC-INDIC DIGIT EIGHT
0x00b9: 0x0669, # ARABIC-INDIC DIGIT NINE
0x00ba: 0xfed1, # ARABIC LETTER FEH ISOLATED FORM
0x00bb: 0x061b, # ARABIC SEMICOLON
0x00bc: 0xfeb1, # ARABIC LETTER SEEN ISOLATED FORM
0x00bd: 0xfeb5, # ARABIC LETTER SHEEN ISOLATED FORM
0x00be: 0xfeb9, # ARABIC LETTER SAD ISOLATED FORM
0x00bf: 0x061f, # ARABIC QUESTION MARK
0x00c0: 0x00a2, # CENT SIGN
0x00c1: 0xfe80, # ARABIC LETTER HAMZA ISOLATED FORM
0x00c2: 0xfe81, # ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
0x00c3: 0xfe83, # ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
0x00c4: 0xfe85, # ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
0x00c5: 0xfeca, # ARABIC LETTER AIN FINAL FORM
0x00c6: 0xfe8b, # ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
0x00c7: 0xfe8d, # ARABIC LETTER ALEF ISOLATED FORM
0x00c8: 0xfe91, # ARABIC LETTER BEH INITIAL FORM
0x00c9: 0xfe93, # ARABIC LETTER TEH MARBUTA ISOLATED FORM
0x00ca: 0xfe97, # ARABIC LETTER TEH INITIAL FORM
0x00cb: 0xfe9b, # ARABIC LETTER THEH INITIAL FORM
0x00cc: 0xfe9f, # ARABIC LETTER JEEM INITIAL FORM
0x00cd: 0xfea3, # ARABIC LETTER HAH INITIAL FORM
0x00ce: 0xfea7, # ARABIC LETTER KHAH INITIAL FORM
0x00cf: 0xfea9, # ARABIC LETTER DAL ISOLATED FORM
0x00d0: 0xfeab, # ARABIC LETTER THAL ISOLATED FORM
0x00d1: 0xfead, # ARABIC LETTER REH ISOLATED FORM
0x00d2: 0xfeaf, # ARABIC LETTER ZAIN ISOLATED FORM
0x00d3: 0xfeb3, # ARABIC LETTER SEEN INITIAL FORM
0x00d4: 0xfeb7, # ARABIC LETTER SHEEN INITIAL FORM
0x00d5: 0xfebb, # ARABIC LETTER SAD INITIAL FORM
0x00d6: 0xfebf, # ARABIC LETTER DAD INITIAL FORM
0x00d7: 0xfec1, # ARABIC LETTER TAH ISOLATED FORM
0x00d8: 0xfec5, # ARABIC LETTER ZAH ISOLATED FORM
0x00d9: 0xfecb, # ARABIC LETTER AIN INITIAL FORM
0x00da: 0xfecf, # ARABIC LETTER GHAIN INITIAL FORM
0x00db: 0x00a6, # BROKEN VERTICAL BAR
0x00dc: 0x00ac, # NOT SIGN
0x00dd: 0x00f7, # DIVISION SIGN
0x00de: 0x00d7, # MULTIPLICATION SIGN
0x00df: 0xfec9, # ARABIC LETTER AIN ISOLATED FORM
0x00e0: 0x0640, # ARABIC TATWEEL
0x00e1: 0xfed3, # ARABIC LETTER FEH INITIAL FORM
0x00e2: 0xfed7, # ARABIC LETTER QAF INITIAL FORM
0x00e3: 0xfedb, # ARABIC LETTER KAF INITIAL FORM
0x00e4: 0xfedf, # ARABIC LETTER LAM INITIAL FORM
0x00e5: 0xfee3, # ARABIC LETTER MEEM INITIAL FORM
0x00e6: 0xfee7, # ARABIC LETTER NOON INITIAL FORM
0x00e7: 0xfeeb, # ARABIC LETTER HEH INITIAL FORM
0x00e8: 0xfeed, # ARABIC LETTER WAW ISOLATED FORM
0x00e9: 0xfeef, # ARABIC LETTER ALEF MAKSURA ISOLATED FORM
0x00ea: 0xfef3, # ARABIC LETTER YEH INITIAL FORM
0x00eb: 0xfebd, # ARABIC LETTER DAD ISOLATED FORM
0x00ec: 0xfecc, # ARABIC LETTER AIN MEDIAL FORM
0x00ed: 0xfece, # ARABIC LETTER GHAIN FINAL FORM
0x00ee: 0xfecd, # ARABIC LETTER GHAIN ISOLATED FORM
0x00ef: 0xfee1, # ARABIC LETTER MEEM ISOLATED FORM
0x00f0: 0xfe7d, # ARABIC SHADDA MEDIAL FORM
0x00f1: 0x0651, # ARABIC SHADDAH
0x00f2: 0xfee5, # ARABIC LETTER NOON ISOLATED FORM
0x00f3: 0xfee9, # ARABIC LETTER HEH ISOLATED FORM
0x00f4: 0xfeec, # ARABIC LETTER HEH MEDIAL FORM
0x00f5: 0xfef0, # ARABIC LETTER ALEF MAKSURA FINAL FORM
0x00f6: 0xfef2, # ARABIC LETTER YEH FINAL FORM
0x00f7: 0xfed0, # ARABIC LETTER GHAIN MEDIAL FORM
0x00f8: 0xfed5, # ARABIC LETTER QAF ISOLATED FORM
0x00f9: 0xfef5, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
0x00fa: 0xfef6, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
0x00fb: 0xfedd, # ARABIC LETTER LAM ISOLATED FORM
0x00fc: 0xfed9, # ARABIC LETTER KAF ISOLATED FORM
0x00fd: 0xfef1, # ARABIC LETTER YEH ISOLATED FORM
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: None, # UNDEFINED
})
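# Illustrative sketch, not part of the generated file: entries mapped to None
# above are byte values left undefined by CP864. Strict decoding of such a
# byte fails, while 'replace' substitutes U+FFFD.
try:
    codecs.charmap_decode('\x9b', 'strict', decoding_map)   # 0x9b maps to <undefined>
except UnicodeDecodeError:
    pass
assert codecs.charmap_decode('\x9b', 'replace', decoding_map)[0] == u'\ufffd'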
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'\u066a' # 0x0025 -> ARABIC PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xb0' # 0x0080 -> DEGREE SIGN
u'\xb7' # 0x0081 -> MIDDLE DOT
u'\u2219' # 0x0082 -> BULLET OPERATOR
u'\u221a' # 0x0083 -> SQUARE ROOT
u'\u2592' # 0x0084 -> MEDIUM SHADE
u'\u2500' # 0x0085 -> FORMS LIGHT HORIZONTAL
u'\u2502' # 0x0086 -> FORMS LIGHT VERTICAL
u'\u253c' # 0x0087 -> FORMS LIGHT VERTICAL AND HORIZONTAL
u'\u2524' # 0x0088 -> FORMS LIGHT VERTICAL AND LEFT
u'\u252c' # 0x0089 -> FORMS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x008a -> FORMS LIGHT VERTICAL AND RIGHT
u'\u2534' # 0x008b -> FORMS LIGHT UP AND HORIZONTAL
u'\u2510' # 0x008c -> FORMS LIGHT DOWN AND LEFT
u'\u250c' # 0x008d -> FORMS LIGHT DOWN AND RIGHT
u'\u2514' # 0x008e -> FORMS LIGHT UP AND RIGHT
u'\u2518' # 0x008f -> FORMS LIGHT UP AND LEFT
u'\u03b2' # 0x0090 -> GREEK SMALL BETA
u'\u221e' # 0x0091 -> INFINITY
u'\u03c6' # 0x0092 -> GREEK SMALL PHI
u'\xb1' # 0x0093 -> PLUS-OR-MINUS SIGN
u'\xbd' # 0x0094 -> FRACTION 1/2
u'\xbc' # 0x0095 -> FRACTION 1/4
u'\u2248' # 0x0096 -> ALMOST EQUAL TO
u'\xab' # 0x0097 -> LEFT POINTING GUILLEMET
u'\xbb' # 0x0098 -> RIGHT POINTING GUILLEMET
u'\ufef7' # 0x0099 -> ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
u'\ufef8' # 0x009a -> ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
u'\ufffe' # 0x009b -> UNDEFINED
u'\ufffe' # 0x009c -> UNDEFINED
u'\ufefb' # 0x009d -> ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
u'\ufefc' # 0x009e -> ARABIC LIGATURE LAM WITH ALEF FINAL FORM
u'\ufffe' # 0x009f -> UNDEFINED
u'\xa0' # 0x00a0 -> NON-BREAKING SPACE
u'\xad' # 0x00a1 -> SOFT HYPHEN
u'\ufe82' # 0x00a2 -> ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
u'\xa3' # 0x00a3 -> POUND SIGN
u'\xa4' # 0x00a4 -> CURRENCY SIGN
u'\ufe84' # 0x00a5 -> ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
u'\ufffe' # 0x00a6 -> UNDEFINED
u'\ufffe' # 0x00a7 -> UNDEFINED
u'\ufe8e' # 0x00a8 -> ARABIC LETTER ALEF FINAL FORM
u'\ufe8f' # 0x00a9 -> ARABIC LETTER BEH ISOLATED FORM
u'\ufe95' # 0x00aa -> ARABIC LETTER TEH ISOLATED FORM
u'\ufe99' # 0x00ab -> ARABIC LETTER THEH ISOLATED FORM
u'\u060c' # 0x00ac -> ARABIC COMMA
u'\ufe9d' # 0x00ad -> ARABIC LETTER JEEM ISOLATED FORM
u'\ufea1' # 0x00ae -> ARABIC LETTER HAH ISOLATED FORM
u'\ufea5' # 0x00af -> ARABIC LETTER KHAH ISOLATED FORM
u'\u0660' # 0x00b0 -> ARABIC-INDIC DIGIT ZERO
u'\u0661' # 0x00b1 -> ARABIC-INDIC DIGIT ONE
u'\u0662' # 0x00b2 -> ARABIC-INDIC DIGIT TWO
u'\u0663' # 0x00b3 -> ARABIC-INDIC DIGIT THREE
u'\u0664' # 0x00b4 -> ARABIC-INDIC DIGIT FOUR
u'\u0665' # 0x00b5 -> ARABIC-INDIC DIGIT FIVE
u'\u0666' # 0x00b6 -> ARABIC-INDIC DIGIT SIX
u'\u0667' # 0x00b7 -> ARABIC-INDIC DIGIT SEVEN
u'\u0668' # 0x00b8 -> ARABIC-INDIC DIGIT EIGHT
u'\u0669' # 0x00b9 -> ARABIC-INDIC DIGIT NINE
u'\ufed1' # 0x00ba -> ARABIC LETTER FEH ISOLATED FORM
u'\u061b' # 0x00bb -> ARABIC SEMICOLON
u'\ufeb1' # 0x00bc -> ARABIC LETTER SEEN ISOLATED FORM
u'\ufeb5' # 0x00bd -> ARABIC LETTER SHEEN ISOLATED FORM
u'\ufeb9' # 0x00be -> ARABIC LETTER SAD ISOLATED FORM
u'\u061f' # 0x00bf -> ARABIC QUESTION MARK
u'\xa2' # 0x00c0 -> CENT SIGN
u'\ufe80' # 0x00c1 -> ARABIC LETTER HAMZA ISOLATED FORM
u'\ufe81' # 0x00c2 -> ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
u'\ufe83' # 0x00c3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
u'\ufe85' # 0x00c4 -> ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
u'\ufeca' # 0x00c5 -> ARABIC LETTER AIN FINAL FORM
u'\ufe8b' # 0x00c6 -> ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
u'\ufe8d' # 0x00c7 -> ARABIC LETTER ALEF ISOLATED FORM
u'\ufe91' # 0x00c8 -> ARABIC LETTER BEH INITIAL FORM
u'\ufe93' # 0x00c9 -> ARABIC LETTER TEH MARBUTA ISOLATED FORM
u'\ufe97' # 0x00ca -> ARABIC LETTER TEH INITIAL FORM
u'\ufe9b' # 0x00cb -> ARABIC LETTER THEH INITIAL FORM
u'\ufe9f' # 0x00cc -> ARABIC LETTER JEEM INITIAL FORM
u'\ufea3' # 0x00cd -> ARABIC LETTER HAH INITIAL FORM
u'\ufea7' # 0x00ce -> ARABIC LETTER KHAH INITIAL FORM
u'\ufea9' # 0x00cf -> ARABIC LETTER DAL ISOLATED FORM
u'\ufeab' # 0x00d0 -> ARABIC LETTER THAL ISOLATED FORM
u'\ufead' # 0x00d1 -> ARABIC LETTER REH ISOLATED FORM
u'\ufeaf' # 0x00d2 -> ARABIC LETTER ZAIN ISOLATED FORM
u'\ufeb3' # 0x00d3 -> ARABIC LETTER SEEN INITIAL FORM
u'\ufeb7' # 0x00d4 -> ARABIC LETTER SHEEN INITIAL FORM
u'\ufebb' # 0x00d5 -> ARABIC LETTER SAD INITIAL FORM
u'\ufebf' # 0x00d6 -> ARABIC LETTER DAD INITIAL FORM
u'\ufec1' # 0x00d7 -> ARABIC LETTER TAH ISOLATED FORM
u'\ufec5' # 0x00d8 -> ARABIC LETTER ZAH ISOLATED FORM
u'\ufecb' # 0x00d9 -> ARABIC LETTER AIN INITIAL FORM
u'\ufecf' # 0x00da -> ARABIC LETTER GHAIN INITIAL FORM
u'\xa6' # 0x00db -> BROKEN VERTICAL BAR
u'\xac' # 0x00dc -> NOT SIGN
u'\xf7' # 0x00dd -> DIVISION SIGN
u'\xd7' # 0x00de -> MULTIPLICATION SIGN
u'\ufec9' # 0x00df -> ARABIC LETTER AIN ISOLATED FORM
u'\u0640' # 0x00e0 -> ARABIC TATWEEL
u'\ufed3' # 0x00e1 -> ARABIC LETTER FEH INITIAL FORM
u'\ufed7' # 0x00e2 -> ARABIC LETTER QAF INITIAL FORM
u'\ufedb' # 0x00e3 -> ARABIC LETTER KAF INITIAL FORM
u'\ufedf' # 0x00e4 -> ARABIC LETTER LAM INITIAL FORM
u'\ufee3' # 0x00e5 -> ARABIC LETTER MEEM INITIAL FORM
u'\ufee7' # 0x00e6 -> ARABIC LETTER NOON INITIAL FORM
u'\ufeeb' # 0x00e7 -> ARABIC LETTER HEH INITIAL FORM
u'\ufeed' # 0x00e8 -> ARABIC LETTER WAW ISOLATED FORM
u'\ufeef' # 0x00e9 -> ARABIC LETTER ALEF MAKSURA ISOLATED FORM
u'\ufef3' # 0x00ea -> ARABIC LETTER YEH INITIAL FORM
u'\ufebd' # 0x00eb -> ARABIC LETTER DAD ISOLATED FORM
u'\ufecc' # 0x00ec -> ARABIC LETTER AIN MEDIAL FORM
u'\ufece' # 0x00ed -> ARABIC LETTER GHAIN FINAL FORM
u'\ufecd' # 0x00ee -> ARABIC LETTER GHAIN ISOLATED FORM
u'\ufee1' # 0x00ef -> ARABIC LETTER MEEM ISOLATED FORM
u'\ufe7d' # 0x00f0 -> ARABIC SHADDA MEDIAL FORM
u'\u0651' # 0x00f1 -> ARABIC SHADDAH
u'\ufee5' # 0x00f2 -> ARABIC LETTER NOON ISOLATED FORM
u'\ufee9' # 0x00f3 -> ARABIC LETTER HEH ISOLATED FORM
u'\ufeec' # 0x00f4 -> ARABIC LETTER HEH MEDIAL FORM
u'\ufef0' # 0x00f5 -> ARABIC LETTER ALEF MAKSURA FINAL FORM
u'\ufef2' # 0x00f6 -> ARABIC LETTER YEH FINAL FORM
u'\ufed0' # 0x00f7 -> ARABIC LETTER GHAIN MEDIAL FORM
u'\ufed5' # 0x00f8 -> ARABIC LETTER QAF ISOLATED FORM
u'\ufef5' # 0x00f9 -> ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
u'\ufef6' # 0x00fa -> ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
u'\ufedd' # 0x00fb -> ARABIC LETTER LAM ISOLATED FORM
u'\ufed9' # 0x00fc -> ARABIC LETTER KAF ISOLATED FORM
u'\ufef1' # 0x00fd -> ARABIC LETTER YEH ISOLATED FORM
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\ufffe' # 0x00ff -> UNDEFINED
)
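# Illustrative sketch, not part of the generated file: decoding_table is a
# 256-character string indexed directly by byte value, with u'\ufffe' marking
# the bytes CP864 leaves undefined.
assert len(decoding_table) == 256
assert decoding_table[0x25] == u'\u066a'   # ARABIC PERCENT SIGN replaces '%'
assert decoding_table[0xff] == u'\ufffe'   # undefined byte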
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00a0, # NON-BREAKING SPACE
0x00a2: 0x00c0, # CENT SIGN
0x00a3: 0x00a3, # POUND SIGN
0x00a4: 0x00a4, # CURRENCY SIGN
0x00a6: 0x00db, # BROKEN VERTICAL BAR
0x00ab: 0x0097, # LEFT POINTING GUILLEMET
0x00ac: 0x00dc, # NOT SIGN
0x00ad: 0x00a1, # SOFT HYPHEN
0x00b0: 0x0080, # DEGREE SIGN
0x00b1: 0x0093, # PLUS-OR-MINUS SIGN
0x00b7: 0x0081, # MIDDLE DOT
0x00bb: 0x0098, # RIGHT POINTING GUILLEMET
0x00bc: 0x0095, # FRACTION 1/4
0x00bd: 0x0094, # FRACTION 1/2
0x00d7: 0x00de, # MULTIPLICATION SIGN
0x00f7: 0x00dd, # DIVISION SIGN
0x03b2: 0x0090, # GREEK SMALL BETA
0x03c6: 0x0092, # GREEK SMALL PHI
0x060c: 0x00ac, # ARABIC COMMA
0x061b: 0x00bb, # ARABIC SEMICOLON
0x061f: 0x00bf, # ARABIC QUESTION MARK
0x0640: 0x00e0, # ARABIC TATWEEL
0x0651: 0x00f1, # ARABIC SHADDAH
0x0660: 0x00b0, # ARABIC-INDIC DIGIT ZERO
0x0661: 0x00b1, # ARABIC-INDIC DIGIT ONE
0x0662: 0x00b2, # ARABIC-INDIC DIGIT TWO
0x0663: 0x00b3, # ARABIC-INDIC DIGIT THREE
0x0664: 0x00b4, # ARABIC-INDIC DIGIT FOUR
0x0665: 0x00b5, # ARABIC-INDIC DIGIT FIVE
0x0666: 0x00b6, # ARABIC-INDIC DIGIT SIX
0x0667: 0x00b7, # ARABIC-INDIC DIGIT SEVEN
0x0668: 0x00b8, # ARABIC-INDIC DIGIT EIGHT
0x0669: 0x00b9, # ARABIC-INDIC DIGIT NINE
0x066a: 0x0025, # ARABIC PERCENT SIGN
0x2219: 0x0082, # BULLET OPERATOR
0x221a: 0x0083, # SQUARE ROOT
0x221e: 0x0091, # INFINITY
0x2248: 0x0096, # ALMOST EQUAL TO
0x2500: 0x0085, # FORMS LIGHT HORIZONTAL
0x2502: 0x0086, # FORMS LIGHT VERTICAL
0x250c: 0x008d, # FORMS LIGHT DOWN AND RIGHT
0x2510: 0x008c, # FORMS LIGHT DOWN AND LEFT
0x2514: 0x008e, # FORMS LIGHT UP AND RIGHT
0x2518: 0x008f, # FORMS LIGHT UP AND LEFT
0x251c: 0x008a, # FORMS LIGHT VERTICAL AND RIGHT
0x2524: 0x0088, # FORMS LIGHT VERTICAL AND LEFT
0x252c: 0x0089, # FORMS LIGHT DOWN AND HORIZONTAL
0x2534: 0x008b, # FORMS LIGHT UP AND HORIZONTAL
0x253c: 0x0087, # FORMS LIGHT VERTICAL AND HORIZONTAL
0x2592: 0x0084, # MEDIUM SHADE
0x25a0: 0x00fe, # BLACK SQUARE
0xfe7d: 0x00f0, # ARABIC SHADDA MEDIAL FORM
0xfe80: 0x00c1, # ARABIC LETTER HAMZA ISOLATED FORM
0xfe81: 0x00c2, # ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
0xfe82: 0x00a2, # ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
0xfe83: 0x00c3, # ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
0xfe84: 0x00a5, # ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
0xfe85: 0x00c4, # ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
0xfe8b: 0x00c6, # ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
0xfe8d: 0x00c7, # ARABIC LETTER ALEF ISOLATED FORM
0xfe8e: 0x00a8, # ARABIC LETTER ALEF FINAL FORM
0xfe8f: 0x00a9, # ARABIC LETTER BEH ISOLATED FORM
0xfe91: 0x00c8, # ARABIC LETTER BEH INITIAL FORM
0xfe93: 0x00c9, # ARABIC LETTER TEH MARBUTA ISOLATED FORM
0xfe95: 0x00aa, # ARABIC LETTER TEH ISOLATED FORM
0xfe97: 0x00ca, # ARABIC LETTER TEH INITIAL FORM
0xfe99: 0x00ab, # ARABIC LETTER THEH ISOLATED FORM
0xfe9b: 0x00cb, # ARABIC LETTER THEH INITIAL FORM
0xfe9d: 0x00ad, # ARABIC LETTER JEEM ISOLATED FORM
0xfe9f: 0x00cc, # ARABIC LETTER JEEM INITIAL FORM
0xfea1: 0x00ae, # ARABIC LETTER HAH ISOLATED FORM
0xfea3: 0x00cd, # ARABIC LETTER HAH INITIAL FORM
0xfea5: 0x00af, # ARABIC LETTER KHAH ISOLATED FORM
0xfea7: 0x00ce, # ARABIC LETTER KHAH INITIAL FORM
0xfea9: 0x00cf, # ARABIC LETTER DAL ISOLATED FORM
0xfeab: 0x00d0, # ARABIC LETTER THAL ISOLATED FORM
0xfead: 0x00d1, # ARABIC LETTER REH ISOLATED FORM
0xfeaf: 0x00d2, # ARABIC LETTER ZAIN ISOLATED FORM
0xfeb1: 0x00bc, # ARABIC LETTER SEEN ISOLATED FORM
0xfeb3: 0x00d3, # ARABIC LETTER SEEN INITIAL FORM
0xfeb5: 0x00bd, # ARABIC LETTER SHEEN ISOLATED FORM
0xfeb7: 0x00d4, # ARABIC LETTER SHEEN INITIAL FORM
0xfeb9: 0x00be, # ARABIC LETTER SAD ISOLATED FORM
0xfebb: 0x00d5, # ARABIC LETTER SAD INITIAL FORM
0xfebd: 0x00eb, # ARABIC LETTER DAD ISOLATED FORM
0xfebf: 0x00d6, # ARABIC LETTER DAD INITIAL FORM
0xfec1: 0x00d7, # ARABIC LETTER TAH ISOLATED FORM
0xfec5: 0x00d8, # ARABIC LETTER ZAH ISOLATED FORM
0xfec9: 0x00df, # ARABIC LETTER AIN ISOLATED FORM
0xfeca: 0x00c5, # ARABIC LETTER AIN FINAL FORM
0xfecb: 0x00d9, # ARABIC LETTER AIN INITIAL FORM
0xfecc: 0x00ec, # ARABIC LETTER AIN MEDIAL FORM
0xfecd: 0x00ee, # ARABIC LETTER GHAIN ISOLATED FORM
0xfece: 0x00ed, # ARABIC LETTER GHAIN FINAL FORM
0xfecf: 0x00da, # ARABIC LETTER GHAIN INITIAL FORM
0xfed0: 0x00f7, # ARABIC LETTER GHAIN MEDIAL FORM
0xfed1: 0x00ba, # ARABIC LETTER FEH ISOLATED FORM
0xfed3: 0x00e1, # ARABIC LETTER FEH INITIAL FORM
0xfed5: 0x00f8, # ARABIC LETTER QAF ISOLATED FORM
0xfed7: 0x00e2, # ARABIC LETTER QAF INITIAL FORM
0xfed9: 0x00fc, # ARABIC LETTER KAF ISOLATED FORM
0xfedb: 0x00e3, # ARABIC LETTER KAF INITIAL FORM
0xfedd: 0x00fb, # ARABIC LETTER LAM ISOLATED FORM
0xfedf: 0x00e4, # ARABIC LETTER LAM INITIAL FORM
0xfee1: 0x00ef, # ARABIC LETTER MEEM ISOLATED FORM
0xfee3: 0x00e5, # ARABIC LETTER MEEM INITIAL FORM
0xfee5: 0x00f2, # ARABIC LETTER NOON ISOLATED FORM
0xfee7: 0x00e6, # ARABIC LETTER NOON INITIAL FORM
0xfee9: 0x00f3, # ARABIC LETTER HEH ISOLATED FORM
0xfeeb: 0x00e7, # ARABIC LETTER HEH INITIAL FORM
0xfeec: 0x00f4, # ARABIC LETTER HEH MEDIAL FORM
0xfeed: 0x00e8, # ARABIC LETTER WAW ISOLATED FORM
0xfeef: 0x00e9, # ARABIC LETTER ALEF MAKSURA ISOLATED FORM
0xfef0: 0x00f5, # ARABIC LETTER ALEF MAKSURA FINAL FORM
0xfef1: 0x00fd, # ARABIC LETTER YEH ISOLATED FORM
0xfef2: 0x00f6, # ARABIC LETTER YEH FINAL FORM
0xfef3: 0x00ea, # ARABIC LETTER YEH INITIAL FORM
0xfef5: 0x00f9, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
0xfef6: 0x00fa, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
0xfef7: 0x0099, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
0xfef8: 0x009a, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
0xfefb: 0x009d, # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
0xfefc: 0x009e, # ARABIC LIGATURE LAM WITH ALEF FINAL FORM
}
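# Illustrative sketch, not part of the generated file: a round trip through
# the cp864 maps defined above, plus manual registration under a hypothetical
# name ('cp864-demo') for standalone use outside the encodings package.
_sample = u'\u0660\u0661\u0662'                                  # Arabic-Indic digits 0, 1, 2
_bytes = codecs.charmap_encode(_sample, 'strict', encoding_map)[0]
assert _bytes == '\xb0\xb1\xb2'
assert codecs.charmap_decode(_bytes, 'strict', decoding_table)[0] == _sample

def _search_demo(name):                                          # hypothetical helper
    return getregentry() if name == 'cp864-demo' else None
codecs.register(_search_demo)
assert u'\u0661'.encode('cp864-demo') == '\xb1'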
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP865.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp865',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00a4, # CURRENCY SIGN
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
u'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
u'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
u'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
u'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
u'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
u'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
u'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
u'\xa3' # 0x009c -> POUND SIGN
u'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
u'\u20a7' # 0x009e -> PESETA SIGN
u'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
u'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
u'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
u'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
u'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
u'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
u'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
u'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
u'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
u'\u2310' # 0x00a9 -> REVERSED NOT SIGN
u'\xac' # 0x00aa -> NOT SIGN
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
u'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xa4' # 0x00af -> CURRENCY SIGN
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
u'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
u'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
u'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
u'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
u'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
u'\xb5' # 0x00e6 -> MICRO SIGN
u'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
u'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
u'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
u'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
u'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
u'\u221e' # 0x00ec -> INFINITY
u'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
u'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
u'\u2229' # 0x00ef -> INTERSECTION
u'\u2261' # 0x00f0 -> IDENTICAL TO
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
u'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
u'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
u'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
u'\xf7' # 0x00f6 -> DIVISION SIGN
u'\u2248' # 0x00f7 -> ALMOST EQUAL TO
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
u'\xb2' # 0x00fd -> SUPERSCRIPT TWO
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
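# Illustrative sketch, not part of the generated file: CP865 is the Nordic
# variant of CP437, and bytes 0x9b, 0x9d, and 0xaf are the positions usually
# cited as where the two code pages differ.
assert codecs.charmap_decode('\x9b\x9d\xaf', 'strict', decoding_table)[0] == u'\xf8\xd8\xa4'  # ø, Ø, ¤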
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00af, # CURRENCY SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
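# Illustrative sketch, not part of the generated file: the incremental classes
# defined above feed charmap_encode/charmap_decode one chunk at a time; the
# charmap codec is stateless, so chunks are independent.
_enc = IncrementalEncoder()
assert _enc.encode(u'\xe5') + _enc.encode(u'\xf8', final=True) == '\x86\x9b'   # å, ø
_dec = IncrementalDecoder()
assert _dec.decode('\x86') + _dec.decode('\x9b', final=True) == u'\xe5\xf8'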
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP866.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp866',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
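# Illustrative note, not part of the generated file: once the cp866 maps below
# are defined, the same charmap round trip applies to Cyrillic text, e.g.
#     codecs.charmap_encode(u'\u0410', 'strict', encoding_map)  ->  ('\x80', 1)
# (U+0410 CYRILLIC CAPITAL LETTER A encodes to byte 0x80 in CP866.)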
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0410, # CYRILLIC CAPITAL LETTER A
0x0081: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x0082: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x0083: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x0084: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x0085: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x0086: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x0087: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x0088: 0x0418, # CYRILLIC CAPITAL LETTER I
0x0089: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x008a: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x008b: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x008c: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x008d: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x008e: 0x041e, # CYRILLIC CAPITAL LETTER O
0x008f: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x0090: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x0091: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x0092: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x0093: 0x0423, # CYRILLIC CAPITAL LETTER U
0x0094: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x0095: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x0096: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x0097: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x0098: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x0099: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x009a: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x009b: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x009c: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x009d: 0x042d, # CYRILLIC CAPITAL LETTER E
0x009e: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x009f: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00a0: 0x0430, # CYRILLIC SMALL LETTER A
0x00a1: 0x0431, # CYRILLIC SMALL LETTER BE
0x00a2: 0x0432, # CYRILLIC SMALL LETTER VE
0x00a3: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00a4: 0x0434, # CYRILLIC SMALL LETTER DE
0x00a5: 0x0435, # CYRILLIC SMALL LETTER IE
0x00a6: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00a7: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00a8: 0x0438, # CYRILLIC SMALL LETTER I
0x00a9: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00aa: 0x043a, # CYRILLIC SMALL LETTER KA
0x00ab: 0x043b, # CYRILLIC SMALL LETTER EL
0x00ac: 0x043c, # CYRILLIC SMALL LETTER EM
0x00ad: 0x043d, # CYRILLIC SMALL LETTER EN
0x00ae: 0x043e, # CYRILLIC SMALL LETTER O
0x00af: 0x043f, # CYRILLIC SMALL LETTER PE
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x0440, # CYRILLIC SMALL LETTER ER
0x00e1: 0x0441, # CYRILLIC SMALL LETTER ES
0x00e2: 0x0442, # CYRILLIC SMALL LETTER TE
0x00e3: 0x0443, # CYRILLIC SMALL LETTER U
0x00e4: 0x0444, # CYRILLIC SMALL LETTER EF
0x00e5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00e6: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00e7: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00e8: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00e9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00ea: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x00eb: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00ec: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00ed: 0x044d, # CYRILLIC SMALL LETTER E
0x00ee: 0x044e, # CYRILLIC SMALL LETTER YU
0x00ef: 0x044f, # CYRILLIC SMALL LETTER YA
0x00f0: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x00f1: 0x0451, # CYRILLIC SMALL LETTER IO
0x00f2: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x00f3: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x00f4: 0x0407, # CYRILLIC CAPITAL LETTER YI
0x00f5: 0x0457, # CYRILLIC SMALL LETTER YI
0x00f6: 0x040e, # CYRILLIC CAPITAL LETTER SHORT U
0x00f7: 0x045e, # CYRILLIC SMALL LETTER SHORT U
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x2116, # NUMERO SIGN
0x00fd: 0x00a4, # CURRENCY SIGN
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
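# Note: decoding_map starts as an identity mapping for all 256 byte values
# (codecs.make_identity_dict) and the update above overrides the high half
# (0x80-0xff) with the Unicode code points specific to this code page.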
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\u0410' # 0x0080 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0x0081 -> CYRILLIC CAPITAL LETTER BE
u'\u0412' # 0x0082 -> CYRILLIC CAPITAL LETTER VE
u'\u0413' # 0x0083 -> CYRILLIC CAPITAL LETTER GHE
u'\u0414' # 0x0084 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0x0085 -> CYRILLIC CAPITAL LETTER IE
u'\u0416' # 0x0086 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0417' # 0x0087 -> CYRILLIC CAPITAL LETTER ZE
u'\u0418' # 0x0088 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0x0089 -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0x008a -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0x008b -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0x008c -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0x008d -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0x008e -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0x008f -> CYRILLIC CAPITAL LETTER PE
u'\u0420' # 0x0090 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0x0091 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0x0092 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0x0093 -> CYRILLIC CAPITAL LETTER U
u'\u0424' # 0x0094 -> CYRILLIC CAPITAL LETTER EF
u'\u0425' # 0x0095 -> CYRILLIC CAPITAL LETTER HA
u'\u0426' # 0x0096 -> CYRILLIC CAPITAL LETTER TSE
u'\u0427' # 0x0097 -> CYRILLIC CAPITAL LETTER CHE
u'\u0428' # 0x0098 -> CYRILLIC CAPITAL LETTER SHA
u'\u0429' # 0x0099 -> CYRILLIC CAPITAL LETTER SHCHA
u'\u042a' # 0x009a -> CYRILLIC CAPITAL LETTER HARD SIGN
u'\u042b' # 0x009b -> CYRILLIC CAPITAL LETTER YERU
u'\u042c' # 0x009c -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042d' # 0x009d -> CYRILLIC CAPITAL LETTER E
u'\u042e' # 0x009e -> CYRILLIC CAPITAL LETTER YU
u'\u042f' # 0x009f -> CYRILLIC CAPITAL LETTER YA
u'\u0430' # 0x00a0 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0x00a1 -> CYRILLIC SMALL LETTER BE
u'\u0432' # 0x00a2 -> CYRILLIC SMALL LETTER VE
u'\u0433' # 0x00a3 -> CYRILLIC SMALL LETTER GHE
u'\u0434' # 0x00a4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0x00a5 -> CYRILLIC SMALL LETTER IE
u'\u0436' # 0x00a6 -> CYRILLIC SMALL LETTER ZHE
u'\u0437' # 0x00a7 -> CYRILLIC SMALL LETTER ZE
u'\u0438' # 0x00a8 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0x00a9 -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0x00aa -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0x00ab -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0x00ac -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0x00ad -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0x00ae -> CYRILLIC SMALL LETTER O
u'\u043f' # 0x00af -> CYRILLIC SMALL LETTER PE
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u258c' # 0x00dd -> LEFT HALF BLOCK
u'\u2590' # 0x00de -> RIGHT HALF BLOCK
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u0440' # 0x00e0 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0x00e1 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0x00e2 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0x00e3 -> CYRILLIC SMALL LETTER U
u'\u0444' # 0x00e4 -> CYRILLIC SMALL LETTER EF
u'\u0445' # 0x00e5 -> CYRILLIC SMALL LETTER HA
u'\u0446' # 0x00e6 -> CYRILLIC SMALL LETTER TSE
u'\u0447' # 0x00e7 -> CYRILLIC SMALL LETTER CHE
u'\u0448' # 0x00e8 -> CYRILLIC SMALL LETTER SHA
u'\u0449' # 0x00e9 -> CYRILLIC SMALL LETTER SHCHA
u'\u044a' # 0x00ea -> CYRILLIC SMALL LETTER HARD SIGN
u'\u044b' # 0x00eb -> CYRILLIC SMALL LETTER YERU
u'\u044c' # 0x00ec -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044d' # 0x00ed -> CYRILLIC SMALL LETTER E
u'\u044e' # 0x00ee -> CYRILLIC SMALL LETTER YU
u'\u044f' # 0x00ef -> CYRILLIC SMALL LETTER YA
u'\u0401' # 0x00f0 -> CYRILLIC CAPITAL LETTER IO
u'\u0451' # 0x00f1 -> CYRILLIC SMALL LETTER IO
u'\u0404' # 0x00f2 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\u0454' # 0x00f3 -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\u0407' # 0x00f4 -> CYRILLIC CAPITAL LETTER YI
u'\u0457' # 0x00f5 -> CYRILLIC SMALL LETTER YI
u'\u040e' # 0x00f6 -> CYRILLIC CAPITAL LETTER SHORT U
u'\u045e' # 0x00f7 -> CYRILLIC SMALL LETTER SHORT U
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\u2219' # 0x00f9 -> BULLET OPERATOR
u'\xb7' # 0x00fa -> MIDDLE DOT
u'\u221a' # 0x00fb -> SQUARE ROOT
u'\u2116' # 0x00fc -> NUMERO SIGN
u'\xa4' # 0x00fd -> CURRENCY SIGN
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
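# Note: decoding_table is a 256-character unicode string; the character at
# index N is the decoding of byte value N, which is the form expected by
# codecs.charmap_decode.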
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00fd, # CURRENCY SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x0401: 0x00f0, # CYRILLIC CAPITAL LETTER IO
0x0404: 0x00f2, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0407: 0x00f4, # CYRILLIC CAPITAL LETTER YI
0x040e: 0x00f6, # CYRILLIC CAPITAL LETTER SHORT U
0x0410: 0x0080, # CYRILLIC CAPITAL LETTER A
0x0411: 0x0081, # CYRILLIC CAPITAL LETTER BE
0x0412: 0x0082, # CYRILLIC CAPITAL LETTER VE
0x0413: 0x0083, # CYRILLIC CAPITAL LETTER GHE
0x0414: 0x0084, # CYRILLIC CAPITAL LETTER DE
0x0415: 0x0085, # CYRILLIC CAPITAL LETTER IE
0x0416: 0x0086, # CYRILLIC CAPITAL LETTER ZHE
0x0417: 0x0087, # CYRILLIC CAPITAL LETTER ZE
0x0418: 0x0088, # CYRILLIC CAPITAL LETTER I
0x0419: 0x0089, # CYRILLIC CAPITAL LETTER SHORT I
0x041a: 0x008a, # CYRILLIC CAPITAL LETTER KA
0x041b: 0x008b, # CYRILLIC CAPITAL LETTER EL
0x041c: 0x008c, # CYRILLIC CAPITAL LETTER EM
0x041d: 0x008d, # CYRILLIC CAPITAL LETTER EN
0x041e: 0x008e, # CYRILLIC CAPITAL LETTER O
0x041f: 0x008f, # CYRILLIC CAPITAL LETTER PE
0x0420: 0x0090, # CYRILLIC CAPITAL LETTER ER
0x0421: 0x0091, # CYRILLIC CAPITAL LETTER ES
0x0422: 0x0092, # CYRILLIC CAPITAL LETTER TE
0x0423: 0x0093, # CYRILLIC CAPITAL LETTER U
0x0424: 0x0094, # CYRILLIC CAPITAL LETTER EF
0x0425: 0x0095, # CYRILLIC CAPITAL LETTER HA
0x0426: 0x0096, # CYRILLIC CAPITAL LETTER TSE
0x0427: 0x0097, # CYRILLIC CAPITAL LETTER CHE
0x0428: 0x0098, # CYRILLIC CAPITAL LETTER SHA
0x0429: 0x0099, # CYRILLIC CAPITAL LETTER SHCHA
0x042a: 0x009a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x042b: 0x009b, # CYRILLIC CAPITAL LETTER YERU
0x042c: 0x009c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x042d: 0x009d, # CYRILLIC CAPITAL LETTER E
0x042e: 0x009e, # CYRILLIC CAPITAL LETTER YU
0x042f: 0x009f, # CYRILLIC CAPITAL LETTER YA
0x0430: 0x00a0, # CYRILLIC SMALL LETTER A
0x0431: 0x00a1, # CYRILLIC SMALL LETTER BE
0x0432: 0x00a2, # CYRILLIC SMALL LETTER VE
0x0433: 0x00a3, # CYRILLIC SMALL LETTER GHE
0x0434: 0x00a4, # CYRILLIC SMALL LETTER DE
0x0435: 0x00a5, # CYRILLIC SMALL LETTER IE
0x0436: 0x00a6, # CYRILLIC SMALL LETTER ZHE
0x0437: 0x00a7, # CYRILLIC SMALL LETTER ZE
0x0438: 0x00a8, # CYRILLIC SMALL LETTER I
0x0439: 0x00a9, # CYRILLIC SMALL LETTER SHORT I
0x043a: 0x00aa, # CYRILLIC SMALL LETTER KA
0x043b: 0x00ab, # CYRILLIC SMALL LETTER EL
0x043c: 0x00ac, # CYRILLIC SMALL LETTER EM
0x043d: 0x00ad, # CYRILLIC SMALL LETTER EN
0x043e: 0x00ae, # CYRILLIC SMALL LETTER O
0x043f: 0x00af, # CYRILLIC SMALL LETTER PE
0x0440: 0x00e0, # CYRILLIC SMALL LETTER ER
0x0441: 0x00e1, # CYRILLIC SMALL LETTER ES
0x0442: 0x00e2, # CYRILLIC SMALL LETTER TE
0x0443: 0x00e3, # CYRILLIC SMALL LETTER U
0x0444: 0x00e4, # CYRILLIC SMALL LETTER EF
0x0445: 0x00e5, # CYRILLIC SMALL LETTER HA
0x0446: 0x00e6, # CYRILLIC SMALL LETTER TSE
0x0447: 0x00e7, # CYRILLIC SMALL LETTER CHE
0x0448: 0x00e8, # CYRILLIC SMALL LETTER SHA
0x0449: 0x00e9, # CYRILLIC SMALL LETTER SHCHA
0x044a: 0x00ea, # CYRILLIC SMALL LETTER HARD SIGN
0x044b: 0x00eb, # CYRILLIC SMALL LETTER YERU
0x044c: 0x00ec, # CYRILLIC SMALL LETTER SOFT SIGN
0x044d: 0x00ed, # CYRILLIC SMALL LETTER E
0x044e: 0x00ee, # CYRILLIC SMALL LETTER YU
0x044f: 0x00ef, # CYRILLIC SMALL LETTER YA
0x0451: 0x00f1, # CYRILLIC SMALL LETTER IO
0x0454: 0x00f3, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0457: 0x00f5, # CYRILLIC SMALL LETTER YI
0x045e: 0x00f7, # CYRILLIC SMALL LETTER SHORT U
0x2116: 0x00fc, # NUMERO SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP869.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp869',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
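# Usage sketch (illustrative comment, not part of the generated module):
# cp869 leaves several byte values undefined (e.g. 0x80), so strict
# decoding of those bytes raises UnicodeDecodeError; a different error
# handler can be supplied instead (Python 2 semantics):
#
#     '\x86'.decode('cp869')             # -> u'\u0386' (ALPHA WITH TONOS)
#     '\x80'.decode('cp869', 'replace')  # -> u'\ufffd' (REPLACEMENT CHARACTER)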
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: None, # UNDEFINED
0x0081: None, # UNDEFINED
0x0082: None, # UNDEFINED
0x0083: None, # UNDEFINED
0x0084: None, # UNDEFINED
0x0085: None, # UNDEFINED
0x0086: 0x0386, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0087: None, # UNDEFINED
0x0088: 0x00b7, # MIDDLE DOT
0x0089: 0x00ac, # NOT SIGN
0x008a: 0x00a6, # BROKEN BAR
0x008b: 0x2018, # LEFT SINGLE QUOTATION MARK
0x008c: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x008d: 0x0388, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x008e: 0x2015, # HORIZONTAL BAR
0x008f: 0x0389, # GREEK CAPITAL LETTER ETA WITH TONOS
0x0090: 0x038a, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x0091: 0x03aa, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x0092: 0x038c, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x0093: None, # UNDEFINED
0x0094: None, # UNDEFINED
0x0095: 0x038e, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x0096: 0x03ab, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x0097: 0x00a9, # COPYRIGHT SIGN
0x0098: 0x038f, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0099: 0x00b2, # SUPERSCRIPT TWO
0x009a: 0x00b3, # SUPERSCRIPT THREE
0x009b: 0x03ac, # GREEK SMALL LETTER ALPHA WITH TONOS
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x03ad, # GREEK SMALL LETTER EPSILON WITH TONOS
0x009e: 0x03ae, # GREEK SMALL LETTER ETA WITH TONOS
0x009f: 0x03af, # GREEK SMALL LETTER IOTA WITH TONOS
0x00a0: 0x03ca, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x00a1: 0x0390, # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0x00a2: 0x03cc, # GREEK SMALL LETTER OMICRON WITH TONOS
0x00a3: 0x03cd, # GREEK SMALL LETTER UPSILON WITH TONOS
0x00a4: 0x0391, # GREEK CAPITAL LETTER ALPHA
0x00a5: 0x0392, # GREEK CAPITAL LETTER BETA
0x00a6: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00a7: 0x0394, # GREEK CAPITAL LETTER DELTA
0x00a8: 0x0395, # GREEK CAPITAL LETTER EPSILON
0x00a9: 0x0396, # GREEK CAPITAL LETTER ZETA
0x00aa: 0x0397, # GREEK CAPITAL LETTER ETA
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ad: 0x0399, # GREEK CAPITAL LETTER IOTA
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x039a, # GREEK CAPITAL LETTER KAPPA
0x00b6: 0x039b, # GREEK CAPITAL LETTER LAMDA
0x00b7: 0x039c, # GREEK CAPITAL LETTER MU
0x00b8: 0x039d, # GREEK CAPITAL LETTER NU
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x039e, # GREEK CAPITAL LETTER XI
0x00be: 0x039f, # GREEK CAPITAL LETTER OMICRON
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x03a0, # GREEK CAPITAL LETTER PI
0x00c7: 0x03a1, # GREEK CAPITAL LETTER RHO
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00d0: 0x03a4, # GREEK CAPITAL LETTER TAU
0x00d1: 0x03a5, # GREEK CAPITAL LETTER UPSILON
0x00d2: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00d3: 0x03a7, # GREEK CAPITAL LETTER CHI
0x00d4: 0x03a8, # GREEK CAPITAL LETTER PSI
0x00d5: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00d6: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00d7: 0x03b2, # GREEK SMALL LETTER BETA
0x00d8: 0x03b3, # GREEK SMALL LETTER GAMMA
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x03b4, # GREEK SMALL LETTER DELTA
0x00de: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b6, # GREEK SMALL LETTER ZETA
0x00e1: 0x03b7, # GREEK SMALL LETTER ETA
0x00e2: 0x03b8, # GREEK SMALL LETTER THETA
0x00e3: 0x03b9, # GREEK SMALL LETTER IOTA
0x00e4: 0x03ba, # GREEK SMALL LETTER KAPPA
0x00e5: 0x03bb, # GREEK SMALL LETTER LAMDA
0x00e6: 0x03bc, # GREEK SMALL LETTER MU
0x00e7: 0x03bd, # GREEK SMALL LETTER NU
0x00e8: 0x03be, # GREEK SMALL LETTER XI
0x00e9: 0x03bf, # GREEK SMALL LETTER OMICRON
0x00ea: 0x03c0, # GREEK SMALL LETTER PI
0x00eb: 0x03c1, # GREEK SMALL LETTER RHO
0x00ec: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00ed: 0x03c2, # GREEK SMALL LETTER FINAL SIGMA
0x00ee: 0x03c4, # GREEK SMALL LETTER TAU
0x00ef: 0x0384, # GREEK TONOS
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x03c5, # GREEK SMALL LETTER UPSILON
0x00f3: 0x03c6, # GREEK SMALL LETTER PHI
0x00f4: 0x03c7, # GREEK SMALL LETTER CHI
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x03c8, # GREEK SMALL LETTER PSI
0x00f7: 0x0385, # GREEK DIALYTIKA TONOS
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x03c9, # GREEK SMALL LETTER OMEGA
0x00fb: 0x03cb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x00fc: 0x03b0, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0x00fd: 0x03ce, # GREEK SMALL LETTER OMEGA WITH TONOS
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
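# Note: None in the decoding map marks byte values that are undefined in
# this code page; codecs.charmap_decode raises UnicodeDecodeError for them
# under the 'strict' error handler. The decoding_table below marks the same
# positions with u'\ufffe'.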
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> NULL
u'\x01' # 0x0001 -> START OF HEADING
u'\x02' # 0x0002 -> START OF TEXT
u'\x03' # 0x0003 -> END OF TEXT
u'\x04' # 0x0004 -> END OF TRANSMISSION
u'\x05' # 0x0005 -> ENQUIRY
u'\x06' # 0x0006 -> ACKNOWLEDGE
u'\x07' # 0x0007 -> BELL
u'\x08' # 0x0008 -> BACKSPACE
u'\t' # 0x0009 -> HORIZONTAL TABULATION
u'\n' # 0x000a -> LINE FEED
u'\x0b' # 0x000b -> VERTICAL TABULATION
u'\x0c' # 0x000c -> FORM FEED
u'\r' # 0x000d -> CARRIAGE RETURN
u'\x0e' # 0x000e -> SHIFT OUT
u'\x0f' # 0x000f -> SHIFT IN
u'\x10' # 0x0010 -> DATA LINK ESCAPE
u'\x11' # 0x0011 -> DEVICE CONTROL ONE
u'\x12' # 0x0012 -> DEVICE CONTROL TWO
u'\x13' # 0x0013 -> DEVICE CONTROL THREE
u'\x14' # 0x0014 -> DEVICE CONTROL FOUR
u'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x0016 -> SYNCHRONOUS IDLE
u'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x0018 -> CANCEL
u'\x19' # 0x0019 -> END OF MEDIUM
u'\x1a' # 0x001a -> SUBSTITUTE
u'\x1b' # 0x001b -> ESCAPE
u'\x1c' # 0x001c -> FILE SEPARATOR
u'\x1d' # 0x001d -> GROUP SEPARATOR
u'\x1e' # 0x001e -> RECORD SEPARATOR
u'\x1f' # 0x001f -> UNIT SEPARATOR
u' ' # 0x0020 -> SPACE
u'!' # 0x0021 -> EXCLAMATION MARK
u'"' # 0x0022 -> QUOTATION MARK
u'#' # 0x0023 -> NUMBER SIGN
u'$' # 0x0024 -> DOLLAR SIGN
u'%' # 0x0025 -> PERCENT SIGN
u'&' # 0x0026 -> AMPERSAND
u"'" # 0x0027 -> APOSTROPHE
u'(' # 0x0028 -> LEFT PARENTHESIS
u')' # 0x0029 -> RIGHT PARENTHESIS
u'*' # 0x002a -> ASTERISK
u'+' # 0x002b -> PLUS SIGN
u',' # 0x002c -> COMMA
u'-' # 0x002d -> HYPHEN-MINUS
u'.' # 0x002e -> FULL STOP
u'/' # 0x002f -> SOLIDUS
u'0' # 0x0030 -> DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE
u':' # 0x003a -> COLON
u';' # 0x003b -> SEMICOLON
u'<' # 0x003c -> LESS-THAN SIGN
u'=' # 0x003d -> EQUALS SIGN
u'>' # 0x003e -> GREATER-THAN SIGN
u'?' # 0x003f -> QUESTION MARK
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET
u'\\' # 0x005c -> REVERSE SOLIDUS
u']' # 0x005d -> RIGHT SQUARE BRACKET
u'^' # 0x005e -> CIRCUMFLEX ACCENT
u'_' # 0x005f -> LOW LINE
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET
u'|' # 0x007c -> VERTICAL LINE
u'}' # 0x007d -> RIGHT CURLY BRACKET
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> DELETE
u'\ufffe' # 0x0080 -> UNDEFINED
u'\ufffe' # 0x0081 -> UNDEFINED
u'\ufffe' # 0x0082 -> UNDEFINED
u'\ufffe' # 0x0083 -> UNDEFINED
u'\ufffe' # 0x0084 -> UNDEFINED
u'\ufffe' # 0x0085 -> UNDEFINED
u'\u0386' # 0x0086 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\ufffe' # 0x0087 -> UNDEFINED
u'\xb7' # 0x0088 -> MIDDLE DOT
u'\xac' # 0x0089 -> NOT SIGN
u'\xa6' # 0x008a -> BROKEN BAR
u'\u2018' # 0x008b -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x008c -> RIGHT SINGLE QUOTATION MARK
u'\u0388' # 0x008d -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u2015' # 0x008e -> HORIZONTAL BAR
u'\u0389' # 0x008f -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\u038a' # 0x0090 -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\u03aa' # 0x0091 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\u038c' # 0x0092 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\ufffe' # 0x0093 -> UNDEFINED
u'\ufffe' # 0x0094 -> UNDEFINED
u'\u038e' # 0x0095 -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u03ab' # 0x0096 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'\xa9' # 0x0097 -> COPYRIGHT SIGN
u'\u038f' # 0x0098 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'\xb2' # 0x0099 -> SUPERSCRIPT TWO
u'\xb3' # 0x009a -> SUPERSCRIPT THREE
u'\u03ac' # 0x009b -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\xa3' # 0x009c -> POUND SIGN
u'\u03ad' # 0x009d -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0x009e -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03af' # 0x009f -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03ca' # 0x00a0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u0390' # 0x00a1 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
u'\u03cc' # 0x00a2 -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u03cd' # 0x00a3 -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u0391' # 0x00a4 -> GREEK CAPITAL LETTER ALPHA
u'\u0392' # 0x00a5 -> GREEK CAPITAL LETTER BETA
u'\u0393' # 0x00a6 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0x00a7 -> GREEK CAPITAL LETTER DELTA
u'\u0395' # 0x00a8 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0x00a9 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0x00aa -> GREEK CAPITAL LETTER ETA
u'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
u'\u0398' # 0x00ac -> GREEK CAPITAL LETTER THETA
u'\u0399' # 0x00ad -> GREEK CAPITAL LETTER IOTA
u'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2591' # 0x00b0 -> LIGHT SHADE
u'\u2592' # 0x00b1 -> MEDIUM SHADE
u'\u2593' # 0x00b2 -> DARK SHADE
u'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
u'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u039a' # 0x00b5 -> GREEK CAPITAL LETTER KAPPA
u'\u039b' # 0x00b6 -> GREEK CAPITAL LETTER LAMDA
u'\u039c' # 0x00b7 -> GREEK CAPITAL LETTER MU
u'\u039d' # 0x00b8 -> GREEK CAPITAL LETTER NU
u'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u039e' # 0x00bd -> GREEK CAPITAL LETTER XI
u'\u039f' # 0x00be -> GREEK CAPITAL LETTER OMICRON
u'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u03a0' # 0x00c6 -> GREEK CAPITAL LETTER PI
u'\u03a1' # 0x00c7 -> GREEK CAPITAL LETTER RHO
u'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\u03a3' # 0x00cf -> GREEK CAPITAL LETTER SIGMA
u'\u03a4' # 0x00d0 -> GREEK CAPITAL LETTER TAU
u'\u03a5' # 0x00d1 -> GREEK CAPITAL LETTER UPSILON
u'\u03a6' # 0x00d2 -> GREEK CAPITAL LETTER PHI
u'\u03a7' # 0x00d3 -> GREEK CAPITAL LETTER CHI
u'\u03a8' # 0x00d4 -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0x00d5 -> GREEK CAPITAL LETTER OMEGA
u'\u03b1' # 0x00d6 -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0x00d7 -> GREEK SMALL LETTER BETA
u'\u03b3' # 0x00d8 -> GREEK SMALL LETTER GAMMA
u'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2588' # 0x00db -> FULL BLOCK
u'\u2584' # 0x00dc -> LOWER HALF BLOCK
u'\u03b4' # 0x00dd -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0x00de -> GREEK SMALL LETTER EPSILON
u'\u2580' # 0x00df -> UPPER HALF BLOCK
u'\u03b6' # 0x00e0 -> GREEK SMALL LETTER ZETA
u'\u03b7' # 0x00e1 -> GREEK SMALL LETTER ETA
u'\u03b8' # 0x00e2 -> GREEK SMALL LETTER THETA
u'\u03b9' # 0x00e3 -> GREEK SMALL LETTER IOTA
u'\u03ba' # 0x00e4 -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0x00e5 -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0x00e6 -> GREEK SMALL LETTER MU
u'\u03bd' # 0x00e7 -> GREEK SMALL LETTER NU
u'\u03be' # 0x00e8 -> GREEK SMALL LETTER XI
u'\u03bf' # 0x00e9 -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0x00ea -> GREEK SMALL LETTER PI
u'\u03c1' # 0x00eb -> GREEK SMALL LETTER RHO
u'\u03c3' # 0x00ec -> GREEK SMALL LETTER SIGMA
u'\u03c2' # 0x00ed -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c4' # 0x00ee -> GREEK SMALL LETTER TAU
u'\u0384' # 0x00ef -> GREEK TONOS
u'\xad' # 0x00f0 -> SOFT HYPHEN
u'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
u'\u03c5' # 0x00f2 -> GREEK SMALL LETTER UPSILON
u'\u03c6' # 0x00f3 -> GREEK SMALL LETTER PHI
u'\u03c7' # 0x00f4 -> GREEK SMALL LETTER CHI
u'\xa7' # 0x00f5 -> SECTION SIGN
u'\u03c8' # 0x00f6 -> GREEK SMALL LETTER PSI
u'\u0385' # 0x00f7 -> GREEK DIALYTIKA TONOS
u'\xb0' # 0x00f8 -> DEGREE SIGN
u'\xa8' # 0x00f9 -> DIAERESIS
u'\u03c9' # 0x00fa -> GREEK SMALL LETTER OMEGA
u'\u03cb' # 0x00fb -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u03b0' # 0x00fc -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
u'\u03ce' # 0x00fd -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\u25a0' # 0x00fe -> BLACK SQUARE
u'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a3: 0x009c, # POUND SIGN
0x00a6: 0x008a, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x0097, # COPYRIGHT SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x0089, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x0099, # SUPERSCRIPT TWO
0x00b3: 0x009a, # SUPERSCRIPT THREE
0x00b7: 0x0088, # MIDDLE DOT
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x0384: 0x00ef, # GREEK TONOS
0x0385: 0x00f7, # GREEK DIALYTIKA TONOS
0x0386: 0x0086, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0388: 0x008d, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x0389: 0x008f, # GREEK CAPITAL LETTER ETA WITH TONOS
0x038a: 0x0090, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x038c: 0x0092, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x038e: 0x0095, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x038f: 0x0098, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0390: 0x00a1, # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0x0391: 0x00a4, # GREEK CAPITAL LETTER ALPHA
0x0392: 0x00a5, # GREEK CAPITAL LETTER BETA
0x0393: 0x00a6, # GREEK CAPITAL LETTER GAMMA
0x0394: 0x00a7, # GREEK CAPITAL LETTER DELTA
0x0395: 0x00a8, # GREEK CAPITAL LETTER EPSILON
0x0396: 0x00a9, # GREEK CAPITAL LETTER ZETA
0x0397: 0x00aa, # GREEK CAPITAL LETTER ETA
0x0398: 0x00ac, # GREEK CAPITAL LETTER THETA
0x0399: 0x00ad, # GREEK CAPITAL LETTER IOTA
0x039a: 0x00b5, # GREEK CAPITAL LETTER KAPPA
0x039b: 0x00b6, # GREEK CAPITAL LETTER LAMDA
0x039c: 0x00b7, # GREEK CAPITAL LETTER MU
0x039d: 0x00b8, # GREEK CAPITAL LETTER NU
0x039e: 0x00bd, # GREEK CAPITAL LETTER XI
0x039f: 0x00be, # GREEK CAPITAL LETTER OMICRON
0x03a0: 0x00c6, # GREEK CAPITAL LETTER PI
0x03a1: 0x00c7, # GREEK CAPITAL LETTER RHO
0x03a3: 0x00cf, # GREEK CAPITAL LETTER SIGMA
0x03a4: 0x00d0, # GREEK CAPITAL LETTER TAU
0x03a5: 0x00d1, # GREEK CAPITAL LETTER UPSILON
0x03a6: 0x00d2, # GREEK CAPITAL LETTER PHI
0x03a7: 0x00d3, # GREEK CAPITAL LETTER CHI
0x03a8: 0x00d4, # GREEK CAPITAL LETTER PSI
0x03a9: 0x00d5, # GREEK CAPITAL LETTER OMEGA
0x03aa: 0x0091, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x03ab: 0x0096, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x03ac: 0x009b, # GREEK SMALL LETTER ALPHA WITH TONOS
0x03ad: 0x009d, # GREEK SMALL LETTER EPSILON WITH TONOS
0x03ae: 0x009e, # GREEK SMALL LETTER ETA WITH TONOS
0x03af: 0x009f, # GREEK SMALL LETTER IOTA WITH TONOS
0x03b0: 0x00fc, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0x03b1: 0x00d6, # GREEK SMALL LETTER ALPHA
0x03b2: 0x00d7, # GREEK SMALL LETTER BETA
0x03b3: 0x00d8, # GREEK SMALL LETTER GAMMA
0x03b4: 0x00dd, # GREEK SMALL LETTER DELTA
0x03b5: 0x00de, # GREEK SMALL LETTER EPSILON
0x03b6: 0x00e0, # GREEK SMALL LETTER ZETA
0x03b7: 0x00e1, # GREEK SMALL LETTER ETA
0x03b8: 0x00e2, # GREEK SMALL LETTER THETA
0x03b9: 0x00e3, # GREEK SMALL LETTER IOTA
0x03ba: 0x00e4, # GREEK SMALL LETTER KAPPA
0x03bb: 0x00e5, # GREEK SMALL LETTER LAMDA
0x03bc: 0x00e6, # GREEK SMALL LETTER MU
0x03bd: 0x00e7, # GREEK SMALL LETTER NU
0x03be: 0x00e8, # GREEK SMALL LETTER XI
0x03bf: 0x00e9, # GREEK SMALL LETTER OMICRON
0x03c0: 0x00ea, # GREEK SMALL LETTER PI
0x03c1: 0x00eb, # GREEK SMALL LETTER RHO
0x03c2: 0x00ed, # GREEK SMALL LETTER FINAL SIGMA
0x03c3: 0x00ec, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00ee, # GREEK SMALL LETTER TAU
0x03c5: 0x00f2, # GREEK SMALL LETTER UPSILON
0x03c6: 0x00f3, # GREEK SMALL LETTER PHI
0x03c7: 0x00f4, # GREEK SMALL LETTER CHI
0x03c8: 0x00f6, # GREEK SMALL LETTER PSI
0x03c9: 0x00fa, # GREEK SMALL LETTER OMEGA
0x03ca: 0x00a0, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x03cb: 0x00fb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x03cc: 0x00a2, # GREEK SMALL LETTER OMICRON WITH TONOS
0x03cd: 0x00a3, # GREEK SMALL LETTER UPSILON WITH TONOS
0x03ce: 0x00fd, # GREEK SMALL LETTER OMEGA WITH TONOS
0x2015: 0x008e, # HORIZONTAL BAR
0x2018: 0x008b, # LEFT SINGLE QUOTATION MARK
0x2019: 0x008c, # RIGHT SINGLE QUOTATION MARK
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
""" Python Character Mapping Codec cp874 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp874',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
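# Note: unlike the older map-based modules above, this encoder references
# encoding_table rather than encoding_map. In the generated cp874 module,
# encoding_table is derived from decoding_table at the end of the file via
# codecs.charmap_build(decoding_table).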
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u20ac' # 0x80 -> EURO SIGN
u'\ufffe' # 0x81 -> UNDEFINED
u'\ufffe' # 0x82 -> UNDEFINED
u'\ufffe' # 0x83 -> UNDEFINED
u'\ufffe' # 0x84 -> UNDEFINED
u'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
u'\ufffe' # 0x86 -> UNDEFINED
u'\ufffe' # 0x87 -> UNDEFINED
u'\ufffe' # 0x88 -> UNDEFINED
u'\ufffe' # 0x89 -> UNDEFINED
u'\ufffe' # 0x8A -> UNDEFINED
u'\ufffe' # 0x8B -> UNDEFINED
u'\ufffe' # 0x8C -> UNDEFINED
u'\ufffe' # 0x8D -> UNDEFINED
u'\ufffe' # 0x8E -> UNDEFINED
u'\ufffe' # 0x8F -> UNDEFINED
u'\ufffe' # 0x90 -> UNDEFINED
u'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
u'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
u'\u2022' # 0x95 -> BULLET
u'\u2013' # 0x96 -> EN DASH
u'\u2014' # 0x97 -> EM DASH
u'\ufffe' # 0x98 -> UNDEFINED
u'\ufffe' # 0x99 -> UNDEFINED
u'\ufffe' # 0x9A -> UNDEFINED
u'\ufffe' # 0x9B -> UNDEFINED
u'\ufffe' # 0x9C -> UNDEFINED
u'\ufffe' # 0x9D -> UNDEFINED
u'\ufffe' # 0x9E -> UNDEFINED
u'\ufffe' # 0x9F -> UNDEFINED
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
u'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
u'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
u'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
u'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
u'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
u'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
u'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
u'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
u'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
u'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
u'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
u'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
u'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
u'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
u'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
u'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
u'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
u'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
u'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
u'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
u'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
u'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
u'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
u'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
u'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
u'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
u'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
u'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
u'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
u'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
u'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
u'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
u'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
u'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
u'\u0e24' # 0xC4 -> THAI CHARACTER RU
u'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
u'\u0e26' # 0xC6 -> THAI CHARACTER LU
u'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
u'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
u'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
u'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
u'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
u'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
u'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
u'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
u'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
u'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
u'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
u'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
u'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
u'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
u'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
u'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
u'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
u'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
u'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
u'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
u'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
u'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
u'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
u'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
u'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
u'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
u'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
u'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
u'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
u'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
u'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
u'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
u'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
u'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
u'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
u'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
u'\u0e50' # 0xF0 -> THAI DIGIT ZERO
u'\u0e51' # 0xF1 -> THAI DIGIT ONE
u'\u0e52' # 0xF2 -> THAI DIGIT TWO
u'\u0e53' # 0xF3 -> THAI DIGIT THREE
u'\u0e54' # 0xF4 -> THAI DIGIT FOUR
u'\u0e55' # 0xF5 -> THAI DIGIT FIVE
u'\u0e56' # 0xF6 -> THAI DIGIT SIX
u'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
u'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
u'\u0e59' # 0xF9 -> THAI DIGIT NINE
u'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
u'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
u'\ufffe' # 0xFC -> UNDEFINED
u'\ufffe' # 0xFD -> UNDEFINED
u'\ufffe' # 0xFE -> UNDEFINED
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
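The charmap pattern above repeats for every single-byte codec in this listing: decoding_table is a 256-character string indexed by byte value, and codecs.charmap_build inverts it into the encoding table, with U+FFFE marking undefined bytes. A minimal round-trip sketch, assuming the module is registered under the name 'cp874' as in the stock encodings package:
# -*- coding: utf-8 -*-
# Sketch: round-tripping Thai text through the cp874 charmap codec.
text = u'\u0e01\u0e32'           # THAI CHARACTER KO KAI + SARA AA
raw = text.encode('cp874')       # one byte per character
assert raw == '\xa1\xd2'
assert raw.decode('cp874') == text
# Bytes mapped to U+FFFE in the decoding table are undefined, so
# strict decoding of them fails:
try:
    '\x81'.decode('cp874')       # 0x81 -> UNDEFINED
except UnicodeDecodeError:
    pass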
""" Python Character Mapping Codec cp875 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp875',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x9c' # 0x04 -> CONTROL
u'\t' # 0x05 -> HORIZONTAL TABULATION
u'\x86' # 0x06 -> CONTROL
u'\x7f' # 0x07 -> DELETE
u'\x97' # 0x08 -> CONTROL
u'\x8d' # 0x09 -> CONTROL
u'\x8e' # 0x0A -> CONTROL
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x9d' # 0x14 -> CONTROL
u'\x85' # 0x15 -> CONTROL
u'\x08' # 0x16 -> BACKSPACE
u'\x87' # 0x17 -> CONTROL
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x92' # 0x1A -> CONTROL
u'\x8f' # 0x1B -> CONTROL
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u'\x80' # 0x20 -> CONTROL
u'\x81' # 0x21 -> CONTROL
u'\x82' # 0x22 -> CONTROL
u'\x83' # 0x23 -> CONTROL
u'\x84' # 0x24 -> CONTROL
u'\n' # 0x25 -> LINE FEED
u'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
u'\x1b' # 0x27 -> ESCAPE
u'\x88' # 0x28 -> CONTROL
u'\x89' # 0x29 -> CONTROL
u'\x8a' # 0x2A -> CONTROL
u'\x8b' # 0x2B -> CONTROL
u'\x8c' # 0x2C -> CONTROL
u'\x05' # 0x2D -> ENQUIRY
u'\x06' # 0x2E -> ACKNOWLEDGE
u'\x07' # 0x2F -> BELL
u'\x90' # 0x30 -> CONTROL
u'\x91' # 0x31 -> CONTROL
u'\x16' # 0x32 -> SYNCHRONOUS IDLE
u'\x93' # 0x33 -> CONTROL
u'\x94' # 0x34 -> CONTROL
u'\x95' # 0x35 -> CONTROL
u'\x96' # 0x36 -> CONTROL
u'\x04' # 0x37 -> END OF TRANSMISSION
u'\x98' # 0x38 -> CONTROL
u'\x99' # 0x39 -> CONTROL
u'\x9a' # 0x3A -> CONTROL
u'\x9b' # 0x3B -> CONTROL
u'\x14' # 0x3C -> DEVICE CONTROL FOUR
u'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
u'\x9e' # 0x3E -> CONTROL
u'\x1a' # 0x3F -> SUBSTITUTE
u' ' # 0x40 -> SPACE
u'\u0391' # 0x41 -> GREEK CAPITAL LETTER ALPHA
u'\u0392' # 0x42 -> GREEK CAPITAL LETTER BETA
u'\u0393' # 0x43 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0x44 -> GREEK CAPITAL LETTER DELTA
u'\u0395' # 0x45 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0x46 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0x47 -> GREEK CAPITAL LETTER ETA
u'\u0398' # 0x48 -> GREEK CAPITAL LETTER THETA
u'\u0399' # 0x49 -> GREEK CAPITAL LETTER IOTA
u'[' # 0x4A -> LEFT SQUARE BRACKET
u'.' # 0x4B -> FULL STOP
u'<' # 0x4C -> LESS-THAN SIGN
u'(' # 0x4D -> LEFT PARENTHESIS
u'+' # 0x4E -> PLUS SIGN
u'!' # 0x4F -> EXCLAMATION MARK
u'&' # 0x50 -> AMPERSAND
u'\u039a' # 0x51 -> GREEK CAPITAL LETTER KAPPA
u'\u039b' # 0x52 -> GREEK CAPITAL LETTER LAMDA
u'\u039c' # 0x53 -> GREEK CAPITAL LETTER MU
u'\u039d' # 0x54 -> GREEK CAPITAL LETTER NU
u'\u039e' # 0x55 -> GREEK CAPITAL LETTER XI
u'\u039f' # 0x56 -> GREEK CAPITAL LETTER OMICRON
u'\u03a0' # 0x57 -> GREEK CAPITAL LETTER PI
u'\u03a1' # 0x58 -> GREEK CAPITAL LETTER RHO
u'\u03a3' # 0x59 -> GREEK CAPITAL LETTER SIGMA
u']' # 0x5A -> RIGHT SQUARE BRACKET
u'$' # 0x5B -> DOLLAR SIGN
u'*' # 0x5C -> ASTERISK
u')' # 0x5D -> RIGHT PARENTHESIS
u';' # 0x5E -> SEMICOLON
u'^' # 0x5F -> CIRCUMFLEX ACCENT
u'-' # 0x60 -> HYPHEN-MINUS
u'/' # 0x61 -> SOLIDUS
u'\u03a4' # 0x62 -> GREEK CAPITAL LETTER TAU
u'\u03a5' # 0x63 -> GREEK CAPITAL LETTER UPSILON
u'\u03a6' # 0x64 -> GREEK CAPITAL LETTER PHI
u'\u03a7' # 0x65 -> GREEK CAPITAL LETTER CHI
u'\u03a8' # 0x66 -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0x67 -> GREEK CAPITAL LETTER OMEGA
u'\u03aa' # 0x68 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\u03ab' # 0x69 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'|' # 0x6A -> VERTICAL LINE
u',' # 0x6B -> COMMA
u'%' # 0x6C -> PERCENT SIGN
u'_' # 0x6D -> LOW LINE
u'>' # 0x6E -> GREATER-THAN SIGN
u'?' # 0x6F -> QUESTION MARK
u'\xa8' # 0x70 -> DIAERESIS
u'\u0386' # 0x71 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\u0388' # 0x72 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u0389' # 0x73 -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\xa0' # 0x74 -> NO-BREAK SPACE
u'\u038a' # 0x75 -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\u038c' # 0x76 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\u038e' # 0x77 -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u038f' # 0x78 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'`' # 0x79 -> GRAVE ACCENT
u':' # 0x7A -> COLON
u'#' # 0x7B -> NUMBER SIGN
u'@' # 0x7C -> COMMERCIAL AT
u"'" # 0x7D -> APOSTROPHE
u'=' # 0x7E -> EQUALS SIGN
u'"' # 0x7F -> QUOTATION MARK
u'\u0385' # 0x80 -> GREEK DIALYTIKA TONOS
u'a' # 0x81 -> LATIN SMALL LETTER A
u'b' # 0x82 -> LATIN SMALL LETTER B
u'c' # 0x83 -> LATIN SMALL LETTER C
u'd' # 0x84 -> LATIN SMALL LETTER D
u'e' # 0x85 -> LATIN SMALL LETTER E
u'f' # 0x86 -> LATIN SMALL LETTER F
u'g' # 0x87 -> LATIN SMALL LETTER G
u'h' # 0x88 -> LATIN SMALL LETTER H
u'i' # 0x89 -> LATIN SMALL LETTER I
u'\u03b1' # 0x8A -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0x8B -> GREEK SMALL LETTER BETA
u'\u03b3' # 0x8C -> GREEK SMALL LETTER GAMMA
u'\u03b4' # 0x8D -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0x8E -> GREEK SMALL LETTER EPSILON
u'\u03b6' # 0x8F -> GREEK SMALL LETTER ZETA
u'\xb0' # 0x90 -> DEGREE SIGN
u'j' # 0x91 -> LATIN SMALL LETTER J
u'k' # 0x92 -> LATIN SMALL LETTER K
u'l' # 0x93 -> LATIN SMALL LETTER L
u'm' # 0x94 -> LATIN SMALL LETTER M
u'n' # 0x95 -> LATIN SMALL LETTER N
u'o' # 0x96 -> LATIN SMALL LETTER O
u'p' # 0x97 -> LATIN SMALL LETTER P
u'q' # 0x98 -> LATIN SMALL LETTER Q
u'r' # 0x99 -> LATIN SMALL LETTER R
u'\u03b7' # 0x9A -> GREEK SMALL LETTER ETA
u'\u03b8' # 0x9B -> GREEK SMALL LETTER THETA
u'\u03b9' # 0x9C -> GREEK SMALL LETTER IOTA
u'\u03ba' # 0x9D -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0x9E -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0x9F -> GREEK SMALL LETTER MU
u'\xb4' # 0xA0 -> ACUTE ACCENT
u'~' # 0xA1 -> TILDE
u's' # 0xA2 -> LATIN SMALL LETTER S
u't' # 0xA3 -> LATIN SMALL LETTER T
u'u' # 0xA4 -> LATIN SMALL LETTER U
u'v' # 0xA5 -> LATIN SMALL LETTER V
u'w' # 0xA6 -> LATIN SMALL LETTER W
u'x' # 0xA7 -> LATIN SMALL LETTER X
u'y' # 0xA8 -> LATIN SMALL LETTER Y
u'z' # 0xA9 -> LATIN SMALL LETTER Z
u'\u03bd' # 0xAA -> GREEK SMALL LETTER NU
u'\u03be' # 0xAB -> GREEK SMALL LETTER XI
u'\u03bf' # 0xAC -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0xAD -> GREEK SMALL LETTER PI
u'\u03c1' # 0xAE -> GREEK SMALL LETTER RHO
u'\u03c3' # 0xAF -> GREEK SMALL LETTER SIGMA
u'\xa3' # 0xB0 -> POUND SIGN
u'\u03ac' # 0xB1 -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\u03ad' # 0xB2 -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0xB3 -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03ca' # 0xB4 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u03af' # 0xB5 -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03cc' # 0xB6 -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u03cd' # 0xB7 -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u03cb' # 0xB8 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u03ce' # 0xB9 -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\u03c2' # 0xBA -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c4' # 0xBB -> GREEK SMALL LETTER TAU
u'\u03c5' # 0xBC -> GREEK SMALL LETTER UPSILON
u'\u03c6' # 0xBD -> GREEK SMALL LETTER PHI
u'\u03c7' # 0xBE -> GREEK SMALL LETTER CHI
u'\u03c8' # 0xBF -> GREEK SMALL LETTER PSI
u'{' # 0xC0 -> LEFT CURLY BRACKET
u'A' # 0xC1 -> LATIN CAPITAL LETTER A
u'B' # 0xC2 -> LATIN CAPITAL LETTER B
u'C' # 0xC3 -> LATIN CAPITAL LETTER C
u'D' # 0xC4 -> LATIN CAPITAL LETTER D
u'E' # 0xC5 -> LATIN CAPITAL LETTER E
u'F' # 0xC6 -> LATIN CAPITAL LETTER F
u'G' # 0xC7 -> LATIN CAPITAL LETTER G
u'H' # 0xC8 -> LATIN CAPITAL LETTER H
u'I' # 0xC9 -> LATIN CAPITAL LETTER I
u'\xad' # 0xCA -> SOFT HYPHEN
u'\u03c9' # 0xCB -> GREEK SMALL LETTER OMEGA
u'\u0390' # 0xCC -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
u'\u03b0' # 0xCD -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
u'\u2018' # 0xCE -> LEFT SINGLE QUOTATION MARK
u'\u2015' # 0xCF -> HORIZONTAL BAR
u'}' # 0xD0 -> RIGHT CURLY BRACKET
u'J' # 0xD1 -> LATIN CAPITAL LETTER J
u'K' # 0xD2 -> LATIN CAPITAL LETTER K
u'L' # 0xD3 -> LATIN CAPITAL LETTER L
u'M' # 0xD4 -> LATIN CAPITAL LETTER M
u'N' # 0xD5 -> LATIN CAPITAL LETTER N
u'O' # 0xD6 -> LATIN CAPITAL LETTER O
u'P' # 0xD7 -> LATIN CAPITAL LETTER P
u'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
u'R' # 0xD9 -> LATIN CAPITAL LETTER R
u'\xb1' # 0xDA -> PLUS-MINUS SIGN
u'\xbd' # 0xDB -> VULGAR FRACTION ONE HALF
u'\x1a' # 0xDC -> SUBSTITUTE
u'\u0387' # 0xDD -> GREEK ANO TELEIA
u'\u2019' # 0xDE -> RIGHT SINGLE QUOTATION MARK
u'\xa6' # 0xDF -> BROKEN BAR
u'\\' # 0xE0 -> REVERSE SOLIDUS
u'\x1a' # 0xE1 -> SUBSTITUTE
u'S' # 0xE2 -> LATIN CAPITAL LETTER S
u'T' # 0xE3 -> LATIN CAPITAL LETTER T
u'U' # 0xE4 -> LATIN CAPITAL LETTER U
u'V' # 0xE5 -> LATIN CAPITAL LETTER V
u'W' # 0xE6 -> LATIN CAPITAL LETTER W
u'X' # 0xE7 -> LATIN CAPITAL LETTER X
u'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
u'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
u'\xb2' # 0xEA -> SUPERSCRIPT TWO
u'\xa7' # 0xEB -> SECTION SIGN
u'\x1a' # 0xEC -> SUBSTITUTE
u'\x1a' # 0xED -> SUBSTITUTE
u'\xab' # 0xEE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xEF -> NOT SIGN
u'0' # 0xF0 -> DIGIT ZERO
u'1' # 0xF1 -> DIGIT ONE
u'2' # 0xF2 -> DIGIT TWO
u'3' # 0xF3 -> DIGIT THREE
u'4' # 0xF4 -> DIGIT FOUR
u'5' # 0xF5 -> DIGIT FIVE
u'6' # 0xF6 -> DIGIT SIX
u'7' # 0xF7 -> DIGIT SEVEN
u'8' # 0xF8 -> DIGIT EIGHT
u'9' # 0xF9 -> DIGIT NINE
u'\xb3' # 0xFA -> SUPERSCRIPT THREE
u'\xa9' # 0xFB -> COPYRIGHT SIGN
u'\x1a' # 0xFC -> SUBSTITUTE
u'\x1a' # 0xFD -> SUBSTITUTE
u'\xbb' # 0xFE -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
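cp875 is an EBCDIC code page, so even plain ASCII letters land on different byte values than in the ASCII-compatible codecs elsewhere in this listing. A small sketch, assuming the stock registration under 'cp875':
# Sketch: EBCDIC byte layout differs from ASCII even for plain letters.
assert u'A'.encode('cp875') == '\xc1'   # 0xC1 -> LATIN CAPITAL LETTER A
assert u'0'.encode('cp875') == '\xf0'   # digits start at 0xF0
assert '\x40'.decode('cp875') == u' '   # 0x40 is SPACE in EBCDIC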
""" Python 'hex_codec' Codec - 2-digit hex content transfer encoding
Unlike most of the other codecs which target Unicode, this codec
will return Python string objects for both encode and decode.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs, binascii
### Codec APIs
def hex_encode(input,errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = binascii.b2a_hex(input)
return (output, len(input))
def hex_decode(input,errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = binascii.a2b_hex(input)
return (output, len(input))
class Codec(codecs.Codec):
def encode(self, input,errors='strict'):
return hex_encode(input,errors)
def decode(self, input,errors='strict'):
return hex_decode(input,errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
assert self.errors == 'strict'
return binascii.b2a_hex(input)
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
assert self.errors == 'strict'
return binascii.a2b_hex(input)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='hex',
encode=hex_encode,
decode=hex_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
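Unlike the charmap codecs, the hex codec is a bytes-to-bytes transform (_is_text_encoding=False), so in Python 2 both directions accept and return str objects. A quick sketch, using the registered name 'hex':
import binascii  # needed only for the error type caught below

# Sketch: the hex codec transforms str -> str in both directions.
assert 'hi'.encode('hex') == '6869'
assert '6869'.decode('hex') == 'hi'
# Only 'strict' error handling is supported; odd-length or non-hex
# input fails, and the exact exception type varies between Python
# versions, so a sketch catches broadly:
try:
    '686'.decode('hex')
except (TypeError, binascii.Error):
    pass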
""" Python Character Mapping Codec generated from 'hp_roman8.txt' with gencodec.py.
Based on data from ftp://dkuug.dk/i18n/charmaps/HP-ROMAN8 (Keld Simonsen)
Original source: LaserJet IIP Printer User's Manual HP part no
33471-90901, Hewlett-Packard, June 1989.

"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_map)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_map)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='hp-roman8',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x00a1: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00a2: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00a3: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00a4: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00a5: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00a6: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00a7: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00a8: 0x00b4, # ACUTE ACCENT
0x00a9: 0x02cb, # MODIFIER LETTER GRAVE ACCENT (Mandarin Chinese fourth tone)
0x00aa: 0x02c6, # MODIFIER LETTER CIRCUMFLEX ACCENT
0x00ab: 0x00a8, # DIAERESIS
0x00ac: 0x02dc, # SMALL TILDE
0x00ad: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ae: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00af: 0x20a4, # LIRA SIGN
0x00b0: 0x00af, # MACRON
0x00b1: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00b2: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00b3: 0x00b0, # DEGREE SIGN
0x00b4: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00b5: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x00b6: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00b7: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00b8: 0x00a1, # INVERTED EXCLAMATION MARK
0x00b9: 0x00bf, # INVERTED QUESTION MARK
0x00ba: 0x00a4, # CURRENCY SIGN
0x00bb: 0x00a3, # POUND SIGN
0x00bc: 0x00a5, # YEN SIGN
0x00bd: 0x00a7, # SECTION SIGN
0x00be: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00bf: 0x00a2, # CENT SIGN
0x00c0: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00c1: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00c2: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00c3: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00c4: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00c5: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x00c6: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00c7: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00c8: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x00c9: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x00ca: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x00cb: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x00cc: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x00cd: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ce: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x00cf: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x00d0: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00d1: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00d2: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x00d3: 0x00c6, # LATIN CAPITAL LETTER AE
0x00d4: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x00d5: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00d6: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x00d7: 0x00e6, # LATIN SMALL LETTER AE
0x00d8: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00d9: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x00da: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00db: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dc: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x00dd: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x00de: 0x00df, # LATIN SMALL LETTER SHARP S (German)
0x00df: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e0: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00e1: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00e2: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00e3: 0x00d0, # LATIN CAPITAL LETTER ETH (Icelandic)
0x00e4: 0x00f0, # LATIN SMALL LETTER ETH (Icelandic)
0x00e5: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00e6: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00e7: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e8: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e9: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00ea: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00eb: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00ec: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00ed: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ee: 0x0178, # LATIN CAPITAL LETTER Y WITH DIAERESIS
0x00ef: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x00f0: 0x00de, # LATIN CAPITAL LETTER THORN (Icelandic)
0x00f1: 0x00fe, # LATIN SMALL LETTER THORN (Icelandic)
0x00f2: 0x00b7, # MIDDLE DOT
0x00f3: 0x00b5, # MICRO SIGN
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f6: 0x2014, # EM DASH
0x00f7: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00f8: 0x00bd, # VULGAR FRACTION ONE HALF
0x00f9: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00fa: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00fb: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00fc: 0x25a0, # BLACK SQUARE
0x00fd: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00fe: 0x00b1, # PLUS-MINUS SIGN
0x00ff: None,
})
### Encoding Map
encoding_map = codecs.make_encoding_map(decoding_map)
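hp-roman8 uses dict-based maps rather than the 256-character table string: make_identity_dict seeds a byte-to-codepoint identity map, update overrides the high half, and a value of None (as for 0xFF above) marks a byte with no Unicode equivalent. A sketch of what that implies, assuming the stock registration under 'hp-roman8':
# Sketch: dict-based charmap with an identity base and explicit holes.
assert u'\xc0'.encode('hp-roman8') == '\xa1'   # A WITH GRAVE -> 0xA1
assert '\xa1'.decode('hp-roman8') == u'\xc0'
# 0xFF maps to None, i.e. undefined, so strict decoding fails:
try:
    '\xff'.decode('hp-roman8')
except UnicodeDecodeError:
    pass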
# This module implements RFC 3490 (IDNA) and RFC 3491 (Nameprep)
import stringprep, re, codecs
from unicodedata import ucd_3_2_0 as unicodedata
# IDNA section 3.1
dots = re.compile(u"[\u002E\u3002\uFF0E\uFF61]")
# IDNA section 5
ace_prefix = "xn--"
uace_prefix = unicode(ace_prefix, "ascii")
# This assumes query strings, so AllowUnassigned is true
def nameprep(label):
# Map
newlabel = []
for c in label:
if stringprep.in_table_b1(c):
# Map to nothing
continue
newlabel.append(stringprep.map_table_b2(c))
label = u"".join(newlabel)
# Normalize
label = unicodedata.normalize("NFKC", label)
# Prohibit
for c in label:
if stringprep.in_table_c12(c) or \
stringprep.in_table_c22(c) or \
stringprep.in_table_c3(c) or \
stringprep.in_table_c4(c) or \
stringprep.in_table_c5(c) or \
stringprep.in_table_c6(c) or \
stringprep.in_table_c7(c) or \
stringprep.in_table_c8(c) or \
stringprep.in_table_c9(c):
raise UnicodeError("Invalid character %r" % c)
# Check bidi
RandAL = map(stringprep.in_table_d1, label)
for c in RandAL:
if c:
# There is a RandAL char in the string. Must perform further
# tests:
# 1) The characters in section 5.8 MUST be prohibited.
# This is table C.8, which was already checked
# 2) If a string contains any RandALCat character, the string
# MUST NOT contain any LCat character.
if filter(stringprep.in_table_d2, label):
raise UnicodeError("Violation of BIDI requirement 2")
# 3) If a string contains any RandALCat character, a
# RandALCat character MUST be the first character of the
# string, and a RandALCat character MUST be the last
# character of the string.
if not RandAL[0] or not RandAL[-1]:
raise UnicodeError("Violation of BIDI requirement 3")
return label
def ToASCII(label):
try:
# Step 1: try ASCII
label = label.encode("ascii")
except UnicodeError:
pass
else:
# Skip to step 3: UseSTD3ASCIIRules is false, so
# Skip to step 8.
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
# Step 2: nameprep
label = nameprep(label)
# Step 3: UseSTD3ASCIIRules is false
# Step 4: try ASCII
try:
label = label.encode("ascii")
except UnicodeError:
pass
else:
# Skip to step 8.
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
# Step 5: Check ACE prefix
if label.startswith(uace_prefix):
raise UnicodeError("Label starts with ACE prefix")
# Step 6: Encode with PUNYCODE
label = label.encode("punycode")
# Step 7: Prepend ACE prefix
label = ace_prefix + label
# Step 8: Check size
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
def ToUnicode(label):
# Step 1: Check for ASCII
if isinstance(label, str):
pure_ascii = True
else:
try:
label = label.encode("ascii")
pure_ascii = True
except UnicodeError:
pure_ascii = False
if not pure_ascii:
# Step 2: Perform nameprep
label = nameprep(label)
# It doesn't say this, but apparently, it should be ASCII now
try:
label = label.encode("ascii")
except UnicodeError:
raise UnicodeError("Invalid character in IDN label")
# Step 3: Check for ACE prefix
if not label.startswith(ace_prefix):
return unicode(label, "ascii")
# Step 4: Remove ACE prefix
label1 = label[len(ace_prefix):]
# Step 5: Decode using PUNYCODE
result = label1.decode("punycode")
# Step 6: Apply ToASCII
label2 = ToASCII(result)
# Step 7: Compare the result of step 6 with the one of step 3
# label2 will already be in lower case.
if label.lower() != label2:
raise UnicodeError("IDNA does not round-trip", label, label2)
# Step 8: return the result of step 5
return result
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
if errors != 'strict':
# IDNA is quite clear that implementations must be strict
raise UnicodeError("unsupported error handling "+errors)
if not input:
return "", 0
result = []
labels = dots.split(input)
if labels and len(labels[-1])==0:
trailing_dot = '.'
del labels[-1]
else:
trailing_dot = ''
for label in labels:
result.append(ToASCII(label))
# Join with U+002E
return ".".join(result)+trailing_dot, len(input)
def decode(self,input,errors='strict'):
if errors != 'strict':
raise UnicodeError("Unsupported error handling "+errors)
if not input:
return u"", 0
# IDNA allows decoding to operate on Unicode strings, too.
if isinstance(input, unicode):
labels = dots.split(input)
else:
# Must be ASCII string
input = str(input)
unicode(input, "ascii")
labels = input.split(".")
if labels and len(labels[-1]) == 0:
trailing_dot = u'.'
del labels[-1]
else:
trailing_dot = u''
result = []
for label in labels:
result.append(ToUnicode(label))
return u".".join(result)+trailing_dot, len(input)
class IncrementalEncoder(codecs.BufferedIncrementalEncoder):
def _buffer_encode(self, input, errors, final):
if errors != 'strict':
# IDNA is quite clear that implementations must be strict
raise UnicodeError("unsupported error handling "+errors)
if not input:
return ("", 0)
labels = dots.split(input)
trailing_dot = u''
if labels:
if not labels[-1]:
trailing_dot = '.'
del labels[-1]
elif not final:
# Keep potentially unfinished label until the next call
del labels[-1]
if labels:
trailing_dot = '.'
result = []
size = 0
for label in labels:
result.append(ToASCII(label))
if size:
size += 1
size += len(label)
# Join with U+002E
result = ".".join(result) + trailing_dot
size += len(trailing_dot)
return (result, size)
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def _buffer_decode(self, input, errors, final):
if errors != 'strict':
raise UnicodeError("Unsupported error handling "+errors)
if not input:
return (u"", 0)
# IDNA allows decoding to operate on Unicode strings, too.
if isinstance(input, unicode):
labels = dots.split(input)
else:
# Must be ASCII string
input = str(input)
unicode(input, "ascii")
labels = input.split(".")
trailing_dot = u''
if labels:
if not labels[-1]:
trailing_dot = u'.'
del labels[-1]
elif not final:
# Keep potentially unfinished label until the next call
del labels[-1]
if labels:
trailing_dot = u'.'
result = []
size = 0
for label in labels:
result.append(ToUnicode(label))
if size:
size += 1
size += len(label)
result = u".".join(result) + trailing_dot
size += len(trailing_dot)
return (result, size)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='idna',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
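The idna codec splits input on the four IDNA dot characters, runs each label through nameprep, and Punycode-encodes any label that is not pure ASCII, prefixing it with the ACE prefix xn--. A round-trip sketch using the conventional bücher example:
# -*- coding: utf-8 -*-
# Sketch: IDNA encoding Punycode-encodes non-ASCII labels only.
domain = u'b\xfccher.example'                # "bücher.example"
ace = domain.encode('idna')
assert ace == 'xn--bcher-kva.example'        # ASCII-compatible encoding
assert ace.decode('idna') == domain          # ToUnicode round-trips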
""" Python Character Mapping Codec iso8859_1 generated from 'MAPPINGS/ISO8859/8859-1.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-1',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH (Icelandic)
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN (Icelandic)
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH (Icelandic)
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0xFE -> LATIN SMALL LETTER THORN (Icelandic)
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
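iso8859-1 is the one charmap table here that is a pure identity: byte value N always decodes to code point U+00N, so every 8-bit string decodes without error and round-trips losslessly. A sketch:
# Sketch: latin-1 decodes every byte value to the same code point.
all_bytes = ''.join(chr(i) for i in range(256))
decoded = all_bytes.decode('iso8859-1')
assert all(ord(c) == i for i, c in enumerate(decoded))
assert decoded.encode('iso8859-1') == all_bytes   # lossless round trip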
""" Python Character Mapping Codec iso8859_10 generated from 'MAPPINGS/ISO8859/8859-10.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-10',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u0112' # 0xA2 -> LATIN CAPITAL LETTER E WITH MACRON
u'\u0122' # 0xA3 -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\u012a' # 0xA4 -> LATIN CAPITAL LETTER I WITH MACRON
u'\u0128' # 0xA5 -> LATIN CAPITAL LETTER I WITH TILDE
u'\u0136' # 0xA6 -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\xa7' # 0xA7 -> SECTION SIGN
u'\u013b' # 0xA8 -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\u0110' # 0xA9 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u0160' # 0xAA -> LATIN CAPITAL LETTER S WITH CARON
u'\u0166' # 0xAB -> LATIN CAPITAL LETTER T WITH STROKE
u'\u017d' # 0xAC -> LATIN CAPITAL LETTER Z WITH CARON
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u016a' # 0xAE -> LATIN CAPITAL LETTER U WITH MACRON
u'\u014a' # 0xAF -> LATIN CAPITAL LETTER ENG
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
u'\u0113' # 0xB2 -> LATIN SMALL LETTER E WITH MACRON
u'\u0123' # 0xB3 -> LATIN SMALL LETTER G WITH CEDILLA
u'\u012b' # 0xB4 -> LATIN SMALL LETTER I WITH MACRON
u'\u0129' # 0xB5 -> LATIN SMALL LETTER I WITH TILDE
u'\u0137' # 0xB6 -> LATIN SMALL LETTER K WITH CEDILLA
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u013c' # 0xB8 -> LATIN SMALL LETTER L WITH CEDILLA
u'\u0111' # 0xB9 -> LATIN SMALL LETTER D WITH STROKE
u'\u0161' # 0xBA -> LATIN SMALL LETTER S WITH CARON
u'\u0167' # 0xBB -> LATIN SMALL LETTER T WITH STROKE
u'\u017e' # 0xBC -> LATIN SMALL LETTER Z WITH CARON
u'\u2015' # 0xBD -> HORIZONTAL BAR
u'\u016b' # 0xBE -> LATIN SMALL LETTER U WITH MACRON
u'\u014b' # 0xBF -> LATIN SMALL LETTER ENG
u'\u0100' # 0xC0 -> LATIN CAPITAL LETTER A WITH MACRON
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\u012e' # 0xC7 -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u0116' # 0xCC -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH (Icelandic)
u'\u0145' # 0xD1 -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\u014c' # 0xD2 -> LATIN CAPITAL LETTER O WITH MACRON
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\u0168' # 0xD7 -> LATIN CAPITAL LETTER U WITH TILDE
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\u0172' # 0xD9 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN (Icelandic)
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
u'\u0101' # 0xE0 -> LATIN SMALL LETTER A WITH MACRON
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\u012f' # 0xE7 -> LATIN SMALL LETTER I WITH OGONEK
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u0117' # 0xEC -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH (Icelandic)
u'\u0146' # 0xF1 -> LATIN SMALL LETTER N WITH CEDILLA
u'\u014d' # 0xF2 -> LATIN SMALL LETTER O WITH MACRON
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\u0169' # 0xF7 -> LATIN SMALL LETTER U WITH TILDE
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\u0173' # 0xF9 -> LATIN SMALL LETTER U WITH OGONEK
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0xFE -> LATIN SMALL LETTER THORN (Icelandic)
u'\u0138' # 0xFF -> LATIN SMALL LETTER KRA
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
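iso8859-10 (Latin-6, Nordic) replaces much of the Latin-1 high half with macron and ogonek letters and omits symbols such as the euro sign entirely, so the choice of error handler matters when encoding. A sketch, assuming the stock registration under 'iso8859-10':
# -*- coding: utf-8 -*-
# Sketch: characters outside the table trigger the chosen error handler.
assert u'\u0100'.encode('iso8859-10') == '\xc0'   # A WITH MACRON -> 0xC0
try:
    u'\u20ac'.encode('iso8859-10')                # EURO SIGN: not mapped
except UnicodeEncodeError:
    pass
assert u'\u20ac'.encode('iso8859-10', 'replace') == '?'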
""" Python Character Mapping Codec iso8859_11 generated from 'MAPPINGS/ISO8859/8859-11.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-11',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
u'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
u'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
u'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
u'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
u'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
u'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
u'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
u'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
u'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
u'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
u'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
u'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
u'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
u'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
u'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
u'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
u'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
u'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
u'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
u'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
u'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
u'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
u'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
u'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
u'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
u'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
u'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
u'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
u'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
u'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
u'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
u'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
u'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
u'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
u'\u0e24' # 0xC4 -> THAI CHARACTER RU
u'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
u'\u0e26' # 0xC6 -> THAI CHARACTER LU
u'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
u'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
u'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
u'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
u'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
u'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
u'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
u'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
u'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
u'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
u'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
u'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
u'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
u'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
u'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
u'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
u'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
u'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
u'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
u'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
u'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
u'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
u'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
u'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
u'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
u'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
u'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
u'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
u'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
u'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
u'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
u'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
u'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
u'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
u'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
u'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
u'\u0e50' # 0xF0 -> THAI DIGIT ZERO
u'\u0e51' # 0xF1 -> THAI DIGIT ONE
u'\u0e52' # 0xF2 -> THAI DIGIT TWO
u'\u0e53' # 0xF3 -> THAI DIGIT THREE
u'\u0e54' # 0xF4 -> THAI DIGIT FOUR
u'\u0e55' # 0xF5 -> THAI DIGIT FIVE
u'\u0e56' # 0xF6 -> THAI DIGIT SIX
u'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
u'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
u'\u0e59' # 0xF9 -> THAI DIGIT NINE
u'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
u'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
u'\ufffe' # 0xFC -> UNDEFINED
u'\ufffe' # 0xFD -> UNDEFINED
u'\ufffe' # 0xFE -> UNDEFINED
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
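### Editor's usage sketch -- an illustration, not part of the generated
### module; assumes Python 2 with the tables above in scope. Byte 0xA1
### decodes to the first Thai consonant, while 0xDB-0xDE map to U+FFFE,
### which charmap_decode treats as undefined and rejects under 'strict':
assert codecs.charmap_decode('\xa1', 'strict', decoding_table) == (u'\u0e01', 1)
try:
    codecs.charmap_decode('\xdb', 'strict', decoding_table)
except UnicodeDecodeError:
    pass  # expected: 0xDB is unassigned in ISO 8859-11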
""" Python Character Mapping Codec iso8859_13 generated from 'MAPPINGS/ISO8859/8859-13.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-13',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u201d' # 0xA1 -> RIGHT DOUBLE QUOTATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\u201e' # 0xA5 -> DOUBLE LOW-9 QUOTATION MARK
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xd8' # 0xA8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u0156' # 0xAA -> LATIN CAPITAL LETTER R WITH CEDILLA
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xc6' # 0xAF -> LATIN CAPITAL LETTER AE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\u201c' # 0xB4 -> LEFT DOUBLE QUOTATION MARK
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xf8' # 0xB8 -> LATIN SMALL LETTER O WITH STROKE
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\u0157' # 0xBA -> LATIN SMALL LETTER R WITH CEDILLA
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xe6' # 0xBF -> LATIN SMALL LETTER AE
u'\u0104' # 0xC0 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u012e' # 0xC1 -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u0100' # 0xC2 -> LATIN CAPITAL LETTER A WITH MACRON
u'\u0106' # 0xC3 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\u0118' # 0xC6 -> LATIN CAPITAL LETTER E WITH OGONEK
u'\u0112' # 0xC7 -> LATIN CAPITAL LETTER E WITH MACRON
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0179' # 0xCA -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\u0116' # 0xCB -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\u0122' # 0xCC -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\u0136' # 0xCD -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\u012a' # 0xCE -> LATIN CAPITAL LETTER I WITH MACRON
u'\u013b' # 0xCF -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\u0160' # 0xD0 -> LATIN CAPITAL LETTER S WITH CARON
u'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\u0145' # 0xD2 -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\u014c' # 0xD4 -> LATIN CAPITAL LETTER O WITH MACRON
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u0172' # 0xD8 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\u0141' # 0xD9 -> LATIN CAPITAL LETTER L WITH STROKE
u'\u015a' # 0xDA -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u016a' # 0xDB -> LATIN CAPITAL LETTER U WITH MACRON
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u017b' # 0xDD -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\u017d' # 0xDE -> LATIN CAPITAL LETTER Z WITH CARON
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
u'\u0105' # 0xE0 -> LATIN SMALL LETTER A WITH OGONEK
u'\u012f' # 0xE1 -> LATIN SMALL LETTER I WITH OGONEK
u'\u0101' # 0xE2 -> LATIN SMALL LETTER A WITH MACRON
u'\u0107' # 0xE3 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\u0119' # 0xE6 -> LATIN SMALL LETTER E WITH OGONEK
u'\u0113' # 0xE7 -> LATIN SMALL LETTER E WITH MACRON
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u017a' # 0xEA -> LATIN SMALL LETTER Z WITH ACUTE
u'\u0117' # 0xEB -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\u0123' # 0xEC -> LATIN SMALL LETTER G WITH CEDILLA
u'\u0137' # 0xED -> LATIN SMALL LETTER K WITH CEDILLA
u'\u012b' # 0xEE -> LATIN SMALL LETTER I WITH MACRON
u'\u013c' # 0xEF -> LATIN SMALL LETTER L WITH CEDILLA
u'\u0161' # 0xF0 -> LATIN SMALL LETTER S WITH CARON
u'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0146' # 0xF2 -> LATIN SMALL LETTER N WITH CEDILLA
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\u014d' # 0xF4 -> LATIN SMALL LETTER O WITH MACRON
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u0173' # 0xF8 -> LATIN SMALL LETTER U WITH OGONEK
u'\u0142' # 0xF9 -> LATIN SMALL LETTER L WITH STROKE
u'\u015b' # 0xFA -> LATIN SMALL LETTER S WITH ACUTE
u'\u016b' # 0xFB -> LATIN SMALL LETTER U WITH MACRON
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u017e' # 0xFE -> LATIN SMALL LETTER Z WITH CARON
u'\u2019' # 0xFF -> RIGHT SINGLE QUOTATION MARK
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
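### Editor's usage sketch -- an illustration, not part of the generated
### module; assumes Python 2 with the tables above in scope. Unusually
### for the 8859 family, the Baltic-Rim set spends 0xFF on a punctuation
### character, RIGHT SINGLE QUOTATION MARK:
assert codecs.charmap_decode('\xff', 'strict', decoding_table) == (u'\u2019', 1)
assert codecs.charmap_encode(u'\u2019', 'strict', encoding_table) == ('\xff', 1)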
""" Python Character Mapping Codec iso8859_14 generated from 'MAPPINGS/ISO8859/8859-14.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-14',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u1e02' # 0xA1 -> LATIN CAPITAL LETTER B WITH DOT ABOVE
u'\u1e03' # 0xA2 -> LATIN SMALL LETTER B WITH DOT ABOVE
u'\xa3' # 0xA3 -> POUND SIGN
u'\u010a' # 0xA4 -> LATIN CAPITAL LETTER C WITH DOT ABOVE
u'\u010b' # 0xA5 -> LATIN SMALL LETTER C WITH DOT ABOVE
u'\u1e0a' # 0xA6 -> LATIN CAPITAL LETTER D WITH DOT ABOVE
u'\xa7' # 0xA7 -> SECTION SIGN
u'\u1e80' # 0xA8 -> LATIN CAPITAL LETTER W WITH GRAVE
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u1e82' # 0xAA -> LATIN CAPITAL LETTER W WITH ACUTE
u'\u1e0b' # 0xAB -> LATIN SMALL LETTER D WITH DOT ABOVE
u'\u1ef2' # 0xAC -> LATIN CAPITAL LETTER Y WITH GRAVE
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\u0178' # 0xAF -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u1e1e' # 0xB0 -> LATIN CAPITAL LETTER F WITH DOT ABOVE
u'\u1e1f' # 0xB1 -> LATIN SMALL LETTER F WITH DOT ABOVE
u'\u0120' # 0xB2 -> LATIN CAPITAL LETTER G WITH DOT ABOVE
u'\u0121' # 0xB3 -> LATIN SMALL LETTER G WITH DOT ABOVE
u'\u1e40' # 0xB4 -> LATIN CAPITAL LETTER M WITH DOT ABOVE
u'\u1e41' # 0xB5 -> LATIN SMALL LETTER M WITH DOT ABOVE
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\u1e56' # 0xB7 -> LATIN CAPITAL LETTER P WITH DOT ABOVE
u'\u1e81' # 0xB8 -> LATIN SMALL LETTER W WITH GRAVE
u'\u1e57' # 0xB9 -> LATIN SMALL LETTER P WITH DOT ABOVE
u'\u1e83' # 0xBA -> LATIN SMALL LETTER W WITH ACUTE
u'\u1e60' # 0xBB -> LATIN CAPITAL LETTER S WITH DOT ABOVE
u'\u1ef3' # 0xBC -> LATIN SMALL LETTER Y WITH GRAVE
u'\u1e84' # 0xBD -> LATIN CAPITAL LETTER W WITH DIAERESIS
u'\u1e85' # 0xBE -> LATIN SMALL LETTER W WITH DIAERESIS
u'\u1e61' # 0xBF -> LATIN SMALL LETTER S WITH DOT ABOVE
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u0174' # 0xD0 -> LATIN CAPITAL LETTER W WITH CIRCUMFLEX
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\u1e6a' # 0xD7 -> LATIN CAPITAL LETTER T WITH DOT ABOVE
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\u0176' # 0xDE -> LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u0175' # 0xF0 -> LATIN SMALL LETTER W WITH CIRCUMFLEX
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\u1e6b' # 0xF7 -> LATIN SMALL LETTER T WITH DOT ABOVE
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\u0177' # 0xFE -> LATIN SMALL LETTER Y WITH CIRCUMFLEX
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
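### Editor's usage sketch -- an illustration, not part of the generated
### module; assumes Python 2 with the classes and tables above in scope.
### The Celtic set repurposes 0xD0 (ETH in Latin-1) for the Welsh letter
### W WITH CIRCUMFLEX; the incremental decoder needs no state between
### calls because every byte is a complete character:
inc = IncrementalDecoder('strict')
assert inc.decode('\xd0') == u'\u0174'
assert codecs.charmap_encode(u'\u0174', 'strict', encoding_table) == ('\xd0', 1)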
""" Python Character Mapping Codec iso8859_15 generated from 'MAPPINGS/ISO8859/8859-15.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-15',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\u20ac' # 0xA4 -> EURO SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\u0160' # 0xA6 -> LATIN CAPITAL LETTER S WITH CARON
u'\xa7' # 0xA7 -> SECTION SIGN
u'\u0161' # 0xA8 -> LATIN SMALL LETTER S WITH CARON
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\u017d' # 0xB4 -> LATIN CAPITAL LETTER Z WITH CARON
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u017e' # 0xB8 -> LATIN SMALL LETTER Z WITH CARON
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u0152' # 0xBC -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xBD -> LATIN SMALL LIGATURE OE
u'\u0178' # 0xBE -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\xfe' # 0xFE -> LATIN SMALL LETTER THORN
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
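### Editor's usage sketch -- an illustration, not part of the generated
### module; assumes Python 2 with the tables above in scope. The best
### known Latin-9 change relative to Latin-1 is at 0xA4, where the
### currency sign gives way to the euro sign:
assert codecs.charmap_decode('\xa4', 'strict', decoding_table) == (u'\u20ac', 1)
assert codecs.charmap_encode(u'\u20ac', 'strict', encoding_table) == ('\xa4', 1)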
""" Python Character Mapping Codec iso8859_16 generated from 'MAPPINGS/ISO8859/8859-16.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-16',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u0105' # 0xA2 -> LATIN SMALL LETTER A WITH OGONEK
u'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
u'\u20ac' # 0xA4 -> EURO SIGN
u'\u201e' # 0xA5 -> DOUBLE LOW-9 QUOTATION MARK
u'\u0160' # 0xA6 -> LATIN CAPITAL LETTER S WITH CARON
u'\xa7' # 0xA7 -> SECTION SIGN
u'\u0161' # 0xA8 -> LATIN SMALL LETTER S WITH CARON
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u0218' # 0xAA -> LATIN CAPITAL LETTER S WITH COMMA BELOW
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u0179' # 0xAC -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u017a' # 0xAE -> LATIN SMALL LETTER Z WITH ACUTE
u'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u010c' # 0xB2 -> LATIN CAPITAL LETTER C WITH CARON
u'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
u'\u017d' # 0xB4 -> LATIN CAPITAL LETTER Z WITH CARON
u'\u201d' # 0xB5 -> RIGHT DOUBLE QUOTATION MARK
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u017e' # 0xB8 -> LATIN SMALL LETTER Z WITH CARON
u'\u010d' # 0xB9 -> LATIN SMALL LETTER C WITH CARON
u'\u0219' # 0xBA -> LATIN SMALL LETTER S WITH COMMA BELOW
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u0152' # 0xBC -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xBD -> LATIN SMALL LIGATURE OE
u'\u0178' # 0xBE -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u0106' # 0xC5 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\u015a' # 0xD7 -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u0170' # 0xD8 -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0118' # 0xDD -> LATIN CAPITAL LETTER E WITH OGONEK
u'\u021a' # 0xDE -> LATIN CAPITAL LETTER T WITH COMMA BELOW
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u0107' # 0xE5 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\u015b' # 0xF7 -> LATIN SMALL LETTER S WITH ACUTE
u'\u0171' # 0xF8 -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u0119' # 0xFD -> LATIN SMALL LETTER E WITH OGONEK
u'\u021b' # 0xFE -> LATIN SMALL LETTER T WITH COMMA BELOW
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
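### Editor's usage sketch -- an illustration, not part of the generated
### module; assumes Python 2, where the StringIO module exists, with the
### classes and tables above in scope. Latin-10 carries the Romanian
### comma-below letters, e.g. S WITH COMMA BELOW at 0xAA; the stream
### classes simply wrap the same charmap calls around a file-like object:
import StringIO
writer = StreamWriter(StringIO.StringIO())
writer.write(u'\u0218')
assert writer.stream.getvalue() == '\xaa'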
""" Python Character Mapping Codec iso8859_2 generated from 'MAPPINGS/ISO8859/8859-2.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-2',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u02d8' # 0xA2 -> BREVE
u'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\u013d' # 0xA5 -> LATIN CAPITAL LETTER L WITH CARON
u'\u015a' # 0xA6 -> LATIN CAPITAL LETTER S WITH ACUTE
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
u'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\u0164' # 0xAB -> LATIN CAPITAL LETTER T WITH CARON
u'\u0179' # 0xAC -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
u'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
u'\u02db' # 0xB2 -> OGONEK
u'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\u013e' # 0xB5 -> LATIN SMALL LETTER L WITH CARON
u'\u015b' # 0xB6 -> LATIN SMALL LETTER S WITH ACUTE
u'\u02c7' # 0xB7 -> CARON
u'\xb8' # 0xB8 -> CEDILLA
u'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
u'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
u'\u0165' # 0xBB -> LATIN SMALL LETTER T WITH CARON
u'\u017a' # 0xBC -> LATIN SMALL LETTER Z WITH ACUTE
u'\u02dd' # 0xBD -> DOUBLE ACUTE ACCENT
u'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
u'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u0154' # 0xC0 -> LATIN CAPITAL LETTER R WITH ACUTE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u0139' # 0xC5 -> LATIN CAPITAL LETTER L WITH ACUTE
u'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u011a' # 0xCC -> LATIN CAPITAL LETTER E WITH CARON
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\u010e' # 0xCF -> LATIN CAPITAL LETTER D WITH CARON
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\u0147' # 0xD2 -> LATIN CAPITAL LETTER N WITH CARON
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u0158' # 0xD8 -> LATIN CAPITAL LETTER R WITH CARON
u'\u016e' # 0xD9 -> LATIN CAPITAL LETTER U WITH RING ABOVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\u0170' # 0xDB -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\u0162' # 0xDE -> LATIN CAPITAL LETTER T WITH CEDILLA
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\u0155' # 0xE0 -> LATIN SMALL LETTER R WITH ACUTE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u013a' # 0xE5 -> LATIN SMALL LETTER L WITH ACUTE
u'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u011b' # 0xEC -> LATIN SMALL LETTER E WITH CARON
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u010f' # 0xEF -> LATIN SMALL LETTER D WITH CARON
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0148' # 0xF2 -> LATIN SMALL LETTER N WITH CARON
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u0159' # 0xF8 -> LATIN SMALL LETTER R WITH CARON
u'\u016f' # 0xF9 -> LATIN SMALL LETTER U WITH RING ABOVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\u0171' # 0xFB -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
u'\u0163' # 0xFE -> LATIN SMALL LETTER T WITH CEDILLA
u'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
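# A minimal round-trip sketch (illustration only, assuming Python 2 str/unicode
# semantics): charmap_decode maps each byte through decoding_table, and
# charmap_encode inverts it through the encoding_table built just above.
_decoded, _consumed = codecs.charmap_decode('\xa1', 'strict', decoding_table)
assert _decoded == u'\u0104' and _consumed == 1   # 0xA1 -> A WITH OGONEK
_encoded, _ = codecs.charmap_encode(_decoded, 'strict', encoding_table)
assert _encoded == '\xa1'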
""" Python Character Mapping Codec iso8859_3 generated from 'MAPPINGS/ISO8859/8859-3.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-3',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0126' # 0xA1 -> LATIN CAPITAL LETTER H WITH STROKE
u'\u02d8' # 0xA2 -> BREVE
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\ufffe' # 0xA5 -> UNDEFINED
u'\u0124' # 0xA6 -> LATIN CAPITAL LETTER H WITH CIRCUMFLEX
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\u0130' # 0xA9 -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\u011e' # 0xAB -> LATIN CAPITAL LETTER G WITH BREVE
u'\u0134' # 0xAC -> LATIN CAPITAL LETTER J WITH CIRCUMFLEX
u'\xad' # 0xAD -> SOFT HYPHEN
u'\ufffe' # 0xAE -> UNDEFINED
u'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\u0127' # 0xB1 -> LATIN SMALL LETTER H WITH STROKE
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u0125' # 0xB6 -> LATIN SMALL LETTER H WITH CIRCUMFLEX
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\u0131' # 0xB9 -> LATIN SMALL LETTER DOTLESS I
u'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
u'\u011f' # 0xBB -> LATIN SMALL LETTER G WITH BREVE
u'\u0135' # 0xBC -> LATIN SMALL LETTER J WITH CIRCUMFLEX
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\ufffe' # 0xBE -> UNDEFINED
u'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\ufffe' # 0xC3 -> UNDEFINED
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u010a' # 0xC5 -> LATIN CAPITAL LETTER C WITH DOT ABOVE
u'\u0108' # 0xC6 -> LATIN CAPITAL LETTER C WITH CIRCUMFLEX
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\ufffe' # 0xD0 -> UNDEFINED
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0120' # 0xD5 -> LATIN CAPITAL LETTER G WITH DOT ABOVE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\u011c' # 0xD8 -> LATIN CAPITAL LETTER G WITH CIRCUMFLEX
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u016c' # 0xDD -> LATIN CAPITAL LETTER U WITH BREVE
u'\u015c' # 0xDE -> LATIN CAPITAL LETTER S WITH CIRCUMFLEX
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\ufffe' # 0xE3 -> UNDEFINED
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u010b' # 0xE5 -> LATIN SMALL LETTER C WITH DOT ABOVE
u'\u0109' # 0xE6 -> LATIN SMALL LETTER C WITH CIRCUMFLEX
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\ufffe' # 0xF0 -> UNDEFINED
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\u0121' # 0xF5 -> LATIN SMALL LETTER G WITH DOT ABOVE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\u011d' # 0xF8 -> LATIN SMALL LETTER G WITH CIRCUMFLEX
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u016d' # 0xFD -> LATIN SMALL LETTER U WITH BREVE
u'\u015d' # 0xFE -> LATIN SMALL LETTER S WITH CIRCUMFLEX
u'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
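# A minimal error-handling sketch (illustration only, assuming this module is
# registered as 'iso8859-3' by the encodings package): table slots holding the
# u'\ufffe' sentinel, such as 0xA5, are undefined, so a strict decode raises
# while the 'replace' handler substitutes U+FFFD.
try:
    '\xa5'.decode('iso8859-3')
except UnicodeDecodeError:
    pass
assert '\xa5'.decode('iso8859-3', 'replace') == u'\ufffd'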
""" Python Character Mapping Codec iso8859_4 generated from 'MAPPINGS/ISO8859/8859-4.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-4',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\u0138' # 0xA2 -> LATIN SMALL LETTER KRA
u'\u0156' # 0xA3 -> LATIN CAPITAL LETTER R WITH CEDILLA
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\u0128' # 0xA5 -> LATIN CAPITAL LETTER I WITH TILDE
u'\u013b' # 0xA6 -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
u'\u0112' # 0xAA -> LATIN CAPITAL LETTER E WITH MACRON
u'\u0122' # 0xAB -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\u0166' # 0xAC -> LATIN CAPITAL LETTER T WITH STROKE
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
u'\u02db' # 0xB2 -> OGONEK
u'\u0157' # 0xB3 -> LATIN SMALL LETTER R WITH CEDILLA
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\u0129' # 0xB5 -> LATIN SMALL LETTER I WITH TILDE
u'\u013c' # 0xB6 -> LATIN SMALL LETTER L WITH CEDILLA
u'\u02c7' # 0xB7 -> CARON
u'\xb8' # 0xB8 -> CEDILLA
u'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
u'\u0113' # 0xBA -> LATIN SMALL LETTER E WITH MACRON
u'\u0123' # 0xBB -> LATIN SMALL LETTER G WITH CEDILLA
u'\u0167' # 0xBC -> LATIN SMALL LETTER T WITH STROKE
u'\u014a' # 0xBD -> LATIN CAPITAL LETTER ENG
u'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
u'\u014b' # 0xBF -> LATIN SMALL LETTER ENG
u'\u0100' # 0xC0 -> LATIN CAPITAL LETTER A WITH MACRON
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\u012e' # 0xC7 -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u0116' # 0xCC -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\u012a' # 0xCF -> LATIN CAPITAL LETTER I WITH MACRON
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u0145' # 0xD1 -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\u014c' # 0xD2 -> LATIN CAPITAL LETTER O WITH MACRON
u'\u0136' # 0xD3 -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\u0172' # 0xD9 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0168' # 0xDD -> LATIN CAPITAL LETTER U WITH TILDE
u'\u016a' # 0xDE -> LATIN CAPITAL LETTER U WITH MACRON
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\u0101' # 0xE0 -> LATIN SMALL LETTER A WITH MACRON
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\u012f' # 0xE7 -> LATIN SMALL LETTER I WITH OGONEK
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\u0117' # 0xEC -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\u012b' # 0xEF -> LATIN SMALL LETTER I WITH MACRON
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\u0146' # 0xF1 -> LATIN SMALL LETTER N WITH CEDILLA
u'\u014d' # 0xF2 -> LATIN SMALL LETTER O WITH MACRON
u'\u0137' # 0xF3 -> LATIN SMALL LETTER K WITH CEDILLA
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\u0173' # 0xF9 -> LATIN SMALL LETTER U WITH OGONEK
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u0169' # 0xFD -> LATIN SMALL LETTER U WITH TILDE
u'\u016b' # 0xFE -> LATIN SMALL LETTER U WITH MACRON
u'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
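# A minimal incremental-API sketch (illustration only): the IncrementalDecoder
# defined above carries no state for a single-byte charmap, so feeding input
# one byte at a time matches a bulk decode.
_dec = IncrementalDecoder('strict')
_parts = [_dec.decode(_b) for _b in '\xc0\xe0']
assert u''.join(_parts) == u'\u0100\u0101'   # A/a WITH MACRON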
""" Python Character Mapping Codec iso8859_5 generated from 'MAPPINGS/ISO8859/8859-5.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-5',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u0401' # 0xA1 -> CYRILLIC CAPITAL LETTER IO
u'\u0402' # 0xA2 -> CYRILLIC CAPITAL LETTER DJE
u'\u0403' # 0xA3 -> CYRILLIC CAPITAL LETTER GJE
u'\u0404' # 0xA4 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\u0405' # 0xA5 -> CYRILLIC CAPITAL LETTER DZE
u'\u0406' # 0xA6 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0407' # 0xA7 -> CYRILLIC CAPITAL LETTER YI
u'\u0408' # 0xA8 -> CYRILLIC CAPITAL LETTER JE
u'\u0409' # 0xA9 -> CYRILLIC CAPITAL LETTER LJE
u'\u040a' # 0xAA -> CYRILLIC CAPITAL LETTER NJE
u'\u040b' # 0xAB -> CYRILLIC CAPITAL LETTER TSHE
u'\u040c' # 0xAC -> CYRILLIC CAPITAL LETTER KJE
u'\xad' # 0xAD -> SOFT HYPHEN
u'\u040e' # 0xAE -> CYRILLIC CAPITAL LETTER SHORT U
u'\u040f' # 0xAF -> CYRILLIC CAPITAL LETTER DZHE
u'\u0410' # 0xB0 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0xB1 -> CYRILLIC CAPITAL LETTER BE
u'\u0412' # 0xB2 -> CYRILLIC CAPITAL LETTER VE
u'\u0413' # 0xB3 -> CYRILLIC CAPITAL LETTER GHE
u'\u0414' # 0xB4 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0xB5 -> CYRILLIC CAPITAL LETTER IE
u'\u0416' # 0xB6 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0417' # 0xB7 -> CYRILLIC CAPITAL LETTER ZE
u'\u0418' # 0xB8 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0xB9 -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0xBA -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0xBB -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0xBC -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0xBD -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0xBE -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0xBF -> CYRILLIC CAPITAL LETTER PE
u'\u0420' # 0xC0 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0xC1 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0xC2 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0xC3 -> CYRILLIC CAPITAL LETTER U
u'\u0424' # 0xC4 -> CYRILLIC CAPITAL LETTER EF
u'\u0425' # 0xC5 -> CYRILLIC CAPITAL LETTER HA
u'\u0426' # 0xC6 -> CYRILLIC CAPITAL LETTER TSE
u'\u0427' # 0xC7 -> CYRILLIC CAPITAL LETTER CHE
u'\u0428' # 0xC8 -> CYRILLIC CAPITAL LETTER SHA
u'\u0429' # 0xC9 -> CYRILLIC CAPITAL LETTER SHCHA
u'\u042a' # 0xCA -> CYRILLIC CAPITAL LETTER HARD SIGN
u'\u042b' # 0xCB -> CYRILLIC CAPITAL LETTER YERU
u'\u042c' # 0xCC -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042d' # 0xCD -> CYRILLIC CAPITAL LETTER E
u'\u042e' # 0xCE -> CYRILLIC CAPITAL LETTER YU
u'\u042f' # 0xCF -> CYRILLIC CAPITAL LETTER YA
u'\u0430' # 0xD0 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0xD1 -> CYRILLIC SMALL LETTER BE
u'\u0432' # 0xD2 -> CYRILLIC SMALL LETTER VE
u'\u0433' # 0xD3 -> CYRILLIC SMALL LETTER GHE
u'\u0434' # 0xD4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0xD5 -> CYRILLIC SMALL LETTER IE
u'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
u'\u0437' # 0xD7 -> CYRILLIC SMALL LETTER ZE
u'\u0438' # 0xD8 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0xD9 -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0xDA -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0xDB -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0xDC -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0xDD -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0xDE -> CYRILLIC SMALL LETTER O
u'\u043f' # 0xDF -> CYRILLIC SMALL LETTER PE
u'\u0440' # 0xE0 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0xE1 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0xE2 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0xE3 -> CYRILLIC SMALL LETTER U
u'\u0444' # 0xE4 -> CYRILLIC SMALL LETTER EF
u'\u0445' # 0xE5 -> CYRILLIC SMALL LETTER HA
u'\u0446' # 0xE6 -> CYRILLIC SMALL LETTER TSE
u'\u0447' # 0xE7 -> CYRILLIC SMALL LETTER CHE
u'\u0448' # 0xE8 -> CYRILLIC SMALL LETTER SHA
u'\u0449' # 0xE9 -> CYRILLIC SMALL LETTER SHCHA
u'\u044a' # 0xEA -> CYRILLIC SMALL LETTER HARD SIGN
u'\u044b' # 0xEB -> CYRILLIC SMALL LETTER YERU
u'\u044c' # 0xEC -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044d' # 0xED -> CYRILLIC SMALL LETTER E
u'\u044e' # 0xEE -> CYRILLIC SMALL LETTER YU
u'\u044f' # 0xEF -> CYRILLIC SMALL LETTER YA
u'\u2116' # 0xF0 -> NUMERO SIGN
u'\u0451' # 0xF1 -> CYRILLIC SMALL LETTER IO
u'\u0452' # 0xF2 -> CYRILLIC SMALL LETTER DJE
u'\u0453' # 0xF3 -> CYRILLIC SMALL LETTER GJE
u'\u0454' # 0xF4 -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\u0455' # 0xF5 -> CYRILLIC SMALL LETTER DZE
u'\u0456' # 0xF6 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0457' # 0xF7 -> CYRILLIC SMALL LETTER YI
u'\u0458' # 0xF8 -> CYRILLIC SMALL LETTER JE
u'\u0459' # 0xF9 -> CYRILLIC SMALL LETTER LJE
u'\u045a' # 0xFA -> CYRILLIC SMALL LETTER NJE
u'\u045b' # 0xFB -> CYRILLIC SMALL LETTER TSHE
u'\u045c' # 0xFC -> CYRILLIC SMALL LETTER KJE
u'\xa7' # 0xFD -> SECTION SIGN
u'\u045e' # 0xFE -> CYRILLIC SMALL LETTER SHORT U
u'\u045f' # 0xFF -> CYRILLIC SMALL LETTER DZHE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
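# A minimal registry sketch (illustration only): the CodecInfo returned by
# getregentry() above is what the encodings search function hands to the
# codecs machinery; its decode entry is the bound Codec().decode.
_info = getregentry()
assert _info.name == 'iso8859-5'
assert _info.decode('\xb0\xb1\xb2')[0] == u'\u0410\u0411\u0412'   # CYRILLIC A, BE, VE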
""" Python Character Mapping Codec iso8859_6 generated from 'MAPPINGS/ISO8859/8859-6.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-6',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\ufffe' # 0xA1 -> UNDEFINED
u'\ufffe' # 0xA2 -> UNDEFINED
u'\ufffe' # 0xA3 -> UNDEFINED
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\ufffe' # 0xA5 -> UNDEFINED
u'\ufffe' # 0xA6 -> UNDEFINED
u'\ufffe' # 0xA7 -> UNDEFINED
u'\ufffe' # 0xA8 -> UNDEFINED
u'\ufffe' # 0xA9 -> UNDEFINED
u'\ufffe' # 0xAA -> UNDEFINED
u'\ufffe' # 0xAB -> UNDEFINED
u'\u060c' # 0xAC -> ARABIC COMMA
u'\xad' # 0xAD -> SOFT HYPHEN
u'\ufffe' # 0xAE -> UNDEFINED
u'\ufffe' # 0xAF -> UNDEFINED
u'\ufffe' # 0xB0 -> UNDEFINED
u'\ufffe' # 0xB1 -> UNDEFINED
u'\ufffe' # 0xB2 -> UNDEFINED
u'\ufffe' # 0xB3 -> UNDEFINED
u'\ufffe' # 0xB4 -> UNDEFINED
u'\ufffe' # 0xB5 -> UNDEFINED
u'\ufffe' # 0xB6 -> UNDEFINED
u'\ufffe' # 0xB7 -> UNDEFINED
u'\ufffe' # 0xB8 -> UNDEFINED
u'\ufffe' # 0xB9 -> UNDEFINED
u'\ufffe' # 0xBA -> UNDEFINED
u'\u061b' # 0xBB -> ARABIC SEMICOLON
u'\ufffe' # 0xBC -> UNDEFINED
u'\ufffe' # 0xBD -> UNDEFINED
u'\ufffe' # 0xBE -> UNDEFINED
u'\u061f' # 0xBF -> ARABIC QUESTION MARK
u'\ufffe' # 0xC0 -> UNDEFINED
u'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
u'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
u'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
u'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
u'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
u'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
u'\u0627' # 0xC7 -> ARABIC LETTER ALEF
u'\u0628' # 0xC8 -> ARABIC LETTER BEH
u'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
u'\u062a' # 0xCA -> ARABIC LETTER TEH
u'\u062b' # 0xCB -> ARABIC LETTER THEH
u'\u062c' # 0xCC -> ARABIC LETTER JEEM
u'\u062d' # 0xCD -> ARABIC LETTER HAH
u'\u062e' # 0xCE -> ARABIC LETTER KHAH
u'\u062f' # 0xCF -> ARABIC LETTER DAL
u'\u0630' # 0xD0 -> ARABIC LETTER THAL
u'\u0631' # 0xD1 -> ARABIC LETTER REH
u'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
u'\u0633' # 0xD3 -> ARABIC LETTER SEEN
u'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
u'\u0635' # 0xD5 -> ARABIC LETTER SAD
u'\u0636' # 0xD6 -> ARABIC LETTER DAD
u'\u0637' # 0xD7 -> ARABIC LETTER TAH
u'\u0638' # 0xD8 -> ARABIC LETTER ZAH
u'\u0639' # 0xD9 -> ARABIC LETTER AIN
u'\u063a' # 0xDA -> ARABIC LETTER GHAIN
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\ufffe' # 0xDF -> UNDEFINED
u'\u0640' # 0xE0 -> ARABIC TATWEEL
u'\u0641' # 0xE1 -> ARABIC LETTER FEH
u'\u0642' # 0xE2 -> ARABIC LETTER QAF
u'\u0643' # 0xE3 -> ARABIC LETTER KAF
u'\u0644' # 0xE4 -> ARABIC LETTER LAM
u'\u0645' # 0xE5 -> ARABIC LETTER MEEM
u'\u0646' # 0xE6 -> ARABIC LETTER NOON
u'\u0647' # 0xE7 -> ARABIC LETTER HEH
u'\u0648' # 0xE8 -> ARABIC LETTER WAW
u'\u0649' # 0xE9 -> ARABIC LETTER ALEF MAKSURA
u'\u064a' # 0xEA -> ARABIC LETTER YEH
u'\u064b' # 0xEB -> ARABIC FATHATAN
u'\u064c' # 0xEC -> ARABIC DAMMATAN
u'\u064d' # 0xED -> ARABIC KASRATAN
u'\u064e' # 0xEE -> ARABIC FATHA
u'\u064f' # 0xEF -> ARABIC DAMMA
u'\u0650' # 0xF0 -> ARABIC KASRA
u'\u0651' # 0xF1 -> ARABIC SHADDA
u'\u0652' # 0xF2 -> ARABIC SUKUN
u'\ufffe' # 0xF3 -> UNDEFINED
u'\ufffe' # 0xF4 -> UNDEFINED
u'\ufffe' # 0xF5 -> UNDEFINED
u'\ufffe' # 0xF6 -> UNDEFINED
u'\ufffe' # 0xF7 -> UNDEFINED
u'\ufffe' # 0xF8 -> UNDEFINED
u'\ufffe' # 0xF9 -> UNDEFINED
u'\ufffe' # 0xFA -> UNDEFINED
u'\ufffe' # 0xFB -> UNDEFINED
u'\ufffe' # 0xFC -> UNDEFINED
u'\ufffe' # 0xFD -> UNDEFINED
u'\ufffe' # 0xFE -> UNDEFINED
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
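# A minimal stream-API sketch (illustration only, assuming Python 2's
# StringIO): StreamWriter reuses Codec.encode, so Arabic text written through
# an in-memory buffer comes out in the single-byte form defined by the table.
from StringIO import StringIO
_buf = StringIO()
_writer = StreamWriter(_buf, 'strict')
_writer.write(u'\u0627\u0644')   # ALEF, LAM
assert _buf.getvalue() == '\xc7\xe4'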
""" Python Character Mapping Codec iso8859_7 generated from 'MAPPINGS/ISO8859/8859-7.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-7',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\u2018' # 0xA1 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xA2 -> RIGHT SINGLE QUOTATION MARK
u'\xa3' # 0xA3 -> POUND SIGN
u'\u20ac' # 0xA4 -> EURO SIGN
u'\u20af' # 0xA5 -> DRACHMA SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u037a' # 0xAA -> GREEK YPOGEGRAMMENI
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\ufffe' # 0xAE -> UNDEFINED
u'\u2015' # 0xAF -> HORIZONTAL BAR
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\u0384' # 0xB4 -> GREEK TONOS
u'\u0385' # 0xB5 -> GREEK DIALYTIKA TONOS
u'\u0386' # 0xB6 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\u0388' # 0xB8 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u0389' # 0xB9 -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\u038a' # 0xBA -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u038c' # 0xBC -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\u038e' # 0xBE -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u038f' # 0xBF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'\u0390' # 0xC0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
u'\u0391' # 0xC1 -> GREEK CAPITAL LETTER ALPHA
u'\u0392' # 0xC2 -> GREEK CAPITAL LETTER BETA
u'\u0393' # 0xC3 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0xC4 -> GREEK CAPITAL LETTER DELTA
u'\u0395' # 0xC5 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0xC6 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0xC7 -> GREEK CAPITAL LETTER ETA
u'\u0398' # 0xC8 -> GREEK CAPITAL LETTER THETA
u'\u0399' # 0xC9 -> GREEK CAPITAL LETTER IOTA
u'\u039a' # 0xCA -> GREEK CAPITAL LETTER KAPPA
u'\u039b' # 0xCB -> GREEK CAPITAL LETTER LAMDA
u'\u039c' # 0xCC -> GREEK CAPITAL LETTER MU
u'\u039d' # 0xCD -> GREEK CAPITAL LETTER NU
u'\u039e' # 0xCE -> GREEK CAPITAL LETTER XI
u'\u039f' # 0xCF -> GREEK CAPITAL LETTER OMICRON
u'\u03a0' # 0xD0 -> GREEK CAPITAL LETTER PI
u'\u03a1' # 0xD1 -> GREEK CAPITAL LETTER RHO
u'\ufffe' # 0xD2 -> UNDEFINED
u'\u03a3' # 0xD3 -> GREEK CAPITAL LETTER SIGMA
u'\u03a4' # 0xD4 -> GREEK CAPITAL LETTER TAU
u'\u03a5' # 0xD5 -> GREEK CAPITAL LETTER UPSILON
u'\u03a6' # 0xD6 -> GREEK CAPITAL LETTER PHI
u'\u03a7' # 0xD7 -> GREEK CAPITAL LETTER CHI
u'\u03a8' # 0xD8 -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0xD9 -> GREEK CAPITAL LETTER OMEGA
u'\u03aa' # 0xDA -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\u03ab' # 0xDB -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'\u03ac' # 0xDC -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\u03ad' # 0xDD -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0xDE -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03af' # 0xDF -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03b0' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
u'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
u'\u03b3' # 0xE3 -> GREEK SMALL LETTER GAMMA
u'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
u'\u03b6' # 0xE6 -> GREEK SMALL LETTER ZETA
u'\u03b7' # 0xE7 -> GREEK SMALL LETTER ETA
u'\u03b8' # 0xE8 -> GREEK SMALL LETTER THETA
u'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
u'\u03ba' # 0xEA -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0xEB -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0xEC -> GREEK SMALL LETTER MU
u'\u03bd' # 0xED -> GREEK SMALL LETTER NU
u'\u03be' # 0xEE -> GREEK SMALL LETTER XI
u'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
u'\u03c1' # 0xF1 -> GREEK SMALL LETTER RHO
u'\u03c2' # 0xF2 -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
u'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
u'\u03c5' # 0xF5 -> GREEK SMALL LETTER UPSILON
u'\u03c6' # 0xF6 -> GREEK SMALL LETTER PHI
u'\u03c7' # 0xF7 -> GREEK SMALL LETTER CHI
u'\u03c8' # 0xF8 -> GREEK SMALL LETTER PSI
u'\u03c9' # 0xF9 -> GREEK SMALL LETTER OMEGA
u'\u03ca' # 0xFA -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u03cb' # 0xFB -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u03cc' # 0xFC -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u03cd' # 0xFD -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u03ce' # 0xFE -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
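# ---------------------------------------------------------------------------
# Editorial note (not part of the generated module): decoding_table above is
# a 256-character string indexed by byte value; codecs.charmap_build()
# inverts it into the lookup object that charmap_encode() consults. A minimal
# round-trip sketch, assuming the codec is registered as 'iso8859-7' as in a
# standard Python 2 install:
#
#     >>> u'\u03b1\u03b2\u03b3'.encode('iso8859-7')   # Greek alpha, beta, gamma
#     '\xe1\xe2\xe3'
#     >>> '\xe1\xe2\xe3'.decode('iso8859-7')
#     u'\u03b1\u03b2\u03b3'
#
# Bytes whose table entry is u'\ufffe' (0xAE, 0xD2 and 0xFF here) are
# undefined in the charset; decoding them with errors='strict' raises
# UnicodeDecodeError.
# ---------------------------------------------------------------------------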
""" Python Character Mapping Codec iso8859_8 generated from 'MAPPINGS/ISO8859/8859-8.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='iso8859-8',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
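# Editorial note (not part of the generated module): getregentry() is the
# hook through which the encodings package registers this codec. A sketch of
# the runtime lookup it feeds:
#
#     >>> import codecs
#     >>> codecs.lookup('iso8859-8').name   # imports this module, calls getregentry()
#     'iso8859-8'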
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\ufffe' # 0xA1 -> UNDEFINED
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xd7' # 0xAA -> MULTIPLICATION SIGN
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xf7' # 0xBA -> DIVISION SIGN
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\ufffe' # 0xBF -> UNDEFINED
u'\ufffe' # 0xC0 -> UNDEFINED
u'\ufffe' # 0xC1 -> UNDEFINED
u'\ufffe' # 0xC2 -> UNDEFINED
u'\ufffe' # 0xC3 -> UNDEFINED
u'\ufffe' # 0xC4 -> UNDEFINED
u'\ufffe' # 0xC5 -> UNDEFINED
u'\ufffe' # 0xC6 -> UNDEFINED
u'\ufffe' # 0xC7 -> UNDEFINED
u'\ufffe' # 0xC8 -> UNDEFINED
u'\ufffe' # 0xC9 -> UNDEFINED
u'\ufffe' # 0xCA -> UNDEFINED
u'\ufffe' # 0xCB -> UNDEFINED
u'\ufffe' # 0xCC -> UNDEFINED
u'\ufffe' # 0xCD -> UNDEFINED
u'\ufffe' # 0xCE -> UNDEFINED
u'\ufffe' # 0xCF -> UNDEFINED
u'\ufffe' # 0xD0 -> UNDEFINED
u'\ufffe' # 0xD1 -> UNDEFINED
u'\ufffe' # 0xD2 -> UNDEFINED
u'\ufffe' # 0xD3 -> UNDEFINED
u'\ufffe' # 0xD4 -> UNDEFINED
u'\ufffe' # 0xD5 -> UNDEFINED
u'\ufffe' # 0xD6 -> UNDEFINED
u'\ufffe' # 0xD7 -> UNDEFINED
u'\ufffe' # 0xD8 -> UNDEFINED
u'\ufffe' # 0xD9 -> UNDEFINED
u'\ufffe' # 0xDA -> UNDEFINED
u'\ufffe' # 0xDB -> UNDEFINED
u'\ufffe' # 0xDC -> UNDEFINED
u'\ufffe' # 0xDD -> UNDEFINED
u'\ufffe' # 0xDE -> UNDEFINED
u'\u2017' # 0xDF -> DOUBLE LOW LINE
u'\u05d0' # 0xE0 -> HEBREW LETTER ALEF
u'\u05d1' # 0xE1 -> HEBREW LETTER BET
u'\u05d2' # 0xE2 -> HEBREW LETTER GIMEL
u'\u05d3' # 0xE3 -> HEBREW LETTER DALET
u'\u05d4' # 0xE4 -> HEBREW LETTER HE
u'\u05d5' # 0xE5 -> HEBREW LETTER VAV
u'\u05d6' # 0xE6 -> HEBREW LETTER ZAYIN
u'\u05d7' # 0xE7 -> HEBREW LETTER HET
u'\u05d8' # 0xE8 -> HEBREW LETTER TET
u'\u05d9' # 0xE9 -> HEBREW LETTER YOD
u'\u05da' # 0xEA -> HEBREW LETTER FINAL KAF
u'\u05db' # 0xEB -> HEBREW LETTER KAF
u'\u05dc' # 0xEC -> HEBREW LETTER LAMED
u'\u05dd' # 0xED -> HEBREW LETTER FINAL MEM
u'\u05de' # 0xEE -> HEBREW LETTER MEM
u'\u05df' # 0xEF -> HEBREW LETTER FINAL NUN
u'\u05e0' # 0xF0 -> HEBREW LETTER NUN
u'\u05e1' # 0xF1 -> HEBREW LETTER SAMEKH
u'\u05e2' # 0xF2 -> HEBREW LETTER AYIN
u'\u05e3' # 0xF3 -> HEBREW LETTER FINAL PE
u'\u05e4' # 0xF4 -> HEBREW LETTER PE
u'\u05e5' # 0xF5 -> HEBREW LETTER FINAL TSADI
u'\u05e6' # 0xF6 -> HEBREW LETTER TSADI
u'\u05e7' # 0xF7 -> HEBREW LETTER QOF
u'\u05e8' # 0xF8 -> HEBREW LETTER RESH
u'\u05e9' # 0xF9 -> HEBREW LETTER SHIN
u'\u05ea' # 0xFA -> HEBREW LETTER TAV
u'\ufffe' # 0xFB -> UNDEFINED
u'\ufffe' # 0xFC -> UNDEFINED
u'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
u'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
u'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
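# Editorial sketch (not part of the generated module): ISO 8859-8 places the
# Hebrew letters alef..tav at 0xE0-0xFA and leaves large gaps undefined (the
# u'\ufffe' runs above). Assuming the standard registration:
#
#     >>> u'\u05e9\u05dc\u05d5\u05dd'.encode('iso8859-8')   # shin, lamed, vav, final mem
#     '\xf9\xec\xe5\xed'
#     >>> '\xbf'.decode('iso8859-8', 'replace')             # 0xBF is unmapped
#     u'\ufffd'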
""" Python Character Mapping Codec iso8859_9 generated from 'MAPPINGS/ISO8859/8859-9.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='iso8859-9',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\xa0' # 0xA0 -> NO-BREAK SPACE
u'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa4' # 0xA4 -> CURRENCY SIGN
u'\xa5' # 0xA5 -> YEN SIGN
u'\xa6' # 0xA6 -> BROKEN BAR
u'\xa7' # 0xA7 -> SECTION SIGN
u'\xa8' # 0xA8 -> DIAERESIS
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
u'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xac' # 0xAC -> NOT SIGN
u'\xad' # 0xAD -> SOFT HYPHEN
u'\xae' # 0xAE -> REGISTERED SIGN
u'\xaf' # 0xAF -> MACRON
u'\xb0' # 0xB0 -> DEGREE SIGN
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\xb2' # 0xB2 -> SUPERSCRIPT TWO
u'\xb3' # 0xB3 -> SUPERSCRIPT THREE
u'\xb4' # 0xB4 -> ACUTE ACCENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\xb6' # 0xB6 -> PILCROW SIGN
u'\xb7' # 0xB7 -> MIDDLE DOT
u'\xb8' # 0xB8 -> CEDILLA
u'\xb9' # 0xB9 -> SUPERSCRIPT ONE
u'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
u'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
u'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
u'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
u'\xbf' # 0xBF -> INVERTED QUESTION MARK
u'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
u'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
u'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\u011e' # 0xD0 -> LATIN CAPITAL LETTER G WITH BREVE
u'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
u'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xd7' # 0xD7 -> MULTIPLICATION SIGN
u'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
u'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0130' # 0xDD -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
u'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
u'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
u'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
u'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
u'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
u'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
u'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u011f' # 0xF0 -> LATIN SMALL LETTER G WITH BREVE
u'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
u'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
u'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0xF7 -> DIVISION SIGN
u'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
u'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
u'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
u'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u0131' # 0xFD -> LATIN SMALL LETTER DOTLESS I
u'\u015f' # 0xFE -> LATIN SMALL LETTER S WITH CEDILLA
u'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
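# Editorial note (not part of the generated module): ISO 8859-9 (Latin-5) is
# ISO 8859-1 with six slots repurposed for Turkish: 0xD0/0xF0 (G/g with
# breve), 0xDD (I with dot above), 0xFD (dotless i), and 0xDE/0xFE (S/s with
# cedilla), as the table above shows. For example:
#
#     >>> u'\u011f\u0131\u015f'.encode('iso8859-9')   # g-breve, dotless i, s-cedilla
#     '\xf0\xfd\xfe'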
""" Python Character Mapping Codec koi8_r generated from 'MAPPINGS/VENDORS/MISC/KOI8-R.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='koi8-r',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u2500' # 0x80 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u2502' # 0x81 -> BOX DRAWINGS LIGHT VERTICAL
u'\u250c' # 0x82 -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2510' # 0x83 -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x84 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2518' # 0x85 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u251c' # 0x86 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2524' # 0x87 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u252c' # 0x88 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u2534' # 0x89 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u253c' # 0x8A -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u2580' # 0x8B -> UPPER HALF BLOCK
u'\u2584' # 0x8C -> LOWER HALF BLOCK
u'\u2588' # 0x8D -> FULL BLOCK
u'\u258c' # 0x8E -> LEFT HALF BLOCK
u'\u2590' # 0x8F -> RIGHT HALF BLOCK
u'\u2591' # 0x90 -> LIGHT SHADE
u'\u2592' # 0x91 -> MEDIUM SHADE
u'\u2593' # 0x92 -> DARK SHADE
u'\u2320' # 0x93 -> TOP HALF INTEGRAL
u'\u25a0' # 0x94 -> BLACK SQUARE
u'\u2219' # 0x95 -> BULLET OPERATOR
u'\u221a' # 0x96 -> SQUARE ROOT
u'\u2248' # 0x97 -> ALMOST EQUAL TO
u'\u2264' # 0x98 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0x99 -> GREATER-THAN OR EQUAL TO
u'\xa0' # 0x9A -> NO-BREAK SPACE
u'\u2321' # 0x9B -> BOTTOM HALF INTEGRAL
u'\xb0' # 0x9C -> DEGREE SIGN
u'\xb2' # 0x9D -> SUPERSCRIPT TWO
u'\xb7' # 0x9E -> MIDDLE DOT
u'\xf7' # 0x9F -> DIVISION SIGN
u'\u2550' # 0xA0 -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u2551' # 0xA1 -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2552' # 0xA2 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u0451' # 0xA3 -> CYRILLIC SMALL LETTER IO
u'\u2553' # 0xA4 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
u'\u2554' # 0xA5 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u2555' # 0xA6 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
u'\u2556' # 0xA7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
u'\u2557' # 0xA8 -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u2558' # 0xA9 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2559' # 0xAA -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u255a' # 0xAB -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u255b' # 0xAC -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u255c' # 0xAD -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
u'\u255d' # 0xAE -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255e' # 0xAF -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0xB0 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u2560' # 0xB1 -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2561' # 0xB2 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u0401' # 0xB3 -> CYRILLIC CAPITAL LETTER IO
u'\u2562' # 0xB4 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
u'\u2563' # 0xB5 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u2564' # 0xB6 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
u'\u2565' # 0xB7 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
u'\u2566' # 0xB8 -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2567' # 0xB9 -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0xBA -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2569' # 0xBB -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u256a' # 0xBC -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u256b' # 0xBD -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
u'\u256c' # 0xBE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa9' # 0xBF -> COPYRIGHT SIGN
u'\u044e' # 0xC0 -> CYRILLIC SMALL LETTER YU
u'\u0430' # 0xC1 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0xC2 -> CYRILLIC SMALL LETTER BE
u'\u0446' # 0xC3 -> CYRILLIC SMALL LETTER TSE
u'\u0434' # 0xC4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0xC5 -> CYRILLIC SMALL LETTER IE
u'\u0444' # 0xC6 -> CYRILLIC SMALL LETTER EF
u'\u0433' # 0xC7 -> CYRILLIC SMALL LETTER GHE
u'\u0445' # 0xC8 -> CYRILLIC SMALL LETTER HA
u'\u0438' # 0xC9 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0xCA -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0xCB -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0xCC -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0xCD -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0xCE -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0xCF -> CYRILLIC SMALL LETTER O
u'\u043f' # 0xD0 -> CYRILLIC SMALL LETTER PE
u'\u044f' # 0xD1 -> CYRILLIC SMALL LETTER YA
u'\u0440' # 0xD2 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0xD3 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0xD4 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0xD5 -> CYRILLIC SMALL LETTER U
u'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
u'\u0432' # 0xD7 -> CYRILLIC SMALL LETTER VE
u'\u044c' # 0xD8 -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044b' # 0xD9 -> CYRILLIC SMALL LETTER YERU
u'\u0437' # 0xDA -> CYRILLIC SMALL LETTER ZE
u'\u0448' # 0xDB -> CYRILLIC SMALL LETTER SHA
u'\u044d' # 0xDC -> CYRILLIC SMALL LETTER E
u'\u0449' # 0xDD -> CYRILLIC SMALL LETTER SHCHA
u'\u0447' # 0xDE -> CYRILLIC SMALL LETTER CHE
u'\u044a' # 0xDF -> CYRILLIC SMALL LETTER HARD SIGN
u'\u042e' # 0xE0 -> CYRILLIC CAPITAL LETTER YU
u'\u0410' # 0xE1 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0xE2 -> CYRILLIC CAPITAL LETTER BE
u'\u0426' # 0xE3 -> CYRILLIC CAPITAL LETTER TSE
u'\u0414' # 0xE4 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0xE5 -> CYRILLIC CAPITAL LETTER IE
u'\u0424' # 0xE6 -> CYRILLIC CAPITAL LETTER EF
u'\u0413' # 0xE7 -> CYRILLIC CAPITAL LETTER GHE
u'\u0425' # 0xE8 -> CYRILLIC CAPITAL LETTER HA
u'\u0418' # 0xE9 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0xEA -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0xEB -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0xEC -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0xED -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0xEE -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0xEF -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0xF0 -> CYRILLIC CAPITAL LETTER PE
u'\u042f' # 0xF1 -> CYRILLIC CAPITAL LETTER YA
u'\u0420' # 0xF2 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0xF3 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0xF4 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0xF5 -> CYRILLIC CAPITAL LETTER U
u'\u0416' # 0xF6 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0412' # 0xF7 -> CYRILLIC CAPITAL LETTER VE
u'\u042c' # 0xF8 -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042b' # 0xF9 -> CYRILLIC CAPITAL LETTER YERU
u'\u0417' # 0xFA -> CYRILLIC CAPITAL LETTER ZE
u'\u0428' # 0xFB -> CYRILLIC CAPITAL LETTER SHA
u'\u042d' # 0xFC -> CYRILLIC CAPITAL LETTER E
u'\u0429' # 0xFD -> CYRILLIC CAPITAL LETTER SHCHA
u'\u0427' # 0xFE -> CYRILLIC CAPITAL LETTER CHE
u'\u042a' # 0xFF -> CYRILLIC CAPITAL LETTER HARD SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
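# Editorial note (not part of the generated module): KOI8-R orders the
# Cyrillic letters so that clearing the high bit of each byte yields a rough
# Latin transliteration, which is why the table above is not in alphabet
# order:
#
#     >>> u'\u043c\u0438\u0440'.encode('koi8-r')   # Cyrillic "mir"
#     '\xcd\xc9\xd2'
#     >>> ''.join(chr(ord(c) & 0x7f) for c in '\xcd\xc9\xd2')
#     'MIR'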
""" Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='koi8-u',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\u2500' # 0x80 -> BOX DRAWINGS LIGHT HORIZONTAL
u'\u2502' # 0x81 -> BOX DRAWINGS LIGHT VERTICAL
u'\u250c' # 0x82 -> BOX DRAWINGS LIGHT DOWN AND RIGHT
u'\u2510' # 0x83 -> BOX DRAWINGS LIGHT DOWN AND LEFT
u'\u2514' # 0x84 -> BOX DRAWINGS LIGHT UP AND RIGHT
u'\u2518' # 0x85 -> BOX DRAWINGS LIGHT UP AND LEFT
u'\u251c' # 0x86 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
u'\u2524' # 0x87 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
u'\u252c' # 0x88 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
u'\u2534' # 0x89 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
u'\u253c' # 0x8A -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
u'\u2580' # 0x8B -> UPPER HALF BLOCK
u'\u2584' # 0x8C -> LOWER HALF BLOCK
u'\u2588' # 0x8D -> FULL BLOCK
u'\u258c' # 0x8E -> LEFT HALF BLOCK
u'\u2590' # 0x8F -> RIGHT HALF BLOCK
u'\u2591' # 0x90 -> LIGHT SHADE
u'\u2592' # 0x91 -> MEDIUM SHADE
u'\u2593' # 0x92 -> DARK SHADE
u'\u2320' # 0x93 -> TOP HALF INTEGRAL
u'\u25a0' # 0x94 -> BLACK SQUARE
u'\u2219' # 0x95 -> BULLET OPERATOR
u'\u221a' # 0x96 -> SQUARE ROOT
u'\u2248' # 0x97 -> ALMOST EQUAL TO
u'\u2264' # 0x98 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0x99 -> GREATER-THAN OR EQUAL TO
u'\xa0' # 0x9A -> NO-BREAK SPACE
u'\u2321' # 0x9B -> BOTTOM HALF INTEGRAL
u'\xb0' # 0x9C -> DEGREE SIGN
u'\xb2' # 0x9D -> SUPERSCRIPT TWO
u'\xb7' # 0x9E -> MIDDLE DOT
u'\xf7' # 0x9F -> DIVISION SIGN
u'\u2550' # 0xA0 -> BOX DRAWINGS DOUBLE HORIZONTAL
u'\u2551' # 0xA1 -> BOX DRAWINGS DOUBLE VERTICAL
u'\u2552' # 0xA2 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
u'\u0451' # 0xA3 -> CYRILLIC SMALL LETTER IO
u'\u0454' # 0xA4 -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\u2554' # 0xA5 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
u'\u0456' # 0xA6 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0457' # 0xA7 -> CYRILLIC SMALL LETTER YI (UKRAINIAN)
u'\u2557' # 0xA8 -> BOX DRAWINGS DOUBLE DOWN AND LEFT
u'\u2558' # 0xA9 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
u'\u2559' # 0xAA -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
u'\u255a' # 0xAB -> BOX DRAWINGS DOUBLE UP AND RIGHT
u'\u255b' # 0xAC -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
u'\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
u'\u255d' # 0xAE -> BOX DRAWINGS DOUBLE UP AND LEFT
u'\u255e' # 0xAF -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
u'\u255f' # 0xB0 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
u'\u2560' # 0xB1 -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
u'\u2561' # 0xB2 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
u'\u0401' # 0xB3 -> CYRILLIC CAPITAL LETTER IO
u'\u0404' # 0xB4 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\u2563' # 0xB5 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
u'\u0406' # 0xB6 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
u'\u0407' # 0xB7 -> CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
u'\u2566' # 0xB8 -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
u'\u2567' # 0xB9 -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
u'\u2568' # 0xBA -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
u'\u2569' # 0xBB -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
u'\u256a' # 0xBC -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
u'\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
u'\u256c' # 0xBE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
u'\xa9' # 0xBF -> COPYRIGHT SIGN
u'\u044e' # 0xC0 -> CYRILLIC SMALL LETTER YU
u'\u0430' # 0xC1 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0xC2 -> CYRILLIC SMALL LETTER BE
u'\u0446' # 0xC3 -> CYRILLIC SMALL LETTER TSE
u'\u0434' # 0xC4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0xC5 -> CYRILLIC SMALL LETTER IE
u'\u0444' # 0xC6 -> CYRILLIC SMALL LETTER EF
u'\u0433' # 0xC7 -> CYRILLIC SMALL LETTER GHE
u'\u0445' # 0xC8 -> CYRILLIC SMALL LETTER HA
u'\u0438' # 0xC9 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0xCA -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0xCB -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0xCC -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0xCD -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0xCE -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0xCF -> CYRILLIC SMALL LETTER O
u'\u043f' # 0xD0 -> CYRILLIC SMALL LETTER PE
u'\u044f' # 0xD1 -> CYRILLIC SMALL LETTER YA
u'\u0440' # 0xD2 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0xD3 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0xD4 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0xD5 -> CYRILLIC SMALL LETTER U
u'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
u'\u0432' # 0xD7 -> CYRILLIC SMALL LETTER VE
u'\u044c' # 0xD8 -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044b' # 0xD9 -> CYRILLIC SMALL LETTER YERU
u'\u0437' # 0xDA -> CYRILLIC SMALL LETTER ZE
u'\u0448' # 0xDB -> CYRILLIC SMALL LETTER SHA
u'\u044d' # 0xDC -> CYRILLIC SMALL LETTER E
u'\u0449' # 0xDD -> CYRILLIC SMALL LETTER SHCHA
u'\u0447' # 0xDE -> CYRILLIC SMALL LETTER CHE
u'\u044a' # 0xDF -> CYRILLIC SMALL LETTER HARD SIGN
u'\u042e' # 0xE0 -> CYRILLIC CAPITAL LETTER YU
u'\u0410' # 0xE1 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0xE2 -> CYRILLIC CAPITAL LETTER BE
u'\u0426' # 0xE3 -> CYRILLIC CAPITAL LETTER TSE
u'\u0414' # 0xE4 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0xE5 -> CYRILLIC CAPITAL LETTER IE
u'\u0424' # 0xE6 -> CYRILLIC CAPITAL LETTER EF
u'\u0413' # 0xE7 -> CYRILLIC CAPITAL LETTER GHE
u'\u0425' # 0xE8 -> CYRILLIC CAPITAL LETTER HA
u'\u0418' # 0xE9 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0xEA -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0xEB -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0xEC -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0xED -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0xEE -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0xEF -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0xF0 -> CYRILLIC CAPITAL LETTER PE
u'\u042f' # 0xF1 -> CYRILLIC CAPITAL LETTER YA
u'\u0420' # 0xF2 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0xF3 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0xF4 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0xF5 -> CYRILLIC CAPITAL LETTER U
u'\u0416' # 0xF6 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0412' # 0xF7 -> CYRILLIC CAPITAL LETTER VE
u'\u042c' # 0xF8 -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042b' # 0xF9 -> CYRILLIC CAPITAL LETTER YERU
u'\u0417' # 0xFA -> CYRILLIC CAPITAL LETTER ZE
u'\u0428' # 0xFB -> CYRILLIC CAPITAL LETTER SHA
u'\u042d' # 0xFC -> CYRILLIC CAPITAL LETTER E
u'\u0429' # 0xFD -> CYRILLIC CAPITAL LETTER SHCHA
u'\u0427' # 0xFE -> CYRILLIC CAPITAL LETTER CHE
u'\u042a' # 0xFF -> CYRILLIC CAPITAL LETTER HARD SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
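# Editorial note (not part of the generated module): KOI8-U is KOI8-R with
# eight box-drawing slots reused for Ukrainian letters (0xA4/0xB4 ie,
# 0xA6/0xB6 i, 0xA7/0xB7 yi, 0xAD/0xBD ghe with upturn), so the same byte can
# decode differently under the two codecs:
#
#     >>> '\xb4'.decode('koi8-r'), '\xb4'.decode('koi8-u')
#     (u'\u2562', u'\u0404')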
""" Python 'latin-1' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):

    # Note: Binding these as C functions will result in the class not
    # converting them to methods. This is intended.
    encode = codecs.latin_1_encode
    decode = codecs.latin_1_decode

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.latin_1_encode(input,self.errors)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.latin_1_decode(input,self.errors)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

class StreamConverter(StreamWriter,StreamReader):

    encode = codecs.latin_1_decode
    decode = codecs.latin_1_encode

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='iso8859-1',
        encode=Codec.encode,
        decode=Codec.decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
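# Editorial note (not part of the module): unlike the charmap codecs above,
# latin-1 binds the C-level functions directly as class attributes; built-in
# functions are not descriptors, so they are not converted to bound methods
# and Codec.encode can be handed to CodecInfo as-is. The C functions return
# (result, length_consumed) pairs:
#
#     >>> import codecs
#     >>> codecs.latin_1_encode(u'caf\xe9')
#     ('caf\xe9', 4)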
""" Python Character Mapping Codec generated from 'VENDORS/APPLE/ARABIC.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-arabic',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x0081: 0x00a0, # NO-BREAK SPACE, right-left
0x0082: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0083: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0084: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x0085: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x0086: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x0087: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x0088: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0089: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x008a: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x008b: 0x06ba, # ARABIC LETTER NOON GHUNNA
0x008c: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x008d: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x008e: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x008f: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x0090: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0091: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x0092: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x0093: 0x2026, # HORIZONTAL ELLIPSIS, right-left
0x0094: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x0095: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x0096: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x0097: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x0098: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x0099: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x009a: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x009b: 0x00f7, # DIVISION SIGN, right-left
0x009c: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x009d: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x009e: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x009f: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x00a0: 0x0020, # SPACE, right-left
0x00a1: 0x0021, # EXCLAMATION MARK, right-left
0x00a2: 0x0022, # QUOTATION MARK, right-left
0x00a3: 0x0023, # NUMBER SIGN, right-left
0x00a4: 0x0024, # DOLLAR SIGN, right-left
0x00a5: 0x066a, # ARABIC PERCENT SIGN
0x00a6: 0x0026, # AMPERSAND, right-left
0x00a7: 0x0027, # APOSTROPHE, right-left
0x00a8: 0x0028, # LEFT PARENTHESIS, right-left
0x00a9: 0x0029, # RIGHT PARENTHESIS, right-left
0x00aa: 0x002a, # ASTERISK, right-left
0x00ab: 0x002b, # PLUS SIGN, right-left
0x00ac: 0x060c, # ARABIC COMMA
0x00ad: 0x002d, # HYPHEN-MINUS, right-left
0x00ae: 0x002e, # FULL STOP, right-left
0x00af: 0x002f, # SOLIDUS, right-left
0x00b0: 0x0660, # ARABIC-INDIC DIGIT ZERO, right-left (need override)
0x00b1: 0x0661, # ARABIC-INDIC DIGIT ONE, right-left (need override)
0x00b2: 0x0662, # ARABIC-INDIC DIGIT TWO, right-left (need override)
0x00b3: 0x0663, # ARABIC-INDIC DIGIT THREE, right-left (need override)
0x00b4: 0x0664, # ARABIC-INDIC DIGIT FOUR, right-left (need override)
0x00b5: 0x0665, # ARABIC-INDIC DIGIT FIVE, right-left (need override)
0x00b6: 0x0666, # ARABIC-INDIC DIGIT SIX, right-left (need override)
0x00b7: 0x0667, # ARABIC-INDIC DIGIT SEVEN, right-left (need override)
0x00b8: 0x0668, # ARABIC-INDIC DIGIT EIGHT, right-left (need override)
0x00b9: 0x0669, # ARABIC-INDIC DIGIT NINE, right-left (need override)
0x00ba: 0x003a, # COLON, right-left
0x00bb: 0x061b, # ARABIC SEMICOLON
0x00bc: 0x003c, # LESS-THAN SIGN, right-left
0x00bd: 0x003d, # EQUALS SIGN, right-left
0x00be: 0x003e, # GREATER-THAN SIGN, right-left
0x00bf: 0x061f, # ARABIC QUESTION MARK
0x00c0: 0x274a, # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
0x00c1: 0x0621, # ARABIC LETTER HAMZA
0x00c2: 0x0622, # ARABIC LETTER ALEF WITH MADDA ABOVE
0x00c3: 0x0623, # ARABIC LETTER ALEF WITH HAMZA ABOVE
0x00c4: 0x0624, # ARABIC LETTER WAW WITH HAMZA ABOVE
0x00c5: 0x0625, # ARABIC LETTER ALEF WITH HAMZA BELOW
0x00c6: 0x0626, # ARABIC LETTER YEH WITH HAMZA ABOVE
0x00c7: 0x0627, # ARABIC LETTER ALEF
0x00c8: 0x0628, # ARABIC LETTER BEH
0x00c9: 0x0629, # ARABIC LETTER TEH MARBUTA
0x00ca: 0x062a, # ARABIC LETTER TEH
0x00cb: 0x062b, # ARABIC LETTER THEH
0x00cc: 0x062c, # ARABIC LETTER JEEM
0x00cd: 0x062d, # ARABIC LETTER HAH
0x00ce: 0x062e, # ARABIC LETTER KHAH
0x00cf: 0x062f, # ARABIC LETTER DAL
0x00d0: 0x0630, # ARABIC LETTER THAL
0x00d1: 0x0631, # ARABIC LETTER REH
0x00d2: 0x0632, # ARABIC LETTER ZAIN
0x00d3: 0x0633, # ARABIC LETTER SEEN
0x00d4: 0x0634, # ARABIC LETTER SHEEN
0x00d5: 0x0635, # ARABIC LETTER SAD
0x00d6: 0x0636, # ARABIC LETTER DAD
0x00d7: 0x0637, # ARABIC LETTER TAH
0x00d8: 0x0638, # ARABIC LETTER ZAH
0x00d9: 0x0639, # ARABIC LETTER AIN
0x00da: 0x063a, # ARABIC LETTER GHAIN
0x00db: 0x005b, # LEFT SQUARE BRACKET, right-left
0x00dc: 0x005c, # REVERSE SOLIDUS, right-left
0x00dd: 0x005d, # RIGHT SQUARE BRACKET, right-left
0x00de: 0x005e, # CIRCUMFLEX ACCENT, right-left
0x00df: 0x005f, # LOW LINE, right-left
0x00e0: 0x0640, # ARABIC TATWEEL
0x00e1: 0x0641, # ARABIC LETTER FEH
0x00e2: 0x0642, # ARABIC LETTER QAF
0x00e3: 0x0643, # ARABIC LETTER KAF
0x00e4: 0x0644, # ARABIC LETTER LAM
0x00e5: 0x0645, # ARABIC LETTER MEEM
0x00e6: 0x0646, # ARABIC LETTER NOON
0x00e7: 0x0647, # ARABIC LETTER HEH
0x00e8: 0x0648, # ARABIC LETTER WAW
0x00e9: 0x0649, # ARABIC LETTER ALEF MAKSURA
0x00ea: 0x064a, # ARABIC LETTER YEH
0x00eb: 0x064b, # ARABIC FATHATAN
0x00ec: 0x064c, # ARABIC DAMMATAN
0x00ed: 0x064d, # ARABIC KASRATAN
0x00ee: 0x064e, # ARABIC FATHA
0x00ef: 0x064f, # ARABIC DAMMA
0x00f0: 0x0650, # ARABIC KASRA
0x00f1: 0x0651, # ARABIC SHADDA
0x00f2: 0x0652, # ARABIC SUKUN
0x00f3: 0x067e, # ARABIC LETTER PEH
0x00f4: 0x0679, # ARABIC LETTER TTEH
0x00f5: 0x0686, # ARABIC LETTER TCHEH
0x00f6: 0x06d5, # ARABIC LETTER AE
0x00f7: 0x06a4, # ARABIC LETTER VEH
0x00f8: 0x06af, # ARABIC LETTER GAF
0x00f9: 0x0688, # ARABIC LETTER DDAL
0x00fa: 0x0691, # ARABIC LETTER RREH
0x00fb: 0x007b, # LEFT CURLY BRACKET, right-left
0x00fc: 0x007c, # VERTICAL LINE, right-left
0x00fd: 0x007d, # RIGHT CURLY BRACKET, right-left
0x00fe: 0x0698, # ARABIC LETTER JEH
0x00ff: 0x06d2, # ARABIC LETTER YEH BARREE
})
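# Editorial note (not part of the generated module): mac-arabic is older
# gencodec output, so it carries both this dict-based decoding_map and the
# string decoding_table that follows; charmap_decode() accepts either form,
# and the encoding_map consulted by Codec.encode above appears to be built
# from this dict later in the file (not shown in this excerpt). A sketch
# using the dict directly:
#
#     >>> import codecs
#     >>> codecs.charmap_decode('\x8b', 'strict', decoding_map)
#     (u'\u06ba', 1)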
### Decoding Table
decoding_table = (
u'\x00' # 0x0000 -> CONTROL CHARACTER
u'\x01' # 0x0001 -> CONTROL CHARACTER
u'\x02' # 0x0002 -> CONTROL CHARACTER
u'\x03' # 0x0003 -> CONTROL CHARACTER
u'\x04' # 0x0004 -> CONTROL CHARACTER
u'\x05' # 0x0005 -> CONTROL CHARACTER
u'\x06' # 0x0006 -> CONTROL CHARACTER
u'\x07' # 0x0007 -> CONTROL CHARACTER
u'\x08' # 0x0008 -> CONTROL CHARACTER
u'\t' # 0x0009 -> CONTROL CHARACTER
u'\n' # 0x000a -> CONTROL CHARACTER
u'\x0b' # 0x000b -> CONTROL CHARACTER
u'\x0c' # 0x000c -> CONTROL CHARACTER
u'\r' # 0x000d -> CONTROL CHARACTER
u'\x0e' # 0x000e -> CONTROL CHARACTER
u'\x0f' # 0x000f -> CONTROL CHARACTER
u'\x10' # 0x0010 -> CONTROL CHARACTER
u'\x11' # 0x0011 -> CONTROL CHARACTER
u'\x12' # 0x0012 -> CONTROL CHARACTER
u'\x13' # 0x0013 -> CONTROL CHARACTER
u'\x14' # 0x0014 -> CONTROL CHARACTER
u'\x15' # 0x0015 -> CONTROL CHARACTER
u'\x16' # 0x0016 -> CONTROL CHARACTER
u'\x17' # 0x0017 -> CONTROL CHARACTER
u'\x18' # 0x0018 -> CONTROL CHARACTER
u'\x19' # 0x0019 -> CONTROL CHARACTER
u'\x1a' # 0x001a -> CONTROL CHARACTER
u'\x1b' # 0x001b -> CONTROL CHARACTER
u'\x1c' # 0x001c -> CONTROL CHARACTER
u'\x1d' # 0x001d -> CONTROL CHARACTER
u'\x1e' # 0x001e -> CONTROL CHARACTER
u'\x1f' # 0x001f -> CONTROL CHARACTER
u' ' # 0x0020 -> SPACE, left-right
u'!' # 0x0021 -> EXCLAMATION MARK, left-right
u'"' # 0x0022 -> QUOTATION MARK, left-right
u'#' # 0x0023 -> NUMBER SIGN, left-right
u'$' # 0x0024 -> DOLLAR SIGN, left-right
u'%' # 0x0025 -> PERCENT SIGN, left-right
u'&' # 0x0026 -> AMPERSAND, left-right
u"'" # 0x0027 -> APOSTROPHE, left-right
u'(' # 0x0028 -> LEFT PARENTHESIS, left-right
u')' # 0x0029 -> RIGHT PARENTHESIS, left-right
u'*' # 0x002a -> ASTERISK, left-right
u'+' # 0x002b -> PLUS SIGN, left-right
u',' # 0x002c -> COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
u'-' # 0x002d -> HYPHEN-MINUS, left-right
u'.' # 0x002e -> FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
u'/' # 0x002f -> SOLIDUS, left-right
u'0' # 0x0030 -> DIGIT ZERO; in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
u'1' # 0x0031 -> DIGIT ONE; in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
u'2' # 0x0032 -> DIGIT TWO; in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
u'3' # 0x0033 -> DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
u'4' # 0x0034 -> DIGIT FOUR; in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
u'5' # 0x0035 -> DIGIT FIVE; in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
u'6' # 0x0036 -> DIGIT SIX; in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
u'7' # 0x0037 -> DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
u'8' # 0x0038 -> DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
u'9' # 0x0039 -> DIGIT NINE; in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
u':' # 0x003a -> COLON, left-right
u';' # 0x003b -> SEMICOLON, left-right
u'<' # 0x003c -> LESS-THAN SIGN, left-right
u'=' # 0x003d -> EQUALS SIGN, left-right
u'>' # 0x003e -> GREATER-THAN SIGN, left-right
u'?' # 0x003f -> QUESTION MARK, left-right
u'@' # 0x0040 -> COMMERCIAL AT
u'A' # 0x0041 -> LATIN CAPITAL LETTER A
u'B' # 0x0042 -> LATIN CAPITAL LETTER B
u'C' # 0x0043 -> LATIN CAPITAL LETTER C
u'D' # 0x0044 -> LATIN CAPITAL LETTER D
u'E' # 0x0045 -> LATIN CAPITAL LETTER E
u'F' # 0x0046 -> LATIN CAPITAL LETTER F
u'G' # 0x0047 -> LATIN CAPITAL LETTER G
u'H' # 0x0048 -> LATIN CAPITAL LETTER H
u'I' # 0x0049 -> LATIN CAPITAL LETTER I
u'J' # 0x004a -> LATIN CAPITAL LETTER J
u'K' # 0x004b -> LATIN CAPITAL LETTER K
u'L' # 0x004c -> LATIN CAPITAL LETTER L
u'M' # 0x004d -> LATIN CAPITAL LETTER M
u'N' # 0x004e -> LATIN CAPITAL LETTER N
u'O' # 0x004f -> LATIN CAPITAL LETTER O
u'P' # 0x0050 -> LATIN CAPITAL LETTER P
u'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
u'R' # 0x0052 -> LATIN CAPITAL LETTER R
u'S' # 0x0053 -> LATIN CAPITAL LETTER S
u'T' # 0x0054 -> LATIN CAPITAL LETTER T
u'U' # 0x0055 -> LATIN CAPITAL LETTER U
u'V' # 0x0056 -> LATIN CAPITAL LETTER V
u'W' # 0x0057 -> LATIN CAPITAL LETTER W
u'X' # 0x0058 -> LATIN CAPITAL LETTER X
u'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
u'Z' # 0x005a -> LATIN CAPITAL LETTER Z
u'[' # 0x005b -> LEFT SQUARE BRACKET, left-right
u'\\' # 0x005c -> REVERSE SOLIDUS, left-right
u']' # 0x005d -> RIGHT SQUARE BRACKET, left-right
u'^' # 0x005e -> CIRCUMFLEX ACCENT, left-right
u'_' # 0x005f -> LOW LINE, left-right
u'`' # 0x0060 -> GRAVE ACCENT
u'a' # 0x0061 -> LATIN SMALL LETTER A
u'b' # 0x0062 -> LATIN SMALL LETTER B
u'c' # 0x0063 -> LATIN SMALL LETTER C
u'd' # 0x0064 -> LATIN SMALL LETTER D
u'e' # 0x0065 -> LATIN SMALL LETTER E
u'f' # 0x0066 -> LATIN SMALL LETTER F
u'g' # 0x0067 -> LATIN SMALL LETTER G
u'h' # 0x0068 -> LATIN SMALL LETTER H
u'i' # 0x0069 -> LATIN SMALL LETTER I
u'j' # 0x006a -> LATIN SMALL LETTER J
u'k' # 0x006b -> LATIN SMALL LETTER K
u'l' # 0x006c -> LATIN SMALL LETTER L
u'm' # 0x006d -> LATIN SMALL LETTER M
u'n' # 0x006e -> LATIN SMALL LETTER N
u'o' # 0x006f -> LATIN SMALL LETTER O
u'p' # 0x0070 -> LATIN SMALL LETTER P
u'q' # 0x0071 -> LATIN SMALL LETTER Q
u'r' # 0x0072 -> LATIN SMALL LETTER R
u's' # 0x0073 -> LATIN SMALL LETTER S
u't' # 0x0074 -> LATIN SMALL LETTER T
u'u' # 0x0075 -> LATIN SMALL LETTER U
u'v' # 0x0076 -> LATIN SMALL LETTER V
u'w' # 0x0077 -> LATIN SMALL LETTER W
u'x' # 0x0078 -> LATIN SMALL LETTER X
u'y' # 0x0079 -> LATIN SMALL LETTER Y
u'z' # 0x007a -> LATIN SMALL LETTER Z
u'{' # 0x007b -> LEFT CURLY BRACKET, left-right
u'|' # 0x007c -> VERTICAL LINE, left-right
u'}' # 0x007d -> RIGHT CURLY BRACKET, left-right
u'~' # 0x007e -> TILDE
u'\x7f' # 0x007f -> CONTROL CHARACTER
u'\xc4' # 0x0080 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xa0' # 0x0081 -> NO-BREAK SPACE, right-left
u'\xc7' # 0x0082 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x0083 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x0084 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x0085 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x0086 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x0087 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x0088 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x0089 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x008a -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u06ba' # 0x008b -> ARABIC LETTER NOON GHUNNA
u'\xab' # 0x008c -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
u'\xe7' # 0x008d -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x008e -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x008f -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x0090 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x0091 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x0092 -> LATIN SMALL LETTER I WITH ACUTE
u'\u2026' # 0x0093 -> HORIZONTAL ELLIPSIS, right-left
u'\xee' # 0x0094 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x0095 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x0096 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x0097 -> LATIN SMALL LETTER O WITH ACUTE
u'\xbb' # 0x0098 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
u'\xf4' # 0x0099 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x009a -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0x009b -> DIVISION SIGN, right-left
u'\xfa' # 0x009c -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x009d -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x009e -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x009f -> LATIN SMALL LETTER U WITH DIAERESIS
u' ' # 0x00a0 -> SPACE, right-left
u'!' # 0x00a1 -> EXCLAMATION MARK, right-left
u'"' # 0x00a2 -> QUOTATION MARK, right-left
u'#' # 0x00a3 -> NUMBER SIGN, right-left
u'$' # 0x00a4 -> DOLLAR SIGN, right-left
u'\u066a' # 0x00a5 -> ARABIC PERCENT SIGN
u'&' # 0x00a6 -> AMPERSAND, right-left
u"'" # 0x00a7 -> APOSTROPHE, right-left
u'(' # 0x00a8 -> LEFT PARENTHESIS, right-left
u')' # 0x00a9 -> RIGHT PARENTHESIS, right-left
u'*' # 0x00aa -> ASTERISK, right-left
u'+' # 0x00ab -> PLUS SIGN, right-left
u'\u060c' # 0x00ac -> ARABIC COMMA
u'-' # 0x00ad -> HYPHEN-MINUS, right-left
u'.' # 0x00ae -> FULL STOP, right-left
u'/' # 0x00af -> SOLIDUS, right-left
u'\u0660' # 0x00b0 -> ARABIC-INDIC DIGIT ZERO, right-left (need override)
u'\u0661' # 0x00b1 -> ARABIC-INDIC DIGIT ONE, right-left (need override)
u'\u0662' # 0x00b2 -> ARABIC-INDIC DIGIT TWO, right-left (need override)
u'\u0663' # 0x00b3 -> ARABIC-INDIC DIGIT THREE, right-left (need override)
u'\u0664' # 0x00b4 -> ARABIC-INDIC DIGIT FOUR, right-left (need override)
u'\u0665' # 0x00b5 -> ARABIC-INDIC DIGIT FIVE, right-left (need override)
u'\u0666' # 0x00b6 -> ARABIC-INDIC DIGIT SIX, right-left (need override)
u'\u0667' # 0x00b7 -> ARABIC-INDIC DIGIT SEVEN, right-left (need override)
u'\u0668' # 0x00b8 -> ARABIC-INDIC DIGIT EIGHT, right-left (need override)
u'\u0669' # 0x00b9 -> ARABIC-INDIC DIGIT NINE, right-left (need override)
u':' # 0x00ba -> COLON, right-left
u'\u061b' # 0x00bb -> ARABIC SEMICOLON
u'<' # 0x00bc -> LESS-THAN SIGN, right-left
u'=' # 0x00bd -> EQUALS SIGN, right-left
u'>' # 0x00be -> GREATER-THAN SIGN, right-left
u'\u061f' # 0x00bf -> ARABIC QUESTION MARK
u'\u274a' # 0x00c0 -> EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
u'\u0621' # 0x00c1 -> ARABIC LETTER HAMZA
u'\u0622' # 0x00c2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
u'\u0623' # 0x00c3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
u'\u0624' # 0x00c4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
u'\u0625' # 0x00c5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
u'\u0626' # 0x00c6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
u'\u0627' # 0x00c7 -> ARABIC LETTER ALEF
u'\u0628' # 0x00c8 -> ARABIC LETTER BEH
u'\u0629' # 0x00c9 -> ARABIC LETTER TEH MARBUTA
u'\u062a' # 0x00ca -> ARABIC LETTER TEH
u'\u062b' # 0x00cb -> ARABIC LETTER THEH
u'\u062c' # 0x00cc -> ARABIC LETTER JEEM
u'\u062d' # 0x00cd -> ARABIC LETTER HAH
u'\u062e' # 0x00ce -> ARABIC LETTER KHAH
u'\u062f' # 0x00cf -> ARABIC LETTER DAL
u'\u0630' # 0x00d0 -> ARABIC LETTER THAL
u'\u0631' # 0x00d1 -> ARABIC LETTER REH
u'\u0632' # 0x00d2 -> ARABIC LETTER ZAIN
u'\u0633' # 0x00d3 -> ARABIC LETTER SEEN
u'\u0634' # 0x00d4 -> ARABIC LETTER SHEEN
u'\u0635' # 0x00d5 -> ARABIC LETTER SAD
u'\u0636' # 0x00d6 -> ARABIC LETTER DAD
u'\u0637' # 0x00d7 -> ARABIC LETTER TAH
u'\u0638' # 0x00d8 -> ARABIC LETTER ZAH
u'\u0639' # 0x00d9 -> ARABIC LETTER AIN
u'\u063a' # 0x00da -> ARABIC LETTER GHAIN
u'[' # 0x00db -> LEFT SQUARE BRACKET, right-left
u'\\' # 0x00dc -> REVERSE SOLIDUS, right-left
u']' # 0x00dd -> RIGHT SQUARE BRACKET, right-left
u'^' # 0x00de -> CIRCUMFLEX ACCENT, right-left
u'_' # 0x00df -> LOW LINE, right-left
u'\u0640' # 0x00e0 -> ARABIC TATWEEL
u'\u0641' # 0x00e1 -> ARABIC LETTER FEH
u'\u0642' # 0x00e2 -> ARABIC LETTER QAF
u'\u0643' # 0x00e3 -> ARABIC LETTER KAF
u'\u0644' # 0x00e4 -> ARABIC LETTER LAM
u'\u0645' # 0x00e5 -> ARABIC LETTER MEEM
u'\u0646' # 0x00e6 -> ARABIC LETTER NOON
u'\u0647' # 0x00e7 -> ARABIC LETTER HEH
u'\u0648' # 0x00e8 -> ARABIC LETTER WAW
u'\u0649' # 0x00e9 -> ARABIC LETTER ALEF MAKSURA
u'\u064a' # 0x00ea -> ARABIC LETTER YEH
u'\u064b' # 0x00eb -> ARABIC FATHATAN
u'\u064c' # 0x00ec -> ARABIC DAMMATAN
u'\u064d' # 0x00ed -> ARABIC KASRATAN
u'\u064e' # 0x00ee -> ARABIC FATHA
u'\u064f' # 0x00ef -> ARABIC DAMMA
u'\u0650' # 0x00f0 -> ARABIC KASRA
u'\u0651' # 0x00f1 -> ARABIC SHADDA
u'\u0652' # 0x00f2 -> ARABIC SUKUN
u'\u067e' # 0x00f3 -> ARABIC LETTER PEH
u'\u0679' # 0x00f4 -> ARABIC LETTER TTEH
u'\u0686' # 0x00f5 -> ARABIC LETTER TCHEH
u'\u06d5' # 0x00f6 -> ARABIC LETTER AE
u'\u06a4' # 0x00f7 -> ARABIC LETTER VEH
u'\u06af' # 0x00f8 -> ARABIC LETTER GAF
u'\u0688' # 0x00f9 -> ARABIC LETTER DDAL
u'\u0691' # 0x00fa -> ARABIC LETTER RREH
u'{' # 0x00fb -> LEFT CURLY BRACKET, right-left
u'|' # 0x00fc -> VERTICAL LINE, right-left
u'}' # 0x00fd -> RIGHT CURLY BRACKET, right-left
u'\u0698' # 0x00fe -> ARABIC LETTER JEH
u'\u06d2' # 0x00ff -> ARABIC LETTER YEH BARREE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # CONTROL CHARACTER
0x0001: 0x0001, # CONTROL CHARACTER
0x0002: 0x0002, # CONTROL CHARACTER
0x0003: 0x0003, # CONTROL CHARACTER
0x0004: 0x0004, # CONTROL CHARACTER
0x0005: 0x0005, # CONTROL CHARACTER
0x0006: 0x0006, # CONTROL CHARACTER
0x0007: 0x0007, # CONTROL CHARACTER
0x0008: 0x0008, # CONTROL CHARACTER
0x0009: 0x0009, # CONTROL CHARACTER
0x000a: 0x000a, # CONTROL CHARACTER
0x000b: 0x000b, # CONTROL CHARACTER
0x000c: 0x000c, # CONTROL CHARACTER
0x000d: 0x000d, # CONTROL CHARACTER
0x000e: 0x000e, # CONTROL CHARACTER
0x000f: 0x000f, # CONTROL CHARACTER
0x0010: 0x0010, # CONTROL CHARACTER
0x0011: 0x0011, # CONTROL CHARACTER
0x0012: 0x0012, # CONTROL CHARACTER
0x0013: 0x0013, # CONTROL CHARACTER
0x0014: 0x0014, # CONTROL CHARACTER
0x0015: 0x0015, # CONTROL CHARACTER
0x0016: 0x0016, # CONTROL CHARACTER
0x0017: 0x0017, # CONTROL CHARACTER
0x0018: 0x0018, # CONTROL CHARACTER
0x0019: 0x0019, # CONTROL CHARACTER
0x001a: 0x001a, # CONTROL CHARACTER
0x001b: 0x001b, # CONTROL CHARACTER
0x001c: 0x001c, # CONTROL CHARACTER
0x001d: 0x001d, # CONTROL CHARACTER
0x001e: 0x001e, # CONTROL CHARACTER
0x001f: 0x001f, # CONTROL CHARACTER
0x0020: 0x0020, # SPACE, left-right
0x0020: 0x00a0, # SPACE, right-left
0x0021: 0x0021, # EXCLAMATION MARK, left-right
0x0021: 0x00a1, # EXCLAMATION MARK, right-left
0x0022: 0x0022, # QUOTATION MARK, left-right
0x0022: 0x00a2, # QUOTATION MARK, right-left
0x0023: 0x0023, # NUMBER SIGN, left-right
0x0023: 0x00a3, # NUMBER SIGN, right-left
0x0024: 0x0024, # DOLLAR SIGN, left-right
0x0024: 0x00a4, # DOLLAR SIGN, right-left
0x0025: 0x0025, # PERCENT SIGN, left-right
0x0026: 0x0026, # AMPERSAND, left-right
0x0026: 0x00a6, # AMPERSAND, right-left
0x0027: 0x0027, # APOSTROPHE, left-right
0x0027: 0x00a7, # APOSTROPHE, right-left
0x0028: 0x0028, # LEFT PARENTHESIS, left-right
0x0028: 0x00a8, # LEFT PARENTHESIS, right-left
0x0029: 0x0029, # RIGHT PARENTHESIS, left-right
0x0029: 0x00a9, # RIGHT PARENTHESIS, right-left
0x002a: 0x002a, # ASTERISK, left-right
0x002a: 0x00aa, # ASTERISK, right-left
0x002b: 0x002b, # PLUS SIGN, left-right
0x002b: 0x00ab, # PLUS SIGN, right-left
0x002c: 0x002c, # COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
0x002d: 0x002d, # HYPHEN-MINUS, left-right
0x002d: 0x00ad, # HYPHEN-MINUS, right-left
0x002e: 0x002e, # FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
0x002e: 0x00ae, # FULL STOP, right-left
0x002f: 0x002f, # SOLIDUS, left-right
0x002f: 0x00af, # SOLIDUS, right-left
0x0030: 0x0030, # DIGIT ZERO; in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE; in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
0x0032: 0x0032, # DIGIT TWO; in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
0x0033: 0x0033, # DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR; in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE; in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX; in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE; in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
0x003a: 0x003a, # COLON, left-right
0x003a: 0x00ba, # COLON, right-left
0x003b: 0x003b, # SEMICOLON, left-right
0x003c: 0x003c, # LESS-THAN SIGN, left-right
0x003c: 0x00bc, # LESS-THAN SIGN, right-left
0x003d: 0x003d, # EQUALS SIGN, left-right
0x003d: 0x00bd, # EQUALS SIGN, right-left
0x003e: 0x003e, # GREATER-THAN SIGN, left-right
0x003e: 0x00be, # GREATER-THAN SIGN, right-left
0x003f: 0x003f, # QUESTION MARK, left-right
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET, left-right
0x005b: 0x00db, # LEFT SQUARE BRACKET, right-left
0x005c: 0x005c, # REVERSE SOLIDUS, left-right
0x005c: 0x00dc, # REVERSE SOLIDUS, right-left
0x005d: 0x005d, # RIGHT SQUARE BRACKET, left-right
0x005d: 0x00dd, # RIGHT SQUARE BRACKET, right-left
0x005e: 0x005e, # CIRCUMFLEX ACCENT, left-right
0x005e: 0x00de, # CIRCUMFLEX ACCENT, right-left
0x005f: 0x005f, # LOW LINE, left-right
0x005f: 0x00df, # LOW LINE, right-left
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET, left-right
0x007b: 0x00fb, # LEFT CURLY BRACKET, right-left
0x007c: 0x007c, # VERTICAL LINE, left-right
0x007c: 0x00fc, # VERTICAL LINE, right-left
0x007d: 0x007d, # RIGHT CURLY BRACKET, left-right
0x007d: 0x00fd, # RIGHT CURLY BRACKET, right-left
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # CONTROL CHARACTER
0x00a0: 0x0081, # NO-BREAK SPACE, right-left
0x00ab: 0x008c, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x00bb: 0x0098, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x00c4: 0x0080, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c7: 0x0082, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0083, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x0084, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0085, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00dc: 0x0086, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00e0: 0x0088, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x0087, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0089, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x008a, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e7: 0x008d, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008f, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x008e, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0090, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0091, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x0092, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x0094, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x0095, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x0096, # LATIN SMALL LETTER N WITH TILDE
0x00f3: 0x0097, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0099, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x009a, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x009b, # DIVISION SIGN, right-left
0x00f9: 0x009d, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x009c, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x009e, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x009f, # LATIN SMALL LETTER U WITH DIAERESIS
0x060c: 0x00ac, # ARABIC COMMA
0x061b: 0x00bb, # ARABIC SEMICOLON
0x061f: 0x00bf, # ARABIC QUESTION MARK
0x0621: 0x00c1, # ARABIC LETTER HAMZA
0x0622: 0x00c2, # ARABIC LETTER ALEF WITH MADDA ABOVE
0x0623: 0x00c3, # ARABIC LETTER ALEF WITH HAMZA ABOVE
0x0624: 0x00c4, # ARABIC LETTER WAW WITH HAMZA ABOVE
0x0625: 0x00c5, # ARABIC LETTER ALEF WITH HAMZA BELOW
0x0626: 0x00c6, # ARABIC LETTER YEH WITH HAMZA ABOVE
0x0627: 0x00c7, # ARABIC LETTER ALEF
0x0628: 0x00c8, # ARABIC LETTER BEH
0x0629: 0x00c9, # ARABIC LETTER TEH MARBUTA
0x062a: 0x00ca, # ARABIC LETTER TEH
0x062b: 0x00cb, # ARABIC LETTER THEH
0x062c: 0x00cc, # ARABIC LETTER JEEM
0x062d: 0x00cd, # ARABIC LETTER HAH
0x062e: 0x00ce, # ARABIC LETTER KHAH
0x062f: 0x00cf, # ARABIC LETTER DAL
0x0630: 0x00d0, # ARABIC LETTER THAL
0x0631: 0x00d1, # ARABIC LETTER REH
0x0632: 0x00d2, # ARABIC LETTER ZAIN
0x0633: 0x00d3, # ARABIC LETTER SEEN
0x0634: 0x00d4, # ARABIC LETTER SHEEN
0x0635: 0x00d5, # ARABIC LETTER SAD
0x0636: 0x00d6, # ARABIC LETTER DAD
0x0637: 0x00d7, # ARABIC LETTER TAH
0x0638: 0x00d8, # ARABIC LETTER ZAH
0x0639: 0x00d9, # ARABIC LETTER AIN
0x063a: 0x00da, # ARABIC LETTER GHAIN
0x0640: 0x00e0, # ARABIC TATWEEL
0x0641: 0x00e1, # ARABIC LETTER FEH
0x0642: 0x00e2, # ARABIC LETTER QAF
0x0643: 0x00e3, # ARABIC LETTER KAF
0x0644: 0x00e4, # ARABIC LETTER LAM
0x0645: 0x00e5, # ARABIC LETTER MEEM
0x0646: 0x00e6, # ARABIC LETTER NOON
0x0647: 0x00e7, # ARABIC LETTER HEH
0x0648: 0x00e8, # ARABIC LETTER WAW
0x0649: 0x00e9, # ARABIC LETTER ALEF MAKSURA
0x064a: 0x00ea, # ARABIC LETTER YEH
0x064b: 0x00eb, # ARABIC FATHATAN
0x064c: 0x00ec, # ARABIC DAMMATAN
0x064d: 0x00ed, # ARABIC KASRATAN
0x064e: 0x00ee, # ARABIC FATHA
0x064f: 0x00ef, # ARABIC DAMMA
0x0650: 0x00f0, # ARABIC KASRA
0x0651: 0x00f1, # ARABIC SHADDA
0x0652: 0x00f2, # ARABIC SUKUN
0x0660: 0x00b0, # ARABIC-INDIC DIGIT ZERO, right-left (need override)
0x0661: 0x00b1, # ARABIC-INDIC DIGIT ONE, right-left (need override)
0x0662: 0x00b2, # ARABIC-INDIC DIGIT TWO, right-left (need override)
0x0663: 0x00b3, # ARABIC-INDIC DIGIT THREE, right-left (need override)
0x0664: 0x00b4, # ARABIC-INDIC DIGIT FOUR, right-left (need override)
0x0665: 0x00b5, # ARABIC-INDIC DIGIT FIVE, right-left (need override)
0x0666: 0x00b6, # ARABIC-INDIC DIGIT SIX, right-left (need override)
0x0667: 0x00b7, # ARABIC-INDIC DIGIT SEVEN, right-left (need override)
0x0668: 0x00b8, # ARABIC-INDIC DIGIT EIGHT, right-left (need override)
0x0669: 0x00b9, # ARABIC-INDIC DIGIT NINE, right-left (need override)
0x066a: 0x00a5, # ARABIC PERCENT SIGN
0x0679: 0x00f4, # ARABIC LETTER TTEH
0x067e: 0x00f3, # ARABIC LETTER PEH
0x0686: 0x00f5, # ARABIC LETTER TCHEH
0x0688: 0x00f9, # ARABIC LETTER DDAL
0x0691: 0x00fa, # ARABIC LETTER RREH
0x0698: 0x00fe, # ARABIC LETTER JEH
0x06a4: 0x00f7, # ARABIC LETTER VEH
0x06af: 0x00f8, # ARABIC LETTER GAF
0x06ba: 0x008b, # ARABIC LETTER NOON GHUNNA
0x06d2: 0x00ff, # ARABIC LETTER YEH BARREE
0x06d5: 0x00f6, # ARABIC LETTER AE
0x2026: 0x0093, # HORIZONTAL ELLIPSIS, right-left
0x274a: 0x00c0, # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
}
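A detail worth noting about the encoding_map literal that just closed: many ASCII code points appear as two key-value pairs, one for the left-right byte and one for the right-left byte (for example 0x0020 maps to both 0x0020 and 0x00a0). A Python dict literal keeps only the last value for a duplicated key, so it is the right-left byte that codecs.charmap_encode actually emits for those characters. A minimal sketch of that behaviour, using values copied from the table above (the demo names are illustrative, not part of the generated file):

import codecs

demo_map = {
    0x0020: 0x0020,  # SPACE, left-right
    0x0020: 0x00a0,  # SPACE, right-left -- the later duplicate wins
}
assert demo_map[0x0020] == 0x00a0
# charmap_encode therefore produces the right-left byte for a plain space:
assert codecs.charmap_encode(u' ', 'strict', demo_map) == ('\xa0', 1)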
""" Python Character Mapping Codec mac_centeuro generated from 'MAPPINGS/VENDORS/APPLE/CENTEURO.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mac-centeuro',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\u0100' # 0x81 -> LATIN CAPITAL LETTER A WITH MACRON
u'\u0101' # 0x82 -> LATIN SMALL LETTER A WITH MACRON
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\u0104' # 0x84 -> LATIN CAPITAL LETTER A WITH OGONEK
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\u0105' # 0x88 -> LATIN SMALL LETTER A WITH OGONEK
u'\u010c' # 0x89 -> LATIN CAPITAL LETTER C WITH CARON
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u010d' # 0x8B -> LATIN SMALL LETTER C WITH CARON
u'\u0106' # 0x8C -> LATIN CAPITAL LETTER C WITH ACUTE
u'\u0107' # 0x8D -> LATIN SMALL LETTER C WITH ACUTE
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\u0179' # 0x8F -> LATIN CAPITAL LETTER Z WITH ACUTE
u'\u017a' # 0x90 -> LATIN SMALL LETTER Z WITH ACUTE
u'\u010e' # 0x91 -> LATIN CAPITAL LETTER D WITH CARON
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\u010f' # 0x93 -> LATIN SMALL LETTER D WITH CARON
u'\u0112' # 0x94 -> LATIN CAPITAL LETTER E WITH MACRON
u'\u0113' # 0x95 -> LATIN SMALL LETTER E WITH MACRON
u'\u0116' # 0x96 -> LATIN CAPITAL LETTER E WITH DOT ABOVE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\u0117' # 0x98 -> LATIN SMALL LETTER E WITH DOT ABOVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\u011a' # 0x9D -> LATIN CAPITAL LETTER E WITH CARON
u'\u011b' # 0x9E -> LATIN SMALL LETTER E WITH CARON
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\u0118' # 0xA2 -> LATIN CAPITAL LETTER E WITH OGONEK
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\u0119' # 0xAB -> LATIN SMALL LETTER E WITH OGONEK
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\u0123' # 0xAE -> LATIN SMALL LETTER G WITH CEDILLA
u'\u012e' # 0xAF -> LATIN CAPITAL LETTER I WITH OGONEK
u'\u012f' # 0xB0 -> LATIN SMALL LETTER I WITH OGONEK
u'\u012a' # 0xB1 -> LATIN CAPITAL LETTER I WITH MACRON
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\u012b' # 0xB4 -> LATIN SMALL LETTER I WITH MACRON
u'\u0136' # 0xB5 -> LATIN CAPITAL LETTER K WITH CEDILLA
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u0142' # 0xB8 -> LATIN SMALL LETTER L WITH STROKE
u'\u013b' # 0xB9 -> LATIN CAPITAL LETTER L WITH CEDILLA
u'\u013c' # 0xBA -> LATIN SMALL LETTER L WITH CEDILLA
u'\u013d' # 0xBB -> LATIN CAPITAL LETTER L WITH CARON
u'\u013e' # 0xBC -> LATIN SMALL LETTER L WITH CARON
u'\u0139' # 0xBD -> LATIN CAPITAL LETTER L WITH ACUTE
u'\u013a' # 0xBE -> LATIN SMALL LETTER L WITH ACUTE
u'\u0145' # 0xBF -> LATIN CAPITAL LETTER N WITH CEDILLA
u'\u0146' # 0xC0 -> LATIN SMALL LETTER N WITH CEDILLA
u'\u0143' # 0xC1 -> LATIN CAPITAL LETTER N WITH ACUTE
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0144' # 0xC4 -> LATIN SMALL LETTER N WITH ACUTE
u'\u0147' # 0xC5 -> LATIN CAPITAL LETTER N WITH CARON
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\u0148' # 0xCB -> LATIN SMALL LETTER N WITH CARON
u'\u0150' # 0xCC -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0151' # 0xCE -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
u'\u014c' # 0xCF -> LATIN CAPITAL LETTER O WITH MACRON
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\u014d' # 0xD8 -> LATIN SMALL LETTER O WITH MACRON
u'\u0154' # 0xD9 -> LATIN CAPITAL LETTER R WITH ACUTE
u'\u0155' # 0xDA -> LATIN SMALL LETTER R WITH ACUTE
u'\u0158' # 0xDB -> LATIN CAPITAL LETTER R WITH CARON
u'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u0159' # 0xDE -> LATIN SMALL LETTER R WITH CARON
u'\u0156' # 0xDF -> LATIN CAPITAL LETTER R WITH CEDILLA
u'\u0157' # 0xE0 -> LATIN SMALL LETTER R WITH CEDILLA
u'\u0160' # 0xE1 -> LATIN CAPITAL LETTER S WITH CARON
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u0161' # 0xE4 -> LATIN SMALL LETTER S WITH CARON
u'\u015a' # 0xE5 -> LATIN CAPITAL LETTER S WITH ACUTE
u'\u015b' # 0xE6 -> LATIN SMALL LETTER S WITH ACUTE
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\u0164' # 0xE8 -> LATIN CAPITAL LETTER T WITH CARON
u'\u0165' # 0xE9 -> LATIN SMALL LETTER T WITH CARON
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\u017d' # 0xEB -> LATIN CAPITAL LETTER Z WITH CARON
u'\u017e' # 0xEC -> LATIN SMALL LETTER Z WITH CARON
u'\u016a' # 0xED -> LATIN CAPITAL LETTER U WITH MACRON
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u016b' # 0xF0 -> LATIN SMALL LETTER U WITH MACRON
u'\u016e' # 0xF1 -> LATIN CAPITAL LETTER U WITH RING ABOVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\u016f' # 0xF3 -> LATIN SMALL LETTER U WITH RING ABOVE
u'\u0170' # 0xF4 -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
u'\u0171' # 0xF5 -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
u'\u0172' # 0xF6 -> LATIN CAPITAL LETTER U WITH OGONEK
u'\u0173' # 0xF7 -> LATIN SMALL LETTER U WITH OGONEK
u'\xdd' # 0xF8 -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xfd' # 0xF9 -> LATIN SMALL LETTER Y WITH ACUTE
u'\u0137' # 0xFA -> LATIN SMALL LETTER K WITH CEDILLA
u'\u017b' # 0xFB -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
u'\u0141' # 0xFC -> LATIN CAPITAL LETTER L WITH STROKE
u'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
u'\u0122' # 0xFE -> LATIN CAPITAL LETTER G WITH CEDILLA
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
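getregentry() is the hook the encodings package uses to publish a module's CodecInfo. A minimal sketch (illustrative, not part of the generated file; the search-function name is hypothetical) of wiring the mac-centeuro codec defined above into the registry by hand and round-tripping a sample through its charmap tables:

import codecs

def _search_mac_centeuro(name):
    # codecs.lookup() hands search functions the lower-cased, hyphenated name
    return getregentry() if name == 'mac-centeuro' else None

codecs.register(_search_mac_centeuro)

sample = u'\u0104\u010c\u0150'  # A with ogonek, C with caron, O with double acute
assert sample.encode('mac-centeuro').decode('mac-centeuro') == sample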
""" Python Character Mapping Codec mac_croatian generated from 'MAPPINGS/VENDORS/APPLE/CROATIAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mac-croatian',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\xb4' # 0xAB -> ACUTE ACCENT
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
u'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\u2206' # 0xB4 -> INCREMENT
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u220f' # 0xB8 -> N-ARY PRODUCT
u'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
u'\u222b' # 0xBA -> INTEGRAL
u'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
u'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
u'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
u'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
u'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
u'\xbf' # 0xC0 -> INVERTED QUESTION MARK
u'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\uf8ff' # 0xD8 -> Apple logo
u'\xa9' # 0xD9 -> COPYRIGHT SIGN
u'\u2044' # 0xDA -> FRACTION SLASH
u'\u20ac' # 0xDB -> EURO SIGN
u'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\xc6' # 0xDE -> LATIN CAPITAL LETTER AE
u'\xbb' # 0xDF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2013' # 0xE0 -> EN DASH
u'\xb7' # 0xE1 -> MIDDLE DOT
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2030' # 0xE4 -> PER MILLE SIGN
u'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
u'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
u'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
u'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u02dc' # 0xF7 -> SMALL TILDE
u'\xaf' # 0xF8 -> MACRON
u'\u03c0' # 0xF9 -> GREEK SMALL LETTER PI
u'\xcb' # 0xFA -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\u02da' # 0xFB -> RING ABOVE
u'\xb8' # 0xFC -> CEDILLA
u'\xca' # 0xFD -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xe6' # 0xFE -> LATIN SMALL LETTER AE
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
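The two tables do all the work here: decoding indexes each byte straight into the 256-character decoding_table string, while charmap_build compiles that string into the fast inverse lookup the encoder uses. A minimal sketch against the mac-croatian tables defined just above (byte values copied from the table):

import codecs

raw = '\xc6\x8d'  # LATIN CAPITAL LETTER C WITH ACUTE, LATIN SMALL LETTER C WITH CEDILLA
text, consumed = codecs.charmap_decode(raw, 'strict', decoding_table)
assert text == u'\u0106\xe7' and consumed == 2
# Encoding through the compiled table inverts the mapping exactly:
assert codecs.charmap_encode(text, 'strict', encoding_table) == (raw, 2)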
""" Python Character Mapping Codec mac_cyrillic generated from 'MAPPINGS/VENDORS/APPLE/CYRILLIC.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mac-cyrillic',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\u0410' # 0x80 -> CYRILLIC CAPITAL LETTER A
u'\u0411' # 0x81 -> CYRILLIC CAPITAL LETTER BE
u'\u0412' # 0x82 -> CYRILLIC CAPITAL LETTER VE
u'\u0413' # 0x83 -> CYRILLIC CAPITAL LETTER GHE
u'\u0414' # 0x84 -> CYRILLIC CAPITAL LETTER DE
u'\u0415' # 0x85 -> CYRILLIC CAPITAL LETTER IE
u'\u0416' # 0x86 -> CYRILLIC CAPITAL LETTER ZHE
u'\u0417' # 0x87 -> CYRILLIC CAPITAL LETTER ZE
u'\u0418' # 0x88 -> CYRILLIC CAPITAL LETTER I
u'\u0419' # 0x89 -> CYRILLIC CAPITAL LETTER SHORT I
u'\u041a' # 0x8A -> CYRILLIC CAPITAL LETTER KA
u'\u041b' # 0x8B -> CYRILLIC CAPITAL LETTER EL
u'\u041c' # 0x8C -> CYRILLIC CAPITAL LETTER EM
u'\u041d' # 0x8D -> CYRILLIC CAPITAL LETTER EN
u'\u041e' # 0x8E -> CYRILLIC CAPITAL LETTER O
u'\u041f' # 0x8F -> CYRILLIC CAPITAL LETTER PE
u'\u0420' # 0x90 -> CYRILLIC CAPITAL LETTER ER
u'\u0421' # 0x91 -> CYRILLIC CAPITAL LETTER ES
u'\u0422' # 0x92 -> CYRILLIC CAPITAL LETTER TE
u'\u0423' # 0x93 -> CYRILLIC CAPITAL LETTER U
u'\u0424' # 0x94 -> CYRILLIC CAPITAL LETTER EF
u'\u0425' # 0x95 -> CYRILLIC CAPITAL LETTER HA
u'\u0426' # 0x96 -> CYRILLIC CAPITAL LETTER TSE
u'\u0427' # 0x97 -> CYRILLIC CAPITAL LETTER CHE
u'\u0428' # 0x98 -> CYRILLIC CAPITAL LETTER SHA
u'\u0429' # 0x99 -> CYRILLIC CAPITAL LETTER SHCHA
u'\u042a' # 0x9A -> CYRILLIC CAPITAL LETTER HARD SIGN
u'\u042b' # 0x9B -> CYRILLIC CAPITAL LETTER YERU
u'\u042c' # 0x9C -> CYRILLIC CAPITAL LETTER SOFT SIGN
u'\u042d' # 0x9D -> CYRILLIC CAPITAL LETTER E
u'\u042e' # 0x9E -> CYRILLIC CAPITAL LETTER YU
u'\u042f' # 0x9F -> CYRILLIC CAPITAL LETTER YA
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\u0490' # 0xA2 -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\u0406' # 0xA7 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\u0402' # 0xAB -> CYRILLIC CAPITAL LETTER DJE
u'\u0452' # 0xAC -> CYRILLIC SMALL LETTER DJE
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\u0403' # 0xAE -> CYRILLIC CAPITAL LETTER GJE
u'\u0453' # 0xAF -> CYRILLIC SMALL LETTER GJE
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\u0456' # 0xB4 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u0491' # 0xB6 -> CYRILLIC SMALL LETTER GHE WITH UPTURN
u'\u0408' # 0xB7 -> CYRILLIC CAPITAL LETTER JE
u'\u0404' # 0xB8 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
u'\u0454' # 0xB9 -> CYRILLIC SMALL LETTER UKRAINIAN IE
u'\u0407' # 0xBA -> CYRILLIC CAPITAL LETTER YI
u'\u0457' # 0xBB -> CYRILLIC SMALL LETTER YI
u'\u0409' # 0xBC -> CYRILLIC CAPITAL LETTER LJE
u'\u0459' # 0xBD -> CYRILLIC SMALL LETTER LJE
u'\u040a' # 0xBE -> CYRILLIC CAPITAL LETTER NJE
u'\u045a' # 0xBF -> CYRILLIC SMALL LETTER NJE
u'\u0458' # 0xC0 -> CYRILLIC SMALL LETTER JE
u'\u0405' # 0xC1 -> CYRILLIC CAPITAL LETTER DZE
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\u040b' # 0xCB -> CYRILLIC CAPITAL LETTER TSHE
u'\u045b' # 0xCC -> CYRILLIC SMALL LETTER TSHE
u'\u040c' # 0xCD -> CYRILLIC CAPITAL LETTER KJE
u'\u045c' # 0xCE -> CYRILLIC SMALL LETTER KJE
u'\u0455' # 0xCF -> CYRILLIC SMALL LETTER DZE
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u201e' # 0xD7 -> DOUBLE LOW-9 QUOTATION MARK
u'\u040e' # 0xD8 -> CYRILLIC CAPITAL LETTER SHORT U
u'\u045e' # 0xD9 -> CYRILLIC SMALL LETTER SHORT U
u'\u040f' # 0xDA -> CYRILLIC CAPITAL LETTER DZHE
u'\u045f' # 0xDB -> CYRILLIC SMALL LETTER DZHE
u'\u2116' # 0xDC -> NUMERO SIGN
u'\u0401' # 0xDD -> CYRILLIC CAPITAL LETTER IO
u'\u0451' # 0xDE -> CYRILLIC SMALL LETTER IO
u'\u044f' # 0xDF -> CYRILLIC SMALL LETTER YA
u'\u0430' # 0xE0 -> CYRILLIC SMALL LETTER A
u'\u0431' # 0xE1 -> CYRILLIC SMALL LETTER BE
u'\u0432' # 0xE2 -> CYRILLIC SMALL LETTER VE
u'\u0433' # 0xE3 -> CYRILLIC SMALL LETTER GHE
u'\u0434' # 0xE4 -> CYRILLIC SMALL LETTER DE
u'\u0435' # 0xE5 -> CYRILLIC SMALL LETTER IE
u'\u0436' # 0xE6 -> CYRILLIC SMALL LETTER ZHE
u'\u0437' # 0xE7 -> CYRILLIC SMALL LETTER ZE
u'\u0438' # 0xE8 -> CYRILLIC SMALL LETTER I
u'\u0439' # 0xE9 -> CYRILLIC SMALL LETTER SHORT I
u'\u043a' # 0xEA -> CYRILLIC SMALL LETTER KA
u'\u043b' # 0xEB -> CYRILLIC SMALL LETTER EL
u'\u043c' # 0xEC -> CYRILLIC SMALL LETTER EM
u'\u043d' # 0xED -> CYRILLIC SMALL LETTER EN
u'\u043e' # 0xEE -> CYRILLIC SMALL LETTER O
u'\u043f' # 0xEF -> CYRILLIC SMALL LETTER PE
u'\u0440' # 0xF0 -> CYRILLIC SMALL LETTER ER
u'\u0441' # 0xF1 -> CYRILLIC SMALL LETTER ES
u'\u0442' # 0xF2 -> CYRILLIC SMALL LETTER TE
u'\u0443' # 0xF3 -> CYRILLIC SMALL LETTER U
u'\u0444' # 0xF4 -> CYRILLIC SMALL LETTER EF
u'\u0445' # 0xF5 -> CYRILLIC SMALL LETTER HA
u'\u0446' # 0xF6 -> CYRILLIC SMALL LETTER TSE
u'\u0447' # 0xF7 -> CYRILLIC SMALL LETTER CHE
u'\u0448' # 0xF8 -> CYRILLIC SMALL LETTER SHA
u'\u0449' # 0xF9 -> CYRILLIC SMALL LETTER SHCHA
u'\u044a' # 0xFA -> CYRILLIC SMALL LETTER HARD SIGN
u'\u044b' # 0xFB -> CYRILLIC SMALL LETTER YERU
u'\u044c' # 0xFC -> CYRILLIC SMALL LETTER SOFT SIGN
u'\u044d' # 0xFD -> CYRILLIC SMALL LETTER E
u'\u044e' # 0xFE -> CYRILLIC SMALL LETTER YU
u'\u20ac' # 0xFF -> EURO SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
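Because a charmap codec maps exactly one byte to one character, the incremental classes carry no state between calls: chunked input decodes to the same text as a single call. A minimal sketch using the mac-cyrillic IncrementalDecoder defined above (byte values copied from the table):

dec = IncrementalDecoder('strict')
chunks = ['\x8c', '\xa8\xe0']  # CYRILLIC CAPITAL EM, REGISTERED SIGN, CYRILLIC small a
text = u''.join(dec.decode(chunk) for chunk in chunks)
assert text == u'\u041c\xae\u0430'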
""" Python Character Mapping Codec mac_farsi generated from 'MAPPINGS/VENDORS/APPLE/FARSI.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mac-farsi',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE, left-right
u'!' # 0x21 -> EXCLAMATION MARK, left-right
u'"' # 0x22 -> QUOTATION MARK, left-right
u'#' # 0x23 -> NUMBER SIGN, left-right
u'$' # 0x24 -> DOLLAR SIGN, left-right
u'%' # 0x25 -> PERCENT SIGN, left-right
u'&' # 0x26 -> AMPERSAND, left-right
u"'" # 0x27 -> APOSTROPHE, left-right
u'(' # 0x28 -> LEFT PARENTHESIS, left-right
u')' # 0x29 -> RIGHT PARENTHESIS, left-right
u'*' # 0x2A -> ASTERISK, left-right
u'+' # 0x2B -> PLUS SIGN, left-right
u',' # 0x2C -> COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
u'-' # 0x2D -> HYPHEN-MINUS, left-right
u'.' # 0x2E -> FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
u'/' # 0x2F -> SOLIDUS, left-right
u'0' # 0x30 -> DIGIT ZERO; in Arabic-script context, displayed as 0x06F0 EXTENDED ARABIC-INDIC DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE; in Arabic-script context, displayed as 0x06F1 EXTENDED ARABIC-INDIC DIGIT ONE
u'2' # 0x32 -> DIGIT TWO; in Arabic-script context, displayed as 0x06F2 EXTENDED ARABIC-INDIC DIGIT TWO
u'3' # 0x33 -> DIGIT THREE; in Arabic-script context, displayed as 0x06F3 EXTENDED ARABIC-INDIC DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR; in Arabic-script context, displayed as 0x06F4 EXTENDED ARABIC-INDIC DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE; in Arabic-script context, displayed as 0x06F5 EXTENDED ARABIC-INDIC DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX; in Arabic-script context, displayed as 0x06F6 EXTENDED ARABIC-INDIC DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN; in Arabic-script context, displayed as 0x06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT; in Arabic-script context, displayed as 0x06F8 EXTENDED ARABIC-INDIC DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE; in Arabic-script context, displayed as 0x06F9 EXTENDED ARABIC-INDIC DIGIT NINE
u':' # 0x3A -> COLON, left-right
u';' # 0x3B -> SEMICOLON, left-right
u'<' # 0x3C -> LESS-THAN SIGN, left-right
u'=' # 0x3D -> EQUALS SIGN, left-right
u'>' # 0x3E -> GREATER-THAN SIGN, left-right
u'?' # 0x3F -> QUESTION MARK, left-right
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET, left-right
u'\\' # 0x5C -> REVERSE SOLIDUS, left-right
u']' # 0x5D -> RIGHT SQUARE BRACKET, left-right
u'^' # 0x5E -> CIRCUMFLEX ACCENT, left-right
u'_' # 0x5F -> LOW LINE, left-right
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET, left-right
u'|' # 0x7C -> VERTICAL LINE, left-right
u'}' # 0x7D -> RIGHT CURLY BRACKET, left-right
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xa0' # 0x81 -> NO-BREAK SPACE, right-left
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u06ba' # 0x8B -> ARABIC LETTER NOON GHUNNA
u'\xab' # 0x8C -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\u2026' # 0x93 -> HORIZONTAL ELLIPSIS, right-left
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xbb' # 0x98 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf7' # 0x9B -> DIVISION SIGN, right-left
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u' ' # 0xA0 -> SPACE, right-left
u'!' # 0xA1 -> EXCLAMATION MARK, right-left
u'"' # 0xA2 -> QUOTATION MARK, right-left
u'#' # 0xA3 -> NUMBER SIGN, right-left
u'$' # 0xA4 -> DOLLAR SIGN, right-left
u'\u066a' # 0xA5 -> ARABIC PERCENT SIGN
u'&' # 0xA6 -> AMPERSAND, right-left
u"'" # 0xA7 -> APOSTROPHE, right-left
u'(' # 0xA8 -> LEFT PARENTHESIS, right-left
u')' # 0xA9 -> RIGHT PARENTHESIS, right-left
u'*' # 0xAA -> ASTERISK, right-left
u'+' # 0xAB -> PLUS SIGN, right-left
u'\u060c' # 0xAC -> ARABIC COMMA
u'-' # 0xAD -> HYPHEN-MINUS, right-left
u'.' # 0xAE -> FULL STOP, right-left
u'/' # 0xAF -> SOLIDUS, right-left
u'\u06f0' # 0xB0 -> EXTENDED ARABIC-INDIC DIGIT ZERO, right-left (need override)
u'\u06f1' # 0xB1 -> EXTENDED ARABIC-INDIC DIGIT ONE, right-left (need override)
u'\u06f2' # 0xB2 -> EXTENDED ARABIC-INDIC DIGIT TWO, right-left (need override)
u'\u06f3' # 0xB3 -> EXTENDED ARABIC-INDIC DIGIT THREE, right-left (need override)
u'\u06f4' # 0xB4 -> EXTENDED ARABIC-INDIC DIGIT FOUR, right-left (need override)
u'\u06f5' # 0xB5 -> EXTENDED ARABIC-INDIC DIGIT FIVE, right-left (need override)
u'\u06f6' # 0xB6 -> EXTENDED ARABIC-INDIC DIGIT SIX, right-left (need override)
u'\u06f7' # 0xB7 -> EXTENDED ARABIC-INDIC DIGIT SEVEN, right-left (need override)
u'\u06f8' # 0xB8 -> EXTENDED ARABIC-INDIC DIGIT EIGHT, right-left (need override)
u'\u06f9' # 0xB9 -> EXTENDED ARABIC-INDIC DIGIT NINE, right-left (need override)
u':' # 0xBA -> COLON, right-left
u'\u061b' # 0xBB -> ARABIC SEMICOLON
u'<' # 0xBC -> LESS-THAN SIGN, right-left
u'=' # 0xBD -> EQUALS SIGN, right-left
u'>' # 0xBE -> GREATER-THAN SIGN, right-left
u'\u061f' # 0xBF -> ARABIC QUESTION MARK
u'\u274a' # 0xC0 -> EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
u'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
u'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
u'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
u'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
u'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
u'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
u'\u0627' # 0xC7 -> ARABIC LETTER ALEF
u'\u0628' # 0xC8 -> ARABIC LETTER BEH
u'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
u'\u062a' # 0xCA -> ARABIC LETTER TEH
u'\u062b' # 0xCB -> ARABIC LETTER THEH
u'\u062c' # 0xCC -> ARABIC LETTER JEEM
u'\u062d' # 0xCD -> ARABIC LETTER HAH
u'\u062e' # 0xCE -> ARABIC LETTER KHAH
u'\u062f' # 0xCF -> ARABIC LETTER DAL
u'\u0630' # 0xD0 -> ARABIC LETTER THAL
u'\u0631' # 0xD1 -> ARABIC LETTER REH
u'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
u'\u0633' # 0xD3 -> ARABIC LETTER SEEN
u'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
u'\u0635' # 0xD5 -> ARABIC LETTER SAD
u'\u0636' # 0xD6 -> ARABIC LETTER DAD
u'\u0637' # 0xD7 -> ARABIC LETTER TAH
u'\u0638' # 0xD8 -> ARABIC LETTER ZAH
u'\u0639' # 0xD9 -> ARABIC LETTER AIN
u'\u063a' # 0xDA -> ARABIC LETTER GHAIN
u'[' # 0xDB -> LEFT SQUARE BRACKET, right-left
u'\\' # 0xDC -> REVERSE SOLIDUS, right-left
u']' # 0xDD -> RIGHT SQUARE BRACKET, right-left
u'^' # 0xDE -> CIRCUMFLEX ACCENT, right-left
u'_' # 0xDF -> LOW LINE, right-left
u'\u0640' # 0xE0 -> ARABIC TATWEEL
u'\u0641' # 0xE1 -> ARABIC LETTER FEH
u'\u0642' # 0xE2 -> ARABIC LETTER QAF
u'\u0643' # 0xE3 -> ARABIC LETTER KAF
u'\u0644' # 0xE4 -> ARABIC LETTER LAM
u'\u0645' # 0xE5 -> ARABIC LETTER MEEM
u'\u0646' # 0xE6 -> ARABIC LETTER NOON
u'\u0647' # 0xE7 -> ARABIC LETTER HEH
u'\u0648' # 0xE8 -> ARABIC LETTER WAW
u'\u0649' # 0xE9 -> ARABIC LETTER ALEF MAKSURA
u'\u064a' # 0xEA -> ARABIC LETTER YEH
u'\u064b' # 0xEB -> ARABIC FATHATAN
u'\u064c' # 0xEC -> ARABIC DAMMATAN
u'\u064d' # 0xED -> ARABIC KASRATAN
u'\u064e' # 0xEE -> ARABIC FATHA
u'\u064f' # 0xEF -> ARABIC DAMMA
u'\u0650' # 0xF0 -> ARABIC KASRA
u'\u0651' # 0xF1 -> ARABIC SHADDA
u'\u0652' # 0xF2 -> ARABIC SUKUN
u'\u067e' # 0xF3 -> ARABIC LETTER PEH
u'\u0679' # 0xF4 -> ARABIC LETTER TTEH
u'\u0686' # 0xF5 -> ARABIC LETTER TCHEH
u'\u06d5' # 0xF6 -> ARABIC LETTER AE
u'\u06a4' # 0xF7 -> ARABIC LETTER VEH
u'\u06af' # 0xF8 -> ARABIC LETTER GAF
u'\u0688' # 0xF9 -> ARABIC LETTER DDAL
u'\u0691' # 0xFA -> ARABIC LETTER RREH
u'{' # 0xFB -> LEFT CURLY BRACKET, right-left
u'|' # 0xFC -> VERTICAL LINE, right-left
u'}' # 0xFD -> RIGHT CURLY BRACKET, right-left
u'\u0698' # 0xFE -> ARABIC LETTER JEH
u'\u06d2' # 0xFF -> ARABIC LETTER YEH BARREE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
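
The module above follows the standard encodings-module pattern: a 256-entry decoding table, its inverse built by codecs.charmap_build(), and a getregentry() hook that packages the codec classes into a codecs.CodecInfo record. A minimal sketch of how such a module plugs into the codec machinery, using the stdlib encodings.mac_farsi twin of the file above so the snippet is self-contained (the alias 'mac_farsi_demo' is hypothetical, chosen for illustration):

import codecs
import encodings.mac_farsi as mac_farsi  # stdlib twin of the module above

# getregentry() packages the codec classes into a CodecInfo record.
info = mac_farsi.getregentry()
assert isinstance(info, codecs.CodecInfo) and info.name == 'mac-farsi'

# A registered search function makes the codec reachable under a custom
# alias; returning None defers to the other registered search functions.
def _search(name):
    return info if name == 'mac_farsi_demo' else None

codecs.register(_search)

# Round trip one byte: 0xC7 maps to U+0627 ARABIC LETTER ALEF in the table.
assert b'\xc7'.decode('mac_farsi_demo') == u'\u0627'
assert u'\u0627'.encode('mac_farsi_demo') == b'\xc7'
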
""" Python Character Mapping Codec mac_greek generated from 'MAPPINGS/VENDORS/APPLE/GREEK.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-greek',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xb9' # 0x81 -> SUPERSCRIPT ONE
u'\xb2' # 0x82 -> SUPERSCRIPT TWO
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xb3' # 0x84 -> SUPERSCRIPT THREE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\u0385' # 0x87 -> GREEK DIALYTIKA TONOS
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\u0384' # 0x8B -> GREEK TONOS
u'\xa8' # 0x8C -> DIAERESIS
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xa3' # 0x92 -> POUND SIGN
u'\u2122' # 0x93 -> TRADE MARK SIGN
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\u2022' # 0x96 -> BULLET
u'\xbd' # 0x97 -> VULGAR FRACTION ONE HALF
u'\u2030' # 0x98 -> PER MILLE SIGN
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xa6' # 0x9B -> BROKEN BAR
u'\u20ac' # 0x9C -> EURO SIGN # before Mac OS 9.2.2, was SOFT HYPHEN
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\u0393' # 0xA1 -> GREEK CAPITAL LETTER GAMMA
u'\u0394' # 0xA2 -> GREEK CAPITAL LETTER DELTA
u'\u0398' # 0xA3 -> GREEK CAPITAL LETTER THETA
u'\u039b' # 0xA4 -> GREEK CAPITAL LETTER LAMDA
u'\u039e' # 0xA5 -> GREEK CAPITAL LETTER XI
u'\u03a0' # 0xA6 -> GREEK CAPITAL LETTER PI
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u03a3' # 0xAA -> GREEK CAPITAL LETTER SIGMA
u'\u03aa' # 0xAB -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
u'\xa7' # 0xAC -> SECTION SIGN
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\xb0' # 0xAE -> DEGREE SIGN
u'\xb7' # 0xAF -> MIDDLE DOT
u'\u0391' # 0xB0 -> GREEK CAPITAL LETTER ALPHA
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\xa5' # 0xB4 -> YEN SIGN
u'\u0392' # 0xB5 -> GREEK CAPITAL LETTER BETA
u'\u0395' # 0xB6 -> GREEK CAPITAL LETTER EPSILON
u'\u0396' # 0xB7 -> GREEK CAPITAL LETTER ZETA
u'\u0397' # 0xB8 -> GREEK CAPITAL LETTER ETA
u'\u0399' # 0xB9 -> GREEK CAPITAL LETTER IOTA
u'\u039a' # 0xBA -> GREEK CAPITAL LETTER KAPPA
u'\u039c' # 0xBB -> GREEK CAPITAL LETTER MU
u'\u03a6' # 0xBC -> GREEK CAPITAL LETTER PHI
u'\u03ab' # 0xBD -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
u'\u03a8' # 0xBE -> GREEK CAPITAL LETTER PSI
u'\u03a9' # 0xBF -> GREEK CAPITAL LETTER OMEGA
u'\u03ac' # 0xC0 -> GREEK SMALL LETTER ALPHA WITH TONOS
u'\u039d' # 0xC1 -> GREEK CAPITAL LETTER NU
u'\xac' # 0xC2 -> NOT SIGN
u'\u039f' # 0xC3 -> GREEK CAPITAL LETTER OMICRON
u'\u03a1' # 0xC4 -> GREEK CAPITAL LETTER RHO
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u03a4' # 0xC6 -> GREEK CAPITAL LETTER TAU
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\u03a5' # 0xCB -> GREEK CAPITAL LETTER UPSILON
u'\u03a7' # 0xCC -> GREEK CAPITAL LETTER CHI
u'\u0386' # 0xCD -> GREEK CAPITAL LETTER ALPHA WITH TONOS
u'\u0388' # 0xCE -> GREEK CAPITAL LETTER EPSILON WITH TONOS
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u2013' # 0xD0 -> EN DASH
u'\u2015' # 0xD1 -> HORIZONTAL BAR
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u0389' # 0xD7 -> GREEK CAPITAL LETTER ETA WITH TONOS
u'\u038a' # 0xD8 -> GREEK CAPITAL LETTER IOTA WITH TONOS
u'\u038c' # 0xD9 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
u'\u038e' # 0xDA -> GREEK CAPITAL LETTER UPSILON WITH TONOS
u'\u03ad' # 0xDB -> GREEK SMALL LETTER EPSILON WITH TONOS
u'\u03ae' # 0xDC -> GREEK SMALL LETTER ETA WITH TONOS
u'\u03af' # 0xDD -> GREEK SMALL LETTER IOTA WITH TONOS
u'\u03cc' # 0xDE -> GREEK SMALL LETTER OMICRON WITH TONOS
u'\u038f' # 0xDF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
u'\u03cd' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH TONOS
u'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
u'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
u'\u03c8' # 0xE3 -> GREEK SMALL LETTER PSI
u'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
u'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
u'\u03c6' # 0xE6 -> GREEK SMALL LETTER PHI
u'\u03b3' # 0xE7 -> GREEK SMALL LETTER GAMMA
u'\u03b7' # 0xE8 -> GREEK SMALL LETTER ETA
u'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
u'\u03be' # 0xEA -> GREEK SMALL LETTER XI
u'\u03ba' # 0xEB -> GREEK SMALL LETTER KAPPA
u'\u03bb' # 0xEC -> GREEK SMALL LETTER LAMDA
u'\u03bc' # 0xED -> GREEK SMALL LETTER MU
u'\u03bd' # 0xEE -> GREEK SMALL LETTER NU
u'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
u'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
u'\u03ce' # 0xF1 -> GREEK SMALL LETTER OMEGA WITH TONOS
u'\u03c1' # 0xF2 -> GREEK SMALL LETTER RHO
u'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
u'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
u'\u03b8' # 0xF5 -> GREEK SMALL LETTER THETA
u'\u03c9' # 0xF6 -> GREEK SMALL LETTER OMEGA
u'\u03c2' # 0xF7 -> GREEK SMALL LETTER FINAL SIGMA
u'\u03c7' # 0xF8 -> GREEK SMALL LETTER CHI
u'\u03c5' # 0xF9 -> GREEK SMALL LETTER UPSILON
u'\u03b6' # 0xFA -> GREEK SMALL LETTER ZETA
u'\u03ca' # 0xFB -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
u'\u03cb' # 0xFC -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
u'\u0390' # 0xFD -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
u'\u03b0' # 0xFE -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
u'\xad' # 0xFF -> SOFT HYPHEN # before Mac OS 9.2.2, was undefined
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
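
The IncrementalEncoder/IncrementalDecoder pair defined above exists so byte streams can be fed in arbitrary chunks. A minimal sketch, using the stdlib 'mac_greek' codec (the same table) so it runs standalone:

import codecs

# Charmap codecs are stateless: every byte decodes independently, so any
# chunking of the input yields the same text.
decoder = codecs.getincrementaldecoder('mac_greek')()

chunks = [b'\xe1\xe2', b'\xe7']           # alpha+beta, then gamma (see table)
text = u''.join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b'', final=True)   # flush; a no-op for charmaps
assert text == u'\u03b1\u03b2\u03b3'
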
""" Python Character Mapping Codec mac_iceland generated from 'MAPPINGS/VENDORS/APPLE/ICELAND.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-iceland',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\xdd' # 0xA0 -> LATIN CAPITAL LETTER Y WITH ACUTE
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\xb4' # 0xAB -> ACUTE ACCENT
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
u'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\xa5' # 0xB4 -> YEN SIGN
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u220f' # 0xB8 -> N-ARY PRODUCT
u'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
u'\u222b' # 0xBA -> INTEGRAL
u'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
u'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
u'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
u'\xe6' # 0xBE -> LATIN SMALL LETTER AE
u'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
u'\xbf' # 0xC0 -> INVERTED QUESTION MARK
u'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u2044' # 0xDA -> FRACTION SLASH
u'\u20ac' # 0xDB -> EURO SIGN
u'\xd0' # 0xDC -> LATIN CAPITAL LETTER ETH
u'\xf0' # 0xDD -> LATIN SMALL LETTER ETH
u'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
u'\xfe' # 0xDF -> LATIN SMALL LETTER THORN
u'\xfd' # 0xE0 -> LATIN SMALL LETTER Y WITH ACUTE
u'\xb7' # 0xE1 -> MIDDLE DOT
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2030' # 0xE4 -> PER MILLE SIGN
u'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\uf8ff' # 0xF0 -> Apple logo
u'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
u'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u02dc' # 0xF7 -> SMALL TILDE
u'\xaf' # 0xF8 -> MACRON
u'\u02d8' # 0xF9 -> BREVE
u'\u02d9' # 0xFA -> DOT ABOVE
u'\u02da' # 0xFB -> RING ABOVE
u'\xb8' # 0xFC -> CEDILLA
u'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
u'\u02db' # 0xFE -> OGONEK
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
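
The errors argument threaded through Codec.encode()/decode() above selects a codec error handler at call time. A small sketch against the stdlib 'mac_iceland' codec, assuming a standard Python runtime where it is registered:

# U+00DE (LATIN CAPITAL LETTER THORN) is in the charmap at 0xDE, so
# strict encoding succeeds.
assert u'\xde'.encode('mac_iceland') == b'\xde'

# A character outside the 256-entry table raises under 'strict' ...
try:
    u'\u4e2d'.encode('mac_iceland')
    raise AssertionError('expected UnicodeEncodeError')
except UnicodeEncodeError:
    pass

# ... and is substituted under 'replace'.
assert u'\u4e2d'.encode('mac_iceland', 'replace') == b'?'
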
""" Python Character Mapping Codec generated from 'LATIN2.TXT' with gencodec.py.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
(c) Copyright 2000 Guido van Rossum.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_map)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_map)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-latin2',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x0081: 0x0100, # LATIN CAPITAL LETTER A WITH MACRON
0x0082: 0x0101, # LATIN SMALL LETTER A WITH MACRON
0x0083: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0084: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
0x0085: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x0086: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x0087: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x0088: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
0x0089: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
0x008a: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x008b: 0x010d, # LATIN SMALL LETTER C WITH CARON
0x008c: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
0x008d: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
0x008e: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x008f: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
0x0090: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
0x0091: 0x010e, # LATIN CAPITAL LETTER D WITH CARON
0x0092: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x0093: 0x010f, # LATIN SMALL LETTER D WITH CARON
0x0094: 0x0112, # LATIN CAPITAL LETTER E WITH MACRON
0x0095: 0x0113, # LATIN SMALL LETTER E WITH MACRON
0x0096: 0x0116, # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x0097: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x0098: 0x0117, # LATIN SMALL LETTER E WITH DOT ABOVE
0x0099: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x009a: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x009b: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x009c: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x009d: 0x011a, # LATIN CAPITAL LETTER E WITH CARON
0x009e: 0x011b, # LATIN SMALL LETTER E WITH CARON
0x009f: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x00a0: 0x2020, # DAGGER
0x00a1: 0x00b0, # DEGREE SIGN
0x00a2: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
0x00a4: 0x00a7, # SECTION SIGN
0x00a5: 0x2022, # BULLET
0x00a6: 0x00b6, # PILCROW SIGN
0x00a7: 0x00df, # LATIN SMALL LETTER SHARP S
0x00a8: 0x00ae, # REGISTERED SIGN
0x00aa: 0x2122, # TRADE MARK SIGN
0x00ab: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
0x00ac: 0x00a8, # DIAERESIS
0x00ad: 0x2260, # NOT EQUAL TO
0x00ae: 0x0123, # LATIN SMALL LETTER G WITH CEDILLA
0x00af: 0x012e, # LATIN CAPITAL LETTER I WITH OGONEK
0x00b0: 0x012f, # LATIN SMALL LETTER I WITH OGONEK
0x00b1: 0x012a, # LATIN CAPITAL LETTER I WITH MACRON
0x00b2: 0x2264, # LESS-THAN OR EQUAL TO
0x00b3: 0x2265, # GREATER-THAN OR EQUAL TO
0x00b4: 0x012b, # LATIN SMALL LETTER I WITH MACRON
0x00b5: 0x0136, # LATIN CAPITAL LETTER K WITH CEDILLA
0x00b6: 0x2202, # PARTIAL DIFFERENTIAL
0x00b7: 0x2211, # N-ARY SUMMATION
0x00b8: 0x0142, # LATIN SMALL LETTER L WITH STROKE
0x00b9: 0x013b, # LATIN CAPITAL LETTER L WITH CEDILLA
0x00ba: 0x013c, # LATIN SMALL LETTER L WITH CEDILLA
0x00bb: 0x013d, # LATIN CAPITAL LETTER L WITH CARON
0x00bc: 0x013e, # LATIN SMALL LETTER L WITH CARON
0x00bd: 0x0139, # LATIN CAPITAL LETTER L WITH ACUTE
0x00be: 0x013a, # LATIN SMALL LETTER L WITH ACUTE
0x00bf: 0x0145, # LATIN CAPITAL LETTER N WITH CEDILLA
0x00c0: 0x0146, # LATIN SMALL LETTER N WITH CEDILLA
0x00c1: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
0x00c2: 0x00ac, # NOT SIGN
0x00c3: 0x221a, # SQUARE ROOT
0x00c4: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
0x00c5: 0x0147, # LATIN CAPITAL LETTER N WITH CARON
0x00c6: 0x2206, # INCREMENT
0x00c7: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00c8: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00c9: 0x2026, # HORIZONTAL ELLIPSIS
0x00ca: 0x00a0, # NO-BREAK SPACE
0x00cb: 0x0148, # LATIN SMALL LETTER N WITH CARON
0x00cc: 0x0150, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x00cd: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00ce: 0x0151, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x00cf: 0x014c, # LATIN CAPITAL LETTER O WITH MACRON
0x00d0: 0x2013, # EN DASH
0x00d1: 0x2014, # EM DASH
0x00d2: 0x201c, # LEFT DOUBLE QUOTATION MARK
0x00d3: 0x201d, # RIGHT DOUBLE QUOTATION MARK
0x00d4: 0x2018, # LEFT SINGLE QUOTATION MARK
0x00d5: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x00d6: 0x00f7, # DIVISION SIGN
0x00d7: 0x25ca, # LOZENGE
0x00d8: 0x014d, # LATIN SMALL LETTER O WITH MACRON
0x00d9: 0x0154, # LATIN CAPITAL LETTER R WITH ACUTE
0x00da: 0x0155, # LATIN SMALL LETTER R WITH ACUTE
0x00db: 0x0158, # LATIN CAPITAL LETTER R WITH CARON
0x00dc: 0x2039, # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x00dd: 0x203a, # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x00de: 0x0159, # LATIN SMALL LETTER R WITH CARON
0x00df: 0x0156, # LATIN CAPITAL LETTER R WITH CEDILLA
0x00e0: 0x0157, # LATIN SMALL LETTER R WITH CEDILLA
0x00e1: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00e2: 0x201a, # SINGLE LOW-9 QUOTATION MARK
0x00e3: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
0x00e4: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00e5: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
0x00e6: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
0x00e7: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00e8: 0x0164, # LATIN CAPITAL LETTER T WITH CARON
0x00e9: 0x0165, # LATIN SMALL LETTER T WITH CARON
0x00ea: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00eb: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
0x00ec: 0x017e, # LATIN SMALL LETTER Z WITH CARON
0x00ed: 0x016a, # LATIN CAPITAL LETTER U WITH MACRON
0x00ee: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00ef: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00f0: 0x016b, # LATIN SMALL LETTER U WITH MACRON
0x00f1: 0x016e, # LATIN CAPITAL LETTER U WITH RING ABOVE
0x00f2: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00f3: 0x016f, # LATIN SMALL LETTER U WITH RING ABOVE
0x00f4: 0x0170, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x00f5: 0x0171, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x00f6: 0x0172, # LATIN CAPITAL LETTER U WITH OGONEK
0x00f7: 0x0173, # LATIN SMALL LETTER U WITH OGONEK
0x00f8: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00f9: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00fa: 0x0137, # LATIN SMALL LETTER K WITH CEDILLA
0x00fb: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x00fc: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
0x00fd: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x00fe: 0x0122, # LATIN CAPITAL LETTER G WITH CEDILLA
0x00ff: 0x02c7, # CARON
})
### Encoding Map
encoding_map = codecs.make_encoding_map(decoding_map)
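
Unlike the table-based modules around it, the mac-latin2 module above uses the older dict-based charmap API: an identity dict patched with overrides for decoding, inverted by codecs.make_encoding_map() for encoding. A minimal sketch of that mechanism with two overrides taken from the map above (0x80 -> U+00C4 and 0xC4 -> U+0144; both are needed so the inversion stays unambiguous):

import codecs

# Decoding map: byte value -> Unicode ordinal. Start from identity and
# override selected positions, exactly as the module above does.
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
    0x0080: 0x00c4,  # LATIN CAPITAL LETTER A WITH DIAERESIS
    0x00c4: 0x0144,  # LATIN SMALL LETTER N WITH ACUTE
})

# Encoding map is the inverse (make_encoding_map maps any ordinal that
# appears more than once to None, i.e. undefined).
encoding_map = codecs.make_encoding_map(decoding_map)

text, consumed = codecs.charmap_decode(b'\x80', 'strict', decoding_map)
assert (text, consumed) == (u'\xc4', 1)
raw, _ = codecs.charmap_encode(u'\xc4', 'strict', encoding_map)
assert raw == b'\x80'
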
""" Python Character Mapping Codec mac_roman generated from 'MAPPINGS/VENDORS/APPLE/ROMAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-roman',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\xb4' # 0xAB -> ACUTE ACCENT
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
u'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\xa5' # 0xB4 -> YEN SIGN
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u220f' # 0xB8 -> N-ARY PRODUCT
u'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
u'\u222b' # 0xBA -> INTEGRAL
u'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
u'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
u'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
u'\xe6' # 0xBE -> LATIN SMALL LETTER AE
u'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
u'\xbf' # 0xC0 -> INVERTED QUESTION MARK
u'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u2044' # 0xDA -> FRACTION SLASH
u'\u20ac' # 0xDB -> EURO SIGN
u'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\ufb01' # 0xDE -> LATIN SMALL LIGATURE FI
u'\ufb02' # 0xDF -> LATIN SMALL LIGATURE FL
u'\u2021' # 0xE0 -> DOUBLE DAGGER
u'\xb7' # 0xE1 -> MIDDLE DOT
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2030' # 0xE4 -> PER MILLE SIGN
u'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\uf8ff' # 0xF0 -> Apple logo
u'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
u'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u02dc' # 0xF7 -> SMALL TILDE
u'\xaf' # 0xF8 -> MACRON
u'\u02d8' # 0xF9 -> BREVE
u'\u02d9' # 0xFA -> DOT ABOVE
u'\u02da' # 0xFB -> RING ABOVE
u'\xb8' # 0xFC -> CEDILLA
u'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
u'\u02db' # 0xFE -> OGONEK
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
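
The StreamWriter/StreamReader subclasses above simply mix the Codec methods into the generic codecs stream wrappers. A minimal sketch over an in-memory buffer, using the stdlib 'mac_roman' codec so it runs as-is:

import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter('mac_roman')(buf)
writer.write(u'caf\xe9 \u2026')            # e-acute and horizontal ellipsis

buf.seek(0)
reader = codecs.getreader('mac_roman')(buf)
assert reader.read() == u'caf\xe9 \u2026'

# Per the table above: U+00E9 -> 0x8E, U+2026 -> 0xC9.
assert buf.getvalue() == b'caf\x8e \xc9'
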
""" Python Character Mapping Codec mac_romanian generated from 'MAPPINGS/VENDORS/APPLE/ROMANIAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-romanian',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\xb4' # 0xAB -> ACUTE ACCENT
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\u0102' # 0xAE -> LATIN CAPITAL LETTER A WITH BREVE
u'\u0218' # 0xAF -> LATIN CAPITAL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\xa5' # 0xB4 -> YEN SIGN
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u220f' # 0xB8 -> N-ARY PRODUCT
u'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
u'\u222b' # 0xBA -> INTEGRAL
u'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
u'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
u'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
u'\u0103' # 0xBE -> LATIN SMALL LETTER A WITH BREVE
u'\u0219' # 0xBF -> LATIN SMALL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
u'\xbf' # 0xC0 -> INVERTED QUESTION MARK
u'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u2044' # 0xDA -> FRACTION SLASH
u'\u20ac' # 0xDB -> EURO SIGN
u'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
u'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
u'\u021a' # 0xDE -> LATIN CAPITAL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
u'\u021b' # 0xDF -> LATIN SMALL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
u'\u2021' # 0xE0 -> DOUBLE DAGGER
u'\xb7' # 0xE1 -> MIDDLE DOT
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2030' # 0xE4 -> PER MILLE SIGN
u'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\uf8ff' # 0xF0 -> Apple logo
u'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
u'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u02dc' # 0xF7 -> SMALL TILDE
u'\xaf' # 0xF8 -> MACRON
u'\u02d8' # 0xF9 -> BREVE
u'\u02d9' # 0xFA -> DOT ABOVE
u'\u02da' # 0xFB -> RING ABOVE
u'\xb8' # 0xFC -> CEDILLA
u'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
u'\u02db' # 0xFE -> OGONEK
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
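The table above appears to be the mac-romanian mapping, judging by the S/T WITH COMMA BELOW and A WITH BREVE slots; assuming that codec name, a minimal Python 2 round trip against it would look like:

assert '\xaf'.decode('mac-romanian') == u'\u0218'  # 0xAF -> S WITH COMMA BELOW
assert u'\u0219'.encode('mac-romanian') == '\xbf'  # s with comma below -> 0xBF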
""" Python Character Mapping Codec mac_turkish generated from 'MAPPINGS/VENDORS/APPLE/TURKISH.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mac-turkish',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> CONTROL CHARACTER
u'\x01' # 0x01 -> CONTROL CHARACTER
u'\x02' # 0x02 -> CONTROL CHARACTER
u'\x03' # 0x03 -> CONTROL CHARACTER
u'\x04' # 0x04 -> CONTROL CHARACTER
u'\x05' # 0x05 -> CONTROL CHARACTER
u'\x06' # 0x06 -> CONTROL CHARACTER
u'\x07' # 0x07 -> CONTROL CHARACTER
u'\x08' # 0x08 -> CONTROL CHARACTER
u'\t' # 0x09 -> CONTROL CHARACTER
u'\n' # 0x0A -> CONTROL CHARACTER
u'\x0b' # 0x0B -> CONTROL CHARACTER
u'\x0c' # 0x0C -> CONTROL CHARACTER
u'\r' # 0x0D -> CONTROL CHARACTER
u'\x0e' # 0x0E -> CONTROL CHARACTER
u'\x0f' # 0x0F -> CONTROL CHARACTER
u'\x10' # 0x10 -> CONTROL CHARACTER
u'\x11' # 0x11 -> CONTROL CHARACTER
u'\x12' # 0x12 -> CONTROL CHARACTER
u'\x13' # 0x13 -> CONTROL CHARACTER
u'\x14' # 0x14 -> CONTROL CHARACTER
u'\x15' # 0x15 -> CONTROL CHARACTER
u'\x16' # 0x16 -> CONTROL CHARACTER
u'\x17' # 0x17 -> CONTROL CHARACTER
u'\x18' # 0x18 -> CONTROL CHARACTER
u'\x19' # 0x19 -> CONTROL CHARACTER
u'\x1a' # 0x1A -> CONTROL CHARACTER
u'\x1b' # 0x1B -> CONTROL CHARACTER
u'\x1c' # 0x1C -> CONTROL CHARACTER
u'\x1d' # 0x1D -> CONTROL CHARACTER
u'\x1e' # 0x1E -> CONTROL CHARACTER
u'\x1f' # 0x1F -> CONTROL CHARACTER
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> CONTROL CHARACTER
u'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
u'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
u'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
u'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
u'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
u'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
u'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
u'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
u'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
u'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
u'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
u'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
u'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
u'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
u'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
u'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
u'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
u'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
u'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
u'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
u'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
u'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
u'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
u'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
u'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
u'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
u'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
u'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
u'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
u'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
u'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
u'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
u'\u2020' # 0xA0 -> DAGGER
u'\xb0' # 0xA1 -> DEGREE SIGN
u'\xa2' # 0xA2 -> CENT SIGN
u'\xa3' # 0xA3 -> POUND SIGN
u'\xa7' # 0xA4 -> SECTION SIGN
u'\u2022' # 0xA5 -> BULLET
u'\xb6' # 0xA6 -> PILCROW SIGN
u'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
u'\xae' # 0xA8 -> REGISTERED SIGN
u'\xa9' # 0xA9 -> COPYRIGHT SIGN
u'\u2122' # 0xAA -> TRADE MARK SIGN
u'\xb4' # 0xAB -> ACUTE ACCENT
u'\xa8' # 0xAC -> DIAERESIS
u'\u2260' # 0xAD -> NOT EQUAL TO
u'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
u'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
u'\u221e' # 0xB0 -> INFINITY
u'\xb1' # 0xB1 -> PLUS-MINUS SIGN
u'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
u'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
u'\xa5' # 0xB4 -> YEN SIGN
u'\xb5' # 0xB5 -> MICRO SIGN
u'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
u'\u2211' # 0xB7 -> N-ARY SUMMATION
u'\u220f' # 0xB8 -> N-ARY PRODUCT
u'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
u'\u222b' # 0xBA -> INTEGRAL
u'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
u'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
u'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
u'\xe6' # 0xBE -> LATIN SMALL LETTER AE
u'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
u'\xbf' # 0xC0 -> INVERTED QUESTION MARK
u'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
u'\xac' # 0xC2 -> NOT SIGN
u'\u221a' # 0xC3 -> SQUARE ROOT
u'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
u'\u2248' # 0xC5 -> ALMOST EQUAL TO
u'\u2206' # 0xC6 -> INCREMENT
u'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
u'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
u'\xa0' # 0xCA -> NO-BREAK SPACE
u'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
u'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
u'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
u'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
u'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
u'\u2013' # 0xD0 -> EN DASH
u'\u2014' # 0xD1 -> EM DASH
u'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
u'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
u'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
u'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
u'\xf7' # 0xD6 -> DIVISION SIGN
u'\u25ca' # 0xD7 -> LOZENGE
u'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
u'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
u'\u011e' # 0xDA -> LATIN CAPITAL LETTER G WITH BREVE
u'\u011f' # 0xDB -> LATIN SMALL LETTER G WITH BREVE
u'\u0130' # 0xDC -> LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\u0131' # 0xDD -> LATIN SMALL LETTER DOTLESS I
u'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
u'\u015f' # 0xDF -> LATIN SMALL LETTER S WITH CEDILLA
u'\u2021' # 0xE0 -> DOUBLE DAGGER
u'\xb7' # 0xE1 -> MIDDLE DOT
u'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
u'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
u'\u2030' # 0xE4 -> PER MILLE SIGN
u'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
u'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
u'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
u'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
u'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
u'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
u'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
u'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
u'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
u'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
u'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
u'\uf8ff' # 0xF0 -> Apple logo
u'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
u'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
u'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
u'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
u'\uf8a0' # 0xF5 -> undefined1
u'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
u'\u02dc' # 0xF7 -> SMALL TILDE
u'\xaf' # 0xF8 -> MACRON
u'\u02d8' # 0xF9 -> BREVE
u'\u02d9' # 0xFA -> DOT ABOVE
u'\u02da' # 0xFB -> RING ABOVE
u'\xb8' # 0xFC -> CEDILLA
u'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
u'\u02db' # 0xFE -> OGONEK
u'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
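A short illustrative round trip (not part of the generated file), exercising the Turkish-specific slots 0xDA-0xDD of the table above:

assert '\xdd'.decode('mac-turkish') == u'\u0131'  # 0xDD -> dotless i
assert u'\u011e'.encode('mac-turkish') == '\xda'  # G with breve -> 0xDA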
""" Python 'mbcs' Codec for Windows
Cloned by Mark Hammond ([email protected]) from ascii.py,
which was written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
# Import them explicitly to cause an ImportError
# on non-Windows systems
from codecs import mbcs_encode, mbcs_decode
# for IncrementalDecoder, IncrementalEncoder, ...
import codecs
### Codec APIs
encode = mbcs_encode
def decode(input, errors='strict'):
return mbcs_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return mbcs_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = mbcs_decode
class StreamWriter(codecs.StreamWriter):
encode = mbcs_encode
class StreamReader(codecs.StreamReader):
decode = mbcs_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mbcs',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
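A usage sketch, with the caveat that this codec is Windows-only (the mbcs_encode/mbcs_decode imports above fail elsewhere) and that results depend on the active ANSI code page:

# On a cp1252 (Western European) Windows system, for example:
# >>> u'\xe9'.encode('mbcs')
# '\xe9'
# >>> '\xe9'.decode('mbcs')
# u'\xe9'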
""" Python Character Mapping Codec for PalmOS 3.5.
Written by Sjoerd Mullender ([email protected]); based on iso8859_15.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_map)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_map)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='palmos',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
# The PalmOS character set is mostly iso-8859-1 with some differences.
decoding_map.update({
0x0080: 0x20ac, # EURO SIGN
0x0082: 0x201a, # SINGLE LOW-9 QUOTATION MARK
0x0083: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x0084: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
0x0085: 0x2026, # HORIZONTAL ELLIPSIS
0x0086: 0x2020, # DAGGER
0x0087: 0x2021, # DOUBLE DAGGER
0x0088: 0x02c6, # MODIFIER LETTER CIRCUMFLEX ACCENT
0x0089: 0x2030, # PER MILLE SIGN
0x008a: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x008b: 0x2039, # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x008c: 0x0152, # LATIN CAPITAL LIGATURE OE
0x008d: 0x2666, # BLACK DIAMOND SUIT
0x008e: 0x2663, # BLACK CLUB SUIT
0x008f: 0x2665, # BLACK HEART SUIT
0x0090: 0x2660, # BLACK SPADE SUIT
0x0091: 0x2018, # LEFT SINGLE QUOTATION MARK
0x0092: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x0093: 0x201c, # LEFT DOUBLE QUOTATION MARK
0x0094: 0x201d, # RIGHT DOUBLE QUOTATION MARK
0x0095: 0x2022, # BULLET
0x0096: 0x2013, # EN DASH
0x0097: 0x2014, # EM DASH
0x0098: 0x02dc, # SMALL TILDE
0x0099: 0x2122, # TRADE MARK SIGN
0x009a: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x009c: 0x0153, # LATIN SMALL LIGATURE OE
0x009f: 0x0178, # LATIN CAPITAL LETTER Y WITH DIAERESIS
})
### Encoding Map
encoding_map = codecs.make_encoding_map(decoding_map)
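An illustrative round trip through two of the PalmOS-specific slots listed above:

assert '\x80'.decode('palmos') == u'\u20ac'  # EURO SIGN
assert u'\u2666'.encode('palmos') == '\x8d'  # BLACK DIAMOND SUIT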
""" Python Character Mapping Codec generated from 'PTCP154.txt' with gencodec.py.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
(c) Copyright 2000 Guido van Rossum.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_map)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_map)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='ptcp154',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0496, # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
0x0081: 0x0492, # CYRILLIC CAPITAL LETTER GHE WITH STROKE
0x0082: 0x04ee, # CYRILLIC CAPITAL LETTER U WITH MACRON
0x0083: 0x0493, # CYRILLIC SMALL LETTER GHE WITH STROKE
0x0084: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
0x0085: 0x2026, # HORIZONTAL ELLIPSIS
0x0086: 0x04b6, # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
0x0087: 0x04ae, # CYRILLIC CAPITAL LETTER STRAIGHT U
0x0088: 0x04b2, # CYRILLIC CAPITAL LETTER HA WITH DESCENDER
0x0089: 0x04af, # CYRILLIC SMALL LETTER STRAIGHT U
0x008a: 0x04a0, # CYRILLIC CAPITAL LETTER BASHKIR KA
0x008b: 0x04e2, # CYRILLIC CAPITAL LETTER I WITH MACRON
0x008c: 0x04a2, # CYRILLIC CAPITAL LETTER EN WITH DESCENDER
0x008d: 0x049a, # CYRILLIC CAPITAL LETTER KA WITH DESCENDER
0x008e: 0x04ba, # CYRILLIC CAPITAL LETTER SHHA
0x008f: 0x04b8, # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE
0x0090: 0x0497, # CYRILLIC SMALL LETTER ZHE WITH DESCENDER
0x0091: 0x2018, # LEFT SINGLE QUOTATION MARK
0x0092: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x0093: 0x201c, # LEFT DOUBLE QUOTATION MARK
0x0094: 0x201d, # RIGHT DOUBLE QUOTATION MARK
0x0095: 0x2022, # BULLET
0x0096: 0x2013, # EN DASH
0x0097: 0x2014, # EM DASH
0x0098: 0x04b3, # CYRILLIC SMALL LETTER HA WITH DESCENDER
0x0099: 0x04b7, # CYRILLIC SMALL LETTER CHE WITH DESCENDER
0x009a: 0x04a1, # CYRILLIC SMALL LETTER BASHKIR KA
0x009b: 0x04e3, # CYRILLIC SMALL LETTER I WITH MACRON
0x009c: 0x04a3, # CYRILLIC SMALL LETTER EN WITH DESCENDER
0x009d: 0x049b, # CYRILLIC SMALL LETTER KA WITH DESCENDER
0x009e: 0x04bb, # CYRILLIC SMALL LETTER SHHA
0x009f: 0x04b9, # CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE
0x00a1: 0x040e, # CYRILLIC CAPITAL LETTER SHORT U (Byelorussian)
0x00a2: 0x045e, # CYRILLIC SMALL LETTER SHORT U (Byelorussian)
0x00a3: 0x0408, # CYRILLIC CAPITAL LETTER JE
0x00a4: 0x04e8, # CYRILLIC CAPITAL LETTER BARRED O
0x00a5: 0x0498, # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
0x00a6: 0x04b0, # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE
0x00a8: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x00aa: 0x04d8, # CYRILLIC CAPITAL LETTER SCHWA
0x00ad: 0x04ef, # CYRILLIC SMALL LETTER U WITH MACRON
0x00af: 0x049c, # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE
0x00b1: 0x04b1, # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
0x00b2: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x00b3: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x00b4: 0x0499, # CYRILLIC SMALL LETTER ZE WITH DESCENDER
0x00b5: 0x04e9, # CYRILLIC SMALL LETTER BARRED O
0x00b8: 0x0451, # CYRILLIC SMALL LETTER IO
0x00b9: 0x2116, # NUMERO SIGN
0x00ba: 0x04d9, # CYRILLIC SMALL LETTER SCHWA
0x00bc: 0x0458, # CYRILLIC SMALL LETTER JE
0x00bd: 0x04aa, # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
0x00be: 0x04ab, # CYRILLIC SMALL LETTER ES WITH DESCENDER
0x00bf: 0x049d, # CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE
0x00c0: 0x0410, # CYRILLIC CAPITAL LETTER A
0x00c1: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x00c2: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x00c3: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x00c4: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x00c5: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x00c6: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x00c7: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x00c8: 0x0418, # CYRILLIC CAPITAL LETTER I
0x00c9: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x00ca: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x00cb: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x00cc: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x00cd: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x00ce: 0x041e, # CYRILLIC CAPITAL LETTER O
0x00cf: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x00d0: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x00d1: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x00d2: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x00d3: 0x0423, # CYRILLIC CAPITAL LETTER U
0x00d4: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x00d5: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x00d6: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x00d7: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x00d8: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x00d9: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x00da: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x00db: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x00dc: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x00dd: 0x042d, # CYRILLIC CAPITAL LETTER E
0x00de: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x00df: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00e0: 0x0430, # CYRILLIC SMALL LETTER A
0x00e1: 0x0431, # CYRILLIC SMALL LETTER BE
0x00e2: 0x0432, # CYRILLIC SMALL LETTER VE
0x00e3: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00e4: 0x0434, # CYRILLIC SMALL LETTER DE
0x00e5: 0x0435, # CYRILLIC SMALL LETTER IE
0x00e6: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00e7: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00e8: 0x0438, # CYRILLIC SMALL LETTER I
0x00e9: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00ea: 0x043a, # CYRILLIC SMALL LETTER KA
0x00eb: 0x043b, # CYRILLIC SMALL LETTER EL
0x00ec: 0x043c, # CYRILLIC SMALL LETTER EM
0x00ed: 0x043d, # CYRILLIC SMALL LETTER EN
0x00ee: 0x043e, # CYRILLIC SMALL LETTER O
0x00ef: 0x043f, # CYRILLIC SMALL LETTER PE
0x00f0: 0x0440, # CYRILLIC SMALL LETTER ER
0x00f1: 0x0441, # CYRILLIC SMALL LETTER ES
0x00f2: 0x0442, # CYRILLIC SMALL LETTER TE
0x00f3: 0x0443, # CYRILLIC SMALL LETTER U
0x00f4: 0x0444, # CYRILLIC SMALL LETTER EF
0x00f5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00f6: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00f7: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00f8: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00f9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00fa: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x00fb: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00fc: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00fd: 0x044d, # CYRILLIC SMALL LETTER E
0x00fe: 0x044e, # CYRILLIC SMALL LETTER YU
0x00ff: 0x044f, # CYRILLIC SMALL LETTER YA
})
### Encoding Map
encoding_map = codecs.make_encoding_map(decoding_map)
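A minimal check against the PTCP154 (Cyrillic Asian) map above:

assert '\xc0'.decode('ptcp154') == u'\u0410'  # CYRILLIC CAPITAL LETTER A
assert u'\u04d8'.encode('ptcp154') == '\xaa'  # CYRILLIC CAPITAL LETTER SCHWA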
# -*- coding: iso-8859-1 -*-
""" Codec for the Punicode encoding, as specified in RFC 3492
Written by Martin v. L�wis.
"""
import codecs
##################### Encoding #####################################
def segregate(str):
"""3.1 Basic code point segregation"""
base = []
extended = {}
for c in str:
if ord(c) < 128:
base.append(c)
else:
extended[c] = 1
extended = extended.keys()
extended.sort()
return "".join(base).encode("ascii"),extended
def selective_len(str, max):
"""Return the length of str, considering only characters below max."""
res = 0
for c in str:
if ord(c) < max:
res += 1
return res
def selective_find(str, char, index, pos):
"""Return a pair (index, pos), indicating the next occurrence of
char in str. index is the position of the character considering
only ordinals up to and including char, and pos is the position in
the full string. index/pos is the starting position in the full
string."""
l = len(str)
while 1:
pos += 1
if pos == l:
return (-1, -1)
c = str[pos]
if c == char:
return index+1, pos
elif c < char:
index += 1
def insertion_unsort(str, extended):
"""3.2 Insertion unsort coding"""
oldchar = 0x80
result = []
oldindex = -1
for c in extended:
index = pos = -1
char = ord(c)
curlen = selective_len(str, char)
delta = (curlen+1) * (char - oldchar)
while 1:
index,pos = selective_find(str,c,index,pos)
if index == -1:
break
delta += index - oldindex
result.append(delta-1)
oldindex = index
delta = 0
oldchar = char
return result
def T(j, bias):
# Punycode parameters: tmin = 1, tmax = 26, base = 36
res = 36 * (j + 1) - bias
if res < 1: return 1
if res > 26: return 26
return res
digits = "abcdefghijklmnopqrstuvwxyz0123456789"
def generate_generalized_integer(N, bias):
"""3.3 Generalized variable-length integers"""
result = []
j = 0
while 1:
t = T(j, bias)
if N < t:
result.append(digits[N])
return result
result.append(digits[t + ((N - t) % (36 - t))])
N = (N - t) // (36 - t)
j += 1
def adapt(delta, first, numchars):
if first:
delta //= 700
else:
delta //= 2
delta += delta // numchars
# ((base - tmin) * tmax) // 2 == 455
divisions = 0
while delta > 455:
delta = delta // 35 # base - tmin
divisions += 36
bias = divisions + (36 * delta // (delta + 38))
return bias
def generate_integers(baselen, deltas):
"""3.4 Bias adaptation"""
# Punycode parameters: initial bias = 72, damp = 700, skew = 38
result = []
bias = 72
for points, delta in enumerate(deltas):
s = generate_generalized_integer(delta, bias)
result.extend(s)
bias = adapt(delta, points==0, baselen+points+1)
return "".join(result)
def punycode_encode(text):
base, extended = segregate(text)
base = base.encode("ascii")
deltas = insertion_unsort(text, extended)
extended = generate_integers(len(base), deltas)
if base:
return base + "-" + extended
return extended
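# Illustrative trace of the pipeline above (not part of the original
# module): u'b\xfccher' segregates into ASCII base 'bcher' plus the
# extended character u'\xfc', whose position deltas become the 'kva'
# tail of the familiar IDNA example.
# >>> segregate(u'b\xfccher')
# ('bcher', [u'\xfc'])
# >>> punycode_encode(u'b\xfccher')
# 'bcher-kva'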
##################### Decoding #####################################
def decode_generalized_number(extended, extpos, bias, errors):
"""3.3 Generalized variable-length integers"""
result = 0
w = 1
j = 0
while 1:
try:
char = ord(extended[extpos])
except IndexError:
if errors == "strict":
raise UnicodeError, "incomplete punicode string"
return extpos + 1, None
extpos += 1
if 0x41 <= char <= 0x5A: # A-Z
digit = char - 0x41
elif 0x30 <= char <= 0x39:
digit = char - 22 # 0x30-26
elif errors == "strict":
raise UnicodeError("Invalid extended code point '%s'"
% extended[extpos])
else:
return extpos, None
t = T(j, bias)
result += digit * w
if digit < t:
return extpos, result
w = w * (36 - t)
j += 1
def insertion_sort(base, extended, errors):
"""3.2 Insertion unsort coding"""
char = 0x80
pos = -1
bias = 72
extpos = 0
while extpos < len(extended):
newpos, delta = decode_generalized_number(extended, extpos,
bias, errors)
if delta is None:
# There was an error in decoding. We can't continue because
# synchronization is lost.
return base
pos += delta+1
char += pos // (len(base) + 1)
if char > 0x10FFFF:
if errors == "strict":
raise UnicodeError, ("Invalid character U+%x" % char)
char = ord('?')
pos = pos % (len(base) + 1)
base = base[:pos] + unichr(char) + base[pos:]
bias = adapt(delta, (extpos == 0), len(base))
extpos = newpos
return base
def punycode_decode(text, errors):
pos = text.rfind("-")
if pos == -1:
base = ""
extended = text
else:
base = text[:pos]
extended = text[pos+1:]
base = unicode(base, "ascii", errors)
extended = extended.upper()
return insertion_sort(base, extended, errors)
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
res = punycode_encode(input)
return res, len(input)
def decode(self,input,errors='strict'):
if errors not in ('strict', 'replace', 'ignore'):
raise UnicodeError, "Unsupported error handling "+errors
res = punycode_decode(input, errors)
return res, len(input)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return punycode_encode(input)
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
if self.errors not in ('strict', 'replace', 'ignore'):
raise UnicodeError, "Unsupported error handling "+self.errors
return punycode_decode(input, self.errors)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='punycode',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
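The same pair through the registered codec, as a quick sanity check (the 'bcher-kva' value is the well-known IDNA sample for u'b\xfccher'):

assert u'b\xfccher'.encode('punycode') == 'bcher-kva'
assert 'bcher-kva'.decode('punycode') == u'b\xfccher'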
"""Codec for quoted-printable encoding.
Like base64 and rot13, this returns Python strings, not Unicode.
"""
import codecs, quopri
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
def quopri_encode(input, errors='strict'):
"""Encode the input, returning a tuple (output object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
# using str() because of cStringIO's undesired Unicode behavior.
f = StringIO(str(input))
g = StringIO()
quopri.encode(f, g, quotetabs=True)
output = g.getvalue()
return (output, len(input))
def quopri_decode(input, errors='strict'):
"""Decode the input, returning a tuple (output object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
f = StringIO(str(input))
g = StringIO()
quopri.decode(f, g)
output = g.getvalue()
return (output, len(input))
class Codec(codecs.Codec):
def encode(self, input,errors='strict'):
return quopri_encode(input,errors)
def decode(self, input,errors='strict'):
return quopri_decode(input,errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return quopri_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return quopri_decode(input, self.errors)[0]
class StreamWriter(Codec, codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
# encodings module API
def getregentry():
return codecs.CodecInfo(
name='quopri',
encode=quopri_encode,
decode=quopri_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
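A minimal sketch of the quoted-printable codec in action; note that it maps str to str, so the '=' byte round-trips through its =3D escape:

assert 'a=b'.encode('quopri') == 'a=3Db'
assert 'a=3Db'.decode('quopri') == 'a=b'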
""" Python 'raw-unicode-escape' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.raw_unicode_escape_encode
decode = codecs.raw_unicode_escape_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.raw_unicode_escape_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.raw_unicode_escape_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='raw-unicode-escape',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
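Illustrative behavior of raw-unicode-escape: Latin-1-range characters pass through as raw bytes, everything else becomes a \uXXXX escape:

assert u'\xe9'.encode('raw-unicode-escape') == '\xe9'
assert u'\u20ac'.encode('raw-unicode-escape') == '\\u20ac'
assert '\\u20ac'.decode('raw-unicode-escape') == u'\u20ac'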
#!/usr/bin/env python
""" Python Character Mapping Codec for ROT13.
See http://ucsub.colorado.edu/~kominek/rot13/ for details.
Written by Marc-Andre Lemburg ([email protected]).
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_map)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_map)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='rot-13',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0041: 0x004e,
0x0042: 0x004f,
0x0043: 0x0050,
0x0044: 0x0051,
0x0045: 0x0052,
0x0046: 0x0053,
0x0047: 0x0054,
0x0048: 0x0055,
0x0049: 0x0056,
0x004a: 0x0057,
0x004b: 0x0058,
0x004c: 0x0059,
0x004d: 0x005a,
0x004e: 0x0041,
0x004f: 0x0042,
0x0050: 0x0043,
0x0051: 0x0044,
0x0052: 0x0045,
0x0053: 0x0046,
0x0054: 0x0047,
0x0055: 0x0048,
0x0056: 0x0049,
0x0057: 0x004a,
0x0058: 0x004b,
0x0059: 0x004c,
0x005a: 0x004d,
0x0061: 0x006e,
0x0062: 0x006f,
0x0063: 0x0070,
0x0064: 0x0071,
0x0065: 0x0072,
0x0066: 0x0073,
0x0067: 0x0074,
0x0068: 0x0075,
0x0069: 0x0076,
0x006a: 0x0077,
0x006b: 0x0078,
0x006c: 0x0079,
0x006d: 0x007a,
0x006e: 0x0061,
0x006f: 0x0062,
0x0070: 0x0063,
0x0071: 0x0064,
0x0072: 0x0065,
0x0073: 0x0066,
0x0074: 0x0067,
0x0075: 0x0068,
0x0076: 0x0069,
0x0077: 0x006a,
0x0078: 0x006b,
0x0079: 0x006c,
0x007a: 0x006d,
})
### Encoding Map
encoding_map = codecs.make_encoding_map(decoding_map)
### Filter API
def rot13(infile, outfile):
outfile.write(infile.read().encode('rot-13'))
if __name__ == '__main__':
import sys
rot13(sys.stdin, sys.stdout)
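A quick sketch of the codec form; since ROT13 is its own inverse, encoding twice restores the input:

assert 'Hello'.encode('rot-13') == 'Uryyb'
assert 'Uryyb'.encode('rot-13') == 'Hello'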
# -*- coding: iso-8859-1 -*-
""" Python 'escape' Codec
Written by Martin v. Löwis ([email protected]).
"""
import codecs
class Codec(codecs.Codec):
encode = codecs.escape_encode
decode = codecs.escape_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.escape_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.escape_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
def getregentry():
return codecs.CodecInfo(
name='string-escape',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
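A small example of the str-to-str escape codec; backslash escapes are produced on encode and resolved on decode:

assert 'a\nb\\'.encode('string-escape') == 'a\\nb\\\\'
assert 'a\\nb'.decode('string-escape') == 'a\nb'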
""" Python Character Mapping Codec tis_620 generated from 'python-mappings/TIS-620.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='tis-620',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
u'\x00' # 0x00 -> NULL
u'\x01' # 0x01 -> START OF HEADING
u'\x02' # 0x02 -> START OF TEXT
u'\x03' # 0x03 -> END OF TEXT
u'\x04' # 0x04 -> END OF TRANSMISSION
u'\x05' # 0x05 -> ENQUIRY
u'\x06' # 0x06 -> ACKNOWLEDGE
u'\x07' # 0x07 -> BELL
u'\x08' # 0x08 -> BACKSPACE
u'\t' # 0x09 -> HORIZONTAL TABULATION
u'\n' # 0x0A -> LINE FEED
u'\x0b' # 0x0B -> VERTICAL TABULATION
u'\x0c' # 0x0C -> FORM FEED
u'\r' # 0x0D -> CARRIAGE RETURN
u'\x0e' # 0x0E -> SHIFT OUT
u'\x0f' # 0x0F -> SHIFT IN
u'\x10' # 0x10 -> DATA LINK ESCAPE
u'\x11' # 0x11 -> DEVICE CONTROL ONE
u'\x12' # 0x12 -> DEVICE CONTROL TWO
u'\x13' # 0x13 -> DEVICE CONTROL THREE
u'\x14' # 0x14 -> DEVICE CONTROL FOUR
u'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
u'\x16' # 0x16 -> SYNCHRONOUS IDLE
u'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
u'\x18' # 0x18 -> CANCEL
u'\x19' # 0x19 -> END OF MEDIUM
u'\x1a' # 0x1A -> SUBSTITUTE
u'\x1b' # 0x1B -> ESCAPE
u'\x1c' # 0x1C -> FILE SEPARATOR
u'\x1d' # 0x1D -> GROUP SEPARATOR
u'\x1e' # 0x1E -> RECORD SEPARATOR
u'\x1f' # 0x1F -> UNIT SEPARATOR
u' ' # 0x20 -> SPACE
u'!' # 0x21 -> EXCLAMATION MARK
u'"' # 0x22 -> QUOTATION MARK
u'#' # 0x23 -> NUMBER SIGN
u'$' # 0x24 -> DOLLAR SIGN
u'%' # 0x25 -> PERCENT SIGN
u'&' # 0x26 -> AMPERSAND
u"'" # 0x27 -> APOSTROPHE
u'(' # 0x28 -> LEFT PARENTHESIS
u')' # 0x29 -> RIGHT PARENTHESIS
u'*' # 0x2A -> ASTERISK
u'+' # 0x2B -> PLUS SIGN
u',' # 0x2C -> COMMA
u'-' # 0x2D -> HYPHEN-MINUS
u'.' # 0x2E -> FULL STOP
u'/' # 0x2F -> SOLIDUS
u'0' # 0x30 -> DIGIT ZERO
u'1' # 0x31 -> DIGIT ONE
u'2' # 0x32 -> DIGIT TWO
u'3' # 0x33 -> DIGIT THREE
u'4' # 0x34 -> DIGIT FOUR
u'5' # 0x35 -> DIGIT FIVE
u'6' # 0x36 -> DIGIT SIX
u'7' # 0x37 -> DIGIT SEVEN
u'8' # 0x38 -> DIGIT EIGHT
u'9' # 0x39 -> DIGIT NINE
u':' # 0x3A -> COLON
u';' # 0x3B -> SEMICOLON
u'<' # 0x3C -> LESS-THAN SIGN
u'=' # 0x3D -> EQUALS SIGN
u'>' # 0x3E -> GREATER-THAN SIGN
u'?' # 0x3F -> QUESTION MARK
u'@' # 0x40 -> COMMERCIAL AT
u'A' # 0x41 -> LATIN CAPITAL LETTER A
u'B' # 0x42 -> LATIN CAPITAL LETTER B
u'C' # 0x43 -> LATIN CAPITAL LETTER C
u'D' # 0x44 -> LATIN CAPITAL LETTER D
u'E' # 0x45 -> LATIN CAPITAL LETTER E
u'F' # 0x46 -> LATIN CAPITAL LETTER F
u'G' # 0x47 -> LATIN CAPITAL LETTER G
u'H' # 0x48 -> LATIN CAPITAL LETTER H
u'I' # 0x49 -> LATIN CAPITAL LETTER I
u'J' # 0x4A -> LATIN CAPITAL LETTER J
u'K' # 0x4B -> LATIN CAPITAL LETTER K
u'L' # 0x4C -> LATIN CAPITAL LETTER L
u'M' # 0x4D -> LATIN CAPITAL LETTER M
u'N' # 0x4E -> LATIN CAPITAL LETTER N
u'O' # 0x4F -> LATIN CAPITAL LETTER O
u'P' # 0x50 -> LATIN CAPITAL LETTER P
u'Q' # 0x51 -> LATIN CAPITAL LETTER Q
u'R' # 0x52 -> LATIN CAPITAL LETTER R
u'S' # 0x53 -> LATIN CAPITAL LETTER S
u'T' # 0x54 -> LATIN CAPITAL LETTER T
u'U' # 0x55 -> LATIN CAPITAL LETTER U
u'V' # 0x56 -> LATIN CAPITAL LETTER V
u'W' # 0x57 -> LATIN CAPITAL LETTER W
u'X' # 0x58 -> LATIN CAPITAL LETTER X
u'Y' # 0x59 -> LATIN CAPITAL LETTER Y
u'Z' # 0x5A -> LATIN CAPITAL LETTER Z
u'[' # 0x5B -> LEFT SQUARE BRACKET
u'\\' # 0x5C -> REVERSE SOLIDUS
u']' # 0x5D -> RIGHT SQUARE BRACKET
u'^' # 0x5E -> CIRCUMFLEX ACCENT
u'_' # 0x5F -> LOW LINE
u'`' # 0x60 -> GRAVE ACCENT
u'a' # 0x61 -> LATIN SMALL LETTER A
u'b' # 0x62 -> LATIN SMALL LETTER B
u'c' # 0x63 -> LATIN SMALL LETTER C
u'd' # 0x64 -> LATIN SMALL LETTER D
u'e' # 0x65 -> LATIN SMALL LETTER E
u'f' # 0x66 -> LATIN SMALL LETTER F
u'g' # 0x67 -> LATIN SMALL LETTER G
u'h' # 0x68 -> LATIN SMALL LETTER H
u'i' # 0x69 -> LATIN SMALL LETTER I
u'j' # 0x6A -> LATIN SMALL LETTER J
u'k' # 0x6B -> LATIN SMALL LETTER K
u'l' # 0x6C -> LATIN SMALL LETTER L
u'm' # 0x6D -> LATIN SMALL LETTER M
u'n' # 0x6E -> LATIN SMALL LETTER N
u'o' # 0x6F -> LATIN SMALL LETTER O
u'p' # 0x70 -> LATIN SMALL LETTER P
u'q' # 0x71 -> LATIN SMALL LETTER Q
u'r' # 0x72 -> LATIN SMALL LETTER R
u's' # 0x73 -> LATIN SMALL LETTER S
u't' # 0x74 -> LATIN SMALL LETTER T
u'u' # 0x75 -> LATIN SMALL LETTER U
u'v' # 0x76 -> LATIN SMALL LETTER V
u'w' # 0x77 -> LATIN SMALL LETTER W
u'x' # 0x78 -> LATIN SMALL LETTER X
u'y' # 0x79 -> LATIN SMALL LETTER Y
u'z' # 0x7A -> LATIN SMALL LETTER Z
u'{' # 0x7B -> LEFT CURLY BRACKET
u'|' # 0x7C -> VERTICAL LINE
u'}' # 0x7D -> RIGHT CURLY BRACKET
u'~' # 0x7E -> TILDE
u'\x7f' # 0x7F -> DELETE
u'\x80' # 0x80 -> <control>
u'\x81' # 0x81 -> <control>
u'\x82' # 0x82 -> <control>
u'\x83' # 0x83 -> <control>
u'\x84' # 0x84 -> <control>
u'\x85' # 0x85 -> <control>
u'\x86' # 0x86 -> <control>
u'\x87' # 0x87 -> <control>
u'\x88' # 0x88 -> <control>
u'\x89' # 0x89 -> <control>
u'\x8a' # 0x8A -> <control>
u'\x8b' # 0x8B -> <control>
u'\x8c' # 0x8C -> <control>
u'\x8d' # 0x8D -> <control>
u'\x8e' # 0x8E -> <control>
u'\x8f' # 0x8F -> <control>
u'\x90' # 0x90 -> <control>
u'\x91' # 0x91 -> <control>
u'\x92' # 0x92 -> <control>
u'\x93' # 0x93 -> <control>
u'\x94' # 0x94 -> <control>
u'\x95' # 0x95 -> <control>
u'\x96' # 0x96 -> <control>
u'\x97' # 0x97 -> <control>
u'\x98' # 0x98 -> <control>
u'\x99' # 0x99 -> <control>
u'\x9a' # 0x9A -> <control>
u'\x9b' # 0x9B -> <control>
u'\x9c' # 0x9C -> <control>
u'\x9d' # 0x9D -> <control>
u'\x9e' # 0x9E -> <control>
u'\x9f' # 0x9F -> <control>
u'\ufffe'
u'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
u'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
u'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
u'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
u'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
u'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
u'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
u'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
u'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
u'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
u'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
u'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
u'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
u'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
u'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
u'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
u'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
u'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
u'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
u'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
u'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
u'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
u'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
u'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
u'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
u'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
u'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
u'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
u'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
u'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
u'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
u'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
u'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
u'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
u'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
u'\u0e24' # 0xC4 -> THAI CHARACTER RU
u'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
u'\u0e26' # 0xC6 -> THAI CHARACTER LU
u'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
u'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
u'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
u'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
u'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
u'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
u'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
u'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
u'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
u'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
u'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
u'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
u'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
u'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
u'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
u'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
u'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
u'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
u'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
u'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
u'\ufffe'
u'\ufffe'
u'\ufffe'
u'\ufffe'
u'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
u'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
u'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
u'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
u'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
u'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
u'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
u'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
u'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
u'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
u'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
u'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
u'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
u'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
u'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
u'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
u'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
u'\u0e50' # 0xF0 -> THAI DIGIT ZERO
u'\u0e51' # 0xF1 -> THAI DIGIT ONE
u'\u0e52' # 0xF2 -> THAI DIGIT TWO
u'\u0e53' # 0xF3 -> THAI DIGIT THREE
u'\u0e54' # 0xF4 -> THAI DIGIT FOUR
u'\u0e55' # 0xF5 -> THAI DIGIT FIVE
u'\u0e56' # 0xF6 -> THAI DIGIT SIX
u'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
u'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
u'\u0e59' # 0xF9 -> THAI DIGIT NINE
u'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
u'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
u'\ufffe'
u'\ufffe'
u'\ufffe'
u'\ufffe'
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
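An illustrative round trip through the Thai slots of the table above:

assert '\xa1'.decode('tis-620') == u'\u0e01'  # THAI CHARACTER KO KAI
assert u'\u0e3f'.encode('tis-620') == '\xdf'  # THAI CURRENCY SYMBOL BAHT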
""" Python 'undefined' Codec
This codec will always raise a ValueError exception when being
used. It is intended for use by the site.py file to switch off
automatic string to Unicode coercion.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
raise UnicodeError("undefined encoding")
def decode(self,input,errors='strict'):
raise UnicodeError("undefined encoding")
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
raise UnicodeError("undefined encoding")
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
raise UnicodeError("undefined encoding")
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='undefined',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
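A sketch confirming the documented behavior, namely that any use of the codec raises:

try:
    'x'.decode('undefined')
except UnicodeError:
    pass  # raised unconditionally, by design
else:
    raise AssertionError('the undefined codec should always raise')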
""" Python 'unicode-escape' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.unicode_escape_encode
decode = codecs.unicode_escape_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.unicode_escape_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.unicode_escape_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='unicode-escape',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
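A short unicode-escape round trip for reference:

assert u'caf\xe9\n'.encode('unicode-escape') == 'caf\\xe9\\n'
assert 'caf\\xe9\\n'.decode('unicode-escape') == u'caf\xe9\n'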
""" Python 'unicode-internal' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.unicode_internal_encode
decode = codecs.unicode_internal_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.unicode_internal_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.unicode_internal_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='unicode-internal',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
""" Python 'utf-16' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs, sys
### Codec APIs
encode = codecs.utf_16_encode
def decode(input, errors='strict'):
return codecs.utf_16_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.encoder = None
def encode(self, input, final=False):
if self.encoder is None:
result = codecs.utf_16_encode(input, self.errors)[0]
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
return result
return self.encoder(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.encoder = None
def getstate(self):
# state info we return to the caller:
# 0: stream is in natural order for this platform
# 2: endianness hasn't been determined yet
# (we're never writing in unnatural order)
return (2 if self.encoder is None else 0)
def setstate(self, state):
if state:
self.encoder = None
else:
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.decoder = None
def _buffer_decode(self, input, errors, final):
if self.decoder is None:
(output, consumed, byteorder) = \
codecs.utf_16_ex_decode(input, errors, 0, final)
if byteorder == -1:
self.decoder = codecs.utf_16_le_decode
elif byteorder == 1:
self.decoder = codecs.utf_16_be_decode
elif consumed >= 2:
raise UnicodeError("UTF-16 stream does not start with BOM")
return (output, consumed)
return self.decoder(input, self.errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.decoder = None
class StreamWriter(codecs.StreamWriter):
def __init__(self, stream, errors='strict'):
codecs.StreamWriter.__init__(self, stream, errors)
self.encoder = None
def reset(self):
codecs.StreamWriter.reset(self)
self.encoder = None
def encode(self, input, errors='strict'):
if self.encoder is None:
result = codecs.utf_16_encode(input, errors)
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
return result
else:
return self.encoder(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
(object, consumed, byteorder) = \
codecs.utf_16_ex_decode(input, errors, 0, False)
if byteorder == -1:
self.decode = codecs.utf_16_le_decode
elif byteorder == 1:
self.decode = codecs.utf_16_be_decode
elif consumed>=2:
raise UnicodeError,"UTF-16 stream does not start with BOM"
return (object, consumed)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
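A sketch of the BOM handling described above: the one-shot encoder always emits a BOM in the platform's byte order, while the incremental encoder emits it only on the first call, having latched the byte order:

import codecs
blob = u'A'.encode('utf-16')
assert blob[:2] in ('\xff\xfe', '\xfe\xff')   # BOM first
enc = codecs.getincrementalencoder('utf-16')()
first = enc.encode(u'A')    # BOM + one 2-byte code unit
second = enc.encode(u'B')   # no BOM this time
assert len(first) == 4 and len(second) == 2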
""" Python 'utf-16-be' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_16_be_encode
def decode(input, errors='strict'):
return codecs.utf_16_be_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_16_be_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_16_be_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_16_be_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_16_be_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16-be',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-16-le' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_16_le_encode
def decode(input, errors='strict'):
return codecs.utf_16_le_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_16_le_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_16_le_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_16_le_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_16_le_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16-le',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
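The explicit-endian variants fix the byte order in the codec name and never write a BOM:

assert u'A'.encode('utf-16-be') == '\x00A'
assert u'A'.encode('utf-16-le') == 'A\x00'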
"""
Python 'utf-32' Codec
"""
import codecs, sys
### Codec APIs
encode = codecs.utf_32_encode
def decode(input, errors='strict'):
return codecs.utf_32_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.encoder = None
def encode(self, input, final=False):
if self.encoder is None:
result = codecs.utf_32_encode(input, self.errors)[0]
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
return result
return self.encoder(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.encoder = None
def getstate(self):
# state info we return to the caller:
# 0: stream is in natural order for this platform
# 2: endianness hasn't been determined yet
# (we're never writing in unnatural order)
return (2 if self.encoder is None else 0)
def setstate(self, state):
if state:
self.encoder = None
else:
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.decoder = None
def _buffer_decode(self, input, errors, final):
if self.decoder is None:
(output, consumed, byteorder) = \
codecs.utf_32_ex_decode(input, errors, 0, final)
if byteorder == -1:
self.decoder = codecs.utf_32_le_decode
elif byteorder == 1:
self.decoder = codecs.utf_32_be_decode
elif consumed >= 4:
raise UnicodeError("UTF-32 stream does not start with BOM")
return (output, consumed)
return self.decoder(input, self.errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.decoder = None
def getstate(self):
# additional state info from the base class must be None here,
# as it isn't passed along to the caller
state = codecs.BufferedIncrementalDecoder.getstate(self)[0]
# additional state info we pass to the caller:
# 0: stream is in natural order for this platform
# 1: stream is in unnatural order
# 2: endianness hasn't been determined yet
if self.decoder is None:
return (state, 2)
addstate = int((sys.byteorder == "big") !=
(self.decoder is codecs.utf_32_be_decode))
return (state, addstate)
def setstate(self, state):
# state[1] will be ignored by BufferedIncrementalDecoder.setstate()
codecs.BufferedIncrementalDecoder.setstate(self, state)
state = state[1]
if state == 0:
self.decoder = (codecs.utf_32_be_decode
if sys.byteorder == "big"
else codecs.utf_32_le_decode)
elif state == 1:
self.decoder = (codecs.utf_32_le_decode
if sys.byteorder == "big"
else codecs.utf_32_be_decode)
else:
self.decoder = None
class StreamWriter(codecs.StreamWriter):
def __init__(self, stream, errors='strict'):
self.encoder = None
codecs.StreamWriter.__init__(self, stream, errors)
def reset(self):
codecs.StreamWriter.reset(self)
self.encoder = None
def encode(self, input, errors='strict'):
if self.encoder is None:
result = codecs.utf_32_encode(input, errors)
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
return result
else:
return self.encoder(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
(object, consumed, byteorder) = \
codecs.utf_32_ex_decode(input, errors, 0, False)
if byteorder == -1:
self.decode = codecs.utf_32_le_decode
elif byteorder == 1:
self.decode = codecs.utf_32_be_decode
elif consumed >= 4:
raise UnicodeError("UTF-32 stream does not start with BOM")
return (object, consumed)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
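# Sketch of the state protocol documented in getstate() above: a fresh
# incremental encoder reports 2 (byte order undecided); after the first
# chunk, which carries the BOM, it reports 0 (natural order).
import codecs
enc = codecs.getincrementalencoder('utf-32')()
assert enc.getstate() == 2
enc.encode(u'x')
assert enc.getstate() == 0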
"""
Python 'utf-32-be' Codec
"""
import codecs
### Codec APIs
encode = codecs.utf_32_be_encode
def decode(input, errors='strict'):
return codecs.utf_32_be_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_32_be_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_32_be_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_32_be_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_32_be_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32-be',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
"""
Python 'utf-32-le' Codec
"""
import codecs
### Codec APIs
encode = codecs.utf_32_le_encode
def decode(input, errors='strict'):
return codecs.utf_32_le_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_32_le_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_32_le_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_32_le_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_32_le_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32-le',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-7' Codec
Written by Brian Quinlan ([email protected]).
"""
import codecs
### Codec APIs
encode = codecs.utf_7_encode
def decode(input, errors='strict'):
return codecs.utf_7_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_7_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_7_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_7_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_7_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-7',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-8' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_8_encode
def decode(input, errors='strict'):
return codecs.utf_8_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_8_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_8_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_8_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_8_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-8',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-8-sig' Codec
This works like UTF-8 with the following changes:
* On encoding/writing a UTF-8 encoded BOM will be prepended/written as the
first three bytes.
* On decoding/reading if the first three bytes are a UTF-8 encoded BOM, these
bytes will be skipped.
"""
import codecs
### Codec APIs
def encode(input, errors='strict'):
return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0], len(input))
def decode(input, errors='strict'):
prefix = 0
if input[:3] == codecs.BOM_UTF8:
input = input[3:]
prefix = 3
(output, consumed) = codecs.utf_8_decode(input, errors, True)
return (output, consumed+prefix)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.first = 1
def encode(self, input, final=False):
if self.first:
self.first = 0
return codecs.BOM_UTF8 + codecs.utf_8_encode(input, self.errors)[0]
else:
return codecs.utf_8_encode(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.first = 1
def getstate(self):
return self.first
def setstate(self, state):
self.first = state
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.first = True
def _buffer_decode(self, input, errors, final):
if self.first:
if len(input) < 3:
if codecs.BOM_UTF8.startswith(input):
# not enough data to decide if this really is a BOM
# => try again on the next call
return (u"", 0)
else:
self.first = None
else:
self.first = None
if input[:3] == codecs.BOM_UTF8:
(output, consumed) = codecs.utf_8_decode(input[3:], errors, final)
return (output, consumed+3)
return codecs.utf_8_decode(input, errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.first = True
class StreamWriter(codecs.StreamWriter):
def reset(self):
codecs.StreamWriter.reset(self)
try:
del self.encode
except AttributeError:
pass
def encode(self, input, errors='strict'):
self.encode = codecs.utf_8_encode
return encode(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
if len(input) < 3:
if codecs.BOM_UTF8.startswith(input):
# not enough data to decide if this is a BOM
# => try again on the next call
return (u"", 0)
elif input[:3] == codecs.BOM_UTF8:
self.decode = codecs.utf_8_decode
(output, consumed) = codecs.utf_8_decode(input[3:],errors)
return (output, consumed+3)
# (else) no BOM present
self.decode = codecs.utf_8_decode
return codecs.utf_8_decode(input, errors)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-8-sig',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
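# Sketch of the BOM handling described in the docstring above: encode
# prepends the three BOM bytes exactly once; decode strips them if present.
import codecs
blob = u'hi'.encode('utf-8-sig')
assert blob == codecs.BOM_UTF8 + 'hi'
assert blob.decode('utf-8-sig') == u'hi'
assert 'hi'.decode('utf-8-sig') == u'hi'   # the BOM is optional on input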
""" Python 'uu_codec' Codec - UU content transfer encoding
Unlike most of the other codecs which target Unicode, this codec
will return Python string objects for both encode and decode.
Written by Marc-Andre Lemburg ([email protected]). Some details were
adapted from uu.py which was written by Lance Ellinghouse and
modified by Jack Jansen and Fredrik Lundh.
"""
import codecs, binascii
### Codec APIs
def uu_encode(input,errors='strict',filename='<data>',mode=0666):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
from cStringIO import StringIO
from binascii import b2a_uu
# using str() because of cStringIO's undesired Unicode behavior.
infile = StringIO(str(input))
outfile = StringIO()
read = infile.read
write = outfile.write
# Remove newline chars from filename
filename = filename.replace('\n','\\n')
filename = filename.replace('\r','\\r')
# Encode
write('begin %o %s\n' % (mode & 0777, filename))
chunk = read(45)
while chunk:
write(b2a_uu(chunk))
chunk = read(45)
write(' \nend\n')
return (outfile.getvalue(), len(input))
def uu_decode(input,errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
Note: filename and file mode information in the input data is
ignored.
"""
assert errors == 'strict'
from cStringIO import StringIO
from binascii import a2b_uu
infile = StringIO(str(input))
outfile = StringIO()
readline = infile.readline
write = outfile.write
# Find start of encoded data
while 1:
s = readline()
if not s:
raise ValueError, 'Missing "begin" line in input data'
if s[:5] == 'begin':
break
# Decode
while 1:
s = readline()
if not s or \
s == 'end\n':
break
try:
data = a2b_uu(s)
except binascii.Error, v:
# Workaround for broken uuencoders by /Fredrik Lundh
nbytes = (((ord(s[0])-32) & 63) * 4 + 5) // 3
data = a2b_uu(s[:nbytes])
#sys.stderr.write("Warning: %s\n" % str(v))
write(data)
if not s:
raise ValueError, 'Truncated input data'
return (outfile.getvalue(), len(input))
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return uu_encode(input,errors)
def decode(self,input,errors='strict'):
return uu_decode(input,errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return uu_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return uu_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='uu',
encode=uu_encode,
decode=uu_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
_is_text_encoding=False,
)
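# Sketch (illustrative): as the docstring notes, 'uu' is a str-to-str
# transform. With the default filename '<data>' and mode 0666, the header
# line is 'begin 666 <data>'.
wrapped = 'hello world'.encode('uu')
assert wrapped.startswith('begin 666 <data>')
assert wrapped.decode('uu') == 'hello world'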
""" Python 'zlib_codec' Codec - zlib compression encoding
Unlike most of the other codecs which target Unicode, this codec
will return Python string objects for both encode and decode.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import zlib # this codec needs the optional zlib module !
### Codec APIs
def zlib_encode(input,errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = zlib.compress(input)
return (output, len(input))
def zlib_decode(input,errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling which is the only currently supported
error handling for this codec.
"""
assert errors == 'strict'
output = zlib.decompress(input)
return (output, len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return zlib_encode(input, errors)
def decode(self, input, errors='strict'):
return zlib_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.compressobj = zlib.compressobj()
def encode(self, input, final=False):
if final:
c = self.compressobj.compress(input)
return c + self.compressobj.flush()
else:
return self.compressobj.compress(input)
def reset(self):
self.compressobj = zlib.compressobj()
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.decompressobj = zlib.decompressobj()
def decode(self, input, final=False):
if final:
c = self.decompressobj.decompress(input)
return c + self.decompressobj.flush()
else:
return self.decompressobj.decompress(input)
def reset(self):
self.decompressobj = zlib.decompressobj()
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='zlib',
encode=zlib_encode,
decode=zlib_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
_is_text_encoding=False,
)
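# Sketch (illustrative): a zlib round trip, including the incremental
# encoder, which only flushes the compressor when final=True.
import codecs
payload = 'x' * 1000
assert payload.encode('zlib').decode('zlib') == payload
enc = codecs.getincrementalencoder('zlib')()
assert enc.encode(payload, final=True).decode('zlib') == payload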
""" Standard "encodings" Package
Standard Python encoding modules are stored in this package
directory.
Codec modules must have names corresponding to normalized encoding
names as defined in the normalize_encoding() function below, e.g.
'utf-8' must be implemented by the module 'utf_8.py'.
Each codec module must export the following interface:
* getregentry() -> codecs.CodecInfo object
The getregentry() API must return a CodecInfo object with encoder, decoder,
incrementalencoder, incrementaldecoder, streamwriter and streamreader
attributes which adhere to the Python Codec Interface Standard.
In addition, a module may optionally also define the following
APIs which are then used by the package's codec search function:
* getaliases() -> sequence of encoding name strings to use as aliases
Alias names returned by getaliases() must be normalized encoding
names as defined by normalize_encoding().
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import codecs
from encodings import aliases
import __builtin__
_cache = {}
_unknown = '--unknown--'
_import_tail = ['*']
# 256-character translate table for normalize_encoding() below: ASCII
# letters, digits and '.' map to themselves, every other byte to a space.
_norm_encoding_map = ('                                              . '
'0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
' abcdefghijklmnopqrstuvwxyz                     '
'                                                '
'                                                '
'                ')
_aliases = aliases.aliases
class CodecRegistryError(LookupError, SystemError):
pass
def normalize_encoding(encoding):
""" Normalize an encoding name.
Normalization works as follows: all non-alphanumeric
characters except the dot used for Python package names are
collapsed and replaced with a single underscore, e.g. ' -;#'
becomes '_'. Leading and trailing underscores are removed.
Note that encoding names should be ASCII only; if they do use
non-ASCII characters, these must be Latin-1 compatible.
"""
# Make sure we have an 8-bit string, because .translate() works
# differently for Unicode strings.
if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode):
# Note that .encode('latin-1') does *not* use the codec
# registry, so this call doesn't recurse. (See unicodeobject.c
# PyUnicode_AsEncodedString() for details)
encoding = encoding.encode('latin-1')
return '_'.join(encoding.translate(_norm_encoding_map).split())
def search_function(encoding):
# Cache lookup
entry = _cache.get(encoding, _unknown)
if entry is not _unknown:
return entry
# Import the module:
#
# First try to find an alias for the normalized encoding
# name and lookup the module using the aliased name, then try to
# lookup the module using the standard import scheme, i.e. first
# try in the encodings package, then at top-level.
#
norm_encoding = normalize_encoding(encoding)
aliased_encoding = _aliases.get(norm_encoding) or \
_aliases.get(norm_encoding.replace('.', '_'))
if aliased_encoding is not None:
modnames = [aliased_encoding,
norm_encoding]
else:
modnames = [norm_encoding]
for modname in modnames:
if not modname or '.' in modname:
continue
try:
# Import is absolute to prevent the possibly malicious import of a
# module with side-effects that is not in the 'encodings' package.
mod = __import__('encodings.' + modname, fromlist=_import_tail,
level=0)
except ImportError:
pass
else:
break
else:
mod = None
try:
getregentry = mod.getregentry
except AttributeError:
# Not a codec module
mod = None
if mod is None:
# Cache misses
_cache[encoding] = None
return None
# Now ask the module for the registry entry
entry = getregentry()
if not isinstance(entry, codecs.CodecInfo):
if not 4 <= len(entry) <= 7:
raise CodecRegistryError,\
'module "%s" (%s) failed to register' % \
(mod.__name__, mod.__file__)
if not hasattr(entry[0], '__call__') or \
not hasattr(entry[1], '__call__') or \
(entry[2] is not None and not hasattr(entry[2], '__call__')) or \
(entry[3] is not None and not hasattr(entry[3], '__call__')) or \
(len(entry) > 4 and entry[4] is not None and not hasattr(entry[4], '__call__')) or \
(len(entry) > 5 and entry[5] is not None and not hasattr(entry[5], '__call__')):
raise CodecRegistryError,\
'incompatible codecs in module "%s" (%s)' % \
(mod.__name__, mod.__file__)
if len(entry)<7 or entry[6] is None:
entry += (None,)*(6-len(entry)) + (mod.__name__.split(".", 1)[1],)
entry = codecs.CodecInfo(*entry)
# Cache the codec registry entry
_cache[encoding] = entry
# Register its aliases (without overwriting previously registered
# aliases)
try:
codecaliases = mod.getaliases()
except AttributeError:
pass
else:
for alias in codecaliases:
if alias not in _aliases:
_aliases[alias] = modname
# Return the registry entry
return entry
# Register the search_function in the Python codec registry
codecs.register(search_function)
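# Sketch of the lookup path above: codecs.lookup() calls search_function(),
# which normalizes the requested name, consults the aliases table, and
# imports the matching module from the encodings package.
import codecs
from encodings import normalize_encoding
info = codecs.lookup('UTF 8')            # normalized to 'utf_8'
assert info.name == 'utf-8'
assert normalize_encoding('Latin-1') == 'Latin_1'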
"""Basic pip uninstallation support, helper for the Windows uninstaller"""
import argparse
import ensurepip
import sys
def _main(argv=None):
parser = argparse.ArgumentParser(prog="python -m ensurepip._uninstall")
parser.add_argument(
"--version",
action="version",
version="pip {}".format(ensurepip.version()),
help="Show the version of pip this will attempt to uninstall.",
)
parser.add_argument(
"-v", "--verbose",
action="count",
default=0,
dest="verbosity",
help=("Give more output. Option is additive, and can be used up to 3 "
"times."),
)
args = parser.parse_args(argv)
return ensurepip._uninstall_helper(verbosity=args.verbosity)
if __name__ == "__main__":
sys.exit(_main())
#!/usr/bin/env python2
from __future__ import print_function
import os
import os.path
import pkgutil
import shutil
import sys
import tempfile
__all__ = ["version", "bootstrap"]
_SETUPTOOLS_VERSION = "41.2.0"
_PIP_VERSION = "19.2.3"
_PROJECTS = [
("setuptools", _SETUPTOOLS_VERSION),
("pip", _PIP_VERSION),
]
def _run_pip(args, additional_paths=None):
# Add our bundled software to the sys.path so we can import it
if additional_paths is not None:
sys.path = additional_paths + sys.path
# Install the bundled software
import pip._internal
return pip._internal.main(args)
def version():
"""
Returns a string specifying the bundled version of pip.
"""
return _PIP_VERSION
def _disable_pip_configuration_settings():
# We deliberately ignore all pip environment variables
# when invoking pip
# See http://bugs.python.org/issue19734 for details
keys_to_remove = [k for k in os.environ if k.startswith("PIP_")]
for k in keys_to_remove:
del os.environ[k]
# We also ignore the settings in the default pip configuration file
# See http://bugs.python.org/issue20053 for details
os.environ['PIP_CONFIG_FILE'] = os.devnull
def bootstrap(root=None, upgrade=False, user=False,
altinstall=False, default_pip=True,
verbosity=0):
"""
Bootstrap pip into the current Python installation (or the given root
directory).
Note that calling this function will alter both sys.path and os.environ.
"""
# Discard the return value
_bootstrap(root=root, upgrade=upgrade, user=user,
altinstall=altinstall, default_pip=default_pip,
verbosity=verbosity)
def _bootstrap(root=None, upgrade=False, user=False,
altinstall=False, default_pip=True,
verbosity=0):
"""
Bootstrap pip into the current Python installation (or the given root
directory). Returns pip command status code.
Note that calling this function will alter both sys.path and os.environ.
"""
if altinstall and default_pip:
raise ValueError("Cannot use altinstall and default_pip together")
_disable_pip_configuration_settings()
# By default, installing pip and setuptools installs all of the
# following scripts (X.Y == running Python version):
#
# pip, pipX, pipX.Y, easy_install, easy_install-X.Y
#
# pip 1.5+ allows ensurepip to request that some of those be left out
if altinstall:
# omit pip, pipX and easy_install
os.environ["ENSUREPIP_OPTIONS"] = "altinstall"
elif not default_pip:
# omit pip and easy_install
os.environ["ENSUREPIP_OPTIONS"] = "install"
tmpdir = tempfile.mkdtemp()
try:
# Put our bundled wheels into a temporary directory and construct the
# additional paths that need added to sys.path
additional_paths = []
for project, version in _PROJECTS:
wheel_name = "{}-{}-py2.py3-none-any.whl".format(project, version)
whl = pkgutil.get_data(
"ensurepip",
"_bundled/{}".format(wheel_name),
)
with open(os.path.join(tmpdir, wheel_name), "wb") as fp:
fp.write(whl)
additional_paths.append(os.path.join(tmpdir, wheel_name))
# Construct the arguments to be passed to the pip command
args = ["install", "--no-index", "--find-links", tmpdir]
if sys.platform == 'cli': # IronPython doesn't do bytecode compilation
args += ["--no-compile"]
if root:
args += ["--root", root]
if upgrade:
args += ["--upgrade"]
if user:
args += ["--user"]
if verbosity:
args += ["-" + "v" * verbosity]
return _run_pip(args + [p[0] for p in _PROJECTS], additional_paths)
finally:
shutil.rmtree(tmpdir, ignore_errors=True)
def _uninstall_helper(verbosity=0):
"""Helper to support a clean default uninstall process on Windows
Note that calling this function may alter os.environ.
"""
# Nothing to do if pip was never installed, or has been removed
try:
import pip
except ImportError:
return
# If the pip version doesn't match the bundled one, leave it alone
if pip.__version__ != _PIP_VERSION:
msg = ("ensurepip will only uninstall a matching version "
"({!r} installed, {!r} bundled)")
print(msg.format(pip.__version__, _PIP_VERSION), file=sys.stderr)
return
_disable_pip_configuration_settings()
# Construct the arguments to be passed to the pip command
args = ["uninstall", "-y", "--disable-pip-version-check"]
if verbosity:
args += ["-" + "v" * verbosity]
return _run_pip(args + [p[0] for p in reversed(_PROJECTS)])
def _main(argv=None):
import argparse
parser = argparse.ArgumentParser(prog="python -m ensurepip")
parser.add_argument(
"--version",
action="version",
version="pip {}".format(version()),
help="Show the version of pip that is bundled with this Python.",
)
parser.add_argument(
"-v", "--verbose",
action="count",
default=0,
dest="verbosity",
help=("Give more output. Option is additive, and can be used up to 3 "
"times."),
)
parser.add_argument(
"-U", "--upgrade",
action="store_true",
default=False,
help="Upgrade pip and dependencies, even if already installed.",
)
parser.add_argument(
"--user",
action="store_true",
default=False,
help="Install using the user scheme.",
)
parser.add_argument(
"--root",
default=None,
help="Install everything relative to this alternate root directory.",
)
parser.add_argument(
"--altinstall",
action="store_true",
default=False,
help=("Make an alternate install, installing only the X.Y versioned "
"scripts (Default: pipX, pipX.Y, easy_install-X.Y)."),
)
parser.add_argument(
"--default-pip",
action="store_true",
default=True,
dest="default_pip",
help=argparse.SUPPRESS,
)
parser.add_argument(
"--no-default-pip",
action="store_false",
dest="default_pip",
help=("Make a non default install, installing only the X and X.Y "
"versioned scripts."),
)
args = parser.parse_args(argv)
return _bootstrap(
root=args.root,
upgrade=args.upgrade,
user=args.user,
verbosity=args.verbosity,
altinstall=args.altinstall,
default_pip=args.default_pip,
)
import ensurepip
import sys
if __name__ == "__main__":
sys.exit(ensurepip._main())
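# Sketch (illustrative): version() only reports the bundled wheel's version
# string (19.2.3 per _PIP_VERSION above) and has no side effects, unlike
# bootstrap(), which installs into the environment; under IronPython the
# module is typically driven as 'ipy -m ensurepip'.
import ensurepip
assert ensurepip.version() == '19.2.3'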
"""Utilities for comparing files and directories.
Classes:
dircmp
Functions:
cmp(f1, f2, shallow=1) -> int
cmpfiles(a, b, common) -> ([], [], [])
"""
import os
import stat
from itertools import ifilter, ifilterfalse, imap, izip
__all__ = ["cmp","dircmp","cmpfiles"]
_cache = {}
BUFSIZE=8*1024
def cmp(f1, f2, shallow=1):
"""Compare two files.
Arguments:
f1 -- First file name
f2 -- Second file name
shallow -- Just check stat signature (do not read the files).
defaults to 1.
Return value:
True if the files are the same, False otherwise.
This function uses a cache for past comparisons and the results,
with a cache invalidation mechanism relying on stale signatures.
"""
s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
if shallow and s1 == s2:
return True
if s1[1] != s2[1]:
return False
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
outcome = _do_cmp(f1, f2)
if len(_cache) > 100: # limit the maximum size of the cache
_cache.clear()
_cache[f1, f2, s1, s2] = outcome
return outcome
def _sig(st):
return (stat.S_IFMT(st.st_mode),
st.st_size,
st.st_mtime)
def _do_cmp(f1, f2):
bufsize = BUFSIZE
with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
while True:
b1 = fp1.read(bufsize)
b2 = fp2.read(bufsize)
if b1 != b2:
return False
if not b1:
return True
# Directory comparison class.
#
class dircmp:
"""A class that manages the comparison of 2 directories.
dircmp(a,b,ignore=None,hide=None)
A and B are directories.
IGNORE is a list of names to ignore,
defaults to ['RCS', 'CVS', 'tags'].
HIDE is a list of names to hide,
defaults to [os.curdir, os.pardir].
High level usage:
x = dircmp(dir1, dir2)
x.report() -> prints a report on the differences between dir1 and dir2
or
x.report_partial_closure() -> prints report on differences between dir1
and dir2, and reports on common immediate subdirectories.
x.report_full_closure() -> like report_partial_closure,
but fully recursive.
Attributes:
left_list, right_list: The files in dir1 and dir2,
filtered by hide and ignore.
common: a list of names in both dir1 and dir2.
left_only, right_only: names only in dir1, dir2.
common_dirs: subdirectories in both dir1 and dir2.
common_files: files in both dir1 and dir2.
common_funny: names in both dir1 and dir2 where the type differs between
dir1 and dir2, or the name is not stat-able.
same_files: list of identical files.
diff_files: list of filenames which differ.
funny_files: list of files which could not be compared.
subdirs: a dictionary of dircmp objects, keyed by names in common_dirs.
"""
def __init__(self, a, b, ignore=None, hide=None): # Initialize
self.left = a
self.right = b
if hide is None:
self.hide = [os.curdir, os.pardir] # Names never to be shown
else:
self.hide = hide
if ignore is None:
self.ignore = ['RCS', 'CVS', 'tags'] # Names ignored in comparison
else:
self.ignore = ignore
def phase0(self): # Compare everything except common subdirectories
self.left_list = _filter(os.listdir(self.left),
self.hide+self.ignore)
self.right_list = _filter(os.listdir(self.right),
self.hide+self.ignore)
self.left_list.sort()
self.right_list.sort()
def phase1(self): # Compute common names
a = dict(izip(imap(os.path.normcase, self.left_list), self.left_list))
b = dict(izip(imap(os.path.normcase, self.right_list), self.right_list))
self.common = map(a.__getitem__, ifilter(b.__contains__, a))
self.left_only = map(a.__getitem__, ifilterfalse(b.__contains__, a))
self.right_only = map(b.__getitem__, ifilterfalse(a.__contains__, b))
def phase2(self): # Distinguish files, directories, funnies
self.common_dirs = []
self.common_files = []
self.common_funny = []
for x in self.common:
a_path = os.path.join(self.left, x)
b_path = os.path.join(self.right, x)
ok = 1
try:
a_stat = os.stat(a_path)
except os.error, why:
# print 'Can\'t stat', a_path, ':', why[1]
ok = 0
try:
b_stat = os.stat(b_path)
except os.error, why:
# print 'Can\'t stat', b_path, ':', why[1]
ok = 0
if ok:
a_type = stat.S_IFMT(a_stat.st_mode)
b_type = stat.S_IFMT(b_stat.st_mode)
if a_type != b_type:
self.common_funny.append(x)
elif stat.S_ISDIR(a_type):
self.common_dirs.append(x)
elif stat.S_ISREG(a_type):
self.common_files.append(x)
else:
self.common_funny.append(x)
else:
self.common_funny.append(x)
def phase3(self): # Find out differences between common files
xx = cmpfiles(self.left, self.right, self.common_files)
self.same_files, self.diff_files, self.funny_files = xx
def phase4(self): # Find out differences between common subdirectories
# A new dircmp object is created for each common subdirectory,
# these are stored in a dictionary indexed by filename.
# The hide and ignore properties are inherited from the parent
self.subdirs = {}
for x in self.common_dirs:
a_x = os.path.join(self.left, x)
b_x = os.path.join(self.right, x)
self.subdirs[x] = dircmp(a_x, b_x, self.ignore, self.hide)
def phase4_closure(self): # Recursively call phase4() on subdirectories
self.phase4()
for sd in self.subdirs.itervalues():
sd.phase4_closure()
def report(self): # Print a report on the differences between a and b
# Output format is purposely lousy
print 'diff', self.left, self.right
if self.left_only:
self.left_only.sort()
print 'Only in', self.left, ':', self.left_only
if self.right_only:
self.right_only.sort()
print 'Only in', self.right, ':', self.right_only
if self.same_files:
self.same_files.sort()
print 'Identical files :', self.same_files
if self.diff_files:
self.diff_files.sort()
print 'Differing files :', self.diff_files
if self.funny_files:
self.funny_files.sort()
print 'Trouble with common files :', self.funny_files
if self.common_dirs:
self.common_dirs.sort()
print 'Common subdirectories :', self.common_dirs
if self.common_funny:
self.common_funny.sort()
print 'Common funny cases :', self.common_funny
def report_partial_closure(self): # Print reports on self and on subdirs
self.report()
for sd in self.subdirs.itervalues():
print
sd.report()
def report_full_closure(self): # Report on self and subdirs recursively
self.report()
for sd in self.subdirs.itervalues():
print
sd.report_full_closure()
methodmap = dict(subdirs=phase4,
same_files=phase3, diff_files=phase3, funny_files=phase3,
common_dirs = phase2, common_files=phase2, common_funny=phase2,
common=phase1, left_only=phase1, right_only=phase1,
left_list=phase0, right_list=phase0)
def __getattr__(self, attr):
if attr not in self.methodmap:
raise AttributeError, attr
self.methodmap[attr](self)
return getattr(self, attr)
def cmpfiles(a, b, common, shallow=1):
"""Compare common files in two directories.
a, b -- directory names
common -- list of file names found in both directories
shallow -- if true, do comparison based solely on stat() information
Returns a tuple of three lists:
files that compare equal
files that are different
filenames that aren't regular files.
"""
res = ([], [], [])
for x in common:
ax = os.path.join(a, x)
bx = os.path.join(b, x)
res[_cmp(ax, bx, shallow)].append(x)
return res
# Compare two files.
# Return:
# 0 for equal
# 1 for different
# 2 for funny cases (can't stat, etc.)
#
def _cmp(a, b, sh, abs=abs, cmp=cmp):
try:
return not abs(cmp(a, b, sh))
except (os.error, IOError):
return 2
# Return a copy with items that occur in skip removed.
#
def _filter(flist, skip):
return list(ifilterfalse(skip.__contains__, flist))
# Demonstration and testing.
#
def demo():
import sys
import getopt
options, args = getopt.getopt(sys.argv[1:], 'r')
if len(args) != 2:
raise getopt.GetoptError('need exactly two args', None)
dd = dircmp(args[0], args[1])
if ('-r', '') in options:
dd.report_full_closure()
else:
dd.report()
if __name__ == '__main__':
demo()
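# Sketch (illustrative): shallow comparison trusts the stat signature from
# _sig() (type, size, mtime), so a copy made with copy2(), which preserves
# the mtime, typically compares equal without either file being read.
import filecmp, os, shutil, tempfile
d = tempfile.mkdtemp()
a = os.path.join(d, 'a.txt')
b = os.path.join(d, 'b.txt')
open(a, 'w').write('same bytes')
shutil.copy2(a, b)                       # copy2 preserves the mtime
assert filecmp.cmp(a, b)                 # shallow: signatures match
assert filecmp.cmp(a, b, shallow=0)      # deep: byte-for-byte via _do_cmp
shutil.rmtree(d)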
"""Helper class to quickly write a loop over all standard input files.
Typical use is:
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:],
defaulting to sys.stdin if the list is empty. If a filename is '-' it
is also replaced by sys.stdin. To specify an alternative list of
filenames, pass it as the argument to input(). A single file name is
also allowed.
Functions filename(), lineno() return the filename and cumulative line
number of the line that has just been read; filelineno() returns its
line number in the current file; isfirstline() returns true iff the
line just read is the first line of its file; isstdin() returns true
iff the line was read from sys.stdin. Function nextfile() closes the
current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count
towards the cumulative line count; the filename is not changed until
after the first line of the next file has been read. Function close()
closes the sequence.
Before any lines have been read, filename() returns None and both line
numbers are zero; nextfile() has no effect. After all lines have been
read, filename() and the line number functions return the values
pertaining to the last line read; nextfile() has no effect.
All files are opened in text mode by default; you can override this by
setting the mode parameter to input() or FileInput.__init__().
If an I/O error occurs during opening or reading a file, the IOError
exception is raised.
If sys.stdin is used more than once, the second and further use will
return no lines, except perhaps for interactive use, or if it has been
explicitly reset (e.g. using sys.stdin.seek(0)).
Empty files are opened and immediately closed; the only time their
presence in the list of filenames is noticeable at all is when the
last file opened is empty.
It is possible that the last line of a file doesn't end in a newline
character; otherwise lines are returned including the trailing
newline.
Class FileInput is the implementation; its methods filename(),
lineno(), fileline(), isfirstline(), isstdin(), nextfile() and close()
correspond to the functions in the module. In addition it has a
readline() method which returns the next input line, and a
__getitem__() method which implements the sequence behavior. The
sequence must be accessed in strictly sequential order; sequence
access and readline() cannot be mixed.
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file.
This makes it possible to write a filter that rewrites its input file
in place. If the keyword argument backup=".<some extension>" is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is ".bak" and it is
deleted when the output file is closed. In-place filtering is
disabled when standard input is read. XXX The current implementation
does not work for MS-DOS 8+3 filesystems.
XXX Possible additions:
- optional getopt argument processing
- isatty()
- read(), read(size), even readlines()
"""
import sys, os
__all__ = ["input","close","nextfile","filename","lineno","filelineno",
"isfirstline","isstdin","FileInput"]
_state = None
# No longer used
DEFAULT_BUFSIZE = 8*1024
def input(files=None, inplace=0, backup="", bufsize=0,
mode="r", openhook=None):
"""Return an instance of the FileInput class, which can be iterated.
The parameters are passed to the constructor of the FileInput class.
The returned instance, in addition to being an iterator,
keeps global state for the functions of this module.
"""
global _state
if _state and _state._file:
raise RuntimeError, "input() already active"
_state = FileInput(files, inplace, backup, bufsize, mode, openhook)
return _state
def close():
"""Close the sequence."""
global _state
state = _state
_state = None
if state:
state.close()
def nextfile():
"""
Close the current file so that the next iteration will read the first
line from the next file (if any); lines not read from the file will
not count towards the cumulative line count. The filename is not
changed until after the first line of the next file has been read.
Before the first line has been read, this function has no effect;
it cannot be used to skip the first file. After the last line of the
last file has been read, this function has no effect.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.nextfile()
def filename():
"""
Return the name of the file currently being read.
Before the first line has been read, returns None.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.filename()
def lineno():
"""
Return the cumulative line number of the line that has just been read.
Before the first line has been read, returns 0. After the last line
of the last file has been read, returns the line number of that line.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.lineno()
def filelineno():
"""
Return the line number in the current file. Before the first line
has been read, returns 0. After the last line of the last file has
been read, returns the line number of that line within the file.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.filelineno()
def fileno():
"""
Return the file number of the current file. When no file is currently
opened, returns -1.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.fileno()
def isfirstline():
"""
Returns true if the line just read is the first line of its file,
otherwise returns false.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.isfirstline()
def isstdin():
"""
Returns true if the last line was read from sys.stdin,
otherwise returns false.
"""
if not _state:
raise RuntimeError, "no active input()"
return _state.isstdin()
class FileInput:
"""FileInput([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])
Class FileInput is the implementation of the module; its methods
filename(), lineno(), fileline(), isfirstline(), isstdin(), fileno(),
nextfile() and close() correspond to the functions of the same name
in the module.
In addition it has a readline() method which returns the next
input line, and a __getitem__() method which implements the
sequence behavior. The sequence must be accessed in strictly
sequential order; random access and readline() cannot be mixed.
"""
def __init__(self, files=None, inplace=0, backup="", bufsize=0,
mode="r", openhook=None):
if isinstance(files, basestring):
files = (files,)
else:
if files is None:
files = sys.argv[1:]
if not files:
files = ('-',)
else:
files = tuple(files)
self._files = files
self._inplace = inplace
self._backup = backup
self._savestdout = None
self._output = None
self._filename = None
self._startlineno = 0
self._filelineno = 0
self._file = None
self._isstdin = False
self._backupfilename = None
# restrict mode argument to reading modes
if mode not in ('r', 'rU', 'U', 'rb'):
raise ValueError("FileInput opening mode must be one of "
"'r', 'rU', 'U' and 'rb'")
self._mode = mode
if inplace and openhook:
raise ValueError("FileInput cannot use an opening hook in inplace mode")
elif openhook and not hasattr(openhook, '__call__'):
raise ValueError("FileInput openhook must be callable")
self._openhook = openhook
def __del__(self):
self.close()
def close(self):
try:
self.nextfile()
finally:
self._files = ()
def __iter__(self):
return self
def next(self):
while 1:
line = self._readline()
if line:
self._filelineno += 1
return line
if not self._file:
raise StopIteration
self.nextfile()
# repeat with next file
def __getitem__(self, i):
if i != self.lineno():
raise RuntimeError, "accessing lines out of order"
try:
return self.next()
except StopIteration:
raise IndexError, "end of input reached"
def nextfile(self):
savestdout = self._savestdout
self._savestdout = 0
if savestdout:
sys.stdout = savestdout
output = self._output
self._output = 0
try:
if output:
output.close()
finally:
file = self._file
self._file = None
try:
del self._readline # restore FileInput._readline
except AttributeError:
pass
try:
if file and not self._isstdin:
file.close()
finally:
backupfilename = self._backupfilename
self._backupfilename = 0
if backupfilename and not self._backup:
try: os.unlink(backupfilename)
except OSError: pass
self._isstdin = False
def readline(self):
while 1:
line = self._readline()
if line:
self._filelineno += 1
return line
if not self._file:
return line
self.nextfile()
# repeat with next file
def _readline(self):
if not self._files:
return ""
self._filename = self._files[0]
self._files = self._files[1:]
self._startlineno = self.lineno()
self._filelineno = 0
self._file = None
self._isstdin = False
self._backupfilename = 0
if self._filename == '-':
self._filename = '<stdin>'
self._file = sys.stdin
self._isstdin = True
else:
if self._inplace:
self._backupfilename = (
self._filename + (self._backup or os.extsep+"bak"))
try: os.unlink(self._backupfilename)
except os.error: pass
# The next few lines may raise IOError
os.rename(self._filename, self._backupfilename)
self._file = open(self._backupfilename, self._mode)
try:
perm = os.fstat(self._file.fileno()).st_mode
except OSError:
self._output = open(self._filename, "w")
else:
fd = os.open(self._filename,
os.O_CREAT | os.O_WRONLY | os.O_TRUNC,
perm)
self._output = os.fdopen(fd, "w")
try:
if hasattr(os, 'chmod'):
os.chmod(self._filename, perm)
except OSError:
pass
self._savestdout = sys.stdout
sys.stdout = self._output
else:
# This may raise IOError
if self._openhook:
self._file = self._openhook(self._filename, self._mode)
else:
self._file = open(self._filename, self._mode)
self._readline = self._file.readline # hide FileInput._readline
return self._readline()
def filename(self):
return self._filename
def lineno(self):
return self._startlineno + self._filelineno
def filelineno(self):
return self._filelineno
def fileno(self):
if self._file:
try:
return self._file.fileno()
except ValueError:
return -1
else:
return -1
def isfirstline(self):
return self._filelineno == 1
def isstdin(self):
return self._isstdin
def hook_compressed(filename, mode):
ext = os.path.splitext(filename)[1]
if ext == '.gz':
import gzip
return gzip.open(filename, mode)
elif ext == '.bz2':
import bz2
return bz2.BZ2File(filename, mode)
else:
return open(filename, mode)
def hook_encoded(encoding):
import io
def openhook(filename, mode):
mode = mode.replace('U', '').replace('b', '') or 'r'
return io.open(filename, mode, encoding=encoding, newline='')
return openhook
def _test():
import getopt
inplace = 0
backup = 0
opts, args = getopt.getopt(sys.argv[1:], "ib:")
for o, a in opts:
if o == '-i': inplace = 1
if o == '-b': backup = a
for line in input(args, inplace=inplace, backup=backup):
if line[-1:] == '\n': line = line[:-1]
if line[-1:] == '\r': line = line[:-1]
print "%d: %s[%d]%s %s" % (lineno(), filename(), filelineno(),
isfirstline() and "*" or "", line)
print "%d: %s[%d]" % (lineno(), filename(), filelineno())
if __name__ == '__main__':
_test()
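# Sketch (illustrative): lineno() counts cumulatively across files while
# filelineno() counts within the current file, as the docstring describes.
import fileinput, os, shutil, tempfile
d = tempfile.mkdtemp()
paths = [os.path.join(d, n) for n in ('one.txt', 'two.txt')]
for p in paths:
    open(p, 'w').write('only line\n')
for line in fileinput.input(paths):
    pass
assert fileinput.lineno() == 2           # cumulative across both files
assert fileinput.filelineno() == 1       # within the last file read
fileinput.close()
shutil.rmtree(d)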
"""Filename matching with shell patterns.
fnmatch(FILENAME, PATTERN) matches according to the local convention.
fnmatchcase(FILENAME, PATTERN) always takes case in account.
The functions operate by translating the pattern into a regular
expression. They cache the compiled regular expressions for speed.
The function translate(PATTERN) returns a regular expression
corresponding to PATTERN. (It does not compile it.)
"""
import re
__all__ = ["filter", "fnmatch", "fnmatchcase", "translate"]
_cache = {}
_MAXCACHE = 100
def _purge():
"""Clear the pattern cache"""
_cache.clear()
def fnmatch(name, pat):
"""Test whether FILENAME matches PATTERN.
Patterns are Unix shell style:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any char not in seq
An initial period in FILENAME is not special.
Both FILENAME and PATTERN are first case-normalized
if the operating system requires it.
If you don't want this, use fnmatchcase(FILENAME, PATTERN).
"""
import os
name = os.path.normcase(name)
pat = os.path.normcase(pat)
return fnmatchcase(name, pat)
def filter(names, pat):
"""Return the subset of the list NAMES that match PAT"""
import os,posixpath
result=[]
pat=os.path.normcase(pat)
try:
re_pat = _cache[pat]
except KeyError:
res = translate(pat)
if len(_cache) >= _MAXCACHE:
_cache.clear()
_cache[pat] = re_pat = re.compile(res)
match = re_pat.match
if os.path is posixpath:
# normcase on posix is NOP. Optimize it away from the loop.
for name in names:
if match(name):
result.append(name)
else:
for name in names:
if match(os.path.normcase(name)):
result.append(name)
return result
def fnmatchcase(name, pat):
"""Test whether FILENAME matches PATTERN, including case.
This is a version of fnmatch() which doesn't case-normalize
its arguments.
"""
try:
re_pat = _cache[pat]
except KeyError:
res = translate(pat)
if len(_cache) >= _MAXCACHE:
_cache.clear()
_cache[pat] = re_pat = re.compile(res)
return re_pat.match(name) is not None
def translate(pat):
"""Translate a shell PATTERN to a regular expression.
There is no way to quote meta-characters.
"""
i, n = 0, len(pat)
res = ''
while i < n:
c = pat[i]
i = i+1
if c == '*':
res = res + '.*'
elif c == '?':
res = res + '.'
elif c == '[':
j = i
if j < n and pat[j] == '!':
j = j+1
if j < n and pat[j] == ']':
j = j+1
while j < n and pat[j] != ']':
j = j+1
if j >= n:
res = res + '\\['
else:
stuff = pat[i:j].replace('\\','\\\\')
i = j+1
if stuff[0] == '!':
stuff = '^' + stuff[1:]
elif stuff[0] == '^':
stuff = '\\' + stuff
res = '%s[%s]' % (res, stuff)
else:
res = res + re.escape(c)
return res + '\Z(?ms)'
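# Sketch (illustrative): translate() produces an uncompiled regular
# expression; fnmatch() case-normalizes per the OS, fnmatchcase() never does.
import fnmatch
assert fnmatch.translate('*.py') == '.*\\.py\\Z(?ms)'
assert fnmatch.fnmatchcase('setup.py', '*.py')
assert not fnmatch.fnmatchcase('SETUP.PY', '*.py')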
"""Generic output formatting.
Formatter objects transform an abstract flow of formatting events into
specific output events on writer objects. Formatters manage several stack
structures to allow various properties of a writer object to be changed and
restored; writers need not be able to handle relative changes nor any sort
of ``change back'' operation. Specific writer properties which may be
controlled via formatter objects are horizontal alignment, font, and left
margin indentations. A mechanism is provided which supports providing
arbitrary, non-exclusive style settings to a writer as well. Additional
interfaces facilitate formatting events which are not reversible, such as
paragraph separation.
Writer objects encapsulate device interfaces. Abstract devices, such as
file formats, are supported as well as physical devices. The provided
implementations all work with abstract devices. The interface makes
available mechanisms for setting the properties which formatter objects
manage and inserting data into the output.
"""
import sys
AS_IS = None
class NullFormatter:
"""A formatter which does nothing.
If the writer parameter is omitted, a NullWriter instance is created.
No methods of the writer are called by NullFormatter instances.
Implementations should inherit from this class if implementing a writer
interface but don't need to inherit any implementation.
"""
def __init__(self, writer=None):
if writer is None:
writer = NullWriter()
self.writer = writer
def end_paragraph(self, blankline): pass
def add_line_break(self): pass
def add_hor_rule(self, *args, **kw): pass
def add_label_data(self, format, counter, blankline=None): pass
def add_flowing_data(self, data): pass
def add_literal_data(self, data): pass
def flush_softspace(self): pass
def push_alignment(self, align): pass
def pop_alignment(self): pass
def push_font(self, x): pass
def pop_font(self): pass
def push_margin(self, margin): pass
def pop_margin(self): pass
def set_spacing(self, spacing): pass
def push_style(self, *styles): pass
def pop_style(self, n=1): pass
def assert_line_data(self, flag=1): pass
class AbstractFormatter:
"""The standard formatter.
This implementation has demonstrated wide applicability to many writers,
and may be used directly in most circumstances. It has been used to
implement a full-featured World Wide Web browser.
"""
# Space handling policy: blank spaces at the boundary between elements
# are handled by the outermost context. "Literal" data is not checked
# to determine context, so spaces in literal data are handled directly
# in all circumstances.
def __init__(self, writer):
self.writer = writer # Output device
self.align = None # Current alignment
self.align_stack = [] # Alignment stack
self.font_stack = [] # Font state
self.margin_stack = [] # Margin state
self.spacing = None # Vertical spacing state
self.style_stack = [] # Other state, e.g. color
self.nospace = 1 # Should leading space be suppressed
self.softspace = 0 # Should a space be inserted
self.para_end = 1 # Just ended a paragraph
self.parskip = 0 # Skipped space between paragraphs?
self.hard_break = 1 # Have a hard break
self.have_label = 0
def end_paragraph(self, blankline):
if not self.hard_break:
self.writer.send_line_break()
self.have_label = 0
if self.parskip < blankline and not self.have_label:
self.writer.send_paragraph(blankline - self.parskip)
self.parskip = blankline
self.have_label = 0
self.hard_break = self.nospace = self.para_end = 1
self.softspace = 0
def add_line_break(self):
if not (self.hard_break or self.para_end):
self.writer.send_line_break()
self.have_label = self.parskip = 0
self.hard_break = self.nospace = 1
self.softspace = 0
def add_hor_rule(self, *args, **kw):
if not self.hard_break:
self.writer.send_line_break()
self.writer.send_hor_rule(*args, **kw)
self.hard_break = self.nospace = 1
self.have_label = self.para_end = self.softspace = self.parskip = 0
def add_label_data(self, format, counter, blankline = None):
if self.have_label or not self.hard_break:
self.writer.send_line_break()
if not self.para_end:
self.writer.send_paragraph((blankline and 1) or 0)
if isinstance(format, str):
self.writer.send_label_data(self.format_counter(format, counter))
else:
self.writer.send_label_data(format)
self.nospace = self.have_label = self.hard_break = self.para_end = 1
self.softspace = self.parskip = 0
def format_counter(self, format, counter):
label = ''
for c in format:
if c == '1':
label = label + ('%d' % counter)
elif c in 'aA':
if counter > 0:
label = label + self.format_letter(c, counter)
elif c in 'iI':
if counter > 0:
label = label + self.format_roman(c, counter)
else:
label = label + c
return label
def format_letter(self, case, counter):
label = ''
while counter > 0:
counter, x = divmod(counter-1, 26)
# This makes a strong assumption that lowercase letters
# and uppercase letters form two contiguous blocks, with
# letters in order!
s = chr(ord(case) + x)
label = s + label
return label
def format_roman(self, case, counter):
ones = ['i', 'x', 'c', 'm']
fives = ['v', 'l', 'd']
label, index = '', 0
# This will die of IndexError when counter is too big
while counter > 0:
counter, x = divmod(counter, 10)
if x == 9:
label = ones[index] + ones[index+1] + label
elif x == 4:
label = ones[index] + fives[index] + label
else:
if x >= 5:
s = fives[index]
x = x-5
else:
s = ''
s = s + ones[index]*x
label = s + label
index = index + 1
if case == 'I':
return label.upper()
return label
def add_flowing_data(self, data):
if not data: return
prespace = data[:1].isspace()
postspace = data[-1:].isspace()
data = " ".join(data.split())
if self.nospace and not data:
return
elif prespace or self.softspace:
if not data:
if not self.nospace:
self.softspace = 1
self.parskip = 0
return
if not self.nospace:
data = ' ' + data
self.hard_break = self.nospace = self.para_end = \
self.parskip = self.have_label = 0
self.softspace = postspace
self.writer.send_flowing_data(data)
def add_literal_data(self, data):
if not data: return
if self.softspace:
self.writer.send_flowing_data(" ")
self.hard_break = data[-1:] == '\n'
self.nospace = self.para_end = self.softspace = \
self.parskip = self.have_label = 0
self.writer.send_literal_data(data)
def flush_softspace(self):
if self.softspace:
self.hard_break = self.para_end = self.parskip = \
self.have_label = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
def push_alignment(self, align):
if align and align != self.align:
self.writer.new_alignment(align)
self.align = align
self.align_stack.append(align)
else:
self.align_stack.append(self.align)
def pop_alignment(self):
if self.align_stack:
del self.align_stack[-1]
if self.align_stack:
self.align = align = self.align_stack[-1]
self.writer.new_alignment(align)
else:
self.align = None
self.writer.new_alignment(None)
def push_font(self, font):
size, i, b, tt = font
if self.softspace:
self.hard_break = self.para_end = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
if self.font_stack:
csize, ci, cb, ctt = self.font_stack[-1]
if size is AS_IS: size = csize
if i is AS_IS: i = ci
if b is AS_IS: b = cb
if tt is AS_IS: tt = ctt
font = (size, i, b, tt)
self.font_stack.append(font)
self.writer.new_font(font)
def pop_font(self):
if self.font_stack:
del self.font_stack[-1]
if self.font_stack:
font = self.font_stack[-1]
else:
font = None
self.writer.new_font(font)
def push_margin(self, margin):
self.margin_stack.append(margin)
fstack = filter(None, self.margin_stack)
if not margin and fstack:
margin = fstack[-1]
self.writer.new_margin(margin, len(fstack))
def pop_margin(self):
if self.margin_stack:
del self.margin_stack[-1]
fstack = filter(None, self.margin_stack)
if fstack:
margin = fstack[-1]
else:
margin = None
self.writer.new_margin(margin, len(fstack))
def set_spacing(self, spacing):
self.spacing = spacing
self.writer.new_spacing(spacing)
def push_style(self, *styles):
if self.softspace:
self.hard_break = self.para_end = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
for style in styles:
self.style_stack.append(style)
self.writer.new_styles(tuple(self.style_stack))
def pop_style(self, n=1):
del self.style_stack[-n:]
self.writer.new_styles(tuple(self.style_stack))
def assert_line_data(self, flag=1):
self.nospace = self.hard_break = not flag
self.para_end = self.parskip = self.have_label = 0
class NullWriter:
"""Minimal writer interface to use in testing & inheritance.
A writer which only provides the interface definition; no actions are
taken on any methods. This should be the base class for all writers
which do not need to inherit any implementation methods.
"""
def __init__(self): pass
def flush(self): pass
def new_alignment(self, align): pass
def new_font(self, font): pass
def new_margin(self, margin, level): pass
def new_spacing(self, spacing): pass
def new_styles(self, styles): pass
def send_paragraph(self, blankline): pass
def send_line_break(self): pass
def send_hor_rule(self, *args, **kw): pass
def send_label_data(self, data): pass
def send_flowing_data(self, data): pass
def send_literal_data(self, data): pass
class AbstractWriter(NullWriter):
"""A writer which can be used in debugging formatters, but not much else.
Each method simply announces itself by printing its name and
arguments on standard output.
"""
def new_alignment(self, align):
print "new_alignment(%r)" % (align,)
def new_font(self, font):
print "new_font(%r)" % (font,)
def new_margin(self, margin, level):
print "new_margin(%r, %d)" % (margin, level)
def new_spacing(self, spacing):
print "new_spacing(%r)" % (spacing,)
def new_styles(self, styles):
print "new_styles(%r)" % (styles,)
def send_paragraph(self, blankline):
print "send_paragraph(%r)" % (blankline,)
def send_line_break(self):
print "send_line_break()"
def send_hor_rule(self, *args, **kw):
print "send_hor_rule()"
def send_label_data(self, data):
print "send_label_data(%r)" % (data,)
def send_flowing_data(self, data):
print "send_flowing_data(%r)" % (data,)
def send_literal_data(self, data):
print "send_literal_data(%r)" % (data,)
class DumbWriter(NullWriter):
"""Simple writer class which writes output on the file object passed in
as the file parameter or, if file is omitted, on standard output. The
output is simply word-wrapped to the number of columns specified by
the maxcol parameter. This class is suitable for reflowing a sequence
of paragraphs.
"""
def __init__(self, file=None, maxcol=72):
self.file = file or sys.stdout
self.maxcol = maxcol
NullWriter.__init__(self)
self.reset()
def reset(self):
self.col = 0
self.atbreak = 0
def send_paragraph(self, blankline):
self.file.write('\n'*blankline)
self.col = 0
self.atbreak = 0
def send_line_break(self):
self.file.write('\n')
self.col = 0
self.atbreak = 0
def send_hor_rule(self, *args, **kw):
self.file.write('\n')
self.file.write('-'*self.maxcol)
self.file.write('\n')
self.col = 0
self.atbreak = 0
def send_literal_data(self, data):
self.file.write(data)
i = data.rfind('\n')
if i >= 0:
self.col = 0
data = data[i+1:]
data = data.expandtabs()
self.col = self.col + len(data)
self.atbreak = 0
def send_flowing_data(self, data):
if not data: return
atbreak = self.atbreak or data[0].isspace()
col = self.col
maxcol = self.maxcol
write = self.file.write
for word in data.split():
if atbreak:
if col + len(word) >= maxcol:
write('\n')
col = 0
else:
write(' ')
col = col + 1
write(word)
col = col + len(word)
atbreak = 1
self.col = col
self.atbreak = data[-1].isspace()
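# Editor's illustrative sketch (not part of the original module): how
# DumbWriter word-wraps flowing data.  The expected output below was
# traced by hand from send_flowing_data() and send_paragraph() above.
def _dumbwriter_example():
    from StringIO import StringIO  # Python 2 in-memory file object
    buf = StringIO()
    w = DumbWriter(buf, maxcol=20)
    w.send_flowing_data('one two three four five six')
    w.send_paragraph(1)            # blankline=1 emits a single '\n'
    assert buf.getvalue() == 'one two three four\nfive six\n'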
def test(file = None):
w = DumbWriter()
f = AbstractFormatter(w)
if file is not None:
fp = open(file)
elif sys.argv[1:]:
fp = open(sys.argv[1])
else:
fp = sys.stdin
for line in fp:
if line == '\n':
f.end_paragraph(1)
else:
f.add_flowing_data(line)
f.end_paragraph(0)
if __name__ == '__main__':
test()
"""General floating point formatting functions.
Functions:
fix(x, digits_behind)
sci(x, digits_behind)
Each takes a number or a string and a number of digits as arguments.
Parameters:
x: number to be formatted; or a string resembling a number
digits_behind: number of digits behind the decimal point
"""
from warnings import warnpy3k
warnpy3k("the fpformat module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import re
__all__ = ["fix","sci","NotANumber"]
# Compiled regular expression to "decode" a number
decoder = re.compile(r'^([-+]?)(\d*)((?:\.\d*)?)(([eE][-+]?\d+)?)$')
# \0 the whole thing
# \1 leading sign or empty
# \2 digits left of decimal point
# \3 fraction (empty or begins with point)
# \4 exponent part (empty or begins with 'e' or 'E')
try:
class NotANumber(ValueError):
pass
except TypeError:
NotANumber = 'fpformat.NotANumber'
def extract(s):
"""Return (sign, intpart, fraction, expo) or raise an exception:
sign is '+' or '-'
intpart is 0 or more digits beginning with a nonzero digit
fraction is 0 or more digits
expo is an integer"""
res = decoder.match(s)
if res is None: raise NotANumber, s
sign, intpart, fraction, exppart = res.group(1,2,3,4)
intpart = intpart.lstrip('0')
if sign == '+': sign = ''
if fraction: fraction = fraction[1:]
if exppart: expo = int(exppart[1:])
else: expo = 0
return sign, intpart, fraction, expo
def unexpo(intpart, fraction, expo):
"""Remove the exponent by changing intpart and fraction."""
if expo > 0: # Move the point left
f = len(fraction)
intpart, fraction = intpart + fraction[:expo], fraction[expo:]
if expo > f:
intpart = intpart + '0'*(expo-f)
elif expo < 0: # Move the point right
i = len(intpart)
intpart, fraction = intpart[:expo], intpart[expo:] + fraction
if expo < -i:
fraction = '0'*(-expo-i) + fraction
return intpart, fraction
def roundfrac(intpart, fraction, digs):
"""Round or extend the fraction to size digs."""
f = len(fraction)
if f <= digs:
return intpart, fraction + '0'*(digs-f)
i = len(intpart)
if i+digs < 0:
return '0'*-digs, ''
total = intpart + fraction
nextdigit = total[i+digs]
if nextdigit >= '5': # Hard case: increment last digit, may have carry!
n = i + digs - 1
while n >= 0:
if total[n] != '9': break
n = n-1
else:
total = '0' + total
i = i+1
n = 0
total = total[:n] + chr(ord(total[n]) + 1) + '0'*(len(total)-n-1)
intpart, fraction = total[:i], total[i:]
if digs >= 0:
return intpart, fraction[:digs]
else:
return intpart[:digs] + '0'*-digs, ''
def fix(x, digs):
"""Format x as [-]ddd.ddd with 'digs' digits after the point
and at least one digit before.
If digs <= 0, the point is suppressed."""
if type(x) != type(''): x = repr(x)
try:
sign, intpart, fraction, expo = extract(x)
except NotANumber:
return x
intpart, fraction = unexpo(intpart, fraction, expo)
intpart, fraction = roundfrac(intpart, fraction, digs)
while intpart and intpart[0] == '0': intpart = intpart[1:]
if intpart == '': intpart = '0'
if digs > 0: return sign + intpart + '.' + fraction
else: return sign + intpart
def sci(x, digs):
"""Format x as [-]d.dddE[+-]ddd with 'digs' digits after the point
and exactly one digit before.
If digs is <= 0, one digit is kept and the point is suppressed."""
if type(x) != type(''): x = repr(x)
sign, intpart, fraction, expo = extract(x)
if not intpart:
while fraction and fraction[0] == '0':
fraction = fraction[1:]
expo = expo - 1
if fraction:
intpart, fraction = fraction[0], fraction[1:]
expo = expo - 1
else:
intpart = '0'
else:
expo = expo + len(intpart) - 1
intpart, fraction = intpart[0], intpart[1:] + fraction
digs = max(0, digs)
intpart, fraction = roundfrac(intpart, fraction, digs)
if len(intpart) > 1:
intpart, fraction, expo = \
intpart[0], intpart[1:] + fraction[:-1], \
expo + len(intpart) - 1
s = sign + intpart
if digs > 0: s = s + '.' + fraction
e = repr(abs(expo))
e = '0'*(3-len(e)) + e
if expo < 0: e = '-' + e
else: e = '+' + e
return s + 'e' + e
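# Editor's illustrative sketch (not part of the original module): expected
# results worked out by hand from extract()/unexpo()/roundfrac() above.
def _fpformat_examples():
    assert fix('123.456', 2) == '123.46'   # rounded at the second place
    assert fix(0.5, 0) == '1'              # digs <= 0 suppresses the point
    assert fix('1e3', 0) == '1000'         # unexpo() folds the exponent in
    assert sci(0.0123, 2) == '1.23e-002'   # three-digit signed exponent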
def test():
"""Interactive test run."""
try:
while 1:
x, digs = input('Enter (x, digs): ')
print x, fix(x, digs), sci(x, digs)
except (EOFError, KeyboardInterrupt):
pass
# Originally contributed by Sjoerd Mullender.
# Significantly modified by Jeffrey Yasskin <jyasskin at gmail.com>.
"""Rational, infinite-precision, real numbers."""
from __future__ import division
from decimal import Decimal
import math
import numbers
import operator
import re
__all__ = ['Fraction', 'gcd']
Rational = numbers.Rational
def gcd(a, b):
"""Calculate the Greatest Common Divisor of a and b.
Unless b==0, the result will have the same sign as b (so that when
b is divided by it, the result comes out positive).
"""
while b:
a, b = b, a%b
return a
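# Editor's illustrative sketch (not part of the original module): the sign
# convention documented above means b // gcd(a, b) is never negative.
def _gcd_examples():
    assert gcd(12, 8) == 4
    assert gcd(4, -6) == -2    # result takes the sign of b
    assert gcd(0, 0) == 0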
_RATIONAL_FORMAT = re.compile(r"""
\A\s* # optional whitespace at the start, then
(?P<sign>[-+]?) # an optional sign, then
(?=\d|\.\d) # lookahead for digit or .digit
(?P<num>\d*) # numerator (possibly empty)
(?: # followed by
(?:/(?P<denom>\d+))? # an optional denominator
| # or
(?:\.(?P<decimal>\d*))? # an optional fractional part
(?:E(?P<exp>[-+]?\d+))? # and optional exponent
)
\s*\Z # and optional whitespace to finish
""", re.VERBOSE | re.IGNORECASE)
class Fraction(Rational):
"""This class implements rational numbers.
In the two-argument form of the constructor, Fraction(8, 6) will
produce a rational number equivalent to 4/3. Both arguments must
be Rational. The numerator defaults to 0 and the denominator
defaults to 1 so that Fraction(3) == 3 and Fraction() == 0.
Fractions can also be constructed from:
- numeric strings similar to those accepted by the
float constructor (for example, '-2.3' or '1e10')
- strings of the form '123/456'
- float and Decimal instances
- other Rational instances (including integers)
"""
__slots__ = ('_numerator', '_denominator')
# We're immutable, so use __new__ not __init__
def __new__(cls, numerator=0, denominator=None):
"""Constructs a Fraction.
Takes a string like '3/2' or '1.5', another Rational instance, a
numerator/denominator pair, or a float.
Examples
--------
>>> Fraction(10, -8)
Fraction(-5, 4)
>>> Fraction(Fraction(1, 7), 5)
Fraction(1, 35)
>>> Fraction(Fraction(1, 7), Fraction(2, 3))
Fraction(3, 14)
>>> Fraction('314')
Fraction(314, 1)
>>> Fraction('-35/4')
Fraction(-35, 4)
>>> Fraction('3.1415') # conversion from numeric string
Fraction(6283, 2000)
>>> Fraction('-47e-2') # string may include a decimal exponent
Fraction(-47, 100)
>>> Fraction(1.47) # direct construction from float (exact conversion)
Fraction(6620291452234629, 4503599627370496)
>>> Fraction(2.25)
Fraction(9, 4)
>>> Fraction(Decimal('1.47'))
Fraction(147, 100)
"""
self = super(Fraction, cls).__new__(cls)
if denominator is None:
if isinstance(numerator, Rational):
self._numerator = numerator.numerator
self._denominator = numerator.denominator
return self
elif isinstance(numerator, float):
# Exact conversion from float
value = Fraction.from_float(numerator)
self._numerator = value._numerator
self._denominator = value._denominator
return self
elif isinstance(numerator, Decimal):
value = Fraction.from_decimal(numerator)
self._numerator = value._numerator
self._denominator = value._denominator
return self
elif isinstance(numerator, basestring):
# Handle construction from strings.
m = _RATIONAL_FORMAT.match(numerator)
if m is None:
raise ValueError('Invalid literal for Fraction: %r' %
numerator)
numerator = int(m.group('num') or '0')
denom = m.group('denom')
if denom:
denominator = int(denom)
else:
denominator = 1
decimal = m.group('decimal')
if decimal:
scale = 10**len(decimal)
numerator = numerator * scale + int(decimal)
denominator *= scale
exp = m.group('exp')
if exp:
exp = int(exp)
if exp >= 0:
numerator *= 10**exp
else:
denominator *= 10**-exp
if m.group('sign') == '-':
numerator = -numerator
else:
raise TypeError("argument should be a string "
"or a Rational instance")
elif (isinstance(numerator, Rational) and
isinstance(denominator, Rational)):
numerator, denominator = (
numerator.numerator * denominator.denominator,
denominator.numerator * numerator.denominator
)
else:
raise TypeError("both arguments should be "
"Rational instances")
if denominator == 0:
raise ZeroDivisionError('Fraction(%s, 0)' % numerator)
g = gcd(numerator, denominator)
self._numerator = numerator // g
self._denominator = denominator // g
return self
@classmethod
def from_float(cls, f):
"""Converts a finite float to a rational number, exactly.
Beware that Fraction.from_float(0.3) != Fraction(3, 10).
"""
if isinstance(f, numbers.Integral):
return cls(f)
elif not isinstance(f, float):
raise TypeError("%s.from_float() only takes floats, not %r (%s)" %
(cls.__name__, f, type(f).__name__))
if math.isnan(f) or math.isinf(f):
raise TypeError("Cannot convert %r to %s." % (f, cls.__name__))
return cls(*f.as_integer_ratio())
@classmethod
def from_decimal(cls, dec):
"""Converts a finite Decimal instance to a rational number, exactly."""
from decimal import Decimal
if isinstance(dec, numbers.Integral):
dec = Decimal(int(dec))
elif not isinstance(dec, Decimal):
raise TypeError(
"%s.from_decimal() only takes Decimals, not %r (%s)" %
(cls.__name__, dec, type(dec).__name__))
if not dec.is_finite():
# Catches infinities and nans.
raise TypeError("Cannot convert %s to %s." % (dec, cls.__name__))
sign, digits, exp = dec.as_tuple()
digits = int(''.join(map(str, digits)))
if sign:
digits = -digits
if exp >= 0:
return cls(digits * 10 ** exp)
else:
return cls(digits, 10 ** -exp)
def limit_denominator(self, max_denominator=1000000):
"""Closest Fraction to self with denominator at most max_denominator.
>>> Fraction('3.141592653589793').limit_denominator(10)
Fraction(22, 7)
>>> Fraction('3.141592653589793').limit_denominator(100)
Fraction(311, 99)
>>> Fraction(4321, 8765).limit_denominator(10000)
Fraction(4321, 8765)
"""
# Algorithm notes: For any real number x, define a *best upper
# approximation* to x to be a rational number p/q such that:
#
# (1) p/q >= x, and
# (2) if p/q > r/s >= x then s > q, for any rational r/s.
#
# Define *best lower approximation* similarly. Then it can be
# proved that a rational number is a best upper or lower
# approximation to x if, and only if, it is a convergent or
# semiconvergent of the (unique shortest) continued fraction
# associated to x.
#
# To find a best rational approximation with denominator <= M,
# we find the best upper and lower approximations with
# denominator <= M and take whichever of these is closer to x.
# In the event of a tie, the bound with smaller denominator is
# chosen. If both denominators are equal (which can happen
# only when max_denominator == 1 and self is midway between
# two integers) the lower bound, i.e. the floor of self, is
# taken.
if max_denominator < 1:
raise ValueError("max_denominator should be at least 1")
if self._denominator <= max_denominator:
return Fraction(self)
p0, q0, p1, q1 = 0, 1, 1, 0
n, d = self._numerator, self._denominator
while True:
a = n//d
q2 = q0+a*q1
if q2 > max_denominator:
break
p0, q0, p1, q1 = p1, q1, p0+a*p1, q2
n, d = d, n-a*d
k = (max_denominator-q0)//q1
bound1 = Fraction(p0+k*p1, q0+k*q1)
bound2 = Fraction(p1, q1)
if abs(bound2 - self) <= abs(bound1-self):
return bound2
else:
return bound1
@property
def numerator(a):
return a._numerator
@property
def denominator(a):
return a._denominator
def __repr__(self):
"""repr(self)"""
return ('Fraction(%s, %s)' % (self._numerator, self._denominator))
def __str__(self):
"""str(self)"""
if self._denominator == 1:
return str(self._numerator)
else:
return '%s/%s' % (self._numerator, self._denominator)
def _operator_fallbacks(monomorphic_operator, fallback_operator):
"""Generates forward and reverse operators given a purely-rational
operator and a function from the operator module.
Use this like:
__op__, __rop__ = _operator_fallbacks(just_rational_op, operator.op)
In general, we want to implement the arithmetic operations so
that mixed-mode operations either call an implementation whose
author knew about the types of both arguments, or convert both
to the nearest built in type and do the operation there. In
Fraction, that means that we define __add__ and __radd__ as:
def __add__(self, other):
# Both types have numerators/denominator attributes,
# so do the operation directly
if isinstance(other, (int, long, Fraction)):
return Fraction(self.numerator * other.denominator +
other.numerator * self.denominator,
self.denominator * other.denominator)
# float and complex don't have those operations, but we
# know about those types, so special case them.
elif isinstance(other, float):
return float(self) + other
elif isinstance(other, complex):
return complex(self) + other
# Let the other type take over.
return NotImplemented
def __radd__(self, other):
# radd handles more types than add because there's
# nothing left to fall back to.
if isinstance(other, Rational):
return Fraction(self.numerator * other.denominator +
other.numerator * self.denominator,
self.denominator * other.denominator)
elif isinstance(other, Real):
return float(other) + float(self)
elif isinstance(other, Complex):
return complex(other) + complex(self)
return NotImplemented
There are 5 different cases for a mixed-type addition on
Fraction. I'll refer to all of the above code that doesn't
refer to Fraction, float, or complex as "boilerplate". 'r'
will be an instance of Fraction, which is a subtype of
Rational (r : Fraction <: Rational), and b : B <:
Complex. The first three involve 'r + b':
1. If B <: Fraction, int, float, or complex, we handle
that specially, and all is well.
2. If Fraction falls back to the boilerplate code, and it
were to return a value from __add__, we'd miss the
possibility that B defines a more intelligent __radd__,
so the boilerplate should return NotImplemented from
__add__. In particular, we don't handle Rational
here, even though we could get an exact answer, in case
the other type wants to do something special.
3. If B <: Fraction, Python tries B.__radd__ before
Fraction.__add__. This is ok, because it was
implemented with knowledge of Fraction, so it can
handle those instances before delegating to Real or
Complex.
The next two situations describe 'b + r'. We assume that b
didn't know about Fraction in its implementation, and that it
uses similar boilerplate code:
4. If B <: Rational, then __radd__ converts both to the
builtin rational type (hey look, that's us) and
proceeds.
5. Otherwise, __radd__ tries to find the nearest common
base ABC, and fall back to its builtin type. Since this
class doesn't subclass a concrete type, there's no
implementation to fall back to, so we need to try as
hard as possible to return an actual value, or the user
will get a TypeError.
"""
def forward(a, b):
if isinstance(b, (int, long, Fraction)):
return monomorphic_operator(a, b)
elif isinstance(b, float):
return fallback_operator(float(a), b)
elif isinstance(b, complex):
return fallback_operator(complex(a), b)
else:
return NotImplemented
forward.__name__ = '__' + fallback_operator.__name__ + '__'
forward.__doc__ = monomorphic_operator.__doc__
def reverse(b, a):
if isinstance(a, Rational):
# Includes ints.
return monomorphic_operator(a, b)
elif isinstance(a, numbers.Real):
return fallback_operator(float(a), float(b))
elif isinstance(a, numbers.Complex):
return fallback_operator(complex(a), complex(b))
else:
return NotImplemented
reverse.__name__ = '__r' + fallback_operator.__name__ + '__'
reverse.__doc__ = monomorphic_operator.__doc__
return forward, reverse
def _add(a, b):
"""a + b"""
return Fraction(a.numerator * b.denominator +
b.numerator * a.denominator,
a.denominator * b.denominator)
__add__, __radd__ = _operator_fallbacks(_add, operator.add)
def _sub(a, b):
"""a - b"""
return Fraction(a.numerator * b.denominator -
b.numerator * a.denominator,
a.denominator * b.denominator)
__sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)
def _mul(a, b):
"""a * b"""
return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)
__mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)
def _div(a, b):
"""a / b"""
return Fraction(a.numerator * b.denominator,
a.denominator * b.numerator)
__truediv__, __rtruediv__ = _operator_fallbacks(_div, operator.truediv)
__div__, __rdiv__ = _operator_fallbacks(_div, operator.div)
def __floordiv__(a, b):
"""a // b"""
# Will be math.floor(a / b) in 3.0.
div = a / b
if isinstance(div, Rational):
# trunc(math.floor(div)) doesn't work if the rational is
# more precise than a float because the intermediate
# rounding may cross an integer boundary.
return div.numerator // div.denominator
else:
return math.floor(div)
def __rfloordiv__(b, a):
"""a // b"""
# Will be math.floor(a / b) in 3.0.
div = a / b
if isinstance(div, Rational):
# trunc(math.floor(div)) doesn't work if the rational is
# more precise than a float because the intermediate
# rounding may cross an integer boundary.
return div.numerator // div.denominator
else:
return math.floor(div)
def __mod__(a, b):
"""a % b"""
div = a // b
return a - b * div
def __rmod__(b, a):
"""a % b"""
div = a // b
return a - b * div
def __pow__(a, b):
"""a ** b
If b is not an integer, the result will be a float or complex
since roots are generally irrational. If b is an integer, the
result will be rational.
"""
if isinstance(b, Rational):
if b.denominator == 1:
power = b.numerator
if power >= 0:
return Fraction(a._numerator ** power,
a._denominator ** power)
else:
return Fraction(a._denominator ** -power,
a._numerator ** -power)
else:
# A fractional power will generally produce an
# irrational number.
return float(a) ** float(b)
else:
return float(a) ** b
def __rpow__(b, a):
"""a ** b"""
if b._denominator == 1 and b._numerator >= 0:
# If a is an int, keep it that way if possible.
return a ** b._numerator
if isinstance(a, Rational):
return Fraction(a.numerator, a.denominator) ** b
if b._denominator == 1:
return a ** b._numerator
return a ** float(b)
def __pos__(a):
"""+a: Coerces a subclass instance to Fraction"""
return Fraction(a._numerator, a._denominator)
def __neg__(a):
"""-a"""
return Fraction(-a._numerator, a._denominator)
def __abs__(a):
"""abs(a)"""
return Fraction(abs(a._numerator), a._denominator)
def __trunc__(a):
"""trunc(a)"""
if a._numerator < 0:
return -(-a._numerator // a._denominator)
else:
return a._numerator // a._denominator
def __hash__(self):
"""hash(self)
Tricky because values that are exactly representable as a
float must have the same hash as that float.
"""
# XXX since this method is expensive, consider caching the result
if self._denominator == 1:
# Get integers right.
return hash(self._numerator)
# Expensive check, but definitely correct.
if self == float(self):
return hash(float(self))
else:
# Use tuple's hash to avoid a high collision rate on
# simple fractions.
return hash((self._numerator, self._denominator))
def __eq__(a, b):
"""a == b"""
if isinstance(b, Rational):
return (a._numerator == b.numerator and
a._denominator == b.denominator)
if isinstance(b, numbers.Complex) and b.imag == 0:
b = b.real
if isinstance(b, float):
if math.isnan(b) or math.isinf(b):
# comparisons with an infinity or nan should behave in
# the same way for any finite a, so treat a as zero.
return 0.0 == b
else:
return a == a.from_float(b)
else:
# Since a doesn't know how to compare with b, let's give b
# a chance to compare itself with a.
return NotImplemented
def _richcmp(self, other, op):
"""Helper for comparison operators, for internal use only.
Implement comparison between a Rational instance `self`, and
either another Rational instance or a float `other`. If
`other` is not a Rational instance or a float, return
NotImplemented. `op` should be one of the six standard
comparison operators.
"""
# convert other to a Rational instance where reasonable.
if isinstance(other, Rational):
return op(self._numerator * other.denominator,
self._denominator * other.numerator)
# comparisons with complex should raise a TypeError, for consistency
# with int<->complex, float<->complex, and complex<->complex comparisons.
if isinstance(other, complex):
raise TypeError("no ordering relation is defined for complex numbers")
if isinstance(other, float):
if math.isnan(other) or math.isinf(other):
return op(0.0, other)
else:
return op(self, self.from_float(other))
else:
return NotImplemented
def __lt__(a, b):
"""a < b"""
return a._richcmp(b, operator.lt)
def __gt__(a, b):
"""a > b"""
return a._richcmp(b, operator.gt)
def __le__(a, b):
"""a <= b"""
return a._richcmp(b, operator.le)
def __ge__(a, b):
"""a >= b"""
return a._richcmp(b, operator.ge)
def __nonzero__(a):
"""a != 0"""
return a._numerator != 0
# support for pickling, copy, and deepcopy
def __reduce__(self):
return (self.__class__, (str(self),))
def __copy__(self):
if type(self) == Fraction:
return self # I'm immutable; therefore I am my own clone
return self.__class__(self._numerator, self._denominator)
def __deepcopy__(self, memo):
if type(self) == Fraction:
return self # My components are also immutable
return self.__class__(self._numerator, self._denominator)
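# Editor's illustrative sketch (not part of the original class): a few
# behaviors promised by the docstrings and comments above, collected as
# executable assertions.
def _fraction_examples():
    # from_float() is exact, so a binary float rarely yields a "round"
    # ratio; from_decimal() is exact over base ten.
    assert Fraction.from_float(0.3) != Fraction(3, 10)
    assert Fraction.from_decimal(Decimal('0.3')) == Fraction(3, 10)
    # The documented tie-break in limit_denominator(): the floor wins.
    assert Fraction(1, 2).limit_denominator(1) == Fraction(0, 1)
    # Mixed-mode arithmetic: Rational operands stay exact; a float
    # operand makes _operator_fallbacks() fall back to float.
    assert Fraction(1, 2) + Fraction(1, 3) == Fraction(5, 6)
    assert Fraction(1, 2) + 0.25 == 0.75
    # Integer powers stay rational.
    assert Fraction(2, 3) ** 2 == Fraction(4, 9)
    # Values exactly representable as floats hash like those floats.
    assert hash(Fraction(1, 2)) == hash(0.5)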
"""An FTP client class and some helper functions.
Based on RFC 959: File Transfer Protocol (FTP), by J. Postel and J. Reynolds
Example:
>>> from ftplib import FTP
>>> ftp = FTP('ftp.python.org') # connect to host, default port
>>> ftp.login() # default, i.e.: user anonymous, passwd anonymous@
'230 Guest login ok, access restrictions apply.'
>>> ftp.retrlines('LIST') # list directory contents
total 9
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 .
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 ..
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 bin
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 etc
d-wxrwxr-x 2 ftp wheel 1024 Sep 5 13:43 incoming
drwxr-xr-x 2 root wheel 1024 Nov 17 1993 lib
drwxr-xr-x 6 1094 wheel 1024 Sep 13 19:07 pub
drwxr-xr-x 3 root wheel 1024 Jan 3 1994 usr
-rw-r--r-- 1 root root 312 Aug 1 1994 welcome.msg
'226 Transfer complete.'
>>> ftp.quit()
'221 Goodbye.'
>>>
A nice test that reveals some of the network dialogue would be:
python ftplib.py -d localhost -l -p -l
"""
#
# Changes and improvements suggested by Steve Majewski.
# Modified by Jack to work on the mac.
# Modified by Siebren to support docstrings and PASV.
# Modified by Phil Schwartz to add storbinary and storlines callbacks.
# Modified by Giampaolo Rodola' to add TLS support.
#
import os
import sys
# Import SOCKS module if it exists, else the standard socket module
try:
import SOCKS; socket = SOCKS; del SOCKS # import SOCKS as socket
from socket import getfqdn; socket.getfqdn = getfqdn; del getfqdn
except ImportError:
import socket
from socket import _GLOBAL_DEFAULT_TIMEOUT
__all__ = ["FTP","Netrc"]
# Magic number from <socket.h>
MSG_OOB = 0x1 # Process data out of band
# The standard FTP server control port
FTP_PORT = 21
# The sizehint parameter passed to readline() calls
MAXLINE = 8192
# Exception raised when an error or invalid response is received
class Error(Exception): pass
class error_reply(Error): pass # unexpected [123]xx reply
class error_temp(Error): pass # 4xx errors
class error_perm(Error): pass # 5xx errors
class error_proto(Error): pass # response does not begin with [1-5]
# All exceptions (hopefully) that may be raised here and that aren't
# (always) programming errors on our side
all_errors = (Error, IOError, EOFError)
# Line terminators (we always output CRLF, but accept any of CRLF, CR, LF)
CRLF = '\r\n'
# The class itself
class FTP:
'''An FTP client class.
To create a connection, call the class using these arguments:
host, user, passwd, acct, timeout
The first four arguments are all strings, and have default value ''.
timeout must be numeric and defaults to None if not passed,
meaning that no timeout will be set on any ftp socket(s).
If a timeout is passed, it becomes the default timeout for all ftp
socket operations on this instance.
Then use self.connect() with optional host and port argument.
To download a file, use ftp.retrlines('RETR ' + filename),
or ftp.retrbinary() with slightly different arguments.
To upload a file, use ftp.storlines() or ftp.storbinary(),
which have an open file as argument (see their definitions
below for details).
The download/upload functions first issue appropriate TYPE
and PORT or PASV commands.
'''
debugging = 0
host = ''
port = FTP_PORT
maxline = MAXLINE
sock = None
file = None
welcome = None
passiveserver = 1
# Initialization method (called by class instantiation).
# Initialize host to localhost, port to standard ftp port
# Optional arguments are host (for connect()),
# and user, passwd, acct (for login())
def __init__(self, host='', user='', passwd='', acct='',
timeout=_GLOBAL_DEFAULT_TIMEOUT):
self.timeout = timeout
if host:
self.connect(host)
if user:
self.login(user, passwd, acct)
def connect(self, host='', port=0, timeout=-999):
'''Connect to host. Arguments are:
- host: hostname to connect to (string, default previous host)
- port: port to connect to (integer, default previous port)
'''
if host != '':
self.host = host
if port > 0:
self.port = port
if timeout != -999:
self.timeout = timeout
self.sock = socket.create_connection((self.host, self.port), self.timeout)
self.af = self.sock.family
self.file = self.sock.makefile('rb')
self.welcome = self.getresp()
return self.welcome
def getwelcome(self):
'''Get the welcome message from the server.
(this is read and squirreled away by connect())'''
if self.debugging:
print '*welcome*', self.sanitize(self.welcome)
return self.welcome
def set_debuglevel(self, level):
'''Set the debugging level.
The required argument level means:
0: no debugging output (default)
1: print commands and responses but not body text etc.
2: also print raw lines read and sent before stripping CR/LF'''
self.debugging = level
debug = set_debuglevel
def set_pasv(self, val):
'''Use passive or active mode for data transfers.
With a false argument, use the normal PORT mode;
with a true argument, use the PASV command.'''
self.passiveserver = val
# Internal: "sanitize" a string for printing
def sanitize(self, s):
if s[:5] == 'pass ' or s[:5] == 'PASS ':
i = len(s)
while i > 5 and s[i-1] in '\r\n':
i = i-1
s = s[:5] + '*'*(i-5) + s[i:]
return repr(s)
# Internal: send one line to the server, appending CRLF
def putline(self, line):
if '\r' in line or '\n' in line:
raise ValueError('an illegal newline character should not be contained')
line = line + CRLF
if self.debugging > 1: print '*put*', self.sanitize(line)
self.sock.sendall(line)
# Internal: send one command to the server (through putline())
def putcmd(self, line):
if self.debugging: print '*cmd*', self.sanitize(line)
self.putline(line)
# Internal: return one line from the server, stripping CRLF.
# Raise EOFError if the connection is closed
def getline(self):
line = self.file.readline(self.maxline + 1)
if len(line) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if self.debugging > 1:
print '*get*', self.sanitize(line)
if not line: raise EOFError
if line[-2:] == CRLF: line = line[:-2]
elif line[-1:] in CRLF: line = line[:-1]
return line
# Internal: get a response from the server, which may possibly
# consist of multiple lines. Return a single string with no
# trailing CRLF. If the response consists of multiple lines,
# these are separated by '\n' characters in the string
def getmultiline(self):
line = self.getline()
if line[3:4] == '-':
code = line[:3]
while 1:
nextline = self.getline()
line = line + ('\n' + nextline)
if nextline[:3] == code and \
nextline[3:4] != '-':
break
return line
# Internal: get a response from the server.
# Raise various errors if the response indicates an error
def getresp(self):
resp = self.getmultiline()
if self.debugging: print '*resp*', self.sanitize(resp)
self.lastresp = resp[:3]
c = resp[:1]
if c in ('1', '2', '3'):
return resp
if c == '4':
raise error_temp, resp
if c == '5':
raise error_perm, resp
raise error_proto, resp
def voidresp(self):
"""Expect a response beginning with '2'."""
resp = self.getresp()
if resp[:1] != '2':
raise error_reply, resp
return resp
def abort(self):
'''Abort a file transfer. Uses out-of-band data.
This does not follow the procedure from the RFC to send Telnet
IP and Synch; that doesn't seem to work with the servers I've
tried. Instead, just send the ABOR command as OOB data.'''
line = 'ABOR' + CRLF
if self.debugging > 1: print '*put urgent*', self.sanitize(line)
self.sock.sendall(line, MSG_OOB)
resp = self.getmultiline()
if resp[:3] not in ('426', '225', '226'):
raise error_proto, resp
def sendcmd(self, cmd):
'''Send a command and return the response.'''
self.putcmd(cmd)
return self.getresp()
def voidcmd(self, cmd):
"""Send a command and expect a response beginning with '2'."""
self.putcmd(cmd)
return self.voidresp()
def sendport(self, host, port):
'''Send a PORT command with the current host and the given
port number.
'''
hbytes = host.split('.')
pbytes = [repr(port//256), repr(port%256)]
bytes = hbytes + pbytes
cmd = 'PORT ' + ','.join(bytes)
return self.voidcmd(cmd)
def sendeprt(self, host, port):
'''Send an EPRT command with the current host and the given port number.'''
af = 0
if self.af == socket.AF_INET:
af = 1
if self.af == socket.AF_INET6:
af = 2
if af == 0:
raise error_proto, 'unsupported address family'
fields = ['', repr(af), host, repr(port), '']
cmd = 'EPRT ' + '|'.join(fields)
return self.voidcmd(cmd)
def makeport(self):
'''Create a new socket and send a PORT command for it.'''
err = None
sock = None
for res in socket.getaddrinfo(None, 0, self.af, socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
af, socktype, proto, canonname, sa = res
try:
sock = socket.socket(af, socktype, proto)
sock.bind(sa)
except socket.error, err:
if sock:
sock.close()
sock = None
continue
break
if sock is None:
if err is not None:
raise err
else:
raise socket.error("getaddrinfo returns an empty list")
sock.listen(1)
port = sock.getsockname()[1] # Get proper port
host = self.sock.getsockname()[0] # Get proper host
if self.af == socket.AF_INET:
resp = self.sendport(host, port)
else:
resp = self.sendeprt(host, port)
if self.timeout is not _GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(self.timeout)
return sock
def makepasv(self):
if self.af == socket.AF_INET:
host, port = parse227(self.sendcmd('PASV'))
else:
host, port = parse229(self.sendcmd('EPSV'), self.sock.getpeername())
return host, port
def ntransfercmd(self, cmd, rest=None):
"""Initiate a transfer over the data connection.
If the transfer is active, send a port command and the
transfer command, and accept the connection. If the server is
passive, send a pasv command, connect to it, and start the
transfer command. Either way, return the socket for the
connection and the expected size of the transfer. The
expected size may be None if it could not be determined.
Optional `rest' argument can be a string that is sent as the
argument to a REST command. This is essentially a server
marker used to tell the server to skip over any data up to the
given marker.
"""
size = None
if self.passiveserver:
host, port = self.makepasv()
conn = socket.create_connection((host, port), self.timeout)
try:
if rest is not None:
self.sendcmd("REST %s" % rest)
resp = self.sendcmd(cmd)
# Some servers apparently send a 200 reply to
# a LIST or STOR command, before the 150 reply
# (and way before the 226 reply). This seems to
# be in violation of the protocol (which only allows
# 1xx or error messages for LIST), so we just discard
# this response.
if resp[0] == '2':
resp = self.getresp()
if resp[0] != '1':
raise error_reply, resp
except:
conn.close()
raise
else:
sock = self.makeport()
try:
if rest is not None:
self.sendcmd("REST %s" % rest)
resp = self.sendcmd(cmd)
# See above.
if resp[0] == '2':
resp = self.getresp()
if resp[0] != '1':
raise error_reply, resp
conn, sockaddr = sock.accept()
if self.timeout is not _GLOBAL_DEFAULT_TIMEOUT:
conn.settimeout(self.timeout)
finally:
sock.close()
if resp[:3] == '150':
# this is conditional in case we received a 125
size = parse150(resp)
return conn, size
def transfercmd(self, cmd, rest=None):
"""Like ntransfercmd() but returns only the socket."""
return self.ntransfercmd(cmd, rest)[0]
def login(self, user = '', passwd = '', acct = ''):
'''Login, default anonymous.'''
if not user: user = 'anonymous'
if not passwd: passwd = ''
if not acct: acct = ''
if user == 'anonymous' and passwd in ('', '-'):
# If there is no anonymous ftp password specified
# then we'll just use anonymous@
# We don't send any other thing because:
# - We want to remain anonymous
# - We want to stop SPAM
# - We don't want to let ftp sites discriminate by the user,
# host or country.
passwd = passwd + 'anonymous@'
resp = self.sendcmd('USER ' + user)
if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
if resp[0] == '3': resp = self.sendcmd('ACCT ' + acct)
if resp[0] != '2':
raise error_reply, resp
return resp
def retrbinary(self, cmd, callback, blocksize=8192, rest=None):
"""Retrieve data in binary mode. A new port is created for you.
Args:
cmd: A RETR command.
callback: A single parameter callable to be called on each
block of data read.
blocksize: The maximum number of bytes to read from the
socket at one time. [default: 8192]
rest: Passed to transfercmd(). [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE I')
conn = self.transfercmd(cmd, rest)
try:
while 1:
data = conn.recv(blocksize)
if not data:
break
callback(data)
finally:
conn.close()
return self.voidresp()
def retrlines(self, cmd, callback = None):
"""Retrieve data in line mode. A new port is created for you.
Args:
cmd: A RETR, LIST, NLST, or MLSD command.
callback: An optional single parameter callable that is called
for each line with the trailing CRLF stripped.
[default: print_line()]
Returns:
The response code.
"""
if callback is None: callback = print_line
resp = self.sendcmd('TYPE A')
conn = self.transfercmd(cmd)
fp = None
try:
fp = conn.makefile('rb')
while 1:
line = fp.readline(self.maxline + 1)
if len(line) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if self.debugging > 2: print '*retr*', repr(line)
if not line:
break
if line[-2:] == CRLF:
line = line[:-2]
elif line[-1:] == '\n':
line = line[:-1]
callback(line)
finally:
if fp:
fp.close()
conn.close()
return self.voidresp()
def storbinary(self, cmd, fp, blocksize=8192, callback=None, rest=None):
"""Store a file in binary mode. A new port is created for you.
Args:
cmd: A STOR command.
fp: A file-like object with a read(num_bytes) method.
blocksize: The maximum data size to read from fp and send over
the connection at once. [default: 8192]
callback: An optional single parameter callable that is called on
each block of data after it is sent. [default: None]
rest: Passed to transfercmd(). [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE I')
conn = self.transfercmd(cmd, rest)
try:
while 1:
buf = fp.read(blocksize)
if not buf: break
conn.sendall(buf)
if callback: callback(buf)
finally:
conn.close()
return self.voidresp()
def storlines(self, cmd, fp, callback=None):
"""Store a file in line mode. A new port is created for you.
Args:
cmd: A STOR command.
fp: A file-like object with a readline() method.
callback: An optional single parameter callable that is called on
each line after it is sent. [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE A')
conn = self.transfercmd(cmd)
try:
while 1:
buf = fp.readline(self.maxline + 1)
if len(buf) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if not buf: break
if buf[-2:] != CRLF:
if buf[-1] in CRLF: buf = buf[:-1]
buf = buf + CRLF
conn.sendall(buf)
if callback: callback(buf)
finally:
conn.close()
return self.voidresp()
def acct(self, password):
'''Send new account name.'''
cmd = 'ACCT ' + password
return self.voidcmd(cmd)
def nlst(self, *args):
'''Return a list of files in a given directory (default the current).'''
cmd = 'NLST'
for arg in args:
cmd = cmd + (' ' + arg)
files = []
self.retrlines(cmd, files.append)
return files
def dir(self, *args):
'''List a directory in long form.
By default list current directory to stdout.
Optional last argument is callback function; all
non-empty arguments before it are concatenated to the
LIST command. (This *should* only be used for a pathname.)'''
cmd = 'LIST'
func = None
if args[-1:] and type(args[-1]) != type(''):
args, func = args[:-1], args[-1]
for arg in args:
if arg:
cmd = cmd + (' ' + arg)
self.retrlines(cmd, func)
def rename(self, fromname, toname):
'''Rename a file.'''
resp = self.sendcmd('RNFR ' + fromname)
if resp[0] != '3':
raise error_reply, resp
return self.voidcmd('RNTO ' + toname)
def delete(self, filename):
'''Delete a file.'''
resp = self.sendcmd('DELE ' + filename)
if resp[:3] in ('250', '200'):
return resp
else:
raise error_reply, resp
def cwd(self, dirname):
'''Change to a directory.'''
if dirname == '..':
try:
return self.voidcmd('CDUP')
except error_perm, msg:
if msg.args[0][:3] != '500':
raise
elif dirname == '':
dirname = '.' # does nothing, but could return error
cmd = 'CWD ' + dirname
return self.voidcmd(cmd)
def size(self, filename):
'''Retrieve the size of a file.'''
# The SIZE command is defined in RFC-3659
resp = self.sendcmd('SIZE ' + filename)
if resp[:3] == '213':
s = resp[3:].strip()
try:
return int(s)
except (OverflowError, ValueError):
return long(s)
def mkd(self, dirname):
'''Make a directory, return its full pathname.'''
resp = self.sendcmd('MKD ' + dirname)
return parse257(resp)
def rmd(self, dirname):
'''Remove a directory.'''
return self.voidcmd('RMD ' + dirname)
def pwd(self):
'''Return current working directory.'''
resp = self.sendcmd('PWD')
return parse257(resp)
def quit(self):
'''Quit, and close the connection.'''
resp = self.voidcmd('QUIT')
self.close()
return resp
def close(self):
'''Close the connection without assuming anything about it.'''
try:
file = self.file
self.file = None
if file is not None:
file.close()
finally:
sock = self.sock
self.sock = None
if sock is not None:
sock.close()
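# Editor's illustrative sketch (not part of the original module): a minimal
# download session with the class above.  'ftp.example.com' and 'README'
# are placeholders, not a real server or file.
def _ftp_example():
    ftp = FTP('ftp.example.com')   # connect; port defaults to FTP_PORT
    ftp.login()                    # anonymous login
    names = ftp.nlst()             # NLST listing of the current directory
    with open('README', 'wb') as f:
        ftp.retrbinary('RETR README', f.write)
    ftp.quit()
    return names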
try:
import ssl
except ImportError:
pass
else:
class FTP_TLS(FTP):
'''An FTP subclass which adds TLS support to FTP as described
in RFC-4217.
Connect as usual to port 21, implicitly securing the FTP control
connection before authenticating.
Securing the data connection requires the user to explicitly ask
for it by calling the prot_p() method.
Usage example:
>>> from ftplib import FTP_TLS
>>> ftps = FTP_TLS('ftp.python.org')
>>> ftps.login() # login anonymously, first securing the control channel
'230 Guest login ok, access restrictions apply.'
>>> ftps.prot_p() # switch to secure data connection
'200 Protection level set to P'
>>> ftps.retrlines('LIST') # list directory content securely
total 9
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 .
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 ..
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 bin
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 etc
d-wxrwxr-x 2 ftp wheel 1024 Sep 5 13:43 incoming
drwxr-xr-x 2 root wheel 1024 Nov 17 1993 lib
drwxr-xr-x 6 1094 wheel 1024 Sep 13 19:07 pub
drwxr-xr-x 3 root wheel 1024 Jan 3 1994 usr
-rw-r--r-- 1 root root 312 Aug 1 1994 welcome.msg
'226 Transfer complete.'
>>> ftps.quit()
'221 Goodbye.'
>>>
'''
ssl_version = ssl.PROTOCOL_SSLv23
def __init__(self, host='', user='', passwd='', acct='', keyfile=None,
certfile=None, context=None,
timeout=_GLOBAL_DEFAULT_TIMEOUT, source_address=None):
if context is not None and keyfile is not None:
raise ValueError("context and keyfile arguments are mutually "
"exclusive")
if context is not None and certfile is not None:
raise ValueError("context and certfile arguments are mutually "
"exclusive")
self.keyfile = keyfile
self.certfile = certfile
if context is None:
context = ssl._create_stdlib_context(self.ssl_version,
certfile=certfile,
keyfile=keyfile)
self.context = context
self._prot_p = False
FTP.__init__(self, host, user, passwd, acct, timeout)
def login(self, user='', passwd='', acct='', secure=True):
if secure and not isinstance(self.sock, ssl.SSLSocket):
self.auth()
return FTP.login(self, user, passwd, acct)
def auth(self):
'''Set up secure control connection by using TLS/SSL.'''
if isinstance(self.sock, ssl.SSLSocket):
raise ValueError("Already using TLS")
if self.ssl_version >= ssl.PROTOCOL_SSLv23:
resp = self.voidcmd('AUTH TLS')
else:
resp = self.voidcmd('AUTH SSL')
self.sock = self.context.wrap_socket(self.sock,
server_hostname=self.host)
self.file = self.sock.makefile(mode='rb')
return resp
def prot_p(self):
'''Set up secure data connection.'''
# PROT defines whether or not the data channel is to be protected.
# Though RFC-2228 defines four possible protection levels,
# RFC-4217 only recommends two, Clear and Private.
# Clear (PROT C) means that no security is to be used on the
# data-channel, Private (PROT P) means that the data-channel
# should be protected by TLS.
# PBSZ command MUST still be issued, but must have a parameter of
# '0' to indicate that no buffering is taking place and the data
# connection should not be encapsulated.
self.voidcmd('PBSZ 0')
resp = self.voidcmd('PROT P')
self._prot_p = True
return resp
def prot_c(self):
'''Set up clear text data connection.'''
resp = self.voidcmd('PROT C')
self._prot_p = False
return resp
# --- Overridden FTP methods
def ntransfercmd(self, cmd, rest=None):
conn, size = FTP.ntransfercmd(self, cmd, rest)
if self._prot_p:
conn = self.context.wrap_socket(conn,
server_hostname=self.host)
return conn, size
def retrbinary(self, cmd, callback, blocksize=8192, rest=None):
self.voidcmd('TYPE I')
conn = self.transfercmd(cmd, rest)
try:
while 1:
data = conn.recv(blocksize)
if not data:
break
callback(data)
# shutdown ssl layer
if isinstance(conn, ssl.SSLSocket):
conn.unwrap()
finally:
conn.close()
return self.voidresp()
def retrlines(self, cmd, callback = None):
if callback is None: callback = print_line
resp = self.sendcmd('TYPE A')
conn = self.transfercmd(cmd)
fp = conn.makefile('rb')
try:
while 1:
line = fp.readline(self.maxline + 1)
if len(line) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if self.debugging > 2: print '*retr*', repr(line)
if not line:
break
if line[-2:] == CRLF:
line = line[:-2]
elif line[-1:] == '\n':
line = line[:-1]
callback(line)
# shutdown ssl layer
if isinstance(conn, ssl.SSLSocket):
conn.unwrap()
finally:
fp.close()
conn.close()
return self.voidresp()
def storbinary(self, cmd, fp, blocksize=8192, callback=None, rest=None):
self.voidcmd('TYPE I')
conn = self.transfercmd(cmd, rest)
try:
while 1:
buf = fp.read(blocksize)
if not buf: break
conn.sendall(buf)
if callback: callback(buf)
# shutdown ssl layer
if isinstance(conn, ssl.SSLSocket):
conn.unwrap()
finally:
conn.close()
return self.voidresp()
def storlines(self, cmd, fp, callback=None):
self.voidcmd('TYPE A')
conn = self.transfercmd(cmd)
try:
while 1:
buf = fp.readline(self.maxline + 1)
if len(buf) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if not buf: break
if buf[-2:] != CRLF:
if buf[-1] in CRLF: buf = buf[:-1]
buf = buf + CRLF
conn.sendall(buf)
if callback: callback(buf)
# shutdown ssl layer
if isinstance(conn, ssl.SSLSocket):
conn.unwrap()
finally:
conn.close()
return self.voidresp()
__all__.append('FTP_TLS')
all_errors = (Error, IOError, EOFError, ssl.SSLError)
_150_re = None
def parse150(resp):
'''Parse the '150' response for a RETR request.
Returns the expected transfer size or None; size is not guaranteed to
be present in the 150 message.
'''
if resp[:3] != '150':
raise error_reply, resp
global _150_re
if _150_re is None:
import re
_150_re = re.compile("150 .* \((\d+) bytes\)", re.IGNORECASE)
m = _150_re.match(resp)
if not m:
return None
s = m.group(1)
try:
return int(s)
except (OverflowError, ValueError):
return long(s)
_227_re = None
def parse227(resp):
'''Parse the '227' response for a PASV request.
Raises error_proto if it does not contain '(h1,h2,h3,h4,p1,p2)'
Return ('host.addr.as.numbers', port#) tuple.'''
if resp[:3] != '227':
raise error_reply, resp
global _227_re
if _227_re is None:
import re
_227_re = re.compile(r'(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)')
m = _227_re.search(resp)
if not m:
raise error_proto, resp
numbers = m.groups()
host = '.'.join(numbers[:4])
port = (int(numbers[4]) << 8) + int(numbers[5])
return host, port
def parse229(resp, peer):
'''Parse the '229' response for an EPSV request.
Raises error_proto if it does not contain '(|||port|)'
Return ('host.addr.as.numbers', port#) tuple.'''
if resp[:3] != '229':
raise error_reply, resp
left = resp.find('(')
if left < 0: raise error_proto, resp
right = resp.find(')', left + 1)
if right < 0:
raise error_proto, resp # should contain '(|||port|)'
if resp[left + 1] != resp[right - 1]:
raise error_proto, resp
parts = resp[left + 1:right].split(resp[left+1])
if len(parts) != 5:
raise error_proto, resp
host = peer[0]
port = int(parts[3])
return host, port
def parse257(resp):
'''Parse the '257' response for a MKD or PWD request.
This is a response to a MKD or PWD request: a directory name.
Returns the directory name in the 257 reply.'''
if resp[:3] != '257':
raise error_reply, resp
if resp[3:5] != ' "':
return '' # Not compliant with RFC 959, but UNIX ftpd does this
dirname = ''
i = 5
n = len(resp)
while i < n:
c = resp[i]
i = i+1
if c == '"':
if i >= n or resp[i] != '"':
break
i = i+1
dirname = dirname + c
return dirname
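# Editor's illustrative sketch (not part of the original module): the reply
# formats the three parsers above expect, with values checked by hand.
def _parse_examples():
    # PASV packs host and port as six decimal bytes.
    assert parse227('227 Entering Passive Mode (192,168,1,2,19,136)') == \
           ('192.168.1.2', 5000)   # port = 19*256 + 136
    # EPSV carries only the port; the host comes from the peer address.
    assert parse229('229 Entering Extended Passive Mode (|||6446|)',
                    ('10.0.0.1', 21)) == ('10.0.0.1', 6446)
    # 257 quotes the directory name, doubling embedded quotes.
    assert parse257('257 "/home/""user""" created') == '/home/"user"'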
def print_line(line):
'''Default retrlines callback to print a line.'''
print line
def ftpcp(source, sourcename, target, targetname = '', type = 'I'):
'''Copy a file from one FTP instance to another.'''
if not targetname: targetname = sourcename
type = 'TYPE ' + type
source.voidcmd(type)
target.voidcmd(type)
sourcehost, sourceport = parse227(source.sendcmd('PASV'))
target.sendport(sourcehost, sourceport)
# RFC 959: the user must "listen" [...] BEFORE sending the
# transfer request.
# So: STOR before RETR, because here the target is a "user".
treply = target.sendcmd('STOR ' + targetname)
if treply[:3] not in ('125', '150'): raise error_proto # RFC 959
sreply = source.sendcmd('RETR ' + sourcename)
if sreply[:3] not in ('125', '150'): raise error_proto # RFC 959
source.voidresp()
target.voidresp()
class Netrc:
"""Class to parse & provide access to 'netrc' format files.
See the netrc(4) man page for information on the file format.
WARNING: This class is obsolete -- use module netrc instead.
"""
__defuser = None
__defpasswd = None
__defacct = None
def __init__(self, filename=None):
if filename is None:
if "HOME" in os.environ:
filename = os.path.join(os.environ["HOME"],
".netrc")
else:
raise IOError, \
"specify file to load or set $HOME"
self.__hosts = {}
self.__macros = {}
fp = open(filename, "r")
in_macro = 0
while 1:
# Bug fix: Netrc defines no maxline attribute, so self.maxline would
# raise AttributeError; use the module-level MAXLINE limit instead.
line = fp.readline(MAXLINE + 1)
if len(line) > MAXLINE:
    raise Error("got more than %d bytes" % MAXLINE)
if not line: break
if in_macro and line.strip():
macro_lines.append(line)
continue
elif in_macro:
self.__macros[macro_name] = tuple(macro_lines)
in_macro = 0
words = line.split()
host = user = passwd = acct = None
default = 0
i = 0
while i < len(words):
w1 = words[i]
if i+1 < len(words):
w2 = words[i + 1]
else:
w2 = None
if w1 == 'default':
default = 1
elif w1 == 'machine' and w2:
host = w2.lower()
i = i + 1
elif w1 == 'login' and w2:
user = w2
i = i + 1
elif w1 == 'password' and w2:
passwd = w2
i = i + 1
elif w1 == 'account' and w2:
acct = w2
i = i + 1
elif w1 == 'macdef' and w2:
macro_name = w2
macro_lines = []
in_macro = 1
break
i = i + 1
if default:
self.__defuser = user or self.__defuser
self.__defpasswd = passwd or self.__defpasswd
self.__defacct = acct or self.__defacct
if host:
if host in self.__hosts:
ouser, opasswd, oacct = \
self.__hosts[host]
user = user or ouser
passwd = passwd or opasswd
acct = acct or oacct
self.__hosts[host] = user, passwd, acct
fp.close()
def get_hosts(self):
"""Return a list of hosts mentioned in the .netrc file."""
return self.__hosts.keys()
def get_account(self, host):
"""Returns login information for the named host.
The return value is a triple containing userid,
password, and the accounting field.
"""
host = host.lower()
user = passwd = acct = None
if host in self.__hosts:
user, passwd, acct = self.__hosts[host]
user = user or self.__defuser
passwd = passwd or self.__defpasswd
acct = acct or self.__defacct
return user, passwd, acct
def get_macros(self):
"""Return a list of all defined macro names."""
return self.__macros.keys()
def get_macro(self, macro):
"""Return a sequence of lines which define a named macro."""
return self.__macros[macro]
def test():
'''Test program.
Usage: ftp [-d] [-r[file]] host [-l[dir]] [-d[dir]] [-p] [file] ...
-d dir
-l list
-p password
'''
if len(sys.argv) < 2:
print test.__doc__
sys.exit(0)
debugging = 0
rcfile = None
while sys.argv[1] == '-d':
debugging = debugging+1
del sys.argv[1]
if sys.argv[1][:2] == '-r':
# get name of alternate ~/.netrc file:
rcfile = sys.argv[1][2:]
del sys.argv[1]
host = sys.argv[1]
ftp = FTP(host)
ftp.set_debuglevel(debugging)
userid = passwd = acct = ''
try:
netrc = Netrc(rcfile)
except IOError:
if rcfile is not None:
sys.stderr.write("Could not open account file"
" -- using anonymous login.")
else:
try:
userid, passwd, acct = netrc.get_account(host)
except KeyError:
# no account for host
sys.stderr.write(
"No account -- using anonymous login.")
ftp.login(userid, passwd, acct)
for file in sys.argv[2:]:
if file[:2] == '-l':
ftp.dir(file[2:])
elif file[:2] == '-d':
cmd = 'CWD'
if file[2:]: cmd = cmd + ' ' + file[2:]
resp = ftp.sendcmd(cmd)
elif file == '-p':
ftp.set_pasv(not ftp.passiveserver)
else:
ftp.retrbinary('RETR ' + file, \
sys.stdout.write, 1024)
ftp.quit()
if __name__ == '__main__':
test()
"""functools.py - Tools for working with functions and callable objects
"""
# Python module wrapper for _functools C module
# to allow utilities written in Python to be added
# to the functools module.
# Written by Nick Coghlan <ncoghlan at gmail.com>
# Copyright (C) 2006 Python Software Foundation.
# See C source code for _functools credits/copyright
from _functools import partial, reduce
# update_wrapper() and wraps() are tools to help write
# wrapper functions that can handle naive introspection
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__doc__')
WRAPPER_UPDATES = ('__dict__',)
def update_wrapper(wrapper,
wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Update a wrapper function to look like the wrapped function
wrapper is the function to be updated
wrapped is the original function
assigned is a tuple naming the attributes assigned directly
from the wrapped function to the wrapper function (defaults to
functools.WRAPPER_ASSIGNMENTS)
updated is a tuple naming the attributes of the wrapper that
are updated with the corresponding attribute from the wrapped
function (defaults to functools.WRAPPER_UPDATES)
"""
for attr in assigned:
setattr(wrapper, attr, getattr(wrapped, attr))
for attr in updated:
getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
# Return the wrapper so this can be used as a decorator via partial()
return wrapper
def wraps(wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Decorator factory to apply update_wrapper() to a wrapper function
Returns a decorator that invokes update_wrapper() with the decorated
function as the wrapper argument and the arguments to wraps() as the
remaining arguments. Default arguments are as for update_wrapper().
This is a convenience function to simplify applying partial() to
update_wrapper().
"""
return partial(update_wrapper, wrapped=wrapped,
assigned=assigned, updated=updated)
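# Editor's illustrative sketch (not part of the original module): @wraps
# keeps the decorated function's metadata intact for introspection.
def _wraps_example():
    def logged(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            print 'calling', func.__name__
            return func(*args, **kwargs)
        return wrapper
    @logged
    def add(a, b):
        """Add two numbers."""
        return a + b
    assert add.__name__ == 'add'              # not 'wrapper'
    assert add.__doc__ == 'Add two numbers.'  # docstring preserved
    return add(2, 3)                          # prints 'calling add'; 5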
def total_ordering(cls):
"""Class decorator that fills in missing ordering methods"""
convert = {
'__lt__': [('__gt__', lambda self, other: not (self < other or self == other)),
('__le__', lambda self, other: self < other or self == other),
('__ne__', lambda self, other: not self == other),
('__ge__', lambda self, other: not self < other)],
'__le__': [('__ge__', lambda self, other: not self <= other or self == other),
('__lt__', lambda self, other: self <= other and not self == other),
('__ne__', lambda self, other: not self == other),
('__gt__', lambda self, other: not self <= other)],
'__gt__': [('__lt__', lambda self, other: not (self > other or self == other)),
('__ge__', lambda self, other: self > other or self == other),
('__ne__', lambda self, other: not self == other),
('__le__', lambda self, other: not self > other)],
'__ge__': [('__le__', lambda self, other: (not self >= other) or self == other),
('__gt__', lambda self, other: self >= other and not self == other),
('__ne__', lambda self, other: not self == other),
('__lt__', lambda self, other: not self >= other)]
}
defined_methods = set(dir(cls))
roots = defined_methods & set(convert)
if not roots:
raise ValueError('must define at least one ordering operation: < > <= >=')
root = max(roots) # prefer __lt__ to __le__ to __gt__ to __ge__
for opname, opfunc in convert[root]:
if opname not in defined_methods:
opfunc.__name__ = opname
opfunc.__doc__ = getattr(int, opname).__doc__
setattr(cls, opname, opfunc)
return cls
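# Editor's illustrative sketch (not part of the original module): defining
# __eq__ and __lt__ is enough; the decorator derives the rest.
def _total_ordering_example():
    @total_ordering
    class Version(object):
        def __init__(self, n):
            self.n = n
        def __eq__(self, other):
            return self.n == other.n
        def __lt__(self, other):
            return self.n < other.n
    assert Version(1) < Version(2)     # defined directly
    assert Version(1) <= Version(1)    # derived from __lt__ and __eq__
    assert Version(2) >= Version(1)    # derived: not (self < other)
    assert Version(2) != Version(1)    # derived: not (self == other)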
def cmp_to_key(mycmp):
"""Convert a cmp= function into a key= function"""
class K(object):
__slots__ = ['obj']
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
def __hash__(self):
raise TypeError('hash not implemented')
return K
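# Editor's illustrative sketch (not part of the original module): adapting
# an old-style cmp function for the key= protocol.
def _cmp_to_key_example():
    by_length = cmp_to_key(lambda a, b: cmp(len(a), len(b)))
    assert sorted(['ccc', 'a', 'bb'], key=by_length) == ['a', 'bb', 'ccc']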
"""
Path operations common to more than one OS.
Do not use directly. The OS specific modules import the appropriate
functions from this module themselves.
"""
import os
import stat
__all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
'getsize', 'isdir', 'isfile']
try:
_unicode = unicode
except NameError:
# If Python is built without Unicode support, the unicode type
# will not exist. Fake one.
class _unicode(object):
pass
# Does a path exist?
# This is false for dangling symbolic links on systems that support them.
def exists(path):
"""Test whether a path exists. Returns False for broken symbolic links"""
try:
os.stat(path)
except os.error:
return False
return True
# This follows symbolic links, so both islink() and isfile() can be true
# for the same path on systems that support symlinks
def isfile(path):
"""Test whether a path is a regular file"""
try:
st = os.stat(path)
except os.error:
return False
return stat.S_ISREG(st.st_mode)
# Is a path a directory?
# This follows symbolic links, so both islink() and isdir()
# can be true for the same path on systems that support symlinks
def isdir(s):
"""Return true if the pathname refers to an existing directory."""
try:
st = os.stat(s)
except os.error:
return False
return stat.S_ISDIR(st.st_mode)
def getsize(filename):
"""Return the size of a file, reported by os.stat()."""
return os.stat(filename).st_size
def getmtime(filename):
"""Return the last modification time of a file, reported by os.stat()."""
return os.stat(filename).st_mtime
def getatime(filename):
"""Return the last access time of a file, reported by os.stat()."""
return os.stat(filename).st_atime
def getctime(filename):
"""Return the metadata change time of a file, reported by os.stat()."""
return os.stat(filename).st_ctime
# Return the longest prefix of all list elements.
def commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
s1 = min(m)
s2 = max(m)
for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]
return s1
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
# Generic implementation of splitext, to be parametrized with
# the separators
def _splitext(p, sep, altsep, extsep):
"""Split the extension from a pathname.
Extension is everything from the last dot to the end, ignoring
leading dots. Returns "(root, ext)"; ext may be empty."""
sepIndex = p.rfind(sep)
if altsep:
altsepIndex = p.rfind(altsep)
sepIndex = max(sepIndex, altsepIndex)
dotIndex = p.rfind(extsep)
if dotIndex > sepIndex:
# skip all leading dots
filenameIndex = sepIndex + 1
while filenameIndex < dotIndex:
if p[filenameIndex] != extsep:
return p[:dotIndex], p[dotIndex:]
filenameIndex += 1
return p, ''
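# --- Illustrative sketch (not part of the stdlib source): the public
# os.path functions parametrize the helpers above. Note that
# commonprefix() compares character by character, so its result need
# not fall on a path component boundary.
import os.path

print os.path.splitext('archive.tar.gz')   # ('archive.tar', '.gz')
print os.path.splitext('.bashrc')          # ('.bashrc', '') - leading dots ignored
print os.path.commonprefix(['/usr/lib', '/usr/local'])   # '/usr/l'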
"""Parser for command line options.
This module helps scripts to parse the command line arguments in
sys.argv. It supports the same conventions as the Unix getopt()
function (including the special meanings of arguments of the form `-'
and `--'). Long options similar to those supported by GNU software
may be used as well via an optional third argument. This module
provides two functions and an exception:
getopt() -- Parse command line options
gnu_getopt() -- Like getopt(), but allow option and non-option arguments
to be intermixed.
GetoptError -- exception (class) raised with 'opt' attribute, which is the
option involved with the exception.
"""
# Long option support added by Lars Wirzenius <[email protected]>.
#
# Gerrit Holl <[email protected]> moved the string-based exceptions
# to class-based exceptions.
#
# Peter Astrand <[email protected]> added gnu_getopt().
#
# TODO for gnu_getopt():
#
# - GNU getopt_long_only mechanism
# - allow the caller to specify ordering
# - RETURN_IN_ORDER option
# - GNU extension with '-' as first character of option string
# - optional arguments, specified by double colons
# - an option string with a W followed by semicolon should
# treat "-W foo" as "--foo"
__all__ = ["GetoptError","error","getopt","gnu_getopt"]
import os
class GetoptError(Exception):
opt = ''
msg = ''
def __init__(self, msg, opt=''):
self.msg = msg
self.opt = opt
Exception.__init__(self, msg, opt)
def __str__(self):
return self.msg
error = GetoptError # backward compatibility
def getopt(args, shortopts, longopts = []):
"""getopt(args, options[, long_options]) -> opts, args
Parses command line options and parameter list. args is the
argument list to be parsed, without the leading reference to the
running program. Typically, this means "sys.argv[1:]". shortopts
is the string of option letters that the script wants to
recognize, with options that require an argument followed by a
colon (i.e., the same format that Unix getopt() uses). If
specified, longopts is a list of strings with the names of the
long options which should be supported. The leading '--'
characters should not be included in the option name. Options
which require an argument should be followed by an equal sign
('=').
The return value consists of two elements: the first is a list of
(option, value) pairs; the second is the list of program arguments
left after the option list was stripped (this is a trailing slice
of the first argument). Each option-and-value pair returned has
the option as its first element, prefixed with a hyphen (e.g.,
'-x'), and the option argument as its second element, or an empty
string if the option has no argument. The options occur in the
list in the same order in which they were found, thus allowing
multiple occurrences. Long and short options may be mixed.
"""
opts = []
if type(longopts) == type(""):
longopts = [longopts]
else:
longopts = list(longopts)
while args and args[0].startswith('-') and args[0] != '-':
if args[0] == '--':
args = args[1:]
break
if args[0].startswith('--'):
opts, args = do_longs(opts, args[0][2:], longopts, args[1:])
else:
opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
return opts, args
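# --- Illustrative sketch (not part of the stdlib source): a typical
# call; 'o:' means -o takes an argument, 'output=' likewise for --output.
import getopt

opts, args = getopt.getopt(['-v', '-o', 'out.txt', 'input.txt'],
                           'vo:', ['verbose', 'output='])
print opts   # [('-v', ''), ('-o', 'out.txt')]
print args   # ['input.txt']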
def gnu_getopt(args, shortopts, longopts = []):
"""getopt(args, options[, long_options]) -> opts, args
This function works like getopt(), except that GNU style scanning
mode is used by default. This means that option and non-option
arguments may be intermixed. The getopt() function stops
processing options as soon as a non-option argument is
encountered.
If the first character of the option string is `+', or if the
environment variable POSIXLY_CORRECT is set, then option
processing stops as soon as a non-option argument is encountered.
"""
opts = []
prog_args = []
if isinstance(longopts, str):
longopts = [longopts]
else:
longopts = list(longopts)
# Allow options after non-option arguments?
if shortopts.startswith('+'):
shortopts = shortopts[1:]
all_options_first = True
elif os.environ.get("POSIXLY_CORRECT"):
all_options_first = True
else:
all_options_first = False
while args:
if args[0] == '--':
prog_args += args[1:]
break
if args[0][:2] == '--':
opts, args = do_longs(opts, args[0][2:], longopts, args[1:])
elif args[0][:1] == '-' and args[0] != '-':
opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
else:
if all_options_first:
prog_args += args
break
else:
prog_args.append(args[0])
args = args[1:]
return opts, prog_args
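# --- Illustrative sketch (not part of the stdlib source): gnu_getopt()
# keeps scanning after a non-option argument (assuming POSIXLY_CORRECT
# is not set), where getopt() would have stopped.
import getopt

opts, args = getopt.gnu_getopt(['input.txt', '-v'], 'v')
print opts   # [('-v', '')]
print args   # ['input.txt']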
def do_longs(opts, opt, longopts, args):
try:
i = opt.index('=')
except ValueError:
optarg = None
else:
opt, optarg = opt[:i], opt[i+1:]
has_arg, opt = long_has_args(opt, longopts)
if has_arg:
if optarg is None:
if not args:
raise GetoptError('option --%s requires argument' % opt, opt)
optarg, args = args[0], args[1:]
elif optarg is not None:
raise GetoptError('option --%s must not have an argument' % opt, opt)
opts.append(('--' + opt, optarg or ''))
return opts, args
# Return:
# has_arg?
# full option name
def long_has_args(opt, longopts):
possibilities = [o for o in longopts if o.startswith(opt)]
if not possibilities:
raise GetoptError('option --%s not recognized' % opt, opt)
# Is there an exact match?
if opt in possibilities:
return False, opt
elif opt + '=' in possibilities:
return True, opt
# No exact match, so better be unique.
if len(possibilities) > 1:
# XXX since possibilities contains all valid continuations, might be
# nice to work them into the error msg
raise GetoptError('option --%s not a unique prefix' % opt, opt)
assert len(possibilities) == 1
unique_match = possibilities[0]
has_arg = unique_match.endswith('=')
if has_arg:
unique_match = unique_match[:-1]
return has_arg, unique_match
def do_shorts(opts, optstring, shortopts, args):
while optstring != '':
opt, optstring = optstring[0], optstring[1:]
if short_has_arg(opt, shortopts):
if optstring == '':
if not args:
raise GetoptError('option -%s requires argument' % opt,
opt)
optstring, args = args[0], args[1:]
optarg, optstring = optstring, ''
else:
optarg = ''
opts.append(('-' + opt, optarg))
return opts, args
def short_has_arg(opt, shortopts):
for i in range(len(shortopts)):
if opt == shortopts[i] != ':':
return shortopts.startswith(':', i+1)
raise GetoptError('option -%s not recognized' % opt, opt)
if __name__ == '__main__':
import sys
print getopt(sys.argv[1:], "a:b", ["alpha=", "beta"])
"""Utilities to get a password and/or the current user name.
getpass(prompt[, stream]) - Prompt for a password, with echo turned off.
getuser() - Get the user name from the environment or password database.
GetPassWarning - This UserWarning is issued when getpass() cannot prevent
echoing of the password contents while reading.
On Windows, the msvcrt module will be used.
On the Mac EasyDialogs.AskPassword is used, if available.
"""
# Authors: Piers Lauder (original)
# Guido van Rossum (Windows support and cleanup)
# Gregory P. Smith (tty support & GetPassWarning)
import os, sys, warnings
__all__ = ["getpass","getuser","GetPassWarning"]
class GetPassWarning(UserWarning): pass
def unix_getpass(prompt='Password: ', stream=None):
"""Prompt for a password, with echo turned off.
Args:
prompt: Written on stream to ask for the input. Default: 'Password: '
stream: A writable file object to display the prompt. Defaults to
the tty. If no tty is available defaults to sys.stderr.
Returns:
The seKr3t input.
Raises:
EOFError: If our input tty or stdin was closed.
GetPassWarning: When we were unable to turn echo off on the input.
Always restores terminal settings before returning.
"""
fd = None
tty = None
try:
# Always try reading and writing directly on the tty first.
fd = os.open('/dev/tty', os.O_RDWR|os.O_NOCTTY)
tty = os.fdopen(fd, 'w+', 1)
input = tty
if not stream:
stream = tty
except EnvironmentError, e:
# If that fails, see if stdin can be controlled.
try:
fd = sys.stdin.fileno()
except (AttributeError, ValueError):
passwd = fallback_getpass(prompt, stream)
input = sys.stdin
if not stream:
stream = sys.stderr
if fd is not None:
passwd = None
try:
old = termios.tcgetattr(fd) # a copy to save
new = old[:]
new[3] &= ~termios.ECHO # 3 == 'lflags'
tcsetattr_flags = termios.TCSAFLUSH
if hasattr(termios, 'TCSASOFT'):
tcsetattr_flags |= termios.TCSASOFT
try:
termios.tcsetattr(fd, tcsetattr_flags, new)
passwd = _raw_input(prompt, stream, input=input)
finally:
termios.tcsetattr(fd, tcsetattr_flags, old)
stream.flush() # issue7208
except termios.error, e:
if passwd is not None:
# _raw_input succeeded. The final tcsetattr failed. Reraise
# instead of leaving the terminal in an unknown state.
raise
# We can't control the tty or stdin. Give up and use normal IO.
# fallback_getpass() raises an appropriate warning.
del input, tty # clean up unused file objects before blocking
passwd = fallback_getpass(prompt, stream)
stream.write('\n')
return passwd
def win_getpass(prompt='Password: ', stream=None):
"""Prompt for password with echo off, using Windows getch()."""
if sys.stdin is not sys.__stdin__:
return fallback_getpass(prompt, stream)
import msvcrt
for c in prompt:
msvcrt.putch(c)
pw = ""
while 1:
c = msvcrt.getch()
if c == '\r' or c == '\n':
break
if c == '\003':
raise KeyboardInterrupt
if c == '\b':
pw = pw[:-1]
else:
pw = pw + c
msvcrt.putch('\r')
msvcrt.putch('\n')
return pw
def fallback_getpass(prompt='Password: ', stream=None):
warnings.warn("Can not control echo on the terminal.", GetPassWarning,
stacklevel=2)
if not stream:
stream = sys.stderr
print >>stream, "Warning: Password input may be echoed."
return _raw_input(prompt, stream)
def _raw_input(prompt="", stream=None, input=None):
# A raw_input() replacement that doesn't save the string in the
# GNU readline history.
if not stream:
stream = sys.stderr
if not input:
input = sys.stdin
prompt = str(prompt)
if prompt:
stream.write(prompt)
stream.flush()
# NOTE: The Python C API calls flockfile() (and unlock) during readline.
line = input.readline()
if not line:
raise EOFError
if line[-1] == '\n':
line = line[:-1]
return line
def getuser():
"""Get the username from the environment or password database.
First try various environment variables, then the password
database. This works on Windows as long as USERNAME is set.
"""
import os
for name in ('LOGNAME', 'USER', 'LNAME', 'USERNAME'):
user = os.environ.get(name)
if user:
return user
# If this fails, the exception will "explain" why
import pwd
return pwd.getpwuid(os.getuid())[0]
# Bind the name getpass to the appropriate function
try:
import termios
# it's possible there is an incompatible termios from the
# McMillan Installer, make sure we have a UNIX-compatible termios
termios.tcgetattr, termios.tcsetattr
except (ImportError, AttributeError):
try:
import msvcrt
except ImportError:
try:
from EasyDialogs import AskPassword
except ImportError:
getpass = fallback_getpass
else:
getpass = AskPassword
else:
getpass = win_getpass
else:
getpass = unix_getpass
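# --- Illustrative sketch (not part of the stdlib source): the module
# binds the best available implementation to the name getpass, so
# callers stay platform-independent.
import getpass

user = getpass.getuser()    # environment variables first, then pwd
print user
# pw = getpass.getpass('Password for %s: ' % user)   # prompts with echo off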
"""Internationalization and localization support.
This module provides internationalization (I18N) and localization (L10N)
support for your Python programs by providing an interface to the GNU gettext
message catalog library.
I18N refers to the operation by which a program is made aware of multiple
languages. L10N refers to the adaptation of your program, once
internationalized, to the local language and cultural habits.
"""
# This module represents the integration of work, contributions, feedback, and
# suggestions from the following people:
#
# Martin von Loewis, who wrote the initial implementation of the underlying
# C-based libintlmodule (later renamed _gettext), along with a skeletal
# gettext.py implementation.
#
# Peter Funk, who wrote fintl.py, a fairly complete wrapper around intlmodule,
# which also included a pure-Python implementation to read .mo files if
# intlmodule wasn't available.
#
# James Henstridge, who also wrote a gettext.py module, which has some
# interesting, but currently unsupported experimental features: the notion of
# a Catalog class and instances, and the ability to add to a catalog file via
# a Python API.
#
# Barry Warsaw integrated these modules, wrote the .install() API and code,
# and conformed all C and Python code to Python's coding standards.
#
# Francois Pinard and Marc-Andre Lemburg also contributed valuably to this
# module.
#
# J. David Ibanez implemented plural forms. Bruno Haible fixed some bugs.
#
# TODO:
# - Lazy loading of .mo files. Currently the entire catalog is loaded into
# memory, but that's probably bad for large translated programs. Instead,
# the lexical sort of original strings in GNU .mo files should be exploited
# to do binary searches and lazy initializations. Or you might want to use
# the undocumented double-hash algorithm for .mo files with hash tables, but
# you'll need to study the GNU gettext code to do this.
#
# - Support Solaris .mo file formats. Unfortunately, we've been unable to
# find this format documented anywhere.
import locale, copy, os, re, struct, sys
from errno import ENOENT
__all__ = ['NullTranslations', 'GNUTranslations', 'Catalog',
'find', 'translation', 'install', 'textdomain', 'bindtextdomain',
'bind_textdomain_codeset',
'dgettext', 'dngettext', 'gettext', 'lgettext', 'ldgettext',
'ldngettext', 'lngettext', 'ngettext',
]
_default_localedir = os.path.join(sys.prefix, 'share', 'locale')
# Expression parsing for plural form selection.
#
# The gettext library supports a small subset of C syntax. The only
# incompatible difference is that integer literals starting with zero are
# decimal.
#
# https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
# http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
_token_pattern = re.compile(r"""
(?P<WHITESPACES>[ \t]+) | # spaces and horizontal tabs
(?P<NUMBER>[0-9]+\b) | # decimal integer
(?P<NAME>n\b) | # only n is allowed
(?P<PARENTHESIS>[()]) |
(?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
# <=, >=, ==, !=, &&, ||,
# ? :
# unary and bitwise ops
# not allowed
(?P<INVALID>\w+|.) # invalid token
""", re.VERBOSE|re.DOTALL)
def _tokenize(plural):
for mo in re.finditer(_token_pattern, plural):
kind = mo.lastgroup
if kind == 'WHITESPACES':
continue
value = mo.group(kind)
if kind == 'INVALID':
raise ValueError('invalid token in plural form: %s' % value)
yield value
yield ''
def _error(value):
if value:
return ValueError('unexpected token in plural form: %s' % value)
else:
return ValueError('unexpected end of plural form')
_binary_ops = (
('||',),
('&&',),
('==', '!='),
('<', '>', '<=', '>='),
('+', '-'),
('*', '/', '%'),
)
_binary_ops = {op: i for i, ops in enumerate(_binary_ops, 1) for op in ops}
_c2py_ops = {'||': 'or', '&&': 'and', '/': '//'}
def _parse(tokens, priority=-1):
result = ''
nexttok = next(tokens)
while nexttok == '!':
result += 'not '
nexttok = next(tokens)
if nexttok == '(':
sub, nexttok = _parse(tokens)
result = '%s(%s)' % (result, sub)
if nexttok != ')':
raise ValueError('unbalanced parenthesis in plural form')
elif nexttok == 'n':
result = '%s%s' % (result, nexttok)
else:
try:
value = int(nexttok, 10)
except ValueError:
raise _error(nexttok)
result = '%s%d' % (result, value)
nexttok = next(tokens)
j = 100
while nexttok in _binary_ops:
i = _binary_ops[nexttok]
if i < priority:
break
# Break chained comparisons
if i in (3, 4) and j in (3, 4): # '==', '!=', '<', '>', '<=', '>='
result = '(%s)' % result
# Replace some C operators by their Python equivalents
op = _c2py_ops.get(nexttok, nexttok)
right, nexttok = _parse(tokens, i + 1)
result = '%s %s %s' % (result, op, right)
j = i
if j == priority == 4: # '<', '>', '<=', '>='
result = '(%s)' % result
if nexttok == '?' and priority <= 0:
if_true, nexttok = _parse(tokens, 0)
if nexttok != ':':
raise _error(nexttok)
if_false, nexttok = _parse(tokens)
result = '%s if %s else %s' % (if_true, result, if_false)
if priority == 0:
result = '(%s)' % result
return result, nexttok
def _as_int(n):
try:
i = round(n)
except TypeError:
raise TypeError('Plural value must be an integer, got %s' %
(n.__class__.__name__,))
return n
def c2py(plural):
"""Gets a C expression as used in PO files for plural forms and returns a
Python function that implements an equivalent expression.
"""
if len(plural) > 1000:
raise ValueError('plural form expression is too long')
try:
result, nexttok = _parse(_tokenize(plural))
if nexttok:
raise _error(nexttok)
depth = 0
for c in result:
if c == '(':
depth += 1
if depth > 20:
# Python compiler limit is about 90.
# The most complex example has 2.
raise ValueError('plural form expression is too complex')
elif c == ')':
depth -= 1
ns = {'_as_int': _as_int}
exec('''if 1:
def func(n):
if not isinstance(n, int):
n = _as_int(n)
return int(%s)
''' % result, ns)
return ns['func']
except RuntimeError:
# Recursion error can be raised in _parse() or exec().
raise ValueError('plural form expression is too complex')
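# --- Illustrative sketch (not part of the stdlib source): compiling a
# Plural-Forms expression (here the three-way rule used by several
# Slavic languages) into a Python function.
from gettext import c2py

plural = c2py('(n%10==1 && n%100!=11) ? 0 '
              ': ((n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)) ? 1 : 2)')
print plural(1), plural(3), plural(5)   # 0 1 2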
def _expand_lang(locale):
from locale import normalize
locale = normalize(locale)
COMPONENT_CODESET = 1 << 0
COMPONENT_TERRITORY = 1 << 1
COMPONENT_MODIFIER = 1 << 2
# split up the locale into its base components
mask = 0
pos = locale.find('@')
if pos >= 0:
modifier = locale[pos:]
locale = locale[:pos]
mask |= COMPONENT_MODIFIER
else:
modifier = ''
pos = locale.find('.')
if pos >= 0:
codeset = locale[pos:]
locale = locale[:pos]
mask |= COMPONENT_CODESET
else:
codeset = ''
pos = locale.find('_')
if pos >= 0:
territory = locale[pos:]
locale = locale[:pos]
mask |= COMPONENT_TERRITORY
else:
territory = ''
language = locale
ret = []
for i in range(mask+1):
if not (i & ~mask): # if all components for this combo exist ...
val = language
if i & COMPONENT_TERRITORY: val += territory
if i & COMPONENT_CODESET: val += codeset
if i & COMPONENT_MODIFIER: val += modifier
ret.append(val)
ret.reverse()
return ret
class NullTranslations:
def __init__(self, fp=None):
self._info = {}
self._charset = None
self._output_charset = None
self._fallback = None
if fp is not None:
self._parse(fp)
def _parse(self, fp):
pass
def add_fallback(self, fallback):
if self._fallback:
self._fallback.add_fallback(fallback)
else:
self._fallback = fallback
def gettext(self, message):
if self._fallback:
return self._fallback.gettext(message)
return message
def lgettext(self, message):
if self._fallback:
return self._fallback.lgettext(message)
return message
def ngettext(self, msgid1, msgid2, n):
if self._fallback:
return self._fallback.ngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def lngettext(self, msgid1, msgid2, n):
if self._fallback:
return self._fallback.lngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def ugettext(self, message):
if self._fallback:
return self._fallback.ugettext(message)
return unicode(message)
def ungettext(self, msgid1, msgid2, n):
if self._fallback:
return self._fallback.ungettext(msgid1, msgid2, n)
if n == 1:
return unicode(msgid1)
else:
return unicode(msgid2)
def info(self):
return self._info
def charset(self):
return self._charset
def output_charset(self):
return self._output_charset
def set_output_charset(self, charset):
self._output_charset = charset
def install(self, unicode=False, names=None):
import __builtin__
__builtin__.__dict__['_'] = unicode and self.ugettext or self.gettext
if hasattr(names, "__contains__"):
if "gettext" in names:
__builtin__.__dict__['gettext'] = __builtin__.__dict__['_']
if "ngettext" in names:
__builtin__.__dict__['ngettext'] = (unicode and self.ungettext
or self.ngettext)
if "lgettext" in names:
__builtin__.__dict__['lgettext'] = self.lgettext
if "lngettext" in names:
__builtin__.__dict__['lngettext'] = self.lngettext
class GNUTranslations(NullTranslations):
# Magic number of .mo files
LE_MAGIC = 0x950412deL
BE_MAGIC = 0xde120495L
def _parse(self, fp):
"""Override this method to support alternative .mo formats."""
unpack = struct.unpack
filename = getattr(fp, 'name', '')
# Parse the .mo file header, which consists of 5 little endian 32
# bit words.
self._catalog = catalog = {}
self.plural = lambda n: int(n != 1) # germanic plural by default
buf = fp.read()
buflen = len(buf)
# Are we big endian or little endian?
magic = unpack('<I', buf[:4])[0]
if magic == self.LE_MAGIC:
version, msgcount, masteridx, transidx = unpack('<4I', buf[4:20])
ii = '<II'
elif magic == self.BE_MAGIC:
version, msgcount, masteridx, transidx = unpack('>4I', buf[4:20])
ii = '>II'
else:
raise IOError(0, 'Bad magic number', filename)
# Now put all messages from the .mo file buffer into the catalog
# dictionary.
for i in xrange(0, msgcount):
mlen, moff = unpack(ii, buf[masteridx:masteridx+8])
mend = moff + mlen
tlen, toff = unpack(ii, buf[transidx:transidx+8])
tend = toff + tlen
if mend < buflen and tend < buflen:
msg = buf[moff:mend]
tmsg = buf[toff:tend]
else:
raise IOError(0, 'File is corrupt', filename)
# See if we're looking at GNU .mo conventions for metadata
if mlen == 0:
# Catalog description
lastk = None
for item in tmsg.splitlines():
item = item.strip()
if not item:
continue
k = v = None
if ':' in item:
k, v = item.split(':', 1)
k = k.strip().lower()
v = v.strip()
self._info[k] = v
lastk = k
elif lastk:
self._info[lastk] += '\n' + item
if k == 'content-type':
self._charset = v.split('charset=')[1]
elif k == 'plural-forms':
v = v.split(';')
plural = v[1].split('plural=')[1]
self.plural = c2py(plural)
# Note: we unconditionally convert both msgids and msgstrs to
# Unicode using the character encoding specified in the charset
# parameter of the Content-Type header. The gettext documentation
# strongly encourages msgids to be us-ascii, but some applications
# require alternative encodings (e.g. Zope's ZCML and ZPT). For
# traditional gettext applications, the msgid conversion will
# cause no problems since us-ascii should always be a subset of
# the charset encoding. We may want to fall back to 8-bit msgids
# if the Unicode conversion fails.
if '\x00' in msg:
# Plural forms
msgid1, msgid2 = msg.split('\x00')
tmsg = tmsg.split('\x00')
if self._charset:
msgid1 = unicode(msgid1, self._charset)
tmsg = [unicode(x, self._charset) for x in tmsg]
for i in range(len(tmsg)):
catalog[(msgid1, i)] = tmsg[i]
else:
if self._charset:
msg = unicode(msg, self._charset)
tmsg = unicode(tmsg, self._charset)
catalog[msg] = tmsg
# advance to next entry in the seek tables
masteridx += 8
transidx += 8
def gettext(self, message):
missing = object()
tmsg = self._catalog.get(message, missing)
if tmsg is missing:
if self._fallback:
return self._fallback.gettext(message)
return message
# Encode the Unicode tmsg back to an 8-bit string, if possible
if self._output_charset:
return tmsg.encode(self._output_charset)
elif self._charset:
return tmsg.encode(self._charset)
return tmsg
def lgettext(self, message):
missing = object()
tmsg = self._catalog.get(message, missing)
if tmsg is missing:
if self._fallback:
return self._fallback.lgettext(message)
return message
if self._output_charset:
return tmsg.encode(self._output_charset)
return tmsg.encode(locale.getpreferredencoding())
def ngettext(self, msgid1, msgid2, n):
try:
tmsg = self._catalog[(msgid1, self.plural(n))]
if self._output_charset:
return tmsg.encode(self._output_charset)
elif self._charset:
return tmsg.encode(self._charset)
return tmsg
except KeyError:
if self._fallback:
return self._fallback.ngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def lngettext(self, msgid1, msgid2, n):
try:
tmsg = self._catalog[(msgid1, self.plural(n))]
if self._output_charset:
return tmsg.encode(self._output_charset)
return tmsg.encode(locale.getpreferredencoding())
except KeyError:
if self._fallback:
return self._fallback.lngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def ugettext(self, message):
missing = object()
tmsg = self._catalog.get(message, missing)
if tmsg is missing:
if self._fallback:
return self._fallback.ugettext(message)
return unicode(message)
return tmsg
def ungettext(self, msgid1, msgid2, n):
try:
tmsg = self._catalog[(msgid1, self.plural(n))]
except KeyError:
if self._fallback:
return self._fallback.ungettext(msgid1, msgid2, n)
if n == 1:
tmsg = unicode(msgid1)
else:
tmsg = unicode(msgid2)
return tmsg
# Locate a .mo file using the gettext strategy
def find(domain, localedir=None, languages=None, all=0):
# Get some reasonable defaults for arguments that were not supplied
if localedir is None:
localedir = _default_localedir
if languages is None:
languages = []
for envar in ('LANGUAGE', 'LC_ALL', 'LC_MESSAGES', 'LANG'):
val = os.environ.get(envar)
if val:
languages = val.split(':')
break
if 'C' not in languages:
languages.append('C')
# now normalize and expand the languages
nelangs = []
for lang in languages:
for nelang in _expand_lang(lang):
if nelang not in nelangs:
nelangs.append(nelang)
# select a language
if all:
result = []
else:
result = None
for lang in nelangs:
if lang == 'C':
break
mofile = os.path.join(localedir, lang, 'LC_MESSAGES', '%s.mo' % domain)
if os.path.exists(mofile):
if all:
result.append(mofile)
else:
return mofile
return result
# a mapping between absolute .mo file path and Translation object
_translations = {}
def translation(domain, localedir=None, languages=None,
class_=None, fallback=False, codeset=None):
if class_ is None:
class_ = GNUTranslations
mofiles = find(domain, localedir, languages, all=1)
if not mofiles:
if fallback:
return NullTranslations()
raise IOError(ENOENT, 'No translation file found for domain', domain)
# Avoid opening, reading, and parsing the .mo file after it's been done
# once.
result = None
for mofile in mofiles:
key = (class_, os.path.abspath(mofile))
t = _translations.get(key)
if t is None:
with open(mofile, 'rb') as fp:
t = _translations.setdefault(key, class_(fp))
# Copy the translation object to allow setting fallbacks and
# output charset. All other instance data is shared with the
# cached object.
t = copy.copy(t)
if codeset:
t.set_output_charset(codeset)
if result is None:
result = t
else:
result.add_fallback(t)
return result
def install(domain, localedir=None, unicode=False, codeset=None, names=None):
t = translation(domain, localedir, fallback=True, codeset=codeset)
t.install(unicode, names)
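# --- Illustrative sketch (not part of the stdlib source): typical
# application setup; 'myapp' and './locale' are placeholder values.
import gettext

t = gettext.translation('myapp', './locale', fallback=True)
t.install(unicode=True)    # binds _ into __builtin__
print _('Hello World')     # translated if a catalog was found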
# a mapping b/w domains and locale directories
_localedirs = {}
# a mapping b/w domains and codesets
_localecodesets = {}
# current global domain, `messages' used for compatibility w/ GNU gettext
_current_domain = 'messages'
def textdomain(domain=None):
global _current_domain
if domain is not None:
_current_domain = domain
return _current_domain
def bindtextdomain(domain, localedir=None):
global _localedirs
if localedir is not None:
_localedirs[domain] = localedir
return _localedirs.get(domain, _default_localedir)
def bind_textdomain_codeset(domain, codeset=None):
global _localecodesets
if codeset is not None:
_localecodesets[domain] = codeset
return _localecodesets.get(domain)
def dgettext(domain, message):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except IOError:
return message
return t.gettext(message)
def ldgettext(domain, message):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except IOError:
return message
return t.lgettext(message)
def dngettext(domain, msgid1, msgid2, n):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except IOError:
if n == 1:
return msgid1
else:
return msgid2
return t.ngettext(msgid1, msgid2, n)
def ldngettext(domain, msgid1, msgid2, n):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except IOError:
if n == 1:
return msgid1
else:
return msgid2
return t.lngettext(msgid1, msgid2, n)
def gettext(message):
return dgettext(_current_domain, message)
def lgettext(message):
return ldgettext(_current_domain, message)
def ngettext(msgid1, msgid2, n):
return dngettext(_current_domain, msgid1, msgid2, n)
def lngettext(msgid1, msgid2, n):
return ldngettext(_current_domain, msgid1, msgid2, n)
# dcgettext() has been deemed unnecessary and is not implemented.
# James Henstridge's Catalog constructor from GNOME gettext. Documented usage
# was:
#
# import gettext
# cat = gettext.Catalog(PACKAGE, localedir=LOCALEDIR)
# _ = cat.gettext
# print _('Hello World')
# The resulting catalog object currently doesn't support access through a
# dictionary API, which was supported (but apparently unused) in GNOME
# gettext.
Catalog = translation
"""Filename globbing utility."""
import sys
import os
import re
import fnmatch
try:
_unicode = unicode
except NameError:
# If Python is built without Unicode support, the unicode type
# will not exist. Fake one.
class _unicode(object):
pass
__all__ = ["glob", "iglob"]
def glob(pathname):
"""Return a list of paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards a la
fnmatch. However, unlike fnmatch, filenames starting with a
dot are special cases that are not matched by '*' and '?'
patterns.
"""
return list(iglob(pathname))
def iglob(pathname):
"""Return an iterator which yields the paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards a la
fnmatch. However, unlike fnmatch, filenames starting with a
dot are special cases that are not matched by '*' and '?'
patterns.
"""
dirname, basename = os.path.split(pathname)
if not has_magic(pathname):
if basename:
if os.path.lexists(pathname):
yield pathname
else:
# Patterns ending with a slash should match only directories
if os.path.isdir(dirname):
yield pathname
return
if not dirname:
for name in glob1(os.curdir, basename):
yield name
return
# `os.path.split()` returns the argument itself as a dirname if it is a
# drive or UNC path. Prevent an infinite recursion if a drive or UNC path
# contains magic characters (i.e. r'\\?\C:').
if dirname != pathname and has_magic(dirname):
dirs = iglob(dirname)
else:
dirs = [dirname]
if has_magic(basename):
glob_in_dir = glob1
else:
glob_in_dir = glob0
for dirname in dirs:
for name in glob_in_dir(dirname, basename):
yield os.path.join(dirname, name)
# These 2 helper functions non-recursively glob inside a literal directory.
# They return a list of basenames. `glob1` accepts a pattern while `glob0`
# takes a literal basename (so it only has to check for its existence).
def glob1(dirname, pattern):
if not dirname:
dirname = os.curdir
if isinstance(pattern, _unicode) and not isinstance(dirname, unicode):
dirname = unicode(dirname, sys.getfilesystemencoding() or
sys.getdefaultencoding())
try:
names = os.listdir(dirname)
except os.error:
return []
if pattern[0] != '.':
names = filter(lambda x: x[0] != '.', names)
return fnmatch.filter(names, pattern)
def glob0(dirname, basename):
if basename == '':
# `os.path.split()` returns an empty basename for paths ending with a
# directory separator. 'q*x/' should match only directories.
if os.path.isdir(dirname):
return [basename]
else:
if os.path.lexists(os.path.join(dirname, basename)):
return [basename]
return []
magic_check = re.compile('[*?[]')
def has_magic(s):
return magic_check.search(s) is not None
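# --- Illustrative sketch (not part of the stdlib source): the three
# wildcard characters recognized by magic_check, driven through glob()
# and the lazy iglob(); the patterns are placeholders.
import glob

print glob.glob('*.py')               # all Python files in the cwd
print glob.glob('data/file?.txt')     # ? matches exactly one character
for name in glob.iglob('[abc]*'):     # names starting with a, b or c
    print name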
"""Functions that read and write gzipped files.
The user of the file doesn't have to worry about the compression,
but random access is not allowed."""
# based on Andrew Kuchling's minigzip.py distributed with the zlib module
import struct, sys, time, os
import zlib
import io
import __builtin__
__all__ = ["GzipFile","open"]
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16
READ, WRITE = 1, 2
def write32u(output, value):
# The L format writes the bit pattern correctly whether signed
# or unsigned.
output.write(struct.pack("<L", value))
def read32(input):
return struct.unpack("<I", input.read(4))[0]
def open(filename, mode="rb", compresslevel=9):
"""Shorthand for GzipFile(filename, mode, compresslevel).
The filename argument is required; mode defaults to 'rb'
and compresslevel defaults to 9.
"""
return GzipFile(filename, mode, compresslevel)
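# --- Illustrative sketch (not part of the stdlib source): a write/read
# round trip through the shorthand above; 'example.gz' is a scratch file.
import gzip

f = gzip.open('example.gz', 'wb')
f.write('hello ' * 3)
f.close()    # close() writes the CRC and size trailer
print gzip.open('example.gz', 'rb').read()   # 'hello hello hello '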
class GzipFile(io.BufferedIOBase):
"""The GzipFile class simulates most of the methods of a file object with
the exception of the readinto() and truncate() methods.
"""
myfileobj = None
max_read_chunk = 10 * 1024 * 1024 # 10Mb
def __init__(self, filename=None, mode=None,
compresslevel=9, fileobj=None, mtime=None):
"""Constructor for the GzipFile class.
At least one of fileobj and filename must be given a
non-trivial value.
The new class instance is based on fileobj, which can be a regular
file, a StringIO object, or any other object which simulates a file.
It defaults to None, in which case filename is opened to provide
a file object.
When fileobj is not None, the filename argument is only used to be
included in the gzip file header, which may include the original
filename of the uncompressed file. It defaults to the filename of
fileobj, if discernible; otherwise, it defaults to the empty string,
and in this case the original filename is not included in the header.
The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', or 'wb',
depending on whether the file will be read or written. The default
is the mode of fileobj if discernible; otherwise, the default is 'rb'.
Be aware that only the 'rb', 'ab', and 'wb' values should be used
for cross-platform portability.
The compresslevel argument is an integer from 0 to 9 controlling the
level of compression; 1 is fastest and produces the least compression,
and 9 is slowest and produces the most compression. 0 is no compression
at all. The default is 9.
The mtime argument is an optional numeric timestamp to be written
to the stream when compressing. All gzip compressed streams
are required to contain a timestamp. If omitted or None, the
current time is used. This module ignores the timestamp when
decompressing; however, some programs, such as gunzip, make use
of it. The format of the timestamp is the same as that of the
return value of time.time() and of the st_mtime member of the
object returned by os.stat().
"""
# Make sure we don't inadvertently enable universal newlines on the
# underlying file object - in read mode, this causes data corruption.
if mode:
mode = mode.replace('U', '')
# guarantee the file is opened in binary mode on platforms
# that care about that sort of thing
if mode and 'b' not in mode:
mode += 'b'
if fileobj is None:
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
if filename is None:
# Issue #13781: os.fdopen() creates a fileobj with a bogus name
# attribute. Avoid saving this in the gzip header's filename field.
filename = getattr(fileobj, 'name', '')
if not isinstance(filename, basestring) or filename == '<fdopen>':
filename = ''
if mode is None:
if hasattr(fileobj, 'mode'): mode = fileobj.mode
else: mode = 'rb'
if mode[0:1] == 'r':
self.mode = READ
# Set flag indicating start of a new member
self._new_member = True
# Buffer data read from gzip file. extrastart is offset in
# stream where buffer starts. extrasize is number of
# bytes remaining in buffer from current stream position.
self.extrabuf = ""
self.extrasize = 0
self.extrastart = 0
self.name = filename
# Starts small, scales exponentially
self.min_readsize = 100
elif mode[0:1] == 'w' or mode[0:1] == 'a':
self.mode = WRITE
self._init_write(filename)
self.compress = zlib.compressobj(compresslevel,
zlib.DEFLATED,
-zlib.MAX_WBITS,
zlib.DEF_MEM_LEVEL,
0)
else:
raise IOError, "Mode " + mode + " not supported"
self.fileobj = fileobj
self.offset = 0
self.mtime = mtime
if self.mode == WRITE:
self._write_gzip_header()
@property
def filename(self):
import warnings
warnings.warn("use the name attribute", DeprecationWarning, 2)
if self.mode == WRITE and self.name[-3:] != ".gz":
return self.name + ".gz"
return self.name
def __repr__(self):
s = repr(self.fileobj)
return '<gzip ' + s[1:-1] + ' ' + hex(id(self)) + '>'
def _check_closed(self):
"""Raises a ValueError if the underlying file object has been closed.
"""
if self.closed:
raise ValueError('I/O operation on closed file.')
def _init_write(self, filename):
self.name = filename
self.crc = zlib.crc32("") & 0xffffffffL
self.size = 0
self.writebuf = []
self.bufsize = 0
def _write_gzip_header(self):
self.fileobj.write('\037\213') # magic header
self.fileobj.write('\010') # compression method
try:
# RFC 1952 requires the FNAME field to be Latin-1. Do not
# include filenames that cannot be represented that way.
fname = os.path.basename(self.name)
if not isinstance(fname, str):
fname = fname.encode('latin-1')
if fname.endswith('.gz'):
fname = fname[:-3]
except UnicodeEncodeError:
fname = ''
flags = 0
if fname:
flags = FNAME
self.fileobj.write(chr(flags))
mtime = self.mtime
if mtime is None:
mtime = time.time()
write32u(self.fileobj, long(mtime))
self.fileobj.write('\002')
self.fileobj.write('\377')
if fname:
self.fileobj.write(fname + '\000')
def _init_read(self):
self.crc = zlib.crc32("") & 0xffffffffL
self.size = 0
def _read_gzip_header(self):
magic = self.fileobj.read(2)
if magic != '\037\213':
raise IOError, 'Not a gzipped file'
method = ord( self.fileobj.read(1) )
if method != 8:
raise IOError, 'Unknown compression method'
flag = ord( self.fileobj.read(1) )
self.mtime = read32(self.fileobj)
# extraflag = self.fileobj.read(1)
# os = self.fileobj.read(1)
self.fileobj.read(2)
if flag & FEXTRA:
# Read & discard the extra field, if present
xlen = ord(self.fileobj.read(1))
xlen = xlen + 256*ord(self.fileobj.read(1))
self.fileobj.read(xlen)
if flag & FNAME:
# Read and discard a null-terminated string containing the filename
while True:
s = self.fileobj.read(1)
if not s or s=='\000':
break
if flag & FCOMMENT:
# Read and discard a null-terminated string containing a comment
while True:
s = self.fileobj.read(1)
if not s or s=='\000':
break
if flag & FHCRC:
self.fileobj.read(2) # Read & discard the 16-bit header CRC
def write(self,data):
self._check_closed()
if self.mode != WRITE:
import errno
raise IOError(errno.EBADF, "write() on read-only GzipFile object")
if self.fileobj is None:
raise ValueError, "write() on closed GzipFile object"
# Convert data type if called by io.BufferedWriter.
if isinstance(data, memoryview):
data = data.tobytes()
if len(data) > 0:
self.fileobj.write(self.compress.compress(data))
self.size += len(data)
self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
self.offset += len(data)
return len(data)
def read(self, size=-1):
self._check_closed()
if self.mode != READ:
import errno
raise IOError(errno.EBADF, "read() on write-only GzipFile object")
if self.extrasize <= 0 and self.fileobj is None:
return ''
readsize = 1024
if size < 0: # get the whole thing
try:
while True:
self._read(readsize)
readsize = min(self.max_read_chunk, readsize * 2)
except EOFError:
size = self.extrasize
else: # just get some more of it
try:
while size > self.extrasize:
self._read(readsize)
readsize = min(self.max_read_chunk, readsize * 2)
except EOFError:
if size > self.extrasize:
size = self.extrasize
offset = self.offset - self.extrastart
chunk = self.extrabuf[offset: offset + size]
self.extrasize = self.extrasize - size
self.offset += size
return chunk
def _unread(self, buf):
self.extrasize = len(buf) + self.extrasize
self.offset -= len(buf)
def _read(self, size=1024):
if self.fileobj is None:
raise EOFError, "Reached EOF"
if self._new_member:
# If the _new_member flag is set, we have to
# jump to the next member, if there is one.
#
# First, check if we're at the end of the file;
# if so, it's time to stop; no more members to read.
pos = self.fileobj.tell() # Save current position
self.fileobj.seek(0, 2) # Seek to end of file
if pos == self.fileobj.tell():
raise EOFError, "Reached EOF"
else:
self.fileobj.seek( pos ) # Return to original position
self._init_read()
self._read_gzip_header()
self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
self._new_member = False
# Read a chunk of data from the file
buf = self.fileobj.read(size)
# If the EOF has been reached, flush the decompression object
# and mark this object as finished.
if buf == "":
uncompress = self.decompress.flush()
self._read_eof()
self._add_read_data( uncompress )
raise EOFError, 'Reached EOF'
uncompress = self.decompress.decompress(buf)
self._add_read_data( uncompress )
if self.decompress.unused_data != "":
# Ending case: we've come to the end of a member in the file,
# so seek back to the start of the unused data, finish up
# this member, and read a new gzip header.
# (The number of bytes to seek back is the length of the unused
# data, minus 8 because _read_eof() will rewind a further 8 bytes)
self.fileobj.seek( -len(self.decompress.unused_data)+8, 1)
# Check the CRC and file size, and set the flag so we read
# a new member on the next call
self._read_eof()
self._new_member = True
def _add_read_data(self, data):
self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
offset = self.offset - self.extrastart
self.extrabuf = self.extrabuf[offset:] + data
self.extrasize = self.extrasize + len(data)
self.extrastart = self.offset
self.size = self.size + len(data)
def _read_eof(self):
# We've read to the end of the file, so we have to rewind in order
# to reread the 8 bytes containing the CRC and the file size.
# We check that the computed CRC and size of the
# uncompressed data match the stored values. Note that the size
# stored is the true file size mod 2**32.
self.fileobj.seek(-8, 1)
crc32 = read32(self.fileobj)
isize = read32(self.fileobj) # may exceed 2GB
if crc32 != self.crc:
raise IOError("CRC check failed %s != %s" % (hex(crc32),
hex(self.crc)))
elif isize != (self.size & 0xffffffffL):
raise IOError, "Incorrect length of data produced"
# Gzip files can be padded with zeroes and still contain valid archives.
# Consume all zero bytes and set the file position to the first
# non-zero byte. See http://www.gzip.org/#faq8
c = "\x00"
while c == "\x00":
c = self.fileobj.read(1)
if c:
self.fileobj.seek(-1, 1)
@property
def closed(self):
return self.fileobj is None
def close(self):
fileobj = self.fileobj
if fileobj is None:
return
self.fileobj = None
try:
if self.mode == WRITE:
fileobj.write(self.compress.flush())
write32u(fileobj, self.crc)
# self.size may exceed 2GB, or even 4GB
write32u(fileobj, self.size & 0xffffffffL)
finally:
myfileobj = self.myfileobj
if myfileobj:
self.myfileobj = None
myfileobj.close()
def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH):
self._check_closed()
if self.mode == WRITE:
# Ensure the compressor's buffer is flushed
self.fileobj.write(self.compress.flush(zlib_mode))
self.fileobj.flush()
def fileno(self):
"""Invoke the underlying file object's fileno() method.
This will raise AttributeError if the underlying file object
doesn't support fileno().
"""
return self.fileobj.fileno()
def rewind(self):
'''Return the uncompressed stream file position indicator to the
beginning of the file'''
if self.mode != READ:
raise IOError("Can't rewind in write mode")
self.fileobj.seek(0)
self._new_member = True
self.extrabuf = ""
self.extrasize = 0
self.extrastart = 0
self.offset = 0
def readable(self):
return self.mode == READ
def writable(self):
return self.mode == WRITE
def seekable(self):
return True
def seek(self, offset, whence=0):
if whence:
if whence == 1:
offset = self.offset + offset
else:
raise ValueError('Seek from end not supported')
if self.mode == WRITE:
if offset < self.offset:
raise IOError('Negative seek in write mode')
count = offset - self.offset
for i in xrange(count // 1024):
self.write(1024 * '\0')
self.write((count % 1024) * '\0')
elif self.mode == READ:
if offset < self.offset:
# for negative seek, rewind and do positive seek
self.rewind()
count = offset - self.offset
for i in xrange(count // 1024):
self.read(1024)
self.read(count % 1024)
return self.offset
def readline(self, size=-1):
if size < 0:
# Shortcut common case - newline found in buffer.
offset = self.offset - self.extrastart
i = self.extrabuf.find('\n', offset) + 1
if i > 0:
self.extrasize -= i - offset
self.offset += i - offset
return self.extrabuf[offset: i]
size = sys.maxint
readsize = self.min_readsize
else:
readsize = size
bufs = []
while size != 0:
c = self.read(readsize)
i = c.find('\n')
# We set i=size to break out of the loop under two
# conditions: 1) there's no newline, and the chunk is
# larger than size, or 2) there is a newline, but the
# resulting line would be longer than 'size'.
if (size <= i) or (i == -1 and len(c) > size):
i = size - 1
if i >= 0 or c == '':
bufs.append(c[:i + 1]) # Add portion of last chunk
self._unread(c[i + 1:]) # Push back rest of chunk
break
# Append chunk to list, decrease 'size',
bufs.append(c)
size = size - len(c)
readsize = min(size, readsize * 2)
if readsize > self.min_readsize:
self.min_readsize = min(readsize, self.min_readsize * 2, 512)
return ''.join(bufs) # Return resulting line
def _test():
# Act like gzip; with -d, act like gunzip.
# The input file is not deleted, however, nor are any other gzip
# options or features supported.
args = sys.argv[1:]
decompress = args and args[0] == "-d"
if decompress:
args = args[1:]
if not args:
args = ["-"]
for arg in args:
if decompress:
if arg == "-":
f = GzipFile(filename="", mode="rb", fileobj=sys.stdin)
g = sys.stdout
else:
if arg[-3:] != ".gz":
print "filename doesn't end in .gz:", repr(arg)
continue
f = open(arg, "rb")
g = __builtin__.open(arg[:-3], "wb")
else:
if arg == "-":
f = sys.stdin
g = GzipFile(filename="", mode="wb", fileobj=sys.stdout)
else:
f = __builtin__.open(arg, "rb")
g = open(arg + ".gz", "wb")
while True:
chunk = f.read(1024)
if not chunk:
break
g.write(chunk)
if g is not sys.stdout:
g.close()
if f is not sys.stdin:
f.close()
if __name__ == '__main__':
_test()
# $Id$
#
# Copyright (C) 2005 Gregory P. Smith ([email protected])
# Licensed to PSF under a Contributor Agreement.
#
__doc__ = """hashlib module - A common interface to many hash functions.
new(name, string='') - returns a new hash object implementing the
given hash function; initializing the hash
using the given string data.
Named constructor functions are also available, these are much faster
than using new():
md5(), sha1(), sha224(), sha256(), sha384(), and sha512()
More algorithms may be available on your platform but the above are guaranteed
to exist. See the algorithms_guaranteed and algorithms_available attributes
to find out what algorithm names can be passed to new().
NOTE: If you want the adler32 or crc32 hash functions they are available in
the zlib module.
Choose your hash function wisely. Some have known collision weaknesses.
sha384 and sha512 will be slow on 32 bit platforms.
Hash objects have these methods:
- update(arg): Update the hash object with the string arg. Repeated calls
are equivalent to a single call with the concatenation of all
the arguments.
- digest(): Return the digest of the strings passed to the update() method
so far. This may contain non-ASCII characters, including
NUL bytes.
- hexdigest(): Like digest() except the digest is returned as a string of
double length, containing only hexadecimal digits.
- copy(): Return a copy (clone) of the hash object. This can be used to
efficiently compute the digests of strings that share a common
initial substring.
For example, to obtain the digest of the string 'Nobody inspects the
spammish repetition':
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
'\\xbbd\\x9c\\x83\\xdd\\x1e\\xa5\\xc9\\xd9\\xde\\xc9\\xa1\\x8d\\xf0\\xff\\xe9'
More condensed:
>>> hashlib.sha224("Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
"""
# This tuple and __get_builtin_constructor() must be modified if a new
# always available algorithm is added.
__always_supported = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
algorithms_guaranteed = set(__always_supported)
algorithms_available = set(__always_supported)
algorithms = __always_supported
__all__ = __always_supported + ('new', 'algorithms_guaranteed',
'algorithms_available', 'algorithms',
'pbkdf2_hmac')
def __get_builtin_constructor(name):
try:
if name in ('SHA1', 'sha1'):
import _sha
return _sha.new
elif name in ('MD5', 'md5'):
import _md5
return _md5.new
elif name in ('SHA256', 'sha256', 'SHA224', 'sha224'):
import _sha256
bs = name[3:]
if bs == '256':
return _sha256.sha256
elif bs == '224':
return _sha256.sha224
elif name in ('SHA512', 'sha512', 'SHA384', 'sha384'):
import _sha512
bs = name[3:]
if bs == '512':
return _sha512.sha512
elif bs == '384':
return _sha512.sha384
except ImportError:
pass # no extension module, this hash is unsupported.
raise ValueError('unsupported hash type ' + name)
def __get_openssl_constructor(name):
try:
f = getattr(_hashlib, 'openssl_' + name)
# Allow the C module to raise ValueError. The function will be
# defined but the hash not actually available thanks to OpenSSL.
f()
# Use the C function directly (very fast)
return f
except (AttributeError, ValueError):
return __get_builtin_constructor(name)
def __py_new(name, string=''):
"""new(name, string='') - Return a new hashing object using the named algorithm;
optionally initialized with a string.
"""
return __get_builtin_constructor(name)(string)
def __hash_new(name, string=''):
"""new(name, string='') - Return a new hashing object using the named algorithm;
optionally initialized with a string.
"""
try:
return _hashlib.new(name, string)
except ValueError:
# If the _hashlib module (OpenSSL) doesn't support the named
# hash, try using our builtin implementations.
# This allows for SHA224/256 and SHA384/512 support even though
# the OpenSSL library prior to 0.9.8 doesn't provide them.
return __get_builtin_constructor(name)(string)
try:
import _hashlib
new = __hash_new
__get_hash = __get_openssl_constructor
algorithms_available = algorithms_available.union(
_hashlib.openssl_md_meth_names)
except ImportError:
new = __py_new
__get_hash = __get_builtin_constructor
for __func_name in __always_supported:
# try them all, some may not work due to the OpenSSL
# version not supporting that algorithm.
try:
globals()[__func_name] = __get_hash(__func_name)
except ValueError:
import logging
logging.exception('code for hash %s was not found.', __func_name)
try:
# OpenSSL's PKCS5_PBKDF2_HMAC requires OpenSSL 1.0+ with HMAC and SHA
from _hashlib import pbkdf2_hmac
except ImportError:
import binascii
import struct
_trans_5C = b"".join(chr(x ^ 0x5C) for x in range(256))
_trans_36 = b"".join(chr(x ^ 0x36) for x in range(256))
def pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None):
"""Password based key derivation function 2 (PKCS #5 v2.0)
This pure Python implementation, based on the hmac module, is about as
fast as OpenSSL's PKCS5_PBKDF2_HMAC for short passwords and much faster
for long passwords.
"""
if not isinstance(hash_name, str):
raise TypeError(hash_name)
if not isinstance(password, (bytes, bytearray)):
password = bytes(buffer(password))
if not isinstance(salt, (bytes, bytearray)):
salt = bytes(buffer(salt))
# Fast inline HMAC implementation
inner = new(hash_name)
outer = new(hash_name)
blocksize = getattr(inner, 'block_size', 64)
if len(password) > blocksize:
password = new(hash_name, password).digest()
password = password + b'\x00' * (blocksize - len(password))
inner.update(password.translate(_trans_36))
outer.update(password.translate(_trans_5C))
def prf(msg, inner=inner, outer=outer):
# PBKDF2_HMAC uses the password as key. We can re-use the same
# digest objects and just update copies to skip initialization.
icpy = inner.copy()
ocpy = outer.copy()
icpy.update(msg)
ocpy.update(icpy.digest())
return ocpy.digest()
if iterations < 1:
raise ValueError(iterations)
if dklen is None:
dklen = outer.digest_size
if dklen < 1:
raise ValueError(dklen)
hex_format_string = "%%0%ix" % (new(hash_name).digest_size * 2)
dkey = b''
loop = 1
while len(dkey) < dklen:
prev = prf(salt + struct.pack(b'>I', loop))
rkey = int(binascii.hexlify(prev), 16)
for i in xrange(iterations - 1):
prev = prf(prev)
rkey ^= int(binascii.hexlify(prev), 16)
loop += 1
dkey += binascii.unhexlify(hex_format_string % rkey)
return dkey[:dklen]
# Cleanup locals()
del __always_supported, __func_name, __get_hash
del __py_new, __hash_new, __get_openssl_constructor
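# --- Illustrative sketch (not part of the stdlib source): deriving a
# key with pbkdf2_hmac(); the salt and iteration count here are
# placeholders, not security advice.
import binascii
import hashlib

dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
print binascii.hexlify(dk)   # 64 hex digits for a sha256 digest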
# -*- coding: latin-1 -*-
"""Heap queue algorithm (a.k.a. priority queue).
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0. For the sake of comparison,
non-existing elements are considered to be infinite. The interesting
property of a heap is that a[0] is always its smallest element.
Usage:
heap = [] # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0] # smallest item on the heap without popping it
heapify(x) # transforms list into a heap, in-place, in linear time
item = heapreplace(heap, item) # pops and returns smallest item, and adds
# new item; the heap size is unchanged
Our API differs from textbook heap algorithms as follows:
- We use 0-based indexing. This makes the relationship between the
index for a node and the indexes for its children slightly less
obvious, but is more suitable since Python uses 0-based indexing.
- Our heappop() method returns the smallest item, not the largest.
These two make it possible to view the heap as a regular Python list
without surprises: heap[0] is the smallest item, and heap.sort()
maintains the heap invariant!
"""
# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
__about__ = """Heap queues
[explanation by François Pinard]
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0. For the sake of comparison,
non-existing elements are considered to be infinite. The interesting
property of a heap is that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory
representation for a tournament. The numbers below are `k', not a[k]:
0
1 2
3 4 5 6
7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
a usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all opponents s/he had. However, in many computer applications
of such tournaments, we do not need to trace the history of a winner.
To be more memory efficient, when a winner is promoted, we try to
replace it by something else at a lower level, and the rule becomes
that a cell and the two cells it tops contain three different items,
but the top cell "wins" over the two topped cells.
If this heap invariant is protected at all time, index 0 is clearly
the overall winner. The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic on the total number of items in the tree.
By iterating over all items, you get an O(n ln n) sort.
A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted. This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time. When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap. So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
than the average case. However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all
know that a big sort implies producing "runs" (pre-sorted sequences,
whose size is usually related to the amount of CPU memory), followed
by merging passes for these runs; the merging is often very cleverly
organised[1]. It is very important that the initial sort produces the
longest runs possible. Tournaments are a good way to achieve that.
If, using all the memory available to hold a tournament, you replace
and percolate items that happen to fit the current run, you'll produce
runs which are twice the size of the memory for random input, and much
better for fuzzily ordered input.
Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases. The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate the first heap is melting. When the first heap completely
vanishes, you switch heaps and start a new run. Clever and quite
effective!
In a word, heaps are useful memory structures to know. I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)
--------------------
[1] The disk balancing algorithms which are current, nowadays, are
more annoying than clever, and this is a consequence of the seeking
capabilities of the disks. On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the
most effective possible (that is, would best contribute to
"progressing" the merge). Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, real good tape sorts were quite spectacular to watch!
Throughout the ages, sorting has always been a Great Art! :-)
"""
__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
'nlargest', 'nsmallest', 'heappushpop']
from itertools import islice, count, imap, izip, tee, chain
from operator import itemgetter
def cmp_lt(x, y):
# Use __lt__ if available; otherwise, try __le__.
# In Py3.x, only __lt__ will be called.
return (x < y) if hasattr(x, '__lt__') else (not y <= x)
def heappush(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
_siftdown(heap, 0, len(heap)-1)
def heappop(heap):
"""Pop the smallest item off the heap, maintaining the heap invariant."""
lastelt = heap.pop() # raises appropriate IndexError if heap is empty
if heap:
returnitem = heap[0]
heap[0] = lastelt
_siftup(heap, 0)
else:
returnitem = lastelt
return returnitem
def heapreplace(heap, item):
"""Pop and return the current smallest value, and add the new item.
This is more efficient than heappop() followed by heappush(), and can be
more appropriate when using a fixed-size heap. Note that the value
returned may be larger than item! That constrains reasonable uses of
this routine unless written as part of a conditional replacement:
if item > heap[0]:
item = heapreplace(heap, item)
"""
returnitem = heap[0] # raises appropriate IndexError if heap is empty
heap[0] = item
_siftup(heap, 0)
return returnitem
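# A sketch of the conditional-replacement pattern described in the docstring
# above, used here to keep the n largest items seen so far (the helper name
# is hypothetical, not part of this module's API):
def _keep_n_largest_sketch(iterable, n):
    it = iter(iterable)
    keep = list(islice(it, n))     # seed with the first n items
    heapify(keep)                  # keep[0] is the smallest item retained
    for item in it:
        if item > keep[0]:         # only displace when the newcomer wins
            heapreplace(keep, item)
    return sorted(keep, reverse=True)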
def heappushpop(heap, item):
"""Fast version of a heappush followed by a heappop."""
if heap and cmp_lt(heap[0], item):
item, heap[0] = heap[0], item
_siftup(heap, 0)
return item
def heapify(x):
"""Transform list into a heap, in-place, in O(len(x)) time."""
n = len(x)
# Transform bottom-up. The largest index there's any point to looking at
# is the largest with a child index in-range, so it must have 2*i + 1 < n,
# i.e. i < (n-1)/2. If n is even, n = 2*j and the bound is (2*j-1)/2 =
# j - 1/2, so the largest such i is j-1, which is n//2 - 1. If n is odd,
# n = 2*j+1 and the bound is (2*j+1-1)/2 = j, so the largest such i is
# again j-1 = n//2 - 1.
for i in reversed(xrange(n//2)):
_siftup(x, i)
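# An illustrative check of the bound derived above: only indices below n//2
# have children at all, and after heapify() every parent compares <= both of
# its children (the helper name is hypothetical):
def _check_heap_invariant(heap):
    n = len(heap)
    for i in xrange(n // 2):             # the only indices with children
        for child in (2*i + 1, 2*i + 2):
            if child < n:
                assert not cmp_lt(heap[child], heap[i])
    return True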
def _heappushpop_max(heap, item):
"""Maxheap version of a heappush followed by a heappop."""
if heap and cmp_lt(item, heap[0]):
item, heap[0] = heap[0], item
_siftup_max(heap, 0)
return item
def _heapify_max(x):
"""Transform list into a maxheap, in-place, in O(len(x)) time."""
n = len(x)
for i in reversed(range(n//2)):
_siftup_max(x, i)
def nlargest(n, iterable):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, reverse=True)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
heapify(result)
_heappushpop = heappushpop
for elem in it:
_heappushpop(result, elem)
result.sort(reverse=True)
return result
def nsmallest(n, iterable):
"""Find the n smallest elements in a dataset.
Equivalent to: sorted(iterable)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
_heapify_max(result)
_heappushpop = _heappushpop_max
for elem in it:
_heappushpop(result, elem)
result.sort()
return result
# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
# is the index of a leaf with a possibly out-of-order value. Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if cmp_lt(newitem, parent):
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too. We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but it turns out that's not a good idea, even though
# many books write the algorithm that way. During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early). See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom __cmp__ methods, or in array elements
# storing (priority, record) tuples. Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory. Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify Compares needed by 1000 heappops
# -------------------------- --------------------------------
# 1837 cut to 1663 14996 cut to 8680
# 1855 cut to 1659 14966 cut to 8678
# 1847 cut to 1660 15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares: heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares): list.sort() is (unsurprisingly!) more efficient
# for sorting.
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]):
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
def _siftdown_max(heap, startpos, pos):
'Maxheap variant of _siftdown'
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if cmp_lt(parent, newitem):
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
def _siftup_max(heap, pos):
'Maxheap variant of _siftup'
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the larger child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of larger child.
rightpos = childpos + 1
if rightpos < endpos and not cmp_lt(heap[rightpos], heap[childpos]):
childpos = rightpos
# Move the larger child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown_max(heap, startpos, pos)
# If available, use C implementation
try:
from _heapq import *
except ImportError:
pass
def merge(*iterables):
'''Merge multiple sorted inputs into a single sorted output.
Similar to sorted(itertools.chain(*iterables)) but returns a generator,
does not pull the data into memory all at once, and assumes that each of
the input streams is already sorted (smallest to largest).
>>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
'''
_heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
_len = len
h = []
h_append = h.append
for itnum, it in enumerate(map(iter, iterables)):
try:
next = it.next
h_append([next(), itnum, next])
except _StopIteration:
pass
heapify(h)
while _len(h) > 1:
try:
while 1:
v, itnum, next = s = h[0]
yield v
s[0] = next() # raises StopIteration when exhausted
_heapreplace(h, s) # restore heap condition
except _StopIteration:
_heappop(h) # remove empty iterator
if h:
# fast case when only a single iterator remains
v, itnum, next = h[0]
yield v
for v in next.__self__:
yield v
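# A usage sketch for merge(): lazily interleaving two files whose lines are
# already sorted, without loading either into memory (the paths and helper
# name are hypothetical):
def _merge_sorted_files_sketch(path_a, path_b):
    with open(path_a) as fa, open(path_b) as fb:
        for line in merge(fa, fb):   # pulls one line at a time from each
            yield line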
# Extend the implementations of nsmallest and nlargest to use a key= argument
_nsmallest = nsmallest
def nsmallest(n, iterable, key=None):
"""Find the n smallest elements in a dataset.
Equivalent to: sorted(iterable, key=key)[:n]
"""
# Short-cut for n==1 is to use min() when len(iterable)>0
if n == 1:
it = iter(iterable)
head = list(islice(it, 1))
if not head:
return []
if key is None:
return [min(chain(head, it))]
return [min(chain(head, it), key=key)]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key)[:n]
# When key is None, use simpler decoration
if key is None:
it = izip(iterable, count()) # decorate
result = _nsmallest(n, it)
return map(itemgetter(0), result) # undecorate
# General case, slowest method
in1, in2 = tee(iterable)
it = izip(imap(key, in1), count(), in2) # decorate
result = _nsmallest(n, it)
return map(itemgetter(2), result) # undecorate
_nlargest = nlargest
def nlargest(n, iterable, key=None):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
"""
# Short-cut for n==1 is to use max() when len(iterable)>0
if n == 1:
it = iter(iterable)
head = list(islice(it, 1))
if not head:
return []
if key is None:
return [max(chain(head, it))]
return [max(chain(head, it), key=key)]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key, reverse=True)[:n]
# When key is None, use simpler decoration
if key is None:
it = izip(iterable, count(0,-1)) # decorate
result = _nlargest(n, it)
return map(itemgetter(0), result) # undecorate
# General case, slowest method
in1, in2 = tee(iterable)
it = izip(imap(key, in1), count(0,-1), in2) # decorate
result = _nlargest(n, it)
return map(itemgetter(2), result) # undecorate
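# A usage sketch for the key= variants defined above (the record list and
# helper name are illustrative):
def _key_usage_sketch():
    people = [('alice', 33), ('bob', 25), ('carol', 41)]
    youngest = nsmallest(2, people, key=itemgetter(1))
    oldest = nlargest(1, people, key=itemgetter(1))
    return youngest, oldest
    # ([('bob', 25), ('alice', 33)], [('carol', 41)])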
if __name__ == "__main__":
# Simple sanity test
heap = []
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
for item in data:
heappush(heap, item)
sort = []
while heap:
sort.append(heappop(heap))
print sort
import doctest
doctest.testmod()
"""HMAC (Keyed-Hashing for Message Authentication) Python module.
Implements the HMAC algorithm as described by RFC 2104.
"""
import warnings as _warnings
from operator import _compare_digest as compare_digest
# Translation tables that XOR every byte value with the RFC 2104 outer
# (0x5C) and inner (0x36) pad bytes, used to derive the two key pads.
trans_5C = "".join([chr(x ^ 0x5C) for x in xrange(256)])
trans_36 = "".join([chr(x ^ 0x36) for x in xrange(256)])
# The size of the digests returned by HMAC depends on the underlying
# hashing module used. Use digest_size from the instance of HMAC instead.
digest_size = None
# A unique object passed by HMAC.copy() to the HMAC constructor, in order
# that the latter return very quickly. HMAC("") in contrast is quite
# expensive.
_secret_backdoor_key = []
class HMAC:
"""RFC 2104 HMAC class. Also complies with RFC 4231.
This supports the API for Cryptographic Hash Functions (PEP 247).
"""
blocksize = 64 # 512-bit HMAC; can be changed in subclasses.
def __init__(self, key, msg = None, digestmod = None):
"""Create a new HMAC object.
key: key for the keyed hash object.
msg: Initial input for the hash, if provided.
digestmod: A module supporting PEP 247. *OR*
A hashlib constructor returning a new hash object.
Defaults to hashlib.md5.
"""
if key is _secret_backdoor_key: # cheap
return
if digestmod is None:
import hashlib
digestmod = hashlib.md5
if hasattr(digestmod, '__call__'):
self.digest_cons = digestmod
else:
self.digest_cons = lambda d='': digestmod.new(d)
self.outer = self.digest_cons()
self.inner = self.digest_cons()
self.digest_size = self.inner.digest_size
if hasattr(self.inner, 'block_size'):
blocksize = self.inner.block_size
if blocksize < 16:
# Very low blocksize, most likely a legacy value like
# Lib/sha.py and Lib/md5.py have.
_warnings.warn('block_size of %d seems too small; using our '
'default of %d.' % (blocksize, self.blocksize),
RuntimeWarning, 2)
blocksize = self.blocksize
else:
_warnings.warn('No block_size attribute on given digest object; '
'Assuming %d.' % (self.blocksize),
RuntimeWarning, 2)
blocksize = self.blocksize
if len(key) > blocksize:
key = self.digest_cons(key).digest()
key = key + chr(0) * (blocksize - len(key))
self.outer.update(key.translate(trans_5C))
self.inner.update(key.translate(trans_36))
if msg is not None:
self.update(msg)
## def clear(self):
## raise NotImplementedError, "clear() method not available in HMAC."
def update(self, msg):
"""Update this hashing object with the string msg.
"""
self.inner.update(msg)
def copy(self):
"""Return a separate copy of this hashing object.
An update to this copy won't affect the original object.
"""
other = self.__class__(_secret_backdoor_key)
other.digest_cons = self.digest_cons
other.digest_size = self.digest_size
other.inner = self.inner.copy()
other.outer = self.outer.copy()
return other
def _current(self):
"""Return a hash object for the current state.
To be used only internally with digest() and hexdigest().
"""
h = self.outer.copy()
h.update(self.inner.digest())
return h
def digest(self):
"""Return the hash value of this hashing object.
This returns a string containing 8-bit data. The object is
not altered in any way by this function; you can continue
updating the object after calling this function.
"""
h = self._current()
return h.digest()
def hexdigest(self):
"""Like digest(), but returns a string of hexadecimal digits instead.
"""
h = self._current()
return h.hexdigest()
def new(key, msg = None, digestmod = None):
"""Create a new hashing object and return it.
key: The starting key for the hash.
msg: if available, will immediately be hashed into the object's starting
state.
You can now feed arbitrary strings into the object using its update()
method, and can ask for the hash value at any time by calling its digest()
method.
"""
return HMAC(key, msg, digestmod)
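# A usage sketch for this module: compute a tag with new() and verify it with
# the constant-time compare_digest imported above (the key, message, and
# helper name are illustrative):
def _hmac_usage_sketch():
    import hashlib
    tag = new('secret-key', 'the message', hashlib.sha256).hexdigest()
    expected = new('secret-key', 'the message', hashlib.sha256).hexdigest()
    # compare_digest avoids leaking where the first mismatching byte is.
    return compare_digest(tag, expected)   # True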
"""HTML character entity references."""
# maps the HTML entity name to the Unicode code point
name2codepoint = {
'AElig': 0x00c6, # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1
'Aacute': 0x00c1, # latin capital letter A with acute, U+00C1 ISOlat1
'Acirc': 0x00c2, # latin capital letter A with circumflex, U+00C2 ISOlat1
'Agrave': 0x00c0, # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1
'Alpha': 0x0391, # greek capital letter alpha, U+0391
'Aring': 0x00c5, # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1
'Atilde': 0x00c3, # latin capital letter A with tilde, U+00C3 ISOlat1
'Auml': 0x00c4, # latin capital letter A with diaeresis, U+00C4 ISOlat1
'Beta': 0x0392, # greek capital letter beta, U+0392
'Ccedil': 0x00c7, # latin capital letter C with cedilla, U+00C7 ISOlat1
'Chi': 0x03a7, # greek capital letter chi, U+03A7
'Dagger': 0x2021, # double dagger, U+2021 ISOpub
'Delta': 0x0394, # greek capital letter delta, U+0394 ISOgrk3
'ETH': 0x00d0, # latin capital letter ETH, U+00D0 ISOlat1
'Eacute': 0x00c9, # latin capital letter E with acute, U+00C9 ISOlat1
'Ecirc': 0x00ca, # latin capital letter E with circumflex, U+00CA ISOlat1
'Egrave': 0x00c8, # latin capital letter E with grave, U+00C8 ISOlat1
'Epsilon': 0x0395, # greek capital letter epsilon, U+0395
'Eta': 0x0397, # greek capital letter eta, U+0397
'Euml': 0x00cb, # latin capital letter E with diaeresis, U+00CB ISOlat1
'Gamma': 0x0393, # greek capital letter gamma, U+0393 ISOgrk3
'Iacute': 0x00cd, # latin capital letter I with acute, U+00CD ISOlat1
'Icirc': 0x00ce, # latin capital letter I with circumflex, U+00CE ISOlat1
'Igrave': 0x00cc, # latin capital letter I with grave, U+00CC ISOlat1
'Iota': 0x0399, # greek capital letter iota, U+0399
'Iuml': 0x00cf, # latin capital letter I with diaeresis, U+00CF ISOlat1
'Kappa': 0x039a, # greek capital letter kappa, U+039A
'Lambda': 0x039b, # greek capital letter lambda, U+039B ISOgrk3
'Mu': 0x039c, # greek capital letter mu, U+039C
'Ntilde': 0x00d1, # latin capital letter N with tilde, U+00D1 ISOlat1
'Nu': 0x039d, # greek capital letter nu, U+039D
'OElig': 0x0152, # latin capital ligature OE, U+0152 ISOlat2
'Oacute': 0x00d3, # latin capital letter O with acute, U+00D3 ISOlat1
'Ocirc': 0x00d4, # latin capital letter O with circumflex, U+00D4 ISOlat1
'Ograve': 0x00d2, # latin capital letter O with grave, U+00D2 ISOlat1
'Omega': 0x03a9, # greek capital letter omega, U+03A9 ISOgrk3
'Omicron': 0x039f, # greek capital letter omicron, U+039F
'Oslash': 0x00d8, # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1
'Otilde': 0x00d5, # latin capital letter O with tilde, U+00D5 ISOlat1
'Ouml': 0x00d6, # latin capital letter O with diaeresis, U+00D6 ISOlat1
'Phi': 0x03a6, # greek capital letter phi, U+03A6 ISOgrk3
'Pi': 0x03a0, # greek capital letter pi, U+03A0 ISOgrk3
'Prime': 0x2033, # double prime = seconds = inches, U+2033 ISOtech
'Psi': 0x03a8, # greek capital letter psi, U+03A8 ISOgrk3
'Rho': 0x03a1, # greek capital letter rho, U+03A1
'Scaron': 0x0160, # latin capital letter S with caron, U+0160 ISOlat2
'Sigma': 0x03a3, # greek capital letter sigma, U+03A3 ISOgrk3
'THORN': 0x00de, # latin capital letter THORN, U+00DE ISOlat1
'Tau': 0x03a4, # greek capital letter tau, U+03A4
'Theta': 0x0398, # greek capital letter theta, U+0398 ISOgrk3
'Uacute': 0x00da, # latin capital letter U with acute, U+00DA ISOlat1
'Ucirc': 0x00db, # latin capital letter U with circumflex, U+00DB ISOlat1
'Ugrave': 0x00d9, # latin capital letter U with grave, U+00D9 ISOlat1
'Upsilon': 0x03a5, # greek capital letter upsilon, U+03A5 ISOgrk3
'Uuml': 0x00dc, # latin capital letter U with diaeresis, U+00DC ISOlat1
'Xi': 0x039e, # greek capital letter xi, U+039E ISOgrk3
'Yacute': 0x00dd, # latin capital letter Y with acute, U+00DD ISOlat1
'Yuml': 0x0178, # latin capital letter Y with diaeresis, U+0178 ISOlat2
'Zeta': 0x0396, # greek capital letter zeta, U+0396
'aacute': 0x00e1, # latin small letter a with acute, U+00E1 ISOlat1
'acirc': 0x00e2, # latin small letter a with circumflex, U+00E2 ISOlat1
'acute': 0x00b4, # acute accent = spacing acute, U+00B4 ISOdia
'aelig': 0x00e6, # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1
'agrave': 0x00e0, # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1
'alefsym': 0x2135, # alef symbol = first transfinite cardinal, U+2135 NEW
'alpha': 0x03b1, # greek small letter alpha, U+03B1 ISOgrk3
'amp': 0x0026, # ampersand, U+0026 ISOnum
'and': 0x2227, # logical and = wedge, U+2227 ISOtech
'ang': 0x2220, # angle, U+2220 ISOamso
'aring': 0x00e5, # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1
'asymp': 0x2248, # almost equal to = asymptotic to, U+2248 ISOamsr
'atilde': 0x00e3, # latin small letter a with tilde, U+00E3 ISOlat1
'auml': 0x00e4, # latin small letter a with diaeresis, U+00E4 ISOlat1
'bdquo': 0x201e, # double low-9 quotation mark, U+201E NEW
'beta': 0x03b2, # greek small letter beta, U+03B2 ISOgrk3
'brvbar': 0x00a6, # broken bar = broken vertical bar, U+00A6 ISOnum
'bull': 0x2022, # bullet = black small circle, U+2022 ISOpub
'cap': 0x2229, # intersection = cap, U+2229 ISOtech
'ccedil': 0x00e7, # latin small letter c with cedilla, U+00E7 ISOlat1
'cedil': 0x00b8, # cedilla = spacing cedilla, U+00B8 ISOdia
'cent': 0x00a2, # cent sign, U+00A2 ISOnum
'chi': 0x03c7, # greek small letter chi, U+03C7 ISOgrk3
'circ': 0x02c6, # modifier letter circumflex accent, U+02C6 ISOpub
'clubs': 0x2663, # black club suit = shamrock, U+2663 ISOpub
'cong': 0x2245, # approximately equal to, U+2245 ISOtech
'copy': 0x00a9, # copyright sign, U+00A9 ISOnum
'crarr': 0x21b5, # downwards arrow with corner leftwards = carriage return, U+21B5 NEW
'cup': 0x222a, # union = cup, U+222A ISOtech
'curren': 0x00a4, # currency sign, U+00A4 ISOnum
'dArr': 0x21d3, # downwards double arrow, U+21D3 ISOamsa
'dagger': 0x2020, # dagger, U+2020 ISOpub
'darr': 0x2193, # downwards arrow, U+2193 ISOnum
'deg': 0x00b0, # degree sign, U+00B0 ISOnum
'delta': 0x03b4, # greek small letter delta, U+03B4 ISOgrk3
'diams': 0x2666, # black diamond suit, U+2666 ISOpub
'divide': 0x00f7, # division sign, U+00F7 ISOnum
'eacute': 0x00e9, # latin small letter e with acute, U+00E9 ISOlat1
'ecirc': 0x00ea, # latin small letter e with circumflex, U+00EA ISOlat1
'egrave': 0x00e8, # latin small letter e with grave, U+00E8 ISOlat1
'empty': 0x2205, # empty set = null set = diameter, U+2205 ISOamso
'emsp': 0x2003, # em space, U+2003 ISOpub
'ensp': 0x2002, # en space, U+2002 ISOpub
'epsilon': 0x03b5, # greek small letter epsilon, U+03B5 ISOgrk3
'equiv': 0x2261, # identical to, U+2261 ISOtech
'eta': 0x03b7, # greek small letter eta, U+03B7 ISOgrk3
'eth': 0x00f0, # latin small letter eth, U+00F0 ISOlat1
'euml': 0x00eb, # latin small letter e with diaeresis, U+00EB ISOlat1
'euro': 0x20ac, # euro sign, U+20AC NEW
'exist': 0x2203, # there exists, U+2203 ISOtech
'fnof': 0x0192, # latin small f with hook = function = florin, U+0192 ISOtech
'forall': 0x2200, # for all, U+2200 ISOtech
'frac12': 0x00bd, # vulgar fraction one half = fraction one half, U+00BD ISOnum
'frac14': 0x00bc, # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum
'frac34': 0x00be, # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum
'frasl': 0x2044, # fraction slash, U+2044 NEW
'gamma': 0x03b3, # greek small letter gamma, U+03B3 ISOgrk3
'ge': 0x2265, # greater-than or equal to, U+2265 ISOtech
'gt': 0x003e, # greater-than sign, U+003E ISOnum
'hArr': 0x21d4, # left right double arrow, U+21D4 ISOamsa
'harr': 0x2194, # left right arrow, U+2194 ISOamsa
'hearts': 0x2665, # black heart suit = valentine, U+2665 ISOpub
'hellip': 0x2026, # horizontal ellipsis = three dot leader, U+2026 ISOpub
'iacute': 0x00ed, # latin small letter i with acute, U+00ED ISOlat1
'icirc': 0x00ee, # latin small letter i with circumflex, U+00EE ISOlat1
'iexcl': 0x00a1, # inverted exclamation mark, U+00A1 ISOnum
'igrave': 0x00ec, # latin small letter i with grave, U+00EC ISOlat1
'image': 0x2111, # blackletter capital I = imaginary part, U+2111 ISOamso
'infin': 0x221e, # infinity, U+221E ISOtech
'int': 0x222b, # integral, U+222B ISOtech
'iota': 0x03b9, # greek small letter iota, U+03B9 ISOgrk3
'iquest': 0x00bf, # inverted question mark = turned question mark, U+00BF ISOnum
'isin': 0x2208, # element of, U+2208 ISOtech
'iuml': 0x00ef, # latin small letter i with diaeresis, U+00EF ISOlat1
'kappa': 0x03ba, # greek small letter kappa, U+03BA ISOgrk3
'lArr': 0x21d0, # leftwards double arrow, U+21D0 ISOtech
'lambda': 0x03bb, # greek small letter lambda, U+03BB ISOgrk3
'lang': 0x2329, # left-pointing angle bracket = bra, U+2329 ISOtech
'laquo': 0x00ab, # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
'larr': 0x2190, # leftwards arrow, U+2190 ISOnum
'lceil': 0x2308, # left ceiling = apl upstile, U+2308 ISOamsc
'ldquo': 0x201c, # left double quotation mark, U+201C ISOnum
'le': 0x2264, # less-than or equal to, U+2264 ISOtech
'lfloor': 0x230a, # left floor = apl downstile, U+230A ISOamsc
'lowast': 0x2217, # asterisk operator, U+2217 ISOtech
'loz': 0x25ca, # lozenge, U+25CA ISOpub
'lrm': 0x200e, # left-to-right mark, U+200E NEW RFC 2070
'lsaquo': 0x2039, # single left-pointing angle quotation mark, U+2039 ISO proposed
'lsquo': 0x2018, # left single quotation mark, U+2018 ISOnum
'lt': 0x003c, # less-than sign, U+003C ISOnum
'macr': 0x00af, # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
'mdash': 0x2014, # em dash, U+2014 ISOpub
'micro': 0x00b5, # micro sign, U+00B5 ISOnum
'middot': 0x00b7, # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
'minus': 0x2212, # minus sign, U+2212 ISOtech
'mu': 0x03bc, # greek small letter mu, U+03BC ISOgrk3
'nabla': 0x2207, # nabla = backward difference, U+2207 ISOtech
'nbsp': 0x00a0, # no-break space = non-breaking space, U+00A0 ISOnum
'ndash': 0x2013, # en dash, U+2013 ISOpub
'ne': 0x2260, # not equal to, U+2260 ISOtech
'ni': 0x220b, # contains as member, U+220B ISOtech
'not': 0x00ac, # not sign, U+00AC ISOnum
'notin': 0x2209, # not an element of, U+2209 ISOtech
'nsub': 0x2284, # not a subset of, U+2284 ISOamsn
'ntilde': 0x00f1, # latin small letter n with tilde, U+00F1 ISOlat1
'nu': 0x03bd, # greek small letter nu, U+03BD ISOgrk3
'oacute': 0x00f3, # latin small letter o with acute, U+00F3 ISOlat1
'ocirc': 0x00f4, # latin small letter o with circumflex, U+00F4 ISOlat1
'oelig': 0x0153, # latin small ligature oe, U+0153 ISOlat2
'ograve': 0x00f2, # latin small letter o with grave, U+00F2 ISOlat1
'oline': 0x203e, # overline = spacing overscore, U+203E NEW
'omega': 0x03c9, # greek small letter omega, U+03C9 ISOgrk3
'omicron': 0x03bf, # greek small letter omicron, U+03BF NEW
'oplus': 0x2295, # circled plus = direct sum, U+2295 ISOamsb
'or': 0x2228, # logical or = vee, U+2228 ISOtech
'ordf': 0x00aa, # feminine ordinal indicator, U+00AA ISOnum
'ordm': 0x00ba, # masculine ordinal indicator, U+00BA ISOnum
'oslash': 0x00f8, # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
'otilde': 0x00f5, # latin small letter o with tilde, U+00F5 ISOlat1
'otimes': 0x2297, # circled times = vector product, U+2297 ISOamsb
'ouml': 0x00f6, # latin small letter o with diaeresis, U+00F6 ISOlat1
'para': 0x00b6, # pilcrow sign = paragraph sign, U+00B6 ISOnum
'part': 0x2202, # partial differential, U+2202 ISOtech
'permil': 0x2030, # per mille sign, U+2030 ISOtech
'perp': 0x22a5, # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
'phi': 0x03c6, # greek small letter phi, U+03C6 ISOgrk3
'pi': 0x03c0, # greek small letter pi, U+03C0 ISOgrk3
'piv': 0x03d6, # greek pi symbol, U+03D6 ISOgrk3
'plusmn': 0x00b1, # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
'pound': 0x00a3, # pound sign, U+00A3 ISOnum
'prime': 0x2032, # prime = minutes = feet, U+2032 ISOtech
'prod': 0x220f, # n-ary product = product sign, U+220F ISOamsb
'prop': 0x221d, # proportional to, U+221D ISOtech
'psi': 0x03c8, # greek small letter psi, U+03C8 ISOgrk3
'quot': 0x0022, # quotation mark = APL quote, U+0022 ISOnum
'rArr': 0x21d2, # rightwards double arrow, U+21D2 ISOtech
'radic': 0x221a, # square root = radical sign, U+221A ISOtech
'rang': 0x232a, # right-pointing angle bracket = ket, U+232A ISOtech
'raquo': 0x00bb, # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
'rarr': 0x2192, # rightwards arrow, U+2192 ISOnum
'rceil': 0x2309, # right ceiling, U+2309 ISOamsc
'rdquo': 0x201d, # right double quotation mark, U+201D ISOnum
'real': 0x211c, # blackletter capital R = real part symbol, U+211C ISOamso
'reg': 0x00ae, # registered sign = registered trade mark sign, U+00AE ISOnum
'rfloor': 0x230b, # right floor, U+230B ISOamsc
'rho': 0x03c1, # greek small letter rho, U+03C1 ISOgrk3
'rlm': 0x200f, # right-to-left mark, U+200F NEW RFC 2070
'rsaquo': 0x203a, # single right-pointing angle quotation mark, U+203A ISO proposed
'rsquo': 0x2019, # right single quotation mark, U+2019 ISOnum
'sbquo': 0x201a, # single low-9 quotation mark, U+201A NEW
'scaron': 0x0161, # latin small letter s with caron, U+0161 ISOlat2
'sdot': 0x22c5, # dot operator, U+22C5 ISOamsb
'sect': 0x00a7, # section sign, U+00A7 ISOnum
'shy': 0x00ad, # soft hyphen = discretionary hyphen, U+00AD ISOnum
'sigma': 0x03c3, # greek small letter sigma, U+03C3 ISOgrk3
'sigmaf': 0x03c2, # greek small letter final sigma, U+03C2 ISOgrk3
'sim': 0x223c, # tilde operator = varies with = similar to, U+223C ISOtech
'spades': 0x2660, # black spade suit, U+2660 ISOpub
'sub': 0x2282, # subset of, U+2282 ISOtech
'sube': 0x2286, # subset of or equal to, U+2286 ISOtech
'sum': 0x2211, # n-ary summation, U+2211 ISOamsb
'sup': 0x2283, # superset of, U+2283 ISOtech
'sup1': 0x00b9, # superscript one = superscript digit one, U+00B9 ISOnum
'sup2': 0x00b2, # superscript two = superscript digit two = squared, U+00B2 ISOnum
'sup3': 0x00b3, # superscript three = superscript digit three = cubed, U+00B3 ISOnum
'supe': 0x2287, # superset of or equal to, U+2287 ISOtech
'szlig': 0x00df, # latin small letter sharp s = ess-zed, U+00DF ISOlat1
'tau': 0x03c4, # greek small letter tau, U+03C4 ISOgrk3
'there4': 0x2234, # therefore, U+2234 ISOtech
'theta': 0x03b8, # greek small letter theta, U+03B8 ISOgrk3
'thetasym': 0x03d1, # greek small letter theta symbol, U+03D1 NEW
'thinsp': 0x2009, # thin space, U+2009 ISOpub
'thorn': 0x00fe, # latin small letter thorn, U+00FE ISOlat1
'tilde': 0x02dc, # small tilde, U+02DC ISOdia
'times': 0x00d7, # multiplication sign, U+00D7 ISOnum
'trade': 0x2122, # trade mark sign, U+2122 ISOnum
'uArr': 0x21d1, # upwards double arrow, U+21D1 ISOamsa
'uacute': 0x00fa, # latin small letter u with acute, U+00FA ISOlat1
'uarr': 0x2191, # upwards arrow, U+2191 ISOnum
'ucirc': 0x00fb, # latin small letter u with circumflex, U+00FB ISOlat1
'ugrave': 0x00f9, # latin small letter u with grave, U+00F9 ISOlat1
'uml': 0x00a8, # diaeresis = spacing diaeresis, U+00A8 ISOdia
'upsih': 0x03d2, # greek upsilon with hook symbol, U+03D2 NEW
'upsilon': 0x03c5, # greek small letter upsilon, U+03C5 ISOgrk3
'uuml': 0x00fc, # latin small letter u with diaeresis, U+00FC ISOlat1
'weierp': 0x2118, # script capital P = power set = Weierstrass p, U+2118 ISOamso
'xi': 0x03be, # greek small letter xi, U+03BE ISOgrk3
'yacute': 0x00fd, # latin small letter y with acute, U+00FD ISOlat1
'yen': 0x00a5, # yen sign = yuan sign, U+00A5 ISOnum
'yuml': 0x00ff, # latin small letter y with diaeresis, U+00FF ISOlat1
'zeta': 0x03b6, # greek small letter zeta, U+03B6 ISOgrk3
'zwj': 0x200d, # zero width joiner, U+200D NEW RFC 2070
'zwnj': 0x200c, # zero width non-joiner, U+200C NEW RFC 2070
}
# maps the Unicode code point to the HTML entity name
codepoint2name = {}
# maps the HTML entity name to the character
# (or a character reference if the character is outside the Latin-1 range)
entitydefs = {}
for (name, codepoint) in name2codepoint.iteritems():
codepoint2name[codepoint] = name
if codepoint <= 0xff:
entitydefs[name] = chr(codepoint)
else:
entitydefs[name] = '&#%d;' % codepoint
del name, codepoint
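# A quick illustrative look at the three tables built above (the helper name
# is hypothetical; the values follow directly from the definitions):
def _entity_lookup_sketch():
    assert name2codepoint['amp'] == 0x26
    assert codepoint2name[0x26] == 'amp'
    assert entitydefs['copy'] == '\xa9'      # within Latin-1: the character
    assert entitydefs['alpha'] == '&#945;'   # outside Latin-1: a char ref
    return True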
"""HTML 2.0 parser.
See the HTML 2.0 specification:
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_toc.html
"""
from warnings import warnpy3k
warnpy3k("the htmllib module has been removed in Python 3.0",
stacklevel=2)
del warnpy3k
import sgmllib
from formatter import AS_IS
__all__ = ["HTMLParser", "HTMLParseError"]
class HTMLParseError(sgmllib.SGMLParseError):
"""Error raised when an HTML document can't be parsed."""
class HTMLParser(sgmllib.SGMLParser):
"""This is the basic HTML parser class.
It supports all entity names required by the XHTML 1.0 Recommendation.
It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2
elements.
"""
from htmlentitydefs import entitydefs
def __init__(self, formatter, verbose=0):
"""Creates an instance of the HTMLParser class.
The formatter parameter is the formatter instance associated with
the parser.
"""
sgmllib.SGMLParser.__init__(self, verbose)
self.formatter = formatter
def error(self, message):
raise HTMLParseError(message)
def reset(self):
sgmllib.SGMLParser.reset(self)
self.savedata = None
self.isindex = 0
self.title = None
self.base = None
self.anchor = None
self.anchorlist = []
self.nofill = 0
self.list_stack = []
# ------ Methods used internally; some may be overridden
# --- Formatter interface, taking care of 'savedata' mode;
# shouldn't need to be overridden
def handle_data(self, data):
if self.savedata is not None:
self.savedata = self.savedata + data
else:
if self.nofill:
self.formatter.add_literal_data(data)
else:
self.formatter.add_flowing_data(data)
# --- Hooks to save data; shouldn't need to be overridden
def save_bgn(self):
"""Begins saving character data in a buffer instead of sending it
to the formatter object.
Retrieve the stored data via the save_end() method. Use of the
save_bgn() / save_end() pair may not be nested.
"""
self.savedata = ''
def save_end(self):
"""Ends buffering character data and returns all data saved since
the preceding call to the save_bgn() method.
If the nofill flag is false, whitespace is collapsed to single
spaces. A call to this method without a preceding call to the
save_bgn() method will raise a TypeError exception.
"""
data = self.savedata
self.savedata = None
if not self.nofill:
data = ' '.join(data.split())
return data
# --- Hooks for anchors; should probably be overridden
def anchor_bgn(self, href, name, type):
"""This method is called at the start of an anchor region.
The arguments correspond to the attributes of the <A> tag with
the same names. The default implementation maintains a list of
hyperlinks (defined by the HREF attribute for <A> tags) within
the document. The list of hyperlinks is available as the data
attribute anchorlist.
"""
self.anchor = href
if self.anchor:
self.anchorlist.append(href)
def anchor_end(self):
"""This method is called at the end of an anchor region.
The default implementation adds a textual footnote marker using an
index into the list of hyperlinks created by the anchor_bgn() method.
"""
if self.anchor:
self.handle_data("[%d]" % len(self.anchorlist))
self.anchor = None
# --- Hook for images; should probably be overridden
def handle_image(self, src, alt, *args):
"""This method is called to handle images.
The default implementation simply passes the alt value to the
handle_data() method.
"""
self.handle_data(alt)
# --------- Top level elements
def start_html(self, attrs): pass
def end_html(self): pass
def start_head(self, attrs): pass
def end_head(self): pass
def start_body(self, attrs): pass
def end_body(self): pass
# ------ Head elements
def start_title(self, attrs):
self.save_bgn()
def end_title(self):
self.title = self.save_end()
def do_base(self, attrs):
for a, v in attrs:
if a == 'href':
self.base = v
def do_isindex(self, attrs):
self.isindex = 1
def do_link(self, attrs):
pass
def do_meta(self, attrs):
pass
def do_nextid(self, attrs): # Deprecated
pass
# ------ Body elements
# --- Headings
def start_h1(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h1', 0, 1, 0))
def end_h1(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
def start_h2(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h2', 0, 1, 0))
def end_h2(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
def start_h3(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h3', 0, 1, 0))
def end_h3(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
def start_h4(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h4', 0, 1, 0))
def end_h4(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
def start_h5(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h5', 0, 1, 0))
def end_h5(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
def start_h6(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font(('h6', 0, 1, 0))
def end_h6(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
# --- Block Structuring Elements
def do_p(self, attrs):
self.formatter.end_paragraph(1)
def start_pre(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_font((AS_IS, AS_IS, AS_IS, 1))
self.nofill = self.nofill + 1
def end_pre(self):
self.formatter.end_paragraph(1)
self.formatter.pop_font()
self.nofill = max(0, self.nofill - 1)
def start_xmp(self, attrs):
self.start_pre(attrs)
self.setliteral('xmp') # Tell SGML parser
def end_xmp(self):
self.end_pre()
def start_listing(self, attrs):
self.start_pre(attrs)
self.setliteral('listing') # Tell SGML parser
def end_listing(self):
self.end_pre()
def start_address(self, attrs):
self.formatter.end_paragraph(0)
self.formatter.push_font((AS_IS, 1, AS_IS, AS_IS))
def end_address(self):
self.formatter.end_paragraph(0)
self.formatter.pop_font()
def start_blockquote(self, attrs):
self.formatter.end_paragraph(1)
self.formatter.push_margin('blockquote')
def end_blockquote(self):
self.formatter.end_paragraph(1)
self.formatter.pop_margin()
# --- List Elements
def start_ul(self, attrs):
self.formatter.end_paragraph(not self.list_stack)
self.formatter.push_margin('ul')
self.list_stack.append(['ul', '*', 0])
def end_ul(self):
if self.list_stack: del self.list_stack[-1]
self.formatter.end_paragraph(not self.list_stack)
self.formatter.pop_margin()
def do_li(self, attrs):
self.formatter.end_paragraph(0)
if self.list_stack:
[dummy, label, counter] = top = self.list_stack[-1]
top[2] = counter = counter+1
else:
label, counter = '*', 0
self.formatter.add_label_data(label, counter)
def start_ol(self, attrs):
self.formatter.end_paragraph(not self.list_stack)
self.formatter.push_margin('ol')
label = '1.'
for a, v in attrs:
if a == 'type':
if len(v) == 1: v = v + '.'
label = v
self.list_stack.append(['ol', label, 0])
def end_ol(self):
if self.list_stack: del self.list_stack[-1]
self.formatter.end_paragraph(not self.list_stack)
self.formatter.pop_margin()
def start_menu(self, attrs):
self.start_ul(attrs)
def end_menu(self):
self.end_ul()
def start_dir(self, attrs):
self.start_ul(attrs)
def end_dir(self):
self.end_ul()
def start_dl(self, attrs):
self.formatter.end_paragraph(1)
self.list_stack.append(['dl', '', 0])
def end_dl(self):
self.ddpop(1)
if self.list_stack: del self.list_stack[-1]
def do_dt(self, attrs):
self.ddpop()
def do_dd(self, attrs):
self.ddpop()
self.formatter.push_margin('dd')
self.list_stack.append(['dd', '', 0])
def ddpop(self, bl=0):
self.formatter.end_paragraph(bl)
if self.list_stack:
if self.list_stack[-1][0] == 'dd':
del self.list_stack[-1]
self.formatter.pop_margin()
# --- Phrase Markup
# Idiomatic Elements
def start_cite(self, attrs): self.start_i(attrs)
def end_cite(self): self.end_i()
def start_code(self, attrs): self.start_tt(attrs)
def end_code(self): self.end_tt()
def start_em(self, attrs): self.start_i(attrs)
def end_em(self): self.end_i()
def start_kbd(self, attrs): self.start_tt(attrs)
def end_kbd(self): self.end_tt()
def start_samp(self, attrs): self.start_tt(attrs)
def end_samp(self): self.end_tt()
def start_strong(self, attrs): self.start_b(attrs)
def end_strong(self): self.end_b()
def start_var(self, attrs): self.start_i(attrs)
def end_var(self): self.end_i()
# Typographic Elements
def start_i(self, attrs):
self.formatter.push_font((AS_IS, 1, AS_IS, AS_IS))
def end_i(self):
self.formatter.pop_font()
def start_b(self, attrs):
self.formatter.push_font((AS_IS, AS_IS, 1, AS_IS))
def end_b(self):
self.formatter.pop_font()
def start_tt(self, attrs):
self.formatter.push_font((AS_IS, AS_IS, AS_IS, 1))
def end_tt(self):
self.formatter.pop_font()
def start_a(self, attrs):
href = ''
name = ''
type = ''
for attrname, value in attrs:
value = value.strip()
if attrname == 'href':
href = value
if attrname == 'name':
name = value
if attrname == 'type':
type = value.lower()
self.anchor_bgn(href, name, type)
def end_a(self):
self.anchor_end()
# --- Line Break
def do_br(self, attrs):
self.formatter.add_line_break()
# --- Horizontal Rule
def do_hr(self, attrs):
self.formatter.add_hor_rule()
# --- Image
def do_img(self, attrs):
align = ''
alt = '(image)'
ismap = ''
src = ''
width = 0
height = 0
for attrname, value in attrs:
if attrname == 'align':
align = value
if attrname == 'alt':
alt = value
if attrname == 'ismap':
ismap = value
if attrname == 'src':
src = value
if attrname == 'width':
try: width = int(value)
except ValueError: pass
if attrname == 'height':
try: height = int(value)
except ValueError: pass
self.handle_image(src, alt, ismap, align, width, height)
# --- Really Old Unofficial Deprecated Stuff
def do_plaintext(self, attrs):
self.start_pre(attrs)
self.setnomoretags() # Tell SGML parser
# --- Unhandled tags
def unknown_starttag(self, tag, attrs):
pass
def unknown_endtag(self, tag):
pass
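# A minimal usage sketch for this parser: feed a small document through a
# NullFormatter and inspect the collected title and hyperlink list (the
# helper name and sample HTML are illustrative):
def _usage_sketch():
    import formatter
    p = HTMLParser(formatter.NullFormatter())
    p.feed('<title>Hi</title><a href="http://example.com/">link</a>')
    p.close()
    return p.title, p.anchorlist   # ('Hi', ['http://example.com/'])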
def test(args = None):
import sys, formatter
if not args:
args = sys.argv[1:]
silent = args and args[0] == '-s'
if silent:
del args[0]
if args:
file = args[0]
else:
file = 'test.html'
if file == '-':
f = sys.stdin
else:
try:
f = open(file, 'r')
except IOError, msg:
print file, ":", msg
sys.exit(1)
data = f.read()
if f is not sys.stdin:
f.close()
if silent:
f = formatter.NullFormatter()
else:
f = formatter.AbstractFormatter(formatter.DumbWriter())
p = HTMLParser(f)
p.feed(data)
p.close()
if __name__ == '__main__':
test()
"""A parser for HTML and XHTML."""
# This file is based on sgmllib.py, but the API is slightly different.
# XXX There should be a way to distinguish between PCDATA (parsed
# character data -- the normal case), RCDATA (replaceable character
# data -- only char and entity references and end tags are special)
# and CDATA (character data -- only end tags are special).
import markupbase
import re
# Regular expressions used for parsing
interesting_normal = re.compile('[&<]')
incomplete = re.compile('&[a-zA-Z#]')
entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')
charref = re.compile('&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]')
starttagopen = re.compile('<[a-zA-Z]')
piclose = re.compile('>')
commentclose = re.compile(r'--\s*>')
# see http://www.w3.org/TR/html5/tokenization.html#tag-open-state
# and http://www.w3.org/TR/html5/tokenization.html#tag-name-state
# note: if you change tagfind/attrfind remember to update locatestarttagend too
tagfind = re.compile('([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*')
# this regex is currently unused, but left for backward compatibility
tagfind_tolerant = re.compile('[a-zA-Z][^\t\n\r\f />\x00]*')
attrfind = re.compile(
r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*')
locatestarttagend = re.compile(r"""
<[a-zA-Z][^\t\n\r\f />\x00]* # tag name
(?:[\s/]* # optional whitespace before attribute name
(?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name
(?:\s*=+\s* # value indicator
(?:'[^']*' # LITA-enclosed value
|"[^"]*" # LIT-enclosed value
|(?!['"])[^>\s]* # bare value
)
)?(?:\s|/(?!>))*
)*
)?
\s* # trailing whitespace
""", re.VERBOSE)
endendtag = re.compile('>')
# the HTML 5 spec, section 8.1.2.2, doesn't allow spaces between
# </ and the tag name, so maybe this should be fixed
endtagfind = re.compile('</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>')
class HTMLParseError(Exception):
"""Exception raised for all parse errors."""
def __init__(self, msg, position=(None, None)):
assert msg
self.msg = msg
self.lineno = position[0]
self.offset = position[1]
def __str__(self):
result = self.msg
if self.lineno is not None:
result = result + ", at line %d" % self.lineno
if self.offset is not None:
result = result + ", column %d" % (self.offset + 1)
return result
class HTMLParser(markupbase.ParserBase):
"""Find tags and other markup and call handler functions.
Usage:
p = HTMLParser()
p.feed(data)
...
p.close()
Start tags are handled by calling self.handle_starttag() or
self.handle_startendtag(); end tags by self.handle_endtag(). The
data between tags is passed from the parser to the derived class
by calling self.handle_data() with the data as argument (the data
may be split up in arbitrary chunks). Entity references are
passed by calling self.handle_entityref() with the entity
reference as the argument. Numeric character references are
passed to self.handle_charref() with the string containing the
reference as the argument.
"""
CDATA_CONTENT_ELEMENTS = ("script", "style")
def __init__(self):
"""Initialize and reset this instance."""
self.reset()
def reset(self):
"""Reset this instance. Loses all unprocessed data."""
self.rawdata = ''
self.lasttag = '???'
self.interesting = interesting_normal
self.cdata_elem = None
markupbase.ParserBase.reset(self)
def feed(self, data):
r"""Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
"""
self.rawdata = self.rawdata + data
self.goahead(0)
def close(self):
"""Handle any buffered data."""
self.goahead(1)
def error(self, message):
raise HTMLParseError(message, self.getpos())
__starttag_text = None
def get_starttag_text(self):
"""Return full source of start tag: '<...>'."""
return self.__starttag_text
def set_cdata_mode(self, elem):
self.cdata_elem = elem.lower()
self.interesting = re.compile(r'</\s*%s\s*>' % self.cdata_elem, re.I)
def clear_cdata_mode(self):
self.interesting = interesting_normal
self.cdata_elem = None
# Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
def goahead(self, end):
rawdata = self.rawdata
i = 0
n = len(rawdata)
while i < n:
match = self.interesting.search(rawdata, i) # < or &
if match:
j = match.start()
else:
if self.cdata_elem:
break
j = n
if i < j: self.handle_data(rawdata[i:j])
i = self.updatepos(i, j)
if i == n: break
startswith = rawdata.startswith
if startswith('<', i):
if starttagopen.match(rawdata, i): # < + letter
k = self.parse_starttag(i)
elif startswith("</", i):
k = self.parse_endtag(i)
elif startswith("<!--", i):
k = self.parse_comment(i)
elif startswith("<?", i):
k = self.parse_pi(i)
elif startswith("<!", i):
k = self.parse_html_declaration(i)
elif (i + 1) < n:
self.handle_data("<")
k = i + 1
else:
break
if k < 0:
if not end:
break
k = rawdata.find('>', i + 1)
if k < 0:
k = rawdata.find('<', i + 1)
if k < 0:
k = i + 1
else:
k += 1
self.handle_data(rawdata[i:k])
i = self.updatepos(i, k)
elif startswith("&#", i):
match = charref.match(rawdata, i)
if match:
name = match.group()[2:-1]
self.handle_charref(name)
k = match.end()
if not startswith(';', k-1):
k = k - 1
i = self.updatepos(i, k)
continue
else:
if ";" in rawdata[i:]: # bail by consuming '&#'
self.handle_data(rawdata[i:i+2])
i = self.updatepos(i, i+2)
break
elif startswith('&', i):
match = entityref.match(rawdata, i)
if match:
name = match.group(1)
self.handle_entityref(name)
k = match.end()
if not startswith(';', k-1):
k = k - 1
i = self.updatepos(i, k)
continue
match = incomplete.match(rawdata, i)
if match:
# match.group() will contain at least 2 chars
if end and match.group() == rawdata[i:]:
self.error("EOF in middle of entity or char ref")
# incomplete
break
elif (i + 1) < n:
# not the end of the buffer, and can't be confused
# with some other construct
self.handle_data("&")
i = self.updatepos(i, i + 1)
else:
break
else:
assert 0, "interesting.search() lied"
# end while
if end and i < n and not self.cdata_elem:
self.handle_data(rawdata[i:n])
i = self.updatepos(i, n)
self.rawdata = rawdata[i:]
# Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
def parse_html_declaration(self, i):
rawdata = self.rawdata
if rawdata[i:i+2] != '<!':
self.error('unexpected call to parse_html_declaration()')
if rawdata[i:i+4] == '<!--':
# this case is actually already handled in goahead()
return self.parse_comment(i)
elif rawdata[i:i+3] == '<![':
return self.parse_marked_section(i)
elif rawdata[i:i+9].lower() == '<!doctype':
# find the closing >
gtpos = rawdata.find('>', i+9)
if gtpos == -1:
return -1
self.handle_decl(rawdata[i+2:gtpos])
return gtpos+1
else:
return self.parse_bogus_comment(i)
# Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
def parse_bogus_comment(self, i, report=1):
rawdata = self.rawdata
if rawdata[i:i+2] not in ('<!', '</'):
self.error('unexpected call to parse_comment()')
pos = rawdata.find('>', i+2)
if pos == -1:
return -1
if report:
self.handle_comment(rawdata[i+2:pos])
return pos + 1
# Internal -- parse processing instr, return end or -1 if not terminated
def parse_pi(self, i):
rawdata = self.rawdata
assert rawdata[i:i+2] == '<?', 'unexpected call to parse_pi()'
match = piclose.search(rawdata, i+2) # >
if not match:
return -1
j = match.start()
self.handle_pi(rawdata[i+2: j])
j = match.end()
return j
# Internal -- handle starttag, return end or -1 if not terminated
def parse_starttag(self, i):
self.__starttag_text = None
endpos = self.check_for_whole_start_tag(i)
if endpos < 0:
return endpos
rawdata = self.rawdata
self.__starttag_text = rawdata[i:endpos]
# Now parse the data between i+1 and j into a tag and attrs
attrs = []
match = tagfind.match(rawdata, i+1)
assert match, 'unexpected call to parse_starttag()'
k = match.end()
self.lasttag = tag = match.group(1).lower()
while k < endpos:
m = attrfind.match(rawdata, k)
if not m:
break
attrname, rest, attrvalue = m.group(1, 2, 3)
if not rest:
attrvalue = None
elif attrvalue[:1] == '\'' == attrvalue[-1:] or \
attrvalue[:1] == '"' == attrvalue[-1:]:
attrvalue = attrvalue[1:-1]
if attrvalue:
attrvalue = self.unescape(attrvalue)
attrs.append((attrname.lower(), attrvalue))
k = m.end()
end = rawdata[k:endpos].strip()
if end not in (">", "/>"):
lineno, offset = self.getpos()
if "\n" in self.__starttag_text:
lineno = lineno + self.__starttag_text.count("\n")
offset = len(self.__starttag_text) \
- self.__starttag_text.rfind("\n")
else:
offset = offset + len(self.__starttag_text)
self.handle_data(rawdata[i:endpos])
return endpos
if end.endswith('/>'):
# XHTML-style empty tag: <span attr="value" />
self.handle_startendtag(tag, attrs)
else:
self.handle_starttag(tag, attrs)
if tag in self.CDATA_CONTENT_ELEMENTS:
self.set_cdata_mode(tag)
return endpos
# Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
def check_for_whole_start_tag(self, i):
rawdata = self.rawdata
m = locatestarttagend.match(rawdata, i)
if m:
j = m.end()
next = rawdata[j:j+1]
if next == ">":
return j + 1
if next == "/":
if rawdata.startswith("/>", j):
return j + 2
if rawdata.startswith("/", j):
# buffer boundary
return -1
# else bogus input
self.updatepos(i, j + 1)
self.error("malformed empty start tag")
if next == "":
# end of input
return -1
if next in ("abcdefghijklmnopqrstuvwxyz=/"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
# end of input in or before attribute value, or we have the
# '/' from a '/>' ending
return -1
if j > i:
return j
else:
return i + 1
raise AssertionError("we should not get here!")
# Internal -- parse endtag, return end or -1 if incomplete
def parse_endtag(self, i):
rawdata = self.rawdata
assert rawdata[i:i+2] == "</", "unexpected call to parse_endtag"
match = endendtag.search(rawdata, i+1) # >
if not match:
return -1
gtpos = match.end()
match = endtagfind.match(rawdata, i) # </ + tag + >
if not match:
if self.cdata_elem is not None:
self.handle_data(rawdata[i:gtpos])
return gtpos
# find the name: w3.org/TR/html5/tokenization.html#tag-name-state
namematch = tagfind.match(rawdata, i+2)
if not namematch:
# w3.org/TR/html5/tokenization.html#end-tag-open-state
if rawdata[i:i+3] == '</>':
return i+3
else:
return self.parse_bogus_comment(i)
tagname = namematch.group(1).lower()
# consume and ignore other stuff between the name and the >
# Note: this is not 100% correct, since we might have things like
# </tag attr=">">, but looking for > after the name should cover
# most of the cases and is much simpler
gtpos = rawdata.find('>', namematch.end())
self.handle_endtag(tagname)
return gtpos+1
elem = match.group(1).lower() # script or style
if self.cdata_elem is not None:
if elem != self.cdata_elem:
self.handle_data(rawdata[i:gtpos])
return gtpos
self.handle_endtag(elem)
self.clear_cdata_mode()
return gtpos
# Overridable -- finish processing of start+end tag: <tag.../>
def handle_startendtag(self, tag, attrs):
self.handle_starttag(tag, attrs)
self.handle_endtag(tag)
# Overridable -- handle start tag
def handle_starttag(self, tag, attrs):
pass
# Overridable -- handle end tag
def handle_endtag(self, tag):
pass
# Overridable -- handle character reference
def handle_charref(self, name):
pass
# Overridable -- handle entity reference
def handle_entityref(self, name):
pass
# Overridable -- handle data
def handle_data(self, data):
pass
# Overridable -- handle comment
def handle_comment(self, data):
pass
# Overridable -- handle declaration
def handle_decl(self, decl):
pass
# Overridable -- handle processing instruction
def handle_pi(self, data):
pass
def unknown_decl(self, data):
pass
# Internal -- helper to remove special character quoting
entitydefs = None
def unescape(self, s):
if '&' not in s:
return s
def replaceEntities(s):
s = s.groups()[0]
try:
if s[0] == "#":
s = s[1:]
if s[0] in ['x','X']:
c = int(s[1:], 16)
else:
c = int(s)
return unichr(c)
except ValueError:
return '&#'+s+';'
else:
# Cannot use name2codepoint directly, because HTMLParser supports apos,
# which is not part of HTML 4
if HTMLParser.entitydefs is None:
import htmlentitydefs
entitydefs = {'apos':u"'"}
for k, v in htmlentitydefs.name2codepoint.iteritems():
entitydefs[k] = unichr(v)
HTMLParser.entitydefs = entitydefs
try:
return self.entitydefs[s]
except KeyError:
return '&'+s+';'
return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
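# A minimal subclass sketch of the handler API described in the HTMLParser
# class docstring (the class name and sample markup are illustrative):
class _LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)
# e.g.:
#   p = _LinkCollector()
#   p.feed('<a href="/one">1</a><a href="/two">2</a>')
#   p.close()
#   p.links  ->  ['/one', '/two']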
r"""HTTP/1.1 client library
<intro stuff goes here>
<other stuff, too>
HTTPConnection goes through a number of "states", which define when a client
may legally make another request or fetch the response for a particular
request. This diagram details these state transitions:
(null)
  |
  | HTTPConnection()
  v
Idle
  |
  | putrequest()
  v
Request-started
  |
  | ( putheader() )*  endheaders()
  v
Request-sent
  |
  | response = getresponse()
  v
Unread-response   [Response-headers-read]
  |\____________________
  |                     |
  | response.read()     | putrequest()
  v                     v
Idle                  Req-started-unread-response
                 ______/|
               /        |
response.read()|        | ( putheader() )*  endheaders()
               v        v
   Request-started    Req-sent-unread-response
                        |
                        | response.read()
                        v
                      Request-sent
This diagram presents the following rules:
-- a second request may not be started until {response-headers-read}
-- a response [object] cannot be retrieved until {request-sent}
-- there is no differentiation between an unread response body and a
partially read response body
Note: this enforcement is applied by the HTTPConnection class. The
HTTPResponse class does not enforce this state machine, which
implies sophisticated clients may accelerate the request/response
pipeline. Caution should be taken, though: accelerating the states
beyond the above pattern may imply knowledge of the server's
connection-close behavior for certain requests. For example, it
is impossible to tell whether the server will close the connection
UNTIL the response headers have been read; this means that further
requests cannot be placed into the pipeline until it is known that
the server will NOT be closing the connection.
Logical State                __state            __response
-------------                -------            ----------
Idle                         _CS_IDLE           None
Request-started              _CS_REQ_STARTED    None
Request-sent                 _CS_REQ_SENT       None
Unread-response              _CS_IDLE           <response_class>
Req-started-unread-response  _CS_REQ_STARTED    <response_class>
Req-sent-unread-response     _CS_REQ_SENT       <response_class>
"""
from array import array
import os
import re
import socket
from sys import py3kwarning
from urlparse import urlsplit
import warnings
with warnings.catch_warnings():
if py3kwarning:
warnings.filterwarnings("ignore", ".*mimetools has been removed",
DeprecationWarning)
import mimetools
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
__all__ = ["HTTP", "HTTPResponse", "HTTPConnection",
"HTTPException", "NotConnected", "UnknownProtocol",
"UnknownTransferEncoding", "UnimplementedFileMode",
"IncompleteRead", "InvalidURL", "ImproperConnectionState",
"CannotSendRequest", "CannotSendHeader", "ResponseNotReady",
"BadStatusLine", "error", "responses"]
HTTP_PORT = 80
HTTPS_PORT = 443
_UNKNOWN = 'UNKNOWN'
# connection states
_CS_IDLE = 'Idle'
_CS_REQ_STARTED = 'Request-started'
_CS_REQ_SENT = 'Request-sent'
# status codes
# informational
CONTINUE = 100
SWITCHING_PROTOCOLS = 101
PROCESSING = 102
# successful
OK = 200
CREATED = 201
ACCEPTED = 202
NON_AUTHORITATIVE_INFORMATION = 203
NO_CONTENT = 204
RESET_CONTENT = 205
PARTIAL_CONTENT = 206
MULTI_STATUS = 207
IM_USED = 226
# redirection
MULTIPLE_CHOICES = 300
MOVED_PERMANENTLY = 301
FOUND = 302
SEE_OTHER = 303
NOT_MODIFIED = 304
USE_PROXY = 305
TEMPORARY_REDIRECT = 307
# client error
BAD_REQUEST = 400
UNAUTHORIZED = 401
PAYMENT_REQUIRED = 402
FORBIDDEN = 403
NOT_FOUND = 404
METHOD_NOT_ALLOWED = 405
NOT_ACCEPTABLE = 406
PROXY_AUTHENTICATION_REQUIRED = 407
REQUEST_TIMEOUT = 408
CONFLICT = 409
GONE = 410
LENGTH_REQUIRED = 411
PRECONDITION_FAILED = 412
REQUEST_ENTITY_TOO_LARGE = 413
REQUEST_URI_TOO_LONG = 414
UNSUPPORTED_MEDIA_TYPE = 415
REQUESTED_RANGE_NOT_SATISFIABLE = 416
EXPECTATION_FAILED = 417
UNPROCESSABLE_ENTITY = 422
LOCKED = 423
FAILED_DEPENDENCY = 424
UPGRADE_REQUIRED = 426
# server error
INTERNAL_SERVER_ERROR = 500
NOT_IMPLEMENTED = 501
BAD_GATEWAY = 502
SERVICE_UNAVAILABLE = 503
GATEWAY_TIMEOUT = 504
HTTP_VERSION_NOT_SUPPORTED = 505
INSUFFICIENT_STORAGE = 507
NOT_EXTENDED = 510
# Mapping status codes to their official reason phrases (RFC 2616)
responses = {
100: 'Continue',
101: 'Switching Protocols',
200: 'OK',
201: 'Created',
202: 'Accepted',
203: 'Non-Authoritative Information',
204: 'No Content',
205: 'Reset Content',
206: 'Partial Content',
300: 'Multiple Choices',
301: 'Moved Permanently',
302: 'Found',
303: 'See Other',
304: 'Not Modified',
305: 'Use Proxy',
306: '(Unused)',
307: 'Temporary Redirect',
400: 'Bad Request',
401: 'Unauthorized',
402: 'Payment Required',
403: 'Forbidden',
404: 'Not Found',
405: 'Method Not Allowed',
406: 'Not Acceptable',
407: 'Proxy Authentication Required',
408: 'Request Timeout',
409: 'Conflict',
410: 'Gone',
411: 'Length Required',
412: 'Precondition Failed',
413: 'Request Entity Too Large',
414: 'Request-URI Too Long',
415: 'Unsupported Media Type',
416: 'Requested Range Not Satisfiable',
417: 'Expectation Failed',
500: 'Internal Server Error',
501: 'Not Implemented',
502: 'Bad Gateway',
503: 'Service Unavailable',
504: 'Gateway Timeout',
505: 'HTTP Version Not Supported',
}
# maximal amount of data to read at one time in _safe_read
MAXAMOUNT = 1048576
# maximal line length when calling readline().
_MAXLINE = 65536
# maximum amount of headers accepted
_MAXHEADERS = 100
# Header name/value ABNF (http://tools.ietf.org/html/rfc7230#section-3.2)
#
# VCHAR = %x21-7E
# obs-text = %x80-FF
# header-field = field-name ":" OWS field-value OWS
# field-name = token
# field-value = *( field-content / obs-fold )
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
# field-vchar = VCHAR / obs-text
#
# obs-fold = CRLF 1*( SP / HTAB )
# ; obsolete line folding
# ; see Section 3.2.4
# token = 1*tchar
#
# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
# / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
# / DIGIT / ALPHA
# ; any VCHAR, except delimiters
#
# VCHAR defined in http://tools.ietf.org/html/rfc5234#appendix-B.1
# the patterns for both name and value are more lenient than RFC
# definitions to allow for backwards compatibility
_is_legal_header_name = re.compile(r'\A[^:\s][^:\r\n]*\Z').match
_is_illegal_header_value = re.compile(r'\n(?![ \t])|\r(?![ \t\n])').search
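# Illustrative checks (a sketch of how the two patterns behave):
#
#   bool(_is_legal_header_name('X-Custom'))         # True
#   bool(_is_legal_header_name('Bad:Name'))         # False (embedded colon)
#   bool(_is_illegal_header_value('ok value'))      # False (value is legal)
#   bool(_is_illegal_header_value('a\r\nEvil: 1'))  # True (bare CRLF)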
# These characters are not allowed within HTTP URL paths.
# See https://tools.ietf.org/html/rfc3986#section-3.3 and the
# https://tools.ietf.org/html/rfc3986#appendix-A pchar definition.
# Prevents CVE-2019-9740. Includes control characters such as \r\n.
# Restrict non-ASCII characters above \x7f (0x80-0xff).
_contains_disallowed_url_pchar_re = re.compile('[\x00-\x20\x7f-\xff]')
# Arguably only these _should_ be allowed:
# _is_allowed_url_pchars_re = re.compile(r"^[/!$&'()*+,;=:@%a-zA-Z0-9._~-]+$")
# We are more lenient for assumed real world compatibility purposes.
# We always set the Content-Length header for these methods because some
# servers will otherwise respond with a 411
_METHODS_EXPECTING_BODY = {'PATCH', 'POST', 'PUT'}
class HTTPMessage(mimetools.Message):
def addheader(self, key, value):
"""Add header for field key handling repeats."""
prev = self.dict.get(key)
if prev is None:
self.dict[key] = value
else:
combined = ", ".join((prev, value))
self.dict[key] = combined
def addcontinue(self, key, more):
"""Add more field data from a continuation line."""
prev = self.dict[key]
self.dict[key] = prev + "\n " + more
def readheaders(self):
"""Read header lines.
Read header lines up to the entirely blank line that terminates them.
The (normally blank) line that ends the headers is skipped, but not
included in the returned list. If an invalid line is found in the
header section, it is skipped, and further lines are processed.
The variable self.status is set to the empty string if all went well,
otherwise it is an error message. The variable self.headers is a
completely uninterpreted list of lines contained in the header (so
printing them will reproduce the header exactly as it appears in the
file).
If multiple header fields with the same name occur, they are combined
according to the rules in RFC 2616 sec 4.2:
Appending each subsequent field-value to the first, each separated
by a comma. The order in which header fields with the same field-name
are received is significant to the interpretation of the combined
field value.
"""
# XXX The implementation overrides the readheaders() method of
# rfc822.Message. The base class design isn't amenable to
# customized behavior here so the method here is a copy of the
# base class code with a few small changes.
self.dict = {}
self.unixfrom = ''
self.headers = hlist = []
self.status = ''
headerseen = ""
firstline = 1
tell = None
if not hasattr(self.fp, 'unread') and self.seekable:
tell = self.fp.tell
while True:
if len(hlist) > _MAXHEADERS:
raise HTTPException("got more than %d headers" % _MAXHEADERS)
if tell:
try:
tell()
except IOError:
tell = None
self.seekable = 0
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if not line:
self.status = 'EOF in headers'
break
# Skip unix From name time lines
if firstline and line.startswith('From '):
self.unixfrom = self.unixfrom + line
continue
firstline = 0
if headerseen and line[0] in ' \t':
# XXX Not sure if continuation lines are handled properly
# for http and/or for repeating headers
# It's a continuation line.
hlist.append(line)
self.addcontinue(headerseen, line.strip())
continue
elif self.iscomment(line):
# It's a comment. Ignore it.
continue
elif self.islast(line):
# Note! No pushback here! The delimiter line gets eaten.
break
headerseen = self.isheader(line)
if headerseen:
# It's a legal header line, save it.
hlist.append(line)
self.addheader(headerseen, line[len(headerseen)+1:].strip())
elif headerseen is not None:
# An empty header name. These aren't allowed in HTTP, but it's
# probably a benign mistake. Don't add the header, just keep
# going.
pass
else:
# It's not a header line; skip it and try the next line.
self.status = 'Non-header line where header expected'
class HTTPResponse:
# strict: If true, raise BadStatusLine if the status line can't be
# parsed as a valid HTTP/1.0 or 1.1 status line. By default it is
# false because it prevents clients from talking to HTTP/0.9
# servers. Note that a response with a sufficiently corrupted
# status line will look like an HTTP/0.9 response.
# See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
def __init__(self, sock, debuglevel=0, strict=0, method=None, buffering=False):
if buffering:
# The caller won't be using any sock.recv() calls, so buffering
# is fine and recommended for performance.
self.fp = sock.makefile('rb')
else:
# The buffer size is specified as zero, because the headers of
# the response are read with readline(). If the reads were
# buffered the readline() calls could consume some of the
            # response, which may be read via a recv() on the underlying
# socket.
self.fp = sock.makefile('rb', 0)
self.debuglevel = debuglevel
self.strict = strict
self._method = method
self.msg = None
# from the Status-Line of the response
self.version = _UNKNOWN # HTTP-Version
self.status = _UNKNOWN # Status-Code
self.reason = _UNKNOWN # Reason-Phrase
self.chunked = _UNKNOWN # is "chunked" being used?
self.chunk_left = _UNKNOWN # bytes left to read in current chunk
self.length = _UNKNOWN # number of bytes left in response
self.will_close = _UNKNOWN # conn will close at end of response
def _read_status(self):
# Initialize with Simple-Response defaults
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if self.debuglevel > 0:
print "reply:", repr(line)
if not line:
# Presumably, the server closed the connection before
# sending a valid response.
raise BadStatusLine("No status line received - the server has closed the connection")
try:
[version, status, reason] = line.split(None, 2)
except ValueError:
try:
[version, status] = line.split(None, 1)
reason = ""
except ValueError:
# empty version will cause next test to fail and status
# will be treated as 0.9 response.
version = ""
if not version.startswith('HTTP/'):
if self.strict:
self.close()
raise BadStatusLine(line)
else:
# assume it's a Simple-Response from an 0.9 server
self.fp = LineAndFileWrapper(line, self.fp)
return "HTTP/0.9", 200, ""
# The status code is a three-digit number
try:
status = int(status)
if status < 100 or status > 999:
raise BadStatusLine(line)
except ValueError:
raise BadStatusLine(line)
return version, status, reason
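    # Illustrative outcomes of _read_status() (a sketch; the reason phrase is
    # stripped later, in begin()):
    #
    #   'HTTP/1.1 200 OK\r\n' -> ('HTTP/1.1', 200, 'OK\r\n')
    #   'HTTP/1.0 404\r\n'    -> ('HTTP/1.0', 404, '')
    #   '<html>...'           -> ('HTTP/0.9', 200, '') with the line pushed
    #                            back through LineAndFileWrapper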
def begin(self):
if self.msg is not None:
# we've already started reading the response
return
# read until we get a non-100 response
while True:
version, status, reason = self._read_status()
if status != CONTINUE:
break
# skip the header from the 100 response
while True:
skip = self.fp.readline(_MAXLINE + 1)
if len(skip) > _MAXLINE:
raise LineTooLong("header line")
skip = skip.strip()
if not skip:
break
if self.debuglevel > 0:
print "header:", skip
self.status = status
self.reason = reason.strip()
if version == 'HTTP/1.0':
self.version = 10
elif version.startswith('HTTP/1.'):
self.version = 11 # use HTTP/1.1 code for HTTP/1.x where x>=1
elif version == 'HTTP/0.9':
self.version = 9
else:
raise UnknownProtocol(version)
if self.version == 9:
self.length = None
self.chunked = 0
self.will_close = 1
self.msg = HTTPMessage(StringIO())
return
self.msg = HTTPMessage(self.fp, 0)
if self.debuglevel > 0:
for hdr in self.msg.headers:
print "header:", hdr,
# don't let the msg keep an fp
self.msg.fp = None
# are we using the chunked-style of transfer encoding?
tr_enc = self.msg.getheader('transfer-encoding')
if tr_enc and tr_enc.lower() == "chunked":
self.chunked = 1
self.chunk_left = None
else:
self.chunked = 0
# will the connection close at the end of the response?
self.will_close = self._check_close()
# do we have a Content-Length?
# NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
length = self.msg.getheader('content-length')
if length and not self.chunked:
try:
self.length = int(length)
except ValueError:
self.length = None
else:
if self.length < 0: # ignore nonsensical negative lengths
self.length = None
else:
self.length = None
# does the body have a fixed length? (of zero)
if (status == NO_CONTENT or status == NOT_MODIFIED or
100 <= status < 200 or # 1xx codes
self._method == 'HEAD'):
self.length = 0
# if the connection remains open, and we aren't using chunked, and
# a content-length was not provided, then assume that the connection
# WILL close.
if not self.will_close and \
not self.chunked and \
self.length is None:
self.will_close = 1
def _check_close(self):
conn = self.msg.getheader('connection')
if self.version == 11:
# An HTTP/1.1 proxy is assumed to stay open unless
# explicitly closed.
if conn and "close" in conn.lower():
return True
return False
# Some HTTP/1.0 implementations have support for persistent
# connections, using rules different than HTTP/1.1.
# For older HTTP, Keep-Alive indicates persistent connection.
if self.msg.getheader('keep-alive'):
return False
# At least Akamai returns a "Connection: Keep-Alive" header,
# which was supposed to be sent by the client.
if conn and "keep-alive" in conn.lower():
return False
# Proxy-Connection is a netscape hack.
pconn = self.msg.getheader('proxy-connection')
if pconn and "keep-alive" in pconn.lower():
return False
# otherwise, assume it will close
return True
def close(self):
fp = self.fp
if fp:
self.fp = None
fp.close()
def isclosed(self):
# NOTE: it is possible that we will not ever call self.close(). This
# case occurs when will_close is TRUE, length is None, and we
# read up to the last byte, but NOT past it.
#
# IMPLIES: if will_close is FALSE, then self.close() will ALWAYS be
# called, meaning self.isclosed() is meaningful.
return self.fp is None
# XXX It would be nice to have readline and __iter__ for this, too.
def read(self, amt=None):
if self.fp is None:
return ''
if self._method == 'HEAD':
self.close()
return ''
if self.chunked:
return self._read_chunked(amt)
if amt is None:
# unbounded read
if self.length is None:
s = self.fp.read()
else:
try:
s = self._safe_read(self.length)
except IncompleteRead:
self.close()
raise
self.length = 0
self.close() # we read everything
return s
if self.length is not None:
if amt > self.length:
# clip the read to the "end of response"
amt = self.length
# we do not use _safe_read() here because this may be a .will_close
# connection, and the user is reading more bytes than will be provided
# (for example, reading in 1k chunks)
s = self.fp.read(amt)
if not s and amt:
# Ideally, we would raise IncompleteRead if the content-length
# wasn't satisfied, but it might break compatibility.
self.close()
if self.length is not None:
self.length -= len(s)
if not self.length:
self.close()
return s
def _read_chunked(self, amt):
assert self.chunked != _UNKNOWN
chunk_left = self.chunk_left
value = []
while True:
if chunk_left is None:
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("chunk size")
i = line.find(';')
if i >= 0:
line = line[:i] # strip chunk-extensions
try:
chunk_left = int(line, 16)
except ValueError:
# close the connection as protocol synchronisation is
# probably lost
self.close()
raise IncompleteRead(''.join(value))
if chunk_left == 0:
break
if amt is None:
value.append(self._safe_read(chunk_left))
elif amt < chunk_left:
value.append(self._safe_read(amt))
self.chunk_left = chunk_left - amt
return ''.join(value)
elif amt == chunk_left:
value.append(self._safe_read(amt))
self._safe_read(2) # toss the CRLF at the end of the chunk
self.chunk_left = None
return ''.join(value)
else:
value.append(self._safe_read(chunk_left))
amt -= chunk_left
# we read the whole chunk, get another
self._safe_read(2) # toss the CRLF at the end of the chunk
chunk_left = None
# read and discard trailer up to the CRLF terminator
### note: we shouldn't have any trailers!
while True:
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("trailer line")
if not line:
# a vanishingly small number of sites EOF without
# sending the trailer
break
if line == '\r\n':
break
# we read everything; close the "file"
self.close()
return ''.join(value)
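    # A sketch of the wire format this method consumes (illustrative):
    #
    #   5\r\n        chunk-size in hex
    #   hello\r\n    chunk data, terminated by CRLF
    #   0\r\n        last-chunk
    #   \r\n         end of the (empty) trailer
    #
    # _read_chunked(None) on such a stream returns 'hello'.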
def _safe_read(self, amt):
"""Read the number of bytes requested, compensating for partial reads.
Normally, we have a blocking socket, but a read() can be interrupted
by a signal (resulting in a partial read).
Note that we cannot distinguish between EOF and an interrupt when zero
bytes have been read. IncompleteRead() will be raised in this
situation.
This function should be used when <amt> bytes "should" be present for
reading. If the bytes are truly not available (due to EOF), then the
IncompleteRead exception can be used to detect the problem.
"""
# NOTE(gps): As of svn r74426 socket._fileobject.read(x) will never
# return less than x bytes unless EOF is encountered. It now handles
# signal interruptions (socket.error EINTR) internally. This code
# never caught that exception anyways. It seems largely pointless.
# self.fp.read(amt) will work fine.
s = []
while amt > 0:
chunk = self.fp.read(min(amt, MAXAMOUNT))
if not chunk:
raise IncompleteRead(''.join(s), amt)
s.append(chunk)
amt -= len(chunk)
return ''.join(s)
def fileno(self):
return self.fp.fileno()
def getheader(self, name, default=None):
if self.msg is None:
raise ResponseNotReady()
return self.msg.getheader(name, default)
def getheaders(self):
"""Return list of (header, value) tuples."""
if self.msg is None:
raise ResponseNotReady()
return self.msg.items()
class HTTPConnection:
_http_vsn = 11
_http_vsn_str = 'HTTP/1.1'
response_class = HTTPResponse
default_port = HTTP_PORT
auto_open = 1
debuglevel = 0
strict = 0
def __init__(self, host, port=None, strict=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None):
self.timeout = timeout
self.source_address = source_address
self.sock = None
self._buffer = []
self.__response = None
self.__state = _CS_IDLE
self._method = None
self._tunnel_host = None
self._tunnel_port = None
self._tunnel_headers = {}
if strict is not None:
self.strict = strict
(self.host, self.port) = self._get_hostport(host, port)
self._validate_host(self.host)
# This is stored as an instance variable to allow unittests
# to replace with a suitable mock
self._create_connection = socket.create_connection
def set_tunnel(self, host, port=None, headers=None):
""" Set up host and port for HTTP CONNECT tunnelling.
In a connection that uses HTTP Connect tunneling, the host passed to the
constructor is used as proxy server that relays all communication to the
endpoint passed to set_tunnel. This is done by sending a HTTP CONNECT
request to the proxy server when the connection is established.
This method must be called before the HTTP connection has been
established.
The headers argument should be a mapping of extra HTTP headers
to send with the CONNECT request.
"""
# Verify if this is required.
if self.sock:
raise RuntimeError("Can't setup tunnel for established connection.")
self._tunnel_host, self._tunnel_port = self._get_hostport(host, port)
if headers:
self._tunnel_headers = headers
else:
self._tunnel_headers.clear()
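        # Illustrative proxy usage (a sketch; host names are placeholders):
        #
        #   conn = HTTPConnection('proxy.example.com', 8080)
        #   conn.set_tunnel('www.example.com', 80)
        #   conn.request('GET', '/')    # sends CONNECT to the proxy first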
def _get_hostport(self, host, port):
if port is None:
i = host.rfind(':')
j = host.rfind(']') # ipv6 addresses have [...]
if i > j:
try:
port = int(host[i+1:])
except ValueError:
if host[i+1:] == "": # http://foo.com:/ == http://foo.com/
port = self.default_port
else:
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
host = host[:i]
else:
port = self.default_port
if host and host[0] == '[' and host[-1] == ']':
host = host[1:-1]
return (host, port)
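    # Illustrative results (sketch):
    #
    #   _get_hostport('www.example.com', None)       -> ('www.example.com', 80)
    #   _get_hostport('www.example.com:8080', None)  -> ('www.example.com', 8080)
    #   _get_hostport('[::1]:8080', None)            -> ('::1', 8080)
    #   _get_hostport('www.example.com:', None)      -> ('www.example.com', 80)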
def set_debuglevel(self, level):
self.debuglevel = level
def _tunnel(self):
self.send("CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
self._tunnel_port))
for header, value in self._tunnel_headers.iteritems():
self.send("%s: %s\r\n" % (header, value))
self.send("\r\n")
response = self.response_class(self.sock, strict = self.strict,
method = self._method)
(version, code, message) = response._read_status()
if version == "HTTP/0.9":
# HTTP/0.9 doesn't support the CONNECT verb, so if httplib has
# concluded HTTP/0.9 is being used something has gone wrong.
self.close()
raise socket.error("Invalid response from tunnel request")
if code != 200:
self.close()
raise socket.error("Tunnel connection failed: %d %s" % (code,
message.strip()))
while True:
line = response.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if not line:
# for sites which EOF without sending trailer
break
if line == '\r\n':
break
def connect(self):
"""Connect to the host and port specified in __init__."""
self.sock = self._create_connection((self.host,self.port),
self.timeout, self.source_address)
if self._tunnel_host:
self._tunnel()
def close(self):
"""Close the connection to the HTTP server."""
self.__state = _CS_IDLE
try:
sock = self.sock
if sock:
self.sock = None
sock.close() # close it manually... there may be other refs
finally:
response = self.__response
if response:
self.__response = None
response.close()
def send(self, data):
"""Send `data' to the server."""
if self.sock is None:
if self.auto_open:
self.connect()
else:
raise NotConnected()
if self.debuglevel > 0:
print "send:", repr(data)
blocksize = 8192
if hasattr(data,'read') and not isinstance(data, array):
if self.debuglevel > 0: print "sendIng a read()able"
datablock = data.read(blocksize)
while datablock:
self.sock.sendall(datablock)
datablock = data.read(blocksize)
else:
self.sock.sendall(data)
def _output(self, s):
"""Add a line of output to the current request buffer.
Assumes that the line does *not* end with \\r\\n.
"""
self._buffer.append(s)
def _send_output(self, message_body=None):
"""Send the currently buffered request and clear the buffer.
Appends an extra \\r\\n to the buffer.
A message_body may be specified, to be appended to the request.
"""
self._buffer.extend(("", ""))
msg = "\r\n".join(self._buffer)
del self._buffer[:]
# If msg and message_body are sent in a single send() call,
# it will avoid performance problems caused by the interaction
# between delayed ack and the Nagle algorithm.
if isinstance(message_body, str):
msg += message_body
message_body = None
self.send(msg)
if message_body is not None:
#message_body was not a string (i.e. it is a file) and
#we must run the risk of Nagle
self.send(message_body)
def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0):
"""Send a request to the server.
`method' specifies an HTTP request method, e.g. 'GET'.
`url' specifies the object being requested, e.g. '/index.html'.
        `skip_host' if True does not automatically add a 'Host:' header
        `skip_accept_encoding' if True does not automatically add an
        'Accept-Encoding:' header
"""
# if a prior response has been completed, then forget about it.
if self.__response and self.__response.isclosed():
self.__response = None
# in certain cases, we cannot issue another request on this connection.
# this occurs when:
# 1) we are in the process of sending a request. (_CS_REQ_STARTED)
# 2) a response to a previous request has signalled that it is going
# to close the connection upon completion.
# 3) the headers for the previous response have not been read, thus
# we cannot determine whether point (2) is true. (_CS_REQ_SENT)
#
# if there is no prior response, then we can request at will.
#
# if point (2) is true, then we will have passed the socket to the
# response (effectively meaning, "there is no prior response"), and
# will open a new one when a new request is made.
#
# Note: if a prior response exists, then we *can* start a new request.
# We are not allowed to begin fetching the response to this new
# request, however, until that prior response is complete.
#
if self.__state == _CS_IDLE:
self.__state = _CS_REQ_STARTED
else:
raise CannotSendRequest()
# Save the method for use later in the response phase
self._method = method
url = url or '/'
self._validate_path(url)
request = '%s %s %s' % (method, url, self._http_vsn_str)
self._output(self._encode_request(request))
if self._http_vsn == 11:
# Issue some standard headers for better HTTP/1.1 compliance
if not skip_host:
# this header is issued *only* for HTTP/1.1
# connections. more specifically, this means it is
# only issued when the client uses the new
# HTTPConnection() class. backwards-compat clients
# will be using HTTP/1.0 and those clients may be
# issuing this header themselves. we should NOT issue
# it twice; some web servers (such as Apache) barf
# when they see two Host: headers
                # If we need a non-standard port, include it in the
                # header. If the request is going through a proxy, use
                # the host of the actual URL, not the host of the
                # proxy.
netloc = ''
if url.startswith('http'):
nil, netloc, nil, nil, nil = urlsplit(url)
if netloc:
try:
netloc_enc = netloc.encode("ascii")
except UnicodeEncodeError:
netloc_enc = netloc.encode("idna")
self.putheader('Host', netloc_enc)
else:
if self._tunnel_host:
host = self._tunnel_host
port = self._tunnel_port
else:
host = self.host
port = self.port
try:
host_enc = host.encode("ascii")
except UnicodeEncodeError:
host_enc = host.encode("idna")
# Wrap the IPv6 Host Header with [] (RFC 2732)
if host_enc.find(':') >= 0:
host_enc = "[" + host_enc + "]"
if port == self.default_port:
self.putheader('Host', host_enc)
else:
self.putheader('Host', "%s:%s" % (host_enc, port))
# note: we are assuming that clients will not attempt to set these
# headers since *this* library must deal with the
# consequences. this also means that when the supporting
# libraries are updated to recognize other forms, then this
# code should be changed (removed or updated).
# we only want a Content-Encoding of "identity" since we don't
# support encodings such as x-gzip or x-deflate.
if not skip_accept_encoding:
self.putheader('Accept-Encoding', 'identity')
# we can accept "chunked" Transfer-Encodings, but no others
# NOTE: no TE header implies *only* "chunked"
#self.putheader('TE', 'chunked')
# if TE is supplied in the header, then it must appear in a
# Connection header.
#self.putheader('Connection', 'TE')
else:
# For HTTP/1.0, the server will assume "not chunked"
pass
def _encode_request(self, request):
# On Python 2, request is already encoded (default)
return request
def _validate_path(self, url):
"""Validate a url for putrequest."""
# Prevent CVE-2019-9740.
match = _contains_disallowed_url_pchar_re.search(url)
if match:
msg = (
"URL can't contain control characters. {url!r} "
"(found at least {matched!r})"
).format(matched=match.group(), url=url)
raise InvalidURL(msg)
def _validate_host(self, host):
"""Validate a host so it doesn't contain control characters."""
# Prevent CVE-2019-18348.
match = _contains_disallowed_url_pchar_re.search(host)
if match:
msg = (
"URL can't contain control characters. {host!r} "
"(found at least {matched!r})"
).format(matched=match.group(), host=host)
raise InvalidURL(msg)
def putheader(self, header, *values):
"""Send a request header line to the server.
For example: h.putheader('Accept', 'text/html')
"""
if self.__state != _CS_REQ_STARTED:
raise CannotSendHeader()
header = '%s' % header
if not _is_legal_header_name(header):
raise ValueError('Invalid header name %r' % (header,))
values = [str(v) for v in values]
for one_value in values:
if _is_illegal_header_value(one_value):
raise ValueError('Invalid header value %r' % (one_value,))
hdr = '%s: %s' % (header, '\r\n\t'.join(values))
self._output(hdr)
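        # Illustrative behaviour (sketch): names and values that embed a bare
        # CR or LF are rejected, closing the header-injection hole:
        #
        #   conn.putheader('Accept', 'text/html')     # ok
        #   conn.putheader('X-Evil', 'a\r\nHost: b')  # raises ValueError
        #   conn.putheader('Bad:Name', 'x')           # raises ValueError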
def endheaders(self, message_body=None):
"""Indicate that the last header line has been sent to the server.
This method sends the request to the server. The optional
message_body argument can be used to pass a message body
associated with the request. The message body will be sent in
        the same packet as the message headers if it is a string, otherwise it is
sent as a separate packet.
"""
if self.__state == _CS_REQ_STARTED:
self.__state = _CS_REQ_SENT
else:
raise CannotSendHeader()
self._send_output(message_body)
def request(self, method, url, body=None, headers={}):
"""Send a complete request to the server."""
self._send_request(method, url, body, headers)
def _set_content_length(self, body, method):
# Set the content-length based on the body. If the body is "empty", we
# set Content-Length: 0 for methods that expect a body (RFC 7230,
# Section 3.3.2). If the body is set for other methods, we set the
# header provided we can figure out what the length is.
thelen = None
if body is None and method.upper() in _METHODS_EXPECTING_BODY:
thelen = '0'
elif body is not None:
try:
thelen = str(len(body))
except (TypeError, AttributeError):
# If this is a file-like object, try to
# fstat its file descriptor
try:
thelen = str(os.fstat(body.fileno()).st_size)
except (AttributeError, OSError):
# Don't send a length if this failed
if self.debuglevel > 0: print "Cannot stat!!"
if thelen is not None:
self.putheader('Content-Length', thelen)
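        # Illustrative outcomes (a sketch; `body_file` is a placeholder for an
        # open file object):
        #
        #   _set_content_length(None, 'POST')      # sends Content-Length: 0
        #   _set_content_length(None, 'GET')       # sends no Content-Length
        #   _set_content_length('abc', 'PUT')      # sends Content-Length: 3
        #   _set_content_length(body_file, 'PUT')  # fstat()s the file for its size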
def _send_request(self, method, url, body, headers):
# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = dict.fromkeys([k.lower() for k in headers])
skips = {}
if 'host' in header_names:
skips['skip_host'] = 1
if 'accept-encoding' in header_names:
skips['skip_accept_encoding'] = 1
self.putrequest(method, url, **skips)
if 'content-length' not in header_names:
self._set_content_length(body, method)
for hdr, value in headers.iteritems():
self.putheader(hdr, value)
self.endheaders(body)
def getresponse(self, buffering=False):
"Get the response from the server."
# if a prior response has been completed, then forget about it.
if self.__response and self.__response.isclosed():
self.__response = None
#
# if a prior response exists, then it must be completed (otherwise, we
# cannot read this response's header to determine the connection-close
# behavior)
#
# note: if a prior response existed, but was connection-close, then the
# socket and response were made independent of this HTTPConnection
# object since a new request requires that we open a whole new
# connection
#
# this means the prior response had one of two states:
# 1) will_close: this connection was reset and the prior socket and
# response operate independently
# 2) persistent: the response was retained and we await its
# isclosed() status to become true.
#
if self.__state != _CS_REQ_SENT or self.__response:
raise ResponseNotReady()
args = (self.sock,)
kwds = {"strict":self.strict, "method":self._method}
if self.debuglevel > 0:
args += (self.debuglevel,)
if buffering:
#only add this keyword if non-default, for compatibility with
#other response_classes.
kwds["buffering"] = True;
response = self.response_class(*args, **kwds)
try:
response.begin()
assert response.will_close != _UNKNOWN
self.__state = _CS_IDLE
if response.will_close:
# this effectively passes the connection to the response
self.close()
else:
# remember this, so we can tell when it is complete
self.__response = response
return response
except:
response.close()
raise
class HTTP:
"Compatibility class with httplib.py from 1.5."
_http_vsn = 10
_http_vsn_str = 'HTTP/1.0'
debuglevel = 0
_connection_class = HTTPConnection
def __init__(self, host='', port=None, strict=None):
"Provide a default host, since the superclass requires one."
# some joker passed 0 explicitly, meaning default port
if port == 0:
port = None
# Note that we may pass an empty string as the host; this will raise
# an error when we attempt to connect. Presumably, the client code
# will call connect before then, with a proper host.
self._setup(self._connection_class(host, port, strict))
def _setup(self, conn):
self._conn = conn
# set up delegation to flesh out interface
self.send = conn.send
self.putrequest = conn.putrequest
self.putheader = conn.putheader
self.endheaders = conn.endheaders
self.set_debuglevel = conn.set_debuglevel
conn._http_vsn = self._http_vsn
conn._http_vsn_str = self._http_vsn_str
self.file = None
def connect(self, host=None, port=None):
"Accept arguments to set the host/port, since the superclass doesn't."
if host is not None:
(self._conn.host, self._conn.port) = self._conn._get_hostport(host, port)
self._conn.connect()
def getfile(self):
"Provide a getfile, since the superclass' does not use this concept."
return self.file
def getreply(self, buffering=False):
"""Compat definition since superclass does not define it.
Returns a tuple consisting of:
- server status code (e.g. '200' if all goes well)
- server "reason" corresponding to status code
- any RFC822 headers in the response from the server
"""
try:
if not buffering:
response = self._conn.getresponse()
else:
#only add this keyword if non-default for compatibility
#with other connection classes
response = self._conn.getresponse(buffering)
except BadStatusLine, e:
### hmm. if getresponse() ever closes the socket on a bad request,
### then we are going to have problems with self.sock
### should we keep this behavior? do people use it?
# keep the socket open (as a file), and return it
self.file = self._conn.sock.makefile('rb', 0)
# close our socket -- we want to restart after any protocol error
self.close()
self.headers = None
return -1, e.line, None
self.headers = response.msg
self.file = response.fp
return response.status, response.reason, response.msg
def close(self):
self._conn.close()
# note that self.file == response.fp, which gets closed by the
# superclass. just clear the object ref here.
### hmm. messy. if status==-1, then self.file is owned by us.
### well... we aren't explicitly closing, but losing this ref will
### do it
self.file = None
try:
import ssl
except ImportError:
pass
else:
class HTTPSConnection(HTTPConnection):
"This class allows communication via SSL."
default_port = HTTPS_PORT
def __init__(self, host, port=None, key_file=None, cert_file=None,
strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None, context=None):
HTTPConnection.__init__(self, host, port, strict, timeout,
source_address)
self.key_file = key_file
self.cert_file = cert_file
if context is None:
context = ssl._create_default_https_context()
if key_file or cert_file:
context.load_cert_chain(cert_file, key_file)
self._context = context
def connect(self):
"Connect to a host on a given (SSL) port."
HTTPConnection.connect(self)
if self._tunnel_host:
server_hostname = self._tunnel_host
else:
server_hostname = self.host
self.sock = self._context.wrap_socket(self.sock,
server_hostname=server_hostname)
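        # Illustrative HTTPS usage (a sketch; the host is a placeholder):
        #
        #   conn = HTTPSConnection('www.example.com')
        #   conn.request('GET', '/')
        #   resp = conn.getresponse()
        #
        # On 2.7.9+, passing context=ssl.create_default_context() makes the
        # TLS settings explicit instead of relying on the module default.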
__all__.append("HTTPSConnection")
class HTTPS(HTTP):
"""Compatibility with 1.5 httplib interface
Python 1.5.2 did not have an HTTPS class, but it defined an
interface for sending http requests that is also useful for
https.
"""
_connection_class = HTTPSConnection
def __init__(self, host='', port=None, key_file=None, cert_file=None,
strict=None, context=None):
# provide a default host, pass the X509 cert info
# urf. compensate for bad input.
if port == 0:
port = None
self._setup(self._connection_class(host, port, key_file,
cert_file, strict,
context=context))
# we never actually use these for anything, but we keep them
# here for compatibility with post-1.5.2 CVS.
self.key_file = key_file
self.cert_file = cert_file
    def FakeSocket(sock, sslobj):
warnings.warn("FakeSocket is deprecated, and won't be in 3.x. " +
"Use the result of ssl.wrap_socket() directly instead.",
DeprecationWarning, stacklevel=2)
return sslobj
class HTTPException(Exception):
# Subclasses that define an __init__ must call Exception.__init__
# or define self.args. Otherwise, str() will fail.
pass
class NotConnected(HTTPException):
pass
class InvalidURL(HTTPException):
pass
class UnknownProtocol(HTTPException):
def __init__(self, version):
self.args = version,
self.version = version
class UnknownTransferEncoding(HTTPException):
pass
class UnimplementedFileMode(HTTPException):
pass
class IncompleteRead(HTTPException):
def __init__(self, partial, expected=None):
self.args = partial,
self.partial = partial
self.expected = expected
def __repr__(self):
if self.expected is not None:
e = ', %i more expected' % self.expected
else:
e = ''
return 'IncompleteRead(%i bytes read%s)' % (len(self.partial), e)
def __str__(self):
return repr(self)
class ImproperConnectionState(HTTPException):
pass
class CannotSendRequest(ImproperConnectionState):
pass
class CannotSendHeader(ImproperConnectionState):
pass
class ResponseNotReady(ImproperConnectionState):
pass
class BadStatusLine(HTTPException):
def __init__(self, line):
if not line:
line = repr(line)
self.args = line,
self.line = line
class LineTooLong(HTTPException):
def __init__(self, line_type):
HTTPException.__init__(self, "got more than %d bytes when reading %s"
% (_MAXLINE, line_type))
# for backwards compatibility
error = HTTPException
class LineAndFileWrapper:
"""A limited file-like object for HTTP/0.9 responses."""
# The status-line parsing code calls readline(), which normally
    # gets the HTTP status line. For a 0.9 response, however, this is
# actually the first line of the body! Clients need to get a
# readable file object that contains that line.
def __init__(self, line, file):
self._line = line
self._file = file
self._line_consumed = 0
self._line_offset = 0
self._line_left = len(line)
def __getattr__(self, attr):
return getattr(self._file, attr)
def _done(self):
# called when the last byte is read from the line. After the
# call, all read methods are delegated to the underlying file
# object.
self._line_consumed = 1
self.read = self._file.read
self.readline = self._file.readline
self.readlines = self._file.readlines
def read(self, amt=None):
if self._line_consumed:
return self._file.read(amt)
assert self._line_left
if amt is None or amt > self._line_left:
s = self._line[self._line_offset:]
self._done()
if amt is None:
return s + self._file.read()
else:
return s + self._file.read(amt - len(s))
else:
assert amt <= self._line_left
i = self._line_offset
j = i + amt
s = self._line[i:j]
self._line_offset = j
self._line_left -= amt
if self._line_left == 0:
self._done()
return s
def readline(self):
if self._line_consumed:
return self._file.readline()
assert self._line_left
s = self._line[self._line_offset:]
self._done()
return s
def readlines(self, size=None):
if self._line_consumed:
return self._file.readlines(size)
assert self._line_left
L = [self._line[self._line_offset:]]
self._done()
if size is None:
return L + self._file.readlines()
else:
return L + self._file.readlines(size)
"""Import hook support.
Consistent use of this module will make it possible to change the
different mechanisms involved in loading modules independently.
While the built-in module imp exports interfaces to the built-in
module searching and loading algorithm, and it is possible to replace
the built-in function __import__ in order to change the semantics of
the import statement, until now it has been difficult to combine the
effect of different __import__ hacks, like loading modules from URLs
by rimport.py, or restricted execution by rexec.py.
This module defines three new concepts:
1) A "file system hooks" class provides an interface to a filesystem.
One hooks class is defined (Hooks), which uses the interface provided
by standard modules os and os.path. It should be used as the base
class for other hooks classes.
2) A "module loader" class provides an interface to search for a
module in a search path and to load it. It defines a method which
searches for a module in a single directory; by overriding this method
one can redefine the details of the search. If the directory is None,
built-in and frozen modules are searched instead.
Two module loader classes are defined, both implementing the search
strategy used by the built-in __import__ function: ModuleLoader uses
the imp module's find_module interface, while HookableModuleLoader
uses a file system hooks class to interact with the file system. Both
use the imp module's load_* interfaces to actually load the module.
3) A "module importer" class provides an interface to import a
module, as well as interfaces to reload and unload a module. It also
provides interfaces to install and uninstall itself instead of the
default __import__ and reload (and unload) functions.
One module importer class is defined (ModuleImporter), which uses a
module loader instance passed in (by default HookableModuleLoader is
instantiated).
The classes defined here should be used as base classes for extended
functionality along those lines.
If a module importer class supports dotted names, its import_module()
must return a different value depending on whether it is called on
behalf of a "from ... import ..." statement or not. (This is caused
by the way the __import__ hook is used by the Python interpreter.) It
would also be wise to install a different version of reload().
"""
from warnings import warnpy3k, warn
warnpy3k("the ihooks module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import __builtin__
import imp
import os
import sys
__all__ = ["BasicModuleLoader","Hooks","ModuleLoader","FancyModuleLoader",
"BasicModuleImporter","ModuleImporter","install","uninstall"]
VERBOSE = 0
from imp import C_EXTENSION, PY_SOURCE, PY_COMPILED
from imp import C_BUILTIN, PY_FROZEN, PKG_DIRECTORY
BUILTIN_MODULE = C_BUILTIN
FROZEN_MODULE = PY_FROZEN
class _Verbose:
def __init__(self, verbose = VERBOSE):
self.verbose = verbose
def get_verbose(self):
return self.verbose
def set_verbose(self, verbose):
self.verbose = verbose
# XXX The following is an experimental interface
def note(self, *args):
if self.verbose:
self.message(*args)
def message(self, format, *args):
if args:
print format%args
else:
print format
class BasicModuleLoader(_Verbose):
"""Basic module loader.
This provides the same functionality as built-in import. It
doesn't deal with checking sys.modules -- all it provides is
find_module() and a load_module(), as well as find_module_in_dir()
which searches just one directory, and can be overridden by a
derived class to change the module search algorithm when the basic
dependency on sys.path is unchanged.
The interface is a little more convenient than imp's:
find_module(name, [path]) returns None or 'stuff', and
load_module(name, stuff) loads the module.
"""
def find_module(self, name, path = None):
if path is None:
path = [None] + self.default_path()
for dir in path:
stuff = self.find_module_in_dir(name, dir)
if stuff: return stuff
return None
def default_path(self):
return sys.path
def find_module_in_dir(self, name, dir):
if dir is None:
return self.find_builtin_module(name)
else:
try:
return imp.find_module(name, [dir])
except ImportError:
return None
def find_builtin_module(self, name):
# XXX frozen packages?
if imp.is_builtin(name):
return None, '', ('', '', BUILTIN_MODULE)
if imp.is_frozen(name):
return None, '', ('', '', FROZEN_MODULE)
return None
def load_module(self, name, stuff):
file, filename, info = stuff
try:
return imp.load_module(name, file, filename, info)
finally:
if file: file.close()
class Hooks(_Verbose):
"""Hooks into the filesystem and interpreter.
By deriving a subclass you can redefine your filesystem interface,
e.g. to merge it with the URL space.
This base class behaves just like the native filesystem.
"""
# imp interface
def get_suffixes(self): return imp.get_suffixes()
def new_module(self, name): return imp.new_module(name)
def is_builtin(self, name): return imp.is_builtin(name)
def init_builtin(self, name): return imp.init_builtin(name)
def is_frozen(self, name): return imp.is_frozen(name)
def init_frozen(self, name): return imp.init_frozen(name)
def get_frozen_object(self, name): return imp.get_frozen_object(name)
def load_source(self, name, filename, file=None):
return imp.load_source(name, filename, file)
def load_compiled(self, name, filename, file=None):
return imp.load_compiled(name, filename, file)
def load_dynamic(self, name, filename, file=None):
return imp.load_dynamic(name, filename, file)
def load_package(self, name, filename, file=None):
return imp.load_module(name, file, filename, ("", "", PKG_DIRECTORY))
def add_module(self, name):
d = self.modules_dict()
if name in d: return d[name]
d[name] = m = self.new_module(name)
return m
# sys interface
def modules_dict(self): return sys.modules
def default_path(self): return sys.path
def path_split(self, x): return os.path.split(x)
def path_join(self, x, y): return os.path.join(x, y)
def path_isabs(self, x): return os.path.isabs(x)
# etc.
def path_exists(self, x): return os.path.exists(x)
def path_isdir(self, x): return os.path.isdir(x)
def path_isfile(self, x): return os.path.isfile(x)
def path_islink(self, x): return os.path.islink(x)
# etc.
def openfile(self, *x): return open(*x)
openfile_error = IOError
def listdir(self, x): return os.listdir(x)
listdir_error = os.error
# etc.
class ModuleLoader(BasicModuleLoader):
"""Default module loader; uses file system hooks.
By defining suitable hooks, you might be able to load modules from
other sources than the file system, e.g. from compressed or
encrypted files, tar files or (if you're brave!) URLs.
"""
def __init__(self, hooks = None, verbose = VERBOSE):
BasicModuleLoader.__init__(self, verbose)
self.hooks = hooks or Hooks(verbose)
def default_path(self):
return self.hooks.default_path()
def modules_dict(self):
return self.hooks.modules_dict()
def get_hooks(self):
return self.hooks
def set_hooks(self, hooks):
self.hooks = hooks
def find_builtin_module(self, name):
# XXX frozen packages?
if self.hooks.is_builtin(name):
return None, '', ('', '', BUILTIN_MODULE)
if self.hooks.is_frozen(name):
return None, '', ('', '', FROZEN_MODULE)
return None
def find_module_in_dir(self, name, dir, allow_packages=1):
if dir is None:
return self.find_builtin_module(name)
if allow_packages:
fullname = self.hooks.path_join(dir, name)
if self.hooks.path_isdir(fullname):
stuff = self.find_module_in_dir("__init__", fullname, 0)
if stuff:
file = stuff[0]
if file: file.close()
return None, fullname, ('', '', PKG_DIRECTORY)
for info in self.hooks.get_suffixes():
suff, mode, type = info
fullname = self.hooks.path_join(dir, name+suff)
try:
fp = self.hooks.openfile(fullname, mode)
return fp, fullname, info
except self.hooks.openfile_error:
pass
return None
def load_module(self, name, stuff):
file, filename, info = stuff
(suff, mode, type) = info
try:
if type == BUILTIN_MODULE:
return self.hooks.init_builtin(name)
if type == FROZEN_MODULE:
return self.hooks.init_frozen(name)
if type == C_EXTENSION:
m = self.hooks.load_dynamic(name, filename, file)
elif type == PY_SOURCE:
m = self.hooks.load_source(name, filename, file)
elif type == PY_COMPILED:
m = self.hooks.load_compiled(name, filename, file)
elif type == PKG_DIRECTORY:
m = self.hooks.load_package(name, filename, file)
else:
raise ImportError, "Unrecognized module type (%r) for %s" % \
(type, name)
finally:
if file: file.close()
m.__file__ = filename
return m
class FancyModuleLoader(ModuleLoader):
"""Fancy module loader -- parses and execs the code itself."""
def load_module(self, name, stuff):
file, filename, (suff, mode, type) = stuff
realfilename = filename
path = None
if type == PKG_DIRECTORY:
initstuff = self.find_module_in_dir("__init__", filename, 0)
if not initstuff:
raise ImportError, "No __init__ module in package %s" % name
initfile, initfilename, initinfo = initstuff
initsuff, initmode, inittype = initinfo
if inittype not in (PY_COMPILED, PY_SOURCE):
if initfile: initfile.close()
raise ImportError, \
"Bad type (%r) for __init__ module in package %s" % (
inittype, name)
path = [filename]
file = initfile
realfilename = initfilename
type = inittype
if type == FROZEN_MODULE:
code = self.hooks.get_frozen_object(name)
elif type == PY_COMPILED:
import marshal
file.seek(8)
code = marshal.load(file)
elif type == PY_SOURCE:
data = file.read()
code = compile(data, realfilename, 'exec')
else:
return ModuleLoader.load_module(self, name, stuff)
m = self.hooks.add_module(name)
if path:
m.__path__ = path
m.__file__ = filename
try:
exec code in m.__dict__
except:
d = self.hooks.modules_dict()
if name in d:
del d[name]
raise
return m
class BasicModuleImporter(_Verbose):
"""Basic module importer; uses module loader.
This provides basic import facilities but no package imports.
"""
def __init__(self, loader = None, verbose = VERBOSE):
_Verbose.__init__(self, verbose)
self.loader = loader or ModuleLoader(None, verbose)
self.modules = self.loader.modules_dict()
def get_loader(self):
return self.loader
def set_loader(self, loader):
self.loader = loader
def get_hooks(self):
return self.loader.get_hooks()
def set_hooks(self, hooks):
return self.loader.set_hooks(hooks)
def import_module(self, name, globals={}, locals={}, fromlist=[]):
name = str(name)
if name in self.modules:
return self.modules[name] # Fast path
stuff = self.loader.find_module(name)
if not stuff:
raise ImportError, "No module named %s" % name
return self.loader.load_module(name, stuff)
def reload(self, module, path = None):
name = str(module.__name__)
stuff = self.loader.find_module(name, path)
if not stuff:
raise ImportError, "Module %s not found for reload" % name
return self.loader.load_module(name, stuff)
def unload(self, module):
del self.modules[str(module.__name__)]
# XXX Should this try to clear the module's namespace?
def install(self):
self.save_import_module = __builtin__.__import__
self.save_reload = __builtin__.reload
if not hasattr(__builtin__, 'unload'):
__builtin__.unload = None
self.save_unload = __builtin__.unload
__builtin__.__import__ = self.import_module
__builtin__.reload = self.reload
__builtin__.unload = self.unload
def uninstall(self):
__builtin__.__import__ = self.save_import_module
__builtin__.reload = self.save_reload
__builtin__.unload = self.save_unload
if not __builtin__.unload:
del __builtin__.unload
class ModuleImporter(BasicModuleImporter):
"""A module importer that supports packages."""
def import_module(self, name, globals=None, locals=None, fromlist=None,
level=-1):
parent = self.determine_parent(globals, level)
q, tail = self.find_head_package(parent, str(name))
m = self.load_tail(q, tail)
if not fromlist:
return q
if hasattr(m, "__path__"):
self.ensure_fromlist(m, fromlist)
return m
def determine_parent(self, globals, level=-1):
if not globals or not level:
return None
pkgname = globals.get('__package__')
if pkgname is not None:
if not pkgname and level > 0:
raise ValueError, 'Attempted relative import in non-package'
else:
# __package__ not set, figure it out and set it
modname = globals.get('__name__')
if modname is None:
return None
if "__path__" in globals:
# __path__ is set so modname is already the package name
pkgname = modname
else:
# normal module, work out package name if any
if '.' not in modname:
if level > 0:
raise ValueError, ('Attempted relative import in '
'non-package')
globals['__package__'] = None
return None
pkgname = modname.rpartition('.')[0]
globals['__package__'] = pkgname
if level > 0:
dot = len(pkgname)
for x in range(level, 1, -1):
try:
dot = pkgname.rindex('.', 0, dot)
except ValueError:
raise ValueError('attempted relative import beyond '
'top-level package')
pkgname = pkgname[:dot]
try:
return sys.modules[pkgname]
except KeyError:
if level < 1:
warn("Parent module '%s' not found while handling "
"absolute import" % pkgname, RuntimeWarning, 1)
return None
else:
raise SystemError, ("Parent module '%s' not loaded, cannot "
"perform relative import" % pkgname)
def find_head_package(self, parent, name):
if '.' in name:
i = name.find('.')
head = name[:i]
tail = name[i+1:]
else:
head = name
tail = ""
if parent:
qname = "%s.%s" % (parent.__name__, head)
else:
qname = head
q = self.import_it(head, qname, parent)
if q: return q, tail
if parent:
qname = head
parent = None
q = self.import_it(head, qname, parent)
if q: return q, tail
raise ImportError, "No module named '%s'" % qname
def load_tail(self, q, tail):
m = q
while tail:
i = tail.find('.')
if i < 0: i = len(tail)
head, tail = tail[:i], tail[i+1:]
mname = "%s.%s" % (m.__name__, head)
m = self.import_it(head, mname, m)
if not m:
raise ImportError, "No module named '%s'" % mname
return m
def ensure_fromlist(self, m, fromlist, recursive=0):
for sub in fromlist:
if sub == "*":
if not recursive:
try:
all = m.__all__
except AttributeError:
pass
else:
self.ensure_fromlist(m, all, 1)
continue
if sub != "*" and not hasattr(m, sub):
subname = "%s.%s" % (m.__name__, sub)
submod = self.import_it(sub, subname, m)
if not submod:
raise ImportError, "No module named '%s'" % subname
def import_it(self, partname, fqname, parent, force_load=0):
if not partname:
# completely empty module name should only happen in
# 'from . import' or __import__("")
return parent
if not force_load:
try:
return self.modules[fqname]
except KeyError:
pass
try:
path = parent and parent.__path__
except AttributeError:
return None
partname = str(partname)
stuff = self.loader.find_module(partname, path)
if not stuff:
return None
fqname = str(fqname)
m = self.loader.load_module(fqname, stuff)
if parent:
setattr(parent, partname, m)
return m
def reload(self, module):
name = str(module.__name__)
if '.' not in name:
return self.import_it(name, name, None, force_load=1)
i = name.rfind('.')
pname = name[:i]
parent = self.modules[pname]
return self.import_it(name[i+1:], name, parent, force_load=1)
default_importer = None
current_importer = None
def install(importer = None):
global current_importer
current_importer = importer or default_importer or ModuleImporter()
current_importer.install()
def uninstall():
global current_importer
current_importer.uninstall()
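# Illustrative use of the module-level helpers (sketch):
#
#   install()       # route all imports through a ModuleImporter
#   import string   # now resolved by the installed importer
#   uninstall()     # restore the saved __import__/reload/unload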
"""IMAP4 client.
Based on RFC 2060.
Public class: IMAP4
Public variable: Debug
Public functions: Internaldate2tuple
Int2AP
ParseFlags
Time2Internaldate
"""
# Author: Piers Lauder <[email protected]> December 1997.
#
# Authentication code contributed by Donn Cave <[email protected]> June 1998.
# String method conversion by ESR, February 2001.
# GET/SETACL contributed by Anthony Baxter <[email protected]> April 2001.
# IMAP4_SSL contributed by Tino Lange <[email protected]> March 2002.
# GET/SETQUOTA contributed by Andreas Zeidler <[email protected]> June 2002.
# PROXYAUTH contributed by Rick Holbert <[email protected]> November 2002.
# GET/SETANNOTATION contributed by Tomas Lindroos <[email protected]> June 2005.
__version__ = "2.58"
import binascii, errno, random, re, socket, subprocess, sys, time
__all__ = ["IMAP4", "IMAP4_stream", "Internaldate2tuple",
"Int2AP", "ParseFlags", "Time2Internaldate"]
# Globals
CRLF = '\r\n'
Debug = 0
IMAP4_PORT = 143
IMAP4_SSL_PORT = 993
AllowedVersions = ('IMAP4REV1', 'IMAP4') # Most recent first
# Maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 3501 and 2060 (IMAP 4rev1)
# don't specify a line length. RFC 2683 suggests limiting client
# command lines to 1000 octets and that servers should be prepared
# to accept command lines up to 8000 octets, so we used to use 10K here.
# In the modern world (eg: gmail) the response to, for example, a
# search command can be quite large, so we now use 1M.
_MAXLINE = 1000000
# Commands
Commands = {
# name valid states
'APPEND': ('AUTH', 'SELECTED'),
'AUTHENTICATE': ('NONAUTH',),
'CAPABILITY': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'CHECK': ('SELECTED',),
'CLOSE': ('SELECTED',),
'COPY': ('SELECTED',),
'CREATE': ('AUTH', 'SELECTED'),
'DELETE': ('AUTH', 'SELECTED'),
'DELETEACL': ('AUTH', 'SELECTED'),
'EXAMINE': ('AUTH', 'SELECTED'),
'EXPUNGE': ('SELECTED',),
'FETCH': ('SELECTED',),
'GETACL': ('AUTH', 'SELECTED'),
'GETANNOTATION':('AUTH', 'SELECTED'),
'GETQUOTA': ('AUTH', 'SELECTED'),
'GETQUOTAROOT': ('AUTH', 'SELECTED'),
'MYRIGHTS': ('AUTH', 'SELECTED'),
'LIST': ('AUTH', 'SELECTED'),
'LOGIN': ('NONAUTH',),
'LOGOUT': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'LSUB': ('AUTH', 'SELECTED'),
'MOVE': ('SELECTED',),
'NAMESPACE': ('AUTH', 'SELECTED'),
'NOOP': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'PARTIAL': ('SELECTED',), # NB: obsolete
'PROXYAUTH': ('AUTH',),
'RENAME': ('AUTH', 'SELECTED'),
'SEARCH': ('SELECTED',),
'SELECT': ('AUTH', 'SELECTED'),
'SETACL': ('AUTH', 'SELECTED'),
'SETANNOTATION':('AUTH', 'SELECTED'),
'SETQUOTA': ('AUTH', 'SELECTED'),
'SORT': ('SELECTED',),
'STATUS': ('AUTH', 'SELECTED'),
'STORE': ('SELECTED',),
'SUBSCRIBE': ('AUTH', 'SELECTED'),
'THREAD': ('SELECTED',),
'UID': ('SELECTED',),
'UNSUBSCRIBE': ('AUTH', 'SELECTED'),
}
# Patterns to match server responses
Continuation = re.compile(r'\+( (?P<data>.*))?')
Flags = re.compile(r'.*FLAGS \((?P<flags>[^\)]*)\)')
InternalDate = re.compile(r'.*INTERNALDATE "'
r'(?P<day>[ 0123][0-9])-(?P<mon>[A-Z][a-z][a-z])-(?P<year>[0-9][0-9][0-9][0-9])'
r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
r'"')
Literal = re.compile(r'.*{(?P<size>\d+)}$')
MapCRLF = re.compile(r'\r\n|\r|\n')
Response_code = re.compile(r'\[(?P<type>[A-Z-]+)( (?P<data>[^\]]*))?\]')
Untagged_response = re.compile(r'\* (?P<type>[A-Z-]+)( (?P<data>.*))?')
Untagged_status = re.compile(r'\* (?P<data>\d+) (?P<type>[A-Z-]+)( (?P<data2>.*))?')
class IMAP4:
"""IMAP4 client class.
Instantiate with: IMAP4([host[, port]])
host - host's name (default: localhost);
port - port number (default: standard IMAP4 port).
All IMAP4rev1 commands are supported by methods of the same
name (in lower-case).
All arguments to commands are converted to strings, except for
AUTHENTICATE, and the last argument to APPEND which is passed as
an IMAP4 literal. If necessary (the string contains any
non-printing characters or white-space and isn't enclosed with
either parentheses or double quotes) each string is quoted.
However, the 'password' argument to the LOGIN command is always
quoted. If you want to avoid having an argument string quoted
(eg: the 'flags' argument to STORE) then enclose the string in
parentheses (eg: "(\Deleted)").
Each command returns a tuple: (type, [data, ...]) where 'type'
is usually 'OK' or 'NO', and 'data' is either the text from the
tagged response, or untagged results from command. Each 'data'
is either a string, or a tuple. If a tuple, then the first part
is the header of the response, and the second part contains
the data (ie: 'literal' value).
Errors raise the exception class <instance>.error("<reason>").
IMAP4 server errors raise <instance>.abort("<reason>"),
which is a sub-class of 'error'. Mailbox status changes
from READ-WRITE to READ-ONLY raise the exception class
<instance>.readonly("<reason>"), which is a sub-class of 'abort'.
"error" exceptions imply a program error.
"abort" exceptions imply the connection should be reset, and
the command re-tried.
"readonly" exceptions imply the command should be re-tried.
Note: to use this module, you must read the RFCs pertaining to the
IMAP4 protocol, as the semantics of the arguments to each IMAP4
command, and of the results, are left to the invoker. Also,
most IMAP servers implement a sub-set of the commands available here.
"""
class error(Exception): pass # Logical errors - debug required
class abort(error): pass # Service errors - close and retry
class readonly(abort): pass # Mailbox status changed to READ-ONLY
mustquote = re.compile(r"[^\w!#$%&'*+,.:;<=>?^`|~-]")
def __init__(self, host = '', port = IMAP4_PORT):
self.debug = Debug
self.state = 'LOGOUT'
self.literal = None # A literal argument to a command
self.tagged_commands = {} # Tagged commands awaiting response
self.untagged_responses = {} # {typ: [data, ...], ...}
self.continuation_response = '' # Last continuation response
self.is_readonly = False # READ-ONLY desired state
self.tagnum = 0
# Open socket to server.
self.open(host, port)
# Create unique tag for this session,
# and compile tagged response matcher.
self.tagpre = Int2AP(random.randint(4096, 65535))
self.tagre = re.compile(r'(?P<tag>'
+ self.tagpre
+ r'\d+) (?P<type>[A-Z]+) (?P<data>.*)')
# Get server welcome message,
# request and store CAPABILITY response.
if __debug__:
self._cmd_log_len = 10
self._cmd_log_idx = 0
self._cmd_log = {} # Last `_cmd_log_len' interactions
if self.debug >= 1:
self._mesg('imaplib version %s' % __version__)
self._mesg('new IMAP4 connection, tag=%s' % self.tagpre)
self.welcome = self._get_response()
if 'PREAUTH' in self.untagged_responses:
self.state = 'AUTH'
elif 'OK' in self.untagged_responses:
self.state = 'NONAUTH'
else:
raise self.error(self.welcome)
typ, dat = self.capability()
if dat == [None]:
raise self.error('no CAPABILITY response from server')
self.capabilities = tuple(dat[-1].upper().split())
if __debug__:
if self.debug >= 3:
self._mesg('CAPABILITIES: %r' % (self.capabilities,))
for version in AllowedVersions:
if not version in self.capabilities:
continue
self.PROTOCOL_VERSION = version
return
raise self.error('server not IMAP4 compliant')
def __getattr__(self, attr):
# Allow UPPERCASE variants of IMAP4 command methods.
if attr in Commands:
return getattr(self, attr.lower())
raise AttributeError("Unknown IMAP4 command: '%s'" % attr)
# Overridable methods
def open(self, host = '', port = IMAP4_PORT):
"""Setup connection to remote server on "host:port"
(default: localhost:standard IMAP4 port).
This connection will be used by the routines:
read, readline, send, shutdown.
"""
self.host = host
self.port = port
self.sock = socket.create_connection((host, port))
self.file = self.sock.makefile('rb')
def read(self, size):
"""Read 'size' bytes from remote."""
return self.file.read(size)
def readline(self):
"""Read line from remote."""
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise self.error("got more than %d bytes" % _MAXLINE)
return line
def send(self, data):
"""Send data to remote."""
self.sock.sendall(data)
def shutdown(self):
"""Close I/O established in "open"."""
self.file.close()
try:
self.sock.shutdown(socket.SHUT_RDWR)
except socket.error as e:
# The server might already have closed the connection.
# On Windows, this may result in WSAEINVAL (error 10022):
# An invalid operation was attempted.
if e.errno not in (errno.ENOTCONN, 10022):
raise
finally:
self.sock.close()
def socket(self):
"""Return socket instance used to connect to IMAP4 server.
socket = <instance>.socket()
"""
return self.sock
# Utility methods
def recent(self):
"""Return most recent 'RECENT' responses if any exist,
else prompt server for an update using the 'NOOP' command.
(typ, [data]) = <instance>.recent()
'data' is None if no new messages,
else list of RECENT responses, most recent last.
"""
name = 'RECENT'
typ, dat = self._untagged_response('OK', [None], name)
if dat[-1]:
return typ, dat
typ, dat = self.noop() # Prod server for response
return self._untagged_response(typ, dat, name)
def response(self, code):
"""Return data for response 'code' if received, or None.
Old value for response 'code' is cleared.
(code, [data]) = <instance>.response(code)
"""
return self._untagged_response(code, [None], code.upper())
# IMAP4 commands
def append(self, mailbox, flags, date_time, message):
"""Append message to named mailbox.
(typ, [data]) = <instance>.append(mailbox, flags, date_time, message)
All args except `message' can be None.
"""
name = 'APPEND'
if not mailbox:
mailbox = 'INBOX'
if flags:
if (flags[0],flags[-1]) != ('(',')'):
flags = '(%s)' % flags
else:
flags = None
if date_time:
date_time = Time2Internaldate(date_time)
else:
date_time = None
self.literal = MapCRLF.sub(CRLF, message)
return self._simple_command(name, mailbox, flags, date_time)
def authenticate(self, mechanism, authobject):
"""Authenticate command - requires response processing.
'mechanism' specifies which authentication mechanism is to
be used - it must appear in <instance>.capabilities in the
form AUTH=<mechanism>.
'authobject' must be a callable object:
data = authobject(response)
It will be called to process server continuation responses.
It should return data that will be encoded and sent to server.
It should return None if the client abort response '*' should
be sent instead.
"""
mech = mechanism.upper()
# XXX: shouldn't this code be removed, not commented out?
#cap = 'AUTH=%s' % mech
#if not cap in self.capabilities: # Let the server decide!
# raise self.error("Server doesn't allow %s authentication." % mech)
self.literal = _Authenticator(authobject).process
typ, dat = self._simple_command('AUTHENTICATE', mech)
if typ != 'OK':
raise self.error(dat[-1])
self.state = 'AUTH'
return typ, dat
def capability(self):
"""(typ, [data]) = <instance>.capability()
Fetch capabilities list from server."""
name = 'CAPABILITY'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def check(self):
"""Checkpoint mailbox on server.
(typ, [data]) = <instance>.check()
"""
return self._simple_command('CHECK')
def close(self):
"""Close currently selected mailbox.
Deleted messages are removed from writable mailbox.
This is the recommended command before 'LOGOUT'.
(typ, [data]) = <instance>.close()
"""
try:
typ, dat = self._simple_command('CLOSE')
finally:
self.state = 'AUTH'
return typ, dat
def copy(self, message_set, new_mailbox):
"""Copy 'message_set' messages onto end of 'new_mailbox'.
(typ, [data]) = <instance>.copy(message_set, new_mailbox)
"""
return self._simple_command('COPY', message_set, new_mailbox)
def create(self, mailbox):
"""Create new mailbox.
(typ, [data]) = <instance>.create(mailbox)
"""
return self._simple_command('CREATE', mailbox)
def delete(self, mailbox):
"""Delete old mailbox.
(typ, [data]) = <instance>.delete(mailbox)
"""
return self._simple_command('DELETE', mailbox)
def deleteacl(self, mailbox, who):
"""Delete the ACLs (remove any rights) set for who on mailbox.
(typ, [data]) = <instance>.deleteacl(mailbox, who)
"""
return self._simple_command('DELETEACL', mailbox, who)
def expunge(self):
"""Permanently remove deleted items from selected mailbox.
Generates 'EXPUNGE' response for each deleted message.
(typ, [data]) = <instance>.expunge()
'data' is list of 'EXPUNGE'd message numbers in order received.
"""
name = 'EXPUNGE'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def fetch(self, message_set, message_parts):
"""Fetch (parts of) messages.
(typ, [data, ...]) = <instance>.fetch(message_set, message_parts)
'message_parts' should be a string of selected parts
enclosed in parentheses, eg: "(UID BODY[TEXT])".
'data' are tuples of message part envelope and data.
"""
name = 'FETCH'
typ, dat = self._simple_command(name, message_set, message_parts)
return self._untagged_response(typ, dat, name)
def getacl(self, mailbox):
"""Get the ACLs for a mailbox.
(typ, [data]) = <instance>.getacl(mailbox)
"""
typ, dat = self._simple_command('GETACL', mailbox)
return self._untagged_response(typ, dat, 'ACL')
def getannotation(self, mailbox, entry, attribute):
"""(typ, [data]) = <instance>.getannotation(mailbox, entry, attribute)
Retrieve ANNOTATIONs."""
typ, dat = self._simple_command('GETANNOTATION', mailbox, entry, attribute)
return self._untagged_response(typ, dat, 'ANNOTATION')
def getquota(self, root):
"""Get the quota root's resource usage and limits.
Part of the IMAP4 QUOTA extension defined in RFC 2087.
(typ, [data]) = <instance>.getquota(root)
"""
typ, dat = self._simple_command('GETQUOTA', root)
return self._untagged_response(typ, dat, 'QUOTA')
def getquotaroot(self, mailbox):
"""Get the list of quota roots for the named mailbox.
(typ, [[QUOTAROOT responses...], [QUOTA responses]]) = <instance>.getquotaroot(mailbox)
"""
typ, dat = self._simple_command('GETQUOTAROOT', mailbox)
typ, quota = self._untagged_response(typ, dat, 'QUOTA')
typ, quotaroot = self._untagged_response(typ, dat, 'QUOTAROOT')
return typ, [quotaroot, quota]
def list(self, directory='""', pattern='*'):
"""List mailbox names in directory matching pattern.
(typ, [data]) = <instance>.list(directory='""', pattern='*')
'data' is list of LIST responses.
"""
name = 'LIST'
typ, dat = self._simple_command(name, directory, pattern)
return self._untagged_response(typ, dat, name)
def login(self, user, password):
"""Identify client using plaintext password.
(typ, [data]) = <instance>.login(user, password)
NB: 'password' will be quoted.
"""
typ, dat = self._simple_command('LOGIN', user, self._quote(password))
if typ != 'OK':
raise self.error(dat[-1])
self.state = 'AUTH'
return typ, dat
def login_cram_md5(self, user, password):
""" Force use of CRAM-MD5 authentication.
(typ, [data]) = <instance>.login_cram_md5(user, password)
"""
self.user, self.password = user, password
return self.authenticate('CRAM-MD5', self._CRAM_MD5_AUTH)
def _CRAM_MD5_AUTH(self, challenge):
""" Authobject to use with CRAM-MD5 authentication. """
import hmac
return self.user + " " + hmac.HMAC(self.password, challenge).hexdigest()
def logout(self):
"""Shutdown connection to server.
(typ, [data]) = <instance>.logout()
Returns server 'BYE' response.
"""
self.state = 'LOGOUT'
try: typ, dat = self._simple_command('LOGOUT')
except: typ, dat = 'NO', ['%s: %s' % sys.exc_info()[:2]]
self.shutdown()
if 'BYE' in self.untagged_responses:
return 'BYE', self.untagged_responses['BYE']
return typ, dat
def lsub(self, directory='""', pattern='*'):
"""List 'subscribed' mailbox names in directory matching pattern.
(typ, [data, ...]) = <instance>.lsub(directory='""', pattern='*')
'data' are tuples of message part envelope and data.
"""
name = 'LSUB'
typ, dat = self._simple_command(name, directory, pattern)
return self._untagged_response(typ, dat, name)
def myrights(self, mailbox):
"""Show my ACLs for a mailbox (i.e. the rights that I have on mailbox).
(typ, [data]) = <instance>.myrights(mailbox)
"""
typ,dat = self._simple_command('MYRIGHTS', mailbox)
return self._untagged_response(typ, dat, 'MYRIGHTS')
def namespace(self):
""" Returns IMAP namespaces ala rfc2342
(typ, [data, ...]) = <instance>.namespace()
"""
name = 'NAMESPACE'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def noop(self):
"""Send NOOP command.
(typ, [data]) = <instance>.noop()
"""
if __debug__:
if self.debug >= 3:
self._dump_ur(self.untagged_responses)
return self._simple_command('NOOP')
def partial(self, message_num, message_part, start, length):
"""Fetch truncated part of a message.
(typ, [data, ...]) = <instance>.partial(message_num, message_part, start, length)
'data' is tuple of message part envelope and data.
"""
name = 'PARTIAL'
typ, dat = self._simple_command(name, message_num, message_part, start, length)
return self._untagged_response(typ, dat, 'FETCH')
def proxyauth(self, user):
"""Assume authentication as "user".
Allows an authorised administrator to proxy into any user's
mailbox.
(typ, [data]) = <instance>.proxyauth(user)
"""
name = 'PROXYAUTH'
return self._simple_command(name, user)
def rename(self, oldmailbox, newmailbox):
"""Rename old mailbox name to new.
(typ, [data]) = <instance>.rename(oldmailbox, newmailbox)
"""
return self._simple_command('RENAME', oldmailbox, newmailbox)
def search(self, charset, *criteria):
"""Search mailbox for matching messages.
(typ, [data]) = <instance>.search(charset, criterion, ...)
'data' is space separated list of matching message numbers.
"""
name = 'SEARCH'
if charset:
typ, dat = self._simple_command(name, 'CHARSET', charset, *criteria)
else:
typ, dat = self._simple_command(name, *criteria)
return self._untagged_response(typ, dat, name)
def select(self, mailbox='INBOX', readonly=False):
"""Select a mailbox.
Flush all untagged responses.
(typ, [data]) = <instance>.select(mailbox='INBOX', readonly=False)
'data' is count of messages in mailbox ('EXISTS' response).
Mandated responses are ('FLAGS', 'EXISTS', 'RECENT', 'UIDVALIDITY'), so
other responses should be obtained via <instance>.response('FLAGS') etc.
"""
self.untagged_responses = {} # Flush old responses.
self.is_readonly = readonly
if readonly:
name = 'EXAMINE'
else:
name = 'SELECT'
typ, dat = self._simple_command(name, mailbox)
if typ != 'OK':
self.state = 'AUTH' # Might have been 'SELECTED'
return typ, dat
self.state = 'SELECTED'
if 'READ-ONLY' in self.untagged_responses \
and not readonly:
if __debug__:
if self.debug >= 1:
self._dump_ur(self.untagged_responses)
raise self.readonly('%s is not writable' % mailbox)
return typ, self.untagged_responses.get('EXISTS', [None])
def setacl(self, mailbox, who, what):
"""Set a mailbox acl.
(typ, [data]) = <instance>.setacl(mailbox, who, what)
"""
return self._simple_command('SETACL', mailbox, who, what)
def setannotation(self, *args):
"""(typ, [data]) = <instance>.setannotation(mailbox[, entry, attribute]+)
Set ANNOTATIONs."""
typ, dat = self._simple_command('SETANNOTATION', *args)
return self._untagged_response(typ, dat, 'ANNOTATION')
def setquota(self, root, limits):
"""Set the quota root's resource limits.
(typ, [data]) = <instance>.setquota(root, limits)
"""
typ, dat = self._simple_command('SETQUOTA', root, limits)
return self._untagged_response(typ, dat, 'QUOTA')
def sort(self, sort_criteria, charset, *search_criteria):
"""IMAP4rev1 extension SORT command.
(typ, [data]) = <instance>.sort(sort_criteria, charset, search_criteria, ...)
"""
name = 'SORT'
#if not name in self.capabilities: # Let the server decide!
# raise self.error('unimplemented extension command: %s' % name)
if (sort_criteria[0],sort_criteria[-1]) != ('(',')'):
sort_criteria = '(%s)' % sort_criteria
typ, dat = self._simple_command(name, sort_criteria, charset, *search_criteria)
return self._untagged_response(typ, dat, name)
def status(self, mailbox, names):
"""Request named status conditions for mailbox.
(typ, [data]) = <instance>.status(mailbox, names)
"""
name = 'STATUS'
#if self.PROTOCOL_VERSION == 'IMAP4': # Let the server decide!
# raise self.error('%s unimplemented in IMAP4 (obtain IMAP4rev1 server, or re-code)' % name)
typ, dat = self._simple_command(name, mailbox, names)
return self._untagged_response(typ, dat, name)
def store(self, message_set, command, flags):
"""Alters flag dispositions for messages in mailbox.
(typ, [data]) = <instance>.store(message_set, command, flags)
"""
if (flags[0],flags[-1]) != ('(',')'):
flags = '(%s)' % flags # Avoid quoting the flags
typ, dat = self._simple_command('STORE', message_set, command, flags)
return self._untagged_response(typ, dat, 'FETCH')
def subscribe(self, mailbox):
"""Subscribe to new mailbox.
(typ, [data]) = <instance>.subscribe(mailbox)
"""
return self._simple_command('SUBSCRIBE', mailbox)
def thread(self, threading_algorithm, charset, *search_criteria):
"""IMAPrev1 extension THREAD command.
(type, [data]) = <instance>.thread(threading_algorithm, charset, search_criteria, ...)
"""
name = 'THREAD'
typ, dat = self._simple_command(name, threading_algorithm, charset, *search_criteria)
return self._untagged_response(typ, dat, name)
def uid(self, command, *args):
"""Execute "command arg ..." with messages identified by UID,
rather than message number.
(typ, [data]) = <instance>.uid(command, arg1, arg2, ...)
Returns response appropriate to 'command'.
"""
command = command.upper()
if not command in Commands:
raise self.error("Unknown IMAP4 UID command: %s" % command)
if self.state not in Commands[command]:
raise self.error("command %s illegal in state %s, "
"only allowed in states %s" %
(command, self.state,
', '.join(Commands[command])))
name = 'UID'
typ, dat = self._simple_command(name, command, *args)
if command in ('SEARCH', 'SORT', 'THREAD'):
name = command
else:
name = 'FETCH'
return self._untagged_response(typ, dat, name)
def unsubscribe(self, mailbox):
"""Unsubscribe from old mailbox.
(typ, [data]) = <instance>.unsubscribe(mailbox)
"""
return self._simple_command('UNSUBSCRIBE', mailbox)
def xatom(self, name, *args):
"""Allow simple extension commands
notified by server in CAPABILITY response.
Assumes command is legal in current state.
(typ, [data]) = <instance>.xatom(name, arg, ...)
Returns response appropriate to extension command `name'.
"""
name = name.upper()
#if not name in self.capabilities: # Let the server decide!
# raise self.error('unknown extension command: %s' % name)
if not name in Commands:
Commands[name] = (self.state,)
return self._simple_command(name, *args)
# Private methods
def _append_untagged(self, typ, dat):
if dat is None: dat = ''
ur = self.untagged_responses
if __debug__:
if self.debug >= 5:
self._mesg('untagged_responses[%s] %s += ["%s"]' %
(typ, len(ur.get(typ,'')), dat))
if typ in ur:
ur[typ].append(dat)
else:
ur[typ] = [dat]
def _check_bye(self):
bye = self.untagged_responses.get('BYE')
if bye:
raise self.abort(bye[-1])
def _command(self, name, *args):
if self.state not in Commands[name]:
self.literal = None
raise self.error("command %s illegal in state %s, "
"only allowed in states %s" %
(name, self.state,
', '.join(Commands[name])))
for typ in ('OK', 'NO', 'BAD'):
if typ in self.untagged_responses:
del self.untagged_responses[typ]
if 'READ-ONLY' in self.untagged_responses \
and not self.is_readonly:
raise self.readonly('mailbox status changed to READ-ONLY')
tag = self._new_tag()
data = '%s %s' % (tag, name)
for arg in args:
if arg is None: continue
data = '%s %s' % (data, self._checkquote(arg))
literal = self.literal
if literal is not None:
self.literal = None
if type(literal) is type(self._command):
literator = literal
else:
literator = None
data = '%s {%s}' % (data, len(literal))
if __debug__:
if self.debug >= 4:
self._mesg('> %s' % data)
else:
self._log('> %s' % data)
try:
self.send('%s%s' % (data, CRLF))
except (socket.error, OSError), val:
raise self.abort('socket error: %s' % val)
if literal is None:
return tag
while 1:
# Wait for continuation response
while self._get_response():
if self.tagged_commands[tag]: # BAD/NO?
return tag
# Send literal
if literator:
literal = literator(self.continuation_response)
if __debug__:
if self.debug >= 4:
self._mesg('write literal size %s' % len(literal))
try:
self.send(literal)
self.send(CRLF)
except (socket.error, OSError), val:
raise self.abort('socket error: %s' % val)
if not literator:
break
return tag
def _command_complete(self, name, tag):
# BYE is expected after LOGOUT
if name != 'LOGOUT':
self._check_bye()
try:
typ, data = self._get_tagged_response(tag)
except self.abort, val:
raise self.abort('command: %s => %s' % (name, val))
except self.error, val:
raise self.error('command: %s => %s' % (name, val))
if name != 'LOGOUT':
self._check_bye()
if typ == 'BAD':
raise self.error('%s command error: %s %s' % (name, typ, data))
return typ, data
def _get_response(self):
# Read response and store.
#
# Returns None for continuation responses,
# otherwise first response line received.
resp = self._get_line()
# Command completion response?
if self._match(self.tagre, resp):
tag = self.mo.group('tag')
if not tag in self.tagged_commands:
raise self.abort('unexpected tagged response: %s' % resp)
typ = self.mo.group('type')
dat = self.mo.group('data')
self.tagged_commands[tag] = (typ, [dat])
else:
dat2 = None
# '*' (untagged) responses?
if not self._match(Untagged_response, resp):
if self._match(Untagged_status, resp):
dat2 = self.mo.group('data2')
if self.mo is None:
# Only other possibility is '+' (continuation) response...
if self._match(Continuation, resp):
self.continuation_response = self.mo.group('data')
return None # NB: indicates continuation
raise self.abort("unexpected response: '%s'" % resp)
typ = self.mo.group('type')
dat = self.mo.group('data')
if dat is None: dat = '' # Null untagged response
if dat2: dat = dat + ' ' + dat2
# Is there a literal to come?
while self._match(Literal, dat):
# Read literal direct from connection.
size = int(self.mo.group('size'))
if __debug__:
if self.debug >= 4:
self._mesg('read literal size %s' % size)
data = self.read(size)
# Store response with literal as tuple
self._append_untagged(typ, (dat, data))
# Read trailer - possibly containing another literal
dat = self._get_line()
self._append_untagged(typ, dat)
# Bracketed response information?
if typ in ('OK', 'NO', 'BAD') and self._match(Response_code, dat):
self._append_untagged(self.mo.group('type'), self.mo.group('data'))
if __debug__:
if self.debug >= 1 and typ in ('NO', 'BAD', 'BYE'):
self._mesg('%s response: %s' % (typ, dat))
return resp
def _get_tagged_response(self, tag):
while 1:
result = self.tagged_commands[tag]
if result is not None:
del self.tagged_commands[tag]
return result
# If we've seen a BYE at this point, the socket will be
# closed, so report the BYE now.
self._check_bye()
# Some have reported "unexpected response" exceptions.
# Note that ignoring them here causes loops.
# Instead, send me details of the unexpected response and
# I'll update the code in `_get_response()'.
try:
self._get_response()
except self.abort, val:
if __debug__:
if self.debug >= 1:
self.print_log()
raise
def _get_line(self):
line = self.readline()
if not line:
raise self.abort('socket error: EOF')
# Protocol mandates all lines terminated by CRLF
if not line.endswith('\r\n'):
raise self.abort('socket error: unterminated line')
line = line[:-2]
if __debug__:
if self.debug >= 4:
self._mesg('< %s' % line)
else:
self._log('< %s' % line)
return line
def _match(self, cre, s):
# Run compiled regular expression match method on 's'.
# Save result, return success.
self.mo = cre.match(s)
if __debug__:
if self.mo is not None and self.debug >= 5:
self._mesg("\tmatched r'%s' => %r" % (cre.pattern, self.mo.groups()))
return self.mo is not None
def _new_tag(self):
tag = '%s%s' % (self.tagpre, self.tagnum)
self.tagnum = self.tagnum + 1
self.tagged_commands[tag] = None
return tag
def _checkquote(self, arg):
# Must quote command args if non-alphanumeric chars present,
# and not already quoted.
if type(arg) is not type(''):
return arg
if len(arg) >= 2 and (arg[0],arg[-1]) in (('(',')'),('"','"')):
return arg
if arg and self.mustquote.search(arg) is None:
return arg
return self._quote(arg)
def _quote(self, arg):
arg = arg.replace('\\', '\\\\')
arg = arg.replace('"', '\\"')
return '"%s"' % arg
def _simple_command(self, name, *args):
return self._command_complete(name, self._command(name, *args))
def _untagged_response(self, typ, dat, name):
if typ == 'NO':
return typ, dat
if not name in self.untagged_responses:
return typ, [None]
data = self.untagged_responses.pop(name)
if __debug__:
if self.debug >= 5:
self._mesg('untagged_responses[%s] => %s' % (name, data))
return typ, data
if __debug__:
def _mesg(self, s, secs=None):
if secs is None:
secs = time.time()
tm = time.strftime('%M:%S', time.localtime(secs))
sys.stderr.write(' %s.%02d %s\n' % (tm, (secs*100)%100, s))
sys.stderr.flush()
def _dump_ur(self, dict):
# Dump untagged responses (in `dict').
l = dict.items()
if not l: return
t = '\n\t\t'
l = map(lambda x:'%s: "%s"' % (x[0], x[1][0] and '" "'.join(x[1]) or ''), l)
self._mesg('untagged responses dump:%s%s' % (t, t.join(l)))
def _log(self, line):
# Keep log of last `_cmd_log_len' interactions for debugging.
self._cmd_log[self._cmd_log_idx] = (line, time.time())
self._cmd_log_idx += 1
if self._cmd_log_idx >= self._cmd_log_len:
self._cmd_log_idx = 0
def print_log(self):
self._mesg('last %d IMAP4 interactions:' % len(self._cmd_log))
i, n = self._cmd_log_idx, self._cmd_log_len
while n:
try:
self._mesg(*self._cmd_log[i])
except:
pass
i += 1
if i >= self._cmd_log_len:
i = 0
n -= 1
try:
import ssl
except ImportError:
pass
else:
class IMAP4_SSL(IMAP4):
"""IMAP4 client class over SSL connection
Instantiate with: IMAP4_SSL([host[, port[, keyfile[, certfile]]]])
host - host's name (default: localhost);
port - port number (default: standard IMAP4 SSL port).
keyfile - PEM formatted file that contains your private key (default: None);
certfile - PEM formatted certificate chain file (default: None);
for more documentation see the docstring of the parent class IMAP4.
"""
def __init__(self, host = '', port = IMAP4_SSL_PORT, keyfile = None, certfile = None):
self.keyfile = keyfile
self.certfile = certfile
IMAP4.__init__(self, host, port)
def open(self, host = '', port = IMAP4_SSL_PORT):
"""Setup connection to remote server on "host:port".
(default: localhost:standard IMAP4 SSL port).
This connection will be used by the routines:
read, readline, send, shutdown.
"""
self.host = host
self.port = port
self.sock = socket.create_connection((host, port))
self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
self.file = self.sslobj.makefile('rb')
def send(self, data):
"""Send data to remote."""
bytes = len(data)
while bytes > 0:
sent = self.sslobj.write(data)
if sent == bytes:
break # avoid copy
data = data[sent:]
bytes = bytes - sent
def shutdown(self):
"""Close I/O established in "open"."""
self.file.close()
self.sock.close()
def socket(self):
"""Return socket instance used to connect to IMAP4 server.
socket = <instance>.socket()
"""
return self.sock
def ssl(self):
"""Return SSLObject instance used to communicate with the IMAP4 server.
ssl = <instance>.ssl()
"""
return self.sslobj
__all__.append("IMAP4_SSL")
class IMAP4_stream(IMAP4):
"""IMAP4 client class over a stream
Instantiate with: IMAP4_stream(command)
where "command" is a string that can be passed to subprocess.Popen()
for more documentation see the docstring of the parent class IMAP4.
"""
def __init__(self, command):
self.command = command
IMAP4.__init__(self)
def open(self, host = None, port = None):
"""Setup a stream connection.
This connection will be used by the routines:
read, readline, send, shutdown.
"""
self.host = None # For compatibility with parent class
self.port = None
self.sock = None
self.file = None
self.process = subprocess.Popen(self.command,
stdin=subprocess.PIPE, stdout=subprocess.PIPE,
shell=True, close_fds=True)
self.writefile = self.process.stdin
self.readfile = self.process.stdout
def read(self, size):
"""Read 'size' bytes from remote."""
return self.readfile.read(size)
def readline(self):
"""Read line from remote."""
return self.readfile.readline()
def send(self, data):
"""Send data to remote."""
self.writefile.write(data)
self.writefile.flush()
def shutdown(self):
"""Close I/O established in "open"."""
self.readfile.close()
self.writefile.close()
self.process.wait()
class _Authenticator:
"""Private class to provide en/decoding
for base64-based authentication conversation.
"""
def __init__(self, mechinst):
self.mech = mechinst # Callable object to provide/process data
def process(self, data):
ret = self.mech(self.decode(data))
if ret is None:
return '*' # Abort conversation
return self.encode(ret)
def encode(self, inp):
#
# Invoke binascii.b2a_base64 iteratively with
# short even length buffers, strip the trailing
# line feed from the result and append. "Even"
# means a number that factors to both 6 and 8,
# so when it gets to the end of the 8-bit input
# there's no partial 6-bit output.
#
oup = ''
while inp:
if len(inp) > 48:
t = inp[:48]
inp = inp[48:]
else:
t = inp
inp = ''
e = binascii.b2a_base64(t)
if e:
oup = oup + e[:-1]
return oup
def decode(self, inp):
if not inp:
return ''
return binascii.a2b_base64(inp)
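# Worked example (added): a round trip through the private base64 helper.
# The authobject here is a stand-in lambda, not a real SASL mechanism.
def _example_authenticator():
    a = _Authenticator(lambda challenge: 'user pass')
    data = binascii.b2a_base64('server challenge')[:-1]  # as sent by server
    return a.process(data)                               # base64 of 'user pass'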
Mon2num = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
def Internaldate2tuple(resp):
"""Parse an IMAP4 INTERNALDATE string.
Return corresponding local time. The return value is a
time.struct_time instance or None if the string has wrong format.
"""
mo = InternalDate.match(resp)
if not mo:
return None
mon = Mon2num[mo.group('mon')]
zonen = mo.group('zonen')
day = int(mo.group('day'))
year = int(mo.group('year'))
hour = int(mo.group('hour'))
min = int(mo.group('min'))
sec = int(mo.group('sec'))
zoneh = int(mo.group('zoneh'))
zonem = int(mo.group('zonem'))
# INTERNALDATE timezone must be subtracted to get UT
zone = (zoneh*60 + zonem)*60
if zonen == '-':
zone = -zone
tt = (year, mon, day, hour, min, sec, -1, -1, -1)
utc = time.mktime(tt)
# Following is necessary because the time module has no 'mkgmtime'.
# 'mktime' assumes arg in local timezone, so adds timezone/altzone.
lt = time.localtime(utc)
if time.daylight and lt[-1]:
zone = zone + time.altzone
else:
zone = zone + time.timezone
return time.localtime(utc - zone)
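# Worked example (added): parsing a sample INTERNALDATE item as it appears in
# a FETCH response; the date and zone below are made up.
def _example_Internaldate2tuple():
    resp = '5 (INTERNALDATE "24-Jul-2002 12:34:56 +0200")'
    return Internaldate2tuple(resp)  # time.struct_time in local time, or None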
def Int2AP(num):
"""Convert integer to A-P string representation."""
val = ''; AP = 'ABCDEFGHIJKLMNOP'
num = int(abs(num))
while num:
num, mod = divmod(num, 16)
val = AP[mod] + val
return val
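# Worked example (added): Int2AP is base 16 with digits 'A'..'P', most
# significant digit first, which keeps generated command tags alphabetic.
def _example_Int2AP():
    assert Int2AP(4096) == 'BAAA'  # divmod chain: (256,0),(16,0),(1,0),(0,1)
    assert Int2AP(0) == ''         # the loop body never runs for zero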
def ParseFlags(resp):
"""Convert IMAP4 flags response to python tuple."""
mo = Flags.match(resp)
if not mo:
return ()
return tuple(mo.group('flags').split())
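# Worked example (added): a FLAGS list as it appears inside a FETCH line.
def _example_ParseFlags():
    assert ParseFlags(r'1 (FLAGS (\Seen \Deleted))') == ('\\Seen', '\\Deleted')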
def Time2Internaldate(date_time):
"""Convert date_time to IMAP4 INTERNALDATE representation.
Return string in form: '"DD-Mmm-YYYY HH:MM:SS +HHMM"'. The
date_time argument can be a number (int or float) representing
seconds since epoch (as returned by time.time()), a 9-tuple
representing local time (as returned by time.localtime()), or a
double-quoted string. In the last case, it is assumed to already
be in the correct format.
"""
if isinstance(date_time, (int, long, float)):
tt = time.localtime(date_time)
elif isinstance(date_time, (tuple, time.struct_time)):
tt = date_time
elif isinstance(date_time, str) and (date_time[0],date_time[-1]) == ('"','"'):
return date_time # Assume in correct format
else:
raise ValueError("date_time not of a known type")
dt = time.strftime("%d-%b-%Y %H:%M:%S", tt)
if dt[0] == '0':
dt = ' ' + dt[1:]
if time.daylight and tt[-1]:
zone = -time.altzone
else:
zone = -time.timezone
return '"' + dt + " %+03d%02d" % divmod(zone//60, 60) + '"'
if __name__ == '__main__':
# To test: invoke either as 'python imaplib.py [IMAP4_server_hostname]'
# or 'python imaplib.py -s "rsh IMAP4_server_hostname exec /etc/rimapd"'
# to test the IMAP4_stream class
import getopt, getpass
try:
optlist, args = getopt.getopt(sys.argv[1:], 'd:s:')
except getopt.error, val:
optlist, args = (), ()
stream_command = None
for opt,val in optlist:
if opt == '-d':
Debug = int(val)
elif opt == '-s':
stream_command = val
if not args: args = (stream_command,)
if not args: args = ('',)
host = args[0]
USER = getpass.getuser()
PASSWD = getpass.getpass("IMAP password for %s on %s: " % (USER, host or "localhost"))
test_mesg = 'From: %(user)s@localhost%(lf)sSubject: IMAP4 test%(lf)s%(lf)sdata...%(lf)s' % {'user':USER, 'lf':'\n'}
test_seq1 = (
('login', (USER, PASSWD)),
('create', ('/tmp/xxx 1',)),
('rename', ('/tmp/xxx 1', '/tmp/yyy')),
('CREATE', ('/tmp/yyz 2',)),
('append', ('/tmp/yyz 2', None, None, test_mesg)),
('list', ('/tmp', 'yy*')),
('select', ('/tmp/yyz 2',)),
('search', (None, 'SUBJECT', 'test')),
('fetch', ('1', '(FLAGS INTERNALDATE RFC822)')),
('store', ('1', 'FLAGS', '(\Deleted)')),
('namespace', ()),
('expunge', ()),
('recent', ()),
('close', ()),
)
test_seq2 = (
('select', ()),
('response',('UIDVALIDITY',)),
('uid', ('SEARCH', 'ALL')),
('response', ('EXISTS',)),
('append', (None, None, None, test_mesg)),
('recent', ()),
('logout', ()),
)
def run(cmd, args):
M._mesg('%s %s' % (cmd, args))
typ, dat = getattr(M, cmd)(*args)
M._mesg('%s => %s %s' % (cmd, typ, dat))
if typ == 'NO': raise dat[0]
return dat
try:
if stream_command:
M = IMAP4_stream(stream_command)
else:
M = IMAP4(host)
if M.state == 'AUTH':
test_seq1 = test_seq1[1:] # Login not needed
M._mesg('PROTOCOL_VERSION = %s' % M.PROTOCOL_VERSION)
M._mesg('CAPABILITIES = %r' % (M.capabilities,))
for cmd,args in test_seq1:
run(cmd, args)
for ml in run('list', ('/tmp/', 'yy%')):
mo = re.match(r'.*"([^"]+)"$', ml)
if mo: path = mo.group(1)
else: path = ml.split()[-1]
run('delete', (path,))
for cmd,args in test_seq2:
dat = run(cmd, args)
if (cmd,args) != ('uid', ('SEARCH', 'ALL')):
continue
uid = dat[-1].split()
if not uid: continue
run('uid', ('FETCH', '%s' % uid[-1],
'(FLAGS INTERNALDATE RFC822.SIZE RFC822.HEADER RFC822.TEXT)'))
print '\nAll tests OK.'
except:
print '\nTests failed.'
if not Debug:
print '''
If you would like to see debugging output,
try: %s -d5
''' % sys.argv[0]
raise
"""Recognize image file formats based on their first few bytes."""
__all__ = ["what"]
#-------------------------#
# Recognize image headers #
#-------------------------#
def what(file, h=None):
f = None
try:
if h is None:
if isinstance(file, basestring):
f = open(file, 'rb')
h = f.read(32)
else:
location = file.tell()
h = file.read(32)
file.seek(location)
for tf in tests:
res = tf(h, f)
if res:
return res
finally:
if f: f.close()
return None
#---------------------------------#
# Subroutines per image file type #
#---------------------------------#
tests = []
def test_jpeg(h, f):
"""JPEG data in JFIF format"""
if h[6:10] == 'JFIF':
return 'jpeg'
tests.append(test_jpeg)
def test_exif(h, f):
"""JPEG data in Exif format"""
if h[6:10] == 'Exif':
return 'jpeg'
tests.append(test_exif)
def test_png(h, f):
if h[:8] == "\211PNG\r\n\032\n":
return 'png'
tests.append(test_png)
def test_gif(h, f):
"""GIF ('87 and '89 variants)"""
if h[:6] in ('GIF87a', 'GIF89a'):
return 'gif'
tests.append(test_gif)
def test_tiff(h, f):
"""TIFF (can be in Motorola or Intel byte order)"""
if h[:2] in ('MM', 'II'):
return 'tiff'
tests.append(test_tiff)
def test_rgb(h, f):
"""SGI image library"""
if h[:2] == '\001\332':
return 'rgb'
tests.append(test_rgb)
def test_pbm(h, f):
"""PBM (portable bitmap)"""
if len(h) >= 3 and \
h[0] == 'P' and h[1] in '14' and h[2] in ' \t\n\r':
return 'pbm'
tests.append(test_pbm)
def test_pgm(h, f):
"""PGM (portable graymap)"""
if len(h) >= 3 and \
h[0] == 'P' and h[1] in '25' and h[2] in ' \t\n\r':
return 'pgm'
tests.append(test_pgm)
def test_ppm(h, f):
"""PPM (portable pixmap)"""
if len(h) >= 3 and \
h[0] == 'P' and h[1] in '36' and h[2] in ' \t\n\r':
return 'ppm'
tests.append(test_ppm)
def test_rast(h, f):
"""Sun raster file"""
if h[:4] == '\x59\xA6\x6A\x95':
return 'rast'
tests.append(test_rast)
def test_xbm(h, f):
"""X bitmap (X10 or X11)"""
s = '#define '
if h[:len(s)] == s:
return 'xbm'
tests.append(test_xbm)
def test_bmp(h, f):
if h[:2] == 'BM':
return 'bmp'
tests.append(test_bmp)
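# Illustrative sketch (added; not original imghdr behaviour): the `tests`
# list is the extension point -- a checker takes the 32-byte header and the
# open file object.  The WebP magic used here is 'RIFF' at offset 0 and
# 'WEBP' at offset 8.
def test_webp(h, f):
    """WebP inside a RIFF container"""
    if h[:4] == 'RIFF' and h[8:12] == 'WEBP':
        return 'webp'
# tests.append(test_webp)   # uncomment to register the extra checker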
#--------------------#
# Small test program #
#--------------------#
def test():
import sys
recursive = 0
if sys.argv[1:] and sys.argv[1] == '-r':
del sys.argv[1:2]
recursive = 1
try:
if sys.argv[1:]:
testall(sys.argv[1:], recursive, 1)
else:
testall(['.'], recursive, 1)
except KeyboardInterrupt:
sys.stderr.write('\n[Interrupted]\n')
sys.exit(1)
def testall(list, recursive, toplevel):
import sys
import os
for filename in list:
if os.path.isdir(filename):
print filename + '/:',
if recursive or toplevel:
print 'recursing down:'
import glob
names = glob.glob(os.path.join(filename, '*'))
testall(names, recursive, 0)
else:
print '*** directory (use -r) ***'
else:
print filename + ':',
sys.stdout.flush()
try:
print what(filename)
except IOError:
print '*** not found ***'
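# Worked example (added): `what` can be fed raw header bytes via `h`, which
# skips all file handling; passing a pathname works the same way.
def _example_what():
    assert what(None, h='GIF89a' + '\x00' * 26) == 'gif'
    assert what(None, h='BM' + '\x00' * 30) == 'bmp'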
"""Backport of importlib.import_module from 3.x."""
# While not critical (and in no way guaranteed!), it would be nice to keep this
# code compatible with Python 2.3.
import sys
def _resolve_name(name, package, level):
"""Return the absolute name of the module to be imported."""
if not hasattr(package, 'rindex'):
raise ValueError("'package' not set to a string")
dot = len(package)
for x in xrange(level, 1, -1):
try:
dot = package.rindex('.', 0, dot)
except ValueError:
raise ValueError("attempted relative import beyond top-level "
"package")
return "%s.%s" % (package[:dot], name)
def import_module(name, package=None):
"""Import a module.
The 'package' argument is required when performing a relative import. It
specifies the package to use as the anchor point from which to resolve the
relative import to an absolute import.
"""
if name.startswith('.'):
if not package:
raise TypeError("relative imports require the 'package' argument")
level = 0
for character in name:
if character != '.':
break
level += 1
name = _resolve_name(name[level:], package, level)
__import__(name)
return sys.modules[name]
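# Illustrative sketch (added): absolute and relative use of the backport.
# The package names in the comment are placeholders.
def _example_import_module():
    os_path = import_module('os.path')   # absolute import, returns the module
    # The relative form mirrors importlib in 3.x and needs an anchor package:
    #   import_module('.sibling', package='mypkg.sub')
    return os_path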
"""
Import utilities
Exported classes:
ImportManager Manage the import process
Importer Base class for replacing standard import functions
BuiltinImporter Emulate the import mechanism for builtin and frozen modules
DynLoadSuffixImporter  Import suffix-recognized files (e.g. C extensions) via imp.load_module
"""
from warnings import warnpy3k
warnpy3k("the imputil module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
# note: avoid importing non-builtin modules
import imp ### not available in Jython?
import sys
import __builtin__
# for the DirectoryImporter
import struct
import marshal
__all__ = ["ImportManager","Importer","BuiltinImporter"]
_StringType = type('')
_ModuleType = type(sys) ### doesn't work in Jython...
class ImportManager:
"Manage the import process."
def install(self, namespace=vars(__builtin__)):
"Install this ImportManager into the specified namespace."
if isinstance(namespace, _ModuleType):
namespace = vars(namespace)
# Note: we have no notion of "chaining"
# Record the previous import hook, then install our own.
self.previous_importer = namespace['__import__']
self.namespace = namespace
namespace['__import__'] = self._import_hook
### fix this
#namespace['reload'] = self._reload_hook
def uninstall(self):
"Restore the previous import mechanism."
self.namespace['__import__'] = self.previous_importer
def add_suffix(self, suffix, importFunc):
assert hasattr(importFunc, '__call__')
self.fs_imp.add_suffix(suffix, importFunc)
######################################################################
#
# PRIVATE METHODS
#
clsFilesystemImporter = None
def __init__(self, fs_imp=None):
# we're definitely going to be importing something in the future,
# so let's just load the OS-related facilities.
if not _os_stat:
_os_bootstrap()
# This is the Importer that we use for grabbing stuff from the
# filesystem. It defines one more method (import_from_dir) for our use.
if fs_imp is None:
cls = self.clsFilesystemImporter or _FilesystemImporter
fs_imp = cls()
self.fs_imp = fs_imp
# Initialize the set of suffixes that we recognize and import.
# The default will import dynamic-load modules first, followed by
# .py files (or a .py file's cached bytecode)
for desc in imp.get_suffixes():
if desc[2] == imp.C_EXTENSION:
self.add_suffix(desc[0],
DynLoadSuffixImporter(desc).import_file)
self.add_suffix('.py', py_suffix_importer)
def _import_hook(self, fqname, globals=None, locals=None, fromlist=None):
"""Python calls this hook to locate and import a module."""
parts = fqname.split('.')
# determine the context of this import
parent = self._determine_import_context(globals)
# if there is a parent, then its importer should manage this import
if parent:
module = parent.__importer__._do_import(parent, parts, fromlist)
if module:
return module
# has the top module already been imported?
try:
top_module = sys.modules[parts[0]]
except KeyError:
# look for the topmost module
top_module = self._import_top_module(parts[0])
if not top_module:
# the topmost module wasn't found at all.
raise ImportError, 'No module named ' + fqname
# fast-path simple imports
if len(parts) == 1:
if not fromlist:
return top_module
if not top_module.__dict__.get('__ispkg__'):
# __ispkg__ isn't defined (the module was not imported by us),
# or it is zero.
#
# In the former case, there is no way that we could import
# sub-modules that occur in the fromlist (but we can't raise an
# error because it may just be names) because we don't know how
# to deal with packages that were imported by other systems.
#
# In the latter case (__ispkg__ == 0), there can't be any sub-
# modules present, so we can just return.
#
# In both cases, since len(parts) == 1, the top_module is also
# the "bottom" which is the defined return when a fromlist
# exists.
return top_module
importer = top_module.__dict__.get('__importer__')
if importer:
return importer._finish_import(top_module, parts[1:], fromlist)
# Grrr, some people "import os.path" or do "from os.path import ..."
if len(parts) == 2 and hasattr(top_module, parts[1]):
if fromlist:
return getattr(top_module, parts[1])
else:
return top_module
# If the importer does not exist, then we have to bail. A missing
# importer means that something else imported the module, and we have
# no knowledge of how to get sub-modules out of the thing.
raise ImportError, 'No module named ' + fqname
def _determine_import_context(self, globals):
"""Returns the context in which a module should be imported.
The context could be a loaded (package) module and the imported module
will be looked for within that package. The context could also be None,
meaning there is no context -- the module should be looked for as a
"top-level" module.
"""
if not globals or not globals.get('__importer__'):
# globals does not refer to one of our modules or packages. That
# implies there is no relative import context (as far as we are
# concerned), and it should just pick it off the standard path.
return None
# The globals refer to a module or package of ours. It will define
# the context of the new import. Get the module/package fqname.
parent_fqname = globals['__name__']
# if a package is performing the import, then return itself (imports
# refer to pkg contents)
if globals['__ispkg__']:
parent = sys.modules[parent_fqname]
assert globals is parent.__dict__
return parent
i = parent_fqname.rfind('.')
# a module outside of a package has no particular import context
if i == -1:
return None
# if a module in a package is performing the import, then return the
# package (imports refer to siblings)
parent_fqname = parent_fqname[:i]
parent = sys.modules[parent_fqname]
assert parent.__name__ == parent_fqname
return parent
def _import_top_module(self, name):
# scan sys.path looking for a location in the filesystem that contains
# the module, or an Importer object that can import the module.
for item in sys.path:
if isinstance(item, _StringType):
module = self.fs_imp.import_from_dir(item, name)
else:
module = item.import_top(name)
if module:
return module
return None
def _reload_hook(self, module):
"Python calls this hook to reload a module."
# reloading of a module may or may not be possible (depending on the
# importer), but at least we can validate that it's ours to reload
importer = module.__dict__.get('__importer__')
if not importer:
### oops. now what...
pass
# okay. it is using the imputil system, and we must delegate it, but
# we don't know what to do (yet)
### we should blast the module dict and do another get_code(). need to
### flesh this out and add proper docco...
raise SystemError, "reload not yet implemented"
class Importer:
"Base class for replacing standard import functions."
def import_top(self, name):
"Import a top-level module."
return self._import_one(None, name, name)
######################################################################
#
# PRIVATE METHODS
#
def _finish_import(self, top, parts, fromlist):
# if "a.b.c" was provided, then load the ".b.c" portion down from
# below the top-level module.
bottom = self._load_tail(top, parts)
# if the form is "import a.b.c", then return "a"
if not fromlist:
# no fromlist: return the top of the import tree
return top
# the top module was imported by self.
#
# this means that the bottom module was also imported by self (just
# now, or in the past and we fetched it from sys.modules).
#
# since we imported/handled the bottom module, this means that we can
# also handle its fromlist (and reliably use __ispkg__).
# if the bottom node is a package, then (potentially) import some
# modules.
#
# note: if it is not a package, then "fromlist" refers to names in
# the bottom module rather than modules.
# note: for a mix of names and modules in the fromlist, we will
# import all modules and insert those into the namespace of
# the package module. Python will pick up all fromlist names
# from the bottom (package) module; some will be modules that
# we imported and stored in the namespace, others are expected
# to be present already.
if bottom.__ispkg__:
self._import_fromlist(bottom, fromlist)
# if the form is "from a.b import c, d" then return "b"
return bottom
def _import_one(self, parent, modname, fqname):
"Import a single module."
# has the module already been imported?
try:
return sys.modules[fqname]
except KeyError:
pass
# load the module's code, or fetch the module itself
result = self.get_code(parent, modname, fqname)
if result is None:
return None
module = self._process_result(result, fqname)
# insert the module into its parent
if parent:
setattr(parent, modname, module)
return module
def _process_result(self, result, fqname):
ispkg, code, values = result
# did get_code() return an actual module? (rather than a code object)
is_module = isinstance(code, _ModuleType)
# use the returned module, or create a new one to exec code into
if is_module:
module = code
else:
module = imp.new_module(fqname)
### record packages a bit differently??
module.__importer__ = self
module.__ispkg__ = ispkg
# insert additional values into the module (before executing the code)
module.__dict__.update(values)
# the module is almost ready... make it visible
sys.modules[fqname] = module
# execute the code within the module's namespace
if not is_module:
try:
exec code in module.__dict__
except:
if fqname in sys.modules:
del sys.modules[fqname]
raise
# fetch from sys.modules instead of returning module directly.
# also make module's __name__ agree with fqname, in case
# the "exec code in module.__dict__" played games on us.
module = sys.modules[fqname]
module.__name__ = fqname
return module
def _load_tail(self, m, parts):
"""Import the rest of the modules, down from the top-level module.
Returns the last module in the dotted list of modules.
"""
for part in parts:
fqname = "%s.%s" % (m.__name__, part)
m = self._import_one(m, part, fqname)
if not m:
raise ImportError, "No module named " + fqname
return m
def _import_fromlist(self, package, fromlist):
'Import any sub-modules in the "from" list.'
# if '*' is present in the fromlist, then look for the '__all__'
# variable to find additional items (modules) to import.
if '*' in fromlist:
fromlist = list(fromlist) + \
list(package.__dict__.get('__all__', []))
for sub in fromlist:
# if the name is already present, then don't try to import it (it
# might not be a module!).
if sub != '*' and not hasattr(package, sub):
subname = "%s.%s" % (package.__name__, sub)
submod = self._import_one(package, sub, subname)
if not submod:
raise ImportError, "cannot import name " + subname
def _do_import(self, parent, parts, fromlist):
"""Attempt to import the module relative to parent.
This method is used when the import context specifies that <self>
imported the parent module.
"""
top_name = parts[0]
top_fqname = parent.__name__ + '.' + top_name
top_module = self._import_one(parent, top_name, top_fqname)
if not top_module:
# this importer and parent could not find the module (relatively)
return None
return self._finish_import(top_module, parts[1:], fromlist)
######################################################################
#
# METHODS TO OVERRIDE
#
def get_code(self, parent, modname, fqname):
"""Find and retrieve the code for the given module.
parent specifies a parent module to define a context for importing. It
may be None, indicating no particular context for the search.
modname specifies a single module (not dotted) within the parent.
fqname specifies the fully-qualified module name. This is a
(potentially) dotted name from the "root" of the module namespace
down to the modname.
If there is no parent, then modname==fqname.
This method should return None, or a 3-tuple.
* If the module was not found, then None should be returned.
* The first item of the 3-tuple should be the integer 0 or 1,
specifying whether the module that was found is a package or not.
* The second item is the code object for the module (it will be
executed within the new module's namespace). This item can also
be a fully-loaded module object (e.g. loaded from a shared lib).
* The third item is a dictionary of name/value pairs that will be
inserted into new module before the code object is executed. This
is provided in case the module's code expects certain values (such
as where the module was found). When the second item is a module
object, then these names/values will be inserted *after* the module
has been loaded/initialized.
"""
raise RuntimeError, "get_code not implemented"
######################################################################
#
# Some handy stuff for the Importers
#
# byte-compiled file suffix character
_suffix_char = __debug__ and 'c' or 'o'
# byte-compiled file suffix
_suffix = '.py' + _suffix_char
def _compile(pathname, timestamp):
"""Compile (and cache) a Python source file.
The file specified by <pathname> is compiled to a code object and
returned.
Presuming the appropriate privileges exist, the bytecodes will be
saved back to the filesystem for future imports. The source file's
modification timestamp must be provided as a Long value.
"""
codestring = open(pathname, 'rU').read()
if codestring and codestring[-1] != '\n':
codestring = codestring + '\n'
code = __builtin__.compile(codestring, pathname, 'exec')
# try to cache the compiled code
try:
f = open(pathname + _suffix_char, 'wb')
except IOError:
pass
else:
f.write('\0\0\0\0')
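# (Added note: the four zero bytes are a placeholder for the magic number;
# the real magic is written last, after the code has been flushed, so a
# partially written cache file is never mistaken for a valid one.)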
f.write(struct.pack('<I', timestamp))
marshal.dump(code, f)
f.flush()
f.seek(0, 0)
f.write(imp.get_magic())
f.close()
return code
_os_stat = _os_path_join = None
def _os_bootstrap():
"Set up 'os' module replacement functions for use during import bootstrap."
names = sys.builtin_module_names
join = None
if 'posix' in names:
sep = '/'
from posix import stat
elif 'nt' in names:
sep = '\\'
from nt import stat
elif 'dos' in names:
sep = '\\'
from dos import stat
elif 'os2' in names:
sep = '\\'
from os2 import stat
else:
raise ImportError, 'no os specific module found'
if join is None:
def join(a, b, sep=sep):
if a == '':
return b
lastchar = a[-1:]
if lastchar == '/' or lastchar == sep:
return a + b
return a + sep + b
global _os_stat
_os_stat = stat
global _os_path_join
_os_path_join = join
def _os_path_isdir(pathname):
"Local replacement for os.path.isdir()."
try:
s = _os_stat(pathname)
except OSError:
return None
return (s.st_mode & 0170000) == 0040000
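# (Added note: 0170000 is the S_IFMT file-type mask and 0040000 is S_IFDIR,
# so the expression above reproduces stat.S_ISDIR() without importing `stat`.)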
def _timestamp(pathname):
"Return the file modification time as a Long."
try:
s = _os_stat(pathname)
except OSError:
return None
return long(s.st_mtime)
######################################################################
#
# Emulate the import mechanism for builtin and frozen modules
#
class BuiltinImporter(Importer):
def get_code(self, parent, modname, fqname):
if parent:
# these modules definitely do not occur within a package context
return None
# look for the module
if imp.is_builtin(modname):
type = imp.C_BUILTIN
elif imp.is_frozen(modname):
type = imp.PY_FROZEN
else:
# not found
return None
# got it. now load and return it.
module = imp.load_module(modname, None, modname, ('', '', type))
return 0, module, { }
######################################################################
#
# Internal importer used for importing from the filesystem
#
class _FilesystemImporter(Importer):
def __init__(self):
self.suffixes = [ ]
def add_suffix(self, suffix, importFunc):
assert hasattr(importFunc, '__call__')
self.suffixes.append((suffix, importFunc))
def import_from_dir(self, dir, fqname):
result = self._import_pathname(_os_path_join(dir, fqname), fqname)
if result:
return self._process_result(result, fqname)
return None
def get_code(self, parent, modname, fqname):
# This importer is never used with an empty parent. Its existence is
# private to the ImportManager. The ImportManager uses the
# import_from_dir() method to import top-level modules/packages.
# This method is only used when we look for a module within a package.
assert parent
for submodule_path in parent.__path__:
code = self._import_pathname(_os_path_join(submodule_path, modname), fqname)
if code is not None:
return code
return self._import_pathname(_os_path_join(parent.__pkgdir__, modname),
fqname)
def _import_pathname(self, pathname, fqname):
if _os_path_isdir(pathname):
result = self._import_pathname(_os_path_join(pathname, '__init__'),
fqname)
if result:
values = result[2]
values['__pkgdir__'] = pathname
values['__path__'] = [ pathname ]
return 1, result[1], values
return None
for suffix, importFunc in self.suffixes:
filename = pathname + suffix
try:
finfo = _os_stat(filename)
except OSError:
pass
else:
return importFunc(filename, finfo, fqname)
return None
######################################################################
#
# SUFFIX-BASED IMPORTERS
#
def py_suffix_importer(filename, finfo, fqname):
file = filename[:-3] + _suffix
t_py = long(finfo[8])
t_pyc = _timestamp(file)
code = None
if t_pyc is not None and t_pyc >= t_py:
f = open(file, 'rb')
if f.read(4) == imp.get_magic():
t = struct.unpack('<I', f.read(4))[0]
if t == t_py:
code = marshal.load(f)
f.close()
if code is None:
file = filename
code = _compile(file, t_py)
return 0, code, { '__file__' : file }
class DynLoadSuffixImporter:
def __init__(self, desc):
self.desc = desc
def import_file(self, filename, finfo, fqname):
fp = open(filename, self.desc[1])
module = imp.load_module(fqname, fp, filename, self.desc)
module.__file__ = filename
return 0, module, { }
######################################################################
def _print_importers():
items = sys.modules.items()
items.sort()
for name, module in items:
if module:
print name, module.__dict__.get('__importer__', '-- no importer')
else:
print name, '-- non-existent module'
def _test_revamp():
ImportManager().install()
sys.path.insert(0, BuiltinImporter())
######################################################################
#
# TODO
#
# from Finn Bock:
# type(sys) is not a module in Jython. what to use instead?
# imp.C_EXTENSION is not in Jython. same for get_suffixes and new_module
#
# given foo.py of:
# import sys
# sys.modules['foo'] = sys
#
# ---- standard import mechanism
# >>> import foo
# >>> foo
# <module 'sys' (built-in)>
#
# ---- revamped import mechanism
# >>> import imputil
# >>> imputil._test_revamp()
# >>> import foo
# >>> foo
# <module 'foo' from 'foo.py'>
#
#
# from MAL:
# should BuiltinImporter exist in sys.path or hard-wired in ImportManager?
# need __path__ processing
# performance
# move chaining to a subclass [gjs: it's been nuked]
# deinstall should be possible
# query mechanism needed: is a specific Importer installed?
# py/pyc/pyo piping hooks to filter/process these files
# wish list:
# distutils importer hooked to list of standard Internet repositories
# module->file location mapper to speed FS-based imports
# relative imports
# keep chaining so that it can play nice with other import hooks
#
# from Gordon:
# push MAL's mapper into sys.path[0] as a cache (hard-coded for apps)
#
# from Guido:
# need to change sys.* references for rexec environs
# need hook for MAL's walk-me-up import strategy, or Tim's absolute strategy
# watch out for sys.modules[...] is None
# flag to force absolute imports? (speeds _determine_import_context and
# checking for a relative module)
# insert names of archives into sys.path (see quote below)
# note: reload does NOT blast module dict
# shift import mechanisms and policies around; provide for hooks, overrides
# (see quote below)
# add get_source stuff
# get_topcode and get_subcode
# CRLF handling in _compile
# race condition in _compile
# refactoring of os.py to deal with _os_bootstrap problem
# any special handling to do for importing a module with a SyntaxError?
# (e.g. clean up the traceback)
# implement "domain" for path-type functionality using pkg namespace
# (rather than FS-names like __path__)
# don't use the word "private"... maybe "internal"
#
#
# Guido's comments on sys.path caching:
#
# We could cache this in a dictionary: the ImportManager can have a
# cache dict mapping pathnames to importer objects, and a separate
# method for coming up with an importer given a pathname that's not yet
# in the cache. The method should do a stat and/or look at the
# extension to decide which importer class to use; you can register new
# importer classes by registering a suffix or a Boolean function, plus a
# class. If you register a new importer class, the cache is zapped.
# The cache is independent from sys.path (but maintained per
# ImportManager instance) so that rearrangements of sys.path do the
# right thing. If a path is dropped from sys.path the corresponding
# cache entry is simply no longer used.
#
# My/Guido's comments on factoring ImportManager and Importer:
#
# > However, we still have a tension occurring here:
# >
# > 1) implementing policy in ImportManager assists in single-point policy
# > changes for app/rexec situations
# > 2) implementing policy in Importer assists in package-private policy
# > changes for normal, operating conditions
# >
# > I'll see if I can sort out a way to do this. Maybe the Importer class will
# > implement the methods (which can be overridden to change policy) by
# > delegating to ImportManager.
#
# Maybe also think about what kind of policies an Importer would be
# likely to want to change. I have a feeling that a lot of the code
# there is actually not so much policy but a *necessity* to get things
# working given the calling conventions for the __import__ hook: whether
# to return the head or tail of a dotted name, or when to do the "finish
# fromlist" stuff.
#
# -*- coding: iso-8859-1 -*-
"""Get useful information from live Python objects.
This module encapsulates the interface provided by the internal special
attributes (func_*, co_*, im_*, tb_*, etc.) in a friendlier fashion.
It also provides some help for examining source code and class layout.
Here are some of the useful functions provided by this module:
ismodule(), isclass(), ismethod(), isfunction(), isgeneratorfunction(),
isgenerator(), istraceback(), isframe(), iscode(), isbuiltin(),
isroutine() - check object types
getmembers() - get members of an object that satisfy a given condition
getfile(), getsourcefile(), getsource() - find an object's source code
getdoc(), getcomments() - get documentation on an object
getmodule() - determine the module that an object came from
getclasstree() - arrange classes so as to represent their hierarchy
getargspec(), getargvalues(), getcallargs() - get info about function arguments
formatargspec(), formatargvalues() - format an argument spec
getouterframes(), getinnerframes() - get info about frames
currentframe() - get the current stack frame
stack(), trace() - get info about frames on the stack or in a traceback
"""
# This module is in the public domain. No warranties.
__author__ = 'Ka-Ping Yee <[email protected]>'
__date__ = '1 Jan 2001'
import sys
import os
import types
import string
import re
import dis
import imp
import tokenize
import linecache
from operator import attrgetter
from collections import namedtuple
# These constants are from Include/code.h.
CO_OPTIMIZED, CO_NEWLOCALS, CO_VARARGS, CO_VARKEYWORDS = 0x1, 0x2, 0x4, 0x8
CO_NESTED, CO_GENERATOR, CO_NOFREE = 0x10, 0x20, 0x40
# See Include/object.h
TPFLAGS_IS_ABSTRACT = 1 << 20
# ----------------------------------------------------------- type-checking
def ismodule(object):
"""Return true if the object is a module.
Module objects provide these attributes:
__doc__ documentation string
__file__ filename (missing for built-in modules)"""
return isinstance(object, types.ModuleType)
def isclass(object):
"""Return true if the object is a class.
Class objects provide these attributes:
__doc__ documentation string
__module__ name of module in which this class was defined"""
return isinstance(object, (type, types.ClassType))
def ismethod(object):
"""Return true if the object is an instance method.
Instance method objects provide these attributes:
__doc__ documentation string
__name__ name with which this method was defined
im_class class object in which this method belongs
im_func function object containing implementation of method
im_self instance to which this method is bound, or None"""
return isinstance(object, types.MethodType)
def ismethoddescriptor(object):
"""Return true if the object is a method descriptor.
But not if ismethod() or isclass() or isfunction() are true.
This is new in Python 2.2, and, for example, is true of int.__add__.
An object passing this test has a __get__ attribute but not a __set__
attribute, but beyond that the set of attributes varies. __name__ is
usually sensible, and __doc__ often is.
Methods implemented via descriptors that also pass one of the other
tests return false from the ismethoddescriptor() test, simply because
the other tests promise more -- you can, e.g., count on having the
im_func attribute (etc) when an object passes ismethod()."""
return (hasattr(object, "__get__")
and not hasattr(object, "__set__") # else it's a data descriptor
and not ismethod(object) # mutual exclusion
and not isfunction(object)
and not isclass(object))
def isdatadescriptor(object):
"""Return true if the object is a data descriptor.
Data descriptors have both a __get__ and a __set__ attribute. Examples are
properties (defined in Python) and getsets and members (defined in C).
Typically, data descriptors will also have __name__ and __doc__ attributes
(properties, getsets, and members have both of these attributes), but this
is not guaranteed."""
return (hasattr(object, "__set__") and hasattr(object, "__get__"))
if hasattr(types, 'MemberDescriptorType'):
# CPython and equivalent
def ismemberdescriptor(object):
"""Return true if the object is a member descriptor.
Member descriptors are specialized descriptors defined in extension
modules."""
return isinstance(object, types.MemberDescriptorType)
else:
# Other implementations
def ismemberdescriptor(object):
"""Return true if the object is a member descriptor.
Member descriptors are specialized descriptors defined in extension
modules."""
return False
if hasattr(types, 'GetSetDescriptorType'):
# CPython and equivalent
def isgetsetdescriptor(object):
"""Return true if the object is a getset descriptor.
getset descriptors are specialized descriptors defined in extension
modules."""
return isinstance(object, types.GetSetDescriptorType)
else:
# Other implementations
def isgetsetdescriptor(object):
"""Return true if the object is a getset descriptor.
getset descriptors are specialized descriptors defined in extension
modules."""
return False
def isfunction(object):
"""Return true if the object is a user-defined function.
Function objects provide these attributes:
__doc__ documentation string
__name__ name with which this function was defined
func_code code object containing compiled function bytecode
func_defaults tuple of any default values for arguments
func_doc (same as __doc__)
func_globals global namespace in which this function was defined
func_name (same as __name__)"""
return isinstance(object, types.FunctionType)
def isgeneratorfunction(object):
"""Return true if the object is a user-defined generator function.
Generator function objects provide the same attributes as functions.
See help(isfunction) for a list of attributes."""
return bool((isfunction(object) or ismethod(object)) and
object.func_code.co_flags & CO_GENERATOR)
def isgenerator(object):
"""Return true if the object is a generator.
Generator objects provide these attributes:
__iter__ defined to support iteration over container
close raises a new GeneratorExit exception inside the
generator to terminate the iteration
gi_code code object
gi_frame frame object or possibly None once the generator has
been exhausted
gi_running set to 1 when generator is executing, 0 otherwise
next return the next item from the container
send resumes the generator and "sends" a value that becomes
the result of the current yield-expression
throw used to raise an exception inside the generator"""
return isinstance(object, types.GeneratorType)
def istraceback(object):
"""Return true if the object is a traceback.
Traceback objects provide these attributes:
tb_frame frame object at this level
tb_lasti index of last attempted instruction in bytecode
tb_lineno current line number in Python source code
tb_next next inner traceback object (called by this level)"""
return isinstance(object, types.TracebackType)
def isframe(object):
"""Return true if the object is a frame object.
Frame objects provide these attributes:
f_back next outer frame object (this frame's caller)
f_builtins built-in namespace seen by this frame
f_code code object being executed in this frame
f_exc_traceback traceback if raised in this frame, or None
f_exc_type exception type if raised in this frame, or None
f_exc_value exception value if raised in this frame, or None
f_globals global namespace seen by this frame
f_lasti index of last attempted instruction in bytecode
f_lineno current line number in Python source code
f_locals local namespace seen by this frame
f_restricted 0 or 1 if frame is in restricted execution mode
f_trace tracing function for this frame, or None"""
return isinstance(object, types.FrameType)
def iscode(object):
"""Return true if the object is a code object.
Code objects provide these attributes:
co_argcount number of arguments (not including * or ** args)
co_code string of raw compiled bytecode
co_consts tuple of constants used in the bytecode
co_filename name of file in which this code object was created
co_firstlineno number of first line in Python source code
co_flags bitmap: 1=optimized | 2=newlocals | 4=*arg | 8=**arg
co_lnotab encoded mapping of line numbers to bytecode indices
co_name name with which this code object was defined
co_names tuple of names other than arguments and function locals
co_nlocals number of local variables
co_stacksize virtual machine stack space required
co_varnames tuple of names of arguments and local variables"""
return isinstance(object, types.CodeType)
def isbuiltin(object):
"""Return true if the object is a built-in function or method.
Built-in functions and methods provide these attributes:
__doc__ documentation string
__name__ original name of this function or method
__self__ instance to which a method is bound, or None"""
return isinstance(object, types.BuiltinFunctionType)
def isroutine(object):
"""Return true if the object is any kind of function or method."""
return (isbuiltin(object)
or isfunction(object)
or ismethod(object)
or ismethoddescriptor(object))
def isabstract(object):
"""Return true if the object is an abstract base class (ABC)."""
return bool(isinstance(object, type) and object.__flags__ & TPFLAGS_IS_ABSTRACT)
def getmembers(object, predicate=None):
"""Return all members of an object as (name, value) pairs sorted by name.
Optionally, only return members that satisfy a given predicate."""
results = []
for key in dir(object):
try:
value = getattr(object, key)
except AttributeError:
continue
if not predicate or predicate(value):
results.append((key, value))
results.sort()
return results
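# Example (illustrative, not part of the stdlib source): getmembers() pairs
# naturally with the is*() predicates above, e.g. to list a module's
# functions:
#
#     import inspect
#     for name, func in inspect.getmembers(inspect, inspect.isfunction):
#         print name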
Attribute = namedtuple('Attribute', 'name kind defining_class object')
def classify_class_attrs(cls):
"""Return list of attribute-descriptor tuples.
For each name in dir(cls), the return list contains a 4-tuple
with these elements:
0. The name (a string).
1. The kind of attribute this is, one of these strings:
'class method' created via classmethod()
'static method' created via staticmethod()
'property' created via property()
'method' any other flavor of method
'data' not a method
2. The class which defined this attribute (a class).
3. The object as obtained directly from the defining class's
__dict__, not via getattr. This is especially important for
data attributes: C.data is just a data object, but
C.__dict__['data'] may be a data descriptor with additional
info, like a __doc__ string.
"""
mro = getmro(cls)
names = dir(cls)
result = []
for name in names:
# Get the object associated with the name, and where it was defined.
# Getting an obj from the __dict__ sometimes reveals more than
# using getattr. Static and class methods are dramatic examples.
# Furthermore, some objects may raise an Exception when fetched with
# getattr(). This is the case with some descriptors (bug #1785).
# Thus, we only use getattr() as a last resort.
homecls = None
for base in (cls,) + mro:
if name in base.__dict__:
obj = base.__dict__[name]
homecls = base
break
else:
obj = getattr(cls, name)
homecls = getattr(obj, "__objclass__", homecls)
# Classify the object.
if isinstance(obj, staticmethod):
kind = "static method"
elif isinstance(obj, classmethod):
kind = "class method"
elif isinstance(obj, property):
kind = "property"
elif ismethoddescriptor(obj):
kind = "method"
elif isdatadescriptor(obj):
kind = "data"
else:
obj_via_getattr = getattr(cls, name)
if (ismethod(obj_via_getattr) or
ismethoddescriptor(obj_via_getattr)):
kind = "method"
else:
kind = "data"
obj = obj_via_getattr
result.append(Attribute(name, kind, homecls, obj))
return result
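# Example (illustrative, not part of the stdlib source):
# classify_class_attrs() reports how an attribute was created, which plain
# getattr() cannot distinguish:
#
#     import inspect
#     class C(object):
#         @staticmethod
#         def s(): pass
#         @classmethod
#         def c(cls): pass
#         def m(self): pass
#         p = property(lambda self: 0)
#     kinds = dict((a.name, a.kind) for a in inspect.classify_class_attrs(C))
#     # kinds['s'] == 'static method', kinds['c'] == 'class method',
#     # kinds['m'] == 'method',        kinds['p'] == 'property'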
# ----------------------------------------------------------- class helpers
def _searchbases(cls, accum):
# Simulate the "classic class" search order.
if cls in accum:
return
accum.append(cls)
for base in cls.__bases__:
_searchbases(base, accum)
def getmro(cls):
"Return tuple of base classes (including cls) in method resolution order."
if hasattr(cls, "__mro__"):
return cls.__mro__
else:
result = []
_searchbases(cls, result)
return tuple(result)
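# Example (illustrative, not part of the stdlib source): getmro() works for
# both new-style classes (via __mro__) and classic classes (via the
# recursive base search above):
#
#     import inspect
#     class A(object): pass
#     class B(A): pass
#     inspect.getmro(B)    # -> (<class B>, <class A>, <type 'object'>)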
# -------------------------------------------------- source code extraction
def indentsize(line):
"""Return the indent size, in spaces, at the start of a line of text."""
expline = string.expandtabs(line)
return len(expline) - len(string.lstrip(expline))
def getdoc(object):
"""Get the documentation string for an object.
All tabs are expanded to spaces. To clean up docstrings that are
indented to line up with blocks of code, any whitespace that can be
uniformly removed from the second line onwards is removed."""
try:
doc = object.__doc__
except AttributeError:
return None
if not isinstance(doc, types.StringTypes):
return None
return cleandoc(doc)
def cleandoc(doc):
"""Clean up indentation from docstrings.
Any whitespace that can be uniformly removed from the second line
onwards is removed."""
try:
lines = string.split(string.expandtabs(doc), '\n')
except UnicodeError:
return None
else:
# Find minimum indentation of any non-blank lines after first line.
margin = sys.maxint
for line in lines[1:]:
content = len(string.lstrip(line))
if content:
indent = len(line) - content
margin = min(margin, indent)
# Remove indentation.
if lines:
lines[0] = lines[0].lstrip()
if margin < sys.maxint:
for i in range(1, len(lines)): lines[i] = lines[i][margin:]
# Remove any trailing or leading blank lines.
while lines and not lines[-1]:
lines.pop()
while lines and not lines[0]:
lines.pop(0)
return string.join(lines, '\n')
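# Example (illustrative, not part of the stdlib source): cleandoc() removes
# the uniform indentation that a triple-quoted docstring inherits from the
# surrounding code:
#
#     import inspect
#     doc = "First line.\n        Indented continuation."
#     inspect.cleandoc(doc)    # -> 'First line.\nIndented continuation.'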
def getfile(object):
"""Work out which source or compiled file an object was defined in."""
if ismodule(object):
if hasattr(object, '__file__'):
return object.__file__
raise TypeError('{!r} is a built-in module'.format(object))
if isclass(object):
object = sys.modules.get(object.__module__)
if hasattr(object, '__file__'):
return object.__file__
raise TypeError('{!r} is a built-in class'.format(object))
if ismethod(object):
object = object.im_func
if isfunction(object):
object = object.func_code
if istraceback(object):
object = object.tb_frame
if isframe(object):
object = object.f_code
if iscode(object):
return object.co_filename
raise TypeError('{!r} is not a module, class, method, '
'function, traceback, frame, or code object'.format(object))
ModuleInfo = namedtuple('ModuleInfo', 'name suffix mode module_type')
def getmoduleinfo(path):
"""Get the module name, suffix, mode, and module type for a given file."""
filename = os.path.basename(path)
suffixes = map(lambda info:
(-len(info[0]), info[0], info[1], info[2]),
imp.get_suffixes())
suffixes.sort() # try longest suffixes first, in case they overlap
for neglen, suffix, mode, mtype in suffixes:
if filename[neglen:] == suffix:
return ModuleInfo(filename[:neglen], suffix, mode, mtype)
def getmodulename(path):
"""Return the module name for a given file, or None."""
info = getmoduleinfo(path)
if info: return info[0]
def getsourcefile(object):
"""Return the filename that can be used to locate an object's source.
Return None if no way can be identified to get the source.
"""
filename = getfile(object)
if string.lower(filename[-4:]) in ('.pyc', '.pyo'):
filename = filename[:-4] + '.py'
for suffix, mode, kind in imp.get_suffixes():
if 'b' in mode and string.lower(filename[-len(suffix):]) == suffix:
# Looks like a binary file. We want to only return a text file.
return None
if os.path.exists(filename):
return filename
# only return a non-existent filename if the module has a PEP 302 loader
if hasattr(getmodule(object, filename), '__loader__'):
return filename
# or it is in the linecache
if filename in linecache.cache:
return filename
def getabsfile(object, _filename=None):
"""Return an absolute path to the source or compiled file for an object.
The idea is for each object to have a unique origin, so this routine
normalizes the result as much as possible."""
if _filename is None:
_filename = getsourcefile(object) or getfile(object)
return os.path.normcase(os.path.abspath(_filename))
modulesbyfile = {}
_filesbymodname = {}
def getmodule(object, _filename=None):
"""Return the module an object was defined in, or None if not found."""
if ismodule(object):
return object
if hasattr(object, '__module__'):
return sys.modules.get(object.__module__)
# Try the filename to modulename cache
if _filename is not None and _filename in modulesbyfile:
return sys.modules.get(modulesbyfile[_filename])
# Try the cache again with the absolute file name
try:
file = getabsfile(object, _filename)
except TypeError:
return None
if file in modulesbyfile:
return sys.modules.get(modulesbyfile[file])
# Update the filename to module name cache and check yet again
# Copy sys.modules in order to cope with changes while iterating
for modname, module in sys.modules.items():
if ismodule(module) and hasattr(module, '__file__'):
f = module.__file__
if f == _filesbymodname.get(modname, None):
# Have already mapped this module, so skip it
continue
_filesbymodname[modname] = f
f = getabsfile(module)
# Always map to the name the module knows itself by
modulesbyfile[f] = modulesbyfile[
os.path.realpath(f)] = module.__name__
if file in modulesbyfile:
return sys.modules.get(modulesbyfile[file])
# Check the main module
main = sys.modules['__main__']
if not hasattr(object, '__name__'):
return None
if hasattr(main, object.__name__):
mainobject = getattr(main, object.__name__)
if mainobject is object:
return main
# Check builtins
builtin = sys.modules['__builtin__']
if hasattr(builtin, object.__name__):
builtinobject = getattr(builtin, object.__name__)
if builtinobject is object:
return builtin
def findsource(object):
"""Return the entire source file and starting line number for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a list of all the lines
in the file and the line number indexes a line in that list. An IOError
is raised if the source code cannot be retrieved."""
file = getfile(object)
sourcefile = getsourcefile(object)
if not sourcefile and file[:1] + file[-1:] != '<>':
raise IOError('source code not available')
file = sourcefile if sourcefile else file
module = getmodule(object, file)
if module:
lines = linecache.getlines(file, module.__dict__)
else:
lines = linecache.getlines(file)
if not lines:
raise IOError('could not get source code')
if ismodule(object):
return lines, 0
if isclass(object):
name = object.__name__
pat = re.compile(r'^(\s*)class\s*' + name + r'\b')
# make some effort to find the best matching class definition:
# use the one with the least indentation, which is the one
# that's most probably not inside a function definition.
candidates = []
for i in range(len(lines)):
match = pat.match(lines[i])
if match:
# if it's at toplevel, it's already the best one
if lines[i][0] == 'c':
return lines, i
# else add whitespace to candidate list
candidates.append((match.group(1), i))
if candidates:
# this will sort by whitespace, and by line number,
# less whitespace first
candidates.sort()
return lines, candidates[0][1]
else:
raise IOError('could not find class definition')
if ismethod(object):
object = object.im_func
if isfunction(object):
object = object.func_code
if istraceback(object):
object = object.tb_frame
if isframe(object):
object = object.f_code
if iscode(object):
if not hasattr(object, 'co_firstlineno'):
raise IOError('could not find function definition')
lnum = object.co_firstlineno - 1
pat = re.compile(r'^(\s*def\s)|(.*(?<!\w)lambda(:|\s))|^(\s*@)')
while lnum > 0:
if pat.match(lines[lnum]): break
lnum = lnum - 1
return lines, lnum
raise IOError('could not find code object')
def getcomments(object):
"""Get lines of comments immediately preceding an object's source code.
Returns None when source can't be found.
"""
try:
lines, lnum = findsource(object)
except (IOError, TypeError):
return None
if ismodule(object):
# Look for a comment block at the top of the file.
start = 0
if lines and lines[0][:2] == '#!': start = 1
while start < len(lines) and string.strip(lines[start]) in ('', '#'):
start = start + 1
if start < len(lines) and lines[start][:1] == '#':
comments = []
end = start
while end < len(lines) and lines[end][:1] == '#':
comments.append(string.expandtabs(lines[end]))
end = end + 1
return string.join(comments, '')
# Look for a preceding block of comments at the same indentation.
elif lnum > 0:
indent = indentsize(lines[lnum])
end = lnum - 1
if end >= 0 and string.lstrip(lines[end])[:1] == '#' and \
indentsize(lines[end]) == indent:
comments = [string.lstrip(string.expandtabs(lines[end]))]
if end > 0:
end = end - 1
comment = string.lstrip(string.expandtabs(lines[end]))
while comment[:1] == '#' and indentsize(lines[end]) == indent:
comments[:0] = [comment]
end = end - 1
if end < 0: break
comment = string.lstrip(string.expandtabs(lines[end]))
while comments and string.strip(comments[0]) == '#':
comments[:1] = []
while comments and string.strip(comments[-1]) == '#':
comments[-1:] = []
return string.join(comments, '')
class EndOfBlock(Exception): pass
class BlockFinder:
"""Provide a tokeneater() method to detect the end of a code block."""
def __init__(self):
self.indent = 0
self.islambda = False
self.started = False
self.passline = False
self.last = 1
def tokeneater(self, type, token, srow_scol, erow_ecol, line):
srow, scol = srow_scol
erow, ecol = erow_ecol
if not self.started:
# look for the first "def", "class" or "lambda"
if token in ("def", "class", "lambda"):
if token == "lambda":
self.islambda = True
self.started = True
self.passline = True # skip to the end of the line
elif type == tokenize.NEWLINE:
self.passline = False # stop skipping when a NEWLINE is seen
self.last = srow
if self.islambda: # lambdas always end at the first NEWLINE
raise EndOfBlock
elif self.passline:
pass
elif type == tokenize.INDENT:
self.indent = self.indent + 1
self.passline = True
elif type == tokenize.DEDENT:
self.indent = self.indent - 1
# the end of matching indent/dedent pairs ends a block
# (note that this only works for "def"/"class" blocks,
# not e.g. for "if: else:" or "try: finally:" blocks)
if self.indent <= 0:
raise EndOfBlock
elif self.indent == 0 and type not in (tokenize.COMMENT, tokenize.NL):
# any other token on the same indentation level ends the previous
# block as well, except the pseudo-tokens COMMENT and NL.
raise EndOfBlock
def getblock(lines):
"""Extract the block of code at the top of the given list of lines."""
blockfinder = BlockFinder()
try:
tokenize.tokenize(iter(lines).next, blockfinder.tokeneater)
except (EndOfBlock, IndentationError):
pass
return lines[:blockfinder.last]
def getsourcelines(object):
"""Return a list of source lines and starting line number for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a list of the lines
corresponding to the object and the line number indicates where in the
original source file the first line of code was found. An IOError is
raised if the source code cannot be retrieved."""
lines, lnum = findsource(object)
if istraceback(object):
object = object.tb_frame
# for module or frame that corresponds to module, return all source lines
if (ismodule(object) or
(isframe(object) and object.f_code.co_name == "<module>")):
return lines, 0
else:
return getblock(lines[lnum:]), lnum + 1
def getsource(object):
"""Return the text of the source code for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a single string. An
IOError is raised if the source code cannot be retrieved."""
lines, lnum = getsourcelines(object)
return string.join(lines, '')
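# Example (illustrative, not part of the stdlib source): getsource()
# composes findsource() with the tokenizer-driven getblock() above.  For an
# object defined in a source file:
#
#     import inspect
#     print inspect.getsource(inspect.getdoc)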
# --------------------------------------------------- class tree extraction
def walktree(classes, children, parent):
"""Recursive helper function for getclasstree()."""
results = []
classes.sort(key=attrgetter('__module__', '__name__'))
for c in classes:
results.append((c, c.__bases__))
if c in children:
results.append(walktree(children[c], children, c))
return results
def getclasstree(classes, unique=0):
"""Arrange the given list of classes into a hierarchy of nested lists.
Where a nested list appears, it contains classes derived from the class
whose entry immediately precedes the list. Each entry is a 2-tuple
containing a class and a tuple of its base classes. If the 'unique'
argument is true, exactly one entry appears in the returned structure
for each class in the given list. Otherwise, classes using multiple
inheritance and their descendants will appear multiple times."""
children = {}
roots = []
for c in classes:
if c.__bases__:
for parent in c.__bases__:
if not parent in children:
children[parent] = []
if c not in children[parent]:
children[parent].append(c)
if unique and parent in classes: break
elif c not in roots:
roots.append(c)
for parent in children:
if parent not in classes:
roots.append(parent)
return walktree(roots, children, None)
# ------------------------------------------------ argument list extraction
Arguments = namedtuple('Arguments', 'args varargs keywords')
def getargs(co):
"""Get information about the arguments accepted by a code object.
Three things are returned: (args, varargs, varkw), where 'args' is
a list of argument names (possibly containing nested lists), and
'varargs' and 'varkw' are the names of the * and ** arguments or None."""
if not iscode(co):
raise TypeError('{!r} is not a code object'.format(co))
nargs = co.co_argcount
names = co.co_varnames
args = list(names[:nargs])
step = 0
# The following acrobatics are for anonymous (tuple) arguments.
for i in range(nargs):
if args[i][:1] in ('', '.'):
stack, remain, count = [], [], []
while step < len(co.co_code):
op = ord(co.co_code[step])
step = step + 1
if op >= dis.HAVE_ARGUMENT:
opname = dis.opname[op]
value = ord(co.co_code[step]) + ord(co.co_code[step+1])*256
step = step + 2
if opname in ('UNPACK_TUPLE', 'UNPACK_SEQUENCE'):
remain.append(value)
count.append(value)
elif opname in ('STORE_FAST', 'STORE_DEREF'):
if opname == 'STORE_FAST':
stack.append(names[value])
else:
stack.append(co.co_cellvars[value])
# Special case for sublists of length 1: def foo((bar))
# doesn't generate the UNPACK_TUPLE bytecode, so if
# `remain` is empty here, we have such a sublist.
if not remain:
stack[0] = [stack[0]]
break
else:
remain[-1] = remain[-1] - 1
while remain[-1] == 0:
remain.pop()
size = count.pop()
stack[-size:] = [stack[-size:]]
if not remain: break
remain[-1] = remain[-1] - 1
if not remain: break
args[i] = stack[0]
varargs = None
if co.co_flags & CO_VARARGS:
varargs = co.co_varnames[nargs]
nargs = nargs + 1
varkw = None
if co.co_flags & CO_VARKEYWORDS:
varkw = co.co_varnames[nargs]
return Arguments(args, varargs, varkw)
ArgSpec = namedtuple('ArgSpec', 'args varargs keywords defaults')
def getargspec(func):
"""Get the names and default values of a function's arguments.
A tuple of four things is returned: (args, varargs, varkw, defaults).
'args' is a list of the argument names (it may contain nested lists).
'varargs' and 'varkw' are the names of the * and ** arguments or None.
'defaults' is an n-tuple of the default values of the last n arguments.
"""
if ismethod(func):
func = func.im_func
if not isfunction(func):
raise TypeError('{!r} is not a Python function'.format(func))
args, varargs, varkw = getargs(func.func_code)
return ArgSpec(args, varargs, varkw, func.func_defaults)
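# Example (illustrative, not part of the stdlib source):
#
#     import inspect
#     def f(a, b=1, *args, **kw): pass
#     inspect.getargspec(f)
#     # -> ArgSpec(args=['a', 'b'], varargs='args', keywords='kw',
#     #            defaults=(1,))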
ArgInfo = namedtuple('ArgInfo', 'args varargs keywords locals')
def getargvalues(frame):
"""Get information about arguments passed into a particular frame.
A tuple of four things is returned: (args, varargs, varkw, locals).
'args' is a list of the argument names (it may contain nested lists).
'varargs' and 'varkw' are the names of the * and ** arguments or None.
'locals' is the locals dictionary of the given frame."""
args, varargs, varkw = getargs(frame.f_code)
return ArgInfo(args, varargs, varkw, frame.f_locals)
def joinseq(seq):
if len(seq) == 1:
return '(' + seq[0] + ',)'
else:
return '(' + string.join(seq, ', ') + ')'
def strseq(object, convert, join=joinseq):
"""Recursively walk a sequence, stringifying each element."""
if type(object) in (list, tuple):
return join(map(lambda o, c=convert, j=join: strseq(o, c, j), object))
else:
return convert(object)
def formatargspec(args, varargs=None, varkw=None, defaults=None,
formatarg=str,
formatvarargs=lambda name: '*' + name,
formatvarkw=lambda name: '**' + name,
formatvalue=lambda value: '=' + repr(value),
join=joinseq):
"""Format an argument spec from the 4 values returned by getargspec.
The first four arguments are (args, varargs, varkw, defaults). The
other four arguments are the corresponding optional formatting functions
that are called to turn names and values into strings. The ninth
argument is an optional function to format the sequence of arguments."""
specs = []
if defaults:
firstdefault = len(args) - len(defaults)
for i, arg in enumerate(args):
spec = strseq(arg, formatarg, join)
if defaults and i >= firstdefault:
spec = spec + formatvalue(defaults[i - firstdefault])
specs.append(spec)
if varargs is not None:
specs.append(formatvarargs(varargs))
if varkw is not None:
specs.append(formatvarkw(varkw))
return '(' + string.join(specs, ', ') + ')'
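# Example (illustrative, not part of the stdlib source): formatargspec()
# turns a getargspec() result back into source-like text:
#
#     import inspect
#     def f(a, b=1, *args, **kw): pass
#     inspect.formatargspec(*inspect.getargspec(f))
#     # -> '(a, b=1, *args, **kw)'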
def formatargvalues(args, varargs, varkw, locals,
formatarg=str,
formatvarargs=lambda name: '*' + name,
formatvarkw=lambda name: '**' + name,
formatvalue=lambda value: '=' + repr(value),
join=joinseq):
"""Format an argument spec from the 4 values returned by getargvalues.
The first four arguments are (args, varargs, varkw, locals). The
next four arguments are the corresponding optional formatting functions
that are called to turn names and values into strings. The ninth
argument is an optional function to format the sequence of arguments."""
def convert(name, locals=locals,
formatarg=formatarg, formatvalue=formatvalue):
return formatarg(name) + formatvalue(locals[name])
specs = []
for i in range(len(args)):
specs.append(strseq(args[i], convert, join))
if varargs:
specs.append(formatvarargs(varargs) + formatvalue(locals[varargs]))
if varkw:
specs.append(formatvarkw(varkw) + formatvalue(locals[varkw]))
return '(' + string.join(specs, ', ') + ')'
def getcallargs(func, *positional, **named):
"""Get the mapping of arguments to values.
A dict is returned, with keys the function argument names (including the
names of the * and ** arguments, if any), and values the respective bound
values from 'positional' and 'named'."""
args, varargs, varkw, defaults = getargspec(func)
f_name = func.__name__
arg2value = {}
# The following closures exist chiefly to handle tuple parameter unpacking.
assigned_tuple_params = []
def assign(arg, value):
if isinstance(arg, str):
arg2value[arg] = value
else:
assigned_tuple_params.append(arg)
value = iter(value)
for i, subarg in enumerate(arg):
try:
subvalue = next(value)
except StopIteration:
raise ValueError('need more than %d %s to unpack' %
(i, 'values' if i > 1 else 'value'))
assign(subarg,subvalue)
try:
next(value)
except StopIteration:
pass
else:
raise ValueError('too many values to unpack')
def is_assigned(arg):
if isinstance(arg,str):
return arg in arg2value
return arg in assigned_tuple_params
if ismethod(func) and func.im_self is not None:
# implicit 'self' (or 'cls' for classmethods) argument
positional = (func.im_self,) + positional
num_pos = len(positional)
num_total = num_pos + len(named)
num_args = len(args)
num_defaults = len(defaults) if defaults else 0
for arg, value in zip(args, positional):
assign(arg, value)
if varargs:
if num_pos > num_args:
assign(varargs, positional[-(num_pos-num_args):])
else:
assign(varargs, ())
elif 0 < num_args < num_pos:
raise TypeError('%s() takes %s %d %s (%d given)' % (
f_name, 'at most' if defaults else 'exactly', num_args,
'arguments' if num_args > 1 else 'argument', num_total))
elif num_args == 0 and num_total:
if varkw:
if num_pos:
# XXX: We should use num_pos, but Python also uses num_total:
raise TypeError('%s() takes exactly 0 arguments '
'(%d given)' % (f_name, num_total))
else:
raise TypeError('%s() takes no arguments (%d given)' %
(f_name, num_total))
for arg in args:
if isinstance(arg, str) and arg in named:
if is_assigned(arg):
raise TypeError("%s() got multiple values for keyword "
"argument '%s'" % (f_name, arg))
else:
assign(arg, named.pop(arg))
if defaults: # fill in any missing values with the defaults
for arg, value in zip(args[-num_defaults:], defaults):
if not is_assigned(arg):
assign(arg, value)
if varkw:
assign(varkw, named)
elif named:
unexpected = next(iter(named))
try:
unicode
except NameError:
pass
else:
if isinstance(unexpected, unicode):
unexpected = unexpected.encode(sys.getdefaultencoding(), 'replace')
raise TypeError("%s() got an unexpected keyword argument '%s'" %
(f_name, unexpected))
unassigned = num_args - len([arg for arg in args if is_assigned(arg)])
if unassigned:
num_required = num_args - num_defaults
raise TypeError('%s() takes %s %d %s (%d given)' % (
f_name, 'at least' if defaults else 'exactly', num_required,
'arguments' if num_required > 1 else 'argument', num_total))
return arg2value
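# Example (illustrative, not part of the stdlib source): getcallargs()
# reproduces the interpreter's binding of actual arguments to formal
# parameters without calling the function:
#
#     import inspect
#     def f(a, b=1, *args, **kw): pass
#     inspect.getcallargs(f, 10, 20, 30, x=4)
#     # -> {'a': 10, 'b': 20, 'args': (30,), 'kw': {'x': 4}}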
# -------------------------------------------------- stack frame extraction
Traceback = namedtuple('Traceback', 'filename lineno function code_context index')
def getframeinfo(frame, context=1):
"""Get information about a frame or traceback object.
A tuple of five things is returned: the filename, the line number of
the current line, the function name, a list of lines of context from
the source code, and the index of the current line within that list.
The optional second argument specifies the number of lines of context
to return, which are centered around the current line."""
if istraceback(frame):
lineno = frame.tb_lineno
frame = frame.tb_frame
else:
lineno = frame.f_lineno
if not isframe(frame):
raise TypeError('{!r} is not a frame or traceback object'.format(frame))
filename = getsourcefile(frame) or getfile(frame)
if context > 0:
start = lineno - 1 - context//2
try:
lines, lnum = findsource(frame)
except IOError:
lines = index = None
else:
start = max(start, 1)
start = max(0, min(start, len(lines) - context))
lines = lines[start:start+context]
index = lineno - 1 - start
else:
lines = index = None
return Traceback(filename, lineno, frame.f_code.co_name, lines, index)
def getlineno(frame):
"""Get the line number from a frame object, allowing for optimization."""
# FrameType.f_lineno is now a descriptor that grovels co_lnotab
return frame.f_lineno
def getouterframes(frame, context=1):
"""Get a list of records for a frame and all higher (calling) frames.
Each record contains a frame object, filename, line number, function
name, a list of lines of context, and index within the context."""
framelist = []
while frame:
framelist.append((frame,) + getframeinfo(frame, context))
frame = frame.f_back
return framelist
def getinnerframes(tb, context=1):
"""Get a list of records for a traceback's frame and all lower frames.
Each record contains a frame object, filename, line number, function
name, a list of lines of context, and index within the context."""
framelist = []
while tb:
framelist.append((tb.tb_frame,) + getframeinfo(tb, context))
tb = tb.tb_next
return framelist
if hasattr(sys, '_getframe'):
currentframe = sys._getframe
else:
currentframe = lambda _=None: None
def stack(context=1):
"""Return a list of records for the stack above the caller's frame."""
return getouterframes(sys._getframe(1), context)
def trace(context=1):
"""Return a list of records for the stack below the current exception."""
return getinnerframes(sys.exc_info()[2], context)
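# Example (illustrative, not part of the stdlib source): each record
# returned by stack() is (frame, filename, lineno, function, code_context,
# index), so a function can discover its caller's name like this:
#
#     import inspect
#     def who_called_me():
#         return inspect.stack()[1][3]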
"""The io module provides the Python interfaces to stream handling. The
builtin open function is defined in this module.
At the top of the I/O hierarchy is the abstract base class IOBase. It
defines the basic interface to a stream. Note, however, that there is no
separation between reading and writing to streams; implementations are
allowed to raise an IOError if they do not support a given operation.
Extending IOBase is RawIOBase which deals simply with the reading and
writing of raw bytes to a stream. FileIO subclasses RawIOBase to provide
an interface to OS files.
BufferedIOBase deals with buffering on a raw byte stream (RawIOBase). Its
subclasses, BufferedWriter, BufferedReader, and BufferedRWPair buffer
streams that are readable, writable, and both respectively.
BufferedRandom provides a buffered interface to random access
streams. BytesIO is a simple stream of in-memory bytes.
Another IOBase subclass, TextIOBase, deals with the encoding and decoding
of streams into text. TextIOWrapper, which extends it, is a buffered text
interface to a buffered raw stream (`BufferedIOBase`). Finally, StringIO
is an in-memory stream for text.
Argument names are not part of the specification, and only the arguments
of open() are intended to be used as keyword arguments.
data:
DEFAULT_BUFFER_SIZE
An int containing the default buffer size used by the module's buffered
I/O classes. open() uses the file's blksize (as obtained by os.stat) if
possible.
"""
# New I/O library conforming to PEP 3116.
__author__ = ("Guido van Rossum <[email protected]>, "
"Mike Verdone <[email protected]>, "
"Mark Russell <[email protected]>, "
"Antoine Pitrou <[email protected]>, "
"Amaury Forgeot d'Arc <[email protected]>, "
"Benjamin Peterson <[email protected]>")
__all__ = ["BlockingIOError", "open", "IOBase", "RawIOBase", "FileIO",
"BytesIO", "StringIO", "BufferedIOBase",
"BufferedReader", "BufferedWriter", "BufferedRWPair",
"BufferedRandom", "TextIOBase", "TextIOWrapper",
"UnsupportedOperation", "SEEK_SET", "SEEK_CUR", "SEEK_END"]
import _io
import abc
from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
open, FileIO, BytesIO, StringIO, BufferedReader,
BufferedWriter, BufferedRWPair, BufferedRandom,
IncrementalNewlineDecoder, TextIOWrapper)
OpenWrapper = _io.open # for compatibility with _pyio
# for seek()
SEEK_SET = 0
SEEK_CUR = 1
SEEK_END = 2
# Declaring ABCs in C is tricky so we do it here.
# Method descriptions and default implementations are inherited from the C
# version, however.
class IOBase(_io._IOBase):
__metaclass__ = abc.ABCMeta
__doc__ = _io._IOBase.__doc__
class RawIOBase(_io._RawIOBase, IOBase):
__doc__ = _io._RawIOBase.__doc__
class BufferedIOBase(_io._BufferedIOBase, IOBase):
__doc__ = _io._BufferedIOBase.__doc__
class TextIOBase(_io._TextIOBase, IOBase):
__doc__ = _io._TextIOBase.__doc__
RawIOBase.register(FileIO)
for klass in (BytesIO, BufferedReader, BufferedWriter, BufferedRandom,
BufferedRWPair):
BufferedIOBase.register(klass)
for klass in (StringIO, TextIOWrapper):
TextIOBase.register(klass)
del klass
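# Example (illustrative, not part of the stdlib source; 'example.txt' is a
# hypothetical path): opening a file in text mode stacks the layers
# described in the module docstring -- a TextIOWrapper over a buffered
# stream over a FileIO:
#
#     import io
#     f = io.open('example.txt', 'w', encoding='utf-8')
#     f.write(u'hello\n')
#     f.close()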
"""Implementation of JSONDecoder
"""
import re
import sys
import struct
from json import scanner
try:
from _json import scanstring as c_scanstring
except ImportError:
c_scanstring = None
__all__ = ['JSONDecoder']
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
def _floatconstants():
nan, = struct.unpack('>d', b'\x7f\xf8\x00\x00\x00\x00\x00\x00')
inf, = struct.unpack('>d', b'\x7f\xf0\x00\x00\x00\x00\x00\x00')
return nan, inf, -inf
NaN, PosInf, NegInf = _floatconstants()
def linecol(doc, pos):
lineno = doc.count('\n', 0, pos) + 1
if lineno == 1:
colno = pos + 1
else:
colno = pos - doc.rindex('\n', 0, pos)
return lineno, colno
def errmsg(msg, doc, pos, end=None):
# Note that this function is called from _json
lineno, colno = linecol(doc, pos)
if end is None:
fmt = '{0}: line {1} column {2} (char {3})'
return fmt.format(msg, lineno, colno, pos)
#fmt = '%s: line %d column %d (char %d)'
#return fmt % (msg, lineno, colno, pos)
endlineno, endcolno = linecol(doc, end)
fmt = '{0}: line {1} column {2} - line {3} column {4} (char {5} - {6})'
return fmt.format(msg, lineno, colno, endlineno, endcolno, pos, end)
#fmt = '%s: line %d column %d - line %d column %d (char %d - %d)'
#return fmt % (msg, lineno, colno, endlineno, endcolno, pos, end)
_CONSTANTS = {
'-Infinity': NegInf,
'Infinity': PosInf,
'NaN': NaN,
}
STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS)
BACKSLASH = {
'"': u'"', '\\': u'\\', '/': u'/',
'b': u'\b', 'f': u'\f', 'n': u'\n', 'r': u'\r', 't': u'\t',
}
DEFAULT_ENCODING = "utf-8"
def _decode_uXXXX(s, pos):
esc = s[pos + 1:pos + 5]
if len(esc) == 4 and esc[1] not in 'xX':
try:
return int(esc, 16)
except ValueError:
pass
msg = "Invalid \\uXXXX escape"
raise ValueError(errmsg(msg, s, pos))
def py_scanstring(s, end, encoding=None, strict=True,
_b=BACKSLASH, _m=STRINGCHUNK.match):
"""Scan the string s for a JSON string. End is the index of the
character in s after the quote that started the JSON string.
Unescapes all valid JSON string escape sequences and raises ValueError
on attempt to decode an invalid string. If strict is False then literal
control characters are allowed in the string.
Returns a tuple of the decoded string and the index of the character in s
after the end quote."""
if encoding is None:
encoding = DEFAULT_ENCODING
chunks = []
_append = chunks.append
begin = end - 1
while 1:
chunk = _m(s, end)
if chunk is None:
raise ValueError(
errmsg("Unterminated string starting at", s, begin))
end = chunk.end()
content, terminator = chunk.groups()
# Content contains zero or more unescaped string characters
if content:
if not isinstance(content, unicode):
content = unicode(content, encoding)
_append(content)
# Terminator is the end of string, a literal control character,
# or a backslash denoting that an escape sequence follows
if terminator == '"':
break
elif terminator != '\\':
if strict:
#msg = "Invalid control character %r at" % (terminator,)
msg = "Invalid control character {0!r} at".format(terminator)
raise ValueError(errmsg(msg, s, end))
else:
_append(terminator)
continue
try:
esc = s[end]
except IndexError:
raise ValueError(
errmsg("Unterminated string starting at", s, begin))
# If not a unicode escape sequence, must be in the lookup table
if esc != 'u':
try:
char = _b[esc]
except KeyError:
msg = "Invalid \\escape: " + repr(esc)
raise ValueError(errmsg(msg, s, end))
end += 1
else:
# Unicode escape sequence
uni = _decode_uXXXX(s, end)
end += 5
# Check for surrogate pair on UCS-4 systems
if sys.maxunicode > 65535 and \
0xd800 <= uni <= 0xdbff and s[end:end + 2] == '\\u':
uni2 = _decode_uXXXX(s, end + 1)
if 0xdc00 <= uni2 <= 0xdfff:
uni = 0x10000 + (((uni - 0xd800) << 10) | (uni2 - 0xdc00))
end += 6
char = unichr(uni)
# Append the unescaped character
_append(char)
return u''.join(chunks), end
# Use speedup if available
scanstring = c_scanstring or py_scanstring
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
WHITESPACE_STR = ' \t\n\r'
def JSONObject(s_and_end, encoding, strict, scan_once, object_hook,
object_pairs_hook, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
s, end = s_and_end
pairs = []
pairs_append = pairs.append
# Use a slice to prevent IndexError from being raised; the following
# check will raise a more specific ValueError if the string is empty
nextchar = s[end:end + 1]
# Normally we expect nextchar == '"'
if nextchar != '"':
if nextchar in _ws:
end = _w(s, end).end()
nextchar = s[end:end + 1]
# Trivial empty object
if nextchar == '}':
if object_pairs_hook is not None:
result = object_pairs_hook(pairs)
return result, end + 1
pairs = {}
if object_hook is not None:
pairs = object_hook(pairs)
return pairs, end + 1
elif nextchar != '"':
raise ValueError(errmsg(
"Expecting property name enclosed in double quotes", s, end))
end += 1
while True:
key, end = scanstring(s, end, encoding, strict)
# To skip some function call overhead we optimize the fast paths where
# the JSON key separator is ": " or just ":".
if s[end:end + 1] != ':':
end = _w(s, end).end()
if s[end:end + 1] != ':':
raise ValueError(errmsg("Expecting ':' delimiter", s, end))
end += 1
try:
if s[end] in _ws:
end += 1
if s[end] in _ws:
end = _w(s, end + 1).end()
except IndexError:
pass
try:
value, end = scan_once(s, end)
except StopIteration:
raise ValueError(errmsg("Expecting object", s, end))
pairs_append((key, value))
try:
nextchar = s[end]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end]
except IndexError:
nextchar = ''
end += 1
if nextchar == '}':
break
elif nextchar != ',':
raise ValueError(errmsg("Expecting ',' delimiter", s, end - 1))
try:
nextchar = s[end]
if nextchar in _ws:
end += 1
nextchar = s[end]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end]
except IndexError:
nextchar = ''
end += 1
if nextchar != '"':
raise ValueError(errmsg(
"Expecting property name enclosed in double quotes", s, end - 1))
if object_pairs_hook is not None:
result = object_pairs_hook(pairs)
return result, end
pairs = dict(pairs)
if object_hook is not None:
pairs = object_hook(pairs)
return pairs, end
def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
s, end = s_and_end
values = []
nextchar = s[end:end + 1]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end:end + 1]
# Look-ahead for trivial empty array
if nextchar == ']':
return values, end + 1
_append = values.append
while True:
try:
value, end = scan_once(s, end)
except StopIteration:
raise ValueError(errmsg("Expecting object", s, end))
_append(value)
nextchar = s[end:end + 1]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end:end + 1]
end += 1
if nextchar == ']':
break
elif nextchar != ',':
raise ValueError(errmsg("Expecting ',' delimiter", s, end))
try:
if s[end] in _ws:
end += 1
if s[end] in _ws:
end = _w(s, end + 1).end()
except IndexError:
pass
return values, end
class JSONDecoder(object):
"""Simple JSON <http://json.org> decoder
Performs the following translations in decoding by default:
+---------------+-------------------+
| JSON | Python |
+===============+===================+
| object | dict |
+---------------+-------------------+
| array | list |
+---------------+-------------------+
| string | unicode |
+---------------+-------------------+
| number (int) | int, long |
+---------------+-------------------+
| number (real) | float |
+---------------+-------------------+
| true | True |
+---------------+-------------------+
| false | False |
+---------------+-------------------+
| null | None |
+---------------+-------------------+
It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as
their corresponding ``float`` values, which is outside the JSON spec.
"""
def __init__(self, encoding=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, strict=True,
object_pairs_hook=None):
"""``encoding`` determines the encoding used to interpret any ``str``
objects decoded by this instance (utf-8 by default). It has no
effect when decoding ``unicode`` objects.
Note that currently only encodings that are a superset of ASCII work;
strings of other encodings should be passed in as ``unicode``.
``object_hook``, if specified, will be called with the result
of every JSON object decoded and its return value will be used in
place of the given ``dict``. This can be used to provide custom
deserializations (e.g. to support JSON-RPC class hinting).
``object_pairs_hook``, if specified, will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes
priority.
``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).
``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).
``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
If ``strict`` is false (true is the default), then control
characters will be allowed inside strings. Control characters in
this context are those with character codes in the 0-31 range,
including ``'\\t'`` (tab), ``'\\n'``, ``'\\r'`` and ``'\\0'``.
"""
self.encoding = encoding
self.object_hook = object_hook
self.object_pairs_hook = object_pairs_hook
self.parse_float = parse_float or float
self.parse_int = parse_int or int
self.parse_constant = parse_constant or _CONSTANTS.__getitem__
self.strict = strict
self.parse_object = JSONObject
self.parse_array = JSONArray
self.parse_string = scanstring
self.scan_once = scanner.make_scanner(self)
def decode(self, s, _w=WHITESPACE.match):
"""Return the Python representation of ``s`` (a ``str`` or ``unicode``
instance containing a JSON document)
"""
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
end = _w(s, end).end()
if end != len(s):
raise ValueError(errmsg("Extra data", s, end, len(s)))
return obj
def raw_decode(self, s, idx=0):
"""Decode a JSON document from ``s`` (a ``str`` or ``unicode``
beginning with a JSON document) and return a 2-tuple of the Python
representation and the index in ``s`` where the document ended.
This can be used to decode a JSON document from a string that may
have extraneous data at the end.
"""
try:
obj, end = self.scan_once(s, idx)
except StopIteration:
raise ValueError("No JSON object could be decoded")
return obj, end
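# Example (illustrative, not part of the stdlib source): unlike decode(),
# raw_decode() tolerates trailing data and reports where the document ended:
#
#     from json import JSONDecoder
#     JSONDecoder().raw_decode('{"a": 1} trailing')   # -> ({u'a': 1}, 8)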
"""Implementation of JSONEncoder
"""
import re
try:
from _json import encode_basestring_ascii as c_encode_basestring_ascii
except ImportError:
c_encode_basestring_ascii = None
try:
from _json import make_encoder as c_make_encoder
except ImportError:
c_make_encoder = None
ESCAPE = re.compile(r'[\x00-\x1f\\"\b\f\n\r\t]')
ESCAPE_ASCII = re.compile(r'([\\"]|[^\ -~])')
HAS_UTF8 = re.compile(r'[\x80-\xff]')
ESCAPE_DCT = {
'\\': '\\\\',
'"': '\\"',
'\b': '\\b',
'\f': '\\f',
'\n': '\\n',
'\r': '\\r',
'\t': '\\t',
}
for i in range(0x20):
ESCAPE_DCT.setdefault(chr(i), '\\u{0:04x}'.format(i))
#ESCAPE_DCT.setdefault(chr(i), '\\u%04x' % (i,))
INFINITY = float('inf')
FLOAT_REPR = float.__repr__
def encode_basestring(s):
"""Return a JSON representation of a Python string
"""
def replace(match):
return ESCAPE_DCT[match.group(0)]
return '"' + ESCAPE.sub(replace, s) + '"'
def py_encode_basestring_ascii(s):
"""Return an ASCII-only JSON representation of a Python string
"""
if isinstance(s, str) and HAS_UTF8.search(s) is not None:
s = s.decode('utf-8')
def replace(match):
s = match.group(0)
try:
return ESCAPE_DCT[s]
except KeyError:
n = ord(s)
if n < 0x10000:
return '\\u{0:04x}'.format(n)
#return '\\u%04x' % (n,)
else:
# surrogate pair
n -= 0x10000
s1 = 0xd800 | ((n >> 10) & 0x3ff)
s2 = 0xdc00 | (n & 0x3ff)
return '\\u{0:04x}\\u{1:04x}'.format(s1, s2)
#return '\\u%04x\\u%04x' % (s1, s2)
return '"' + str(ESCAPE_ASCII.sub(replace, s)) + '"'
encode_basestring_ascii = (
c_encode_basestring_ascii or py_encode_basestring_ascii)
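# Example (illustrative, not part of the stdlib source; uses the
# pure-Python fallback defined above): non-ASCII characters become \uXXXX
# escapes, and code points above U+FFFF become surrogate pairs:
#
#     py_encode_basestring_ascii(u'caf\xe9')       # -> '"caf\\u00e9"'
#     py_encode_basestring_ascii(u'\U0001d11e')    # -> '"\\ud834\\udd1e"'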
class JSONEncoder(object):
"""Extensible JSON <http://json.org> encoder for Python data structures.
Supports the following objects and types by default:
+-------------------+---------------+
| Python | JSON |
+===================+===============+
| dict | object |
+-------------------+---------------+
| list, tuple | array |
+-------------------+---------------+
| str, unicode | string |
+-------------------+---------------+
| int, long, float | number |
+-------------------+---------------+
| True | true |
+-------------------+---------------+
| False | false |
+-------------------+---------------+
| None | null |
+-------------------+---------------+
To extend this to recognize other objects, subclass and implement a
``.default()`` method that returns a serializable object for ``o`` if
possible; otherwise it should call the superclass implementation
(to raise ``TypeError``).
"""
item_separator = ', '
key_separator = ': '
def __init__(self, skipkeys=False, ensure_ascii=True,
check_circular=True, allow_nan=True, sort_keys=False,
indent=None, separators=None, encoding='utf-8', default=None):
"""Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt
encoding of keys that are not str, int, long, float or None. If
skipkeys is True, such items are simply skipped.
If *ensure_ascii* is true (the default), all non-ASCII
characters in the output are escaped with \uXXXX sequences,
and the results are str instances consisting of ASCII
characters only. If ensure_ascii is False, a result may be a
unicode instance. This usually happens if the input contains
unicode strings or the *encoding* parameter is used.
If check_circular is true, then lists, dicts, and custom encoded
objects will be checked for circular references during encoding to
prevent an infinite recursion (which would cause an OverflowError).
Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be
encoded as such. This behavior is not JSON specification compliant,
but is consistent with most JavaScript based encoders and decoders.
Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be
sorted by key; this is useful for regression tests to ensure
that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array
elements and object members will be pretty-printed with that
indent level. An indent level of 0 will only insert newlines.
None is the most compact representation. Since the default
item separator is ', ', the output might include trailing
whitespace when indent is specified. You can use
separators=(',', ': ') to avoid this.
If specified, separators should be a (item_separator, key_separator)
tuple. The default is (', ', ': '). To get the most compact JSON
representation you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects
that can't otherwise be serialized. It should return a JSON encodable
version of the object or raise a ``TypeError``.
If encoding is not None, then all input strings will be
transformed into unicode using that encoding prior to JSON-encoding.
The default is UTF-8.
"""
self.skipkeys = skipkeys
self.ensure_ascii = ensure_ascii
self.check_circular = check_circular
self.allow_nan = allow_nan
self.sort_keys = sort_keys
self.indent = indent
if separators is not None:
self.item_separator, self.key_separator = separators
if default is not None:
self.default = default
self.encoding = encoding
def default(self, o):
"""Implement this method in a subclass such that it returns
a serializable object for ``o``, or calls the base implementation
(to raise a ``TypeError``).
For example, to support arbitrary iterators, you could
implement default like this::
def default(self, o):
try:
iterable = iter(o)
except TypeError:
pass
else:
return list(iterable)
# Let the base class default method raise the TypeError
return JSONEncoder.default(self, o)
"""
raise TypeError(repr(o) + " is not JSON serializable")
def encode(self, o):
"""Return a JSON string representation of a Python data structure.
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'
"""
# This is for extremely simple cases and benchmarks.
if isinstance(o, basestring):
if isinstance(o, str):
_encoding = self.encoding
if _encoding is not None and _encoding != 'utf-8':
o = o.decode(_encoding)
if self.ensure_ascii:
return encode_basestring_ascii(o)
else:
return encode_basestring(o)
# This doesn't pass the iterator directly to ''.join() because the
# exceptions aren't as detailed. The list call should be roughly
# equivalent to the PySequence_Fast that ''.join() would do.
chunks = self.iterencode(o, _one_shot=True)
if not isinstance(chunks, (list, tuple)):
chunks = list(chunks)
return ''.join(chunks)
def iterencode(self, o, _one_shot=False):
"""Encode the given object and yield each string
representation as available.
For example::
for chunk in JSONEncoder().iterencode(bigobject):
mysocket.write(chunk)
"""
if self.check_circular:
markers = {}
else:
markers = None
if self.ensure_ascii:
_encoder = encode_basestring_ascii
else:
_encoder = encode_basestring
if self.encoding != 'utf-8':
def _encoder(o, _orig_encoder=_encoder, _encoding=self.encoding):
if isinstance(o, str):
o = o.decode(_encoding)
return _orig_encoder(o)
def floatstr(o, allow_nan=self.allow_nan,
_repr=FLOAT_REPR, _inf=INFINITY, _neginf=-INFINITY):
# Check for specials. Note that this type of test is processor
# and/or platform-specific, so do tests which don't depend on the
# internals.
if o != o:
text = 'NaN'
elif o == _inf:
text = 'Infinity'
elif o == _neginf:
text = '-Infinity'
else:
return _repr(o)
if not allow_nan:
raise ValueError(
"Out of range float values are not JSON compliant: " +
repr(o))
return text
if (_one_shot and c_make_encoder is not None
and self.indent is None and not self.sort_keys):
_iterencode = c_make_encoder(
markers, self.default, _encoder, self.indent,
self.key_separator, self.item_separator, self.sort_keys,
self.skipkeys, self.allow_nan)
else:
_iterencode = _make_iterencode(
markers, self.default, _encoder, self.indent, floatstr,
self.key_separator, self.item_separator, self.sort_keys,
self.skipkeys, _one_shot)
return _iterencode(o, 0)
def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
_key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
## HACK: hand-optimized bytecode; turn globals into locals
ValueError=ValueError,
basestring=basestring,
dict=dict,
float=float,
id=id,
int=int,
isinstance=isinstance,
list=list,
long=long,
str=str,
tuple=tuple,
):
def _iterencode_list(lst, _current_indent_level):
if not lst:
yield '[]'
return
if markers is not None:
markerid = id(lst)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = lst
buf = '['
if _indent is not None:
_current_indent_level += 1
newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
separator = _item_separator + newline_indent
buf += newline_indent
else:
newline_indent = None
separator = _item_separator
first = True
for value in lst:
if first:
first = False
else:
buf = separator
if isinstance(value, basestring):
yield buf + _encoder(value)
elif value is None:
yield buf + 'null'
elif value is True:
yield buf + 'true'
elif value is False:
yield buf + 'false'
elif isinstance(value, (int, long)):
yield buf + str(value)
elif isinstance(value, float):
yield buf + _floatstr(value)
else:
yield buf
if isinstance(value, (list, tuple)):
chunks = _iterencode_list(value, _current_indent_level)
elif isinstance(value, dict):
chunks = _iterencode_dict(value, _current_indent_level)
else:
chunks = _iterencode(value, _current_indent_level)
for chunk in chunks:
yield chunk
if newline_indent is not None:
_current_indent_level -= 1
yield '\n' + (' ' * (_indent * _current_indent_level))
yield ']'
if markers is not None:
del markers[markerid]
def _iterencode_dict(dct, _current_indent_level):
if not dct:
yield '{}'
return
if markers is not None:
markerid = id(dct)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = dct
yield '{'
if _indent is not None:
_current_indent_level += 1
newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
item_separator = _item_separator + newline_indent
yield newline_indent
else:
newline_indent = None
item_separator = _item_separator
first = True
if _sort_keys:
items = sorted(dct.items(), key=lambda kv: kv[0])
else:
items = dct.iteritems()
for key, value in items:
if isinstance(key, basestring):
pass
# JavaScript is weakly typed for these, so it makes sense to
# also allow them. Many encoders seem to do something like this.
elif isinstance(key, float):
key = _floatstr(key)
elif key is True:
key = 'true'
elif key is False:
key = 'false'
elif key is None:
key = 'null'
elif isinstance(key, (int, long)):
key = str(key)
elif _skipkeys:
continue
else:
raise TypeError("key " + repr(key) + " is not a string")
if first:
first = False
else:
yield item_separator
yield _encoder(key)
yield _key_separator
if isinstance(value, basestring):
yield _encoder(value)
elif value is None:
yield 'null'
elif value is True:
yield 'true'
elif value is False:
yield 'false'
elif isinstance(value, (int, long)):
yield str(value)
elif isinstance(value, float):
yield _floatstr(value)
else:
if isinstance(value, (list, tuple)):
chunks = _iterencode_list(value, _current_indent_level)
elif isinstance(value, dict):
chunks = _iterencode_dict(value, _current_indent_level)
else:
chunks = _iterencode(value, _current_indent_level)
for chunk in chunks:
yield chunk
if newline_indent is not None:
_current_indent_level -= 1
yield '\n' + (' ' * (_indent * _current_indent_level))
yield '}'
if markers is not None:
del markers[markerid]
def _iterencode(o, _current_indent_level):
if isinstance(o, basestring):
yield _encoder(o)
elif o is None:
yield 'null'
elif o is True:
yield 'true'
elif o is False:
yield 'false'
elif isinstance(o, (int, long)):
yield str(o)
elif isinstance(o, float):
yield _floatstr(o)
elif isinstance(o, (list, tuple)):
for chunk in _iterencode_list(o, _current_indent_level):
yield chunk
elif isinstance(o, dict):
for chunk in _iterencode_dict(o, _current_indent_level):
yield chunk
else:
if markers is not None:
markerid = id(o)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = o
o = _default(o)
for chunk in _iterencode(o, _current_indent_level):
yield chunk
if markers is not None:
del markers[markerid]
return _iterencode
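# Hedged usage note (added for illustration): the ``markers`` dict threaded
# through the closures above is what implements check_circular. A
# self-referential container therefore fails fast instead of recursing:
#
#     >>> import json
#     >>> lst = []; lst.append(lst)
#     >>> json.dumps(lst)
#     Traceback (most recent call last):
#         ...
#     ValueError: Circular reference detected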
"""JSON token scanner
"""
import re
try:
from _json import make_scanner as c_make_scanner
except ImportError:
c_make_scanner = None
__all__ = ['make_scanner']
NUMBER_RE = re.compile(
r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))
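# Illustrative example (added; not in the original source): NUMBER_RE splits
# a JSON number into (integer, fraction, exponent) groups, which _scan_once
# below uses to choose between parse_int and parse_float:
#
#     >>> NUMBER_RE.match('-1.5e3').groups()
#     ('-1', '.5', 'e3')
#     >>> NUMBER_RE.match('42').groups()
#     ('42', None, None)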
def py_make_scanner(context):
parse_object = context.parse_object
parse_array = context.parse_array
parse_string = context.parse_string
match_number = NUMBER_RE.match
encoding = context.encoding
strict = context.strict
parse_float = context.parse_float
parse_int = context.parse_int
parse_constant = context.parse_constant
object_hook = context.object_hook
object_pairs_hook = context.object_pairs_hook
def _scan_once(string, idx):
try:
nextchar = string[idx]
except IndexError:
raise StopIteration
if nextchar == '"':
return parse_string(string, idx + 1, encoding, strict)
elif nextchar == '{':
return parse_object((string, idx + 1), encoding, strict,
_scan_once, object_hook, object_pairs_hook)
elif nextchar == '[':
return parse_array((string, idx + 1), _scan_once)
elif nextchar == 'n' and string[idx:idx + 4] == 'null':
return None, idx + 4
elif nextchar == 't' and string[idx:idx + 4] == 'true':
return True, idx + 4
elif nextchar == 'f' and string[idx:idx + 5] == 'false':
return False, idx + 5
m = match_number(string, idx)
if m is not None:
integer, frac, exp = m.groups()
if frac or exp:
res = parse_float(integer + (frac or '') + (exp or ''))
else:
res = parse_int(integer)
return res, m.end()
elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
return parse_constant('NaN'), idx + 3
elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
return parse_constant('Infinity'), idx + 8
elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
return parse_constant('-Infinity'), idx + 9
else:
raise StopIteration
return _scan_once
make_scanner = c_make_scanner or py_make_scanner
r"""Command-line tool to validate and pretty-print JSON
Usage::
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
"""
import sys
import json
def main():
if len(sys.argv) == 1:
infile = sys.stdin
outfile = sys.stdout
elif len(sys.argv) == 2:
infile = open(sys.argv[1], 'rb')
outfile = sys.stdout
elif len(sys.argv) == 3:
infile = open(sys.argv[1], 'rb')
outfile = open(sys.argv[2], 'wb')
else:
raise SystemExit(sys.argv[0] + " [infile [outfile]]")
with infile:
try:
obj = json.load(infile)
except ValueError, e:
raise SystemExit(e)
with outfile:
json.dump(obj, outfile, sort_keys=True,
indent=4, separators=(',', ': '))
outfile.write('\n')
if __name__ == '__main__':
main()
r"""JSON (JavaScript Object Notation) <http://json.org> is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.
:mod:`json` exposes an API familiar to users of the standard library
:mod:`marshal` and :mod:`pickle` modules. It is the externally maintained
version of the :mod:`json` library contained in Python 2.6, but maintains
compatibility with Python 2.4 and Python 2.5 and (currently) has
significant performance advantages, even without using the optional C
extension for speedups.
Encoding basic Python object hierarchies::
>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> print json.dumps("\"foo\bar")
"\"foo\bar"
>>> print json.dumps(u'\u1234')
"\u1234"
>>> print json.dumps('\\')
"\\"
>>> print json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True)
{"a": 0, "b": 0, "c": 0}
>>> from StringIO import StringIO
>>> io = StringIO()
>>> json.dump(['streaming API'], io)
>>> io.getvalue()
'["streaming API"]'
Compact encoding::
>>> import json
>>> json.dumps([1,2,3,{'4': 5, '6': 7}], sort_keys=True, separators=(',',':'))
'[1,2,3,{"4":5,"6":7}]'
Pretty printing::
>>> import json
>>> print json.dumps({'4': 5, '6': 7}, sort_keys=True,
... indent=4, separators=(',', ': '))
{
"4": 5,
"6": 7
}
Decoding JSON::
>>> import json
>>> obj = [u'foo', {u'bar': [u'baz', None, 1.0, 2]}]
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]') == obj
True
>>> json.loads('"\\"foo\\bar"') == u'"foo\x08ar'
True
>>> from StringIO import StringIO
>>> io = StringIO('["streaming API"]')
>>> json.load(io)[0] == 'streaming API'
True
Specializing JSON object decoding::
>>> import json
>>> def as_complex(dct):
... if '__complex__' in dct:
... return complex(dct['real'], dct['imag'])
... return dct
...
>>> json.loads('{"__complex__": true, "real": 1, "imag": 2}',
... object_hook=as_complex)
(1+2j)
>>> from decimal import Decimal
>>> json.loads('1.1', parse_float=Decimal) == Decimal('1.1')
True
Specializing JSON object encoding::
>>> import json
>>> def encode_complex(obj):
... if isinstance(obj, complex):
... return [obj.real, obj.imag]
... raise TypeError(repr(obj) + " is not JSON serializable")
...
>>> json.dumps(2 + 1j, default=encode_complex)
'[2.0, 1.0]'
>>> json.JSONEncoder(default=encode_complex).encode(2 + 1j)
'[2.0, 1.0]'
>>> ''.join(json.JSONEncoder(default=encode_complex).iterencode(2 + 1j))
'[2.0, 1.0]'
Using json.tool from the shell to validate and pretty-print::
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
"""
__version__ = '2.0.9'
__all__ = [
'dump', 'dumps', 'load', 'loads',
'JSONDecoder', 'JSONEncoder',
]
__author__ = 'Bob Ippolito <[email protected]>'
from .decoder import JSONDecoder
from .encoder import JSONEncoder
_default_encoder = JSONEncoder(
skipkeys=False,
ensure_ascii=True,
check_circular=True,
allow_nan=True,
indent=None,
separators=None,
encoding='utf-8',
default=None,
)
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
allow_nan=True, cls=None, indent=None, separators=None,
encoding='utf-8', default=None, sort_keys=False, **kw):
"""Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
``.write()``-supporting file-like object).
If ``skipkeys`` is true then ``dict`` keys that are not basic types
(``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
will be skipped instead of raising a ``TypeError``.
If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only. If ``ensure_ascii`` is
false, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.
If ``check_circular`` is false, then the circular reference check
for container types will be skipped and a circular reference will
result in an ``OverflowError`` (or worse).
If ``allow_nan`` is false, then it will be a ``ValueError`` to
serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
in strict compliance of the JSON specification, instead of using the
JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation. Since the default item separator is ``', '``, the
output might include trailing whitespace when ``indent`` is specified.
You can use ``separators=(',', ': ')`` to avoid this.
If ``separators`` is an ``(item_separator, dict_separator)`` tuple
then it will be used instead of the default ``(', ', ': ')`` separators.
``(',', ':')`` is the most compact JSON representation.
``encoding`` is the character encoding for str instances, default is UTF-8.
``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.
If *sort_keys* is true (default: ``False``), then the output of
dictionaries will be sorted by key.
To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
``.default()`` method to serialize additional types), specify it with
the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
"""
# cached encoder
if (not skipkeys and ensure_ascii and
check_circular and allow_nan and
cls is None and indent is None and separators is None and
encoding == 'utf-8' and default is None and not sort_keys and not kw):
iterable = _default_encoder.iterencode(obj)
else:
if cls is None:
cls = JSONEncoder
iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
check_circular=check_circular, allow_nan=allow_nan, indent=indent,
separators=separators, encoding=encoding,
default=default, sort_keys=sort_keys, **kw).iterencode(obj)
# could accelerate with writelines in some versions of Python, at
# a debuggability cost
for chunk in iterable:
fp.write(chunk)
def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
allow_nan=True, cls=None, indent=None, separators=None,
encoding='utf-8', default=None, sort_keys=False, **kw):
"""Serialize ``obj`` to a JSON formatted ``str``.
If ``skipkeys`` is true then ``dict`` keys that are not basic types
(``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
will be skipped instead of raising a ``TypeError``.
If ``ensure_ascii`` is false, all non-ASCII characters are not escaped, and
the return value may be a ``unicode`` instance. See ``dump`` for details.
If ``check_circular`` is false, then the circular reference check
for container types will be skipped and a circular reference will
result in an ``OverflowError`` (or worse).
If ``allow_nan`` is false, then it will be a ``ValueError`` to
serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
strict compliance of the JSON specification, instead of using the
JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation. Since the default item separator is ``', '``, the
output might include trailing whitespace when ``indent`` is specified.
You can use ``separators=(',', ': ')`` to avoid this.
If ``separators`` is an ``(item_separator, dict_separator)`` tuple
then it will be used instead of the default ``(', ', ': ')`` separators.
``(',', ':')`` is the most compact JSON representation.
``encoding`` is the character encoding for str instances, default is UTF-8.
``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.
If *sort_keys* is true (default: ``False``), then the output of
dictionaries will be sorted by key.
To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
``.default()`` method to serialize additional types), specify it with
the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
"""
# cached encoder
if (not skipkeys and ensure_ascii and
check_circular and allow_nan and
cls is None and indent is None and separators is None and
encoding == 'utf-8' and default is None and not sort_keys and not kw):
return _default_encoder.encode(obj)
if cls is None:
cls = JSONEncoder
return cls(
skipkeys=skipkeys, ensure_ascii=ensure_ascii,
check_circular=check_circular, allow_nan=allow_nan, indent=indent,
separators=separators, encoding=encoding, default=default,
sort_keys=sort_keys, **kw).encode(obj)
_default_decoder = JSONDecoder(encoding=None, object_hook=None,
object_pairs_hook=None)
def load(fp, encoding=None, cls=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
"""Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
a JSON document) to a Python object.
If the contents of ``fp`` are encoded with an ASCII-based encoding other
than utf-8 (e.g. latin-1), then an appropriate ``encoding`` name must
be specified. Encodings that are not ASCII-based (such as UCS-2) are
not allowed; such streams should be wrapped with
``codecs.getreader(encoding)(fp)``, or simply decoded to a ``unicode``
object first and passed to ``loads()``.
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
"""
return loads(fp.read(),
encoding=encoding, cls=cls, object_hook=object_hook,
parse_float=parse_float, parse_int=parse_int,
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook,
**kw)
def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
"""Deserialize ``s`` (a ``str`` or ``unicode`` instance containing a JSON
document) to a Python object.
If ``s`` is a ``str`` instance and is encoded with an ASCII based encoding
other than utf-8 (e.g. latin-1) then an appropriate ``encoding`` name
must be specified. Encodings that are not ASCII based (such as UCS-2)
are not allowed and should be decoded to ``unicode`` first.
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.
``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to choose another datatype or parser
for JSON floats (e.g. decimal.Decimal).
``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to choose another datatype or parser
for JSON integers (e.g. float).
``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
"""
if (cls is None and encoding is None and object_hook is None and
parse_int is None and parse_float is None and
parse_constant is None and object_pairs_hook is None and not kw):
return _default_decoder.decode(s)
if cls is None:
cls = JSONDecoder
if object_hook is not None:
kw['object_hook'] = object_hook
if object_pairs_hook is not None:
kw['object_pairs_hook'] = object_pairs_hook
if parse_float is not None:
kw['parse_float'] = parse_float
if parse_int is not None:
kw['parse_int'] = parse_int
if parse_constant is not None:
kw['parse_constant'] = parse_constant
return cls(encoding=encoding, **kw).decode(s)
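# Hedged example (added for illustration): object_pairs_hook, described
# above, can preserve key order during decoding; this assumes Python 2.7's
# collections.OrderedDict:
#
#     >>> import json
#     >>> from collections import OrderedDict
#     >>> json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
#     OrderedDict([(u'b', 1), (u'a', 2)])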
#! /usr/bin/env python
"""Keywords (from "graminit.c")
This file is automatically generated; please don't muck it up!
To update the symbols in this file, 'cd' to the top directory of
the python source tree after building the interpreter and run:
./python Lib/keyword.py
"""
__all__ = ["iskeyword", "kwlist"]
kwlist = [
#--start keywords--
'and',
'as',
'assert',
'break',
'class',
'continue',
'def',
'del',
'elif',
'else',
'except',
'exec',
'finally',
'for',
'from',
'global',
'if',
'import',
'in',
'is',
'lambda',
'not',
'or',
'pass',
'print',
'raise',
'return',
'try',
'while',
'with',
'yield',
#--end keywords--
]
iskeyword = frozenset(kwlist).__contains__
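# Quick illustration (added): iskeyword is plain frozenset membership, so it
# is O(1) and usable directly as a predicate:
#
#     >>> iskeyword('for')
#     True
#     >>> iskeyword('forward')
#     False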
def main():
import sys, re
args = sys.argv[1:]
iptfile = args and args[0] or "Python/graminit.c"
if len(args) > 1: optfile = args[1]
else: optfile = "Lib/keyword.py"
# scan the source file for keywords
fp = open(iptfile)
strprog = re.compile('"([^"]+)"')
lines = []
for line in fp:
if '{1, "' in line:
match = strprog.search(line)
if match:
lines.append(" '" + match.group(1) + "',\n")
fp.close()
lines.sort()
# load the output skeleton from the target
fp = open(optfile)
format = fp.readlines()
fp.close()
# insert the lines of keywords
try:
start = format.index("#--start keywords--\n") + 1
end = format.index("#--end keywords--\n")
format[start:end] = lines
except ValueError:
sys.stderr.write("target does not contain format markers\n")
sys.exit(1)
# write the output file
fp = open(optfile, 'w')
fp.write(''.join(format))
fp.close()
if __name__ == "__main__":
main()
"""A bottom-up tree matching algorithm implementation meant to speed
up 2to3's matching process. After the tree patterns are reduced to
their rarest linear path, a linear Aho-Corasick automaton is
created. The linear automaton traverses the linear paths from the
leaves to the root of the AST and returns a set of nodes for further
matching. This significantly reduces the number of candidate nodes."""
__author__ = "George Boutsioukis <[email protected]>"
import logging
import itertools
from collections import defaultdict
from . import pytree
from .btm_utils import reduce_tree
class BMNode(object):
"""Class for a node of the Aho-Corasick automaton used in matching"""
count = itertools.count()
def __init__(self):
self.transition_table = {}
self.fixers = []
self.id = next(BMNode.count)
self.content = ''
class BottomMatcher(object):
"""The main matcher class. After instantiating the patterns should
be added using the add_fixer method"""
def __init__(self):
self.match = set()
self.root = BMNode()
self.nodes = [self.root]
self.fixers = []
self.logger = logging.getLogger("RefactoringTool")
def add_fixer(self, fixer):
"""Reduces a fixer's pattern tree to a linear path and adds it
to the matcher(a common Aho-Corasick automaton). The fixer is
appended on the matching states and called when they are
reached"""
self.fixers.append(fixer)
tree = reduce_tree(fixer.pattern_tree)
linear = tree.get_linear_subpattern()
match_nodes = self.add(linear, start=self.root)
for match_node in match_nodes:
match_node.fixers.append(fixer)
def add(self, pattern, start):
"Recursively adds a linear pattern to the AC automaton"
#print("adding pattern", pattern, "to", start)
if not pattern:
#print("empty pattern")
return [start]
if isinstance(pattern[0], tuple):
#alternatives
#print("alternatives")
match_nodes = []
for alternative in pattern[0]:
#add all alternatives, and add the rest of the pattern
#to each end node
end_nodes = self.add(alternative, start=start)
for end in end_nodes:
match_nodes.extend(self.add(pattern[1:], end))
return match_nodes
else:
#single token
#not last
if pattern[0] not in start.transition_table:
#transition did not exist, create new
next_node = BMNode()
start.transition_table[pattern[0]] = next_node
else:
#transition exists already, follow
next_node = start.transition_table[pattern[0]]
if pattern[1:]:
end_nodes = self.add(pattern[1:], start=next_node)
else:
end_nodes = [next_node]
return end_nodes
def run(self, leaves):
"""The main interface with the bottom matcher. The tree is
traversed from the bottom using the constructed
automaton. Nodes are only checked once as the tree is
retraversed. When the automaton fails, we give it one more
shot (in case the tree above matches as a whole with the
rejected leaf), then we break for the next leaf. There is the
special case of multiple arguments (see code comments) where we
recheck the nodes.
Args:
The leaves of the AST tree to be matched
Returns:
A dictionary of node matches with fixers as the keys
"""
current_ac_node = self.root
results = defaultdict(list)
for leaf in leaves:
current_ast_node = leaf
while current_ast_node:
current_ast_node.was_checked = True
for child in current_ast_node.children:
# multiple statements, recheck
if isinstance(child, pytree.Leaf) and child.value == u";":
current_ast_node.was_checked = False
break
if current_ast_node.type == 1:
#name
node_token = current_ast_node.value
else:
node_token = current_ast_node.type
if node_token in current_ac_node.transition_table:
#token matches
current_ac_node = current_ac_node.transition_table[node_token]
for fixer in current_ac_node.fixers:
if fixer not in results:
results[fixer] = []
results[fixer].append(current_ast_node)
else:
#matching failed, reset automaton
current_ac_node = self.root
if (current_ast_node.parent is not None
and current_ast_node.parent.was_checked):
#the rest of the tree upwards has been checked, next leaf
break
#recheck the rejected node once from the root
if node_token in current_ac_node.transition_table:
#token matches
current_ac_node = current_ac_node.transition_table[node_token]
for fixer in current_ac_node.fixers:
if fixer not in results:
results[fixer] = []
results[fixer].append(current_ast_node)
current_ast_node = current_ast_node.parent
return results
def print_ac(self):
"Prints a graphviz diagram of the BM automaton(for debugging)"
print("digraph g{")
def print_node(node):
for subnode_key in node.transition_table.keys():
subnode = node.transition_table[subnode_key]
print("%d -> %d [label=%s] //%s" %
(node.id, subnode.id, type_repr(subnode_key), str(subnode.fixers)))
if subnode_key == 1:
print(subnode.content)
print_node(subnode)
print_node(self.root)
print("}")
# taken from pytree.py for debugging; only used by print_ac
_type_reprs = {}
def type_repr(type_num):
global _type_reprs
if not _type_reprs:
from .pygram import python_symbols
# printing tokens is possible but not as useful
# from .pgen2 import token // token.__dict__.items():
for name, val in python_symbols.__dict__.items():
if type(val) == int: _type_reprs[val] = name
return _type_reprs.setdefault(type_num, type_num)
"Utility functions used by the btm_matcher module"
from . import pytree
from .pgen2 import grammar, token
from .pygram import pattern_symbols, python_symbols
syms = pattern_symbols
pysyms = python_symbols
tokens = grammar.opmap
token_labels = token
TYPE_ANY = -1
TYPE_ALTERNATIVES = -2
TYPE_GROUP = -3
class MinNode(object):
"""This class serves as an intermediate representation of the
pattern tree during the conversion to sets of leaf-to-root
subpatterns"""
def __init__(self, type=None, name=None):
self.type = type
self.name = name
self.children = []
self.leaf = False
self.parent = None
self.alternatives = []
self.group = []
def __repr__(self):
return str(self.type) + ' ' + str(self.name)
def leaf_to_root(self):
"""Internal method. Returns a characteristic path of the
pattern tree. This method must be run for all leaves until the
linear subpatterns are merged into a single"""
node = self
subp = []
while node:
if node.type == TYPE_ALTERNATIVES:
node.alternatives.append(subp)
if len(node.alternatives) == len(node.children):
#last alternative
subp = [tuple(node.alternatives)]
node.alternatives = []
node = node.parent
continue
else:
node = node.parent
subp = None
break
if node.type == TYPE_GROUP:
node.group.append(subp)
#probably should check the number of leaves
if len(node.group) == len(node.children):
subp = get_characteristic_subpattern(node.group)
node.group = []
node = node.parent
continue
else:
node = node.parent
subp = None
break
if node.type == token_labels.NAME and node.name:
#in case of type=name, use the name instead
subp.append(node.name)
else:
subp.append(node.type)
node = node.parent
return subp
def get_linear_subpattern(self):
"""Drives the leaf_to_root method. The reason that
leaf_to_root must be run multiple times is because we need to
reject 'group' matches; for example the alternative form
(a | b c) creates a group [b c] that needs to be matched. Since
matching multiple linear patterns exceeds the automaton's
capabilities, leaf_to_root merges each group into a single
choice based on how 'characteristic' it is,
i.e. (a|b c) -> (a|b) if b is more characteristic than c.
Returns: The most 'characteristic'(as defined by
get_characteristic_subpattern) path for the compiled pattern
tree.
"""
for l in self.leaves():
subp = l.leaf_to_root()
if subp:
return subp
def leaves(self):
"Generator that returns the leaves of the tree"
for child in self.children:
for x in child.leaves():
yield x
if not self.children:
yield self
def reduce_tree(node, parent=None):
"""
Internal function. Reduces a compiled pattern tree to an
intermediate representation suitable for feeding the
automaton. This also trims off any optional pattern elements (like
[a], a*).
"""
new_node = None
#switch on the node type
if node.type == syms.Matcher:
#skip
node = node.children[0]
if node.type == syms.Alternatives:
#2 cases
if len(node.children) <= 2:
#just a single 'Alternative', skip this node
new_node = reduce_tree(node.children[0], parent)
else:
#real alternatives
new_node = MinNode(type=TYPE_ALTERNATIVES)
#skip odd children('|' tokens)
for child in node.children:
if node.children.index(child)%2:
continue
reduced = reduce_tree(child, new_node)
if reduced is not None:
new_node.children.append(reduced)
elif node.type == syms.Alternative:
if len(node.children) > 1:
new_node = MinNode(type=TYPE_GROUP)
for child in node.children:
reduced = reduce_tree(child, new_node)
if reduced:
new_node.children.append(reduced)
if not new_node.children:
# delete the group if all of the children were reduced to None
new_node = None
else:
new_node = reduce_tree(node.children[0], parent)
elif node.type == syms.Unit:
if (isinstance(node.children[0], pytree.Leaf) and
node.children[0].value == '('):
#skip parentheses
return reduce_tree(node.children[1], parent)
if ((isinstance(node.children[0], pytree.Leaf) and
node.children[0].value == '[')
or
(len(node.children)>1 and
hasattr(node.children[1], "value") and
node.children[1].value == '[')):
#skip whole unit if it's optional
return None
leaf = True
details_node = None
alternatives_node = None
has_repeater = False
repeater_node = None
has_variable_name = False
for child in node.children:
if child.type == syms.Details:
leaf = False
details_node = child
elif child.type == syms.Repeater:
has_repeater = True
repeater_node = child
elif child.type == syms.Alternatives:
alternatives_node = child
if hasattr(child, 'value') and child.value == '=': # variable name
has_variable_name = True
#skip variable name
if has_variable_name:
#skip variable name, '='
name_leaf = node.children[2]
if hasattr(name_leaf, 'value') and name_leaf.value == '(':
# skip parenthesis
name_leaf = node.children[3]
else:
name_leaf = node.children[0]
#set node type
if name_leaf.type == token_labels.NAME:
#(python) non-name or wildcard
if name_leaf.value == 'any':
new_node = MinNode(type=TYPE_ANY)
else:
if hasattr(token_labels, name_leaf.value):
new_node = MinNode(type=getattr(token_labels, name_leaf.value))
else:
new_node = MinNode(type=getattr(pysyms, name_leaf.value))
elif name_leaf.type == token_labels.STRING:
#(python) name or character; remove the apostrophes from
#the string value
name = name_leaf.value.strip("'")
if name in tokens:
new_node = MinNode(type=tokens[name])
else:
new_node = MinNode(type=token_labels.NAME, name=name)
elif name_leaf.type == syms.Alternatives:
new_node = reduce_tree(alternatives_node, parent)
#handle repeaters
if has_repeater:
if repeater_node.children[0].value == '*':
#reduce to None
new_node = None
elif repeater_node.children[0].value == '+':
#reduce to a single occurrence i.e. do nothing
pass
else:
#TODO: handle {min, max} repeaters
raise NotImplementedError
#add children
if details_node and new_node is not None:
for child in details_node.children[1:-1]:
#skip '<', '>' markers
reduced = reduce_tree(child, new_node)
if reduced is not None:
new_node.children.append(reduced)
if new_node:
new_node.parent = parent
return new_node
def get_characteristic_subpattern(subpatterns):
"""Picks the most characteristic from a list of linear patterns
Current order used is:
names > common_names > common_chars
"""
if not isinstance(subpatterns, list):
return subpatterns
if len(subpatterns)==1:
return subpatterns[0]
# first pick out the ones containing variable names
subpatterns_with_names = []
subpatterns_with_common_names = []
common_names = ['in', 'for', 'if' , 'not', 'None']
subpatterns_with_common_chars = []
common_chars = "[]().,:"
for subpattern in subpatterns:
if any(rec_test(subpattern, lambda x: type(x) is str)):
if any(rec_test(subpattern,
lambda x: isinstance(x, str) and x in common_chars)):
subpatterns_with_common_chars.append(subpattern)
elif any(rec_test(subpattern,
lambda x: isinstance(x, str) and x in common_names)):
subpatterns_with_common_names.append(subpattern)
else:
subpatterns_with_names.append(subpattern)
if subpatterns_with_names:
subpatterns = subpatterns_with_names
elif subpatterns_with_common_names:
subpatterns = subpatterns_with_common_names
elif subpatterns_with_common_chars:
subpatterns = subpatterns_with_common_chars
# of the remaining subpatterns pick out the longest one
return max(subpatterns, key=len)
def rec_test(sequence, test_func):
"""Tests test_func on all items of sequence and items of included
sub-iterables"""
for x in sequence:
if isinstance(x, (list, tuple)):
for y in rec_test(x, test_func):
yield y
else:
yield test_func(x)
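# Hedged example (added for illustration): rec_test walks nested sequences
# while applying the predicate, which is how get_characteristic_subpattern
# above buckets candidate paths:
#
#     >>> list(rec_test([1, ['a', (2, 'b')]], lambda x: isinstance(x, str)))
#     [False, True, False, True]
#     >>> get_characteristic_subpattern([[1, 'my_func'], ['(']])
#     [1, 'my_func']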
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Base class for fixers (optional, but recommended)."""
# Python imports
import itertools
# Local imports
from .patcomp import PatternCompiler
from . import pygram
from .fixer_util import does_tree_import
class BaseFix(object):
"""Optional base class for fixers.
The subclass name must be FixFooBar where FooBar is the result of
removing underscores and capitalizing the words of the fix name.
For example, the class name for a fixer named 'has_key' should be
FixHasKey.
"""
PATTERN = None # Most subclasses should override with a string literal
pattern = None # Compiled pattern, set by compile_pattern()
pattern_tree = None # Tree representation of the pattern
options = None # Options object passed to initializer
filename = None # The filename (set by set_filename)
logger = None # A logger (set by set_filename)
numbers = itertools.count(1) # For new_name()
used_names = set() # A set of all used NAMEs
order = "post" # Does the fixer prefer pre- or post-order traversal
explicit = False # Is this ignored by refactor.py -f all?
run_order = 5 # Fixers will be sorted by run order before execution
# Lower numbers will be run first.
_accept_type = None # [Advanced and not public] This tells RefactoringTool
# which node type to accept when there's not a pattern.
keep_line_order = False # For the bottom matcher: match with the
# original line order
BM_compatible = False # Compatibility with the bottom matching
# module; every fixer should set this
# manually
# Shortcut for access to Python grammar symbols
syms = pygram.python_symbols
def __init__(self, options, log):
"""Initializer. Subclass may override.
Args:
options: a dict containing the options passed to RefactoringTool
that could be used to customize the fixer through the command line.
log: a list to append warnings and other messages to.
"""
self.options = options
self.log = log
self.compile_pattern()
def compile_pattern(self):
"""Compiles self.PATTERN into self.pattern.
Subclass may override if it doesn't want to use
self.{pattern,PATTERN} in .match().
"""
if self.PATTERN is not None:
PC = PatternCompiler()
self.pattern, self.pattern_tree = PC.compile_pattern(self.PATTERN,
with_tree=True)
def set_filename(self, filename):
"""Set the filename, and a logger derived from it.
The main refactoring tool should call this.
"""
self.filename = filename
def match(self, node):
"""Returns match for a given parse tree node.
Should return a true or false object (not necessarily a bool).
It may return a non-empty dict of matching sub-nodes as
returned by a matching pattern.
Subclass may override.
"""
results = {"node": node}
return self.pattern.match(node, results) and results
def transform(self, node, results):
"""Returns the transformation for a given parse tree node.
Args:
node: the root of the parse tree that matched the fixer.
results: a dict mapping symbolic names to part of the match.
Returns:
None, or a node that is a modified copy of the
argument node. The node argument may also be modified in-place to
effect the same change.
Subclass *must* override.
"""
raise NotImplementedError()
def new_name(self, template=u"xxx_todo_changeme"):
"""Return a string suitable for use as an identifier
The new name is guaranteed not to conflict with other identifiers.
"""
name = template
while name in self.used_names:
name = template + unicode(self.numbers.next())
self.used_names.add(name)
return name
def log_message(self, message):
if self.first_log:
self.first_log = False
self.log.append("### In file %s ###" % self.filename)
self.log.append(message)
def cannot_convert(self, node, reason=None):
"""Warn the user that a given chunk of code is not valid Python 3,
but that it cannot be converted automatically.
First argument is the top-level node for the code in question.
Optional second argument is why it can't be converted.
"""
lineno = node.get_lineno()
for_output = node.clone()
for_output.prefix = u""
msg = "Line %d: could not convert: %s"
self.log_message(msg % (lineno, for_output))
if reason:
self.log_message(reason)
def warning(self, node, reason):
"""Used for warning the user about possible uncertainty in the
translation.
First argument is the top-level node for the code in question.
Second argument is the reason for the warning.
"""
lineno = node.get_lineno()
self.log_message("Line %d: %s" % (lineno, reason))
def start_tree(self, tree, filename):
"""Some fixers need to maintain tree-wide state.
This method is called once, at the start of tree fix-up.
tree - the root node of the tree to be processed.
filename - the name of the file the tree came from.
"""
self.used_names = tree.used_names
self.set_filename(filename)
self.numbers = itertools.count(1)
self.first_log = True
def finish_tree(self, tree, filename):
"""Some fixers need to maintain tree-wide state.
This method is called once, at the conclusion of tree fix-up.
tree - the root node of the tree to be processed.
filename - the name of the file the tree came from.
"""
pass
class ConditionalFix(BaseFix):
""" Base class for fixers which not execute if an import is found. """
# This is the name of the import which, if found, will cause the test to be skipped
skip_on = None
def start_tree(self, *args):
super(ConditionalFix, self).start_tree(*args)
self._should_skip = None
def should_skip(self, node):
if self._should_skip is not None:
return self._should_skip
pkg = self.skip_on.split(".")
name = pkg[-1]
pkg = ".".join(pkg[:-1])
self._should_skip = does_tree_import(pkg, name, node)
return self._should_skip
"""Utility functions, node construction macros, etc."""
# Author: Collin Winter
from itertools import islice
# Local imports
from .pgen2 import token
from .pytree import Leaf, Node
from .pygram import python_symbols as syms
from . import patcomp
###########################################################
### Common node-construction "macros"
###########################################################
def KeywordArg(keyword, value):
return Node(syms.argument,
[keyword, Leaf(token.EQUAL, u"="), value])
def LParen():
return Leaf(token.LPAR, u"(")
def RParen():
return Leaf(token.RPAR, u")")
def Assign(target, source):
"""Build an assignment statement"""
if not isinstance(target, list):
target = [target]
if not isinstance(source, list):
source.prefix = u" "
source = [source]
return Node(syms.atom,
target + [Leaf(token.EQUAL, u"=", prefix=u" ")] + source)
def Name(name, prefix=None):
"""Return a NAME leaf"""
return Leaf(token.NAME, name, prefix=prefix)
def Attr(obj, attr):
"""A node tuple for obj.attr"""
return [obj, Node(syms.trailer, [Dot(), attr])]
def Comma():
"""A comma leaf"""
return Leaf(token.COMMA, u",")
def Dot():
"""A period (.) leaf"""
return Leaf(token.DOT, u".")
def ArgList(args, lparen=LParen(), rparen=RParen()):
"""A parenthesised argument list, used by Call()"""
node = Node(syms.trailer, [lparen.clone(), rparen.clone()])
if args:
node.insert_child(1, Node(syms.arglist, args))
return node
def Call(func_name, args=None, prefix=None):
"""A function call"""
node = Node(syms.power, [func_name, ArgList(args)])
if prefix is not None:
node.prefix = prefix
return node
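# Hedged sketch (added; assumes lib2to3's pytree str() round-trip): the
# macros compose into printable nodes, with prefixes carrying whitespace:
#
#     >>> str(Call(Name(u"f"), [Name(u"x")], prefix=u" "))
#     ' f(x)'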
def Newline():
"""A newline literal"""
return Leaf(token.NEWLINE, u"\n")
def BlankLine():
"""A blank line"""
return Leaf(token.NEWLINE, u"")
def Number(n, prefix=None):
return Leaf(token.NUMBER, n, prefix=prefix)
def Subscript(index_node):
"""A numeric or string subscript"""
return Node(syms.trailer, [Leaf(token.LBRACE, u"["),
index_node,
Leaf(token.RBRACE, u"]")])
def String(string, prefix=None):
"""A string leaf"""
return Leaf(token.STRING, string, prefix=prefix)
def ListComp(xp, fp, it, test=None):
"""A list comprehension of the form [xp for fp in it if test].
If test is None, the "if test" part is omitted.
"""
xp.prefix = u""
fp.prefix = u" "
it.prefix = u" "
for_leaf = Leaf(token.NAME, u"for")
for_leaf.prefix = u" "
in_leaf = Leaf(token.NAME, u"in")
in_leaf.prefix = u" "
inner_args = [for_leaf, fp, in_leaf, it]
if test:
test.prefix = u" "
if_leaf = Leaf(token.NAME, u"if")
if_leaf.prefix = u" "
inner_args.append(Node(syms.comp_if, [if_leaf, test]))
inner = Node(syms.listmaker, [xp, Node(syms.comp_for, inner_args)])
return Node(syms.atom,
[Leaf(token.LBRACE, u"["),
inner,
Leaf(token.RBRACE, u"]")])
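# Illustrative doctest (added; same assumption as above about str()):
#
#     >>> str(ListComp(Name(u"x"), Name(u"x"), Name(u"seq")))
#     '[x for x in seq]'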
def FromImport(package_name, name_leafs):
""" Return an import statement in the form:
from package import name_leafs"""
# XXX: May not handle dotted imports properly (eg, package_name='foo.bar')
#assert package_name == '.' or '.' not in package_name, "FromImport has "\
# "not been tested with dotted package names -- use at your own "\
# "peril!"
for leaf in name_leafs:
# Pull the leaves out of their old tree
leaf.remove()
children = [Leaf(token.NAME, u"from"),
Leaf(token.NAME, package_name, prefix=u" "),
Leaf(token.NAME, u"import", prefix=u" "),
Node(syms.import_as_names, name_leafs)]
imp = Node(syms.import_from, children)
return imp
###########################################################
### Determine whether a node represents a given literal
###########################################################
def is_tuple(node):
"""Does the node represent a tuple literal?"""
if isinstance(node, Node) and node.children == [LParen(), RParen()]:
return True
return (isinstance(node, Node)
and len(node.children) == 3
and isinstance(node.children[0], Leaf)
and isinstance(node.children[1], Node)
and isinstance(node.children[2], Leaf)
and node.children[0].value == u"("
and node.children[2].value == u")")
def is_list(node):
"""Does the node represent a list literal?"""
return (isinstance(node, Node)
and len(node.children) > 1
and isinstance(node.children[0], Leaf)
and isinstance(node.children[-1], Leaf)
and node.children[0].value == u"["
and node.children[-1].value == u"]")
###########################################################
### Misc
###########################################################
def parenthesize(node):
return Node(syms.atom, [LParen(), node, RParen()])
consuming_calls = set(["sorted", "list", "set", "any", "all", "tuple", "sum",
"min", "max", "enumerate"])
def attr_chain(obj, attr):
"""Follow an attribute chain.
If you have a chain of objects where a.foo -> b, b.foo -> c, etc.,
use this to iterate over all objects in the chain. Iteration
terminates when getattr(x, attr) is None.
Args:
obj: the starting object
attr: the name of the chaining attribute
Yields:
Each successive object in the chain.
"""
next = getattr(obj, attr)
while next:
yield next
next = getattr(next, attr)
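# Hedged example (added for illustration; _Obj is a hypothetical stand-in
# for pytree nodes, which chain via their .parent attribute):
#
#     >>> class _Obj(object):
#     ...     def __init__(self, parent=None):
#     ...         self.parent = parent
#     >>> a = _Obj(); b = _Obj(a); c = _Obj(b)
#     >>> list(attr_chain(c, "parent")) == [b, a]
#     True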
p0 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
"""
p1 = """
power<
( 'iter' | 'list' | 'tuple' | 'sorted' | 'set' | 'sum' |
'any' | 'all' | 'enumerate' | (any* trailer< '.' 'join' >) )
trailer< '(' node=any ')' >
any*
>
"""
p2 = """
power<
( 'sorted' | 'enumerate' )
trailer< '(' arglist<node=any any*> ')' >
any*
>
"""
pats_built = False
def in_special_context(node):
""" Returns true if node is in an environment where all that is required
of it is being iterable (i.e., it doesn't matter if it returns a list
or an iterator).
See test_map_nochange in test_fixers.py for some examples and tests.
"""
global p0, p1, p2, pats_built
if not pats_built:
p0 = patcomp.compile_pattern(p0)
p1 = patcomp.compile_pattern(p1)
p2 = patcomp.compile_pattern(p2)
pats_built = True
patterns = [p0, p1, p2]
for pattern, parent in zip(patterns, attr_chain(node, "parent")):
results = {}
if pattern.match(parent, results) and results["node"] is node:
return True
return False
def is_probably_builtin(node):
"""
Check that something isn't an attribute or function name etc.
"""
prev = node.prev_sibling
if prev is not None and prev.type == token.DOT:
# Attribute lookup.
return False
parent = node.parent
if parent.type in (syms.funcdef, syms.classdef):
return False
if parent.type == syms.expr_stmt and parent.children[0] is node:
# Assignment.
return False
if parent.type == syms.parameters or \
(parent.type == syms.typedargslist and (
(prev is not None and prev.type == token.COMMA) or
parent.children[0] is node
)):
# The name of an argument.
return False
return True
def find_indentation(node):
"""Find the indentation of *node*."""
while node is not None:
if node.type == syms.suite and len(node.children) > 2:
indent = node.children[1]
if indent.type == token.INDENT:
return indent.value
node = node.parent
return u""
###########################################################
### The following functions are to find bindings in a suite
###########################################################
def make_suite(node):
if node.type == syms.suite:
return node
node = node.clone()
parent, node.parent = node.parent, None
suite = Node(syms.suite, [node])
suite.parent = parent
return suite
def find_root(node):
"""Find the top level namespace."""
# Scamper up to the top level namespace
while node.type != syms.file_input:
node = node.parent
if not node:
raise ValueError("root found before file_input node was found.")
return node
def does_tree_import(package, name, node):
""" Returns true if name is imported from package at the
top level of the tree which node belongs to.
To cover the case of an import like 'import foo', use
None for the package and 'foo' for the name. """
binding = find_binding(name, find_root(node), package)
return bool(binding)
def is_import(node):
"""Returns true if the node is an import statement."""
return node.type in (syms.import_name, syms.import_from)
def touch_import(package, name, node):
""" Works like `does_tree_import` but adds an import statement
if it was not imported. """
def is_import_stmt(node):
return (node.type == syms.simple_stmt and node.children and
is_import(node.children[0]))
root = find_root(node)
if does_tree_import(package, name, root):
return
# figure out where to insert the new import. First try to find
# the first import and then skip to the last one.
insert_pos = offset = 0
for idx, node in enumerate(root.children):
if not is_import_stmt(node):
continue
for offset, node2 in enumerate(root.children[idx:]):
if not is_import_stmt(node2):
break
insert_pos = idx + offset
break
# if there are no imports where we can insert, find the docstring.
# if that also fails, we stick to the beginning of the file
if insert_pos == 0:
for idx, node in enumerate(root.children):
if (node.type == syms.simple_stmt and node.children and
node.children[0].type == token.STRING):
insert_pos = idx + 1
break
if package is None:
import_ = Node(syms.import_name, [
Leaf(token.NAME, u"import"),
Leaf(token.NAME, name, prefix=u" ")
])
else:
import_ = FromImport(package, [Leaf(token.NAME, name, prefix=u" ")])
children = [import_, Newline()]
root.insert_child(insert_pos, Node(syms.simple_stmt, children))
_def_syms = set([syms.classdef, syms.funcdef])
def find_binding(name, node, package=None):
""" Returns the node which binds variable name, otherwise None.
If optional argument package is supplied, only imports will
be returned.
See test cases for examples."""
for child in node.children:
ret = None
if child.type == syms.for_stmt:
if _find(name, child.children[1]):
return child
n = find_binding(name, make_suite(child.children[-1]), package)
if n: ret = n
elif child.type in (syms.if_stmt, syms.while_stmt):
n = find_binding(name, make_suite(child.children[-1]), package)
if n: ret = n
elif child.type == syms.try_stmt:
n = find_binding(name, make_suite(child.children[2]), package)
if n:
ret = n
else:
for i, kid in enumerate(child.children[3:]):
if kid.type == token.COLON and kid.value == ":":
# i+3 is the colon, i+4 is the suite
n = find_binding(name, make_suite(child.children[i+4]), package)
if n: ret = n
elif child.type in _def_syms and child.children[1].value == name:
ret = child
elif _is_import_binding(child, name, package):
ret = child
elif child.type == syms.simple_stmt:
ret = find_binding(name, child, package)
elif child.type == syms.expr_stmt:
if _find(name, child.children[0]):
ret = child
if ret:
if not package:
return ret
if is_import(ret):
return ret
return None
_block_syms = set([syms.funcdef, syms.classdef, syms.trailer])
def _find(name, node):
nodes = [node]
while nodes:
node = nodes.pop()
if node.type > 256 and node.type not in _block_syms:
nodes.extend(node.children)
elif node.type == token.NAME and node.value == name:
return node
return None
def _is_import_binding(node, name, package=None):
""" Will reuturn node if node will import name, or node
will import * from package. None is returned otherwise.
See test cases for examples. """
if node.type == syms.import_name and not package:
imp = node.children[1]
if imp.type == syms.dotted_as_names:
for child in imp.children:
if child.type == syms.dotted_as_name:
if child.children[2].value == name:
return node
elif child.type == token.NAME and child.value == name:
return node
elif imp.type == syms.dotted_as_name:
last = imp.children[-1]
if last.type == token.NAME and last.value == name:
return node
elif imp.type == token.NAME and imp.value == name:
return node
elif node.type == syms.import_from:
# unicode(...) is used to make life easier here, because
# from a.b import parses to ['import', ['a', '.', 'b'], ...]
if package and unicode(node.children[1]).strip() != package:
return None
n = node.children[3]
if package and _find(u"as", n):
# See test_from_import_as for explanation
return None
elif n.type == syms.import_as_names and _find(name, n):
return node
elif n.type == syms.import_as_name:
child = n.children[2]
if child.type == token.NAME and child.value == name:
return node
elif n.type == token.NAME and n.value == name:
return node
elif package and n.type == token.STAR:
return node
return None
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for apply().
This converts apply(func, v, k) into (func)(*v, **k)."""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Call, Comma, parenthesize
class FixApply(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'apply'
trailer<
'('
arglist<
(not argument<NAME '=' any>) func=any ','
(not argument<NAME '=' any>) args=any [','
(not argument<NAME '=' any>) kwds=any] [',']
>
')'
>
>
"""
def transform(self, node, results):
syms = self.syms
assert results
func = results["func"]
args = results["args"]
kwds = results.get("kwds")
# I feel like we should be able to express this logic in the
# PATTERN above but I don't know how to do it so...
if args:
if args.type == self.syms.star_expr:
return # Make no change.
if (args.type == self.syms.argument and
args.children[0].value == '**'):
return # Make no change.
if kwds and (kwds.type == self.syms.argument and
kwds.children[0].value == '**'):
return # Make no change.
prefix = node.prefix
func = func.clone()
if (func.type not in (token.NAME, syms.atom) and
(func.type != syms.power or
func.children[-2].type == token.DOUBLESTAR)):
# Need to parenthesize
func = parenthesize(func)
func.prefix = ""
args = args.clone()
args.prefix = ""
if kwds is not None:
kwds = kwds.clone()
kwds.prefix = ""
l_newargs = [pytree.Leaf(token.STAR, u"*"), args]
if kwds is not None:
l_newargs.extend([Comma(),
pytree.Leaf(token.DOUBLESTAR, u"**"),
kwds])
l_newargs[-2].prefix = u" " # that's the ** token
# XXX Sometimes we could be cleverer, e.g. apply(f, (x, y) + t)
# can be translated into f(x, y, *t) instead of f(*(x, y) + t)
#new = pytree.Node(syms.power, (func, ArgList(l_newargs)))
return Call(func, l_newargs, prefix=prefix)
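# Hedged usage sketch (added for illustration; assumes the stock lib2to3
# driver API): running only this fixer through RefactoringTool demonstrates
# the documented apply(func, v, k) -> func(*v, **k) rewrite:
#
#     >>> from lib2to3.refactor import RefactoringTool
#     >>> rt = RefactoringTool(["lib2to3.fixes.fix_apply"])
#     >>> str(rt.refactor_string(u"apply(f, args, kwds)\n", "<example>"))
#     'f(*args, **kwds)\n'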
"""Fixer that replaces deprecated unittest method names."""
# Author: Ezio Melotti
from ..fixer_base import BaseFix
from ..fixer_util import Name
NAMES = dict(
assert_="assertTrue",
assertEquals="assertEqual",
assertNotEquals="assertNotEqual",
assertAlmostEquals="assertAlmostEqual",
assertNotAlmostEquals="assertNotAlmostEqual",
assertRegexpMatches="assertRegex",
assertRaisesRegexp="assertRaisesRegex",
failUnlessEqual="assertEqual",
failIfEqual="assertNotEqual",
failUnlessAlmostEqual="assertAlmostEqual",
failIfAlmostEqual="assertNotAlmostEqual",
failUnless="assertTrue",
failUnlessRaises="assertRaises",
failIf="assertFalse",
)
class FixAsserts(BaseFix):
PATTERN = """
power< any+ trailer< '.' meth=(%s)> any* >
""" % '|'.join(map(repr, NAMES))
def transform(self, node, results):
name = results["meth"][0]
name.replace(Name(NAMES[str(name)], prefix=name.prefix))
"""Fixer for basestring -> str."""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixBasestring(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "'basestring'"
def transform(self, node, results):
return Name(u"str", prefix=node.prefix)
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes buffer(...) into memoryview(...)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixBuffer(fixer_base.BaseFix):
BM_compatible = True
explicit = True # The user must ask for this fixer
PATTERN = """
power< name='buffer' trailer< '(' [any] ')' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name(u"memoryview", prefix=name.prefix))
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for dict methods.
d.keys() -> list(d.keys())
d.items() -> list(d.items())
d.values() -> list(d.values())
d.iterkeys() -> iter(d.keys())
d.iteritems() -> iter(d.items())
d.itervalues() -> iter(d.values())
d.viewkeys() -> d.keys()
d.viewitems() -> d.items()
d.viewvalues() -> d.values()
Except in certain very specific contexts: the iter() can be dropped
when the context is list(), sorted(), iter() or for...in; the list()
can be dropped when the context is list() or sorted() (but not iter()
or for...in!). Special contexts that apply to both: list(), sorted(), tuple(),
set(), any(), all(), sum().
Note: iter(d.keys()) could be written as iter(d) but, since the
original d.iterkeys() was also redundant, we don't fix this. And there
are (rare) contexts where it makes a difference (e.g. when passing it
as an argument to a function that introspects the argument).
"""
# Local imports
from .. import pytree
from .. import patcomp
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, LParen, RParen, ArgList, Dot
from .. import fixer_util
iter_exempt = fixer_util.consuming_calls | set(["iter"])
class FixDict(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< head=any+
trailer< '.' method=('keys'|'items'|'values'|
'iterkeys'|'iteritems'|'itervalues'|
'viewkeys'|'viewitems'|'viewvalues') >
parens=trailer< '(' ')' >
tail=any*
>
"""
def transform(self, node, results):
head = results["head"]
method = results["method"][0] # Extract node for method name
tail = results["tail"]
syms = self.syms
method_name = method.value
isiter = method_name.startswith(u"iter")
isview = method_name.startswith(u"view")
if isiter or isview:
method_name = method_name[4:]
assert method_name in (u"keys", u"items", u"values"), repr(method)
head = [n.clone() for n in head]
tail = [n.clone() for n in tail]
special = not tail and self.in_special_context(node, isiter)
args = head + [pytree.Node(syms.trailer,
[Dot(),
Name(method_name,
prefix=method.prefix)]),
results["parens"].clone()]
new = pytree.Node(syms.power, args)
if not (special or isview):
new.prefix = u""
new = Call(Name(u"iter" if isiter else u"list"), [new])
if tail:
new = pytree.Node(syms.power, [new] + tail)
new.prefix = node.prefix
return new
P1 = "power< func=NAME trailer< '(' node=any ')' > any* >"
p1 = patcomp.compile_pattern(P1)
P2 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
"""
p2 = patcomp.compile_pattern(P2)
def in_special_context(self, node, isiter):
if node.parent is None:
return False
results = {}
if (node.parent.parent is not None and
self.p1.match(node.parent.parent, results) and
results["node"] is node):
if isiter:
# iter(d.iterkeys()) -> iter(d.keys()), etc.
return results["func"].value in iter_exempt
else:
# list(d.keys()) -> list(d.keys()), etc.
return results["func"].value in fixer_util.consuming_calls
if not isiter:
return False
# for ... in d.iterkeys() -> for ... in d.keys(), etc.
return self.p2.match(node.parent, results) and results["node"] is node
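# --- Editor's sketch (not part of the original module): how FixDict rewrites
# the dict methods described in the docstring above, assuming lib2to3 is
# importable; "d" is a placeholder name.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_dict"])
    # d.keys() -> list(d.keys()); d.iterkeys() -> iter(d.keys())
    print(rt.refactor_string(u"ks = d.keys(); it = d.iterkeys()\n", "<demo>"))
    # In a special context the wrapper is dropped entirely:
    # for k in d.iterkeys() -> for k in d.keys()
    print(rt.refactor_string(u"for k in d.iterkeys():\n    pass\n", "<demo>"))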
"""Fixer for except statements with named exceptions.
The following cases will be converted:
- "except E, T:" where T is a name:
except E as T:
- "except E, T:" where T is not a name, tuple or list:
except E as t:
T = t
This is done because the target of an "except" clause must be a
name.
- "except E, T:" where T is a tuple or list literal:
except E as t:
T = t.args
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Assign, Attr, Name, is_tuple, is_list, syms
def find_excepts(nodes):
for i, n in enumerate(nodes):
if n.type == syms.except_clause:
if n.children[0].value == u'except':
yield (n, nodes[i+2])
class FixExcept(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
try_stmt< 'try' ':' (simple_stmt | suite)
cleanup=(except_clause ':' (simple_stmt | suite))+
tail=(['except' ':' (simple_stmt | suite)]
['else' ':' (simple_stmt | suite)]
['finally' ':' (simple_stmt | suite)]) >
"""
def transform(self, node, results):
syms = self.syms
tail = [n.clone() for n in results["tail"]]
try_cleanup = [ch.clone() for ch in results["cleanup"]]
for except_clause, e_suite in find_excepts(try_cleanup):
if len(except_clause.children) == 4:
(E, comma, N) = except_clause.children[1:4]
comma.replace(Name(u"as", prefix=u" "))
if N.type != token.NAME:
# Generate a new N for the except clause
new_N = Name(self.new_name(), prefix=u" ")
target = N.clone()
target.prefix = u""
N.replace(new_N)
new_N = new_N.clone()
# Insert "old_N = new_N" as the first statement in
# the except body. This loop skips leading whitespace
# and indents
#TODO(cwinter) suite-cleanup
suite_stmts = e_suite.children
for i, stmt in enumerate(suite_stmts):
if isinstance(stmt, pytree.Node):
break
# The assignment is different if old_N is a tuple or list
# In that case, the assignment is old_N = new_N.args
if is_tuple(N) or is_list(N):
assign = Assign(target, Attr(new_N, Name(u'args')))
else:
assign = Assign(target, new_N)
#TODO(cwinter) stopgap until children becomes a smart list
for child in reversed(suite_stmts[:i]):
e_suite.insert_child(0, child)
e_suite.insert_child(i, assign)
elif N.prefix == u"":
# No space after a comma is legal; no space after "as",
# not so much.
N.prefix = u" "
#TODO(cwinter) fix this when children becomes a smart list
children = [c.clone() for c in node.children[:3]] + try_cleanup + tail
return pytree.Node(node.type, children)
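# --- Editor's sketch (not part of the original module): FixExcept in action,
# assuming lib2to3 is importable; ValueError and e are placeholder names.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_except"])
    # "except ValueError, e:" becomes "except ValueError as e:"
    src = u"try:\n    pass\nexcept ValueError, e:\n    pass\n"
    print(rt.refactor_string(src, "<demo>"))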
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for exec.
This converts usages of the exec statement into calls to a built-in
exec() function.
exec code in ns1, ns2 -> exec(code, ns1, ns2)
"""
# Local imports
from .. import pytree
from .. import fixer_base
from ..fixer_util import Comma, Name, Call
class FixExec(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
exec_stmt< 'exec' a=any 'in' b=any [',' c=any] >
|
exec_stmt< 'exec' (not atom<'(' [any] ')'>) a=any >
"""
def transform(self, node, results):
assert results
syms = self.syms
a = results["a"]
b = results.get("b")
c = results.get("c")
args = [a.clone()]
args[0].prefix = ""
if b is not None:
args.extend([Comma(), b.clone()])
if c is not None:
args.extend([Comma(), c.clone()])
return Call(Name(u"exec"), args, prefix=node.prefix)
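# --- Editor's sketch (not part of the original module): the exec-statement
# rewrite described above; "code" and "ns" are placeholder names.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_exec"])
    # "exec code in ns" becomes "exec(code, ns)"
    print(rt.refactor_string(u"exec code in ns\n", "<demo>"))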
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for execfile.
This converts usages of the execfile function into calls to the built-in
exec() function.
"""
from .. import fixer_base
from ..fixer_util import (Comma, Name, Call, LParen, RParen, Dot, Node,
ArgList, String, syms)
class FixExecfile(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'execfile' trailer< '(' arglist< filename=any [',' globals=any [',' locals=any ] ] > ')' > >
|
power< 'execfile' trailer< '(' filename=any ')' > >
"""
def transform(self, node, results):
assert results
filename = results["filename"]
globals = results.get("globals")
locals = results.get("locals")
# Copy over the prefix from the right parentheses end of the execfile
# call.
execfile_paren = node.children[-1].children[-1].clone()
# Construct open().read().
open_args = ArgList([filename.clone(), Comma(), String('"rb"', ' ')],
rparen=execfile_paren)
open_call = Node(syms.power, [Name(u"open"), open_args])
read = [Node(syms.trailer, [Dot(), Name(u'read')]),
Node(syms.trailer, [LParen(), RParen()])]
open_expr = [open_call] + read
# Wrap the open call in a compile call. This is so the filename will be
# preserved in the execed code.
filename_arg = filename.clone()
filename_arg.prefix = u" "
exec_str = String(u"'exec'", u" ")
compile_args = open_expr + [Comma(), filename_arg, Comma(), exec_str]
compile_call = Call(Name(u"compile"), compile_args, u"")
# Finally, replace the execfile call with an exec call.
args = [compile_call]
if globals is not None:
args.extend([Comma(), globals.clone()])
if locals is not None:
args.extend([Comma(), locals.clone()])
return Call(Name(u"exec"), args, prefix=node.prefix)
"""
Convert use of sys.exitfunc to use the atexit module.
"""
# Author: Benjamin Peterson
from lib2to3 import pytree, fixer_base
from lib2to3.fixer_util import Name, Attr, Call, Comma, Newline, syms
class FixExitfunc(fixer_base.BaseFix):
keep_line_order = True
BM_compatible = True
PATTERN = """
(
sys_import=import_name<'import'
('sys'
|
dotted_as_names< (any ',')* 'sys' (',' any)* >
)
>
|
expr_stmt<
power< 'sys' trailer< '.' 'exitfunc' > >
'=' func=any >
)
"""
def __init__(self, *args):
super(FixExitfunc, self).__init__(*args)
def start_tree(self, tree, filename):
super(FixExitfunc, self).start_tree(tree, filename)
self.sys_import = None
def transform(self, node, results):
# First, find the sys import. We'll just hope it's global scope.
if "sys_import" in results:
if self.sys_import is None:
self.sys_import = results["sys_import"]
return
func = results["func"].clone()
func.prefix = u""
register = pytree.Node(syms.power,
Attr(Name(u"atexit"), Name(u"register"))
)
call = Call(register, [func], node.prefix)
node.replace(call)
if self.sys_import is None:
# That's interesting.
            self.warning(node, "Can't find sys import; please add an atexit "
                               "import at the top of your file.")
return
# Now add an atexit import after the sys import.
names = self.sys_import.children[1]
if names.type == syms.dotted_as_names:
names.append_child(Comma())
names.append_child(Name(u"atexit", u" "))
else:
containing_stmt = self.sys_import.parent
position = containing_stmt.children.index(self.sys_import)
stmt_container = containing_stmt.parent
new_import = pytree.Node(syms.import_name,
[Name(u"import"), Name(u"atexit", u" ")]
)
new = pytree.Node(syms.simple_stmt, [new_import])
containing_stmt.insert_child(position + 1, Newline())
containing_stmt.insert_child(position + 2, new)
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes filter(F, X) into list(filter(F, X)).
We avoid the transformation if the filter() call is directly contained
in iter(<>), list(<>), tuple(<>), sorted(<>), ...join(<>), or
for V in <>:.
NOTE: This is still not correct if the original code was depending on
filter(F, X) to return a string if X is a string and a tuple if X is a
tuple. That would require type inference, which we don't do. Let
Python 2.6 figure it out.
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ListComp, in_special_context
class FixFilter(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
filter_lambda=power<
'filter'
trailer<
'('
arglist<
lambdef< 'lambda'
(fp=NAME | vfpdef< '(' fp=NAME ')'> ) ':' xp=any
>
','
it=any
>
')'
>
>
|
power<
'filter'
trailer< '(' arglist< none='None' ',' seq=any > ')' >
>
|
power<
'filter'
args=trailer< '(' [any] ')' >
>
"""
skip_on = "future_builtins.filter"
def transform(self, node, results):
if self.should_skip(node):
return
if "filter_lambda" in results:
new = ListComp(results.get("fp").clone(),
results.get("fp").clone(),
results.get("it").clone(),
results.get("xp").clone())
elif "none" in results:
new = ListComp(Name(u"_f"),
Name(u"_f"),
results["seq"].clone(),
Name(u"_f"))
else:
if in_special_context(node):
return None
new = node.clone()
new.prefix = u""
new = Call(Name(u"list"), [new])
new.prefix = node.prefix
return new
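# --- Editor's sketch (not part of the original module): the two main shapes
# FixFilter handles; "f" and "xs" are placeholder names.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_filter"])
    # filter(f, xs) -> list(filter(f, xs))
    print(rt.refactor_string(u"ys = filter(f, xs)\n", "<demo>"))
    # filter(None, xs) -> [_f for _f in xs if _f]
    print(rt.refactor_string(u"zs = filter(None, xs)\n", "<demo>"))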
"""Fix function attribute names (f.func_x -> f.__x__)."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixFuncattrs(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any+ trailer< '.' attr=('func_closure' | 'func_doc' | 'func_globals'
| 'func_name' | 'func_defaults' | 'func_code'
| 'func_dict') > any* >
"""
def transform(self, node, results):
attr = results["attr"][0]
attr.replace(Name((u"__%s__" % attr.value[5:]),
prefix=attr.prefix))
"""Remove __future__ imports
from __future__ import foo is replaced with an empty line.
"""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import BlankLine
class FixFuture(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """import_from< 'from' module_name="__future__" 'import' any >"""
# This should be run last -- some things check for the import
run_order = 10
def transform(self, node, results):
new = BlankLine()
new.prefix = node.prefix
return new
"""
Fixer that changes os.getcwdu() to os.getcwd().
"""
# Author: Victor Stinner
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixGetcwdu(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'os' trailer< dot='.' name='getcwdu' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name(u"getcwd", prefix=name.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for has_key().
Calls to .has_key() methods are expressed in terms of the 'in'
operator:
d.has_key(k) -> k in d
CAVEATS:
1) While the primary target of this fixer is dict.has_key(), the
fixer will change any has_key() method call, regardless of its
class.
2) Cases like this will not be converted:
m = d.has_key
if m(k):
...
Only *calls* to has_key() are converted. While it is possible to
convert the above to something like
m = d.__contains__
if m(k):
...
this is currently not done.
"""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, parenthesize
class FixHasKey(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
anchor=power<
before=any+
trailer< '.' 'has_key' >
trailer<
'('
( not(arglist | argument<any '=' any>) arg=any
| arglist<(not argument<any '=' any>) arg=any ','>
)
')'
>
after=any*
>
|
negation=not_test<
'not'
anchor=power<
before=any+
trailer< '.' 'has_key' >
trailer<
'('
( not(arglist | argument<any '=' any>) arg=any
| arglist<(not argument<any '=' any>) arg=any ','>
)
')'
>
>
>
"""
def transform(self, node, results):
assert results
syms = self.syms
if (node.parent.type == syms.not_test and
self.pattern.match(node.parent)):
# Don't transform a node matching the first alternative of the
# pattern when its parent matches the second alternative
return None
negation = results.get("negation")
anchor = results["anchor"]
prefix = node.prefix
before = [n.clone() for n in results["before"]]
arg = results["arg"].clone()
after = results.get("after")
if after:
after = [n.clone() for n in after]
if arg.type in (syms.comparison, syms.not_test, syms.and_test,
syms.or_test, syms.test, syms.lambdef, syms.argument):
arg = parenthesize(arg)
if len(before) == 1:
before = before[0]
else:
before = pytree.Node(syms.power, before)
before.prefix = u" "
n_op = Name(u"in", prefix=u" ")
if negation:
n_not = Name(u"not", prefix=u" ")
n_op = pytree.Node(syms.comp_op, (n_not, n_op))
new = pytree.Node(syms.comparison, (arg, n_op, before))
if after:
new = parenthesize(new)
new = pytree.Node(syms.power, (new,) + tuple(after))
if node.parent.type in (syms.comparison, syms.expr, syms.xor_expr,
syms.and_expr, syms.shift_expr,
syms.arith_expr, syms.term,
syms.factor, syms.power):
new = parenthesize(new)
new.prefix = prefix
return new
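# --- Editor's sketch (not part of the original module): FixHasKey turning a
# has_key() call into an "in" test; "d" and "k" are placeholder names.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_has_key"])
    # d.has_key(k) -> k in d; the negated form becomes "k not in d"
    print(rt.refactor_string(u"if d.has_key(k):\n    pass\n", "<demo>"))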
"""Adjust some old Python 2 idioms to their modern counterparts.
* Change some type comparisons to isinstance() calls:
type(x) == T -> isinstance(x, T)
type(x) is T -> isinstance(x, T)
type(x) != T -> not isinstance(x, T)
type(x) is not T -> not isinstance(x, T)
* Change "while 1:" into "while True:".
* Change both
v = list(EXPR)
v.sort()
foo(v)
and the more general
v = EXPR
v.sort()
foo(v)
into
v = sorted(EXPR)
foo(v)
"""
# Author: Jacques Frechet, Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Comma, Name, Node, BlankLine, syms
CMP = "(n='!=' | '==' | 'is' | n=comp_op< 'is' 'not' >)"
TYPE = "power< 'type' trailer< '(' x=any ')' > >"
class FixIdioms(fixer_base.BaseFix):
explicit = True # The user must ask for this fixer
PATTERN = r"""
isinstance=comparison< %s %s T=any >
|
isinstance=comparison< T=any %s %s >
|
while_stmt< 'while' while='1' ':' any+ >
|
sorted=any<
any*
simple_stmt<
expr_stmt< id1=any '='
power< list='list' trailer< '(' (not arglist<any+>) any ')' > >
>
'\n'
>
sort=
simple_stmt<
power< id2=any
trailer< '.' 'sort' > trailer< '(' ')' >
>
'\n'
>
next=any*
>
|
sorted=any<
any*
simple_stmt< expr_stmt< id1=any '=' expr=any > '\n' >
sort=
simple_stmt<
power< id2=any
trailer< '.' 'sort' > trailer< '(' ')' >
>
'\n'
>
next=any*
>
""" % (TYPE, CMP, CMP, TYPE)
def match(self, node):
r = super(FixIdioms, self).match(node)
# If we've matched one of the sort/sorted subpatterns above, we
# want to reject matches where the initial assignment and the
# subsequent .sort() call involve different identifiers.
if r and "sorted" in r:
if r["id1"] == r["id2"]:
return r
return None
return r
def transform(self, node, results):
if "isinstance" in results:
return self.transform_isinstance(node, results)
elif "while" in results:
return self.transform_while(node, results)
elif "sorted" in results:
return self.transform_sort(node, results)
else:
raise RuntimeError("Invalid match")
def transform_isinstance(self, node, results):
x = results["x"].clone() # The thing inside of type()
T = results["T"].clone() # The type being compared against
x.prefix = u""
T.prefix = u" "
test = Call(Name(u"isinstance"), [x, Comma(), T])
if "n" in results:
test.prefix = u" "
test = Node(syms.not_test, [Name(u"not"), test])
test.prefix = node.prefix
return test
def transform_while(self, node, results):
one = results["while"]
one.replace(Name(u"True", prefix=one.prefix))
def transform_sort(self, node, results):
sort_stmt = results["sort"]
next_stmt = results["next"]
list_call = results.get("list")
simple_expr = results.get("expr")
if list_call:
list_call.replace(Name(u"sorted", prefix=list_call.prefix))
elif simple_expr:
new = simple_expr.clone()
new.prefix = u""
simple_expr.replace(Call(Name(u"sorted"), [new],
prefix=simple_expr.prefix))
else:
raise RuntimeError("should not have reached here")
sort_stmt.remove()
btwn = sort_stmt.prefix
# Keep any prefix lines between the sort_stmt and the list_call and
# shove them right after the sorted() call.
if u"\n" in btwn:
if next_stmt:
# The new prefix should be everything from the sort_stmt's
# prefix up to the last newline, then the old prefix after a new
# line.
prefix_lines = (btwn.rpartition(u"\n")[0], next_stmt[0].prefix)
next_stmt[0].prefix = u"\n".join(prefix_lines)
else:
assert list_call.parent
assert list_call.next_sibling is None
# Put a blank line after list_call and set its prefix.
end_line = BlankLine()
list_call.parent.append_child(end_line)
assert list_call.next_sibling is end_line
# The new prefix should be everything up to the first new line
# of sort_stmt's prefix.
end_line.prefix = btwn.rpartition(u"\n")[0]
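# --- Editor's sketch (not part of the original module). FixIdioms is opt-in
# (explicit = True), and lib2to3.refactor skips explicit fixers unless their
# full dotted path is also passed in the "explicit" list, so the tool is
# constructed with the fixer named in both lists.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    fixers = ["lib2to3.fixes.fix_idioms"]
    rt = RefactoringTool(fixers, explicit=fixers)
    # while 1: -> while True:, and type(x) == T -> isinstance(x, T)
    print(rt.refactor_string(u"while 1:\n    pass\n", "<demo>"))
    print(rt.refactor_string(u"if type(x) == int:\n    pass\n", "<demo>"))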
"""Fixer for import statements.
If spam is being imported from the local directory, this import:
from spam import eggs
Becomes:
from .spam import eggs
And this import:
import spam
Becomes:
from . import spam
"""
# Local imports
from .. import fixer_base
from os.path import dirname, join, exists, sep
from ..fixer_util import FromImport, syms, token
def traverse_imports(names):
"""
Walks over all the names imported in a dotted_as_names node.
"""
pending = [names]
while pending:
node = pending.pop()
if node.type == token.NAME:
yield node.value
elif node.type == syms.dotted_name:
yield "".join([ch.value for ch in node.children])
elif node.type == syms.dotted_as_name:
pending.append(node.children[0])
elif node.type == syms.dotted_as_names:
pending.extend(node.children[::-2])
else:
raise AssertionError("unknown node type")
class FixImport(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
import_from< 'from' imp=any 'import' ['('] any [')'] >
|
import_name< 'import' imp=any >
"""
def start_tree(self, tree, name):
super(FixImport, self).start_tree(tree, name)
self.skip = "absolute_import" in tree.future_features
def transform(self, node, results):
if self.skip:
return
imp = results['imp']
if node.type == syms.import_from:
            # Some imps are top-level (e.g. 'import ham')
            # some are first level (e.g. 'import ham.eggs')
            # some are third level (e.g. 'import ham.eggs as spam')
            # Hence, the loop
while not hasattr(imp, 'value'):
imp = imp.children[0]
if self.probably_a_local_import(imp.value):
imp.value = u"." + imp.value
imp.changed()
else:
have_local = False
have_absolute = False
for mod_name in traverse_imports(imp):
if self.probably_a_local_import(mod_name):
have_local = True
else:
have_absolute = True
if have_absolute:
if have_local:
# We won't handle both sibling and absolute imports in the
# same statement at the moment.
self.warning(node, "absolute and local imports together")
return
new = FromImport(u".", [imp])
new.prefix = node.prefix
return new
def probably_a_local_import(self, imp_name):
if imp_name.startswith(u"."):
# Relative imports are certainly not local imports.
return False
imp_name = imp_name.split(u".", 1)[0]
base_path = dirname(self.filename)
base_path = join(base_path, imp_name)
        # If there is no __init__.py next to the file, it's not in a package,
        # so it can't be a local import.
if not exists(join(dirname(base_path), "__init__.py")):
return False
for ext in [".py", sep, ".pyc", ".so", ".sl", ".pyd"]:
if exists(base_path + ext):
return True
return False
"""Fix incompatible imports and module references."""
# Authors: Collin Winter, Nick Edds
# Local imports
from .. import fixer_base
from ..fixer_util import Name, attr_chain
MAPPING = {'StringIO': 'io',
'cStringIO': 'io',
'cPickle': 'pickle',
'__builtin__' : 'builtins',
'copy_reg': 'copyreg',
'Queue': 'queue',
'SocketServer': 'socketserver',
'ConfigParser': 'configparser',
'repr': 'reprlib',
'FileDialog': 'tkinter.filedialog',
'tkFileDialog': 'tkinter.filedialog',
'SimpleDialog': 'tkinter.simpledialog',
'tkSimpleDialog': 'tkinter.simpledialog',
'tkColorChooser': 'tkinter.colorchooser',
'tkCommonDialog': 'tkinter.commondialog',
'Dialog': 'tkinter.dialog',
'Tkdnd': 'tkinter.dnd',
'tkFont': 'tkinter.font',
'tkMessageBox': 'tkinter.messagebox',
'ScrolledText': 'tkinter.scrolledtext',
'Tkconstants': 'tkinter.constants',
'Tix': 'tkinter.tix',
'ttk': 'tkinter.ttk',
'Tkinter': 'tkinter',
'markupbase': '_markupbase',
'_winreg': 'winreg',
'thread': '_thread',
'dummy_thread': '_dummy_thread',
# anydbm and whichdb are handled by fix_imports2
'dbhash': 'dbm.bsd',
'dumbdbm': 'dbm.dumb',
'dbm': 'dbm.ndbm',
'gdbm': 'dbm.gnu',
'xmlrpclib': 'xmlrpc.client',
'DocXMLRPCServer': 'xmlrpc.server',
'SimpleXMLRPCServer': 'xmlrpc.server',
'httplib': 'http.client',
'htmlentitydefs' : 'html.entities',
'HTMLParser' : 'html.parser',
'Cookie': 'http.cookies',
'cookielib': 'http.cookiejar',
'BaseHTTPServer': 'http.server',
'SimpleHTTPServer': 'http.server',
'CGIHTTPServer': 'http.server',
#'test.test_support': 'test.support',
'commands': 'subprocess',
'UserString' : 'collections',
'UserList' : 'collections',
'urlparse' : 'urllib.parse',
'robotparser' : 'urllib.robotparser',
}
def alternates(members):
return "(" + "|".join(map(repr, members)) + ")"
def build_pattern(mapping=MAPPING):
mod_list = ' | '.join(["module_name='%s'" % key for key in mapping])
bare_names = alternates(mapping.keys())
yield """name_import=import_name< 'import' ((%s) |
multiple_imports=dotted_as_names< any* (%s) any* >) >
""" % (mod_list, mod_list)
yield """import_from< 'from' (%s) 'import' ['(']
( any | import_as_name< any 'as' any > |
import_as_names< any* >) [')'] >
""" % mod_list
yield """import_name< 'import' (dotted_as_name< (%s) 'as' any > |
multiple_imports=dotted_as_names<
any* dotted_as_name< (%s) 'as' any > any* >) >
""" % (mod_list, mod_list)
# Find usages of module members in code e.g. thread.foo(bar)
yield "power< bare_with_attr=(%s) trailer<'.' any > any* >" % bare_names
class FixImports(fixer_base.BaseFix):
BM_compatible = True
keep_line_order = True
# This is overridden in fix_imports2.
mapping = MAPPING
# We want to run this fixer late, so fix_import doesn't try to make stdlib
# renames into relative imports.
run_order = 6
def build_pattern(self):
return "|".join(build_pattern(self.mapping))
def compile_pattern(self):
        # We override this, so MAPPING can be programmatically altered and the
        # changes will be reflected in PATTERN.
self.PATTERN = self.build_pattern()
super(FixImports, self).compile_pattern()
# Don't match the node if it's within another match.
def match(self, node):
match = super(FixImports, self).match
results = match(node)
if results:
# Module usage could be in the trailer of an attribute lookup, so we
# might have nested matches when "bare_with_attr" is present.
if "bare_with_attr" not in results and \
any(match(obj) for obj in attr_chain(node, "parent")):
return False
return results
return False
def start_tree(self, tree, filename):
super(FixImports, self).start_tree(tree, filename)
self.replace = {}
def transform(self, node, results):
import_mod = results.get("module_name")
if import_mod:
mod_name = import_mod.value
new_name = unicode(self.mapping[mod_name])
import_mod.replace(Name(new_name, prefix=import_mod.prefix))
if "name_import" in results:
                # If it's not a "from x import x, y" or "import x as y" import,
                # mark its usage to be replaced.
self.replace[mod_name] = new_name
if "multiple_imports" in results:
# This is a nasty hack to fix multiple imports on a line (e.g.,
# "import StringIO, urlparse"). The problem is that I can't
# figure out an easy way to make a pattern recognize the keys of
# MAPPING randomly sprinkled in an import statement.
results = self.match(node)
if results:
self.transform(node, results)
else:
# Replace usage of the module.
bare_name = results["bare_with_attr"][0]
new_name = self.replace.get(bare_name.value)
if new_name:
bare_name.replace(Name(new_name, prefix=bare_name.prefix))
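# --- Editor's sketch (not part of the original module): a stdlib rename from
# the MAPPING above, including the follow-up rewrite of a bare module usage;
# "s" is a placeholder name.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_imports"])
    # import urlparse -> import urllib.parse, and usages are renamed too
    src = u"import urlparse\nu = urlparse.urlparse(s)\n"
    print(rt.refactor_string(src, "<demo>"))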
"""Fix incompatible imports and module references that must be fixed after
fix_imports."""
from . import fix_imports
MAPPING = {
'whichdb': 'dbm',
'anydbm': 'dbm',
}
class FixImports2(fix_imports.FixImports):
run_order = 7
mapping = MAPPING
"""Fixer that changes input(...) into eval(input(...))."""
# Author: Andre Roberge
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Name
from .. import patcomp
context = patcomp.compile_pattern("power< 'eval' trailer< '(' any ')' > >")
class FixInput(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'input' args=trailer< '(' [any] ')' > >
"""
def transform(self, node, results):
# If we're already wrapped in an eval() call, we're done.
if context.match(node.parent.parent):
return
new = node.clone()
new.prefix = u""
return Call(Name(u"eval"), [new], prefix=node.prefix)
# Copyright 2006 Georg Brandl.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for intern().
intern(s) -> sys.intern(s)"""
# Local imports
from .. import pytree
from .. import fixer_base
from ..fixer_util import Name, Attr, touch_import
class FixIntern(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
PATTERN = """
power< 'intern'
trailer< lpar='('
( not(arglist | argument<any '=' any>) obj=any
| obj=arglist<(not argument<any '=' any>) any ','> )
rpar=')' >
after=any*
>
"""
def transform(self, node, results):
if results:
# I feel like we should be able to express this logic in the
# PATTERN above but I don't know how to do it so...
obj = results['obj']
if obj:
if obj.type == self.syms.star_expr:
return # Make no change.
if (obj.type == self.syms.argument and
obj.children[0].value == '**'):
return # Make no change.
syms = self.syms
obj = results["obj"].clone()
if obj.type == syms.arglist:
newarglist = obj.clone()
else:
newarglist = pytree.Node(syms.arglist, [obj.clone()])
after = results["after"]
if after:
after = [n.clone() for n in after]
new = pytree.Node(syms.power,
Attr(Name(u"sys"), Name(u"intern")) +
[pytree.Node(syms.trailer,
[results["lpar"].clone(),
newarglist,
results["rpar"].clone()])] + after)
new.prefix = node.prefix
touch_import(None, u'sys', node)
return new
# Copyright 2008 Armin Ronacher.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that cleans up a tuple argument to isinstance after the tokens
in it were fixed. This is mainly used to remove double occurrences of
tokens as a leftover of the long -> int / unicode -> str conversion.
e.g. isinstance(x, (int, long)) -> isinstance(x, (int, int))
-> isinstance(x, int)
"""
from .. import fixer_base
from ..fixer_util import token
class FixIsinstance(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power<
'isinstance'
trailer< '(' arglist< any ',' atom< '('
args=testlist_gexp< any+ >
')' > > ')' >
>
"""
run_order = 6
def transform(self, node, results):
names_inserted = set()
testlist = results["args"]
args = testlist.children
new_args = []
iterator = enumerate(args)
for idx, arg in iterator:
if arg.type == token.NAME and arg.value in names_inserted:
if idx < len(args) - 1 and args[idx + 1].type == token.COMMA:
iterator.next()
continue
else:
new_args.append(arg)
if arg.type == token.NAME:
names_inserted.add(arg.value)
if new_args and new_args[-1].type == token.COMMA:
del new_args[-1]
if len(new_args) == 1:
atom = testlist.parent
new_args[0].prefix = atom.prefix
atom.replace(new_args[0])
else:
args[:] = new_args
node.changed()
""" Fixer for itertools.(imap|ifilter|izip) --> (map|filter|zip) and
itertools.ifilterfalse --> itertools.filterfalse (bugs 2360-2363)
imports from itertools are fixed in fix_itertools_import.py
If itertools is imported as something else (e.g. import itertools as it;
it.izip(spam, eggs)), method calls will not get fixed.
"""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixItertools(fixer_base.BaseFix):
BM_compatible = True
it_funcs = "('imap'|'ifilter'|'izip'|'izip_longest'|'ifilterfalse')"
PATTERN = """
power< it='itertools'
trailer<
dot='.' func=%(it_funcs)s > trailer< '(' [any] ')' > >
|
power< func=%(it_funcs)s trailer< '(' [any] ')' > >
""" %(locals())
# Needs to be run after fix_(map|zip|filter)
run_order = 6
def transform(self, node, results):
prefix = None
func = results['func'][0]
if ('it' in results and
func.value not in (u'ifilterfalse', u'izip_longest')):
dot, it = (results['dot'], results['it'])
# Remove the 'itertools'
prefix = it.prefix
it.remove()
# Replace the node which contains ('.', 'function') with the
# function (to be consistent with the second part of the pattern)
dot.remove()
func.parent.replace(func)
prefix = prefix or func.prefix
func.replace(Name(func.value[1:], prefix=prefix))
""" Fixer for imports of itertools.(imap|ifilter|izip|ifilterfalse) """
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import BlankLine, syms, token
class FixItertoolsImports(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
import_from< 'from' 'itertools' 'import' imports=any >
""" %(locals())
def transform(self, node, results):
imports = results['imports']
if imports.type == syms.import_as_name or not imports.children:
children = [imports]
else:
children = imports.children
for child in children[::2]:
if child.type == token.NAME:
member = child.value
name_node = child
elif child.type == token.STAR:
# Just leave the import as is.
return
else:
assert child.type == syms.import_as_name
name_node = child.children[0]
member_name = name_node.value
if member_name in (u'imap', u'izip', u'ifilter'):
child.value = None
child.remove()
elif member_name in (u'ifilterfalse', u'izip_longest'):
node.changed()
name_node.value = (u'filterfalse' if member_name[1] == u'f'
else u'zip_longest')
# Make sure the import statement is still sane
children = imports.children[:] or [imports]
remove_comma = True
for child in children:
if remove_comma and child.type == token.COMMA:
child.remove()
else:
remove_comma ^= True
while children and children[-1].type == token.COMMA:
children.pop().remove()
# If there are no imports left, just get rid of the entire statement
if (not (imports.children or getattr(imports, 'value', None)) or
imports.parent is None):
p = node.prefix
node = BlankLine()
node.prefix = p
return node
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that turns 'long' into 'int' everywhere.
"""
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import is_probably_builtin
class FixLong(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "'long'"
def transform(self, node, results):
if is_probably_builtin(node):
node.value = u"int"
node.changed()
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes map(F, ...) into list(map(F, ...)) unless there
exists a 'from future_builtins import map' statement in the top-level
namespace.
As a special case, map(None, X) is changed into list(X). (This is
necessary because the semantics are changed in this case -- the new
map(None, X) is equivalent to [(x,) for x in X].)
We avoid the transformation (except for the special case mentioned
above) if the map() call is directly contained in iter(<>), list(<>),
tuple(<>), sorted(<>), ...join(<>), or for V in <>:.
NOTE: This is still not correct if the original code was depending on
map(F, X, Y, ...) to go on until the longest argument is exhausted,
substituting None for missing values -- like zip(), it now stops as
soon as the shortest argument is exhausted.
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ListComp, in_special_context
from ..pygram import python_symbols as syms
class FixMap(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
map_none=power<
'map'
trailer< '(' arglist< 'None' ',' arg=any [','] > ')' >
>
|
map_lambda=power<
'map'
trailer<
'('
arglist<
lambdef< 'lambda'
(fp=NAME | vfpdef< '(' fp=NAME ')'> ) ':' xp=any
>
','
it=any
>
')'
>
>
|
power<
'map' trailer< '(' [arglist=any] ')' >
>
"""
skip_on = 'future_builtins.map'
def transform(self, node, results):
if self.should_skip(node):
return
if node.parent.type == syms.simple_stmt:
self.warning(node, "You should use a for loop here")
new = node.clone()
new.prefix = u""
new = Call(Name(u"list"), [new])
elif "map_lambda" in results:
new = ListComp(results["xp"].clone(),
results["fp"].clone(),
results["it"].clone())
else:
if "map_none" in results:
new = results["arg"].clone()
else:
if "arglist" in results:
args = results["arglist"]
if args.type == syms.arglist and \
args.children[0].type == token.NAME and \
args.children[0].value == "None":
self.warning(node, "cannot convert map(None, ...) "
"with multiple arguments because map() "
"now truncates to the shortest sequence")
return
if in_special_context(node):
return None
new = node.clone()
new.prefix = u""
new = Call(Name(u"list"), [new])
new.prefix = node.prefix
return new
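# --- Editor's sketch (not part of the original module): the common map()
# rewrites described in the docstring above; "f" and "xs" are placeholders.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_map"])
    # map(f, xs) -> list(map(f, xs)); map(None, xs) -> list(xs)
    print(rt.refactor_string(u"ys = map(f, xs)\n", "<demo>"))
    print(rt.refactor_string(u"ys = map(None, xs)\n", "<demo>"))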
"""Fix bound method attributes (method.im_? -> method.__?__).
"""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import Name
MAP = {
"im_func" : "__func__",
"im_self" : "__self__",
"im_class" : "__self__.__class__"
}
class FixMethodattrs(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any+ trailer< '.' attr=('im_func' | 'im_self' | 'im_class') > any* >
"""
def transform(self, node, results):
attr = results["attr"][0]
new = unicode(MAP[attr.value])
attr.replace(Name(new, prefix=attr.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that turns <> into !=."""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
class FixNe(fixer_base.BaseFix):
# This is so simple that we don't need the pattern compiler.
_accept_type = token.NOTEQUAL
def match(self, node):
# Override
return node.value == u"<>"
def transform(self, node, results):
new = pytree.Leaf(token.NOTEQUAL, u"!=", prefix=node.prefix)
return new
"""Fixer for __nonzero__ -> __bool__ methods."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name, syms
class FixNonzero(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
classdef< 'class' any+ ':'
suite< any*
funcdef< 'def' name='__nonzero__'
parameters< '(' NAME ')' > any+ >
any* > >
"""
def transform(self, node, results):
name = results["name"]
new = Name(u"__bool__", prefix=name.prefix)
name.replace(new)
"""Fixer that turns 1L into 1, 0755 into 0o755.
"""
# Copyright 2007 Georg Brandl.
# Licensed to PSF under a Contributor Agreement.
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Number
class FixNumliterals(fixer_base.BaseFix):
# This is so simple that we don't need the pattern compiler.
_accept_type = token.NUMBER
def match(self, node):
# Override
return (node.value.startswith(u"0") or node.value[-1] in u"Ll")
def transform(self, node, results):
val = node.value
if val[-1] in u'Ll':
val = val[:-1]
elif val.startswith(u'0') and val.isdigit() and len(set(val)) > 1:
val = u"0o" + val[1:]
return Number(val, prefix=node.prefix)
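# --- Editor's sketch (not part of the original module): the octal and long
# literal rewrites performed by FixNumliterals.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_numliterals"])
    # 0755 -> 0o755 and 1L -> 1
    print(rt.refactor_string(u"x = 0755; y = 1L\n", "<demo>"))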
"""Fixer for operator functions.
operator.isCallable(obj) -> hasattr(obj, '__call__')
operator.sequenceIncludes(obj) -> operator.contains(obj)
operator.isSequenceType(obj) -> isinstance(obj, collections.Sequence)
operator.isMappingType(obj) -> isinstance(obj, collections.Mapping)
operator.isNumberType(obj) -> isinstance(obj, numbers.Number)
operator.repeat(obj, n) -> operator.mul(obj, n)
operator.irepeat(obj, n) -> operator.imul(obj, n)
"""
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import Call, Name, String, touch_import
def invocation(s):
def dec(f):
f.invocation = s
return f
return dec
class FixOperator(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
methods = """
method=('isCallable'|'sequenceIncludes'
|'isSequenceType'|'isMappingType'|'isNumberType'
|'repeat'|'irepeat')
"""
obj = "'(' obj=any ')'"
PATTERN = """
power< module='operator'
trailer< '.' %(methods)s > trailer< %(obj)s > >
|
power< %(methods)s trailer< %(obj)s > >
""" % dict(methods=methods, obj=obj)
def transform(self, node, results):
method = self._check_method(node, results)
if method is not None:
return method(node, results)
@invocation("operator.contains(%s)")
def _sequenceIncludes(self, node, results):
return self._handle_rename(node, results, u"contains")
@invocation("hasattr(%s, '__call__')")
def _isCallable(self, node, results):
obj = results["obj"]
args = [obj.clone(), String(u", "), String(u"'__call__'")]
return Call(Name(u"hasattr"), args, prefix=node.prefix)
@invocation("operator.mul(%s)")
def _repeat(self, node, results):
return self._handle_rename(node, results, u"mul")
@invocation("operator.imul(%s)")
def _irepeat(self, node, results):
return self._handle_rename(node, results, u"imul")
@invocation("isinstance(%s, collections.Sequence)")
def _isSequenceType(self, node, results):
return self._handle_type2abc(node, results, u"collections", u"Sequence")
@invocation("isinstance(%s, collections.Mapping)")
def _isMappingType(self, node, results):
return self._handle_type2abc(node, results, u"collections", u"Mapping")
@invocation("isinstance(%s, numbers.Number)")
def _isNumberType(self, node, results):
return self._handle_type2abc(node, results, u"numbers", u"Number")
def _handle_rename(self, node, results, name):
method = results["method"][0]
method.value = name
method.changed()
def _handle_type2abc(self, node, results, module, abc):
touch_import(None, module, node)
obj = results["obj"]
args = [obj.clone(), String(u", " + u".".join([module, abc]))]
return Call(Name(u"isinstance"), args, prefix=node.prefix)
def _check_method(self, node, results):
method = getattr(self, "_" + results["method"][0].value.encode("ascii"))
if callable(method):
if "module" in results:
return method
else:
sub = (unicode(results["obj"]),)
invocation_str = unicode(method.invocation) % sub
self.warning(node, u"You should use '%s' here." % invocation_str)
        return None
"""Fixer that adds parentheses where they are required.
This converts ``[x for x in 1, 2]`` to ``[x for x in (1, 2)]``."""
# By Taek Joo Kim and Benjamin Peterson
# Local imports
from .. import fixer_base
from ..fixer_util import LParen, RParen
# XXX This doesn't support nested for loops like [x for x in 1, 2 for x in 1, 2]
class FixParen(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
atom< ('[' | '(')
(listmaker< any
comp_for<
'for' NAME 'in'
target=testlist_safe< any (',' any)+ [',']
>
[any]
>
>
|
testlist_gexp< any
comp_for<
'for' NAME 'in'
target=testlist_safe< any (',' any)+ [',']
>
[any]
>
>)
(']' | ')') >
"""
def transform(self, node, results):
target = results["target"]
lparen = LParen()
lparen.prefix = target.prefix
target.prefix = u"" # Make it hug the parentheses
target.insert_child(0, lparen)
target.append_child(RParen())
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for print.
Change:
'print' into 'print()'
'print ...' into 'print(...)'
'print ... ,' into 'print(..., end=" ")'
'print >>x, ...' into 'print(..., file=x)'
No changes are applied if print_function is imported from __future__
"""
# Local imports
from .. import patcomp
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, Comma, String, is_tuple
parend_expr = patcomp.compile_pattern(
"""atom< '(' [atom|STRING|NAME] ')' >"""
)
class FixPrint(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
simple_stmt< any* bare='print' any* > | print_stmt
"""
def transform(self, node, results):
assert results
bare_print = results.get("bare")
if bare_print:
# Special-case print all by itself
bare_print.replace(Call(Name(u"print"), [],
prefix=bare_print.prefix))
return
assert node.children[0] == Name(u"print")
args = node.children[1:]
if len(args) == 1 and parend_expr.match(args[0]):
# We don't want to keep sticking parens around an
# already-parenthesised expression.
return
sep = end = file = None
if args and args[-1] == Comma():
args = args[:-1]
end = " "
if args and args[0] == pytree.Leaf(token.RIGHTSHIFT, u">>"):
assert len(args) >= 2
file = args[1].clone()
args = args[3:] # Strip a possible comma after the file expression
# Now synthesize a print(args, sep=..., end=..., file=...) node.
l_args = [arg.clone() for arg in args]
if l_args:
l_args[0].prefix = u""
if sep is not None or end is not None or file is not None:
if sep is not None:
self.add_kwarg(l_args, u"sep", String(repr(sep)))
if end is not None:
self.add_kwarg(l_args, u"end", String(repr(end)))
if file is not None:
self.add_kwarg(l_args, u"file", file)
n_stmt = Call(Name(u"print"), l_args)
n_stmt.prefix = node.prefix
return n_stmt
def add_kwarg(self, l_nodes, s_kwd, n_expr):
# XXX All this prefix-setting may lose comments (though rarely)
n_expr.prefix = u""
n_argument = pytree.Node(self.syms.argument,
(Name(s_kwd),
pytree.Leaf(token.EQUAL, u"="),
n_expr))
if l_nodes:
l_nodes.append(Comma())
n_argument.prefix = u" "
l_nodes.append(n_argument)
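# --- Editor's sketch (not part of the original module): FixPrint on the
# ">>stream" and trailing-comma forms; sys.stderr is just an example target.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_print"])
    # print >>sys.stderr, 'oops' -> print('oops', file=sys.stderr)
    print(rt.refactor_string(u"print >>sys.stderr, 'oops'\n", "<demo>"))
    # print x, -> print(x, end=' ')
    print(rt.refactor_string(u"print x,\n", "<demo>"))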
"""Fixer for 'raise E, V, T'
raise -> raise
raise E -> raise E
raise E, V -> raise E(V)
raise E, V, T -> raise E(V).with_traceback(T)
raise E, None, T -> raise E.with_traceback(T)
raise (((E, E'), E''), E'''), V -> raise E(V)
raise "foo", V, T -> warns about string exceptions
CAVEATS:
1) "raise E, V" will be incorrectly translated if V is an exception
instance. The correct Python 3 idiom is
raise E from V
but since we can't detect instance-hood by syntax alone and since
any client code would have to be changed as well, we don't automate
this.
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, Attr, ArgList, is_tuple
class FixRaise(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
raise_stmt< 'raise' exc=any [',' val=any [',' tb=any]] >
"""
def transform(self, node, results):
syms = self.syms
exc = results["exc"].clone()
if exc.type == token.STRING:
msg = "Python 3 does not support string exceptions"
self.cannot_convert(node, msg)
return
# Python 2 supports
# raise ((((E1, E2), E3), E4), E5), V
# as a synonym for
# raise E1, V
# Since Python 3 will not support this, we recurse down any tuple
# literals, always taking the first element.
if is_tuple(exc):
while is_tuple(exc):
# exc.children[1:-1] is the unparenthesized tuple
# exc.children[1].children[0] is the first element of the tuple
exc = exc.children[1].children[0].clone()
exc.prefix = u" "
if "val" not in results:
# One-argument raise
new = pytree.Node(syms.raise_stmt, [Name(u"raise"), exc])
new.prefix = node.prefix
return new
val = results["val"].clone()
if is_tuple(val):
args = [c.clone() for c in val.children[1:-1]]
else:
val.prefix = u""
args = [val]
if "tb" in results:
tb = results["tb"].clone()
tb.prefix = u""
e = exc
# If there's a traceback and None is passed as the value, then don't
# add a call, since the user probably just wants to add a
# traceback. See issue #9661.
if val.type != token.NAME or val.value != u"None":
e = Call(exc, args)
with_tb = Attr(e, Name(u'with_traceback')) + [ArgList([tb])]
new = pytree.Node(syms.simple_stmt, [Name(u"raise")] + with_tb)
new.prefix = node.prefix
return new
else:
return pytree.Node(syms.raise_stmt,
[Name(u"raise"), Call(exc, args)],
prefix=node.prefix)
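# --- Editor's sketch (not part of the original module): the raise rewrites
# listed in the docstring above; E, V and T are placeholder names.
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool
    rt = RefactoringTool(["lib2to3.fixes.fix_raise"])
    # raise E, V -> raise E(V)
    print(rt.refactor_string(u"raise ValueError, 'bad'\n", "<demo>"))
    # raise E, V, T -> raise E(V).with_traceback(T)
    print(rt.refactor_string(u"raise E, V, T\n", "<demo>"))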
"""Fixer that changes raw_input(...) into input(...)."""
# Author: Andre Roberge
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixRawInput(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< name='raw_input' trailer< '(' [any] ')' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name(u"input", prefix=name.prefix))
# Copyright 2008 Armin Ronacher.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for reduce().
Makes sure reduce() is imported from the functools module if reduce is
used in the module being fixed.
"""
from lib2to3 import fixer_base
from lib2to3.fixer_util import touch_import
class FixReduce(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
PATTERN = """
power< 'reduce'
trailer< '('
arglist< (
(not(argument<any '=' any>) any ','
not(argument<any '=' any>) any) |
(not(argument<any '=' any>) any ','
not(argument<any '=' any>) any ','
not(argument<any '=' any>) any)
) >
')' >
>
"""
def transform(self, node, results):
touch_import(u'functools', u'reduce', node)
"""Fix incompatible renames
Fixes:
* sys.maxint -> sys.maxsize
"""
# Author: Christian Heimes
# based on Collin Winter's fix_import
# Local imports
from .. import fixer_base
from ..fixer_util import Name, attr_chain
MAPPING = {"sys": {"maxint" : "maxsize"},
}
LOOKUP = {}
def alternates(members):
return "(" + "|".join(map(repr, members)) + ")"
def build_pattern():
#bare = set()
for module, replace in MAPPING.items():
for old_attr, new_attr in replace.items():
LOOKUP[(module, old_attr)] = new_attr
#bare.add(module)
#bare.add(old_attr)
#yield """
# import_name< 'import' (module=%r
# | dotted_as_names< any* module=%r any* >) >
# """ % (module, module)
yield """
import_from< 'from' module_name=%r 'import'
( attr_name=%r | import_as_name< attr_name=%r 'as' any >) >
""" % (module, old_attr, old_attr)
yield """
power< module_name=%r trailer< '.' attr_name=%r > any* >
""" % (module, old_attr)
#yield """bare_name=%s""" % alternates(bare)
class FixRenames(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "|".join(build_pattern())
order = "pre" # Pre-order tree traversal
# Don't match the node if it's within another match
def match(self, node):
match = super(FixRenames, self).match
results = match(node)
if results:
if any(match(obj) for obj in attr_chain(node, "parent")):
return False
return results
return False
#def start_tree(self, tree, filename):
# super(FixRenames, self).start_tree(tree, filename)
# self.replace = {}
def transform(self, node, results):
mod_name = results.get("module_name")
attr_name = results.get("attr_name")
#bare_name = results.get("bare_name")
#import_mod = results.get("module")
if mod_name and attr_name:
new_attr = unicode(LOOKUP[(mod_name.value, attr_name.value)])
attr_name.replace(Name(new_attr, prefix=attr_name.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that transforms `xyzzy` into repr(xyzzy)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Name, parenthesize
class FixRepr(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
atom < '`' expr=any '`' >
"""
def transform(self, node, results):
expr = results["expr"].clone()
if expr.type == self.syms.testlist1:
expr = parenthesize(expr)
return Call(Name(u"repr"), [expr], prefix=node.prefix)
"""
Optional fixer to transform set() calls to set literals.
"""
# Author: Benjamin Peterson
from lib2to3 import fixer_base, pytree
from lib2to3.fixer_util import token, syms
class FixSetLiteral(fixer_base.BaseFix):
BM_compatible = True
explicit = True
PATTERN = """power< 'set' trailer< '('
(atom=atom< '[' (items=listmaker< any ((',' any)* [',']) >
|
single=any) ']' >
|
atom< '(' items=testlist_gexp< any ((',' any)* [',']) > ')' >
)
')' > >
"""
def transform(self, node, results):
single = results.get("single")
if single:
# Make a fake listmaker
fake = pytree.Node(syms.listmaker, [single.clone()])
single.replace(fake)
items = fake
else:
items = results["items"]
# Build the contents of the literal
literal = [pytree.Leaf(token.LBRACE, u"{")]
literal.extend(n.clone() for n in items.children)
literal.append(pytree.Leaf(token.RBRACE, u"}"))
# Set the prefix of the right brace to that of the ')' or ']'
literal[-1].prefix = items.next_sibling.prefix
maker = pytree.Node(syms.dictsetmaker, literal)
maker.prefix = node.prefix
# If the original was a one tuple, we need to remove the extra comma.
if len(maker.children) == 4:
n = maker.children[2]
n.remove()
maker.children[-1].prefix = n.prefix
# Finally, replace the set call with our shiny new literal.
return maker
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for StandardError -> Exception."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixStandarderror(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
'StandardError'
"""
def transform(self, node, results):
return Name(u"Exception", prefix=node.prefix)
"""Fixer for sys.exc_{type, value, traceback}
sys.exc_type -> sys.exc_info()[0]
sys.exc_value -> sys.exc_info()[1]
sys.exc_traceback -> sys.exc_info()[2]
"""
# By Jeff Balogh and Benjamin Peterson
# Local imports
from .. import fixer_base
from ..fixer_util import Attr, Call, Name, Number, Subscript, Node, syms
class FixSysExc(fixer_base.BaseFix):
# This order matches the ordering of sys.exc_info().
exc_info = [u"exc_type", u"exc_value", u"exc_traceback"]
BM_compatible = True
PATTERN = """
power< 'sys' trailer< dot='.' attribute=(%s) > >
""" % '|'.join("'%s'" % e for e in exc_info)
def transform(self, node, results):
sys_attr = results["attribute"][0]
index = Number(self.exc_info.index(sys_attr.value))
call = Call(Name(u"exc_info"), prefix=sys_attr.prefix)
attr = Attr(Name(u"sys"), call)
attr[1].children[0].prefix = results["dot"].prefix
attr.append(Subscript(index))
return Node(syms.power, attr, prefix=node.prefix)
"""Fixer for generator.throw(E, V, T).
g.throw(E) -> g.throw(E)
g.throw(E, V) -> g.throw(E(V))
g.throw(E, V, T) -> g.throw(E(V).with_traceback(T))
g.throw("foo"[, V[, T]]) will warn about string exceptions."""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ArgList, Attr, is_tuple
class FixThrow(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any trailer< '.' 'throw' >
trailer< '(' args=arglist< exc=any ',' val=any [',' tb=any] > ')' >
>
|
power< any trailer< '.' 'throw' > trailer< '(' exc=any ')' > >
"""
def transform(self, node, results):
syms = self.syms
exc = results["exc"].clone()
if exc.type is token.STRING:
self.cannot_convert(node, "Python 3 does not support string exceptions")
return
# Leave "g.throw(E)" alone
val = results.get(u"val")
if val is None:
return
val = val.clone()
if is_tuple(val):
args = [c.clone() for c in val.children[1:-1]]
else:
val.prefix = u""
args = [val]
throw_args = results["args"]
if "tb" in results:
tb = results["tb"].clone()
tb.prefix = u""
e = Call(exc, args)
with_tb = Attr(e, Name(u'with_traceback')) + [ArgList([tb])]
throw_args.replace(pytree.Node(syms.power, with_tb))
else:
throw_args.replace(Call(exc, args))
"""Fixer for function definitions with tuple parameters.
def func(((a, b), c), d):
...
->
def func(x, d):
((a, b), c) = x
...
It will also support lambdas:
lambda (x, y): x + y -> lambda t: t[0] + t[1]
# The parens are a syntax error in Python 3
lambda (x): x + y -> lambda x: x + y
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Assign, Name, Newline, Number, Subscript, syms
def is_docstring(stmt):
return isinstance(stmt, pytree.Node) and \
stmt.children[0].type == token.STRING
class FixTupleParams(fixer_base.BaseFix):
run_order = 4 #use a lower order since lambda is part of other
#patterns
BM_compatible = True
PATTERN = """
funcdef< 'def' any parameters< '(' args=any ')' >
['->' any] ':' suite=any+ >
|
lambda=
lambdef< 'lambda' args=vfpdef< '(' inner=any ')' >
':' body=any
>
"""
def transform(self, node, results):
if "lambda" in results:
return self.transform_lambda(node, results)
new_lines = []
suite = results["suite"]
args = results["args"]
# This crap is so "def foo(...): x = 5; y = 7" is handled correctly.
# TODO(cwinter): suite-cleanup
if suite[0].children[1].type == token.INDENT:
start = 2
indent = suite[0].children[1].value
end = Newline()
else:
start = 0
indent = u"; "
end = pytree.Leaf(token.INDENT, u"")
# We need access to self for new_name(), and making this a method
# doesn't feel right. Closing over self and new_lines makes the
# code below cleaner.
def handle_tuple(tuple_arg, add_prefix=False):
n = Name(self.new_name())
arg = tuple_arg.clone()
arg.prefix = u""
stmt = Assign(arg, n.clone())
if add_prefix:
n.prefix = u" "
tuple_arg.replace(n)
new_lines.append(pytree.Node(syms.simple_stmt,
[stmt, end.clone()]))
if args.type == syms.tfpdef:
handle_tuple(args)
elif args.type == syms.typedargslist:
for i, arg in enumerate(args.children):
if arg.type == syms.tfpdef:
# Without add_prefix, the emitted code is correct,
# just ugly.
handle_tuple(arg, add_prefix=(i > 0))
if not new_lines:
return
# This isn't strictly necessary, but it plays nicely with other fixers.
# TODO(cwinter) get rid of this when children becomes a smart list
for line in new_lines:
line.parent = suite[0]
# TODO(cwinter) suite-cleanup
after = start
if start == 0:
new_lines[0].prefix = u" "
elif is_docstring(suite[0].children[start]):
new_lines[0].prefix = indent
after = start + 1
for line in new_lines:
line.parent = suite[0]
suite[0].children[after:after] = new_lines
for i in range(after+1, after+len(new_lines)+1):
suite[0].children[i].prefix = indent
suite[0].changed()
def transform_lambda(self, node, results):
args = results["args"]
body = results["body"]
inner = simplify_args(results["inner"])
# Replace lambda ((((x)))): x with lambda x: x
if inner.type == token.NAME:
inner = inner.clone()
inner.prefix = u" "
args.replace(inner)
return
params = find_params(args)
to_index = map_to_index(params)
tup_name = self.new_name(tuple_name(params))
new_param = Name(tup_name, prefix=u" ")
args.replace(new_param.clone())
for n in body.post_order():
if n.type == token.NAME and n.value in to_index:
subscripts = [c.clone() for c in to_index[n.value]]
new = pytree.Node(syms.power,
[new_param.clone()] + subscripts)
new.prefix = n.prefix
n.replace(new)
### Helper functions for transform_lambda()
def simplify_args(node):
if node.type in (syms.vfplist, token.NAME):
return node
elif node.type == syms.vfpdef:
# These look like vfpdef< '(' x ')' > where x is NAME
# or another vfpdef instance (leading to recursion).
while node.type == syms.vfpdef:
node = node.children[1]
return node
raise RuntimeError("Received unexpected node %s" % node)
def find_params(node):
if node.type == syms.vfpdef:
return find_params(node.children[1])
elif node.type == token.NAME:
return node.value
return [find_params(c) for c in node.children if c.type != token.COMMA]
def map_to_index(param_list, prefix=[], d=None):
if d is None:
d = {}
for i, obj in enumerate(param_list):
trailer = [Subscript(Number(unicode(i)))]
if isinstance(obj, list):
map_to_index(obj, trailer, d=d)
else:
d[obj] = prefix + trailer
return d
def tuple_name(param_list):
l = []
for obj in param_list:
if isinstance(obj, list):
l.append(tuple_name(obj))
else:
l.append(obj)
return u"_".join(l)
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for removing uses of the types module.
This works only for the known names in the types module. The deprecated names
may be written with or without the 'types.' prefix; i.e., it is assumed the
module is imported either as:
import types
from types import ... # either * or specific types
The import statements are not modified.
There should be another fixer that handles at least the following constants:
type([]) -> list
type(()) -> tuple
type('') -> str
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name
_TYPE_MAPPING = {
'BooleanType' : 'bool',
'BufferType' : 'memoryview',
'ClassType' : 'type',
'ComplexType' : 'complex',
'DictType': 'dict',
'DictionaryType' : 'dict',
'EllipsisType' : 'type(Ellipsis)',
#'FileType' : 'io.IOBase',
'FloatType': 'float',
'IntType': 'int',
'ListType': 'list',
'LongType': 'int',
'ObjectType' : 'object',
'NoneType': 'type(None)',
'NotImplementedType' : 'type(NotImplemented)',
'SliceType' : 'slice',
'StringType': 'bytes', # XXX ?
'StringTypes' : '(str,)', # XXX ?
'TupleType': 'tuple',
'TypeType' : 'type',
'UnicodeType': 'str',
'XRangeType' : 'range',
}
_pats = ["power< 'types' trailer< '.' name='%s' > >" % t for t in _TYPE_MAPPING]
class FixTypes(fixer_base.BaseFix):
BM_compatible = True
PATTERN = '|'.join(_pats)
def transform(self, node, results):
new_value = unicode(_TYPE_MAPPING.get(results["name"].value))
if new_value:
return Name(new_value, prefix=node.prefix)
return None
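# Comment sketch of rewrites driven by _TYPE_MAPPING above:
#   types.StringType  -> bytes
#   types.DictType    -> dict
#   types.NoneType    -> type(None)
#   types.XRangeType  -> range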
r"""Fixer for unicode.
* Changes unicode to str and unichr to chr.
* If "...\u..." is not unicode literal change it into "...\\u...".
* Change u"..." into "...".
"""
from ..pgen2 import token
from .. import fixer_base
_mapping = {u"unichr" : u"chr", u"unicode" : u"str"}
class FixUnicode(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "STRING | 'unicode' | 'unichr'"
def start_tree(self, tree, filename):
super(FixUnicode, self).start_tree(tree, filename)
self.unicode_literals = 'unicode_literals' in tree.future_features
def transform(self, node, results):
if node.type == token.NAME:
new = node.clone()
new.value = _mapping[node.value]
return new
elif node.type == token.STRING:
val = node.value
if not self.unicode_literals and val[0] in u'\'"' and u'\\' in val:
val = ur'\\'.join([
v.replace(u'\\u', ur'\\u').replace(u'\\U', ur'\\U')
for v in val.split(ur'\\')
])
if val[0] in u'uU':
val = val[1:]
if val == node.value:
return node
new = node.clone()
new.value = val
return new
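# Comment sketch of the rewrites this fixer performs:
#   unicode(x)   -> str(x)
#   unichr(i)    -> chr(i)
#   u"text"      -> "text"
#   "a\u1234"    -> "a\\u1234"   (plain string, no unicode_literals: the \u is
#                                 doubled so Python 3 won't treat it as an escape)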
"""Fix changes imports of urllib which are now incompatible.
This is rather similar to fix_imports, but because of the more
complex nature of the fixing for urllib, it has its own fixer.
"""
# Author: Nick Edds
# Local imports
from lib2to3.fixes.fix_imports import alternates, FixImports
from lib2to3 import fixer_base
from lib2to3.fixer_util import (Name, Comma, FromImport, Newline,
find_indentation, Node, syms)
MAPPING = {"urllib": [
("urllib.request",
["URLopener", "FancyURLopener", "urlretrieve",
"_urlopener", "urlopen", "urlcleanup",
"pathname2url", "url2pathname"]),
("urllib.parse",
["quote", "quote_plus", "unquote", "unquote_plus",
"urlencode", "splitattr", "splithost", "splitnport",
"splitpasswd", "splitport", "splitquery", "splittag",
"splittype", "splituser", "splitvalue", ]),
("urllib.error",
["ContentTooShortError"])],
"urllib2" : [
("urllib.request",
["urlopen", "install_opener", "build_opener",
"Request", "OpenerDirector", "BaseHandler",
"HTTPDefaultErrorHandler", "HTTPRedirectHandler",
"HTTPCookieProcessor", "ProxyHandler",
"HTTPPasswordMgr",
"HTTPPasswordMgrWithDefaultRealm",
"AbstractBasicAuthHandler",
"HTTPBasicAuthHandler", "ProxyBasicAuthHandler",
"AbstractDigestAuthHandler",
"HTTPDigestAuthHandler", "ProxyDigestAuthHandler",
"HTTPHandler", "HTTPSHandler", "FileHandler",
"FTPHandler", "CacheFTPHandler",
"UnknownHandler"]),
("urllib.error",
["URLError", "HTTPError"]),
]
}
# Duplicate the url parsing functions for urllib2.
MAPPING["urllib2"].append(MAPPING["urllib"][1])
def build_pattern():
bare = set()
for old_module, changes in MAPPING.items():
for change in changes:
new_module, members = change
members = alternates(members)
yield """import_name< 'import' (module=%r
| dotted_as_names< any* module=%r any* >) >
""" % (old_module, old_module)
yield """import_from< 'from' mod_member=%r 'import'
( member=%s | import_as_name< member=%s 'as' any > |
import_as_names< members=any* >) >
""" % (old_module, members, members)
yield """import_from< 'from' module_star=%r 'import' star='*' >
""" % old_module
yield """import_name< 'import'
dotted_as_name< module_as=%r 'as' any > >
""" % old_module
# bare_with_attr has a special significance for FixImports.match().
yield """power< bare_with_attr=%r trailer< '.' member=%s > any* >
""" % (old_module, members)
class FixUrllib(FixImports):
def build_pattern(self):
return "|".join(build_pattern())
def transform_import(self, node, results):
"""Transform for the basic import case. Replaces the old
import name with a comma separated list of its
replacements.
"""
import_mod = results.get("module")
pref = import_mod.prefix
names = []
# create a Node list of the replacement modules
for name in MAPPING[import_mod.value][:-1]:
names.extend([Name(name[0], prefix=pref), Comma()])
names.append(Name(MAPPING[import_mod.value][-1][0], prefix=pref))
import_mod.replace(names)
def transform_member(self, node, results):
"""Transform for imports of specific module elements. Replaces
the module to be imported from with the appropriate new
module.
"""
mod_member = results.get("mod_member")
pref = mod_member.prefix
member = results.get("member")
# Simple case with only a single member being imported
if member:
# this may be a list of length one, or just a node
if isinstance(member, list):
member = member[0]
new_name = None
for change in MAPPING[mod_member.value]:
if member.value in change[1]:
new_name = change[0]
break
if new_name:
mod_member.replace(Name(new_name, prefix=pref))
else:
self.cannot_convert(node, "This is an invalid module element")
# Multiple members being imported
else:
# a dictionary for replacements, order matters
modules = []
mod_dict = {}
members = results["members"]
for member in members:
# we only care about the actual members
if member.type == syms.import_as_name:
as_name = member.children[2].value
member_name = member.children[0].value
else:
member_name = member.value
as_name = None
if member_name != u",":
for change in MAPPING[mod_member.value]:
if member_name in change[1]:
if change[0] not in mod_dict:
modules.append(change[0])
mod_dict.setdefault(change[0], []).append(member)
new_nodes = []
indentation = find_indentation(node)
first = True
def handle_name(name, prefix):
if name.type == syms.import_as_name:
kids = [Name(name.children[0].value, prefix=prefix),
name.children[1].clone(),
name.children[2].clone()]
return [Node(syms.import_as_name, kids)]
return [Name(name.value, prefix=prefix)]
for module in modules:
elts = mod_dict[module]
names = []
for elt in elts[:-1]:
names.extend(handle_name(elt, pref))
names.append(Comma())
names.extend(handle_name(elts[-1], pref))
new = FromImport(module, names)
if not first or node.parent.prefix.endswith(indentation):
new.prefix = indentation
new_nodes.append(new)
first = False
if new_nodes:
nodes = []
for new_node in new_nodes[:-1]:
nodes.extend([new_node, Newline()])
nodes.append(new_nodes[-1])
node.replace(nodes)
else:
self.cannot_convert(node, "All module elements are invalid")
def transform_dot(self, node, results):
"""Transform for calls to module members in code."""
module_dot = results.get("bare_with_attr")
member = results.get("member")
new_name = None
if isinstance(member, list):
member = member[0]
for change in MAPPING[module_dot.value]:
if member.value in change[1]:
new_name = change[0]
break
if new_name:
module_dot.replace(Name(new_name,
prefix=module_dot.prefix))
else:
self.cannot_convert(node, "This is an invalid module element")
def transform(self, node, results):
if results.get("module"):
self.transform_import(node, results)
elif results.get("mod_member"):
self.transform_member(node, results)
elif results.get("bare_with_attr"):
self.transform_dot(node, results)
# Renaming and star imports are not supported for these modules.
elif results.get("module_star"):
self.cannot_convert(node, "Cannot handle star imports.")
elif results.get("module_as"):
self.cannot_convert(node, "This module is now multiple modules")
"""Fixer that changes 'a ,b' into 'a, b'.
This also changes '{a :b}' into '{a: b}', but does not touch other
uses of colons. It does not touch other uses of whitespace.
"""
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
class FixWsComma(fixer_base.BaseFix):
explicit = True # The user must ask for this fixer
PATTERN = """
any<(not(',') any)+ ',' ((not(',') any)+ ',')* [not(',') any]>
"""
COMMA = pytree.Leaf(token.COMMA, u",")
COLON = pytree.Leaf(token.COLON, u":")
SEPS = (COMMA, COLON)
def transform(self, node, results):
new = node.clone()
comma = False
for child in new.children:
if child in self.SEPS:
prefix = child.prefix
if prefix.isspace() and u"\n" not in prefix:
child.prefix = u""
comma = True
else:
if comma:
prefix = child.prefix
if not prefix:
child.prefix = u" "
comma = False
return new
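# Comment sketch:  f(a ,b)  ->  f(a, b)     {a :b}  ->  {a: b}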
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes xrange(...) into range(...)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name, Call, consuming_calls
from .. import patcomp
class FixXrange(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power<
(name='range'|name='xrange') trailer< '(' args=any ')' >
rest=any* >
"""
def start_tree(self, tree, filename):
super(FixXrange, self).start_tree(tree, filename)
self.transformed_xranges = set()
def finish_tree(self, tree, filename):
self.transformed_xranges = None
def transform(self, node, results):
name = results["name"]
if name.value == u"xrange":
return self.transform_xrange(node, results)
elif name.value == u"range":
return self.transform_range(node, results)
else:
raise ValueError(repr(name))
def transform_xrange(self, node, results):
name = results["name"]
name.replace(Name(u"range", prefix=name.prefix))
# This prevents the new range call from being wrapped in a list later.
self.transformed_xranges.add(id(node))
def transform_range(self, node, results):
if (id(node) not in self.transformed_xranges and
not self.in_special_context(node)):
range_call = Call(Name(u"range"), [results["args"].clone()])
# Encase the range call in list().
list_call = Call(Name(u"list"), [range_call],
prefix=node.prefix)
# Put things that were after the range() call after the list call.
for n in results["rest"]:
list_call.append_child(n)
return list_call
P1 = "power< func=NAME trailer< '(' node=any ')' > any* >"
p1 = patcomp.compile_pattern(P1)
P2 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
| comparison< any 'in' node=any any*>
"""
p2 = patcomp.compile_pattern(P2)
def in_special_context(self, node):
if node.parent is None:
return False
results = {}
if (node.parent.parent is not None and
self.p1.match(node.parent.parent, results) and
results["node"] is node):
# list(d.keys()) -> list(d.keys()), etc.
return results["func"].value in consuming_calls
# for ... in d.iterkeys() -> for ... in d.keys(), etc.
return self.p2.match(node.parent, results) and results["node"] is node
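# Comment sketch of the rewrites this fixer performs:
#   x = xrange(n)            -> x = range(n)
#   r = range(n)             -> r = list(range(n))
#   for i in xrange(n): ...  -> for i in range(n): ...   (for-loops are a
#                                special context, so no list() wrapper)
#   sorted(range(n))         -> sorted(range(n))          (consuming call, unchanged)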
"""Fix "for x in f.xreadlines()" -> "for x in f".
This fixer will also convert g(f.xreadlines) into g(f.__iter__)."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixXreadlines(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< call=any+ trailer< '.' 'xreadlines' > trailer< '(' ')' > >
|
power< any+ trailer< '.' no_call='xreadlines' > >
"""
def transform(self, node, results):
no_call = results.get("no_call")
if no_call:
no_call.replace(Name(u"__iter__", prefix=no_call.prefix))
else:
node.replace([x.clone() for x in results["call"]])
"""
Fixer that changes zip(seq0, seq1, ...) into list(zip(seq0, seq1, ...))
unless there exists a 'from future_builtins import zip' statement in the
top-level namespace.
We avoid the transformation if the zip() call is directly contained in
iter(<>), list(<>), tuple(<>), sorted(<>), ...join(<>), or for V in <>:.
"""
# Local imports
from .. import fixer_base
from ..fixer_util import Name, Call, in_special_context
class FixZip(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
power< 'zip' args=trailer< '(' [any] ')' >
>
"""
skip_on = "future_builtins.zip"
def transform(self, node, results):
if self.should_skip(node):
return
if in_special_context(node):
return None
new = node.clone()
new.prefix = u""
new = Call(Name(u"list"), [new])
new.prefix = node.prefix
return new
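# Comment sketch:
#   pairs = zip(a, b)        -> pairs = list(zip(a, b))
#   for x, y in zip(a, b):   -> unchanged (special context)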
# Dummy file to make this directory a package.
"""
Main program for 2to3.
"""
from __future__ import with_statement
import sys
import os
import difflib
import logging
import shutil
import optparse
from . import refactor
def diff_texts(a, b, filename):
"""Return a unified diff of two strings."""
a = a.splitlines()
b = b.splitlines()
return difflib.unified_diff(a, b, filename, filename,
"(original)", "(refactored)",
lineterm="")
class StdoutRefactoringTool(refactor.MultiprocessRefactoringTool):
"""
A refactoring tool that can avoid overwriting its input files.
Prints output to stdout.
Output files can optionally be written to a different directory and/or
have an extra file suffix appended to their name for use in situations
where you do not want to replace the input files.
"""
def __init__(self, fixers, options, explicit, nobackups, show_diffs,
input_base_dir='', output_dir='', append_suffix=''):
"""
Args:
fixers: A list of fixers to import.
options: A dict with RefactoringTool configuration.
explicit: A list of fixers to run even if they are marked explicit
(i.e., disabled unless explicitly requested).
nobackups: If true no backup '.bak' files will be created for those
files that are being refactored.
show_diffs: Should diffs of the refactoring be printed to stdout?
input_base_dir: The base directory for all input files. This class
will strip this path prefix off of filenames before substituting
it with output_dir. Only meaningful if output_dir is supplied.
All files processed by refactor() must start with this path.
output_dir: If supplied, all converted files will be written into
this directory tree instead of input_base_dir.
append_suffix: If supplied, all files output by this tool will have
this appended to their filename. Useful for changing .py to
.py3 for example by passing append_suffix='3'.
"""
self.nobackups = nobackups
self.show_diffs = show_diffs
if input_base_dir and not input_base_dir.endswith(os.sep):
input_base_dir += os.sep
self._input_base_dir = input_base_dir
self._output_dir = output_dir
self._append_suffix = append_suffix
super(StdoutRefactoringTool, self).__init__(fixers, options, explicit)
def log_error(self, msg, *args, **kwargs):
self.errors.append((msg, args, kwargs))
self.logger.error(msg, *args, **kwargs)
def write_file(self, new_text, filename, old_text, encoding):
orig_filename = filename
if self._output_dir:
if filename.startswith(self._input_base_dir):
filename = os.path.join(self._output_dir,
filename[len(self._input_base_dir):])
else:
raise ValueError('filename %s does not start with the '
'input_base_dir %s' % (
filename, self._input_base_dir))
if self._append_suffix:
filename += self._append_suffix
if orig_filename != filename:
output_dir = os.path.dirname(filename)
if not os.path.isdir(output_dir):
os.makedirs(output_dir)
self.log_message('Writing converted %s to %s.', orig_filename,
filename)
if not self.nobackups:
# Make backup
backup = filename + ".bak"
if os.path.lexists(backup):
try:
os.remove(backup)
except os.error, err:
self.log_message("Can't remove backup %s", backup)
try:
os.rename(filename, backup)
except os.error, err:
self.log_message("Can't rename %s to %s", filename, backup)
# Actually write the new file
write = super(StdoutRefactoringTool, self).write_file
write(new_text, filename, old_text, encoding)
if not self.nobackups:
shutil.copymode(backup, filename)
if orig_filename != filename:
# Preserve the file mode in the new output directory.
shutil.copymode(orig_filename, filename)
def print_output(self, old, new, filename, equal):
if equal:
self.log_message("No changes to %s", filename)
else:
self.log_message("Refactored %s", filename)
if self.show_diffs:
diff_lines = diff_texts(old, new, filename)
try:
if self.output_lock is not None:
with self.output_lock:
for line in diff_lines:
print line
sys.stdout.flush()
else:
for line in diff_lines:
print line
except UnicodeEncodeError:
warn("couldn't encode %s's diff for your terminal" %
(filename,))
return
def warn(msg):
print >> sys.stderr, "WARNING: %s" % (msg,)
def main(fixer_pkg, args=None):
"""Main program.
Args:
fixer_pkg: the name of a package where the fixers are located.
args: optional; a list of command line arguments. If omitted,
sys.argv[1:] is used.
Returns a suggested exit status (0, 1, 2).
"""
# Set up option parser
parser = optparse.OptionParser(usage="2to3 [options] file|dir ...")
parser.add_option("-d", "--doctests_only", action="store_true",
help="Fix up doctests only")
parser.add_option("-f", "--fix", action="append", default=[],
help="Each FIX specifies a transformation; default: all")
parser.add_option("-j", "--processes", action="store", default=1,
type="int", help="Run 2to3 concurrently")
parser.add_option("-x", "--nofix", action="append", default=[],
help="Prevent a transformation from being run")
parser.add_option("-l", "--list-fixes", action="store_true",
help="List available transformations")
parser.add_option("-p", "--print-function", action="store_true",
help="Modify the grammar so that print() is a function")
parser.add_option("-v", "--verbose", action="store_true",
help="More verbose logging")
parser.add_option("--no-diffs", action="store_true",
help="Don't show diffs of the refactoring")
parser.add_option("-w", "--write", action="store_true",
help="Write back modified files")
parser.add_option("-n", "--nobackups", action="store_true", default=False,
help="Don't write backups for modified files")
parser.add_option("-o", "--output-dir", action="store", type="str",
default="", help="Put output files in this directory "
"instead of overwriting the input files. Requires -n.")
parser.add_option("-W", "--write-unchanged-files", action="store_true",
help="Also write files even if no changes were required"
" (useful with --output-dir); implies -w.")
parser.add_option("--add-suffix", action="store", type="str", default="",
help="Append this string to all output filenames."
" Requires -n if non-empty. "
"ex: --add-suffix='3' will generate .py3 files.")
# Parse command line arguments
refactor_stdin = False
flags = {}
options, args = parser.parse_args(args)
if options.write_unchanged_files:
flags["write_unchanged_files"] = True
if not options.write:
warn("--write-unchanged-files/-W implies -w.")
options.write = True
# If we allowed these, the original files would be renamed to backup names
# but not replaced.
if options.output_dir and not options.nobackups:
parser.error("Can't use --output-dir/-o without -n.")
if options.add_suffix and not options.nobackups:
parser.error("Can't use --add-suffix without -n.")
if not options.write and options.no_diffs:
warn("not writing files and not printing diffs; that's not very useful")
if not options.write and options.nobackups:
parser.error("Can't use -n without -w")
if options.list_fixes:
print "Available transformations for the -f/--fix option:"
for fixname in refactor.get_all_fix_names(fixer_pkg):
print fixname
if not args:
return 0
if not args:
print >> sys.stderr, "At least one file or directory argument required."
print >> sys.stderr, "Use --help to show usage."
return 2
if "-" in args:
refactor_stdin = True
if options.write:
print >> sys.stderr, "Can't write to stdin."
return 2
if options.print_function:
flags["print_function"] = True
# Set up logging handler
level = logging.DEBUG if options.verbose else logging.INFO
logging.basicConfig(format='%(name)s: %(message)s', level=level)
logger = logging.getLogger('lib2to3.main')
# Initialize the refactoring tool
avail_fixes = set(refactor.get_fixers_from_package(fixer_pkg))
unwanted_fixes = set(fixer_pkg + ".fix_" + fix for fix in options.nofix)
explicit = set()
if options.fix:
all_present = False
for fix in options.fix:
if fix == "all":
all_present = True
else:
explicit.add(fixer_pkg + ".fix_" + fix)
requested = avail_fixes.union(explicit) if all_present else explicit
else:
requested = avail_fixes.union(explicit)
fixer_names = requested.difference(unwanted_fixes)
input_base_dir = os.path.commonprefix(args)
if (input_base_dir and not input_base_dir.endswith(os.sep)
and not os.path.isdir(input_base_dir)):
# One or more similar names were passed; their directory is the base.
# os.path.commonprefix() is ignorant of path elements, this corrects
# for that weird API.
input_base_dir = os.path.dirname(input_base_dir)
if options.output_dir:
input_base_dir = input_base_dir.rstrip(os.sep)
logger.info('Output in %r will mirror the input directory %r layout.',
options.output_dir, input_base_dir)
rt = StdoutRefactoringTool(
sorted(fixer_names), flags, sorted(explicit),
options.nobackups, not options.no_diffs,
input_base_dir=input_base_dir,
output_dir=options.output_dir,
append_suffix=options.add_suffix)
# Refactor all files and directories passed as arguments
if not rt.errors:
if refactor_stdin:
rt.refactor_stdin()
else:
try:
rt.refactor(args, options.write, options.doctests_only,
options.processes)
except refactor.MultiprocessingUnsupported:
assert options.processes > 1
print >> sys.stderr, "Sorry, -j isn't " \
"supported on this platform."
return 1
rt.summarize()
# Return error status (0 if rt.errors is zero)
return int(bool(rt.errors))
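# Usage sketch for the options above, assuming the standard 2to3 console
# script, which calls main("lib2to3.fixes"):
#   2to3 -f xrange -f zip example.py       # print diffs for just two fixers
#   2to3 -w example.py                     # rewrite in place, keeping a .bak
#   2to3 -w -n -o out --add-suffix=3 src   # write src/*.py as out/*.py3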
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Pattern compiler.
The grammar is taken from PatternGrammar.txt.
The compiler compiles a pattern to a pytree.*Pattern instance.
"""
__author__ = "Guido van Rossum <[email protected]>"
# Python imports
import StringIO
# Fairly local imports
from .pgen2 import driver, literals, token, tokenize, parse, grammar
# Really local imports
from . import pytree
from . import pygram
class PatternSyntaxError(Exception):
pass
def tokenize_wrapper(input):
"""Tokenizes a string suppressing significant whitespace."""
skip = set((token.NEWLINE, token.INDENT, token.DEDENT))
tokens = tokenize.generate_tokens(StringIO.StringIO(input).readline)
for quintuple in tokens:
type, value, start, end, line_text = quintuple
if type not in skip:
yield quintuple
class PatternCompiler(object):
def __init__(self, grammar_file=None):
"""Initializer.
Takes an optional alternative filename for the pattern grammar.
"""
if grammar_file is None:
self.grammar = pygram.pattern_grammar
self.syms = pygram.pattern_symbols
else:
self.grammar = driver.load_grammar(grammar_file)
self.syms = pygram.Symbols(self.grammar)
self.pygrammar = pygram.python_grammar
self.pysyms = pygram.python_symbols
self.driver = driver.Driver(self.grammar, convert=pattern_convert)
def compile_pattern(self, input, debug=False, with_tree=False):
"""Compiles a pattern string to a nested pytree.*Pattern object."""
tokens = tokenize_wrapper(input)
try:
root = self.driver.parse_tokens(tokens, debug=debug)
except parse.ParseError as e:
raise PatternSyntaxError(str(e))
if with_tree:
return self.compile_node(root), root
else:
return self.compile_node(root)
def compile_node(self, node):
"""Compiles a node, recursively.
This is one big switch on the node type.
"""
# XXX Optimize certain Wildcard-containing-Wildcard patterns
# that can be merged
if node.type == self.syms.Matcher:
node = node.children[0] # Avoid unneeded recursion
if node.type == self.syms.Alternatives:
# Skip the odd children since they are just '|' tokens
alts = [self.compile_node(ch) for ch in node.children[::2]]
if len(alts) == 1:
return alts[0]
p = pytree.WildcardPattern([[a] for a in alts], min=1, max=1)
return p.optimize()
if node.type == self.syms.Alternative:
units = [self.compile_node(ch) for ch in node.children]
if len(units) == 1:
return units[0]
p = pytree.WildcardPattern([units], min=1, max=1)
return p.optimize()
if node.type == self.syms.NegatedUnit:
pattern = self.compile_basic(node.children[1:])
p = pytree.NegatedPattern(pattern)
return p.optimize()
assert node.type == self.syms.Unit
name = None
nodes = node.children
if len(nodes) >= 3 and nodes[1].type == token.EQUAL:
name = nodes[0].value
nodes = nodes[2:]
repeat = None
if len(nodes) >= 2 and nodes[-1].type == self.syms.Repeater:
repeat = nodes[-1]
nodes = nodes[:-1]
# Now we've reduced it to: STRING | NAME [Details] | (...) | [...]
pattern = self.compile_basic(nodes, repeat)
if repeat is not None:
assert repeat.type == self.syms.Repeater
children = repeat.children
child = children[0]
if child.type == token.STAR:
min = 0
max = pytree.HUGE
elif child.type == token.PLUS:
min = 1
max = pytree.HUGE
elif child.type == token.LBRACE:
assert children[-1].type == token.RBRACE
assert len(children) in (3, 5)
min = max = self.get_int(children[1])
if len(children) == 5:
max = self.get_int(children[3])
else:
assert False
if min != 1 or max != 1:
pattern = pattern.optimize()
pattern = pytree.WildcardPattern([[pattern]], min=min, max=max)
if name is not None:
pattern.name = name
return pattern.optimize()
def compile_basic(self, nodes, repeat=None):
# Compile STRING | NAME [Details] | (...) | [...]
assert len(nodes) >= 1
node = nodes[0]
if node.type == token.STRING:
value = unicode(literals.evalString(node.value))
return pytree.LeafPattern(_type_of_literal(value), value)
elif node.type == token.NAME:
value = node.value
if value.isupper():
if value not in TOKEN_MAP:
raise PatternSyntaxError("Invalid token: %r" % value)
if nodes[1:]:
raise PatternSyntaxError("Can't have details for token")
return pytree.LeafPattern(TOKEN_MAP[value])
else:
if value == "any":
type = None
elif not value.startswith("_"):
type = getattr(self.pysyms, value, None)
if type is None:
raise PatternSyntaxError("Invalid symbol: %r" % value)
if nodes[1:]: # Details present
content = [self.compile_node(nodes[1].children[1])]
else:
content = None
return pytree.NodePattern(type, content)
elif node.value == "(":
return self.compile_node(nodes[1])
elif node.value == "[":
assert repeat is None
subpattern = self.compile_node(nodes[1])
return pytree.WildcardPattern([[subpattern]], min=0, max=1)
assert False, node
def get_int(self, node):
assert node.type == token.NUMBER
return int(node.value)
# Map named tokens to the type value for a LeafPattern
TOKEN_MAP = {"NAME": token.NAME,
"STRING": token.STRING,
"NUMBER": token.NUMBER,
"TOKEN": None}
def _type_of_literal(value):
if value[0].isalpha():
return token.NAME
elif value in grammar.opmap:
return grammar.opmap[value]
else:
return None
def pattern_convert(grammar, raw_node_info):
"""Converts raw node information to a Node or Leaf instance."""
type, value, context, children = raw_node_info
if children or type in grammar.number2symbol:
return pytree.Node(type, children, context=context)
else:
return pytree.Leaf(type, value, context=context)
def compile_pattern(pattern):
return PatternCompiler().compile_pattern(pattern)
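# A minimal sketch of compiling and matching a pattern with this module:
#   pat = compile_pattern("power< 'xrange' trailer< '(' args=any ')' > >")
#   results = {}
#   if pat.match(node, results):    # node is a lib2to3 pytree node
#       args = results["args"]      # subtree bound by the args= name above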
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Convert graminit.[ch] spit out by pgen to Python code.
Pgen is the Python parser generator. It is useful to quickly create a
parser from a grammar file in Python's grammar notation. But I don't
want my parsers to be written in C (yet), so I'm translating the
parsing tables to Python data structures and writing a Python parse
engine.
Note that the token numbers are constants determined by the standard
Python tokenizer. The standard token module defines these numbers and
their names (the names are not used much). The token numbers are
hardcoded into the Python tokenizer and into pgen. A Python
implementation of the Python tokenizer is also available, in the
standard tokenize module.
On the other hand, symbol numbers (representing the grammar's
non-terminals) are assigned by pgen based on the actual grammar
input.
Note: this module is pretty much obsolete; the pgen module generates
equivalent grammar tables directly from the Grammar.txt input file
without having to invoke the Python pgen C program.
"""
# Python imports
import re
# Local imports
from pgen2 import grammar, token
class Converter(grammar.Grammar):
"""Grammar subclass that reads classic pgen output files.
The run() method reads the tables as produced by the pgen parser
generator, typically contained in two C files, graminit.h and
graminit.c. The other methods are for internal use only.
See the base class for more documentation.
"""
def run(self, graminit_h, graminit_c):
"""Load the grammar tables from the text files written by pgen."""
self.parse_graminit_h(graminit_h)
self.parse_graminit_c(graminit_c)
self.finish_off()
def parse_graminit_h(self, filename):
"""Parse the .h file written by pgen. (Internal)
This file is a sequence of #define statements defining the
nonterminals of the grammar as numbers. We build two tables
mapping the numbers to names and back.
"""
try:
f = open(filename)
except IOError, err:
print "Can't open %s: %s" % (filename, err)
return False
self.symbol2number = {}
self.number2symbol = {}
lineno = 0
for line in f:
lineno += 1
mo = re.match(r"^#define\s+(\w+)\s+(\d+)$", line)
if not mo and line.strip():
print "%s(%s): can't parse %s" % (filename, lineno,
line.strip())
else:
symbol, number = mo.groups()
number = int(number)
assert symbol not in self.symbol2number
assert number not in self.number2symbol
self.symbol2number[symbol] = number
self.number2symbol[number] = symbol
return True
def parse_graminit_c(self, filename):
"""Parse the .c file written by pgen. (Internal)
The file looks as follows. The first two lines are always this:
#include "pgenheaders.h"
#include "grammar.h"
After that come four blocks:
1) one or more state definitions
2) a table defining dfas
3) a table defining labels
4) a struct defining the grammar
A state definition has the following form:
- one or more arc arrays, each of the form:
static arc arcs_<n>_<m>[<k>] = {
{<i>, <j>},
...
};
- followed by a state array, of the form:
static state states_<s>[<t>] = {
{<k>, arcs_<n>_<m>},
...
};
"""
try:
f = open(filename)
except IOError, err:
print "Can't open %s: %s" % (filename, err)
return False
# The code below essentially uses f's iterator-ness!
lineno = 0
# Expect the two #include lines
lineno, line = lineno+1, f.next()
assert line == '#include "pgenheaders.h"\n', (lineno, line)
lineno, line = lineno+1, f.next()
assert line == '#include "grammar.h"\n', (lineno, line)
# Parse the state definitions
lineno, line = lineno+1, f.next()
allarcs = {}
states = []
while line.startswith("static arc "):
while line.startswith("static arc "):
mo = re.match(r"static arc arcs_(\d+)_(\d+)\[(\d+)\] = {$",
line)
assert mo, (lineno, line)
n, m, k = map(int, mo.groups())
arcs = []
for _ in range(k):
lineno, line = lineno+1, f.next()
mo = re.match(r"\s+{(\d+), (\d+)},$", line)
assert mo, (lineno, line)
i, j = map(int, mo.groups())
arcs.append((i, j))
lineno, line = lineno+1, f.next()
assert line == "};\n", (lineno, line)
allarcs[(n, m)] = arcs
lineno, line = lineno+1, f.next()
mo = re.match(r"static state states_(\d+)\[(\d+)\] = {$", line)
assert mo, (lineno, line)
s, t = map(int, mo.groups())
assert s == len(states), (lineno, line)
state = []
for _ in range(t):
lineno, line = lineno+1, f.next()
mo = re.match(r"\s+{(\d+), arcs_(\d+)_(\d+)},$", line)
assert mo, (lineno, line)
k, n, m = map(int, mo.groups())
arcs = allarcs[n, m]
assert k == len(arcs), (lineno, line)
state.append(arcs)
states.append(state)
lineno, line = lineno+1, f.next()
assert line == "};\n", (lineno, line)
lineno, line = lineno+1, f.next()
self.states = states
# Parse the dfas
dfas = {}
mo = re.match(r"static dfa dfas\[(\d+)\] = {$", line)
assert mo, (lineno, line)
ndfas = int(mo.group(1))
for i in range(ndfas):
lineno, line = lineno+1, f.next()
mo = re.match(r'\s+{(\d+), "(\w+)", (\d+), (\d+), states_(\d+),$',
line)
assert mo, (lineno, line)
symbol = mo.group(2)
number, x, y, z = map(int, mo.group(1, 3, 4, 5))
assert self.symbol2number[symbol] == number, (lineno, line)
assert self.number2symbol[number] == symbol, (lineno, line)
assert x == 0, (lineno, line)
state = states[z]
assert y == len(state), (lineno, line)
lineno, line = lineno+1, f.next()
mo = re.match(r'\s+("(?:\\\d\d\d)*")},$', line)
assert mo, (lineno, line)
first = {}
rawbitset = eval(mo.group(1))
for i, c in enumerate(rawbitset):
byte = ord(c)
for j in range(8):
if byte & (1<<j):
first[i*8 + j] = 1
dfas[number] = (state, first)
lineno, line = lineno+1, f.next()
assert line == "};\n", (lineno, line)
self.dfas = dfas
# Parse the labels
labels = []
lineno, line = lineno+1, f.next()
mo = re.match(r"static label labels\[(\d+)\] = {$", line)
assert mo, (lineno, line)
nlabels = int(mo.group(1))
for i in range(nlabels):
lineno, line = lineno+1, f.next()
mo = re.match(r'\s+{(\d+), (0|"\w+")},$', line)
assert mo, (lineno, line)
x, y = mo.groups()
x = int(x)
if y == "0":
y = None
else:
y = eval(y)
labels.append((x, y))
lineno, line = lineno+1, f.next()
assert line == "};\n", (lineno, line)
self.labels = labels
# Parse the grammar struct
lineno, line = lineno+1, f.next()
assert line == "grammar _PyParser_Grammar = {\n", (lineno, line)
lineno, line = lineno+1, f.next()
mo = re.match(r"\s+(\d+),$", line)
assert mo, (lineno, line)
ndfas = int(mo.group(1))
assert ndfas == len(self.dfas)
lineno, line = lineno+1, f.next()
assert line == "\tdfas,\n", (lineno, line)
lineno, line = lineno+1, f.next()
mo = re.match(r"\s+{(\d+), labels},$", line)
assert mo, (lineno, line)
nlabels = int(mo.group(1))
assert nlabels == len(self.labels), (lineno, line)
lineno, line = lineno+1, f.next()
mo = re.match(r"\s+(\d+)$", line)
assert mo, (lineno, line)
start = int(mo.group(1))
assert start in self.number2symbol, (lineno, line)
self.start = start
lineno, line = lineno+1, f.next()
assert line == "};\n", (lineno, line)
try:
lineno, line = lineno+1, f.next()
except StopIteration:
pass
else:
assert 0, (lineno, line)
def finish_off(self):
"""Create additional useful structures. (Internal)."""
self.keywords = {} # map from keyword strings to arc labels
self.tokens = {} # map from numeric token values to arc labels
for ilabel, (type, value) in enumerate(self.labels):
if type == token.NAME and value is not None:
self.keywords[value] = ilabel
elif value is None:
self.tokens[type] = ilabel
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Modifications:
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser driver.
This provides a high-level interface to parse a file into a syntax tree.
"""
__author__ = "Guido van Rossum <[email protected]>"
__all__ = ["Driver", "load_grammar"]
# Python imports
import codecs
import os
import logging
import pkgutil
import StringIO
import sys
# Pgen imports
from . import grammar, parse, token, tokenize, pgen
class Driver(object):
def __init__(self, grammar, convert=None, logger=None):
self.grammar = grammar
if logger is None:
logger = logging.getLogger()
self.logger = logger
self.convert = convert
def parse_tokens(self, tokens, debug=False):
"""Parse a series of tokens and return the syntax tree."""
# XXX Move the prefix computation into a wrapper around tokenize.
p = parse.Parser(self.grammar, self.convert)
p.setup()
lineno = 1
column = 0
type = value = start = end = line_text = None
prefix = u""
for quintuple in tokens:
type, value, start, end, line_text = quintuple
if start != (lineno, column):
assert (lineno, column) <= start, ((lineno, column), start)
s_lineno, s_column = start
if lineno < s_lineno:
prefix += "\n" * (s_lineno - lineno)
lineno = s_lineno
column = 0
if column < s_column:
prefix += line_text[column:s_column]
column = s_column
if type in (tokenize.COMMENT, tokenize.NL):
prefix += value
lineno, column = end
if value.endswith("\n"):
lineno += 1
column = 0
continue
if type == token.OP:
type = grammar.opmap[value]
if debug:
self.logger.debug("%s %r (prefix=%r)",
token.tok_name[type], value, prefix)
if p.addtoken(type, value, (prefix, start)):
if debug:
self.logger.debug("Stop.")
break
prefix = ""
lineno, column = end
if value.endswith("\n"):
lineno += 1
column = 0
else:
# We never broke out -- EOF is too soon (how can this happen???)
raise parse.ParseError("incomplete input",
type, value, (prefix, start))
return p.rootnode
def parse_stream_raw(self, stream, debug=False):
"""Parse a stream and return the syntax tree."""
tokens = tokenize.generate_tokens(stream.readline)
return self.parse_tokens(tokens, debug)
def parse_stream(self, stream, debug=False):
"""Parse a stream and return the syntax tree."""
return self.parse_stream_raw(stream, debug)
def parse_file(self, filename, encoding=None, debug=False):
"""Parse a file and return the syntax tree."""
stream = codecs.open(filename, "r", encoding)
try:
return self.parse_stream(stream, debug)
finally:
stream.close()
def parse_string(self, text, debug=False):
"""Parse a string and return the syntax tree."""
tokens = tokenize.generate_tokens(StringIO.StringIO(text).readline)
return self.parse_tokens(tokens, debug)
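# A minimal sketch of driving the parser, assuming the lib2to3 package layout:
#   from lib2to3 import pygram, pytree
#   from lib2to3.pgen2 import driver
#   d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
#   tree = d.parse_string("x = 1\n")
#   print tree    # str(tree) round-trips the source, prefixes included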
def _generate_pickle_name(gt):
head, tail = os.path.splitext(gt)
if tail == ".txt":
tail = ""
return head + tail + ".".join(map(str, sys.version_info)) + ".pickle"
def load_grammar(gt="Grammar.txt", gp=None,
save=True, force=False, logger=None):
"""Load the grammar (maybe from a pickle)."""
if logger is None:
logger = logging.getLogger()
gp = _generate_pickle_name(gt) if gp is None else gp
if force or not _newer(gp, gt):
logger.info("Generating grammar tables from %s", gt)
g = pgen.generate_grammar(gt)
if save:
logger.info("Writing grammar tables to %s", gp)
try:
g.dump(gp)
except IOError as e:
logger.info("Writing failed: %s", e)
else:
g = grammar.Grammar()
g.load(gp)
return g
def _newer(a, b):
"""Inquire whether file a was written since file b."""
if not os.path.exists(a):
return False
if not os.path.exists(b):
return True
return os.path.getmtime(a) >= os.path.getmtime(b)
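# Sketch of the caching behavior above: on CPython 2.7.18,
# _generate_pickle_name("Grammar.txt") is "Grammar2.7.18.final.0.pickle";
# load_grammar() reuses that pickle while it is newer than Grammar.txt and
# regenerates it with pgen otherwise.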
def load_packaged_grammar(package, grammar_source):
"""Normally, loads a pickled grammar by doing
pkgutil.get_data(package, pickled_grammar)
where *pickled_grammar* is computed from *grammar_source* by adding the
Python version and using a ``.pickle`` extension.
However, if *grammar_source* is an extant file, load_grammar(grammar_source)
is called instead. This facilitates using a packaged grammar file when needed
but preserves load_grammar's automatic regeneration behavior when possible.
"""
if os.path.isfile(grammar_source):
return load_grammar(grammar_source)
pickled_name = _generate_pickle_name(os.path.basename(grammar_source))
data = pkgutil.get_data(package, pickled_name)
g = grammar.Grammar()
g.loads(data)
return g
def main(*args):
"""Main program, when run as a script: produce grammar pickle files.
Calls load_grammar for each argument, a path to a grammar text file.
"""
if not args:
args = sys.argv[1:]
logging.basicConfig(level=logging.INFO, stream=sys.stdout,
format='%(message)s')
for gt in args:
load_grammar(gt, save=True, force=True)
return True
if __name__ == "__main__":
sys.exit(int(not main()))
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""This module defines the data structures used to represent a grammar.
These are a bit arcane because they are derived from the data
structures used by Python's 'pgen' parser generator.
There's also a table here mapping operators to their names in the
token module; the Python tokenize module reports all operators as the
fallback token code OP, but the parser needs the actual token code.
"""
# Python imports
import collections
import pickle
# Local imports
from . import token, tokenize
class Grammar(object):
"""Pgen parsing tables conversion class.
Once initialized, this class supplies the grammar tables for the
parsing engine implemented by parse.py. The parsing engine
accesses the instance variables directly. The class here does not
provide initialization of the tables; several subclasses exist to
do this (see the conv and pgen modules).
The load() method reads the tables from a pickle file, which is
much faster than the other ways offered by subclasses. The pickle
file is written by calling dump() (after loading the grammar
tables using a subclass). The report() method prints a readable
representation of the tables to stdout, for debugging.
The instance variables are as follows:
symbol2number -- a dict mapping symbol names to numbers. Symbol
numbers are always 256 or higher, to distinguish
them from token numbers, which are between 0 and
255 (inclusive).
number2symbol -- a dict mapping numbers to symbol names;
these two are each other's inverse.
states -- a list of DFAs, where each DFA is a list of
states, each state is a list of arcs, and each
arc is a (i, j) pair where i is a label and j is
a state number. The DFA number is the index into
this list. (This name is slightly confusing.)
Final states are represented by a special arc of
the form (0, j) where j is its own state number.
dfas -- a dict mapping symbol numbers to (DFA, first)
pairs, where DFA is an item from the states list
above, and first is a set of tokens that can
begin this grammar rule (represented by a dict
whose values are always 1).
labels -- a list of (x, y) pairs where x is either a token
number or a symbol number, and y is either None
or a string; the strings are keywords. The label
number is the index in this list; label numbers
are used to mark state transitions (arcs) in the
DFAs.
start -- the number of the grammar's start symbol.
keywords -- a dict mapping keyword strings to arc labels.
tokens -- a dict mapping token numbers to arc labels.
"""
def __init__(self):
self.symbol2number = {}
self.number2symbol = {}
self.states = []
self.dfas = {}
self.labels = [(0, "EMPTY")]
self.keywords = {}
self.tokens = {}
self.symbol2label = {}
self.start = 256
def dump(self, filename):
"""Dump the grammar tables to a pickle file.
dump() recursively changes all dict to OrderedDict, so the pickled file
is not exactly the same as what was passed in to dump(). load() uses the
pickled file to create the tables, but only changes OrderedDict to dict
at the top level; it does not recursively change OrderedDict to dict.
So, the loaded tables are different from the original tables that were
passed to dump() in that some of the OrderedDicts (from the pickled file)
are not changed back to dict. For parsing, this has no effect on
performance because OrderedDict uses dict's __getitem__ with nothing in
between.
"""
with open(filename, "wb") as f:
d = _make_deterministic(self.__dict__)
pickle.dump(d, f, 2)
def load(self, filename):
"""Load the grammar tables from a pickle file."""
f = open(filename, "rb")
d = pickle.load(f)
f.close()
self.__dict__.update(d)
def loads(self, pkl):
"""Load the grammar tables from a pickle bytes object."""
self.__dict__.update(pickle.loads(pkl))
def copy(self):
"""
Copy the grammar.
"""
new = self.__class__()
for dict_attr in ("symbol2number", "number2symbol", "dfas", "keywords",
"tokens", "symbol2label"):
setattr(new, dict_attr, getattr(self, dict_attr).copy())
new.labels = self.labels[:]
new.states = self.states[:]
new.start = self.start
return new
def report(self):
"""Dump the grammar tables to standard output, for debugging."""
from pprint import pprint
print "s2n"
pprint(self.symbol2number)
print "n2s"
pprint(self.number2symbol)
print "states"
pprint(self.states)
print "dfas"
pprint(self.dfas)
print "labels"
pprint(self.labels)
print "start", self.start
def _make_deterministic(top):
if isinstance(top, dict):
return collections.OrderedDict(
sorted(((k, _make_deterministic(v)) for k, v in top.iteritems())))
if isinstance(top, list):
return [_make_deterministic(e) for e in top]
if isinstance(top, tuple):
return tuple(_make_deterministic(e) for e in top)
return top
# Map from operator to number (since tokenize doesn't do this)
opmap_raw = """
( LPAR
) RPAR
[ LSQB
] RSQB
: COLON
, COMMA
; SEMI
+ PLUS
- MINUS
* STAR
/ SLASH
| VBAR
& AMPER
< LESS
> GREATER
= EQUAL
. DOT
% PERCENT
` BACKQUOTE
{ LBRACE
} RBRACE
@ AT
@= ATEQUAL
== EQEQUAL
!= NOTEQUAL
<> NOTEQUAL
<= LESSEQUAL
>= GREATEREQUAL
~ TILDE
^ CIRCUMFLEX
<< LEFTSHIFT
>> RIGHTSHIFT
** DOUBLESTAR
+= PLUSEQUAL
-= MINEQUAL
*= STAREQUAL
/= SLASHEQUAL
%= PERCENTEQUAL
&= AMPEREQUAL
|= VBAREQUAL
^= CIRCUMFLEXEQUAL
<<= LEFTSHIFTEQUAL
>>= RIGHTSHIFTEQUAL
**= DOUBLESTAREQUAL
// DOUBLESLASH
//= DOUBLESLASHEQUAL
-> RARROW
"""
opmap = {}
for line in opmap_raw.splitlines():
if line:
op, name = line.split()
opmap[op] = getattr(token, name)
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Safely evaluate Python string literals without using eval()."""
import re
simple_escapes = {"a": "\a",
"b": "\b",
"f": "\f",
"n": "\n",
"r": "\r",
"t": "\t",
"v": "\v",
"'": "'",
'"': '"',
"\\": "\\"}
def escape(m):
all, tail = m.group(0, 1)
assert all.startswith("\\")
esc = simple_escapes.get(tail)
if esc is not None:
return esc
if tail.startswith("x"):
hexes = tail[1:]
if len(hexes) < 2:
raise ValueError("invalid hex string escape ('\\%s')" % tail)
try:
i = int(hexes, 16)
except ValueError:
raise ValueError("invalid hex string escape ('\\%s')" % tail)
else:
try:
i = int(tail, 8)
except ValueError:
raise ValueError("invalid octal string escape ('\\%s')" % tail)
return chr(i)
def evalString(s):
assert s.startswith("'") or s.startswith('"'), repr(s[:1])
q = s[0]
if s[:3] == q*3:
q = q*3
assert s.endswith(q), repr(s[-len(q):])
assert len(s) >= 2*len(q)
s = s[len(q):-len(q)]
return re.sub(r"\\(\'|\"|\\|[abfnrtv]|x.{0,2}|[0-7]{1,3})", escape, s)
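# Sketch: evalString("'a\\n\\x41\\101'") returns 'a\nAA': a simple escape, a
# hex escape, and an octal escape, all resolved by escape() above.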
def test():
for i in range(256):
c = chr(i)
s = repr(c)
e = evalString(s)
if e != c:
print i, c, s, e
if __name__ == "__main__":
test()
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser engine for the grammar tables generated by pgen.
The grammar table must be loaded first.
See Parser/parser.c in the Python distribution for additional info on
how this parsing engine works.
"""
# Local imports
from . import token
class ParseError(Exception):
"""Exception to signal the parser is stuck."""
def __init__(self, msg, type, value, context):
Exception.__init__(self, "%s: type=%r, value=%r, context=%r" %
(msg, type, value, context))
self.msg = msg
self.type = type
self.value = value
self.context = context
class Parser(object):
"""Parser engine.
The proper usage sequence is:
p = Parser(grammar, [converter]) # create instance
p.setup([start]) # prepare for parsing
<for each input token>:
if p.addtoken(...): # parse a token; may raise ParseError
break
root = p.rootnode # root of abstract syntax tree
A Parser instance may be reused by calling setup() repeatedly.
A Parser instance contains state pertaining to the current token
sequence, and should not be used concurrently by different threads
to parse separate token sequences.
See driver.py for how to get input tokens by tokenizing a file or
string.
Parsing is complete when addtoken() returns True; the root of the
abstract syntax tree can then be retrieved from the rootnode
instance variable. When a syntax error occurs, addtoken() raises
the ParseError exception. There is no error recovery; the parser
cannot be used after a syntax error was reported (but it can be
reinitialized by calling setup()).
"""
def __init__(self, grammar, convert=None):
"""Constructor.
The grammar argument is a grammar.Grammar instance; see the
grammar module for more information.
The parser is not ready yet for parsing; you must call the
setup() method to get it started.
The optional convert argument is a function mapping concrete
syntax tree nodes to abstract syntax tree nodes. If not
given, no conversion is done and the syntax tree produced is
the concrete syntax tree. If given, it must be a function of
two arguments, the first being the grammar (a grammar.Grammar
instance), and the second being the concrete syntax tree node
to be converted. The syntax tree is converted from the bottom
up.
A concrete syntax tree node is a (type, value, context, nodes)
tuple, where type is the node type (a token or symbol number),
value is None for symbols and a string for tokens, context is
None or an opaque value used for error reporting (typically a
(lineno, offset) pair), and nodes is a list of children for
symbols, and None for tokens.
An abstract syntax tree node may be anything; this is entirely
up to the converter function.
"""
self.grammar = grammar
self.convert = convert or (lambda grammar, node: node)
def setup(self, start=None):
"""Prepare for parsing.
This *must* be called before starting to parse.
The optional argument is an alternative start symbol; it
defaults to the grammar's start symbol.
You can use a Parser instance to parse any number of programs;
each time you call setup() the parser is reset to an initial
state determined by the (implicit or explicit) start symbol.
"""
if start is None:
start = self.grammar.start
# Each stack entry is a tuple: (dfa, state, node).
# A node is a tuple: (type, value, context, children),
# where children is a list of nodes or None, and context may be None.
newnode = (start, None, None, [])
stackentry = (self.grammar.dfas[start], 0, newnode)
self.stack = [stackentry]
self.rootnode = None
self.used_names = set() # Aliased to self.rootnode.used_names in pop()
def addtoken(self, type, value, context):
"""Add a token; return True iff this is the end of the program."""
# Map from token to label
ilabel = self.classify(type, value, context)
# Loop until the token is shifted; may raise exceptions
while True:
dfa, state, node = self.stack[-1]
states, first = dfa
arcs = states[state]
# Look for a state with this label
for i, newstate in arcs:
t, v = self.grammar.labels[i]
if ilabel == i:
# Look it up in the list of labels
assert t < 256
# Shift a token; we're done with it
self.shift(type, value, newstate, context)
# Pop while we are in an accept-only state
state = newstate
while states[state] == [(0, state)]:
self.pop()
if not self.stack:
# Done parsing!
return True
dfa, state, node = self.stack[-1]
states, first = dfa
# Done with this token
return False
elif t >= 256:
# See if it's a symbol and if we're in its first set
itsdfa = self.grammar.dfas[t]
itsstates, itsfirst = itsdfa
if ilabel in itsfirst:
# Push a symbol
self.push(t, self.grammar.dfas[t], newstate, context)
break # To continue the outer while loop
else:
if (0, state) in arcs:
# An accepting state, pop it and try something else
self.pop()
if not self.stack:
# Done parsing, but another token is input
raise ParseError("too much input",
type, value, context)
else:
# No success finding a transition
raise ParseError("bad input", type, value, context)
def classify(self, type, value, context):
"""Turn a token into a label. (Internal)"""
if type == token.NAME:
# Keep a listing of all used names
self.used_names.add(value)
# Check for reserved words
ilabel = self.grammar.keywords.get(value)
if ilabel is not None:
return ilabel
ilabel = self.grammar.tokens.get(type)
if ilabel is None:
raise ParseError("bad token", type, value, context)
return ilabel
def shift(self, type, value, newstate, context):
"""Shift a token. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, value, context, None)
newnode = self.convert(self.grammar, newnode)
if newnode is not None:
node[-1].append(newnode)
self.stack[-1] = (dfa, newstate, node)
def push(self, type, newdfa, newstate, context):
"""Push a nonterminal. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, None, context, [])
self.stack[-1] = (dfa, newstate, node)
self.stack.append((newdfa, 0, newnode))
def pop(self):
"""Pop a nonterminal. (Internal)"""
popdfa, popstate, popnode = self.stack.pop()
newnode = self.convert(self.grammar, popnode)
if newnode is not None:
if self.stack:
dfa, state, node = self.stack[-1]
node[-1].append(newnode)
else:
self.rootnode = newnode
self.rootnode.used_names = self.used_names
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Pgen imports
from . import grammar, token, tokenize
class PgenGrammar(grammar.Grammar):
pass
class ParserGenerator(object):
def __init__(self, filename, stream=None):
close_stream = None
if stream is None:
stream = open(filename)
close_stream = stream.close
self.filename = filename
self.stream = stream
self.generator = tokenize.generate_tokens(stream.readline)
self.gettoken() # Initialize lookahead
self.dfas, self.startsymbol = self.parse()
if close_stream is not None:
close_stream()
self.first = {} # map from symbol name to set of tokens
self.addfirstsets()
def make_grammar(self):
c = PgenGrammar()
names = self.dfas.keys()
names.sort()
names.remove(self.startsymbol)
names.insert(0, self.startsymbol)
for name in names:
i = 256 + len(c.symbol2number)
c.symbol2number[name] = i
c.number2symbol[i] = name
for name in names:
dfa = self.dfas[name]
states = []
for state in dfa:
arcs = []
for label, next in sorted(state.arcs.iteritems()):
arcs.append((self.make_label(c, label), dfa.index(next)))
if state.isfinal:
arcs.append((0, dfa.index(state)))
states.append(arcs)
c.states.append(states)
c.dfas[c.symbol2number[name]] = (states, self.make_first(c, name))
c.start = c.symbol2number[self.startsymbol]
return c
def make_first(self, c, name):
rawfirst = self.first[name]
first = {}
for label in sorted(rawfirst):
ilabel = self.make_label(c, label)
##assert ilabel not in first # XXX failed on <> ... !=
first[ilabel] = 1
return first
def make_label(self, c, label):
# XXX Maybe this should be a method on a subclass of converter?
ilabel = len(c.labels)
if label[0].isalpha():
# Either a symbol name or a named token
if label in c.symbol2number:
# A symbol name (a non-terminal)
if label in c.symbol2label:
return c.symbol2label[label]
else:
c.labels.append((c.symbol2number[label], None))
c.symbol2label[label] = ilabel
return ilabel
else:
# A named token (NAME, NUMBER, STRING)
itoken = getattr(token, label, None)
assert isinstance(itoken, (int, long)), label
assert itoken in token.tok_name, label
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
else:
# Either a keyword or an operator
assert label[0] in ('"', "'"), label
value = eval(label)
if value[0].isalpha():
# A keyword
if value in c.keywords:
return c.keywords[value]
else:
c.labels.append((token.NAME, value))
c.keywords[value] = ilabel
return ilabel
else:
# An operator (any non-numeric token)
itoken = grammar.opmap[value] # Fails if unknown token
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
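# A micro-example of the bookkeeping above: make_label(c, "'if'") appends
# (token.NAME, "if") to c.labels and records it in c.keywords;
# make_label(c, "NAME") records (token.NAME, None) in c.tokens; and
# make_label(c, "'+'") goes through grammar.opmap to token.PLUS.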
def addfirstsets(self):
names = self.dfas.keys()
names.sort()
for name in names:
if name not in self.first:
self.calcfirst(name)
#print name, self.first[name].keys()
def calcfirst(self, name):
dfa = self.dfas[name]
self.first[name] = None # dummy to detect left recursion
state = dfa[0]
totalset = {}
overlapcheck = {}
for label, next in state.arcs.iteritems():
if label in self.dfas:
if label in self.first:
fset = self.first[label]
if fset is None:
raise ValueError("recursion for rule %r" % name)
else:
self.calcfirst(label)
fset = self.first[label]
totalset.update(fset)
overlapcheck[label] = fset
else:
totalset[label] = 1
overlapcheck[label] = {label: 1}
inverse = {}
for label, itsfirst in overlapcheck.iteritems():
for symbol in itsfirst:
if symbol in inverse:
raise ValueError("rule %s is ambiguous; %s is in the"
" first sets of %s as well as %s" %
(name, symbol, label, inverse[symbol]))
inverse[symbol] = label
self.first[name] = totalset
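# Illustration (hypothetical grammar): calcfirst() rejects rules whose
# alternatives cannot be told apart with one token of lookahead.  Given
#
#   x: y | z
#   y: 'a'
#   z: 'a' 'b'
#
# both alternatives of x start with 'a', so computing first['x'] raises
# ValueError("rule x is ambiguous; ...").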
def parse(self):
dfas = {}
startsymbol = None
# MSTART: (NEWLINE | RULE)* ENDMARKER
while self.type != token.ENDMARKER:
while self.type == token.NEWLINE:
self.gettoken()
# RULE: NAME ':' RHS NEWLINE
name = self.expect(token.NAME)
self.expect(token.OP, ":")
a, z = self.parse_rhs()
self.expect(token.NEWLINE)
#self.dump_nfa(name, a, z)
dfa = self.make_dfa(a, z)
#self.dump_dfa(name, dfa)
oldlen = len(dfa)
self.simplify_dfa(dfa)
newlen = len(dfa)
dfas[name] = dfa
#print name, oldlen, newlen
if startsymbol is None:
startsymbol = name
return dfas, startsymbol
def make_dfa(self, start, finish):
# To turn an NFA into a DFA, we define the states of the DFA
# to correspond to *sets* of states of the NFA. Then do some
# state reduction. Let's represent sets as dicts with 1 for
# values.
assert isinstance(start, NFAState)
assert isinstance(finish, NFAState)
def closure(state):
base = {}
addclosure(state, base)
return base
def addclosure(state, base):
assert isinstance(state, NFAState)
if state in base:
return
base[state] = 1
for label, next in state.arcs:
if label is None:
addclosure(next, base)
states = [DFAState(closure(start), finish)]
for state in states: # NB states grows while we're iterating
arcs = {}
for nfastate in state.nfaset:
for label, next in nfastate.arcs:
if label is not None:
addclosure(next, arcs.setdefault(label, {}))
for label, nfaset in sorted(arcs.iteritems()):
for st in states:
if st.nfaset == nfaset:
break
else:
st = DFAState(nfaset, finish)
states.append(st)
state.addarc(st, label)
return states # List of DFAState instances; first one is start
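# A minimal sketch of the construction above, for a hypothetical rule
# "x: 'a' 'b'" (NFAState/DFAState are defined later in this file):
#
#   a = NFAState(); m = NFAState(); z = NFAState()
#   a.addarc(m, "'a'"); m.addarc(z, "'b'")
#   # make_dfa(a, z) then yields three DFAStates, one per closure set:
#   #   {a} --'a'--> {m} --'b'--> {z}   (the last one is final)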
def dump_nfa(self, name, start, finish):
print "Dump of NFA for", name
todo = [start]
for i, state in enumerate(todo):
print " State", i, state is finish and "(final)" or ""
for label, next in state.arcs:
if next in todo:
j = todo.index(next)
else:
j = len(todo)
todo.append(next)
if label is None:
print " -> %d" % j
else:
print " %s -> %d" % (label, j)
def dump_dfa(self, name, dfa):
print "Dump of DFA for", name
for i, state in enumerate(dfa):
print " State", i, state.isfinal and "(final)" or ""
for label, next in sorted(state.arcs.iteritems()):
print " %s -> %d" % (label, dfa.index(next))
def simplify_dfa(self, dfa):
# This is not theoretically optimal, but works well enough.
# Algorithm: repeatedly look for two states that have the same
# set of arcs (same labels pointing to the same nodes) and
# unify them, until things stop changing.
# dfa is a list of DFAState instances
changes = True
while changes:
changes = False
for i, state_i in enumerate(dfa):
for j in range(i+1, len(dfa)):
state_j = dfa[j]
if state_i == state_j:
#print " unify", i, j
del dfa[j]
for state in dfa:
state.unifystate(state_j, state_i)
changes = True
break
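# Illustration: for a hypothetical rule  x: 'a' 'c' | 'b' 'c'  the subset
# construction produces two equivalent final states (one per alternative).
# The loop above unifies them; once merged, the two intermediate states
# reached on 'a' and 'b' also acquire identical arcs and are unified on a
# later pass.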
def parse_rhs(self):
# RHS: ALT ('|' ALT)*
a, z = self.parse_alt()
if self.value != "|":
return a, z
else:
aa = NFAState()
zz = NFAState()
aa.addarc(a)
z.addarc(zz)
while self.value == "|":
self.gettoken()
a, z = self.parse_alt()
aa.addarc(a)
z.addarc(zz)
return aa, zz
def parse_alt(self):
# ALT: ITEM+
a, b = self.parse_item()
while (self.value in ("(", "[") or
self.type in (token.NAME, token.STRING)):
c, d = self.parse_item()
b.addarc(c)
b = d
return a, b
def parse_item(self):
# ITEM: '[' RHS ']' | ATOM ['+' | '*']
if self.value == "[":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, "]")
a.addarc(z)
return a, z
else:
a, z = self.parse_atom()
value = self.value
if value not in ("+", "*"):
return a, z
self.gettoken()
z.addarc(a)
if value == "+":
return a, z
else:
return a, a
def parse_atom(self):
# ATOM: '(' RHS ')' | NAME | STRING
if self.value == "(":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, ")")
return a, z
elif self.type in (token.NAME, token.STRING):
a = NFAState()
z = NFAState()
a.addarc(z, self.value)
self.gettoken()
return a, z
else:
self.raise_error("expected (...) or NAME or STRING, got %s/%s",
self.type, self.value)
def expect(self, type, value=None):
if self.type != type or (value is not None and self.value != value):
self.raise_error("expected %s/%s, got %s/%s",
type, value, self.type, self.value)
value = self.value
self.gettoken()
return value
def gettoken(self):
tup = self.generator.next()
while tup[0] in (tokenize.COMMENT, tokenize.NL):
tup = self.generator.next()
self.type, self.value, self.begin, self.end, self.line = tup
#print token.tok_name[self.type], repr(self.value)
def raise_error(self, msg, *args):
if args:
try:
msg = msg % args
except:
msg = " ".join([msg] + map(str, args))
raise SyntaxError(msg, (self.filename, self.end[0],
self.end[1], self.line))
class NFAState(object):
def __init__(self):
self.arcs = [] # list of (label, NFAState) pairs
def addarc(self, next, label=None):
assert label is None or isinstance(label, str)
assert isinstance(next, NFAState)
self.arcs.append((label, next))
class DFAState(object):
def __init__(self, nfaset, final):
assert isinstance(nfaset, dict)
assert isinstance(iter(nfaset).next(), NFAState)
assert isinstance(final, NFAState)
self.nfaset = nfaset
self.isfinal = final in nfaset
self.arcs = {} # map from label to DFAState
def addarc(self, next, label):
assert isinstance(label, str)
assert label not in self.arcs
assert isinstance(next, DFAState)
self.arcs[label] = next
def unifystate(self, old, new):
for label, next in self.arcs.iteritems():
if next is old:
self.arcs[label] = new
def __eq__(self, other):
# Equality test -- ignore the nfaset instance variable
assert isinstance(other, DFAState)
if self.isfinal != other.isfinal:
return False
# Can't just return self.arcs == other.arcs, because that
# would invoke this method recursively, with cycles...
if len(self.arcs) != len(other.arcs):
return False
for label, next in self.arcs.iteritems():
if next is not other.arcs.get(label):
return False
return True
__hash__ = None # For Py3 compatibility.
def generate_grammar(filename="Grammar.txt"):
p = ParserGenerator(filename)
return p.make_grammar()
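# Example usage (a sketch, assuming a grammar file in the format parsed
# above, such as lib2to3's own Grammar.txt):
#
#   g = generate_grammar("Grammar.txt")
#   print g.start                    # number assigned to the start symbol
#   print g.number2symbol[g.start]   # its name, e.g. 'file_input'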
#! /usr/bin/env python
"""Token constants (from "token.h")."""
# Taken from Python (r53757) and modified to include some tokens
# originally monkeypatched in by pgen2.tokenize
#--start constants--
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
LPAR = 7
RPAR = 8
LSQB = 9
RSQB = 10
COLON = 11
COMMA = 12
SEMI = 13
PLUS = 14
MINUS = 15
STAR = 16
SLASH = 17
VBAR = 18
AMPER = 19
LESS = 20
GREATER = 21
EQUAL = 22
DOT = 23
PERCENT = 24
BACKQUOTE = 25
LBRACE = 26
RBRACE = 27
EQEQUAL = 28
NOTEQUAL = 29
LESSEQUAL = 30
GREATEREQUAL = 31
TILDE = 32
CIRCUMFLEX = 33
LEFTSHIFT = 34
RIGHTSHIFT = 35
DOUBLESTAR = 36
PLUSEQUAL = 37
MINEQUAL = 38
STAREQUAL = 39
SLASHEQUAL = 40
PERCENTEQUAL = 41
AMPEREQUAL = 42
VBAREQUAL = 43
CIRCUMFLEXEQUAL = 44
LEFTSHIFTEQUAL = 45
RIGHTSHIFTEQUAL = 46
DOUBLESTAREQUAL = 47
DOUBLESLASH = 48
DOUBLESLASHEQUAL = 49
AT = 50
ATEQUAL = 51
OP = 52
COMMENT = 53
NL = 54
RARROW = 55
ERRORTOKEN = 56
N_TOKENS = 57
NT_OFFSET = 256
#--end constants--
tok_name = {}
for _name, _value in globals().items():
if type(_value) is type(0):
tok_name[_value] = _name
def ISTERMINAL(x):
return x < NT_OFFSET
def ISNONTERMINAL(x):
return x >= NT_OFFSET
def ISEOF(x):
return x == ENDMARKER
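# Example: tok_name maps the numeric constants back to their names, which
# is convenient when printing token streams:
#
#   print tok_name[NAME], tok_name[OP]            # -> NAME OP
#   print ISTERMINAL(NAME), ISNONTERMINAL(256)    # -> True True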
# Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation.
# All rights reserved.
"""Tokenization help for Python programs.
generate_tokens(readline) is a generator that breaks a stream of
text into Python tokens. It accepts a readline-like method which is called
repeatedly to get the next line of input (or "" for EOF). It generates
5-tuples with these members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators.
Older entry points
tokenize_loop(readline, tokeneater)
tokenize(readline, tokeneater=printtoken)
are the same, except instead of generating tokens, tokeneater is a callback
function to which the 5 fields described above are passed as 5 arguments,
each time a new token is found."""
__author__ = 'Ka-Ping Yee <[email protected]>'
__credits__ = \
'GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, Skip Montanaro'
import string, re
from codecs import BOM_UTF8, lookup
from lib2to3.pgen2.token import *
from . import token
__all__ = [x for x in dir(token) if x[0] != '_'] + ["tokenize",
"generate_tokens", "untokenize"]
del token
try:
bytes
except NameError:
# Support bytes type in Python <= 2.5, so 2to3 turns itself into
# valid Python 3 code.
bytes = str
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'[a-zA-Z_]\w*'
Binnumber = r'0[bB][01]*'
Hexnumber = r'0[xX][\da-fA-F]*[lL]?'
Octnumber = r'0[oO]?[0-7]*[lL]?'
Decnumber = r'[1-9]\d*[lL]?'
Intnumber = group(Binnumber, Hexnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?\d+'
Pointfloat = group(r'\d+\.\d*', r'\.\d+') + maybe(Exponent)
Expfloat = r'\d+' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'\d+[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)
# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
Triple = group("[ubUB]?[rR]?'''", '[ubUB]?[rR]?"""')
# Single-line ' or " string.
String = group(r"[uU]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
r'[uU]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*"')
# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"<>", r"!=",
r"//=?", r"->",
r"[+\-*/%&@|^=<>]=?",
r"~")
Bracket = '[][(){}]'
Special = group(r'\r?\n', r'[:;.,`@]')
Funny = group(Operator, Bracket, Special)
PlainToken = group(Number, Funny, String, Name)
Token = Ignore + PlainToken
# First (or only) line of ' or " string.
ContStr = group(r"[uUbB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
group("'", r'\\\r?\n'),
r'[uUbB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name)
tokenprog, pseudoprog, single3prog, double3prog = map(
re.compile, (Token, PseudoToken, Single3, Double3))
endprogs = {"'": re.compile(Single), '"': re.compile(Double),
"'''": single3prog, '"""': double3prog,
"r'''": single3prog, 'r"""': double3prog,
"u'''": single3prog, 'u"""': double3prog,
"b'''": single3prog, 'b"""': double3prog,
"ur'''": single3prog, 'ur"""': double3prog,
"br'''": single3prog, 'br"""': double3prog,
"R'''": single3prog, 'R"""': double3prog,
"U'''": single3prog, 'U"""': double3prog,
"B'''": single3prog, 'B"""': double3prog,
"uR'''": single3prog, 'uR"""': double3prog,
"Ur'''": single3prog, 'Ur"""': double3prog,
"UR'''": single3prog, 'UR"""': double3prog,
"bR'''": single3prog, 'bR"""': double3prog,
"Br'''": single3prog, 'Br"""': double3prog,
"BR'''": single3prog, 'BR"""': double3prog,
'r': None, 'R': None,
'u': None, 'U': None,
'b': None, 'B': None}
triple_quoted = {}
for t in ("'''", '"""',
"r'''", 'r"""', "R'''", 'R"""',
"u'''", 'u"""', "U'''", 'U"""',
"b'''", 'b"""', "B'''", 'B"""',
"ur'''", 'ur"""', "Ur'''", 'Ur"""',
"uR'''", 'uR"""', "UR'''", 'UR"""',
"br'''", 'br"""', "Br'''", 'Br"""',
"bR'''", 'bR"""', "BR'''", 'BR"""',):
triple_quoted[t] = t
single_quoted = {}
for t in ("'", '"',
"r'", 'r"', "R'", 'R"',
"u'", 'u"', "U'", 'U"',
"b'", 'b"', "B'", 'B"',
"ur'", 'ur"', "Ur'", 'Ur"',
"uR'", 'uR"', "UR'", 'UR"',
"br'", 'br"', "Br'", 'Br"',
"bR'", 'bR"', "BR'", 'BR"', ):
single_quoted[t] = t
tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
def printtoken(type, token, start, end, line): # for testing
(srow, scol) = start
(erow, ecol) = end
print "%d,%d-%d,%d:\t%s\t%s" % \
(srow, scol, erow, ecol, tok_name[type], repr(token))
def tokenize(readline, tokeneater=printtoken):
"""
The tokenize() function accepts two parameters: one representing the
input stream, and one providing an output mechanism for tokenize().
The first parameter, readline, must be a callable object which provides
the same interface as the readline() method of built-in file objects.
Each call to the function should return one line of input as a string.
The second parameter, tokeneater, must also be a callable object. It is
called once for each token, with five arguments, corresponding to the
tuples generated by generate_tokens().
"""
try:
tokenize_loop(readline, tokeneater)
except StopTokenizing:
pass
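# Example (sketch): tokenizing a small snippet with the default printtoken
# callback defined above:
#
#   from StringIO import StringIO
#   tokenize(StringIO("x = 1\n").readline)
#   # prints one line per token, e.g.  1,0-1,1:  NAME    'x'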
# backwards compatible interface
def tokenize_loop(readline, tokeneater):
for token_info in generate_tokens(readline):
tokeneater(*token_info)
class Untokenizer:
def __init__(self):
self.tokens = []
self.prev_row = 1
self.prev_col = 0
def add_whitespace(self, start):
row, col = start
assert row <= self.prev_row
col_offset = col - self.prev_col
if col_offset:
self.tokens.append(" " * col_offset)
def untokenize(self, iterable):
for t in iterable:
if len(t) == 2:
self.compat(t, iterable)
break
tok_type, token, start, end, line = t
self.add_whitespace(start)
self.tokens.append(token)
self.prev_row, self.prev_col = end
if tok_type in (NEWLINE, NL):
self.prev_row += 1
self.prev_col = 0
return "".join(self.tokens)
def compat(self, token, iterable):
startline = False
indents = []
toks_append = self.tokens.append
toknum, tokval = token
if toknum in (NAME, NUMBER):
tokval += ' '
if toknum in (NEWLINE, NL):
startline = True
for tok in iterable:
toknum, tokval = tok[:2]
if toknum in (NAME, NUMBER):
tokval += ' '
if toknum == INDENT:
indents.append(tokval)
continue
elif toknum == DEDENT:
indents.pop()
continue
elif toknum in (NEWLINE, NL):
startline = True
elif startline and indents:
toks_append(indents[-1])
startline = False
toks_append(tokval)
cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')
blank_re = re.compile(r'^[ \t\f]*(?:[#\r\n]|$)')
def _get_normal_name(orig_enc):
"""Imitates get_normal_name in tokenizer.c."""
# Only care about the first 12 characters.
enc = orig_enc[:12].lower().replace("_", "-")
if enc == "utf-8" or enc.startswith("utf-8-"):
return "utf-8"
if enc in ("latin-1", "iso-8859-1", "iso-latin-1") or \
enc.startswith(("latin-1-", "iso-8859-1-", "iso-latin-1-")):
return "iso-8859-1"
return orig_enc
def detect_encoding(readline):
"""
The detect_encoding() function is used to detect the encoding that should
be used to decode a Python source file. It requires one argument, readline,
in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (left as bytes) it has read
in.
It detects the encoding from the presence of a UTF-8 BOM or an encoding
cookie as specified in PEP 263. If both a BOM and a cookie are present
but disagree, a SyntaxError is raised. If the encoding cookie is an
invalid charset, a SyntaxError is also raised. Note that if a UTF-8 BOM
is found, 'utf-8-sig' is returned.
If no encoding is specified, then the default of 'utf-8' will be returned.
"""
bom_found = False
encoding = None
default = 'utf-8'
def read_or_stop():
try:
return readline()
except StopIteration:
return bytes()
def find_cookie(line):
try:
line_string = line.decode('ascii')
except UnicodeDecodeError:
return None
match = cookie_re.match(line_string)
if not match:
return None
encoding = _get_normal_name(match.group(1))
try:
codec = lookup(encoding)
except LookupError:
# This behaviour mimics the Python interpreter
raise SyntaxError("unknown encoding: " + encoding)
if bom_found:
if codec.name != 'utf-8':
# This behaviour mimics the Python interpreter
raise SyntaxError('encoding problem: utf-8')
encoding += '-sig'
return encoding
first = read_or_stop()
if first.startswith(BOM_UTF8):
bom_found = True
first = first[3:]
default = 'utf-8-sig'
if not first:
return default, []
encoding = find_cookie(first)
if encoding:
return encoding, [first]
if not blank_re.match(first):
return default, [first]
second = read_or_stop()
if not second:
return default, [first]
encoding = find_cookie(second)
if encoding:
return encoding, [first, second]
return default, [first, second]
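# Example (sketch): detecting a PEP 263 coding cookie from a byte stream:
#
#   from StringIO import StringIO
#   enc, lines = detect_encoding(
#       StringIO("# -*- coding: latin-1 -*-\n").readline)
#   print enc    # -> 'iso-8859-1' (the normalized name for latin-1)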
def untokenize(iterable):
"""Transform tokens back into Python source code.
Each element returned by the iterable must be a token sequence
with at least two elements, a token number and token value. If
only two tokens are passed, the resulting output is poor.
Round-trip invariant for full input:
Untokenized source will match input source exactly
Round-trip invariant for limited input:
# Output text will tokenize back to the input
t1 = [tok[:2] for tok in generate_tokens(f.readline)]
newcode = untokenize(t1)
readline = iter(newcode.splitlines(1)).next
t2 = [tok[:2] for tok in generate_tokens(readline)]
assert t1 == t2
"""
ut = Untokenizer()
return ut.untokenize(iterable)
def generate_tokens(readline):
"""
The generate_tokens() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
readline() method of built-in file objects. Each call to the function
should return one line of input as a string. Alternately, readline
can be a callable function terminating with StopIteration:
readline = open(myfile).next # Example of alternate readline
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed is the
logical line; continuation lines are included.
"""
lnum = parenlev = continued = 0
namechars, numchars = string.ascii_letters + '_', '0123456789'
contstr, needcont = '', 0
contline = None
indents = [0]
while 1: # loop over lines in stream
try:
line = readline()
except StopIteration:
line = ''
lnum = lnum + 1
pos, max = 0, len(line)
if contstr: # continued string
if not line:
raise TokenError("EOF in multi-line string", strstart)
endmatch = endprog.match(line)
if endmatch:
pos = end = endmatch.end(0)
yield (STRING, contstr + line[:end],
strstart, (lnum, end), contline + line)
contstr, needcont = '', 0
contline = None
elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
yield (ERRORTOKEN, contstr + line,
strstart, (lnum, len(line)), contline)
contstr = ''
contline = None
continue
else:
contstr = contstr + line
contline = contline + line
continue
elif parenlev == 0 and not continued: # new statement
if not line: break
column = 0
while pos < max: # measure leading whitespace
if line[pos] == ' ': column = column + 1
elif line[pos] == '\t': column = (column//tabsize + 1)*tabsize
elif line[pos] == '\f': column = 0
else: break
pos = pos + 1
if pos == max: break
if line[pos] in '#\r\n': # skip comments or blank lines
if line[pos] == '#':
comment_token = line[pos:].rstrip('\r\n')
nl_pos = pos + len(comment_token)
yield (COMMENT, comment_token,
(lnum, pos), (lnum, pos + len(comment_token)), line)
yield (NL, line[nl_pos:],
(lnum, nl_pos), (lnum, len(line)), line)
else:
yield ((NL, COMMENT)[line[pos] == '#'], line[pos:],
(lnum, pos), (lnum, len(line)), line)
continue
if column > indents[-1]: # count indents or dedents
indents.append(column)
yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
while column < indents[-1]:
if column not in indents:
raise IndentationError(
"unindent does not match any outer indentation level",
("<tokenize>", lnum, pos, line))
indents = indents[:-1]
yield (DEDENT, '', (lnum, pos), (lnum, pos), line)
else: # continued statement
if not line:
raise TokenError("EOF in multi-line statement", (lnum, 0))
continued = 0
while pos < max:
pseudomatch = pseudoprog.match(line, pos)
if pseudomatch: # scan for tokens
start, end = pseudomatch.span(1)
spos, epos, pos = (lnum, start), (lnum, end), end
token, initial = line[start:end], line[start]
if initial in numchars or \
(initial == '.' and token != '.'): # ordinary number
yield (NUMBER, token, spos, epos, line)
elif initial in '\r\n':
newline = NEWLINE
if parenlev > 0:
newline = NL
yield (newline, token, spos, epos, line)
elif initial == '#':
assert not token.endswith("\n")
yield (COMMENT, token, spos, epos, line)
elif token in triple_quoted:
endprog = endprogs[token]
endmatch = endprog.match(line, pos)
if endmatch: # all on one line
pos = endmatch.end(0)
token = line[start:pos]
yield (STRING, token, spos, (lnum, pos), line)
else:
strstart = (lnum, start) # multiple lines
contstr = line[start:]
contline = line
break
elif initial in single_quoted or \
token[:2] in single_quoted or \
token[:3] in single_quoted:
if token[-1] == '\n': # continued string
strstart = (lnum, start)
endprog = (endprogs[initial] or endprogs[token[1]] or
endprogs[token[2]])
contstr, needcont = line[start:], 1
contline = line
break
else: # ordinary string
yield (STRING, token, spos, epos, line)
elif initial in namechars: # ordinary name
yield (NAME, token, spos, epos, line)
elif initial == '\\': # continued stmt
# This yield is new; needed for better idempotency:
yield (NL, token, spos, (lnum, pos), line)
continued = 1
else:
if initial in '([{': parenlev = parenlev + 1
elif initial in ')]}': parenlev = parenlev - 1
yield (OP, token, spos, epos, line)
else:
yield (ERRORTOKEN, line[pos],
(lnum, pos), (lnum, pos+1), line)
pos = pos + 1
for indent in indents[1:]: # pop remaining indent levels
yield (DEDENT, '', (lnum, 0), (lnum, 0), '')
yield (ENDMARKER, '', (lnum, 0), (lnum, 0), '')
if __name__ == '__main__': # testing
import sys
if len(sys.argv) > 1: tokenize(open(sys.argv[1]).readline)
else: tokenize(sys.stdin.readline)
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""The pgen2 package."""
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Export the Python grammar and symbols."""
# Python imports
import os
# Local imports
from .pgen2 import token
from .pgen2 import driver
from . import pytree
# The grammar file
_GRAMMAR_FILE = os.path.join(os.path.dirname(__file__), "Grammar.txt")
_PATTERN_GRAMMAR_FILE = os.path.join(os.path.dirname(__file__),
"PatternGrammar.txt")
class Symbols(object):
def __init__(self, grammar):
"""Initializer.
Creates an attribute for each grammar symbol (nonterminal),
whose value is the symbol's type (an int >= 256).
"""
for name, symbol in grammar.symbol2number.iteritems():
setattr(self, name, symbol)
python_grammar = driver.load_packaged_grammar("lib2to3", _GRAMMAR_FILE)
python_symbols = Symbols(python_grammar)
python_grammar_no_print_statement = python_grammar.copy()
del python_grammar_no_print_statement.keywords["print"]
pattern_grammar = driver.load_packaged_grammar("lib2to3", _PATTERN_GRAMMAR_FILE)
pattern_symbols = Symbols(pattern_grammar)
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""
Python parse tree definitions.
This is a very concrete parse tree; we need to keep every token and
even the comments and whitespace between tokens.
There's also a pattern matching implementation here.
"""
__author__ = "Guido van Rossum <[email protected]>"
import sys
import warnings
from StringIO import StringIO
HUGE = 0x7FFFFFFF # maximum repeat count, default max
_type_reprs = {}
def type_repr(type_num):
global _type_reprs
if not _type_reprs:
from .pygram import python_symbols
# printing tokens is possible but not as useful
# from .pgen2 import token // token.__dict__.items():
for name, val in python_symbols.__dict__.items():
if type(val) == int: _type_reprs[val] = name
return _type_reprs.setdefault(type_num, type_num)
class Base(object):
"""
Abstract base class for Node and Leaf.
This provides some default functionality and boilerplate using the
template pattern.
A node may be a subnode of at most one parent.
"""
# Default values for instance variables
type = None # int: token number (< 256) or symbol number (>= 256)
parent = None # Parent node pointer, or None
children = () # Tuple of subnodes
was_changed = False
was_checked = False
def __new__(cls, *args, **kwds):
"""Constructor that prevents Base from being instantiated."""
assert cls is not Base, "Cannot instantiate Base"
return object.__new__(cls)
def __eq__(self, other):
"""
Compare two nodes for equality.
This calls the method _eq().
"""
if self.__class__ is not other.__class__:
return NotImplemented
return self._eq(other)
__hash__ = None # For Py3 compatibility.
def __ne__(self, other):
"""
Compare two nodes for inequality.
This calls the method _eq().
"""
if self.__class__ is not other.__class__:
return NotImplemented
return not self._eq(other)
def _eq(self, other):
"""
Compare two nodes for equality.
This is called by __eq__ and __ne__. It is only called if the two nodes
have the same type. This must be implemented by the concrete subclass.
Nodes should be considered equal if they have the same structure,
ignoring the prefix string and other context information.
"""
raise NotImplementedError
def clone(self):
"""
Return a cloned (deep) copy of self.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def post_order(self):
"""
Return a post-order iterator for the tree.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def pre_order(self):
"""
Return a pre-order iterator for the tree.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def set_prefix(self, prefix):
"""
Set the prefix for the node (see Leaf class).
DEPRECATED; use the prefix property directly.
"""
warnings.warn("set_prefix() is deprecated; use the prefix property",
DeprecationWarning, stacklevel=2)
self.prefix = prefix
def get_prefix(self):
"""
Return the prefix for the node (see Leaf class).
DEPRECATED; use the prefix property directly.
"""
warnings.warn("get_prefix() is deprecated; use the prefix property",
DeprecationWarning, stacklevel=2)
return self.prefix
def replace(self, new):
"""Replace this node with a new one in the parent."""
assert self.parent is not None, str(self)
assert new is not None
if not isinstance(new, list):
new = [new]
l_children = []
found = False
for ch in self.parent.children:
if ch is self:
assert not found, (self.parent.children, self, new)
if new is not None:
l_children.extend(new)
found = True
else:
l_children.append(ch)
assert found, (self.children, self, new)
self.parent.changed()
self.parent.children = l_children
for x in new:
x.parent = self.parent
self.parent = None
def get_lineno(self):
"""Return the line number which generated the invocant node."""
node = self
while not isinstance(node, Leaf):
if not node.children:
return
node = node.children[0]
return node.lineno
def changed(self):
if self.parent:
self.parent.changed()
self.was_changed = True
def remove(self):
"""
Remove the node from the tree. Returns the position of the node in its
parent's children before it was removed.
"""
if self.parent:
for i, node in enumerate(self.parent.children):
if node is self:
self.parent.changed()
del self.parent.children[i]
self.parent = None
return i
@property
def next_sibling(self):
"""
The node immediately following the invocant in their parent's children
list. If the invocant does not have a next sibling, it is None.
"""
if self.parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
if child is self:
try:
return self.parent.children[i+1]
except IndexError:
return None
@property
def prev_sibling(self):
"""
The node immediately preceding the invocant in their parent's children
list. If the invocant does not have a previous sibling, it is None.
"""
if self.parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
if child is self:
if i == 0:
return None
return self.parent.children[i-1]
def leaves(self):
for child in self.children:
for x in child.leaves():
yield x
def depth(self):
if self.parent is None:
return 0
return 1 + self.parent.depth()
def get_suffix(self):
"""
Return the string immediately following the invocant node. This is
effectively equivalent to node.next_sibling.prefix.
"""
next_sib = self.next_sibling
if next_sib is None:
return u""
return next_sib.prefix
if sys.version_info < (3, 0):
def __str__(self):
return unicode(self).encode("ascii")
class Node(Base):
"""Concrete implementation for interior nodes."""
def __init__(self, type, children,
context=None,
prefix=None,
fixers_applied=None):
"""
Initializer.
Takes a type constant (a symbol number >= 256), a sequence of
child nodes, and an optional context keyword argument.
As a side effect, the parent pointers of the children are updated.
"""
assert type >= 256, type
self.type = type
self.children = list(children)
for ch in self.children:
assert ch.parent is None, repr(ch)
ch.parent = self
if prefix is not None:
self.prefix = prefix
if fixers_applied:
self.fixers_applied = fixers_applied[:]
else:
self.fixers_applied = None
def __repr__(self):
"""Return a canonical string representation."""
return "%s(%s, %r)" % (self.__class__.__name__,
type_repr(self.type),
self.children)
def __unicode__(self):
"""
Return a pretty string representation.
This reproduces the input source exactly.
"""
return u"".join(map(unicode, self.children))
if sys.version_info > (3, 0):
__str__ = __unicode__
def _eq(self, other):
"""Compare two nodes for equality."""
return (self.type, self.children) == (other.type, other.children)
def clone(self):
"""Return a cloned (deep) copy of self."""
return Node(self.type, [ch.clone() for ch in self.children],
fixers_applied=self.fixers_applied)
def post_order(self):
"""Return a post-order iterator for the tree."""
for child in self.children:
for node in child.post_order():
yield node
yield self
def pre_order(self):
"""Return a pre-order iterator for the tree."""
yield self
for child in self.children:
for node in child.pre_order():
yield node
def _prefix_getter(self):
"""
The whitespace and comments preceding this node in the input.
"""
if not self.children:
return ""
return self.children[0].prefix
def _prefix_setter(self, prefix):
if self.children:
self.children[0].prefix = prefix
prefix = property(_prefix_getter, _prefix_setter)
def set_child(self, i, child):
"""
Equivalent to 'node.children[i] = child'. This method also sets the
child's parent attribute appropriately.
"""
child.parent = self
self.children[i].parent = None
self.children[i] = child
self.changed()
def insert_child(self, i, child):
"""
Equivalent to 'node.children.insert(i, child)'. This method also sets
the child's parent attribute appropriately.
"""
child.parent = self
self.children.insert(i, child)
self.changed()
def append_child(self, child):
"""
Equivalent to 'node.children.append(child)'. This method also sets the
child's parent attribute appropriately.
"""
child.parent = self
self.children.append(child)
self.changed()
class Leaf(Base):
"""Concrete implementation for leaf nodes."""
# Default values for instance variables
_prefix = "" # Whitespace and comments preceding this token in the input
lineno = 0 # Line where this token starts in the input
column = 0 # Column where this token starts in the input
def __init__(self, type, value,
context=None,
prefix=None,
fixers_applied=[]):
"""
Initializer.
Takes a type constant (a token number < 256), a string value, and an
optional context keyword argument.
"""
assert 0 <= type < 256, type
if context is not None:
self._prefix, (self.lineno, self.column) = context
self.type = type
self.value = value
if prefix is not None:
self._prefix = prefix
self.fixers_applied = fixers_applied[:]
def __repr__(self):
"""Return a canonical string representation."""
return "%s(%r, %r)" % (self.__class__.__name__,
self.type,
self.value)
def __unicode__(self):
"""
Return a pretty string representation.
This reproduces the input source exactly.
"""
return self.prefix + unicode(self.value)
if sys.version_info > (3, 0):
__str__ = __unicode__
def _eq(self, other):
"""Compare two nodes for equality."""
return (self.type, self.value) == (other.type, other.value)
def clone(self):
"""Return a cloned (deep) copy of self."""
return Leaf(self.type, self.value,
(self.prefix, (self.lineno, self.column)),
fixers_applied=self.fixers_applied)
def leaves(self):
yield self
def post_order(self):
"""Return a post-order iterator for the tree."""
yield self
def pre_order(self):
"""Return a pre-order iterator for the tree."""
yield self
def _prefix_getter(self):
"""
The whitespace and comments preceding this token in the input.
"""
return self._prefix
def _prefix_setter(self, prefix):
self.changed()
self._prefix = prefix
prefix = property(_prefix_getter, _prefix_setter)
def convert(gr, raw_node):
"""
Convert raw node information to a Node or Leaf instance.
This is passed to the parser driver which calls it whenever a reduction of a
grammar rule produces a new complete node, so that the tree is built
strictly bottom-up.
"""
type, value, context, children = raw_node
if children or type in gr.number2symbol:
# If there's exactly one child, return that child instead of
# creating a new node.
if len(children) == 1:
return children[0]
return Node(type, children, context=context)
else:
return Leaf(type, value, context=context)
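# Example (sketch): building a tiny tree by hand; str()/unicode() reproduce
# the input exactly, prefixes included (256 stands in for a symbol number):
#
#   from lib2to3.pgen2 import token
#   n = Node(256, [Leaf(token.NAME, u"x"),
#                  Leaf(token.OP, u"=", prefix=u" "),
#                  Leaf(token.NUMBER, u"1", prefix=u" ")])
#   print unicode(n)    # -> x = 1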
class BasePattern(object):
"""
A pattern is a tree matching pattern.
It looks for a specific node type (token or symbol), and
optionally for a specific content.
This is an abstract base class. There are three concrete
subclasses:
- LeafPattern matches a single leaf node;
- NodePattern matches a single node (usually non-leaf);
- WildcardPattern matches a sequence of nodes of variable length.
"""
# Defaults for instance variables
type = None # Node type (token if < 256, symbol if >= 256)
content = None # Optional content matching pattern
name = None # Optional name used to store match in results dict
def __new__(cls, *args, **kwds):
"""Constructor that prevents BasePattern from being instantiated."""
assert cls is not BasePattern, "Cannot instantiate BasePattern"
return object.__new__(cls)
def __repr__(self):
args = [type_repr(self.type), self.content, self.name]
while args and args[-1] is None:
del args[-1]
return "%s(%s)" % (self.__class__.__name__, ", ".join(map(repr, args)))
def optimize(self):
"""
A subclass can define this as a hook for optimizations.
Returns either self or another node with the same effect.
"""
return self
def match(self, node, results=None):
"""
Does this pattern exactly match a node?
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
Default implementation for non-wildcard patterns.
"""
if self.type is not None and node.type != self.type:
return False
if self.content is not None:
r = None
if results is not None:
r = {}
if not self._submatch(node, r):
return False
if r:
results.update(r)
if results is not None and self.name:
results[self.name] = node
return True
def match_seq(self, nodes, results=None):
"""
Does this pattern exactly match a sequence of nodes?
Default implementation for non-wildcard patterns.
"""
if len(nodes) != 1:
return False
return self.match(nodes[0], results)
def generate_matches(self, nodes):
"""
Generator yielding all matches for this pattern.
Default implementation for non-wildcard patterns.
"""
r = {}
if nodes and self.match(nodes[0], r):
yield 1, r
class LeafPattern(BasePattern):
def __init__(self, type=None, content=None, name=None):
"""
Initializer. Takes optional type, content, and name.
The type, if given, must be a token type (< 256). If not given,
this matches any *leaf* node; the content may still be required.
The content, if given, must be a string.
If a name is given, the matching node is stored in the results
dict under that key.
"""
if type is not None:
assert 0 <= type < 256, type
if content is not None:
assert isinstance(content, basestring), repr(content)
self.type = type
self.content = content
self.name = name
def match(self, node, results=None):
"""Override match() to insist on a leaf node."""
if not isinstance(node, Leaf):
return False
return BasePattern.match(self, node, results)
def _submatch(self, node, results=None):
"""
Match the pattern's content to the node's children.
This assumes the node type matches and self.content is not None.
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
When returning False, the results dict may still be updated.
"""
return self.content == node.value
class NodePattern(BasePattern):
wildcards = False
def __init__(self, type=None, content=None, name=None):
"""
Initializer. Takes optional type, content, and name.
The type, if given, must be a symbol type (>= 256). If the
type is None this matches *any* single node (leaf or not),
except if content is not None, in which case it only matches
non-leaf nodes that also match the content pattern.
The content, if not None, must be a sequence of Patterns that
must match the node's children exactly. If the content is
given, the type must not be None.
If a name is given, the matching node is stored in the results
dict under that key.
"""
if type is not None:
assert type >= 256, type
if content is not None:
assert not isinstance(content, basestring), repr(content)
content = list(content)
for i, item in enumerate(content):
assert isinstance(item, BasePattern), (i, item)
if isinstance(item, WildcardPattern):
self.wildcards = True
self.type = type
self.content = content
self.name = name
def _submatch(self, node, results=None):
"""
Match the pattern's content to the node's children.
This assumes the node type matches and self.content is not None.
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
When returning False, the results dict may still be updated.
"""
if self.wildcards:
for c, r in generate_matches(self.content, node.children):
if c == len(node.children):
if results is not None:
results.update(r)
return True
return False
if len(self.content) != len(node.children):
return False
for subpattern, child in zip(self.content, node.children):
if not subpattern.match(child, results):
return False
return True
class WildcardPattern(BasePattern):
"""
A wildcard pattern can match zero or more nodes.
This has all the flexibility needed to implement patterns like:
.* .+ .? .{m,n}
(a b c | d e | f)
(...)* (...)+ (...)? (...){m,n}
except it always uses non-greedy matching.
"""
def __init__(self, content=None, min=0, max=HUGE, name=None):
"""
Initializer.
Args:
content: optional sequence of subsequences of patterns;
if absent, matches one node;
if present, each subsequence is an alternative [*]
min: optional minimum number of times to match, default 0
max: optional maximum number of times to match, default HUGE
name: optional name assigned to this match
[*] Thus, if content is [[a, b, c], [d, e], [f, g, h]] this is
equivalent to (a b c | d e | f g h); if content is None,
this is equivalent to '.' in regular expression terms.
The min and max parameters work as follows:
min=0, max=maxint: .*
min=1, max=maxint: .+
min=0, max=1: .?
min=1, max=1: .
If content is not None, replace the dot with the parenthesized
list of alternatives, e.g. (a b c | d e | f g h)*
"""
assert 0 <= min <= max <= HUGE, (min, max)
if content is not None:
content = tuple(map(tuple, content)) # Protect against alterations
# Check sanity of alternatives
assert len(content), repr(content) # Can't have zero alternatives
for alt in content:
assert len(alt), repr(alt) # Can't have empty alternatives
self.content = content
self.min = min
self.max = max
self.name = name
def optimize(self):
"""Optimize certain stacked wildcard patterns."""
subpattern = None
if (self.content is not None and
len(self.content) == 1 and len(self.content[0]) == 1):
subpattern = self.content[0][0]
if self.min == 1 and self.max == 1:
if self.content is None:
return NodePattern(name=self.name)
if subpattern is not None and self.name == subpattern.name:
return subpattern.optimize()
if (self.min <= 1 and isinstance(subpattern, WildcardPattern) and
subpattern.min <= 1 and self.name == subpattern.name):
return WildcardPattern(subpattern.content,
self.min*subpattern.min,
self.max*subpattern.max,
subpattern.name)
return self
def match(self, node, results=None):
"""Does this pattern exactly match a node?"""
return self.match_seq([node], results)
def match_seq(self, nodes, results=None):
"""Does this pattern exactly match a sequence of nodes?"""
for c, r in self.generate_matches(nodes):
if c == len(nodes):
if results is not None:
results.update(r)
if self.name:
results[self.name] = list(nodes)
return True
return False
def generate_matches(self, nodes):
"""
Generator yielding matches for a sequence of nodes.
Args:
nodes: sequence of nodes
Yields:
(count, results) tuples where:
count: the match comprises nodes[:count];
results: dict containing named submatches.
"""
if self.content is None:
# Shortcut for special case (see __init__.__doc__)
for count in xrange(self.min, 1 + min(len(nodes), self.max)):
r = {}
if self.name:
r[self.name] = nodes[:count]
yield count, r
elif self.name == "bare_name":
yield self._bare_name_matches(nodes)
else:
# The reason for this is that hitting the recursion limit usually
# results in some ugly messages about how RuntimeErrors are being
# ignored. We don't do this on non-CPython implementations because
# they don't have this problem.
if hasattr(sys, "getrefcount"):
save_stderr = sys.stderr
sys.stderr = StringIO()
try:
for count, r in self._recursive_matches(nodes, 0):
if self.name:
r[self.name] = nodes[:count]
yield count, r
except RuntimeError:
# We fall back to the iterative pattern matching scheme if the recursive
# scheme hits the recursion limit.
for count, r in self._iterative_matches(nodes):
if self.name:
r[self.name] = nodes[:count]
yield count, r
finally:
if hasattr(sys, "getrefcount"):
sys.stderr = save_stderr
def _iterative_matches(self, nodes):
"""Helper to iteratively yield the matches."""
nodelen = len(nodes)
if 0 >= self.min:
yield 0, {}
results = []
# generate matches that use just one alt from self.content
for alt in self.content:
for c, r in generate_matches(alt, nodes):
yield c, r
results.append((c, r))
# for each match, iterate down the nodes
while results:
new_results = []
for c0, r0 in results:
# stop if the entire set of nodes has been matched
if c0 < nodelen and c0 <= self.max:
for alt in self.content:
for c1, r1 in generate_matches(alt, nodes[c0:]):
if c1 > 0:
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
new_results.append((c0 + c1, r))
results = new_results
def _bare_name_matches(self, nodes):
"""Special optimized matcher for bare_name."""
count = 0
r = {}
done = False
max = len(nodes)
while not done and count < max:
done = True
for leaf in self.content:
if leaf[0].match(nodes[count], r):
count += 1
done = False
break
r[self.name] = nodes[:count]
return count, r
def _recursive_matches(self, nodes, count):
"""Helper to recursively yield the matches."""
assert self.content is not None
if count >= self.min:
yield 0, {}
if count < self.max:
for alt in self.content:
for c0, r0 in generate_matches(alt, nodes):
for c1, r1 in self._recursive_matches(nodes[c0:], count+1):
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
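# Example: with content=None the min/max pairs behave like the regex
# quantifiers listed in the docstring above:
#
#   anything = WildcardPattern(min=0, max=HUGE, name="tail")   # like .*
#   r = {}
#   print anything.match_seq([], r), r    # -> True {'tail': []}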
class NegatedPattern(BasePattern):
def __init__(self, content=None):
"""
Initializer.
The argument is either a pattern or None. If it is None, this
only matches an empty sequence (effectively '$' in regex
lingo). If it is not None, this matches whenever the argument
pattern doesn't have any matches.
"""
if content is not None:
assert isinstance(content, BasePattern), repr(content)
self.content = content
def match(self, node):
# We never match a node in its entirety
return False
def match_seq(self, nodes):
# We only match an empty sequence of nodes in its entirety
return len(nodes) == 0
def generate_matches(self, nodes):
if self.content is None:
# Return a match if there is an empty sequence
if len(nodes) == 0:
yield 0, {}
else:
# Return a match if the argument pattern has no matches
for c, r in self.content.generate_matches(nodes):
return
yield 0, {}
def generate_matches(patterns, nodes):
"""
Generator yielding matches for a sequence of patterns and nodes.
Args:
patterns: a sequence of patterns
nodes: a sequence of nodes
Yields:
(count, results) tuples where:
count: the entire sequence of patterns matches nodes[:count];
results: dict containing named submatches.
"""
if not patterns:
yield 0, {}
else:
p, rest = patterns[0], patterns[1:]
for c0, r0 in p.generate_matches(nodes):
if not rest:
yield c0, r0
else:
for c1, r1 in generate_matches(rest, nodes[c0:]):
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
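# Example (sketch): matching "a NAME followed by anything" against a list
# of leaves:
#
#   from lib2to3.pgen2 import token
#   nodes = [Leaf(token.NAME, u"x"), Leaf(token.OP, u"="),
#            Leaf(token.NUMBER, u"1")]
#   pat = [LeafPattern(token.NAME, name="head"), WildcardPattern(name="rest")]
#   for count, r in generate_matches(pat, nodes):
#       print count, r["head"], [str(n) for n in r["rest"]]
#   # yields counts 1, 2 and 3 as the wildcard absorbs 0, 1 or 2 nodes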
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Refactoring framework.
Used as a main program, this can refactor any number of files and/or
recursively descend down directories. Imported as a module, this
provides infrastructure to write your own refactoring tool.
"""
from __future__ import with_statement
__author__ = "Guido van Rossum <[email protected]>"
# Python imports
import os
import pkgutil
import sys
import logging
import operator
import collections
import StringIO
from itertools import chain
# Local imports
from .pgen2 import driver, tokenize, token
from .fixer_util import find_root
from . import pytree, pygram
from . import btm_utils as bu
from . import btm_matcher as bm
def get_all_fix_names(fixer_pkg, remove_prefix=True):
"""Return a sorted list of all available fix names in the given package."""
pkg = __import__(fixer_pkg, [], [], ["*"])
fix_names = []
for finder, name, ispkg in pkgutil.iter_modules(pkg.__path__):
if name.startswith("fix_"):
if remove_prefix:
name = name[4:]
fix_names.append(name)
return fix_names
class _EveryNode(Exception):
pass
def _get_head_types(pat):
""" Accepts a pytree Pattern Node and returns a set
of the pattern types which will match first. """
if isinstance(pat, (pytree.NodePattern, pytree.LeafPattern)):
# NodePatterns must either have no type and no content
# or a type and content -- so they don't recurse any further;
# always return the pattern's own type.
if pat.type is None:
raise _EveryNode
return set([pat.type])
if isinstance(pat, pytree.NegatedPattern):
if pat.content:
return _get_head_types(pat.content)
raise _EveryNode # Negated Patterns don't have a type
if isinstance(pat, pytree.WildcardPattern):
# Recurse on each node in content
r = set()
for p in pat.content:
for x in p:
r.update(_get_head_types(x))
return r
raise Exception("Oh no! I don't understand pattern %s" %(pat))
def _get_headnode_dict(fixer_list):
""" Accepts a list of fixers and returns a dictionary
of head node type --> fixer list. """
head_nodes = collections.defaultdict(list)
every = []
for fixer in fixer_list:
if fixer.pattern:
try:
heads = _get_head_types(fixer.pattern)
except _EveryNode:
every.append(fixer)
else:
for node_type in heads:
head_nodes[node_type].append(fixer)
else:
if fixer._accept_type is not None:
head_nodes[fixer._accept_type].append(fixer)
else:
every.append(fixer)
for node_type in chain(pygram.python_grammar.symbol2number.itervalues(),
pygram.python_grammar.tokens):
head_nodes[node_type].extend(every)
return dict(head_nodes)
def get_fixers_from_package(pkg_name):
"""
Return the fully qualified names for fixers in the package pkg_name.
"""
return [pkg_name + "." + fix_name
for fix_name in get_all_fix_names(pkg_name, False)]
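# Example (sketch): listing the fixers shipped with lib2to3 itself:
#
#   names = get_fixers_from_package("lib2to3.fixes")
#   print names[0]    # e.g. 'lib2to3.fixes.fix_apply'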
def _identity(obj):
return obj
if sys.version_info < (3, 0):
import codecs
_open_with_encoding = codecs.open
# codecs.open doesn't translate newlines sadly.
def _from_system_newlines(input):
return input.replace(u"\r\n", u"\n")
def _to_system_newlines(input):
if os.linesep != "\n":
return input.replace(u"\n", os.linesep)
else:
return input
else:
_open_with_encoding = open
_from_system_newlines = _identity
_to_system_newlines = _identity
def _detect_future_features(source):
have_docstring = False
gen = tokenize.generate_tokens(StringIO.StringIO(source).readline)
def advance():
tok = gen.next()
return tok[0], tok[1]
ignore = frozenset((token.NEWLINE, tokenize.NL, token.COMMENT))
features = set()
try:
while True:
tp, value = advance()
if tp in ignore:
continue
elif tp == token.STRING:
if have_docstring:
break
have_docstring = True
elif tp == token.NAME and value == u"from":
tp, value = advance()
if tp != token.NAME or value != u"__future__":
break
tp, value = advance()
if tp != token.NAME or value != u"import":
break
tp, value = advance()
if tp == token.OP and value == u"(":
tp, value = advance()
while tp == token.NAME:
features.add(value)
tp, value = advance()
if tp != token.OP or value != u",":
break
tp, value = advance()
else:
break
except StopIteration:
pass
return frozenset(features)
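# Example (sketch): the scanner above only recognizes __future__ imports
# that appear before any real code:
#
#   feats = _detect_future_features(u"from __future__ import print_function\n")
#   print sorted(feats)    # -> [u'print_function']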
class FixerError(Exception):
"""A fixer could not be loaded."""
class RefactoringTool(object):
_default_options = {"print_function" : False,
"write_unchanged_files" : False}
CLASS_PREFIX = "Fix" # The prefix for fixer classes
FILE_PREFIX = "fix_" # The prefix for modules with a fixer within
def __init__(self, fixer_names, options=None, explicit=None):
"""Initializer.
Args:
fixer_names: a list of fixers to import
options: a dict with configuration.
explicit: a list of fixers to run even if they are explicit.
"""
self.fixers = fixer_names
self.explicit = explicit or []
self.options = self._default_options.copy()
if options is not None:
self.options.update(options)
if self.options["print_function"]:
self.grammar = pygram.python_grammar_no_print_statement
else:
self.grammar = pygram.python_grammar
# When this is True, the refactor*() methods will call write_file() for
# files processed even if they were not changed during refactoring,
# provided the refactor method's write parameter was True.
self.write_unchanged_files = self.options.get("write_unchanged_files")
self.errors = []
self.logger = logging.getLogger("RefactoringTool")
self.fixer_log = []
self.wrote = False
self.driver = driver.Driver(self.grammar,
convert=pytree.convert,
logger=self.logger)
self.pre_order, self.post_order = self.get_fixers()
self.files = [] # List of files that were or should be modified
self.BM = bm.BottomMatcher()
self.bmi_pre_order = [] # Bottom Matcher incompatible fixers
self.bmi_post_order = []
for fixer in chain(self.post_order, self.pre_order):
if fixer.BM_compatible:
self.BM.add_fixer(fixer)
# remove fixers that will be handled by the bottom-up
# matcher
elif fixer in self.pre_order:
self.bmi_pre_order.append(fixer)
elif fixer in self.post_order:
self.bmi_post_order.append(fixer)
self.bmi_pre_order_heads = _get_headnode_dict(self.bmi_pre_order)
self.bmi_post_order_heads = _get_headnode_dict(self.bmi_post_order)
def get_fixers(self):
"""Inspects the options to load the requested patterns and handlers.
Returns:
(pre_order, post_order), where pre_order is the list of fixers that
want a pre-order AST traversal, and post_order is the list that want
post-order traversal.
"""
pre_order_fixers = []
post_order_fixers = []
for fix_mod_path in self.fixers:
mod = __import__(fix_mod_path, {}, {}, ["*"])
fix_name = fix_mod_path.rsplit(".", 1)[-1]
if fix_name.startswith(self.FILE_PREFIX):
fix_name = fix_name[len(self.FILE_PREFIX):]
parts = fix_name.split("_")
class_name = self.CLASS_PREFIX + "".join([p.title() for p in parts])
try:
fix_class = getattr(mod, class_name)
except AttributeError:
raise FixerError("Can't find %s.%s" % (fix_name, class_name))
fixer = fix_class(self.options, self.fixer_log)
if fixer.explicit and self.explicit is not True and \
fix_mod_path not in self.explicit:
self.log_message("Skipping optional fixer: %s", fix_name)
continue
self.log_debug("Adding transformation: %s", fix_name)
if fixer.order == "pre":
pre_order_fixers.append(fixer)
elif fixer.order == "post":
post_order_fixers.append(fixer)
else:
raise FixerError("Illegal fixer order: %r" % fixer.order)
key_func = operator.attrgetter("run_order")
pre_order_fixers.sort(key=key_func)
post_order_fixers.sort(key=key_func)
return (pre_order_fixers, post_order_fixers)
def log_error(self, msg, *args, **kwds):
"""Called when an error occurs."""
raise
def log_message(self, msg, *args):
"""Hook to log a message."""
if args:
msg = msg % args
self.logger.info(msg)
def log_debug(self, msg, *args):
if args:
msg = msg % args
self.logger.debug(msg)
def print_output(self, old_text, new_text, filename, equal):
"""Called with the old version, new version, and filename of a
refactored file."""
pass
def refactor(self, items, write=False, doctests_only=False):
"""Refactor a list of files and directories."""
for dir_or_file in items:
if os.path.isdir(dir_or_file):
self.refactor_dir(dir_or_file, write, doctests_only)
else:
self.refactor_file(dir_or_file, write, doctests_only)
def refactor_dir(self, dir_name, write=False, doctests_only=False):
"""Descends down a directory and refactor every Python file found.
Python files are assumed to have a .py extension.
Files and subdirectories starting with '.' are skipped.
"""
py_ext = os.extsep + "py"
for dirpath, dirnames, filenames in os.walk(dir_name):
self.log_debug("Descending into %s", dirpath)
dirnames.sort()
filenames.sort()
for name in filenames:
if (not name.startswith(".") and
os.path.splitext(name)[1] == py_ext):
fullname = os.path.join(dirpath, name)
self.refactor_file(fullname, write, doctests_only)
# Modify dirnames in-place to remove subdirs with leading dots
dirnames[:] = [dn for dn in dirnames if not dn.startswith(".")]
def _read_python_source(self, filename):
"""
Do our best to decode a Python source file correctly.
"""
try:
f = open(filename, "rb")
except IOError as err:
self.log_error("Can't open %s: %s", filename, err)
return None, None
try:
encoding = tokenize.detect_encoding(f.readline)[0]
finally:
f.close()
with _open_with_encoding(filename, "r", encoding=encoding) as f:
return _from_system_newlines(f.read()), encoding
def refactor_file(self, filename, write=False, doctests_only=False):
"""Refactors a file."""
input, encoding = self._read_python_source(filename)
if input is None:
# Reading the file failed.
return
input += u"\n" # Silence certain parse errors
if doctests_only:
self.log_debug("Refactoring doctests in %s", filename)
output = self.refactor_docstring(input, filename)
if self.write_unchanged_files or output != input:
self.processed_file(output, filename, input, write, encoding)
else:
self.log_debug("No doctest changes in %s", filename)
else:
tree = self.refactor_string(input, filename)
if self.write_unchanged_files or (tree and tree.was_changed):
# The [:-1] is to take off the \n we added earlier
self.processed_file(unicode(tree)[:-1], filename,
write=write, encoding=encoding)
else:
self.log_debug("No changes in %s", filename)
def refactor_string(self, data, name):
"""Refactor a given input string.
Args:
data: a string holding the code to be refactored.
name: a human-readable name for use in error/log messages.
Returns:
An AST corresponding to the refactored input stream; None if
there were errors during the parse.
"""
features = _detect_future_features(data)
if "print_function" in features:
self.driver.grammar = pygram.python_grammar_no_print_statement
try:
tree = self.driver.parse_string(data)
except Exception as err:
self.log_error("Can't parse %s: %s: %s",
name, err.__class__.__name__, err)
return
finally:
self.driver.grammar = self.grammar
tree.future_features = features
self.log_debug("Refactoring %s", name)
self.refactor_tree(tree, name)
return tree
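# Example (sketch): refactoring a string in memory with the stock fixers:
#
#   rt = RefactoringTool(get_fixers_from_package("lib2to3.fixes"))
#   tree = rt.refactor_string(u"print 'hi'\n", "<example>")
#   print unicode(tree)    # -> print('hi')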
def refactor_stdin(self, doctests_only=False):
input = sys.stdin.read()
if doctests_only:
self.log_debug("Refactoring doctests in stdin")
output = self.refactor_docstring(input, "<stdin>")
if self.write_unchanged_files or output != input:
self.processed_file(output, "<stdin>", input)
else:
self.log_debug("No doctest changes in stdin")
else:
tree = self.refactor_string(input, "<stdin>")
if self.write_unchanged_files or (tree and tree.was_changed):
self.processed_file(unicode(tree), "<stdin>", input)
else:
self.log_debug("No changes in stdin")
def refactor_tree(self, tree, name):
"""Refactors a parse tree (modifying the tree in place).
For compatible patterns the bottom matcher module is
used. Otherwise the tree is traversed node-to-node for
matches.
Args:
tree: a pytree.Node instance representing the root of the tree
to be refactored.
name: a human-readable name for this tree.
Returns:
True if the tree was modified, False otherwise.
"""
for fixer in chain(self.pre_order, self.post_order):
fixer.start_tree(tree, name)
# use traditional matching for the incompatible fixers
self.traverse_by(self.bmi_pre_order_heads, tree.pre_order())
self.traverse_by(self.bmi_post_order_heads, tree.post_order())
# obtain a set of candidate nodes
match_set = self.BM.run(tree.leaves())
while any(match_set.values()):
for fixer in self.BM.fixers:
if fixer in match_set and match_set[fixer]:
# sort by depth; apply fixers from the bottom of the AST to the top
match_set[fixer].sort(key=pytree.Base.depth, reverse=True)
if fixer.keep_line_order:
# some fixers (e.g. fix_imports) must be applied
# with the original file's line order
match_set[fixer].sort(key=pytree.Base.get_lineno)
for node in list(match_set[fixer]):
if node in match_set[fixer]:
match_set[fixer].remove(node)
try:
find_root(node)
except ValueError:
# this node has been cut off by a
# previous transformation; skip it
continue
if node.fixers_applied and fixer in node.fixers_applied:
# do not apply the same fixer again
continue
results = fixer.match(node)
if results:
new = fixer.transform(node, results)
if new is not None:
node.replace(new)
#new.fixers_applied.append(fixer)
for node in new.post_order():
# do not apply the fixer again to
# this or any subnode
if not node.fixers_applied:
node.fixers_applied = []
node.fixers_applied.append(fixer)
# update the original match set for
# the added code
new_matches = self.BM.run(new.leaves())
for fxr in new_matches:
if fxr not in match_set:
match_set[fxr] = []
match_set[fxr].extend(new_matches[fxr])
for fixer in chain(self.pre_order, self.post_order):
fixer.finish_tree(tree, name)
return tree.was_changed
def traverse_by(self, fixers, traversal):
"""Traverse an AST, applying a set of fixers to each node.
This is a helper method for refactor_tree().
Args:
fixers: a list of fixer instances.
traversal: a generator that yields AST nodes.
Returns:
None
"""
if not fixers:
return
for node in traversal:
for fixer in fixers[node.type]:
results = fixer.match(node)
if results:
new = fixer.transform(node, results)
if new is not None:
node.replace(new)
node = new
def processed_file(self, new_text, filename, old_text=None, write=False,
encoding=None):
"""
Called when a file has been refactored and there may be changes.
"""
self.files.append(filename)
if old_text is None:
old_text = self._read_python_source(filename)[0]
if old_text is None:
return
equal = old_text == new_text
self.print_output(old_text, new_text, filename, equal)
if equal:
self.log_debug("No changes to %s", filename)
if not self.write_unchanged_files:
return
if write:
self.write_file(new_text, filename, old_text, encoding)
else:
self.log_debug("Not writing changes to %s", filename)
def write_file(self, new_text, filename, old_text, encoding=None):
"""Writes a string to a file.
It first shows a unified diff between the old text and the new text, and
then rewrites the file; the latter is only done if the write option is
set.
"""
try:
f = _open_with_encoding(filename, "w", encoding=encoding)
except os.error as err:
self.log_error("Can't create %s: %s", filename, err)
return
try:
f.write(_to_system_newlines(new_text))
except os.error as err:
self.log_error("Can't write %s: %s", filename, err)
finally:
f.close()
self.log_debug("Wrote changes to %s", filename)
self.wrote = True
PS1 = ">>> "
PS2 = "... "
def refactor_docstring(self, input, filename):
"""Refactors a docstring, looking for doctests.
This returns a modified version of the input string. It looks
for doctests, which start with a ">>>" prompt, and may be
continued with "..." prompts, as long as the "..." is indented
the same as the ">>>".
(Unfortunately we can't use the doctest module's parser,
since, like most parsers, it is not geared towards preserving
the original source.)
"""
result = []
block = None
block_lineno = None
indent = None
lineno = 0
for line in input.splitlines(True):
lineno += 1
if line.lstrip().startswith(self.PS1):
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
block_lineno = lineno
block = [line]
i = line.find(self.PS1)
indent = line[:i]
elif (indent is not None and
(line.startswith(indent + self.PS2) or
line == indent + self.PS2.rstrip() + u"\n")):
block.append(line)
else:
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
block = None
indent = None
result.append(line)
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
return u"".join(result)
def refactor_doctest(self, block, lineno, indent, filename):
"""Refactors one doctest.
A doctest is given as a block of lines, the first of which starts
with ">>>" (possibly indented), while the remaining lines start
with "..." (identically indented).
"""
try:
tree = self.parse_block(block, lineno, indent)
except Exception as err:
if self.logger.isEnabledFor(logging.DEBUG):
for line in block:
self.log_debug("Source: %s", line.rstrip(u"\n"))
self.log_error("Can't parse docstring in %s line %s: %s: %s",
filename, lineno, err.__class__.__name__, err)
return block
if self.refactor_tree(tree, filename):
new = unicode(tree).splitlines(True)
# Undo the adjustment of the line numbers in wrap_toks() below.
clipped, new = new[:lineno-1], new[lineno-1:]
assert clipped == [u"\n"] * (lineno-1), clipped
if not new[-1].endswith(u"\n"):
new[-1] += u"\n"
block = [indent + self.PS1 + new.pop(0)]
if new:
block += [indent + self.PS2 + line for line in new]
return block
def summarize(self):
if self.wrote:
were = "were"
else:
were = "need to be"
if not self.files:
self.log_message("No files %s modified.", were)
else:
self.log_message("Files that %s modified:", were)
for file in self.files:
self.log_message(file)
if self.fixer_log:
self.log_message("Warnings/messages while refactoring:")
for message in self.fixer_log:
self.log_message(message)
if self.errors:
if len(self.errors) == 1:
self.log_message("There was 1 error:")
else:
self.log_message("There were %d errors:", len(self.errors))
for msg, args, kwds in self.errors:
self.log_message(msg, *args, **kwds)
def parse_block(self, block, lineno, indent):
"""Parses a block into a tree.
This is necessary to get correct line number / offset information
in the parser diagnostics and embedded in the parse tree.
"""
tree = self.driver.parse_tokens(self.wrap_toks(block, lineno, indent))
tree.future_features = frozenset()
return tree
def wrap_toks(self, block, lineno, indent):
"""Wraps a tokenize stream to systematically modify start/end."""
tokens = tokenize.generate_tokens(self.gen_lines(block, indent).next)
for type, value, (line0, col0), (line1, col1), line_text in tokens:
line0 += lineno - 1
line1 += lineno - 1
# Don't bother updating the columns; this is too complicated
# since line_text would also have to be updated and it would
# still break for tokens spanning lines. Let the user guess
# that the column numbers for doctests are relative to the
# end of the prompt string (PS1 or PS2).
yield type, value, (line0, col0), (line1, col1), line_text
def gen_lines(self, block, indent):
"""Generates lines as expected by tokenize from a list of lines.
This strips the first len(indent + self.PS1) characters off each line.
"""
prefix1 = indent + self.PS1
prefix2 = indent + self.PS2
prefix = prefix1
for line in block:
if line.startswith(prefix):
yield line[len(prefix):]
elif line == prefix.rstrip() + u"\n":
yield u"\n"
else:
raise AssertionError("line=%r, prefix=%r" % (line, prefix))
prefix = prefix2
while True:
yield ""
class MultiprocessingUnsupported(Exception):
pass
class MultiprocessRefactoringTool(RefactoringTool):
def __init__(self, *args, **kwargs):
super(MultiprocessRefactoringTool, self).__init__(*args, **kwargs)
self.queue = None
self.output_lock = None
def refactor(self, items, write=False, doctests_only=False,
num_processes=1):
if num_processes == 1:
return super(MultiprocessRefactoringTool, self).refactor(
items, write, doctests_only)
try:
import multiprocessing
except ImportError:
raise MultiprocessingUnsupported
if self.queue is not None:
raise RuntimeError("already doing multiple processes")
self.queue = multiprocessing.JoinableQueue()
self.output_lock = multiprocessing.Lock()
processes = [multiprocessing.Process(target=self._child)
for i in xrange(num_processes)]
try:
for p in processes:
p.start()
super(MultiprocessRefactoringTool, self).refactor(items, write,
doctests_only)
finally:
self.queue.join()
for i in xrange(num_processes):
self.queue.put(None)
for p in processes:
if p.is_alive():
p.join()
self.queue = None
def _child(self):
task = self.queue.get()
while task is not None:
args, kwargs = task
try:
super(MultiprocessRefactoringTool, self).refactor_file(
*args, **kwargs)
finally:
self.queue.task_done()
task = self.queue.get()
def refactor_file(self, *args, **kwargs):
if self.queue is not None:
self.queue.put((args, kwargs))
else:
return super(MultiprocessRefactoringTool, self).refactor_file(
*args, **kwargs)
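# Editor's sketch (again hypothetical, not part of the sources): the
# multiprocessing variant keeps the same interface and adds a num_processes
# argument to refactor(); the file names below are placeholders.
def _example_refactor_files():
    from lib2to3.refactor import (MultiprocessRefactoringTool,
                                  get_fixers_from_package)
    tool = MultiprocessRefactoringTool(get_fixers_from_package("lib2to3.fixes"))
    # write=False is a dry run; num_processes > 1 fans files out to worker
    # processes through the JoinableQueue set up above.
    tool.refactor(["pkg/module_a.py", "pkg/module_b.py"],
                  write=False, num_processes=2)
    tool.summarize()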
#empty
import sys
from .main import main
sys.exit(main("lib2to3.fixes"))
"""Cache lines from files.
This is intended to read lines from modules imported -- hence if a filename
is not found, it will look down the module search path for a file by
that name.
"""
import sys
import os
__all__ = ["getline", "clearcache", "checkcache"]
def getline(filename, lineno, module_globals=None):
lines = getlines(filename, module_globals)
if 1 <= lineno <= len(lines):
return lines[lineno-1]
else:
return ''
# The cache: maps a filename to the tuple (size, mtime, lines, fullname)
cache = {}
def clearcache():
"""Clear the cache entirely."""
global cache
cache = {}
def getlines(filename, module_globals=None):
"""Get the lines for a file from the cache.
Update the cache if it doesn't contain an entry for this file already."""
if filename in cache:
return cache[filename][2]
try:
return updatecache(filename, module_globals)
except MemoryError:
clearcache()
return []
def checkcache(filename=None):
"""Discard cache entries that are out of date.
(This is not checked upon each call!)"""
if filename is None:
filenames = cache.keys()
else:
if filename in cache:
filenames = [filename]
else:
return
for filename in filenames:
size, mtime, lines, fullname = cache[filename]
if mtime is None:
continue # no-op for files loaded via a __loader__
try:
stat = os.stat(fullname)
except os.error:
del cache[filename]
continue
if size != stat.st_size or mtime != stat.st_mtime:
del cache[filename]
def updatecache(filename, module_globals=None):
"""Update a cache entry and return its list of lines.
If something's wrong, print a message, discard the cache entry,
and return an empty list."""
if filename in cache:
del cache[filename]
if not filename or (filename.startswith('<') and filename.endswith('>')):
return []
fullname = filename
try:
stat = os.stat(fullname)
except OSError:
basename = filename
# Try for a __loader__, if available
if module_globals and '__loader__' in module_globals:
name = module_globals.get('__name__')
loader = module_globals['__loader__']
get_source = getattr(loader, 'get_source', None)
if name and get_source:
try:
data = get_source(name)
except (ImportError, IOError):
pass
else:
if data is None:
# No luck, the PEP302 loader cannot find the source
# for this module.
return []
cache[filename] = (
len(data), None,
[line+'\n' for line in data.splitlines()], fullname
)
return cache[filename][2]
# Try looking through the module search path, which is only useful
# when handling a relative filename.
if os.path.isabs(filename):
return []
for dirname in sys.path:
# When using imputil, sys.path may contain things other than
# strings; ignore them if that happens.
try:
fullname = os.path.join(dirname, basename)
except (TypeError, AttributeError):
# Not sufficiently string-like to do anything useful with.
continue
try:
stat = os.stat(fullname)
break
except os.error:
pass
else:
return []
try:
with open(fullname, 'rU') as fp:
lines = fp.readlines()
except IOError:
return []
if lines and not lines[-1].endswith('\n'):
lines[-1] += '\n'
size, mtime = stat.st_size, stat.st_mtime
cache[filename] = size, mtime, lines, fullname
return lines
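# Editor's sketch (not part of linecache itself): typical use of the module.
# The path is a placeholder; any readable source file works.
def _example_linecache():
    import linecache
    first = linecache.getline("/tmp/example.py", 1)  # '' if unreadable
    linecache.checkcache()   # drop cache entries whose files changed on disk
    linecache.clearcache()   # or discard the whole cache
    return first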
"""Locale support module.
The module provides low-level access to the C lib's locale APIs and adds high
level number formatting APIs as well as a locale aliasing engine to complement
these.
The aliasing engine includes support for many commonly used locale names and
maps them to values suitable for passing to the C lib's setlocale() function. It
also includes default encodings for all supported locale names.
"""
import sys
import encodings
import encodings.aliases
import re
import operator
import functools
# keep a copy of the builtin str type, because the 'str' name is
# overridden in globals by a function below
_str = str
try:
_unicode = unicode
except NameError:
# If Python is built without Unicode support, the unicode type
# will not exist. Fake one.
class _unicode(object):
pass
# Try importing the _locale module.
#
# If this fails, fall back on a basic 'C' locale emulation.
# Yuck: LC_MESSAGES is non-standard: can't tell whether it exists before
# trying the import. So __all__ is also fiddled at the end of the file.
__all__ = ["getlocale", "getdefaultlocale", "getpreferredencoding", "Error",
"setlocale", "resetlocale", "localeconv", "strcoll", "strxfrm",
"str", "atof", "atoi", "format", "format_string", "currency",
"normalize", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY",
"LC_NUMERIC", "LC_ALL", "CHAR_MAX"]
try:
from _locale import *
except ImportError:
# Locale emulation
CHAR_MAX = 127
LC_ALL = 6
LC_COLLATE = 3
LC_CTYPE = 0
LC_MESSAGES = 5
LC_MONETARY = 4
LC_NUMERIC = 1
LC_TIME = 2
Error = ValueError
def localeconv():
""" localeconv() -> dict.
Returns numeric and monetary locale-specific parameters.
"""
# 'C' locale default values
return {'grouping': [127],
'currency_symbol': '',
'n_sign_posn': 127,
'p_cs_precedes': 127,
'n_cs_precedes': 127,
'mon_grouping': [],
'n_sep_by_space': 127,
'decimal_point': '.',
'negative_sign': '',
'positive_sign': '',
'p_sep_by_space': 127,
'int_curr_symbol': '',
'p_sign_posn': 127,
'thousands_sep': '',
'mon_thousands_sep': '',
'frac_digits': 127,
'mon_decimal_point': '',
'int_frac_digits': 127}
def setlocale(category, value=None):
""" setlocale(integer,string=None) -> string.
Activates/queries locale processing.
"""
if value not in (None, '', 'C'):
raise Error, '_locale emulation only supports "C" locale'
return 'C'
def strcoll(a,b):
""" strcoll(string,string) -> int.
Compares two strings according to the locale.
"""
return cmp(a,b)
def strxfrm(s):
""" strxfrm(string) -> string.
Returns a string that can be compared with cmp() in a locale-aware way.
"""
return s
_localeconv = localeconv
# With this dict, you can override some items of localeconv's return value.
# This is useful for testing purposes.
_override_localeconv = {}
@functools.wraps(_localeconv)
def localeconv():
d = _localeconv()
if _override_localeconv:
d.update(_override_localeconv)
return d
### Number formatting APIs
# Author: Martin von Loewis
# improved by Georg Brandl
# Iterate over grouping intervals
def _grouping_intervals(grouping):
last_interval = None
for interval in grouping:
# an interval of CHAR_MAX means: no further grouping is performed
if interval == CHAR_MAX:
return
# 0: re-use last group ad infinitum
if interval == 0:
if last_interval is None:
raise ValueError("invalid grouping")
while True:
yield last_interval
yield interval
last_interval = interval
# perform the grouping from right to left
def _group(s, monetary=False):
conv = localeconv()
thousands_sep = conv[monetary and 'mon_thousands_sep' or 'thousands_sep']
grouping = conv[monetary and 'mon_grouping' or 'grouping']
if not grouping:
return (s, 0)
if s[-1] == ' ':
stripped = s.rstrip()
right_spaces = s[len(stripped):]
s = stripped
else:
right_spaces = ''
left_spaces = ''
groups = []
for interval in _grouping_intervals(grouping):
if not s or s[-1] not in "0123456789":
# only non-digit characters remain (sign, spaces)
left_spaces = s
s = ''
break
groups.append(s[-interval:])
s = s[:-interval]
if s:
groups.append(s)
groups.reverse()
return (
left_spaces + thousands_sep.join(groups) + right_spaces,
len(thousands_sep) * (len(groups) - 1)
)
# Strip a given amount of excess padding from the given string
def _strip_padding(s, amount):
lpos = 0
while amount and s[lpos] == ' ':
lpos += 1
amount -= 1
rpos = len(s) - 1
while amount and s[rpos] == ' ':
rpos -= 1
amount -= 1
return s[lpos:rpos+1]
_percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
r'(?P<modifiers>[-#0-9 +*.hlL]*?)[eEfFgGdiouxXcrs%]')
def format(percent, value, grouping=False, monetary=False, *additional):
"""Returns the locale-aware substitution of a %? specifier
(percent).
additional is for format strings which contain one or more
'*' modifiers."""
# format() only handles strings containing exactly one %-specifier;
# verify that here
match = _percent_re.match(percent)
if not match or len(match.group()) != len(percent):
raise ValueError(("format() must be given exactly one %%char "
"format specifier, %s not valid") % repr(percent))
return _format(percent, value, grouping, monetary, *additional)
def _format(percent, value, grouping=False, monetary=False, *additional):
if additional:
formatted = percent % ((value,) + additional)
else:
formatted = percent % value
# floats and decimal ints need special action!
if percent[-1] in 'eEfFgG':
seps = 0
parts = formatted.split('.')
if grouping:
parts[0], seps = _group(parts[0], monetary=monetary)
decimal_point = localeconv()[monetary and 'mon_decimal_point'
or 'decimal_point']
formatted = decimal_point.join(parts)
if seps:
formatted = _strip_padding(formatted, seps)
elif percent[-1] in 'diu':
seps = 0
if grouping:
formatted, seps = _group(formatted, monetary=monetary)
if seps:
formatted = _strip_padding(formatted, seps)
return formatted
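# Editor's sketch (not part of the module): locale-aware integer formatting
# with grouping. The "en_US.UTF-8" locale is an assumption -- under its
# conventions digits are grouped in threes with ',' as the separator.
def _example_format():
    import locale
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")  # assumed to be installed
    return locale.format("%d", 123456789, grouping=True)  # '123,456,789'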
def format_string(f, val, grouping=False):
"""Formats a string in the same way that the % formatting would use,
but takes the current locale into account.
Grouping is applied if the third parameter is true."""
percents = list(_percent_re.finditer(f))
new_f = _percent_re.sub('%s', f)
if operator.isMappingType(val):
new_val = []
for perc in percents:
if perc.group()[-1] == '%':
new_val.append('%')
else:
new_val.append(format(perc.group(), val, grouping))
else:
if not isinstance(val, tuple):
val = (val,)
new_val = []
i = 0
for perc in percents:
if perc.group()[-1] == '%':
new_val.append('%')
else:
starcount = perc.group('modifiers').count('*')
new_val.append(_format(perc.group(),
val[i],
grouping,
False,
*val[i+1:i+1+starcount]))
i += (1 + starcount)
val = tuple(new_val)
return new_f % val
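# Editor's sketch: format_string() gives several specifiers in one format
# string the same locale-aware treatment; it continues the en_US assumption
# from the sketch above.
def _example_format_string():
    import locale
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
    # each %d is substituted with grouping applied: '1,000 of 1,000,000'
    return locale.format_string("%d of %d", (1000, 1000000), grouping=True)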
def currency(val, symbol=True, grouping=False, international=False):
"""Formats val according to the currency settings
in the current locale."""
conv = localeconv()
# check for illegal values
digits = conv[international and 'int_frac_digits' or 'frac_digits']
if digits == 127:
raise ValueError("Currency formatting is not possible using "
"the 'C' locale.")
s = format('%%.%if' % digits, abs(val), grouping, monetary=True)
# '<' and '>' are markers if the sign must be inserted between symbol and value
s = '<' + s + '>'
if symbol:
smb = conv[international and 'int_curr_symbol' or 'currency_symbol']
precedes = conv[val<0 and 'n_cs_precedes' or 'p_cs_precedes']
separated = conv[val<0 and 'n_sep_by_space' or 'p_sep_by_space']
if precedes:
s = smb + (separated and ' ' or '') + s
else:
s = s + (separated and ' ' or '') + smb
sign_pos = conv[val<0 and 'n_sign_posn' or 'p_sign_posn']
sign = conv[val<0 and 'negative_sign' or 'positive_sign']
if sign_pos == 0:
s = '(' + s + ')'
elif sign_pos == 1:
s = sign + s
elif sign_pos == 2:
s = s + sign
elif sign_pos == 3:
s = s.replace('<', sign)
elif sign_pos == 4:
s = s.replace('>', sign)
else:
# the default if nothing specified;
# this should be the most fitting sign position
s = sign + s
return s.replace('<', '').replace('>', '')
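# Editor's sketch: currency() reads the symbol, sign position and digit
# count from localeconv(); under the 'C' locale it raises ValueError
# (frac_digits is 127 there), so a real locale must be active first.
def _example_currency():
    import locale
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")  # assumed to be installed
    return locale.currency(1234.5, grouping=True)   # '$1,234.50'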
def str(val):
"""Convert float to string, taking the locale into account."""
return format("%.12g", val)
def atof(string, func=float):
"Parses a string as a float according to the locale settings."
# First, get rid of the grouping
ts = localeconv()['thousands_sep']
if ts:
string = string.replace(ts, '')
# Next, replace the decimal point with a dot
dd = localeconv()['decimal_point']
if dd:
string = string.replace(dd, '.')
# Finally, parse the string
return func(string)
def atoi(str):
"Converts a string to an integer according to the locale settings."
return atof(str, int)
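# Editor's sketch: atof()/atoi() invert the formatting above by stripping
# the locale's thousands separator and normalizing its decimal point.
def _example_atof():
    import locale
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")  # assumed to be installed
    return locale.atof("1,234.5"), locale.atoi("1,234")  # (1234.5, 1234)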
def _test():
setlocale(LC_ALL, "")
# do grouping
s1 = format("%d", 123456789,1)
print s1, "is", atoi(s1)
# standard formatting
s1 = str(3.14)
print s1, "is", atof(s1)
### Locale name aliasing engine
# Author: Marc-Andre Lemburg, [email protected]
# Various tweaks by Fredrik Lundh <[email protected]>
# store away the low-level version of setlocale (it's
# overridden below)
_setlocale = setlocale
# Avoid relying on the locale-dependent .lower() method
# (see issue #1813).
_ascii_lower_map = ''.join(
chr(x + 32 if x >= ord('A') and x <= ord('Z') else x)
for x in range(256)
)
def _replace_encoding(code, encoding):
if '.' in code:
langname = code[:code.index('.')]
else:
langname = code
# Convert the encoding to a C lib compatible encoding string
norm_encoding = encodings.normalize_encoding(encoding)
#print('norm encoding: %r' % norm_encoding)
norm_encoding = encodings.aliases.aliases.get(norm_encoding,
norm_encoding)
#print('aliased encoding: %r' % norm_encoding)
encoding = locale_encoding_alias.get(norm_encoding,
norm_encoding)
#print('found encoding %r' % encoding)
return langname + '.' + encoding
def normalize(localename):
""" Returns a normalized locale code for the given locale
name.
The returned locale code is formatted for use with
setlocale().
If normalization fails, the original name is returned
unchanged.
If the given encoding is not known, the function defaults to
the default encoding for the locale code just like setlocale()
does.
"""
# Normalize the locale name and extract the encoding and modifier
if isinstance(localename, _unicode):
localename = localename.encode('ascii')
code = localename.translate(_ascii_lower_map)
if ':' in code:
# ':' is sometimes used as encoding delimiter.
code = code.replace(':', '.')
if '@' in code:
code, modifier = code.split('@', 1)
else:
modifier = ''
if '.' in code:
langname, encoding = code.split('.')[:2]
else:
langname = code
encoding = ''
# First lookup: fullname (possibly with encoding and modifier)
lang_enc = langname
if encoding:
norm_encoding = encoding.replace('-', '')
norm_encoding = norm_encoding.replace('_', '')
lang_enc += '.' + norm_encoding
lookup_name = lang_enc
if modifier:
lookup_name += '@' + modifier
code = locale_alias.get(lookup_name, None)
if code is not None:
return code
#print('first lookup failed')
if modifier:
# Second try: fullname without modifier (possibly with encoding)
code = locale_alias.get(lang_enc, None)
if code is not None:
#print('lookup without modifier succeeded')
if '@' not in code:
return code + '@' + modifier
if code.split('@', 1)[1].translate(_ascii_lower_map) == modifier:
return code
#print('second lookup failed')
if encoding:
# Third try: langname (without encoding, possibly with modifier)
lookup_name = langname
if modifier:
lookup_name += '@' + modifier
code = locale_alias.get(lookup_name, None)
if code is not None:
#print('lookup without encoding succeeded')
if '@' not in code:
return _replace_encoding(code, encoding)
code, modifier = code.split('@', 1)
return _replace_encoding(code, encoding) + '@' + modifier
if modifier:
# Fourth try: langname (without encoding and modifier)
code = locale_alias.get(langname, None)
if code is not None:
#print('lookup without modifier and encoding succeeded')
if '@' not in code:
return _replace_encoding(code, encoding) + '@' + modifier
code, defmod = code.split('@', 1)
if defmod.translate(_ascii_lower_map) == modifier:
return _replace_encoding(code, encoding) + '@' + defmod
return localename
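# Editor's sketch: normalize() resolves aliases and default encodings purely
# from the tables defined below, so no locale needs to be installed; the
# expected results are the table entries themselves.
def _example_normalize():
    import locale
    return (locale.normalize("en_US"),       # 'en_US.ISO8859-1'
            locale.normalize("de_de@euro"))  # 'de_DE.ISO8859-15'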
def _parse_localename(localename):
""" Parses the locale code for localename and returns the
result as tuple (language code, encoding).
The localename is normalized and passed through the locale
alias engine. A ValueError is raised in case the locale name
cannot be parsed.
The language code corresponds to RFC 1766. code and encoding
can be None in case the values cannot be determined or are
unknown to this implementation.
"""
code = normalize(localename)
if '@' in code:
# Deal with locale modifiers
code, modifier = code.split('@', 1)
if modifier == 'euro' and '.' not in code:
# Assume Latin-9 for @euro locales. This is bogus,
# since some systems may use other encodings for these
# locales. Also, we ignore other modifiers.
return code, 'iso-8859-15'
if '.' in code:
return tuple(code.split('.')[:2])
elif code == 'C':
return None, None
raise ValueError, 'unknown locale: %s' % localename
def _build_localename(localetuple):
""" Builds a locale code from the given tuple (language code,
encoding).
No aliasing or normalizing takes place.
"""
language, encoding = localetuple
if language is None:
language = 'C'
if encoding is None:
return language
else:
return language + '.' + encoding
def getdefaultlocale(envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')):
""" Tries to determine the default locale settings and returns
them as tuple (language code, encoding).
According to POSIX, a program which has not called
setlocale(LC_ALL, "") runs using the portable 'C' locale.
Calling setlocale(LC_ALL, "") lets it use the default locale as
defined by the LANG variable. Since we don't want to interfere
with the current locale setting we thus emulate the behavior
in the way described above.
To maintain compatibility with other platforms, not only the
LANG variable is tested, but a list of variables given as
envvars parameter. The first found to be defined will be
used. envvars defaults to the search path used in GNU gettext;
it must always contain the variable name 'LANG'.
Except for the code 'C', the language code corresponds to RFC
1766. code and encoding can be None in case the values cannot
be determined.
"""
try:
# check if it's supported by the _locale module
import _locale
code, encoding = _locale._getdefaultlocale()
except (ImportError, AttributeError):
pass
else:
# make sure the code/encoding values are valid
if sys.platform == "win32" and code and code[:2] == "0x":
# map windows language identifier to language name
code = windows_locale.get(int(code, 0))
# ...add other platform-specific processing here, if
# necessary...
return code, encoding
# fall back on POSIX behaviour
import os
lookup = os.environ.get
for variable in envvars:
localename = lookup(variable, None)
if localename:
if variable == 'LANGUAGE':
localename = localename.split(':')[0]
break
else:
localename = 'C'
return _parse_localename(localename)
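# Editor's sketch: getdefaultlocale() inspects the environment variables
# above without touching the active locale; the result shown is only an
# example, e.g. for LANG=en_US.UTF-8.
def _example_getdefaultlocale():
    import locale
    return locale.getdefaultlocale()  # e.g. ('en_US', 'UTF-8')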
def getlocale(category=LC_CTYPE):
""" Returns the current setting for the given locale category as
tuple (language code, encoding).
category may be one of the LC_* values except LC_ALL. It
defaults to LC_CTYPE.
Except for the code 'C', the language code corresponds to RFC
1766. code and encoding can be None in case the values cannot
be determined.
"""
localename = _setlocale(category)
if category == LC_ALL and ';' in localename:
raise TypeError, 'category LC_ALL is not supported'
return _parse_localename(localename)
def setlocale(category, locale=None):
""" Set the locale for the given category. The locale can be
a string, an iterable of two strings (language code and encoding),
or None.
Iterables are converted to strings using the locale aliasing
engine. Locale strings are passed directly to the C lib.
category may be given as one of the LC_* values.
"""
if locale and not isinstance(locale, (_str, _unicode)):
# convert to string
locale = normalize(_build_localename(locale))
return _setlocale(category, locale)
def resetlocale(category=LC_ALL):
""" Sets the locale for category to the default setting.
The default setting is determined by calling
getdefaultlocale(). category defaults to LC_ALL.
"""
_setlocale(category, _build_localename(getdefaultlocale()))
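# Editor's sketch: the usual save/adopt/restore pattern built from the
# wrappers above; nothing here is specific to any one platform.
def _example_setlocale():
    import locale
    old = locale.setlocale(locale.LC_ALL)        # query without changing
    locale.setlocale(locale.LC_ALL, "")          # adopt the user's default
    current = locale.getlocale(locale.LC_CTYPE)  # e.g. ('en_US', 'UTF-8')
    locale.setlocale(locale.LC_ALL, old)         # restore
    return current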
if sys.platform.startswith("win"):
# On Win32, this will return the ANSI code page
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using."""
import _locale
return _locale._getdefaultlocale()[1]
else:
# On Unix, if CODESET is available, use that.
try:
CODESET
except NameError:
# Fall back to parsing environment variables :-(
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using,
by looking at environment variables."""
return getdefaultlocale()[1]
else:
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using,
according to the system configuration."""
if do_setlocale:
oldloc = setlocale(LC_CTYPE)
try:
setlocale(LC_CTYPE, "")
except Error:
pass
result = nl_langinfo(CODESET)
setlocale(LC_CTYPE, oldloc)
else:
result = nl_langinfo(CODESET)
if not result and sys.platform == 'darwin':
# nl_langinfo can return an empty string
# when the setting has an invalid value.
# Default to UTF-8 in that case because
# UTF-8 is the default charset on OSX and
# returning nothing will crash the
# interpreter.
result = 'UTF-8'
return result
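# Editor's sketch: whichever branch above defined it, the call shape is the
# same; the value depends on the platform and environment.
def _example_getpreferredencoding():
    import locale
    return locale.getpreferredencoding()  # e.g. 'UTF-8' or 'cp1252'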
### Database
#
# The following data was extracted from the locale.alias file which
# comes with X11 and then hand edited removing the explicit encoding
# definitions and adding some more aliases. The file is usually
# available as /usr/lib/X11/locale/locale.alias.
#
#
# The locale_encoding_alias table maps lowercase encoding alias names
# to C locale encoding names (case-sensitive). Note that normalize()
# first looks up the encoding in the encodings.aliases dictionary and
# then applies this mapping to find the correct C lib name for the
# encoding.
#
locale_encoding_alias = {
# Mappings for non-standard encoding names used in locale names
'437': 'C',
'c': 'C',
'en': 'ISO8859-1',
'jis': 'JIS7',
'jis7': 'JIS7',
'ajec': 'eucJP',
# Mappings from Python codec names to C lib encoding names
'ascii': 'ISO8859-1',
'latin_1': 'ISO8859-1',
'iso8859_1': 'ISO8859-1',
'iso8859_10': 'ISO8859-10',
'iso8859_11': 'ISO8859-11',
'iso8859_13': 'ISO8859-13',
'iso8859_14': 'ISO8859-14',
'iso8859_15': 'ISO8859-15',
'iso8859_16': 'ISO8859-16',
'iso8859_2': 'ISO8859-2',
'iso8859_3': 'ISO8859-3',
'iso8859_4': 'ISO8859-4',
'iso8859_5': 'ISO8859-5',
'iso8859_6': 'ISO8859-6',
'iso8859_7': 'ISO8859-7',
'iso8859_8': 'ISO8859-8',
'iso8859_9': 'ISO8859-9',
'iso2022_jp': 'JIS7',
'shift_jis': 'SJIS',
'tactis': 'TACTIS',
'euc_jp': 'eucJP',
'euc_kr': 'eucKR',
'utf_8': 'UTF-8',
'koi8_r': 'KOI8-R',
'koi8_u': 'KOI8-U',
# XXX This list is still incomplete. If you know more
# mappings, please file a bug report. Thanks.
}
#
# The locale_alias table maps lowercase alias names to C locale names
# (case-sensitive). Encodings are always separated from the locale
# name using a dot ('.'); they should only be given in case the
# language name is needed to interpret the given encoding alias
# correctly (CJK codes often have this need).
#
# Note that the normalize() function which uses this table
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.
#
# MAL 2004-12-10:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.4
# and older):
#
# updated 'bg' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'bg_bg' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'bulgarian' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'cz' -> 'cz_CZ.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'cz_cz' -> 'cz_CZ.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'czech' -> 'cs_CS.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'dutch' -> 'nl_BE.ISO8859-1' to 'nl_NL.ISO8859-1'
# updated 'et' -> 'et_EE.ISO8859-4' to 'et_EE.ISO8859-15'
# updated 'et_ee' -> 'et_EE.ISO8859-4' to 'et_EE.ISO8859-15'
# updated 'fi' -> 'fi_FI.ISO8859-1' to 'fi_FI.ISO8859-15'
# updated 'fi_fi' -> 'fi_FI.ISO8859-1' to 'fi_FI.ISO8859-15'
# updated 'iw' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'iw_il' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'japanese' -> 'ja_JP.SJIS' to 'ja_JP.eucJP'
# updated 'lt' -> 'lt_LT.ISO8859-4' to 'lt_LT.ISO8859-13'
# updated 'lv' -> 'lv_LV.ISO8859-4' to 'lv_LV.ISO8859-13'
# updated 'sl' -> 'sl_CS.ISO8859-2' to 'sl_SI.ISO8859-2'
# updated 'slovene' -> 'sl_CS.ISO8859-2' to 'sl_SI.ISO8859-2'
# updated 'th_th' -> 'th_TH.TACTIS' to 'th_TH.ISO8859-11'
# updated 'zh_cn' -> 'zh_CN.eucCN' to 'zh_CN.gb2312'
# updated 'zh_cn.big5' -> 'zh_TW.eucTW' to 'zh_TW.big5'
# updated 'zh_tw' -> 'zh_TW.eucTW' to 'zh_TW.big5'
#
# MAL 2008-05-30:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.5
# and older):
#
# updated 'cs_cs.iso88592' -> 'cs_CZ.ISO8859-2' to 'cs_CS.ISO8859-2'
# updated 'serbocroatian' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh_hr.iso88592' -> 'sh_HR.ISO8859-2' to 'hr_HR.ISO8859-2'
# updated 'sh_sp' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh_yu' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sp' -> 'sp_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sp_yu' -> 'sp_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_sp' -> 'sr_SP.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sr_yu' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.cp1251@cyrillic' -> 'sr_YU.CP1251' to 'sr_CS.CP1251'
# updated 'sr_yu.iso88592' -> 'sr_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sr_yu.iso88595' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.iso88595@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.microsoftcp1251@cyrillic' -> 'sr_YU.CP1251' to 'sr_CS.CP1251'
# updated 'sr_yu.utf8@cyrillic' -> 'sr_YU.UTF-8' to 'sr_CS.UTF-8'
# updated 'sr_yu@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
#
# AP 2010-04-12:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.6.5
# and older):
#
# updated 'ru' -> 'ru_RU.ISO8859-5' to 'ru_RU.UTF-8'
# updated 'ru_ru' -> 'ru_RU.ISO8859-5' to 'ru_RU.UTF-8'
# updated 'serbocroatian' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sh' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sh_yu' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
# updated 'sr@cyrillic' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
# updated 'sr@latn' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr_cs.utf8@latn' -> 'sr_CS.UTF-8' to 'sr_RS.UTF-8@latin'
# updated 'sr_cs@latn' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr_yu' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8@latin'
# updated 'sr_yu.utf8@cyrillic' -> 'sr_CS.UTF-8' to 'sr_RS.UTF-8'
# updated 'sr_yu@cyrillic' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
#
# SS 2013-12-20:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.7.6
# and older):
#
# updated 'a3' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'a3_az' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'a3_az.koi8c' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'cs_cs.iso88592' -> 'cs_CS.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'hebrew' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'hebrew.iso88598' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'sd' -> '[email protected]' to 'sd_IN.UTF-8'
# updated 'sr@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
# updated 'sr_cs' -> 'sr_RS.UTF-8' to 'sr_CS.UTF-8'
# updated 'sr_cs.utf8@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
# updated 'sr_cs@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
#
# SS 2014-10-01:
# Updated alias mapping with glibc 2.19 supported locales.
#
# SS 2018-05-05:
# Updated alias mapping with glibc 2.27 supported locales.
#
# These are the differences compared to the old mapping (Python 2.7.15
# and older):
#
# updated 'ca_es@valencia' -> 'ca_ES.ISO8859-15@valencia' to 'ca_ES.UTF-8@valencia'
# updated 'english.iso88591' -> 'en_EN.ISO8859-1' to 'en_US.ISO8859-1'
# updated 'kk_kz' -> 'kk_KZ.RK1048' to 'kk_KZ.ptcp154'
# updated 'russian' -> 'ru_RU.ISO8859-5' to 'ru_RU.KOI8-R'
locale_alias = {
'a3': 'az_AZ.KOI8-C',
'a3_az': 'az_AZ.KOI8-C',
'a3_az.koi8c': 'az_AZ.KOI8-C',
'a3_az.koic': 'az_AZ.KOI8-C',
'aa_dj': 'aa_DJ.ISO8859-1',
'aa_er': 'aa_ER.UTF-8',
'aa_et': 'aa_ET.UTF-8',
'af': 'af_ZA.ISO8859-1',
'af_za': 'af_ZA.ISO8859-1',
'af_za.iso88591': 'af_ZA.ISO8859-1',
'agr_pe': 'agr_PE.UTF-8',
'ak_gh': 'ak_GH.UTF-8',
'am': 'am_ET.UTF-8',
'am_et': 'am_ET.UTF-8',
'american': 'en_US.ISO8859-1',
'american.iso88591': 'en_US.ISO8859-1',
'an_es': 'an_ES.ISO8859-15',
'anp_in': 'anp_IN.UTF-8',
'ar': 'ar_AA.ISO8859-6',
'ar_aa': 'ar_AA.ISO8859-6',
'ar_aa.iso88596': 'ar_AA.ISO8859-6',
'ar_ae': 'ar_AE.ISO8859-6',
'ar_ae.iso88596': 'ar_AE.ISO8859-6',
'ar_bh': 'ar_BH.ISO8859-6',
'ar_bh.iso88596': 'ar_BH.ISO8859-6',
'ar_dz': 'ar_DZ.ISO8859-6',
'ar_dz.iso88596': 'ar_DZ.ISO8859-6',
'ar_eg': 'ar_EG.ISO8859-6',
'ar_eg.iso88596': 'ar_EG.ISO8859-6',
'ar_in': 'ar_IN.UTF-8',
'ar_iq': 'ar_IQ.ISO8859-6',
'ar_iq.iso88596': 'ar_IQ.ISO8859-6',
'ar_jo': 'ar_JO.ISO8859-6',
'ar_jo.iso88596': 'ar_JO.ISO8859-6',
'ar_kw': 'ar_KW.ISO8859-6',
'ar_kw.iso88596': 'ar_KW.ISO8859-6',
'ar_lb': 'ar_LB.ISO8859-6',
'ar_lb.iso88596': 'ar_LB.ISO8859-6',
'ar_ly': 'ar_LY.ISO8859-6',
'ar_ly.iso88596': 'ar_LY.ISO8859-6',
'ar_ma': 'ar_MA.ISO8859-6',
'ar_ma.iso88596': 'ar_MA.ISO8859-6',
'ar_om': 'ar_OM.ISO8859-6',
'ar_om.iso88596': 'ar_OM.ISO8859-6',
'ar_qa': 'ar_QA.ISO8859-6',
'ar_qa.iso88596': 'ar_QA.ISO8859-6',
'ar_sa': 'ar_SA.ISO8859-6',
'ar_sa.iso88596': 'ar_SA.ISO8859-6',
'ar_sd': 'ar_SD.ISO8859-6',
'ar_sd.iso88596': 'ar_SD.ISO8859-6',
'ar_ss': 'ar_SS.UTF-8',
'ar_sy': 'ar_SY.ISO8859-6',
'ar_sy.iso88596': 'ar_SY.ISO8859-6',
'ar_tn': 'ar_TN.ISO8859-6',
'ar_tn.iso88596': 'ar_TN.ISO8859-6',
'ar_ye': 'ar_YE.ISO8859-6',
'ar_ye.iso88596': 'ar_YE.ISO8859-6',
'arabic': 'ar_AA.ISO8859-6',
'arabic.iso88596': 'ar_AA.ISO8859-6',
'as': 'as_IN.UTF-8',
'as_in': 'as_IN.UTF-8',
'ast_es': 'ast_ES.ISO8859-15',
'ayc_pe': 'ayc_PE.UTF-8',
'az': 'az_AZ.ISO8859-9E',
'az_az': 'az_AZ.ISO8859-9E',
'az_az.iso88599e': 'az_AZ.ISO8859-9E',
'az_ir': 'az_IR.UTF-8',
'be': 'be_BY.CP1251',
'be@latin': 'be_BY.UTF-8@latin',
'be_bg.utf8': 'bg_BG.UTF-8',
'be_by': 'be_BY.CP1251',
'be_by.cp1251': 'be_BY.CP1251',
'be_by.microsoftcp1251': 'be_BY.CP1251',
'be_by.utf8@latin': 'be_BY.UTF-8@latin',
'be_by@latin': 'be_BY.UTF-8@latin',
'bem_zm': 'bem_ZM.UTF-8',
'ber_dz': 'ber_DZ.UTF-8',
'ber_ma': 'ber_MA.UTF-8',
'bg': 'bg_BG.CP1251',
'bg_bg': 'bg_BG.CP1251',
'bg_bg.cp1251': 'bg_BG.CP1251',
'bg_bg.iso88595': 'bg_BG.ISO8859-5',
'bg_bg.koi8r': 'bg_BG.KOI8-R',
'bg_bg.microsoftcp1251': 'bg_BG.CP1251',
'bhb_in.utf8': 'bhb_IN.UTF-8',
'bho_in': 'bho_IN.UTF-8',
'bho_np': 'bho_NP.UTF-8',
'bi_vu': 'bi_VU.UTF-8',
'bn_bd': 'bn_BD.UTF-8',
'bn_in': 'bn_IN.UTF-8',
'bo_cn': 'bo_CN.UTF-8',
'bo_in': 'bo_IN.UTF-8',
'bokmal': 'nb_NO.ISO8859-1',
'bokm\xe5l': 'nb_NO.ISO8859-1',
'br': 'br_FR.ISO8859-1',
'br_fr': 'br_FR.ISO8859-1',
'br_fr.iso88591': 'br_FR.ISO8859-1',
'br_fr.iso885914': 'br_FR.ISO8859-14',
'br_fr.iso885915': 'br_FR.ISO8859-15',
'br_fr.iso885915@euro': 'br_FR.ISO8859-15',
'br_fr.utf8@euro': 'br_FR.UTF-8',
'br_fr@euro': 'br_FR.ISO8859-15',
'brx_in': 'brx_IN.UTF-8',
'bs': 'bs_BA.ISO8859-2',
'bs_ba': 'bs_BA.ISO8859-2',
'bs_ba.iso88592': 'bs_BA.ISO8859-2',
'bulgarian': 'bg_BG.CP1251',
'byn_er': 'byn_ER.UTF-8',
'c': 'C',
'c-french': 'fr_CA.ISO8859-1',
'c-french.iso88591': 'fr_CA.ISO8859-1',
'c.ascii': 'C',
'c.en': 'C',
'c.iso88591': 'en_US.ISO8859-1',
'c.utf8': 'en_US.UTF-8',
'c_c': 'C',
'c_c.c': 'C',
'ca': 'ca_ES.ISO8859-1',
'ca_ad': 'ca_AD.ISO8859-1',
'ca_ad.iso88591': 'ca_AD.ISO8859-1',
'ca_ad.iso885915': 'ca_AD.ISO8859-15',
'ca_ad.iso885915@euro': 'ca_AD.ISO8859-15',
'ca_ad.utf8@euro': 'ca_AD.UTF-8',
'ca_ad@euro': 'ca_AD.ISO8859-15',
'ca_es': 'ca_ES.ISO8859-1',
'ca_es.iso88591': 'ca_ES.ISO8859-1',
'ca_es.iso885915': 'ca_ES.ISO8859-15',
'ca_es.iso885915@euro': 'ca_ES.ISO8859-15',
'ca_es.utf8@euro': 'ca_ES.UTF-8',
'ca_es@euro': 'ca_ES.ISO8859-15',
'ca_es@valencia': 'ca_ES.UTF-8@valencia',
'ca_fr': 'ca_FR.ISO8859-1',
'ca_fr.iso88591': 'ca_FR.ISO8859-1',
'ca_fr.iso885915': 'ca_FR.ISO8859-15',
'ca_fr.iso885915@euro': 'ca_FR.ISO8859-15',
'ca_fr.utf8@euro': 'ca_FR.UTF-8',
'ca_fr@euro': 'ca_FR.ISO8859-15',
'ca_it': 'ca_IT.ISO8859-1',
'ca_it.iso88591': 'ca_IT.ISO8859-1',
'ca_it.iso885915': 'ca_IT.ISO8859-15',
'ca_it.iso885915@euro': 'ca_IT.ISO8859-15',
'ca_it.utf8@euro': 'ca_IT.UTF-8',
'ca_it@euro': 'ca_IT.ISO8859-15',
'catalan': 'ca_ES.ISO8859-1',
'ce_ru': 'ce_RU.UTF-8',
'cextend': 'en_US.ISO8859-1',
'cextend.en': 'en_US.ISO8859-1',
'chinese-s': 'zh_CN.eucCN',
'chinese-t': 'zh_TW.eucTW',
'chr_us': 'chr_US.UTF-8',
'ckb_iq': 'ckb_IQ.UTF-8',
'cmn_tw': 'cmn_TW.UTF-8',
'crh_ua': 'crh_UA.UTF-8',
'croatian': 'hr_HR.ISO8859-2',
'cs': 'cs_CZ.ISO8859-2',
'cs_cs': 'cs_CZ.ISO8859-2',
'cs_cs.iso88592': 'cs_CZ.ISO8859-2',
'cs_cz': 'cs_CZ.ISO8859-2',
'cs_cz.iso88592': 'cs_CZ.ISO8859-2',
'csb_pl': 'csb_PL.UTF-8',
'cv_ru': 'cv_RU.UTF-8',
'cy': 'cy_GB.ISO8859-1',
'cy_gb': 'cy_GB.ISO8859-1',
'cy_gb.iso88591': 'cy_GB.ISO8859-1',
'cy_gb.iso885914': 'cy_GB.ISO8859-14',
'cy_gb.iso885915': 'cy_GB.ISO8859-15',
'cy_gb@euro': 'cy_GB.ISO8859-15',
'cz': 'cs_CZ.ISO8859-2',
'cz_cz': 'cs_CZ.ISO8859-2',
'czech': 'cs_CZ.ISO8859-2',
'da': 'da_DK.ISO8859-1',
'da.iso885915': 'da_DK.ISO8859-15',
'da_dk': 'da_DK.ISO8859-1',
'da_dk.88591': 'da_DK.ISO8859-1',
'da_dk.885915': 'da_DK.ISO8859-15',
'da_dk.iso88591': 'da_DK.ISO8859-1',
'da_dk.iso885915': 'da_DK.ISO8859-15',
'da_dk@euro': 'da_DK.ISO8859-15',
'danish': 'da_DK.ISO8859-1',
'danish.iso88591': 'da_DK.ISO8859-1',
'dansk': 'da_DK.ISO8859-1',
'de': 'de_DE.ISO8859-1',
'de.iso885915': 'de_DE.ISO8859-15',
'de_at': 'de_AT.ISO8859-1',
'de_at.iso88591': 'de_AT.ISO8859-1',
'de_at.iso885915': 'de_AT.ISO8859-15',
'de_at.iso885915@euro': 'de_AT.ISO8859-15',
'de_at.utf8@euro': 'de_AT.UTF-8',
'de_at@euro': 'de_AT.ISO8859-15',
'de_be': 'de_BE.ISO8859-1',
'de_be.iso88591': 'de_BE.ISO8859-1',
'de_be.iso885915': 'de_BE.ISO8859-15',
'de_be.iso885915@euro': 'de_BE.ISO8859-15',
'de_be.utf8@euro': 'de_BE.UTF-8',
'de_be@euro': 'de_BE.ISO8859-15',
'de_ch': 'de_CH.ISO8859-1',
'de_ch.iso88591': 'de_CH.ISO8859-1',
'de_ch.iso885915': 'de_CH.ISO8859-15',
'de_ch@euro': 'de_CH.ISO8859-15',
'de_de': 'de_DE.ISO8859-1',
'de_de.88591': 'de_DE.ISO8859-1',
'de_de.885915': 'de_DE.ISO8859-15',
'de_de.885915@euro': 'de_DE.ISO8859-15',
'de_de.iso88591': 'de_DE.ISO8859-1',
'de_de.iso885915': 'de_DE.ISO8859-15',
'de_de.iso885915@euro': 'de_DE.ISO8859-15',
'de_de.utf8@euro': 'de_DE.UTF-8',
'de_de@euro': 'de_DE.ISO8859-15',
'de_it': 'de_IT.ISO8859-1',
'de_li.utf8': 'de_LI.UTF-8',
'de_lu': 'de_LU.ISO8859-1',
'de_lu.iso88591': 'de_LU.ISO8859-1',
'de_lu.iso885915': 'de_LU.ISO8859-15',
'de_lu.iso885915@euro': 'de_LU.ISO8859-15',
'de_lu.utf8@euro': 'de_LU.UTF-8',
'de_lu@euro': 'de_LU.ISO8859-15',
'deutsch': 'de_DE.ISO8859-1',
'doi_in': 'doi_IN.UTF-8',
'dutch': 'nl_NL.ISO8859-1',
'dutch.iso88591': 'nl_BE.ISO8859-1',
'dv_mv': 'dv_MV.UTF-8',
'dz_bt': 'dz_BT.UTF-8',
'ee': 'ee_EE.ISO8859-4',
'ee_ee': 'ee_EE.ISO8859-4',
'ee_ee.iso88594': 'ee_EE.ISO8859-4',
'eesti': 'et_EE.ISO8859-1',
'el': 'el_GR.ISO8859-7',
'el_cy': 'el_CY.ISO8859-7',
'el_gr': 'el_GR.ISO8859-7',
'el_gr.iso88597': 'el_GR.ISO8859-7',
'el_gr@euro': 'el_GR.ISO8859-15',
'en': 'en_US.ISO8859-1',
'en.iso88591': 'en_US.ISO8859-1',
'en_ag': 'en_AG.UTF-8',
'en_au': 'en_AU.ISO8859-1',
'en_au.iso88591': 'en_AU.ISO8859-1',
'en_be': 'en_BE.ISO8859-1',
'en_be@euro': 'en_BE.ISO8859-15',
'en_bw': 'en_BW.ISO8859-1',
'en_bw.iso88591': 'en_BW.ISO8859-1',
'en_ca': 'en_CA.ISO8859-1',
'en_ca.iso88591': 'en_CA.ISO8859-1',
'en_dk': 'en_DK.ISO8859-1',
'en_dk.iso88591': 'en_DK.ISO8859-1',
'en_dk.iso885915': 'en_DK.ISO8859-15',
'en_dl.utf8': 'en_DL.UTF-8',
'en_gb': 'en_GB.ISO8859-1',
'en_gb.88591': 'en_GB.ISO8859-1',
'en_gb.iso88591': 'en_GB.ISO8859-1',
'en_gb.iso885915': 'en_GB.ISO8859-15',
'en_gb@euro': 'en_GB.ISO8859-15',
'en_hk': 'en_HK.ISO8859-1',
'en_hk.iso88591': 'en_HK.ISO8859-1',
'en_ie': 'en_IE.ISO8859-1',
'en_ie.iso88591': 'en_IE.ISO8859-1',
'en_ie.iso885915': 'en_IE.ISO8859-15',
'en_ie.iso885915@euro': 'en_IE.ISO8859-15',
'en_ie.utf8@euro': 'en_IE.UTF-8',
'en_ie@euro': 'en_IE.ISO8859-15',
'en_il': 'en_IL.UTF-8',
'en_in': 'en_IN.ISO8859-1',
'en_ng': 'en_NG.UTF-8',
'en_nz': 'en_NZ.ISO8859-1',
'en_nz.iso88591': 'en_NZ.ISO8859-1',
'en_ph': 'en_PH.ISO8859-1',
'en_ph.iso88591': 'en_PH.ISO8859-1',
'en_sc.utf8': 'en_SC.UTF-8',
'en_sg': 'en_SG.ISO8859-1',
'en_sg.iso88591': 'en_SG.ISO8859-1',
'en_uk': 'en_GB.ISO8859-1',
'en_us': 'en_US.ISO8859-1',
'en_us.88591': 'en_US.ISO8859-1',
'en_us.885915': 'en_US.ISO8859-15',
'en_us.iso88591': 'en_US.ISO8859-1',
'en_us.iso885915': 'en_US.ISO8859-15',
'en_us.iso885915@euro': 'en_US.ISO8859-15',
'en_us@euro': 'en_US.ISO8859-15',
'en_us@euro@euro': 'en_US.ISO8859-15',
'en_za': 'en_ZA.ISO8859-1',
'en_za.88591': 'en_ZA.ISO8859-1',
'en_za.iso88591': 'en_ZA.ISO8859-1',
'en_za.iso885915': 'en_ZA.ISO8859-15',
'en_za@euro': 'en_ZA.ISO8859-15',
'en_zm': 'en_ZM.UTF-8',
'en_zw': 'en_ZW.ISO8859-1',
'en_zw.iso88591': 'en_ZW.ISO8859-1',
'en_zw.utf8': 'en_ZS.UTF-8',
'eng_gb': 'en_GB.ISO8859-1',
'eng_gb.8859': 'en_GB.ISO8859-1',
'english': 'en_EN.ISO8859-1',
'english.iso88591': 'en_US.ISO8859-1',
'english_uk': 'en_GB.ISO8859-1',
'english_uk.8859': 'en_GB.ISO8859-1',
'english_united-states': 'en_US.ISO8859-1',
'english_united-states.437': 'C',
'english_us': 'en_US.ISO8859-1',
'english_us.8859': 'en_US.ISO8859-1',
'english_us.ascii': 'en_US.ISO8859-1',
'eo': 'eo_XX.ISO8859-3',
'eo.utf8': 'eo.UTF-8',
'eo_eo': 'eo_EO.ISO8859-3',
'eo_eo.iso88593': 'eo_EO.ISO8859-3',
'eo_us.utf8': 'eo_US.UTF-8',
'eo_xx': 'eo_XX.ISO8859-3',
'eo_xx.iso88593': 'eo_XX.ISO8859-3',
'es': 'es_ES.ISO8859-1',
'es_ar': 'es_AR.ISO8859-1',
'es_ar.iso88591': 'es_AR.ISO8859-1',
'es_bo': 'es_BO.ISO8859-1',
'es_bo.iso88591': 'es_BO.ISO8859-1',
'es_cl': 'es_CL.ISO8859-1',
'es_cl.iso88591': 'es_CL.ISO8859-1',
'es_co': 'es_CO.ISO8859-1',
'es_co.iso88591': 'es_CO.ISO8859-1',
'es_cr': 'es_CR.ISO8859-1',
'es_cr.iso88591': 'es_CR.ISO8859-1',
'es_cu': 'es_CU.UTF-8',
'es_do': 'es_DO.ISO8859-1',
'es_do.iso88591': 'es_DO.ISO8859-1',
'es_ec': 'es_EC.ISO8859-1',
'es_ec.iso88591': 'es_EC.ISO8859-1',
'es_es': 'es_ES.ISO8859-1',
'es_es.88591': 'es_ES.ISO8859-1',
'es_es.iso88591': 'es_ES.ISO8859-1',
'es_es.iso885915': 'es_ES.ISO8859-15',
'es_es.iso885915@euro': 'es_ES.ISO8859-15',
'es_es.utf8@euro': 'es_ES.UTF-8',
'es_es@euro': 'es_ES.ISO8859-15',
'es_gt': 'es_GT.ISO8859-1',
'es_gt.iso88591': 'es_GT.ISO8859-1',
'es_hn': 'es_HN.ISO8859-1',
'es_hn.iso88591': 'es_HN.ISO8859-1',
'es_mx': 'es_MX.ISO8859-1',
'es_mx.iso88591': 'es_MX.ISO8859-1',
'es_ni': 'es_NI.ISO8859-1',
'es_ni.iso88591': 'es_NI.ISO8859-1',
'es_pa': 'es_PA.ISO8859-1',
'es_pa.iso88591': 'es_PA.ISO8859-1',
'es_pa.iso885915': 'es_PA.ISO8859-15',
'es_pa@euro': 'es_PA.ISO8859-15',
'es_pe': 'es_PE.ISO8859-1',
'es_pe.iso88591': 'es_PE.ISO8859-1',
'es_pe.iso885915': 'es_PE.ISO8859-15',
'es_pe@euro': 'es_PE.ISO8859-15',
'es_pr': 'es_PR.ISO8859-1',
'es_pr.iso88591': 'es_PR.ISO8859-1',
'es_py': 'es_PY.ISO8859-1',
'es_py.iso88591': 'es_PY.ISO8859-1',
'es_py.iso885915': 'es_PY.ISO8859-15',
'es_py@euro': 'es_PY.ISO8859-15',
'es_sv': 'es_SV.ISO8859-1',
'es_sv.iso88591': 'es_SV.ISO8859-1',
'es_sv.iso885915': 'es_SV.ISO8859-15',
'es_sv@euro': 'es_SV.ISO8859-15',
'es_us': 'es_US.ISO8859-1',
'es_us.iso88591': 'es_US.ISO8859-1',
'es_uy': 'es_UY.ISO8859-1',
'es_uy.iso88591': 'es_UY.ISO8859-1',
'es_uy.iso885915': 'es_UY.ISO8859-15',
'es_uy@euro': 'es_UY.ISO8859-15',
'es_ve': 'es_VE.ISO8859-1',
'es_ve.iso88591': 'es_VE.ISO8859-1',
'es_ve.iso885915': 'es_VE.ISO8859-15',
'es_ve@euro': 'es_VE.ISO8859-15',
'estonian': 'et_EE.ISO8859-1',
'et': 'et_EE.ISO8859-15',
'et_ee': 'et_EE.ISO8859-15',
'et_ee.iso88591': 'et_EE.ISO8859-1',
'et_ee.iso885913': 'et_EE.ISO8859-13',
'et_ee.iso885915': 'et_EE.ISO8859-15',
'et_ee.iso88594': 'et_EE.ISO8859-4',
'et_ee@euro': 'et_EE.ISO8859-15',
'eu': 'eu_ES.ISO8859-1',
'eu_es': 'eu_ES.ISO8859-1',
'eu_es.iso88591': 'eu_ES.ISO8859-1',
'eu_es.iso885915': 'eu_ES.ISO8859-15',
'eu_es.iso885915@euro': 'eu_ES.ISO8859-15',
'eu_es.utf8@euro': 'eu_ES.UTF-8',
'eu_es@euro': 'eu_ES.ISO8859-15',
'eu_fr': 'eu_FR.ISO8859-1',
'fa': 'fa_IR.UTF-8',
'fa_ir': 'fa_IR.UTF-8',
'fa_ir.isiri3342': 'fa_IR.ISIRI-3342',
'ff_sn': 'ff_SN.UTF-8',
'fi': 'fi_FI.ISO8859-15',
'fi.iso885915': 'fi_FI.ISO8859-15',
'fi_fi': 'fi_FI.ISO8859-15',
'fi_fi.88591': 'fi_FI.ISO8859-1',
'fi_fi.iso88591': 'fi_FI.ISO8859-1',
'fi_fi.iso885915': 'fi_FI.ISO8859-15',
'fi_fi.iso885915@euro': 'fi_FI.ISO8859-15',
'fi_fi.utf8@euro': 'fi_FI.UTF-8',
'fi_fi@euro': 'fi_FI.ISO8859-15',
'fil_ph': 'fil_PH.UTF-8',
'finnish': 'fi_FI.ISO8859-1',
'finnish.iso88591': 'fi_FI.ISO8859-1',
'fo': 'fo_FO.ISO8859-1',
'fo_fo': 'fo_FO.ISO8859-1',
'fo_fo.iso88591': 'fo_FO.ISO8859-1',
'fo_fo.iso885915': 'fo_FO.ISO8859-15',
'fo_fo@euro': 'fo_FO.ISO8859-15',
'fr': 'fr_FR.ISO8859-1',
'fr.iso885915': 'fr_FR.ISO8859-15',
'fr_be': 'fr_BE.ISO8859-1',
'fr_be.88591': 'fr_BE.ISO8859-1',
'fr_be.iso88591': 'fr_BE.ISO8859-1',
'fr_be.iso885915': 'fr_BE.ISO8859-15',
'fr_be.iso885915@euro': 'fr_BE.ISO8859-15',
'fr_be.utf8@euro': 'fr_BE.UTF-8',
'fr_be@euro': 'fr_BE.ISO8859-15',
'fr_ca': 'fr_CA.ISO8859-1',
'fr_ca.88591': 'fr_CA.ISO8859-1',
'fr_ca.iso88591': 'fr_CA.ISO8859-1',
'fr_ca.iso885915': 'fr_CA.ISO8859-15',
'fr_ca@euro': 'fr_CA.ISO8859-15',
'fr_ch': 'fr_CH.ISO8859-1',
'fr_ch.88591': 'fr_CH.ISO8859-1',
'fr_ch.iso88591': 'fr_CH.ISO8859-1',
'fr_ch.iso885915': 'fr_CH.ISO8859-15',
'fr_ch@euro': 'fr_CH.ISO8859-15',
'fr_fr': 'fr_FR.ISO8859-1',
'fr_fr.88591': 'fr_FR.ISO8859-1',
'fr_fr.iso88591': 'fr_FR.ISO8859-1',
'fr_fr.iso885915': 'fr_FR.ISO8859-15',
'fr_fr.iso885915@euro': 'fr_FR.ISO8859-15',
'fr_fr.utf8@euro': 'fr_FR.UTF-8',
'fr_fr@euro': 'fr_FR.ISO8859-15',
'fr_lu': 'fr_LU.ISO8859-1',
'fr_lu.88591': 'fr_LU.ISO8859-1',
'fr_lu.iso88591': 'fr_LU.ISO8859-1',
'fr_lu.iso885915': 'fr_LU.ISO8859-15',
'fr_lu.iso885915@euro': 'fr_LU.ISO8859-15',
'fr_lu.utf8@euro': 'fr_LU.UTF-8',
'fr_lu@euro': 'fr_LU.ISO8859-15',
'fran\xe7ais': 'fr_FR.ISO8859-1',
'fre_fr': 'fr_FR.ISO8859-1',
'fre_fr.8859': 'fr_FR.ISO8859-1',
'french': 'fr_FR.ISO8859-1',
'french.iso88591': 'fr_CH.ISO8859-1',
'french_france': 'fr_FR.ISO8859-1',
'french_france.8859': 'fr_FR.ISO8859-1',
'fur_it': 'fur_IT.UTF-8',
'fy_de': 'fy_DE.UTF-8',
'fy_nl': 'fy_NL.UTF-8',
'ga': 'ga_IE.ISO8859-1',
'ga_ie': 'ga_IE.ISO8859-1',
'ga_ie.iso88591': 'ga_IE.ISO8859-1',
'ga_ie.iso885914': 'ga_IE.ISO8859-14',
'ga_ie.iso885915': 'ga_IE.ISO8859-15',
'ga_ie.iso885915@euro': 'ga_IE.ISO8859-15',
'ga_ie.utf8@euro': 'ga_IE.UTF-8',
'ga_ie@euro': 'ga_IE.ISO8859-15',
'galego': 'gl_ES.ISO8859-1',
'galician': 'gl_ES.ISO8859-1',
'gd': 'gd_GB.ISO8859-1',
'gd_gb': 'gd_GB.ISO8859-1',
'gd_gb.iso88591': 'gd_GB.ISO8859-1',
'gd_gb.iso885914': 'gd_GB.ISO8859-14',
'gd_gb.iso885915': 'gd_GB.ISO8859-15',
'gd_gb@euro': 'gd_GB.ISO8859-15',
'ger_de': 'de_DE.ISO8859-1',
'ger_de.8859': 'de_DE.ISO8859-1',
'german': 'de_DE.ISO8859-1',
'german.iso88591': 'de_CH.ISO8859-1',
'german_germany': 'de_DE.ISO8859-1',
'german_germany.8859': 'de_DE.ISO8859-1',
'gez_er': 'gez_ER.UTF-8',
'gez_et': 'gez_ET.UTF-8',
'gl': 'gl_ES.ISO8859-1',
'gl_es': 'gl_ES.ISO8859-1',
'gl_es.iso88591': 'gl_ES.ISO8859-1',
'gl_es.iso885915': 'gl_ES.ISO8859-15',
'gl_es.iso885915@euro': 'gl_ES.ISO8859-15',
'gl_es.utf8@euro': 'gl_ES.UTF-8',
'gl_es@euro': 'gl_ES.ISO8859-15',
'greek': 'el_GR.ISO8859-7',
'greek.iso88597': 'el_GR.ISO8859-7',
'gu_in': 'gu_IN.UTF-8',
'gv': 'gv_GB.ISO8859-1',
'gv_gb': 'gv_GB.ISO8859-1',
'gv_gb.iso88591': 'gv_GB.ISO8859-1',
'gv_gb.iso885914': 'gv_GB.ISO8859-14',
'gv_gb.iso885915': 'gv_GB.ISO8859-15',
'gv_gb@euro': 'gv_GB.ISO8859-15',
'ha_ng': 'ha_NG.UTF-8',
'hak_tw': 'hak_TW.UTF-8',
'he': 'he_IL.ISO8859-8',
'he_il': 'he_IL.ISO8859-8',
'he_il.cp1255': 'he_IL.CP1255',
'he_il.iso88598': 'he_IL.ISO8859-8',
'he_il.microsoftcp1255': 'he_IL.CP1255',
'hebrew': 'he_IL.ISO8859-8',
'hebrew.iso88598': 'he_IL.ISO8859-8',
'hi': 'hi_IN.ISCII-DEV',
'hi_in': 'hi_IN.ISCII-DEV',
'hi_in.isciidev': 'hi_IN.ISCII-DEV',
'hif_fj': 'hif_FJ.UTF-8',
'hne': 'hne_IN.UTF-8',
'hne_in': 'hne_IN.UTF-8',
'hr': 'hr_HR.ISO8859-2',
'hr_hr': 'hr_HR.ISO8859-2',
'hr_hr.iso88592': 'hr_HR.ISO8859-2',
'hrvatski': 'hr_HR.ISO8859-2',
'hsb_de': 'hsb_DE.ISO8859-2',
'ht_ht': 'ht_HT.UTF-8',
'hu': 'hu_HU.ISO8859-2',
'hu_hu': 'hu_HU.ISO8859-2',
'hu_hu.iso88592': 'hu_HU.ISO8859-2',
'hungarian': 'hu_HU.ISO8859-2',
'hy_am': 'hy_AM.UTF-8',
'hy_am.armscii8': 'hy_AM.ARMSCII_8',
'ia': 'ia.UTF-8',
'ia_fr': 'ia_FR.UTF-8',
'icelandic': 'is_IS.ISO8859-1',
'icelandic.iso88591': 'is_IS.ISO8859-1',
'id': 'id_ID.ISO8859-1',
'id_id': 'id_ID.ISO8859-1',
'ig_ng': 'ig_NG.UTF-8',
'ik_ca': 'ik_CA.UTF-8',
'in': 'id_ID.ISO8859-1',
'in_id': 'id_ID.ISO8859-1',
'is': 'is_IS.ISO8859-1',
'is_is': 'is_IS.ISO8859-1',
'is_is.iso88591': 'is_IS.ISO8859-1',
'is_is.iso885915': 'is_IS.ISO8859-15',
'is_is@euro': 'is_IS.ISO8859-15',
'iso-8859-1': 'en_US.ISO8859-1',
'iso-8859-15': 'en_US.ISO8859-15',
'iso8859-1': 'en_US.ISO8859-1',
'iso8859-15': 'en_US.ISO8859-15',
'iso_8859_1': 'en_US.ISO8859-1',
'iso_8859_15': 'en_US.ISO8859-15',
'it': 'it_IT.ISO8859-1',
'it.iso885915': 'it_IT.ISO8859-15',
'it_ch': 'it_CH.ISO8859-1',
'it_ch.iso88591': 'it_CH.ISO8859-1',
'it_ch.iso885915': 'it_CH.ISO8859-15',
'it_ch@euro': 'it_CH.ISO8859-15',
'it_it': 'it_IT.ISO8859-1',
'it_it.88591': 'it_IT.ISO8859-1',
'it_it.iso88591': 'it_IT.ISO8859-1',
'it_it.iso885915': 'it_IT.ISO8859-15',
'it_it.iso885915@euro': 'it_IT.ISO8859-15',
'it_it.utf8@euro': 'it_IT.UTF-8',
'it_it@euro': 'it_IT.ISO8859-15',
'italian': 'it_IT.ISO8859-1',
'italian.iso88591': 'it_IT.ISO8859-1',
'iu': 'iu_CA.NUNACOM-8',
'iu_ca': 'iu_CA.NUNACOM-8',
'iu_ca.nunacom8': 'iu_CA.NUNACOM-8',
'iw': 'he_IL.ISO8859-8',
'iw_il': 'he_IL.ISO8859-8',
'iw_il.iso88598': 'he_IL.ISO8859-8',
'iw_il.utf8': 'iw_IL.UTF-8',
'ja': 'ja_JP.eucJP',
'ja.jis': 'ja_JP.JIS7',
'ja.sjis': 'ja_JP.SJIS',
'ja_jp': 'ja_JP.eucJP',
'ja_jp.ajec': 'ja_JP.eucJP',
'ja_jp.euc': 'ja_JP.eucJP',
'ja_jp.eucjp': 'ja_JP.eucJP',
'ja_jp.iso-2022-jp': 'ja_JP.JIS7',
'ja_jp.iso2022jp': 'ja_JP.JIS7',
'ja_jp.jis': 'ja_JP.JIS7',
'ja_jp.jis7': 'ja_JP.JIS7',
'ja_jp.mscode': 'ja_JP.SJIS',
'ja_jp.pck': 'ja_JP.SJIS',
'ja_jp.sjis': 'ja_JP.SJIS',
'ja_jp.ujis': 'ja_JP.eucJP',
'japan': 'ja_JP.eucJP',
'japanese': 'ja_JP.eucJP',
'japanese-euc': 'ja_JP.eucJP',
'japanese.euc': 'ja_JP.eucJP',
'japanese.sjis': 'ja_JP.SJIS',
'jp_jp': 'ja_JP.eucJP',
'ka': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge.georgianacademy': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge.georgianps': 'ka_GE.GEORGIAN-PS',
'ka_ge.georgianrs': 'ka_GE.GEORGIAN-ACADEMY',
'kab_dz': 'kab_DZ.UTF-8',
'kk_kz': 'kk_KZ.ptcp154',
'kl': 'kl_GL.ISO8859-1',
'kl_gl': 'kl_GL.ISO8859-1',
'kl_gl.iso88591': 'kl_GL.ISO8859-1',
'kl_gl.iso885915': 'kl_GL.ISO8859-15',
'kl_gl@euro': 'kl_GL.ISO8859-15',
'km_kh': 'km_KH.UTF-8',
'kn': 'kn_IN.UTF-8',
'kn_in': 'kn_IN.UTF-8',
'ko': 'ko_KR.eucKR',
'ko_kr': 'ko_KR.eucKR',
'ko_kr.euc': 'ko_KR.eucKR',
'ko_kr.euckr': 'ko_KR.eucKR',
'kok_in': 'kok_IN.UTF-8',
'korean': 'ko_KR.eucKR',
'korean.euc': 'ko_KR.eucKR',
'ks': 'ks_IN.UTF-8',
'ks_in': 'ks_IN.UTF-8',
'ks_in.utf8@devanagari': 'ks_IN.UTF-8@devanagari',
'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',
'[email protected]': 'ks_IN.UTF-8@devanagari',
'ku_tr': 'ku_TR.ISO8859-9',
'kw': 'kw_GB.ISO8859-1',
'kw_gb': 'kw_GB.ISO8859-1',
'kw_gb.iso88591': 'kw_GB.ISO8859-1',
'kw_gb.iso885914': 'kw_GB.ISO8859-14',
'kw_gb.iso885915': 'kw_GB.ISO8859-15',
'kw_gb@euro': 'kw_GB.ISO8859-15',
'ky': 'ky_KG.UTF-8',
'ky_kg': 'ky_KG.UTF-8',
'lb_lu': 'lb_LU.UTF-8',
'lg_ug': 'lg_UG.ISO8859-10',
'li_be': 'li_BE.UTF-8',
'li_nl': 'li_NL.UTF-8',
'lij_it': 'lij_IT.UTF-8',
'lithuanian': 'lt_LT.ISO8859-13',
'ln_cd': 'ln_CD.UTF-8',
'lo': 'lo_LA.MULELAO-1',
'lo_la': 'lo_LA.MULELAO-1',
'lo_la.cp1133': 'lo_LA.IBM-CP1133',
'lo_la.ibmcp1133': 'lo_LA.IBM-CP1133',
'lo_la.mulelao1': 'lo_LA.MULELAO-1',
'lt': 'lt_LT.ISO8859-13',
'lt_lt': 'lt_LT.ISO8859-13',
'lt_lt.iso885913': 'lt_LT.ISO8859-13',
'lt_lt.iso88594': 'lt_LT.ISO8859-4',
'lv': 'lv_LV.ISO8859-13',
'lv_lv': 'lv_LV.ISO8859-13',
'lv_lv.iso885913': 'lv_LV.ISO8859-13',
'lv_lv.iso88594': 'lv_LV.ISO8859-4',
'lzh_tw': 'lzh_TW.UTF-8',
'mag_in': 'mag_IN.UTF-8',
'mai': 'mai_IN.UTF-8',
'mai_in': 'mai_IN.UTF-8',
'mai_np': 'mai_NP.UTF-8',
'mfe_mu': 'mfe_MU.UTF-8',
'mg_mg': 'mg_MG.ISO8859-15',
'mhr_ru': 'mhr_RU.UTF-8',
'mi': 'mi_NZ.ISO8859-1',
'mi_nz': 'mi_NZ.ISO8859-1',
'mi_nz.iso88591': 'mi_NZ.ISO8859-1',
'miq_ni': 'miq_NI.UTF-8',
'mjw_in': 'mjw_IN.UTF-8',
'mk': 'mk_MK.ISO8859-5',
'mk_mk': 'mk_MK.ISO8859-5',
'mk_mk.cp1251': 'mk_MK.CP1251',
'mk_mk.iso88595': 'mk_MK.ISO8859-5',
'mk_mk.microsoftcp1251': 'mk_MK.CP1251',
'ml': 'ml_IN.UTF-8',
'ml_in': 'ml_IN.UTF-8',
'mn_mn': 'mn_MN.UTF-8',
'mni_in': 'mni_IN.UTF-8',
'mr': 'mr_IN.UTF-8',
'mr_in': 'mr_IN.UTF-8',
'ms': 'ms_MY.ISO8859-1',
'ms_my': 'ms_MY.ISO8859-1',
'ms_my.iso88591': 'ms_MY.ISO8859-1',
'mt': 'mt_MT.ISO8859-3',
'mt_mt': 'mt_MT.ISO8859-3',
'mt_mt.iso88593': 'mt_MT.ISO8859-3',
'my_mm': 'my_MM.UTF-8',
'nan_tw': 'nan_TW.UTF-8',
'nb': 'nb_NO.ISO8859-1',
'nb_no': 'nb_NO.ISO8859-1',
'nb_no.88591': 'nb_NO.ISO8859-1',
'nb_no.iso88591': 'nb_NO.ISO8859-1',
'nb_no.iso885915': 'nb_NO.ISO8859-15',
'nb_no@euro': 'nb_NO.ISO8859-15',
'nds_de': 'nds_DE.UTF-8',
'nds_nl': 'nds_NL.UTF-8',
'ne_np': 'ne_NP.UTF-8',
'nhn_mx': 'nhn_MX.UTF-8',
'niu_nu': 'niu_NU.UTF-8',
'niu_nz': 'niu_NZ.UTF-8',
'nl': 'nl_NL.ISO8859-1',
'nl.iso885915': 'nl_NL.ISO8859-15',
'nl_aw': 'nl_AW.UTF-8',
'nl_be': 'nl_BE.ISO8859-1',
'nl_be.88591': 'nl_BE.ISO8859-1',
'nl_be.iso88591': 'nl_BE.ISO8859-1',
'nl_be.iso885915': 'nl_BE.ISO8859-15',
'nl_be.iso885915@euro': 'nl_BE.ISO8859-15',
'nl_be.utf8@euro': 'nl_BE.UTF-8',
'nl_be@euro': 'nl_BE.ISO8859-15',
'nl_nl': 'nl_NL.ISO8859-1',
'nl_nl.88591': 'nl_NL.ISO8859-1',
'nl_nl.iso88591': 'nl_NL.ISO8859-1',
'nl_nl.iso885915': 'nl_NL.ISO8859-15',
'nl_nl.iso885915@euro': 'nl_NL.ISO8859-15',
'nl_nl.utf8@euro': 'nl_NL.UTF-8',
'nl_nl@euro': 'nl_NL.ISO8859-15',
'nn': 'nn_NO.ISO8859-1',
'nn_no': 'nn_NO.ISO8859-1',
'nn_no.88591': 'nn_NO.ISO8859-1',
'nn_no.iso88591': 'nn_NO.ISO8859-1',
'nn_no.iso885915': 'nn_NO.ISO8859-15',
'nn_no@euro': 'nn_NO.ISO8859-15',
'no': 'no_NO.ISO8859-1',
'no@nynorsk': 'ny_NO.ISO8859-1',
'no_no': 'no_NO.ISO8859-1',
'no_no.88591': 'no_NO.ISO8859-1',
'no_no.iso88591': 'no_NO.ISO8859-1',
'no_no.iso885915': 'no_NO.ISO8859-15',
'no_no.iso88591@bokmal': 'no_NO.ISO8859-1',
'no_no.iso88591@nynorsk': 'no_NO.ISO8859-1',
'no_no@euro': 'no_NO.ISO8859-15',
'norwegian': 'no_NO.ISO8859-1',
'norwegian.iso88591': 'no_NO.ISO8859-1',
'nr': 'nr_ZA.ISO8859-1',
'nr_za': 'nr_ZA.ISO8859-1',
'nr_za.iso88591': 'nr_ZA.ISO8859-1',
'nso': 'nso_ZA.ISO8859-15',
'nso_za': 'nso_ZA.ISO8859-15',
'nso_za.iso885915': 'nso_ZA.ISO8859-15',
'ny': 'ny_NO.ISO8859-1',
'ny_no': 'ny_NO.ISO8859-1',
'ny_no.88591': 'ny_NO.ISO8859-1',
'ny_no.iso88591': 'ny_NO.ISO8859-1',
'ny_no.iso885915': 'ny_NO.ISO8859-15',
'ny_no@euro': 'ny_NO.ISO8859-15',
'nynorsk': 'nn_NO.ISO8859-1',
'oc': 'oc_FR.ISO8859-1',
'oc_fr': 'oc_FR.ISO8859-1',
'oc_fr.iso88591': 'oc_FR.ISO8859-1',
'oc_fr.iso885915': 'oc_FR.ISO8859-15',
'oc_fr@euro': 'oc_FR.ISO8859-15',
'om_et': 'om_ET.UTF-8',
'om_ke': 'om_KE.ISO8859-1',
'or': 'or_IN.UTF-8',
'or_in': 'or_IN.UTF-8',
'os_ru': 'os_RU.UTF-8',
'pa': 'pa_IN.UTF-8',
'pa_in': 'pa_IN.UTF-8',
'pa_pk': 'pa_PK.UTF-8',
'pap_an': 'pap_AN.UTF-8',
'pap_aw': 'pap_AW.UTF-8',
'pap_cw': 'pap_CW.UTF-8',
'pd': 'pd_US.ISO8859-1',
'pd_de': 'pd_DE.ISO8859-1',
'pd_de.iso88591': 'pd_DE.ISO8859-1',
'pd_de.iso885915': 'pd_DE.ISO8859-15',
'pd_de@euro': 'pd_DE.ISO8859-15',
'pd_us': 'pd_US.ISO8859-1',
'pd_us.iso88591': 'pd_US.ISO8859-1',
'pd_us.iso885915': 'pd_US.ISO8859-15',
'pd_us@euro': 'pd_US.ISO8859-15',
'ph': 'ph_PH.ISO8859-1',
'ph_ph': 'ph_PH.ISO8859-1',
'ph_ph.iso88591': 'ph_PH.ISO8859-1',
'pl': 'pl_PL.ISO8859-2',
'pl_pl': 'pl_PL.ISO8859-2',
'pl_pl.iso88592': 'pl_PL.ISO8859-2',
'polish': 'pl_PL.ISO8859-2',
'portuguese': 'pt_PT.ISO8859-1',
'portuguese.iso88591': 'pt_PT.ISO8859-1',
'portuguese_brazil': 'pt_BR.ISO8859-1',
'portuguese_brazil.8859': 'pt_BR.ISO8859-1',
'posix': 'C',
'posix-utf2': 'C',
'pp': 'pp_AN.ISO8859-1',
'pp_an': 'pp_AN.ISO8859-1',
'pp_an.iso88591': 'pp_AN.ISO8859-1',
'ps_af': 'ps_AF.UTF-8',
'pt': 'pt_PT.ISO8859-1',
'pt.iso885915': 'pt_PT.ISO8859-15',
'pt_br': 'pt_BR.ISO8859-1',
'pt_br.88591': 'pt_BR.ISO8859-1',
'pt_br.iso88591': 'pt_BR.ISO8859-1',
'pt_br.iso885915': 'pt_BR.ISO8859-15',
'pt_br@euro': 'pt_BR.ISO8859-15',
'pt_pt': 'pt_PT.ISO8859-1',
'pt_pt.88591': 'pt_PT.ISO8859-1',
'pt_pt.iso88591': 'pt_PT.ISO8859-1',
'pt_pt.iso885915': 'pt_PT.ISO8859-15',
'pt_pt.iso885915@euro': 'pt_PT.ISO8859-15',
'pt_pt.utf8@euro': 'pt_PT.UTF-8',
'pt_pt@euro': 'pt_PT.ISO8859-15',
'quz_pe': 'quz_PE.UTF-8',
'raj_in': 'raj_IN.UTF-8',
'ro': 'ro_RO.ISO8859-2',
'ro_ro': 'ro_RO.ISO8859-2',
'ro_ro.iso88592': 'ro_RO.ISO8859-2',
'romanian': 'ro_RO.ISO8859-2',
'ru': 'ru_RU.UTF-8',
'ru.koi8r': 'ru_RU.KOI8-R',
'ru_ru': 'ru_RU.UTF-8',
'ru_ru.cp1251': 'ru_RU.CP1251',
'ru_ru.iso88595': 'ru_RU.ISO8859-5',
'ru_ru.koi8r': 'ru_RU.KOI8-R',
'ru_ru.microsoftcp1251': 'ru_RU.CP1251',
'ru_ua': 'ru_UA.KOI8-U',
'ru_ua.cp1251': 'ru_UA.CP1251',
'ru_ua.koi8u': 'ru_UA.KOI8-U',
'ru_ua.microsoftcp1251': 'ru_UA.CP1251',
'rumanian': 'ro_RO.ISO8859-2',
'russian': 'ru_RU.KOI8-R',
'rw': 'rw_RW.ISO8859-1',
'rw_rw': 'rw_RW.ISO8859-1',
'rw_rw.iso88591': 'rw_RW.ISO8859-1',
'sa_in': 'sa_IN.UTF-8',
'sat_in': 'sat_IN.UTF-8',
'sc_it': 'sc_IT.UTF-8',
'sd': 'sd_IN.UTF-8',
'sd@devanagari': 'sd_IN.UTF-8@devanagari',
'sd_in': 'sd_IN.UTF-8',
'sd_in.utf8@devanagari': 'sd_IN.UTF-8@devanagari',
'sd_in@devanagari': 'sd_IN.UTF-8@devanagari',
    'sd_in@devanagari.utf8': 'sd_IN.UTF-8@devanagari',
'sd_pk': 'sd_PK.UTF-8',
'se_no': 'se_NO.UTF-8',
'serbocroatian': 'sr_RS.UTF-8@latin',
'sgs_lt': 'sgs_LT.UTF-8',
'sh': 'sr_RS.UTF-8@latin',
'sh_ba.iso88592@bosnia': 'sr_CS.ISO8859-2',
'sh_hr': 'sh_HR.ISO8859-2',
'sh_hr.iso88592': 'hr_HR.ISO8859-2',
'sh_sp': 'sr_CS.ISO8859-2',
'sh_yu': 'sr_RS.UTF-8@latin',
'shn_mm': 'shn_MM.UTF-8',
'shs_ca': 'shs_CA.UTF-8',
'si': 'si_LK.UTF-8',
'si_lk': 'si_LK.UTF-8',
'sid_et': 'sid_ET.UTF-8',
'sinhala': 'si_LK.UTF-8',
'sk': 'sk_SK.ISO8859-2',
'sk_sk': 'sk_SK.ISO8859-2',
'sk_sk.iso88592': 'sk_SK.ISO8859-2',
'sl': 'sl_SI.ISO8859-2',
'sl_cs': 'sl_CS.ISO8859-2',
'sl_si': 'sl_SI.ISO8859-2',
'sl_si.iso88592': 'sl_SI.ISO8859-2',
'slovak': 'sk_SK.ISO8859-2',
'slovene': 'sl_SI.ISO8859-2',
'slovenian': 'sl_SI.ISO8859-2',
'sm_ws': 'sm_WS.UTF-8',
'so_dj': 'so_DJ.ISO8859-1',
'so_et': 'so_ET.UTF-8',
'so_ke': 'so_KE.ISO8859-1',
'so_so': 'so_SO.ISO8859-1',
'sp': 'sr_CS.ISO8859-5',
'sp_yu': 'sr_CS.ISO8859-5',
'spanish': 'es_ES.ISO8859-1',
'spanish.iso88591': 'es_ES.ISO8859-1',
'spanish_spain': 'es_ES.ISO8859-1',
'spanish_spain.8859': 'es_ES.ISO8859-1',
'sq': 'sq_AL.ISO8859-2',
'sq_al': 'sq_AL.ISO8859-2',
'sq_al.iso88592': 'sq_AL.ISO8859-2',
'sq_mk': 'sq_MK.UTF-8',
'sr': 'sr_RS.UTF-8',
'sr@cyrillic': 'sr_RS.UTF-8',
'sr@latin': 'sr_RS.UTF-8@latin',
'sr@latn': 'sr_CS.UTF-8@latin',
'sr_cs': 'sr_CS.UTF-8',
'sr_cs.iso88592': 'sr_CS.ISO8859-2',
'sr_cs.iso88592@latn': 'sr_CS.ISO8859-2',
'sr_cs.iso88595': 'sr_CS.ISO8859-5',
'sr_cs.utf8@latn': 'sr_CS.UTF-8@latin',
'sr_cs@latn': 'sr_CS.UTF-8@latin',
'sr_me': 'sr_ME.UTF-8',
'sr_rs': 'sr_RS.UTF-8',
'sr_rs.utf8@latn': 'sr_RS.UTF-8@latin',
'sr_rs@latin': 'sr_RS.UTF-8@latin',
'sr_rs@latn': 'sr_RS.UTF-8@latin',
'sr_sp': 'sr_CS.ISO8859-2',
'sr_yu': 'sr_RS.UTF-8@latin',
'sr_yu.cp1251@cyrillic': 'sr_CS.CP1251',
'sr_yu.iso88592': 'sr_CS.ISO8859-2',
'sr_yu.iso88595': 'sr_CS.ISO8859-5',
'sr_yu.iso88595@cyrillic': 'sr_CS.ISO8859-5',
'sr_yu.microsoftcp1251@cyrillic': 'sr_CS.CP1251',
'sr_yu.utf8': 'sr_RS.UTF-8',
'sr_yu.utf8@cyrillic': 'sr_RS.UTF-8',
'sr_yu@cyrillic': 'sr_RS.UTF-8',
'ss': 'ss_ZA.ISO8859-1',
'ss_za': 'ss_ZA.ISO8859-1',
'ss_za.iso88591': 'ss_ZA.ISO8859-1',
'st': 'st_ZA.ISO8859-1',
'st_za': 'st_ZA.ISO8859-1',
'st_za.iso88591': 'st_ZA.ISO8859-1',
'sv': 'sv_SE.ISO8859-1',
'sv.iso885915': 'sv_SE.ISO8859-15',
'sv_fi': 'sv_FI.ISO8859-1',
'sv_fi.iso88591': 'sv_FI.ISO8859-1',
'sv_fi.iso885915': 'sv_FI.ISO8859-15',
'sv_fi.iso885915@euro': 'sv_FI.ISO8859-15',
'sv_fi.utf8@euro': 'sv_FI.UTF-8',
'sv_fi@euro': 'sv_FI.ISO8859-15',
'sv_se': 'sv_SE.ISO8859-1',
'sv_se.88591': 'sv_SE.ISO8859-1',
'sv_se.iso88591': 'sv_SE.ISO8859-1',
'sv_se.iso885915': 'sv_SE.ISO8859-15',
'sv_se@euro': 'sv_SE.ISO8859-15',
'sw_ke': 'sw_KE.UTF-8',
'sw_tz': 'sw_TZ.UTF-8',
'swedish': 'sv_SE.ISO8859-1',
'swedish.iso88591': 'sv_SE.ISO8859-1',
'szl_pl': 'szl_PL.UTF-8',
'ta': 'ta_IN.TSCII-0',
'ta_in': 'ta_IN.TSCII-0',
'ta_in.tscii': 'ta_IN.TSCII-0',
'ta_in.tscii0': 'ta_IN.TSCII-0',
'ta_lk': 'ta_LK.UTF-8',
'tcy_in.utf8': 'tcy_IN.UTF-8',
'te': 'te_IN.UTF-8',
'te_in': 'te_IN.UTF-8',
'tg': 'tg_TJ.KOI8-C',
'tg_tj': 'tg_TJ.KOI8-C',
'tg_tj.koi8c': 'tg_TJ.KOI8-C',
'th': 'th_TH.ISO8859-11',
'th_th': 'th_TH.ISO8859-11',
'th_th.iso885911': 'th_TH.ISO8859-11',
'th_th.tactis': 'th_TH.TIS620',
'th_th.tis620': 'th_TH.TIS620',
'thai': 'th_TH.ISO8859-11',
'the_np': 'the_NP.UTF-8',
'ti_er': 'ti_ER.UTF-8',
'ti_et': 'ti_ET.UTF-8',
'tig_er': 'tig_ER.UTF-8',
'tk_tm': 'tk_TM.UTF-8',
'tl': 'tl_PH.ISO8859-1',
'tl_ph': 'tl_PH.ISO8859-1',
'tl_ph.iso88591': 'tl_PH.ISO8859-1',
'tn': 'tn_ZA.ISO8859-15',
'tn_za': 'tn_ZA.ISO8859-15',
'tn_za.iso885915': 'tn_ZA.ISO8859-15',
'to_to': 'to_TO.UTF-8',
'tpi_pg': 'tpi_PG.UTF-8',
'tr': 'tr_TR.ISO8859-9',
'tr_cy': 'tr_CY.ISO8859-9',
'tr_tr': 'tr_TR.ISO8859-9',
'tr_tr.iso88599': 'tr_TR.ISO8859-9',
'ts': 'ts_ZA.ISO8859-1',
'ts_za': 'ts_ZA.ISO8859-1',
'ts_za.iso88591': 'ts_ZA.ISO8859-1',
'tt': 'tt_RU.TATAR-CYR',
'tt_ru': 'tt_RU.TATAR-CYR',
'tt_ru.koi8c': 'tt_RU.KOI8-C',
'tt_ru.tatarcyr': 'tt_RU.TATAR-CYR',
'tt_ru@iqtelif': 'tt_RU.UTF-8@iqtelif',
'turkish': 'tr_TR.ISO8859-9',
'turkish.iso88599': 'tr_TR.ISO8859-9',
'ug_cn': 'ug_CN.UTF-8',
'uk': 'uk_UA.KOI8-U',
'uk_ua': 'uk_UA.KOI8-U',
'uk_ua.cp1251': 'uk_UA.CP1251',
'uk_ua.iso88595': 'uk_UA.ISO8859-5',
'uk_ua.koi8u': 'uk_UA.KOI8-U',
'uk_ua.microsoftcp1251': 'uk_UA.CP1251',
'univ': 'en_US.utf',
'universal': 'en_US.utf',
'universal.utf8@ucs4': 'en_US.UTF-8',
'unm_us': 'unm_US.UTF-8',
'ur': 'ur_PK.CP1256',
'ur_in': 'ur_IN.UTF-8',
'ur_pk': 'ur_PK.CP1256',
'ur_pk.cp1256': 'ur_PK.CP1256',
'ur_pk.microsoftcp1256': 'ur_PK.CP1256',
'uz': 'uz_UZ.UTF-8',
'uz_uz': 'uz_UZ.UTF-8',
'uz_uz.iso88591': 'uz_UZ.ISO8859-1',
'uz_uz.utf8@cyrillic': 'uz_UZ.UTF-8',
'uz_uz@cyrillic': 'uz_UZ.UTF-8',
've': 've_ZA.UTF-8',
've_za': 've_ZA.UTF-8',
'vi': 'vi_VN.TCVN',
'vi_vn': 'vi_VN.TCVN',
'vi_vn.tcvn': 'vi_VN.TCVN',
'vi_vn.tcvn5712': 'vi_VN.TCVN',
'vi_vn.viscii': 'vi_VN.VISCII',
'vi_vn.viscii111': 'vi_VN.VISCII',
'wa': 'wa_BE.ISO8859-1',
'wa_be': 'wa_BE.ISO8859-1',
'wa_be.iso88591': 'wa_BE.ISO8859-1',
'wa_be.iso885915': 'wa_BE.ISO8859-15',
'wa_be.iso885915@euro': 'wa_BE.ISO8859-15',
'wa_be@euro': 'wa_BE.ISO8859-15',
'wae_ch': 'wae_CH.UTF-8',
'wal_et': 'wal_ET.UTF-8',
'wo_sn': 'wo_SN.UTF-8',
'xh': 'xh_ZA.ISO8859-1',
'xh_za': 'xh_ZA.ISO8859-1',
'xh_za.iso88591': 'xh_ZA.ISO8859-1',
'yi': 'yi_US.CP1255',
'yi_us': 'yi_US.CP1255',
'yi_us.cp1255': 'yi_US.CP1255',
'yi_us.microsoftcp1255': 'yi_US.CP1255',
'yo_ng': 'yo_NG.UTF-8',
'yue_hk': 'yue_HK.UTF-8',
'yuw_pg': 'yuw_PG.UTF-8',
'zh': 'zh_CN.eucCN',
'zh_cn': 'zh_CN.gb2312',
'zh_cn.big5': 'zh_TW.big5',
'zh_cn.euc': 'zh_CN.eucCN',
'zh_cn.gb18030': 'zh_CN.gb18030',
'zh_cn.gb2312': 'zh_CN.gb2312',
'zh_cn.gbk': 'zh_CN.gbk',
'zh_hk': 'zh_HK.big5hkscs',
'zh_hk.big5': 'zh_HK.big5',
'zh_hk.big5hk': 'zh_HK.big5hkscs',
'zh_hk.big5hkscs': 'zh_HK.big5hkscs',
'zh_sg': 'zh_SG.GB2312',
'zh_sg.gbk': 'zh_SG.GBK',
'zh_tw': 'zh_TW.big5',
'zh_tw.big5': 'zh_TW.big5',
'zh_tw.euc': 'zh_TW.eucTW',
'zh_tw.euctw': 'zh_TW.eucTW',
'zu': 'zu_ZA.ISO8859-1',
'zu_za': 'zu_ZA.ISO8859-1',
'zu_za.iso88591': 'zu_ZA.ISO8859-1',
}
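# --- Editor's illustrative sketch (not part of the original module): the
# table above is what normalize() consults. Keys are lower-cased alias
# strings; values are canonical "language_TERRITORY.ENCODING" names. The
# sample entries below are taken directly from the table.
def _example_locale_alias_lookup():
    # A euro-modifier alias maps to the Latin-9 (ISO8859-15) encoding...
    assert locale_alias['pt_pt@euro'] == 'pt_PT.ISO8859-15'
    # ...and a plain language name maps to that language's default locale.
    assert locale_alias['italian'] == 'it_IT.ISO8859-1'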
#
# This maps Windows language identifiers to locale strings.
#
# This list has been updated from
# http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_238z.asp
# to include every locale up to Windows Vista.
#
# NOTE: this mapping is incomplete. If your language is missing, please
# submit a bug report to the Python bug tracker at http://bugs.python.org/
# Make sure you include the missing language identifier and the suggested
# locale code.
#
windows_locale = {
0x0436: "af_ZA", # Afrikaans
0x041c: "sq_AL", # Albanian
0x0484: "gsw_FR",# Alsatian - France
0x045e: "am_ET", # Amharic - Ethiopia
0x0401: "ar_SA", # Arabic - Saudi Arabia
0x0801: "ar_IQ", # Arabic - Iraq
0x0c01: "ar_EG", # Arabic - Egypt
0x1001: "ar_LY", # Arabic - Libya
0x1401: "ar_DZ", # Arabic - Algeria
0x1801: "ar_MA", # Arabic - Morocco
0x1c01: "ar_TN", # Arabic - Tunisia
0x2001: "ar_OM", # Arabic - Oman
0x2401: "ar_YE", # Arabic - Yemen
0x2801: "ar_SY", # Arabic - Syria
0x2c01: "ar_JO", # Arabic - Jordan
0x3001: "ar_LB", # Arabic - Lebanon
0x3401: "ar_KW", # Arabic - Kuwait
0x3801: "ar_AE", # Arabic - United Arab Emirates
0x3c01: "ar_BH", # Arabic - Bahrain
0x4001: "ar_QA", # Arabic - Qatar
0x042b: "hy_AM", # Armenian
0x044d: "as_IN", # Assamese - India
0x042c: "az_AZ", # Azeri - Latin
0x082c: "az_AZ", # Azeri - Cyrillic
0x046d: "ba_RU", # Bashkir
0x042d: "eu_ES", # Basque - Russia
0x0423: "be_BY", # Belarusian
0x0445: "bn_IN", # Begali
0x201a: "bs_BA", # Bosnian - Cyrillic
0x141a: "bs_BA", # Bosnian - Latin
0x047e: "br_FR", # Breton - France
0x0402: "bg_BG", # Bulgarian
# 0x0455: "my_MM", # Burmese - Not supported
0x0403: "ca_ES", # Catalan
0x0004: "zh_CHS",# Chinese - Simplified
0x0404: "zh_TW", # Chinese - Taiwan
0x0804: "zh_CN", # Chinese - PRC
0x0c04: "zh_HK", # Chinese - Hong Kong S.A.R.
0x1004: "zh_SG", # Chinese - Singapore
0x1404: "zh_MO", # Chinese - Macao S.A.R.
0x7c04: "zh_CHT",# Chinese - Traditional
0x0483: "co_FR", # Corsican - France
0x041a: "hr_HR", # Croatian
0x101a: "hr_BA", # Croatian - Bosnia
0x0405: "cs_CZ", # Czech
0x0406: "da_DK", # Danish
0x048c: "gbz_AF",# Dari - Afghanistan
0x0465: "div_MV",# Divehi - Maldives
0x0413: "nl_NL", # Dutch - The Netherlands
0x0813: "nl_BE", # Dutch - Belgium
0x0409: "en_US", # English - United States
0x0809: "en_GB", # English - United Kingdom
0x0c09: "en_AU", # English - Australia
0x1009: "en_CA", # English - Canada
0x1409: "en_NZ", # English - New Zealand
0x1809: "en_IE", # English - Ireland
0x1c09: "en_ZA", # English - South Africa
0x2009: "en_JA", # English - Jamaica
0x2409: "en_CB", # English - Caribbean
0x2809: "en_BZ", # English - Belize
0x2c09: "en_TT", # English - Trinidad
0x3009: "en_ZW", # English - Zimbabwe
0x3409: "en_PH", # English - Philippines
0x4009: "en_IN", # English - India
0x4409: "en_MY", # English - Malaysia
0x4809: "en_IN", # English - Singapore
0x0425: "et_EE", # Estonian
0x0438: "fo_FO", # Faroese
0x0464: "fil_PH",# Filipino
0x040b: "fi_FI", # Finnish
0x040c: "fr_FR", # French - France
0x080c: "fr_BE", # French - Belgium
0x0c0c: "fr_CA", # French - Canada
0x100c: "fr_CH", # French - Switzerland
0x140c: "fr_LU", # French - Luxembourg
0x180c: "fr_MC", # French - Monaco
0x0462: "fy_NL", # Frisian - Netherlands
0x0456: "gl_ES", # Galician
0x0437: "ka_GE", # Georgian
0x0407: "de_DE", # German - Germany
0x0807: "de_CH", # German - Switzerland
0x0c07: "de_AT", # German - Austria
0x1007: "de_LU", # German - Luxembourg
0x1407: "de_LI", # German - Liechtenstein
0x0408: "el_GR", # Greek
0x046f: "kl_GL", # Greenlandic - Greenland
0x0447: "gu_IN", # Gujarati
0x0468: "ha_NG", # Hausa - Latin
0x040d: "he_IL", # Hebrew
0x0439: "hi_IN", # Hindi
0x040e: "hu_HU", # Hungarian
0x040f: "is_IS", # Icelandic
0x0421: "id_ID", # Indonesian
0x045d: "iu_CA", # Inuktitut - Syllabics
0x085d: "iu_CA", # Inuktitut - Latin
0x083c: "ga_IE", # Irish - Ireland
0x0410: "it_IT", # Italian - Italy
0x0810: "it_CH", # Italian - Switzerland
0x0411: "ja_JP", # Japanese
0x044b: "kn_IN", # Kannada - India
0x043f: "kk_KZ", # Kazakh
0x0453: "kh_KH", # Khmer - Cambodia
0x0486: "qut_GT",# K'iche - Guatemala
0x0487: "rw_RW", # Kinyarwanda - Rwanda
0x0457: "kok_IN",# Konkani
0x0412: "ko_KR", # Korean
0x0440: "ky_KG", # Kyrgyz
0x0454: "lo_LA", # Lao - Lao PDR
0x0426: "lv_LV", # Latvian
0x0427: "lt_LT", # Lithuanian
0x082e: "dsb_DE",# Lower Sorbian - Germany
0x046e: "lb_LU", # Luxembourgish
0x042f: "mk_MK", # FYROM Macedonian
0x043e: "ms_MY", # Malay - Malaysia
0x083e: "ms_BN", # Malay - Brunei Darussalam
0x044c: "ml_IN", # Malayalam - India
0x043a: "mt_MT", # Maltese
0x0481: "mi_NZ", # Maori
0x047a: "arn_CL",# Mapudungun
0x044e: "mr_IN", # Marathi
0x047c: "moh_CA",# Mohawk - Canada
0x0450: "mn_MN", # Mongolian - Cyrillic
0x0850: "mn_CN", # Mongolian - PRC
0x0461: "ne_NP", # Nepali
0x0414: "nb_NO", # Norwegian - Bokmal
0x0814: "nn_NO", # Norwegian - Nynorsk
0x0482: "oc_FR", # Occitan - France
0x0448: "or_IN", # Oriya - India
0x0463: "ps_AF", # Pashto - Afghanistan
0x0429: "fa_IR", # Persian
0x0415: "pl_PL", # Polish
0x0416: "pt_BR", # Portuguese - Brazil
0x0816: "pt_PT", # Portuguese - Portugal
0x0446: "pa_IN", # Punjabi
0x046b: "quz_BO",# Quechua (Bolivia)
0x086b: "quz_EC",# Quechua (Ecuador)
0x0c6b: "quz_PE",# Quechua (Peru)
0x0418: "ro_RO", # Romanian - Romania
0x0417: "rm_CH", # Romansh
0x0419: "ru_RU", # Russian
0x243b: "smn_FI",# Sami Finland
0x103b: "smj_NO",# Sami Norway
0x143b: "smj_SE",# Sami Sweden
0x043b: "se_NO", # Sami Northern Norway
0x083b: "se_SE", # Sami Northern Sweden
0x0c3b: "se_FI", # Sami Northern Finland
0x203b: "sms_FI",# Sami Skolt
0x183b: "sma_NO",# Sami Southern Norway
0x1c3b: "sma_SE",# Sami Southern Sweden
0x044f: "sa_IN", # Sanskrit
0x0c1a: "sr_SP", # Serbian - Cyrillic
0x1c1a: "sr_BA", # Serbian - Bosnia Cyrillic
0x081a: "sr_SP", # Serbian - Latin
0x181a: "sr_BA", # Serbian - Bosnia Latin
0x045b: "si_LK", # Sinhala - Sri Lanka
0x046c: "ns_ZA", # Northern Sotho
0x0432: "tn_ZA", # Setswana - Southern Africa
0x041b: "sk_SK", # Slovak
0x0424: "sl_SI", # Slovenian
0x040a: "es_ES", # Spanish - Spain
0x080a: "es_MX", # Spanish - Mexico
0x0c0a: "es_ES", # Spanish - Spain (Modern)
0x100a: "es_GT", # Spanish - Guatemala
0x140a: "es_CR", # Spanish - Costa Rica
0x180a: "es_PA", # Spanish - Panama
0x1c0a: "es_DO", # Spanish - Dominican Republic
0x200a: "es_VE", # Spanish - Venezuela
0x240a: "es_CO", # Spanish - Colombia
0x280a: "es_PE", # Spanish - Peru
0x2c0a: "es_AR", # Spanish - Argentina
0x300a: "es_EC", # Spanish - Ecuador
0x340a: "es_CL", # Spanish - Chile
0x380a: "es_UR", # Spanish - Uruguay
0x3c0a: "es_PY", # Spanish - Paraguay
0x400a: "es_BO", # Spanish - Bolivia
0x440a: "es_SV", # Spanish - El Salvador
0x480a: "es_HN", # Spanish - Honduras
0x4c0a: "es_NI", # Spanish - Nicaragua
0x500a: "es_PR", # Spanish - Puerto Rico
0x540a: "es_US", # Spanish - United States
# 0x0430: "", # Sutu - Not supported
0x0441: "sw_KE", # Swahili
0x041d: "sv_SE", # Swedish - Sweden
0x081d: "sv_FI", # Swedish - Finland
0x045a: "syr_SY",# Syriac
0x0428: "tg_TJ", # Tajik - Cyrillic
0x085f: "tmz_DZ",# Tamazight - Latin
0x0449: "ta_IN", # Tamil
0x0444: "tt_RU", # Tatar
0x044a: "te_IN", # Telugu
0x041e: "th_TH", # Thai
0x0851: "bo_BT", # Tibetan - Bhutan
0x0451: "bo_CN", # Tibetan - PRC
0x041f: "tr_TR", # Turkish
0x0442: "tk_TM", # Turkmen - Cyrillic
0x0480: "ug_CN", # Uighur - Arabic
0x0422: "uk_UA", # Ukrainian
0x042e: "wen_DE",# Upper Sorbian - Germany
0x0420: "ur_PK", # Urdu
0x0820: "ur_IN", # Urdu - India
0x0443: "uz_UZ", # Uzbek - Latin
0x0843: "uz_UZ", # Uzbek - Cyrillic
0x042a: "vi_VN", # Vietnamese
0x0452: "cy_GB", # Welsh
0x0488: "wo_SN", # Wolof - Senegal
0x0434: "xh_ZA", # Xhosa - South Africa
0x0485: "sah_RU",# Yakut - Cyrillic
0x0478: "ii_CN", # Yi - PRC
0x046a: "yo_NG", # Yoruba - Nigeria
0x0435: "zu_ZA", # Zulu
}
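# --- Editor's illustrative sketch (not part of the original module):
# getdefaultlocale() translates the "0x...." LCID reported by Windows into
# a locale string via the windows_locale table above.
def _example_windows_locale_lookup():
    # 0x0409 is the Windows LCID for English - United States.
    assert windows_locale[0x0409] == 'en_US'
    # Sublanguage bits select regional variants: 0x0c0c is French - Canada.
    assert windows_locale[0x0c0c] == 'fr_CA'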
def _print_locale():
""" Test function.
"""
categories = {}
def _init_categories(categories=categories):
for k,v in globals().items():
if k[:3] == 'LC_':
categories[k] = v
_init_categories()
del categories['LC_ALL']
print 'Locale defaults as determined by getdefaultlocale():'
print '-'*72
lang, enc = getdefaultlocale()
print 'Language: ', lang or '(undefined)'
print 'Encoding: ', enc or '(undefined)'
print
print 'Locale settings on startup:'
print '-'*72
for name,category in categories.items():
print name, '...'
lang, enc = getlocale(category)
print ' Language: ', lang or '(undefined)'
print ' Encoding: ', enc or '(undefined)'
print
print
print 'Locale settings after calling resetlocale():'
print '-'*72
resetlocale()
for name,category in categories.items():
print name, '...'
lang, enc = getlocale(category)
print ' Language: ', lang or '(undefined)'
print ' Encoding: ', enc or '(undefined)'
print
try:
setlocale(LC_ALL, "")
    except Error:
print 'NOTE:'
print 'setlocale(LC_ALL, "") does not support the default locale'
print 'given in the OS environment variables.'
else:
print
print 'Locale settings after calling setlocale(LC_ALL, ""):'
print '-'*72
for name,category in categories.items():
print name, '...'
lang, enc = getlocale(category)
print ' Language: ', lang or '(undefined)'
print ' Encoding: ', enc or '(undefined)'
print
###
try:
LC_MESSAGES
except NameError:
pass
else:
__all__.append("LC_MESSAGES")
if __name__=='__main__':
print 'Locale aliasing:'
print
_print_locale()
print
print 'Number formatting:'
print
_test()
# Copyright 2001-2014 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Configuration functions for the logging package for Python. The core package
is based on PEP 282 and comments thereto in comp.lang.python, and influenced
by Apache's log4j system.
Copyright (C) 2001-2014 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging' and log away!
"""
import cStringIO
import errno
import io
import logging
import logging.handlers
import os
import re
import socket
import struct
import sys
import traceback
import types
try:
import thread
import threading
except ImportError:
thread = None
from SocketServer import ThreadingTCPServer, StreamRequestHandler
DEFAULT_LOGGING_CONFIG_PORT = 9030
RESET_ERROR = errno.ECONNRESET
#
# The following code implements a socket listener for on-the-fly
# reconfiguration of logging.
#
# _listener holds the server object doing the listening
_listener = None
def fileConfig(fname, defaults=None, disable_existing_loggers=True):
"""
Read the logging configuration from a ConfigParser-format file.
This can be called several times from an application, allowing an end user
the ability to select from various pre-canned configurations (if the
developer provides a mechanism to present the choices and load the chosen
configuration).
"""
import ConfigParser
cp = ConfigParser.ConfigParser(defaults)
if hasattr(fname, 'readline'):
cp.readfp(fname)
else:
cp.read(fname)
formatters = _create_formatters(cp)
# critical section
logging._acquireLock()
try:
logging._handlers.clear()
del logging._handlerList[:]
# Handlers add themselves to logging._handlers
handlers = _install_handlers(cp, formatters)
_install_loggers(cp, handlers, disable_existing_loggers)
finally:
logging._releaseLock()
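# --- Editor's illustrative sketch (not part of the original module): a
# minimal ConfigParser-format file accepted by fileConfig(). The section
# and option names are the ones consumed by _create_formatters(),
# _install_handlers() and _install_loggers() below; the file name passed
# to fileConfig() is up to the caller.
#
#     [loggers]
#     keys=root
#
#     [handlers]
#     keys=console
#
#     [formatters]
#     keys=simple
#
#     [logger_root]
#     level=DEBUG
#     handlers=console
#
#     [handler_console]
#     class=StreamHandler
#     level=DEBUG
#     formatter=simple
#     args=(sys.stdout,)
#
#     [formatter_simple]
#     format=%(asctime)s %(levelname)s %(message)s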
def _resolve(name):
"""Resolve a dotted name to a global object."""
name = name.split('.')
used = name.pop(0)
found = __import__(used)
for n in name:
used = used + '.' + n
try:
found = getattr(found, n)
except AttributeError:
__import__(used)
found = getattr(found, n)
return found
def _strip_spaces(alist):
return map(lambda x: x.strip(), alist)
def _encoded(s):
return s if isinstance(s, str) else s.encode('utf-8')
def _create_formatters(cp):
"""Create and return formatters"""
flist = cp.get("formatters", "keys")
if not len(flist):
return {}
flist = flist.split(",")
flist = _strip_spaces(flist)
formatters = {}
for form in flist:
sectname = "formatter_%s" % form
opts = cp.options(sectname)
if "format" in opts:
fs = cp.get(sectname, "format", 1)
else:
fs = None
if "datefmt" in opts:
dfs = cp.get(sectname, "datefmt", 1)
else:
dfs = None
c = logging.Formatter
if "class" in opts:
class_name = cp.get(sectname, "class")
if class_name:
c = _resolve(class_name)
f = c(fs, dfs)
formatters[form] = f
return formatters
def _install_handlers(cp, formatters):
"""Install and return handlers"""
hlist = cp.get("handlers", "keys")
if not len(hlist):
return {}
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
handlers = {}
fixups = [] #for inter-handler references
for hand in hlist:
sectname = "handler_%s" % hand
klass = cp.get(sectname, "class")
opts = cp.options(sectname)
if "formatter" in opts:
fmt = cp.get(sectname, "formatter")
else:
fmt = ""
try:
klass = eval(klass, vars(logging))
except (AttributeError, NameError):
klass = _resolve(klass)
args = cp.get(sectname, "args")
args = eval(args, vars(logging))
h = klass(*args)
if "level" in opts:
level = cp.get(sectname, "level")
h.setLevel(logging._levelNames[level])
if len(fmt):
h.setFormatter(formatters[fmt])
if issubclass(klass, logging.handlers.MemoryHandler):
if "target" in opts:
target = cp.get(sectname,"target")
else:
target = ""
if len(target): #the target handler may not be loaded yet, so keep for later...
fixups.append((h, target))
handlers[hand] = h
#now all handlers are loaded, fixup inter-handler references...
for h, t in fixups:
h.setTarget(handlers[t])
return handlers
def _install_loggers(cp, handlers, disable_existing_loggers):
"""Create and install loggers"""
# configure the root first
llist = cp.get("loggers", "keys")
llist = llist.split(",")
llist = list(map(lambda x: x.strip(), llist))
llist.remove("root")
sectname = "logger_root"
root = logging.root
log = root
opts = cp.options(sectname)
if "level" in opts:
level = cp.get(sectname, "level")
log.setLevel(logging._levelNames[level])
for h in root.handlers[:]:
root.removeHandler(h)
hlist = cp.get(sectname, "handlers")
if len(hlist):
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
for hand in hlist:
log.addHandler(handlers[hand])
#and now the others...
#we don't want to lose the existing loggers,
#since other threads may have pointers to them.
#existing is set to contain all existing loggers,
#and as we go through the new configuration we
#remove any which are configured. At the end,
#what's left in existing is the set of loggers
#which were in the previous configuration but
#which are not in the new configuration.
existing = list(root.manager.loggerDict.keys())
#The list needs to be sorted so that we can
#avoid disabling child loggers of explicitly
#named loggers. With a sorted list it is easier
#to find the child loggers.
existing.sort()
#We'll keep the list of existing loggers
#which are children of named loggers here...
child_loggers = []
#now set up the new ones...
for log in llist:
sectname = "logger_%s" % log
qn = cp.get(sectname, "qualname")
opts = cp.options(sectname)
if "propagate" in opts:
propagate = cp.getint(sectname, "propagate")
else:
propagate = 1
logger = logging.getLogger(qn)
if qn in existing:
i = existing.index(qn) + 1 # start with the entry after qn
prefixed = qn + "."
pflen = len(prefixed)
num_existing = len(existing)
while i < num_existing:
if existing[i][:pflen] == prefixed:
child_loggers.append(existing[i])
i += 1
existing.remove(qn)
if "level" in opts:
level = cp.get(sectname, "level")
logger.setLevel(logging._levelNames[level])
for h in logger.handlers[:]:
logger.removeHandler(h)
logger.propagate = propagate
logger.disabled = 0
hlist = cp.get(sectname, "handlers")
if len(hlist):
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
for hand in hlist:
logger.addHandler(handlers[hand])
#Disable any old loggers. There's no point deleting
#them as other threads may continue to hold references
#and by disabling them, you stop them doing any logging.
#However, don't disable children of named loggers, as that's
#probably not what was intended by the user.
for log in existing:
logger = root.manager.loggerDict[log]
if log in child_loggers:
logger.level = logging.NOTSET
logger.handlers = []
logger.propagate = 1
else:
logger.disabled = disable_existing_loggers
IDENTIFIER = re.compile('^[a-z_][a-z0-9_]*$', re.I)
def valid_ident(s):
m = IDENTIFIER.match(s)
if not m:
raise ValueError('Not a valid Python identifier: %r' % s)
return True
class ConvertingMixin(object):
"""For ConvertingXXX's, this mixin class provides common functions"""
def convert_with_key(self, key, value, replace=True):
result = self.configurator.convert(value)
#If the converted value is different, save for next time
if value is not result:
if replace:
self[key] = result
if type(result) in (ConvertingDict, ConvertingList,
ConvertingTuple):
result.parent = self
result.key = key
return result
def convert(self, value):
result = self.configurator.convert(value)
if value is not result:
if type(result) in (ConvertingDict, ConvertingList,
ConvertingTuple):
result.parent = self
return result
# The ConvertingXXX classes are wrappers around standard Python containers,
# and they serve to convert any suitable values in the container. The
# conversion converts base dicts, lists and tuples to their wrapped
# equivalents, whereas strings which match a conversion format are converted
# appropriately.
#
# Each wrapper should have a configurator attribute holding the actual
# configurator to use for conversion.
class ConvertingDict(dict, ConvertingMixin):
"""A converting dictionary wrapper."""
def __getitem__(self, key):
value = dict.__getitem__(self, key)
return self.convert_with_key(key, value)
def get(self, key, default=None):
value = dict.get(self, key, default)
return self.convert_with_key(key, value)
def pop(self, key, default=None):
value = dict.pop(self, key, default)
return self.convert_with_key(key, value, replace=False)
class ConvertingList(list, ConvertingMixin):
"""A converting list wrapper."""
def __getitem__(self, key):
value = list.__getitem__(self, key)
return self.convert_with_key(key, value)
def pop(self, idx=-1):
value = list.pop(self, idx)
return self.convert(value)
class ConvertingTuple(tuple, ConvertingMixin):
"""A converting tuple wrapper."""
def __getitem__(self, key):
value = tuple.__getitem__(self, key)
# Can't replace a tuple entry.
return self.convert_with_key(key, value, replace=False)
class BaseConfigurator(object):
"""
The configurator base class which defines some useful defaults.
"""
CONVERT_PATTERN = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')
WORD_PATTERN = re.compile(r'^\s*(\w+)\s*')
DOT_PATTERN = re.compile(r'^\.\s*(\w+)\s*')
INDEX_PATTERN = re.compile(r'^\[\s*(\w+)\s*\]\s*')
DIGIT_PATTERN = re.compile(r'^\d+$')
value_converters = {
'ext' : 'ext_convert',
'cfg' : 'cfg_convert',
}
# We might want to use a different one, e.g. importlib
importer = __import__
def __init__(self, config):
self.config = ConvertingDict(config)
self.config.configurator = self
# Issue 12718: winpdb replaces __import__ with a Python function, which
# ends up being treated as a bound method. To avoid problems, we
# set the importer on the instance, but leave it defined in the class
# so existing code doesn't break
if type(__import__) == types.FunctionType:
self.importer = __import__
def resolve(self, s):
"""
Resolve strings to objects using standard import and attribute
syntax.
"""
name = s.split('.')
used = name.pop(0)
try:
found = self.importer(used)
for frag in name:
used += '.' + frag
try:
found = getattr(found, frag)
except AttributeError:
self.importer(used)
found = getattr(found, frag)
return found
except ImportError:
e, tb = sys.exc_info()[1:]
v = ValueError('Cannot resolve %r: %s' % (s, e))
v.__cause__, v.__traceback__ = e, tb
raise v
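    # Editor's illustrative note: resolve('logging.handlers.SocketHandler')
    # imports logging.handlers and returns the SocketHandler class; the
    # ext:// protocol below is a thin wrapper around this method.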
def ext_convert(self, value):
"""Default converter for the ext:// protocol."""
return self.resolve(value)
def cfg_convert(self, value):
"""Default converter for the cfg:// protocol."""
rest = value
m = self.WORD_PATTERN.match(rest)
if m is None:
raise ValueError("Unable to convert %r" % value)
else:
rest = rest[m.end():]
d = self.config[m.groups()[0]]
#print d, rest
while rest:
m = self.DOT_PATTERN.match(rest)
if m:
d = d[m.groups()[0]]
else:
m = self.INDEX_PATTERN.match(rest)
if m:
idx = m.groups()[0]
if not self.DIGIT_PATTERN.match(idx):
d = d[idx]
else:
try:
n = int(idx) # try as number first (most likely)
d = d[n]
except TypeError:
d = d[idx]
if m:
rest = rest[m.end():]
else:
raise ValueError('Unable to convert '
'%r at %r' % (value, rest))
#rest should be empty
return d
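    # Editor's illustrative note: given self.config containing
    #     {'handlers': {'console': {'level': 'DEBUG'}}}
    # the string 'cfg://handlers.console.level' resolves to 'DEBUG'.
    # Index syntax is equivalent ('cfg://handlers[console][level]'), and
    # a purely numeric index is first tried as a list position.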
def convert(self, value):
"""
Convert values to an appropriate type. dicts, lists and tuples are
replaced by their converting alternatives. Strings are checked to
see if they have a conversion format and are converted if they do.
"""
if not isinstance(value, ConvertingDict) and isinstance(value, dict):
value = ConvertingDict(value)
value.configurator = self
elif not isinstance(value, ConvertingList) and isinstance(value, list):
value = ConvertingList(value)
value.configurator = self
elif not isinstance(value, ConvertingTuple) and\
isinstance(value, tuple):
value = ConvertingTuple(value)
value.configurator = self
elif isinstance(value, basestring): # str for py3k
m = self.CONVERT_PATTERN.match(value)
if m:
d = m.groupdict()
prefix = d['prefix']
converter = self.value_converters.get(prefix, None)
if converter:
suffix = d['suffix']
converter = getattr(self, converter)
value = converter(suffix)
return value
def configure_custom(self, config):
"""Configure an object with a user-supplied factory."""
c = config.pop('()')
if not hasattr(c, '__call__') and hasattr(types, 'ClassType') and type(c) != types.ClassType:
c = self.resolve(c)
props = config.pop('.', None)
# Check for valid identifiers
kwargs = dict([(k, config[k]) for k in config if valid_ident(k)])
result = c(**kwargs)
if props:
for name, value in props.items():
setattr(result, name, value)
return result
def as_tuple(self, value):
"""Utility function which converts lists to tuples."""
if isinstance(value, list):
value = tuple(value)
return value
class DictConfigurator(BaseConfigurator):
"""
Configure logging using a dictionary-like object to describe the
configuration.
"""
def configure(self):
"""Do the configuration."""
config = self.config
if 'version' not in config:
raise ValueError("dictionary doesn't specify a version")
if config['version'] != 1:
raise ValueError("Unsupported version: %s" % config['version'])
incremental = config.pop('incremental', False)
EMPTY_DICT = {}
logging._acquireLock()
try:
if incremental:
handlers = config.get('handlers', EMPTY_DICT)
for name in handlers:
if name not in logging._handlers:
raise ValueError('No handler found with '
'name %r' % name)
else:
try:
handler = logging._handlers[name]
handler_config = handlers[name]
level = handler_config.get('level', None)
if level:
handler.setLevel(logging._checkLevel(level))
except StandardError as e:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
loggers = config.get('loggers', EMPTY_DICT)
for name in loggers:
try:
self.configure_logger(name, loggers[name], True)
except StandardError as e:
raise ValueError('Unable to configure logger '
'%r: %s' % (name, e))
root = config.get('root', None)
if root:
try:
self.configure_root(root, True)
except StandardError as e:
raise ValueError('Unable to configure root '
'logger: %s' % e)
else:
disable_existing = config.pop('disable_existing_loggers', True)
logging._handlers.clear()
del logging._handlerList[:]
# Do formatters first - they don't refer to anything else
formatters = config.get('formatters', EMPTY_DICT)
for name in formatters:
try:
formatters[name] = self.configure_formatter(
formatters[name])
except StandardError as e:
raise ValueError('Unable to configure '
'formatter %r: %s' % (name, e))
# Next, do filters - they don't refer to anything else, either
filters = config.get('filters', EMPTY_DICT)
for name in filters:
try:
filters[name] = self.configure_filter(filters[name])
except StandardError as e:
raise ValueError('Unable to configure '
'filter %r: %s' % (name, e))
# Next, do handlers - they refer to formatters and filters
# As handlers can refer to other handlers, sort the keys
# to allow a deterministic order of configuration
handlers = config.get('handlers', EMPTY_DICT)
deferred = []
for name in sorted(handlers):
try:
handler = self.configure_handler(handlers[name])
handler.name = name
handlers[name] = handler
except StandardError as e:
if 'target not configured yet' in str(e):
deferred.append(name)
else:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
# Now do any that were deferred
for name in deferred:
try:
handler = self.configure_handler(handlers[name])
handler.name = name
handlers[name] = handler
except StandardError as e:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
# Next, do loggers - they refer to handlers and filters
#we don't want to lose the existing loggers,
#since other threads may have pointers to them.
#existing is set to contain all existing loggers,
#and as we go through the new configuration we
#remove any which are configured. At the end,
#what's left in existing is the set of loggers
#which were in the previous configuration but
#which are not in the new configuration.
root = logging.root
existing = root.manager.loggerDict.keys()
#The list needs to be sorted so that we can
#avoid disabling child loggers of explicitly
#named loggers. With a sorted list it is easier
#to find the child loggers.
existing.sort()
#We'll keep the list of existing loggers
#which are children of named loggers here...
child_loggers = []
#now set up the new ones...
loggers = config.get('loggers', EMPTY_DICT)
for name in loggers:
name = _encoded(name)
if name in existing:
i = existing.index(name)
prefixed = name + "."
pflen = len(prefixed)
num_existing = len(existing)
i = i + 1 # look at the entry after name
while (i < num_existing) and\
(existing[i][:pflen] == prefixed):
child_loggers.append(existing[i])
i = i + 1
existing.remove(name)
try:
self.configure_logger(name, loggers[name])
except StandardError as e:
raise ValueError('Unable to configure logger '
'%r: %s' % (name, e))
#Disable any old loggers. There's no point deleting
#them as other threads may continue to hold references
#and by disabling them, you stop them doing any logging.
#However, don't disable children of named loggers, as that's
#probably not what was intended by the user.
for log in existing:
logger = root.manager.loggerDict[log]
if log in child_loggers:
logger.level = logging.NOTSET
logger.handlers = []
logger.propagate = True
elif disable_existing:
logger.disabled = True
# And finally, do the root logger
root = config.get('root', None)
if root:
try:
self.configure_root(root)
except StandardError as e:
raise ValueError('Unable to configure root '
'logger: %s' % e)
finally:
logging._releaseLock()
def configure_formatter(self, config):
"""Configure a formatter from a dictionary."""
if '()' in config:
factory = config['()'] # for use in exception handler
try:
result = self.configure_custom(config)
except TypeError as te:
if "'format'" not in str(te):
raise
#Name of parameter changed from fmt to format.
#Retry with old name.
#This is so that code can be used with older Python versions
#(e.g. by Django)
config['fmt'] = config.pop('format')
config['()'] = factory
result = self.configure_custom(config)
else:
fmt = config.get('format', None)
dfmt = config.get('datefmt', None)
result = logging.Formatter(fmt, dfmt)
return result
def configure_filter(self, config):
"""Configure a filter from a dictionary."""
if '()' in config:
result = self.configure_custom(config)
else:
name = config.get('name', '')
result = logging.Filter(name)
return result
def add_filters(self, filterer, filters):
"""Add filters to a filterer from a list of names."""
for f in filters:
try:
filterer.addFilter(self.config['filters'][f])
except StandardError as e:
raise ValueError('Unable to add filter %r: %s' % (f, e))
def configure_handler(self, config):
"""Configure a handler from a dictionary."""
formatter = config.pop('formatter', None)
if formatter:
try:
formatter = self.config['formatters'][formatter]
except StandardError as e:
raise ValueError('Unable to set formatter '
'%r: %s' % (formatter, e))
level = config.pop('level', None)
filters = config.pop('filters', None)
if '()' in config:
c = config.pop('()')
if not hasattr(c, '__call__') and hasattr(types, 'ClassType') and type(c) != types.ClassType:
c = self.resolve(c)
factory = c
else:
cname = config.pop('class')
klass = self.resolve(cname)
#Special case for handler which refers to another handler
if issubclass(klass, logging.handlers.MemoryHandler) and\
'target' in config:
try:
th = self.config['handlers'][config['target']]
if not isinstance(th, logging.Handler):
config['class'] = cname # restore for deferred configuration
raise StandardError('target not configured yet')
config['target'] = th
except StandardError as e:
raise ValueError('Unable to set target handler '
'%r: %s' % (config['target'], e))
elif issubclass(klass, logging.handlers.SMTPHandler) and\
'mailhost' in config:
config['mailhost'] = self.as_tuple(config['mailhost'])
elif issubclass(klass, logging.handlers.SysLogHandler) and\
'address' in config:
config['address'] = self.as_tuple(config['address'])
factory = klass
kwargs = dict([(k, config[k]) for k in config if valid_ident(k)])
try:
result = factory(**kwargs)
except TypeError as te:
if "'stream'" not in str(te):
raise
#The argument name changed from strm to stream
#Retry with old name.
#This is so that code can be used with older Python versions
#(e.g. by Django)
kwargs['strm'] = kwargs.pop('stream')
result = factory(**kwargs)
if formatter:
result.setFormatter(formatter)
if level is not None:
result.setLevel(logging._checkLevel(level))
if filters:
self.add_filters(result, filters)
return result
def add_handlers(self, logger, handlers):
"""Add handlers to a logger from a list of names."""
for h in handlers:
try:
logger.addHandler(self.config['handlers'][h])
except StandardError as e:
raise ValueError('Unable to add handler %r: %s' % (h, e))
def common_logger_config(self, logger, config, incremental=False):
"""
Perform configuration which is common to root and non-root loggers.
"""
level = config.get('level', None)
if level is not None:
logger.setLevel(logging._checkLevel(level))
if not incremental:
#Remove any existing handlers
for h in logger.handlers[:]:
logger.removeHandler(h)
handlers = config.get('handlers', None)
if handlers:
self.add_handlers(logger, handlers)
filters = config.get('filters', None)
if filters:
self.add_filters(logger, filters)
def configure_logger(self, name, config, incremental=False):
"""Configure a non-root logger from a dictionary."""
logger = logging.getLogger(name)
self.common_logger_config(logger, config, incremental)
propagate = config.get('propagate', None)
if propagate is not None:
logger.propagate = propagate
def configure_root(self, config, incremental=False):
"""Configure a root logger from a dictionary."""
root = logging.getLogger()
self.common_logger_config(root, config, incremental)
dictConfigClass = DictConfigurator
def dictConfig(config):
"""Configure logging using a dictionary."""
dictConfigClass(config).configure()
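# --- Editor's illustrative sketch (not part of the original module): a
# minimal version-1 dictionary accepted by dictConfig(); the logger name
# 'example' is only a sample.
def _example_dict_config():
    dictConfig({
        'version': 1,
        'formatters': {
            'simple': {'format': '%(levelname)s %(message)s'},
        },
        'handlers': {
            'console': {'class': 'logging.StreamHandler',
                        'formatter': 'simple',
                        'level': 'DEBUG'},
        },
        'loggers': {
            'example': {'handlers': ['console'], 'level': 'DEBUG'},
        },
    })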
def listen(port=DEFAULT_LOGGING_CONFIG_PORT):
"""
Start up a socket server on the specified port, and listen for new
configurations.
These will be sent as a file suitable for processing by fileConfig().
Returns a Thread object on which you can call start() to start the server,
and which you can join() when appropriate. To stop the server, call
stopListening().
"""
if not thread:
raise NotImplementedError("listen() needs threading to work")
class ConfigStreamHandler(StreamRequestHandler):
"""
Handler for a logging configuration request.
It expects a completely new logging configuration and uses fileConfig
to install it.
"""
def handle(self):
"""
Handle a request.
Each request is expected to be a 4-byte length, packed using
struct.pack(">L", n), followed by the config file.
Uses fileConfig() to do the grunt work.
"""
import tempfile
try:
conn = self.connection
chunk = conn.recv(4)
if len(chunk) == 4:
slen = struct.unpack(">L", chunk)[0]
chunk = self.connection.recv(slen)
while len(chunk) < slen:
chunk = chunk + conn.recv(slen - len(chunk))
try:
import json
                        d = json.loads(chunk)
assert isinstance(d, dict)
dictConfig(d)
                    except:
                        # Not JSON - treat the payload as a ConfigParser-
                        # format file and apply the new configuration.
                        conf_file = cStringIO.StringIO(chunk)
                        try:
                            fileConfig(conf_file)
except (KeyboardInterrupt, SystemExit):
raise
except:
traceback.print_exc()
if self.server.ready:
self.server.ready.set()
except socket.error as e:
if e.errno != RESET_ERROR:
raise
class ConfigSocketReceiver(ThreadingTCPServer):
"""
A simple TCP socket-based logging config receiver.
"""
allow_reuse_address = 1
def __init__(self, host='localhost', port=DEFAULT_LOGGING_CONFIG_PORT,
handler=None, ready=None):
ThreadingTCPServer.__init__(self, (host, port), handler)
logging._acquireLock()
self.abort = 0
logging._releaseLock()
self.timeout = 1
self.ready = ready
def serve_until_stopped(self):
import select
abort = 0
while not abort:
rd, wr, ex = select.select([self.socket.fileno()],
[], [],
self.timeout)
if rd:
self.handle_request()
logging._acquireLock()
abort = self.abort
logging._releaseLock()
self.socket.close()
class Server(threading.Thread):
def __init__(self, rcvr, hdlr, port):
super(Server, self).__init__()
self.rcvr = rcvr
self.hdlr = hdlr
self.port = port
self.ready = threading.Event()
def run(self):
server = self.rcvr(port=self.port, handler=self.hdlr,
ready=self.ready)
if self.port == 0:
self.port = server.server_address[1]
self.ready.set()
global _listener
logging._acquireLock()
_listener = server
logging._releaseLock()
server.serve_until_stopped()
return Server(ConfigSocketReceiver, ConfigStreamHandler, port)
def stopListening():
"""
Stop the listening server which was created with a call to listen().
"""
global _listener
logging._acquireLock()
try:
if _listener:
_listener.abort = 1
_listener = None
finally:
logging._releaseLock()
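# --- Editor's illustrative sketch (not part of the original module): a
# client for the listener above. As ConfigStreamHandler.handle() states,
# each request is a 4-byte big-endian length followed by the new
# configuration - JSON (handed to dictConfig()) or ConfigParser text
# (handed to fileConfig()). The host and file name are sample values.
def _example_send_config(filename='example.conf'):
    t = listen()                  # returns a Thread wrapping the server
    t.start()
    t.ready.wait()                # block until the socket is listening
    data = open(filename).read()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('localhost', DEFAULT_LOGGING_CONFIG_PORT))
    s.send(struct.pack('>L', len(data)))
    s.send(data)
    s.close()
    stopListening()
    t.join()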
# Copyright 2001-2013 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Additional handlers for the logging package for Python. The core package is
based on PEP 282 and comments thereto in comp.lang.python.
Copyright (C) 2001-2013 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging.handlers' and log away!
"""
import errno, logging, socket, os, cPickle, struct, time, re
from stat import ST_DEV, ST_INO, ST_MTIME
try:
import codecs
except ImportError:
codecs = None
try:
unicode
_unicode = True
except NameError:
_unicode = False
#
# Some constants...
#
DEFAULT_TCP_LOGGING_PORT = 9020
DEFAULT_UDP_LOGGING_PORT = 9021
DEFAULT_HTTP_LOGGING_PORT = 9022
DEFAULT_SOAP_LOGGING_PORT = 9023
SYSLOG_UDP_PORT = 514
SYSLOG_TCP_PORT = 514
_MIDNIGHT = 24 * 60 * 60 # number of seconds in a day
class BaseRotatingHandler(logging.FileHandler):
"""
Base class for handlers that rotate log files at a certain point.
Not meant to be instantiated directly. Instead, use RotatingFileHandler
or TimedRotatingFileHandler.
"""
def __init__(self, filename, mode, encoding=None, delay=0):
"""
Use the specified filename for streamed logging
"""
if codecs is None:
encoding = None
logging.FileHandler.__init__(self, filename, mode, encoding, delay)
self.mode = mode
self.encoding = encoding
def emit(self, record):
"""
Emit a record.
Output the record to the file, catering for rollover as described
in doRollover().
"""
try:
if self.shouldRollover(record):
self.doRollover()
logging.FileHandler.emit(self, record)
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
class RotatingFileHandler(BaseRotatingHandler):
"""
Handler for logging to a set of files, which switches from one file
to the next when the current file reaches a certain size.
"""
def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=0):
"""
Open the specified file and use it as the stream for logging.
By default, the file grows indefinitely. You can specify particular
values of maxBytes and backupCount to allow the file to rollover at
a predetermined size.
Rollover occurs whenever the current log file is nearly maxBytes in
length. If backupCount is >= 1, the system will successively create
new files with the same pathname as the base file, but with extensions
".1", ".2" etc. appended to it. For example, with a backupCount of 5
and a base file name of "app.log", you would get "app.log",
"app.log.1", "app.log.2", ... through to "app.log.5". The file being
written to is always "app.log" - when it gets filled up, it is closed
and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
exist, then they are renamed to "app.log.2", "app.log.3" etc.
respectively.
If maxBytes is zero, rollover never occurs.
"""
# If rotation/rollover is wanted, it doesn't make sense to use another
# mode. If for example 'w' were specified, then if there were multiple
# runs of the calling application, the logs from previous runs would be
# lost if the 'w' is respected, because the log file would be truncated
# on each run.
if maxBytes > 0:
mode = 'a'
BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
self.maxBytes = maxBytes
self.backupCount = backupCount
def doRollover(self):
"""
Do a rollover, as described in __init__().
"""
if self.stream:
self.stream.close()
self.stream = None
if self.backupCount > 0:
for i in range(self.backupCount - 1, 0, -1):
sfn = "%s.%d" % (self.baseFilename, i)
dfn = "%s.%d" % (self.baseFilename, i + 1)
if os.path.exists(sfn):
#print "%s -> %s" % (sfn, dfn)
if os.path.exists(dfn):
os.remove(dfn)
os.rename(sfn, dfn)
dfn = self.baseFilename + ".1"
if os.path.exists(dfn):
os.remove(dfn)
# Issue 18940: A file may not have been created if delay is True.
if os.path.exists(self.baseFilename):
os.rename(self.baseFilename, dfn)
if not self.delay:
self.stream = self._open()
def shouldRollover(self, record):
"""
Determine if rollover should occur.
Basically, see if the supplied record would cause the file to exceed
the size limit we have.
"""
if self.stream is None: # delay was set...
self.stream = self._open()
if self.maxBytes > 0: # are we rolling over?
msg = "%s\n" % self.format(record)
self.stream.seek(0, 2) #due to non-posix-compliant Windows feature
if self.stream.tell() + len(msg) >= self.maxBytes:
return 1
return 0
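# --- Editor's illustrative sketch (not part of the original module):
# size-based rotation keeping three backups; the file name and limits
# are sample values.
def _example_rotating_usage():
    handler = RotatingFileHandler('app.log', maxBytes=1024 * 1024,
                                  backupCount=3)
    logger = logging.getLogger('example')
    logger.addHandler(handler)
    # 'app.log' rolls to 'app.log.1' (then '.2', '.3') as it nears 1 MiB;
    # the oldest backup beyond backupCount is discarded.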
class TimedRotatingFileHandler(BaseRotatingHandler):
"""
Handler for logging to a file, rotating the log file at certain timed
intervals.
If backupCount is > 0, when rollover is done, no more than backupCount
files are kept - the oldest ones are deleted.
"""
def __init__(self, filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False):
BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
self.when = when.upper()
self.backupCount = backupCount
self.utc = utc
# Calculate the real rollover interval, which is just the number of
# seconds between rollovers. Also set the filename suffix used when
# a rollover occurs. Current 'when' events supported:
# S - Seconds
# M - Minutes
# H - Hours
# D - Days
# midnight - roll over at midnight
# W{0-6} - roll over on a certain day; 0 - Monday
#
# Case of the 'when' specifier is not important; lower or upper case
# will work.
if self.when == 'S':
self.interval = 1 # one second
self.suffix = "%Y-%m-%d_%H-%M-%S"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}$"
elif self.when == 'M':
self.interval = 60 # one minute
self.suffix = "%Y-%m-%d_%H-%M"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$"
elif self.when == 'H':
self.interval = 60 * 60 # one hour
self.suffix = "%Y-%m-%d_%H"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}$"
elif self.when == 'D' or self.when == 'MIDNIGHT':
self.interval = 60 * 60 * 24 # one day
self.suffix = "%Y-%m-%d"
self.extMatch = r"^\d{4}-\d{2}-\d{2}$"
elif self.when.startswith('W'):
self.interval = 60 * 60 * 24 * 7 # one week
if len(self.when) != 2:
raise ValueError("You must specify a day for weekly rollover from 0 to 6 (0 is Monday): %s" % self.when)
if self.when[1] < '0' or self.when[1] > '6':
raise ValueError("Invalid day specified for weekly rollover: %s" % self.when)
self.dayOfWeek = int(self.when[1])
self.suffix = "%Y-%m-%d"
self.extMatch = r"^\d{4}-\d{2}-\d{2}$"
else:
raise ValueError("Invalid rollover interval specified: %s" % self.when)
self.extMatch = re.compile(self.extMatch)
self.interval = self.interval * interval # multiply by units requested
if os.path.exists(filename):
t = os.stat(filename)[ST_MTIME]
else:
t = int(time.time())
self.rolloverAt = self.computeRollover(t)
def computeRollover(self, currentTime):
"""
Work out the rollover time based on the specified time.
"""
result = currentTime + self.interval
# If we are rolling over at midnight or weekly, then the interval is already known.
# What we need to figure out is WHEN the next interval is. In other words,
# if you are rolling over at midnight, then your base interval is 1 day,
# but you want to start that one day clock at midnight, not now. So, we
# have to fudge the rolloverAt value in order to trigger the first rollover
# at the right time. After that, the regular interval will take care of
# the rest. Note that this code doesn't care about leap seconds. :)
if self.when == 'MIDNIGHT' or self.when.startswith('W'):
# This could be done with less code, but I wanted it to be clear
if self.utc:
t = time.gmtime(currentTime)
else:
t = time.localtime(currentTime)
currentHour = t[3]
currentMinute = t[4]
currentSecond = t[5]
# r is the number of seconds left between now and midnight
r = _MIDNIGHT - ((currentHour * 60 + currentMinute) * 60 +
currentSecond)
result = currentTime + r
# If we are rolling over on a certain day, add in the number of days until
# the next rollover, but offset by 1 since we just calculated the time
# until the next day starts. There are three cases:
# Case 1) The day to rollover is today; in this case, do nothing
# Case 2) The day to rollover is further in the interval (i.e., today is
# day 2 (Wednesday) and rollover is on day 6 (Sunday). Days to
# next rollover is simply 6 - 2 - 1, or 3.
# Case 3) The day to rollover is behind us in the interval (i.e., today
# is day 5 (Saturday) and rollover is on day 3 (Thursday).
# Days to rollover is 6 - 5 + 3, or 4. In this case, it's the
# number of days left in the current week (1) plus the number
# of days in the next week until the rollover day (3).
# The calculations described in 2) and 3) above need to have a day added.
# This is because the above time calculation takes us to midnight on this
# day, i.e. the start of the next day.
if self.when.startswith('W'):
day = t[6] # 0 is Monday
if day != self.dayOfWeek:
if day < self.dayOfWeek:
daysToWait = self.dayOfWeek - day
else:
daysToWait = 6 - day + self.dayOfWeek + 1
newRolloverAt = result + (daysToWait * (60 * 60 * 24))
if not self.utc:
dstNow = t[-1]
dstAtRollover = time.localtime(newRolloverAt)[-1]
if dstNow != dstAtRollover:
if not dstNow: # DST kicks in before next rollover, so we need to deduct an hour
addend = -3600
else: # DST bows out before next rollover, so we need to add an hour
addend = 3600
newRolloverAt += addend
result = newRolloverAt
return result
def shouldRollover(self, record):
"""
Determine if rollover should occur.
record is not used, as we are just comparing times, but it is needed so
the method signatures are the same
"""
t = int(time.time())
if t >= self.rolloverAt:
return 1
#print "No need to rollover: %d, %d" % (t, self.rolloverAt)
return 0
def getFilesToDelete(self):
"""
Determine the files to delete when rolling over.
More specific than the earlier method, which just used glob.glob().
"""
dirName, baseName = os.path.split(self.baseFilename)
fileNames = os.listdir(dirName)
result = []
prefix = baseName + "."
plen = len(prefix)
for fileName in fileNames:
if fileName[:plen] == prefix:
suffix = fileName[plen:]
if self.extMatch.match(suffix):
result.append(os.path.join(dirName, fileName))
result.sort()
if len(result) < self.backupCount:
result = []
else:
result = result[:len(result) - self.backupCount]
return result
def doRollover(self):
"""
do a rollover; in this case, a date/time stamp is appended to the filename
when the rollover happens. However, you want the file to be named for the
start of the interval, not the current time. If there is a backup count,
then we have to get a list of matching filenames, sort them and remove
the one with the oldest suffix.
"""
if self.stream:
self.stream.close()
self.stream = None
# get the time that this sequence started at and make it a TimeTuple
currentTime = int(time.time())
dstNow = time.localtime(currentTime)[-1]
t = self.rolloverAt - self.interval
if self.utc:
timeTuple = time.gmtime(t)
else:
timeTuple = time.localtime(t)
dstThen = timeTuple[-1]
if dstNow != dstThen:
if dstNow:
addend = 3600
else:
addend = -3600
timeTuple = time.localtime(t + addend)
dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
if os.path.exists(dfn):
os.remove(dfn)
# Issue 18940: A file may not have been created if delay is True.
if os.path.exists(self.baseFilename):
os.rename(self.baseFilename, dfn)
if self.backupCount > 0:
for s in self.getFilesToDelete():
os.remove(s)
if not self.delay:
self.stream = self._open()
newRolloverAt = self.computeRollover(currentTime)
while newRolloverAt <= currentTime:
newRolloverAt = newRolloverAt + self.interval
#If DST changes and midnight or weekly rollover, adjust for this.
if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
dstAtRollover = time.localtime(newRolloverAt)[-1]
if dstNow != dstAtRollover:
if not dstNow: # DST kicks in before next rollover, so we need to deduct an hour
addend = -3600
else: # DST bows out before next rollover, so we need to add an hour
addend = 3600
newRolloverAt += addend
self.rolloverAt = newRolloverAt
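# A minimal usage sketch for the rollover machinery above (hypothetical
# helper, not part of this module): roll at midnight and keep the seven
# most recent dated backups; doRollover() renames the live file using
# self.suffix and getFilesToDelete() prunes anything older.
def _example_timed_rotation():
    import logging
    handler = TimedRotatingFileHandler("app.log", when="midnight",
                                       backupCount=7)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.warning("rolls over at the next midnight boundary")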
class WatchedFileHandler(logging.FileHandler):
"""
A handler for logging to a file, which watches the file
to see if it has changed while in use. This can happen because of
usage of programs such as newsyslog and logrotate which perform
log file rotation. This handler, intended for use under Unix,
watches the file to see if it has changed since the last emit.
(A file has changed if its device or inode have changed.)
If it has changed, the old file stream is closed, and the file
opened to get a new stream.
This handler is not appropriate for use under Windows, because
under Windows open files cannot be moved or renamed - logging
opens the files with exclusive locks - and so there is no need
for such a handler. Furthermore, ST_INO is not supported under
Windows; stat always returns zero for this value.
This handler is based on a suggestion and patch by Chad J.
Schroeder.
"""
def __init__(self, filename, mode='a', encoding=None, delay=0):
logging.FileHandler.__init__(self, filename, mode, encoding, delay)
self.dev, self.ino = -1, -1
self._statstream()
def _statstream(self):
if self.stream:
sres = os.fstat(self.stream.fileno())
self.dev, self.ino = sres[ST_DEV], sres[ST_INO]
def emit(self, record):
"""
Emit a record.
First check if the underlying file has changed, and if it
has, close the old stream and reopen the file to get the
current stream.
"""
# Reduce the chance of race conditions by stat'ing by path only
# once and then fstat'ing our new fd if we opened a new log stream.
# See issue #14632: Thanks to John Mulligan for the problem report
# and patch.
try:
# stat the file by path, checking for existence
sres = os.stat(self.baseFilename)
except OSError as err:
if err.errno == errno.ENOENT:
sres = None
else:
raise
# compare file system stat with that of our stream file handle
if not sres or sres[ST_DEV] != self.dev or sres[ST_INO] != self.ino:
if self.stream is not None:
# we have an open file handle, clean it up
self.stream.flush()
self.stream.close()
                self.stream = None  # See Issue #21742: _open() might fail.
# open a new file handle and get new stat info from that fd
self.stream = self._open()
self._statstream()
logging.FileHandler.emit(self, record)
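# A minimal sketch pairing the handler above with external rotation
# (hypothetical helper, not part of this module): after logrotate or
# newsyslog moves app.log aside, the next emit() sees a changed
# device/inode pair and transparently reopens the path.
def _example_watched_file():
    import logging
    logger = logging.getLogger("app")
    logger.addHandler(WatchedFileHandler("app.log"))
    logger.error("lands in whichever file currently owns the path")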
class SocketHandler(logging.Handler):
"""
A handler class which writes logging records, in pickle format, to
a streaming socket. The socket is kept open across logging calls.
If the peer resets it, an attempt is made to reconnect on the next call.
The pickle which is sent is that of the LogRecord's attribute dictionary
(__dict__), so that the receiver does not need to have the logging module
installed in order to process the logging event.
To unpickle the record at the receiving end into a LogRecord, use the
makeLogRecord function.
"""
def __init__(self, host, port):
"""
Initializes the handler with a specific host address and port.
        The attribute 'closeOnError' is set to 0 by default - set it to 1
        if you want a socket error to cause the socket to be silently
        closed and then reopened on the next logging call.
"""
logging.Handler.__init__(self)
self.host = host
self.port = port
self.sock = None
self.closeOnError = 0
self.retryTime = None
#
# Exponential backoff parameters.
#
self.retryStart = 1.0
self.retryMax = 30.0
self.retryFactor = 2.0
def makeSocket(self, timeout=1):
"""
A factory method which allows subclasses to define the precise
type of socket they want.
"""
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
if hasattr(s, 'settimeout'):
s.settimeout(timeout)
s.connect((self.host, self.port))
return s
def createSocket(self):
"""
Try to create a socket, using an exponential backoff with
a max retry time. Thanks to Robert Olson for the original patch
(SF #815911) which has been slightly refactored.
"""
now = time.time()
# Either retryTime is None, in which case this
# is the first time back after a disconnect, or
# we've waited long enough.
if self.retryTime is None:
attempt = 1
else:
attempt = (now >= self.retryTime)
if attempt:
try:
self.sock = self.makeSocket()
self.retryTime = None # next time, no delay before trying
except socket.error:
#Creation failed, so set the retry time and return.
if self.retryTime is None:
self.retryPeriod = self.retryStart
else:
self.retryPeriod = self.retryPeriod * self.retryFactor
if self.retryPeriod > self.retryMax:
self.retryPeriod = self.retryMax
self.retryTime = now + self.retryPeriod
def send(self, s):
"""
Send a pickled string to the socket.
This function allows for partial sends which can happen when the
network is busy.
"""
if self.sock is None:
self.createSocket()
#self.sock can be None either because we haven't reached the retry
#time yet, or because we have reached the retry time and retried,
#but are still unable to connect.
if self.sock:
try:
if hasattr(self.sock, "sendall"):
self.sock.sendall(s)
else:
sentsofar = 0
left = len(s)
while left > 0:
sent = self.sock.send(s[sentsofar:])
sentsofar = sentsofar + sent
left = left - sent
except socket.error:
self.sock.close()
self.sock = None # so we can call createSocket next time
def makePickle(self, record):
"""
Pickles the record in binary format with a length prefix, and
returns it ready for transmission across the socket.
"""
ei = record.exc_info
if ei:
# just to get traceback text into record.exc_text ...
dummy = self.format(record)
record.exc_info = None # to avoid Unpickleable error
# See issue #14436: If msg or args are objects, they may not be
# available on the receiving end. So we convert the msg % args
# to a string, save it as msg and zap the args.
d = dict(record.__dict__)
d['msg'] = record.getMessage()
d['args'] = None
s = cPickle.dumps(d, 1)
if ei:
record.exc_info = ei # for next handler
slen = struct.pack(">L", len(s))
return slen + s
def handleError(self, record):
"""
Handle an error during logging.
An error has occurred during logging. Most likely cause -
connection lost. Close the socket so that we can retry on the
next event.
"""
if self.closeOnError and self.sock:
self.sock.close()
self.sock = None #try to reconnect next time
else:
logging.Handler.handleError(self, record)
def emit(self, record):
"""
Emit a record.
Pickles the record and writes it to the socket in binary format.
If there is an error with the socket, silently drop the packet.
If there was a problem with the socket, re-establishes the
socket.
"""
try:
s = self.makePickle(record)
self.send(s)
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
def close(self):
"""
Closes the socket.
"""
self.acquire()
try:
sock = self.sock
if sock:
self.sock = None
sock.close()
finally:
self.release()
logging.Handler.close(self)
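# A sketch of the receiving side of the wire format produced by
# makePickle() above: a 4-byte big-endian length prefix followed by the
# pickled attribute dict. Hypothetical helper, not part of this module;
# it assumes the 4 header bytes arrive in a single recv().
def _example_receive_record(conn):
    import struct, cPickle, logging
    slen = struct.unpack(">L", conn.recv(4))[0]
    data = conn.recv(slen)
    while len(data) < slen:
        data = data + conn.recv(slen - len(data))
    # makeLogRecord rebuilds a LogRecord from the dict, so the receiver
    # can route it through its own local handlers.
    return logging.makeLogRecord(cPickle.loads(data))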
class DatagramHandler(SocketHandler):
"""
A handler class which writes logging records, in pickle format, to
a datagram socket. The pickle which is sent is that of the LogRecord's
attribute dictionary (__dict__), so that the receiver does not need to
have the logging module installed in order to process the logging event.
To unpickle the record at the receiving end into a LogRecord, use the
makeLogRecord function.
"""
def __init__(self, host, port):
"""
Initializes the handler with a specific host address and port.
"""
SocketHandler.__init__(self, host, port)
self.closeOnError = 0
def makeSocket(self):
"""
The factory method of SocketHandler is here overridden to create
a UDP socket (SOCK_DGRAM).
"""
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
return s
def send(self, s):
"""
Send a pickled string to a socket.
        Unlike the SocketHandler version, this does not need to cope with
        partial sends: the pickled record goes out as a single datagram.
        Note that UDP does not guarantee delivery and can deliver packets
        out of sequence.
"""
if self.sock is None:
self.createSocket()
self.sock.sendto(s, (self.host, self.port))
class SysLogHandler(logging.Handler):
"""
A handler class which sends formatted logging records to a syslog
server. Based on Sam Rushing's syslog module:
http://www.nightmare.com/squirl/python-ext/misc/syslog.py
Contributed by Nicolas Untz (after which minor refactoring changes
have been made).
"""
# from <linux/sys/syslog.h>:
# ======================================================================
# priorities/facilities are encoded into a single 32-bit quantity, where
# the bottom 3 bits are the priority (0-7) and the top 28 bits are the
# facility (0-big number). Both the priorities and the facilities map
# roughly one-to-one to strings in the syslogd(8) source code. This
# mapping is included in this file.
#
# priorities (these are ordered)
LOG_EMERG = 0 # system is unusable
LOG_ALERT = 1 # action must be taken immediately
LOG_CRIT = 2 # critical conditions
LOG_ERR = 3 # error conditions
LOG_WARNING = 4 # warning conditions
LOG_NOTICE = 5 # normal but significant condition
LOG_INFO = 6 # informational
LOG_DEBUG = 7 # debug-level messages
# facility codes
LOG_KERN = 0 # kernel messages
LOG_USER = 1 # random user-level messages
LOG_MAIL = 2 # mail system
LOG_DAEMON = 3 # system daemons
LOG_AUTH = 4 # security/authorization messages
LOG_SYSLOG = 5 # messages generated internally by syslogd
LOG_LPR = 6 # line printer subsystem
LOG_NEWS = 7 # network news subsystem
LOG_UUCP = 8 # UUCP subsystem
LOG_CRON = 9 # clock daemon
LOG_AUTHPRIV = 10 # security/authorization messages (private)
LOG_FTP = 11 # FTP daemon
# other codes through 15 reserved for system use
LOG_LOCAL0 = 16 # reserved for local use
LOG_LOCAL1 = 17 # reserved for local use
LOG_LOCAL2 = 18 # reserved for local use
LOG_LOCAL3 = 19 # reserved for local use
LOG_LOCAL4 = 20 # reserved for local use
LOG_LOCAL5 = 21 # reserved for local use
LOG_LOCAL6 = 22 # reserved for local use
LOG_LOCAL7 = 23 # reserved for local use
priority_names = {
"alert": LOG_ALERT,
"crit": LOG_CRIT,
"critical": LOG_CRIT,
"debug": LOG_DEBUG,
"emerg": LOG_EMERG,
"err": LOG_ERR,
"error": LOG_ERR, # DEPRECATED
"info": LOG_INFO,
"notice": LOG_NOTICE,
"panic": LOG_EMERG, # DEPRECATED
"warn": LOG_WARNING, # DEPRECATED
"warning": LOG_WARNING,
}
facility_names = {
"auth": LOG_AUTH,
"authpriv": LOG_AUTHPRIV,
"cron": LOG_CRON,
"daemon": LOG_DAEMON,
"ftp": LOG_FTP,
"kern": LOG_KERN,
"lpr": LOG_LPR,
"mail": LOG_MAIL,
"news": LOG_NEWS,
"security": LOG_AUTH, # DEPRECATED
"syslog": LOG_SYSLOG,
"user": LOG_USER,
"uucp": LOG_UUCP,
"local0": LOG_LOCAL0,
"local1": LOG_LOCAL1,
"local2": LOG_LOCAL2,
"local3": LOG_LOCAL3,
"local4": LOG_LOCAL4,
"local5": LOG_LOCAL5,
"local6": LOG_LOCAL6,
"local7": LOG_LOCAL7,
}
#The map below appears to be trivially lowercasing the key. However,
#there's more to it than meets the eye - in some locales, lowercasing
#gives unexpected results. See SF #1524081: in the Turkish locale,
#"INFO".lower() != "info"
priority_map = {
"DEBUG" : "debug",
"INFO" : "info",
"WARNING" : "warning",
"ERROR" : "error",
"CRITICAL" : "critical"
}
def __init__(self, address=('localhost', SYSLOG_UDP_PORT),
facility=LOG_USER, socktype=None):
"""
Initialize a handler.
If address is specified as a string, a UNIX socket is used. To log to a
local syslogd, "SysLogHandler(address="/dev/log")" can be used.
If facility is not specified, LOG_USER is used. If socktype is
specified as socket.SOCK_DGRAM or socket.SOCK_STREAM, that specific
socket type will be used. For Unix sockets, you can also specify a
socktype of None, in which case socket.SOCK_DGRAM will be used, falling
back to socket.SOCK_STREAM.
"""
logging.Handler.__init__(self)
self.address = address
self.facility = facility
self.socktype = socktype
if isinstance(address, basestring):
self.unixsocket = 1
self._connect_unixsocket(address)
else:
self.unixsocket = False
if socktype is None:
socktype = socket.SOCK_DGRAM
host, port = address
ress = socket.getaddrinfo(host, port, 0, socktype)
if not ress:
raise socket.error("getaddrinfo returns an empty list")
for res in ress:
af, socktype, proto, _, sa = res
err = sock = None
try:
sock = socket.socket(af, socktype, proto)
if socktype == socket.SOCK_STREAM:
sock.connect(sa)
break
except socket.error as exc:
err = exc
if sock is not None:
sock.close()
if err is not None:
raise err
self.socket = sock
self.socktype = socktype
def _connect_unixsocket(self, address):
use_socktype = self.socktype
if use_socktype is None:
use_socktype = socket.SOCK_DGRAM
self.socket = socket.socket(socket.AF_UNIX, use_socktype)
try:
self.socket.connect(address)
# it worked, so set self.socktype to the used type
self.socktype = use_socktype
except socket.error:
self.socket.close()
if self.socktype is not None:
# user didn't specify falling back, so fail
raise
use_socktype = socket.SOCK_STREAM
self.socket = socket.socket(socket.AF_UNIX, use_socktype)
try:
self.socket.connect(address)
# it worked, so set self.socktype to the used type
self.socktype = use_socktype
except socket.error:
self.socket.close()
raise
# curious: when talking to the unix-domain '/dev/log' socket, a
# zero-terminator seems to be required. this string is placed
# into a class variable so that it can be overridden if
# necessary.
log_format_string = '<%d>%s\000'
def encodePriority(self, facility, priority):
"""
Encode the facility and priority. You can pass in strings or
integers - if strings are passed, the facility_names and
priority_names mapping dictionaries are used to convert them to
integers.
"""
if isinstance(facility, basestring):
facility = self.facility_names[facility]
if isinstance(priority, basestring):
priority = self.priority_names[priority]
return (facility << 3) | priority
def close(self):
"""
Closes the socket.
"""
self.acquire()
try:
if self.unixsocket:
self.socket.close()
finally:
self.release()
logging.Handler.close(self)
def mapPriority(self, levelName):
"""
Map a logging level name to a key in the priority_names map.
This is useful in two scenarios: when custom levels are being
used, and in the case where you can't do a straightforward
mapping by lowercasing the logging level name because of locale-
specific issues (see SF #1524081).
"""
return self.priority_map.get(levelName, "warning")
def emit(self, record):
"""
Emit a record.
The record is formatted, and then sent to the syslog server. If
exception information is present, it is NOT sent to the server.
"""
try:
msg = self.format(record) + '\000'
"""
We need to convert record level to lowercase, maybe this will
change in the future.
"""
prio = '<%d>' % self.encodePriority(self.facility,
self.mapPriority(record.levelname))
# Message is a string. Convert to bytes as required by RFC 5424
if type(msg) is unicode:
msg = msg.encode('utf-8')
msg = prio + msg
if self.unixsocket:
try:
self.socket.send(msg)
except socket.error:
self.socket.close() # See issue 17981
self._connect_unixsocket(self.address)
self.socket.send(msg)
elif self.socktype == socket.SOCK_DGRAM:
self.socket.sendto(msg, self.address)
else:
self.socket.sendall(msg)
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
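# The PRI value computed by encodePriority() above is just
# (facility << 3) | priority; a hypothetical check, not part of this
# module: LOG_USER (1) with LOG_WARNING (4) gives (1 << 3) | 4 == 12,
# so such a message starts with "<12>".
def _example_priority():
    pri = (SysLogHandler.LOG_USER << 3) | SysLogHandler.LOG_WARNING
    assert pri == 12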
class SMTPHandler(logging.Handler):
"""
A handler class which sends an SMTP email for each logging event.
"""
def __init__(self, mailhost, fromaddr, toaddrs, subject,
credentials=None, secure=None):
"""
Initialize the handler.
Initialize the instance with the from and to addresses and subject
line of the email. To specify a non-standard SMTP port, use the
(host, port) tuple format for the mailhost argument. To specify
authentication credentials, supply a (username, password) tuple
for the credentials argument. To specify the use of a secure
protocol (TLS), pass in a tuple for the secure argument. This will
only be used when authentication credentials are supplied. The tuple
will be either an empty tuple, or a single-value tuple with the name
of a keyfile, or a 2-value tuple with the names of the keyfile and
certificate file. (This tuple is passed to the `starttls` method).
"""
logging.Handler.__init__(self)
if isinstance(mailhost, (list, tuple)):
self.mailhost, self.mailport = mailhost
else:
self.mailhost, self.mailport = mailhost, None
if isinstance(credentials, (list, tuple)):
self.username, self.password = credentials
else:
self.username = None
self.fromaddr = fromaddr
if isinstance(toaddrs, basestring):
toaddrs = [toaddrs]
self.toaddrs = toaddrs
self.subject = subject
self.secure = secure
self._timeout = 5.0
def getSubject(self, record):
"""
Determine the subject for the email.
If you want to specify a subject line which is record-dependent,
override this method.
"""
return self.subject
def emit(self, record):
"""
Emit a record.
Format the record and send it to the specified addressees.
"""
try:
import smtplib
from email.utils import formatdate
port = self.mailport
if not port:
port = smtplib.SMTP_PORT
smtp = smtplib.SMTP(self.mailhost, port, timeout=self._timeout)
msg = self.format(record)
msg = "From: %s\r\nTo: %s\r\nSubject: %s\r\nDate: %s\r\n\r\n%s" % (
self.fromaddr,
",".join(self.toaddrs),
self.getSubject(record),
formatdate(), msg)
if self.username:
if self.secure is not None:
smtp.ehlo()
smtp.starttls(*self.secure)
smtp.ehlo()
smtp.login(self.username, self.password)
smtp.sendmail(self.fromaddr, self.toaddrs, msg)
smtp.quit()
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
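# A hypothetical configuration sketch for the handler above (host,
# addresses and credentials are placeholders): one mail per
# ERROR-or-worse record through an authenticated, STARTTLS-secured relay.
def _example_smtp_handler():
    import logging
    handler = SMTPHandler(mailhost=("smtp.example.com", 587),
                          fromaddr="[email protected]",
                          toaddrs=["[email protected]"],
                          subject="Application error",
                          credentials=("user", "secret"),
                          secure=())  # empty tuple: plain starttls()
    handler.setLevel(logging.ERROR)
    logging.getLogger("app").addHandler(handler)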
class NTEventLogHandler(logging.Handler):
"""
A handler class which sends events to the NT Event Log. Adds a
registry entry for the specified application name. If no dllname is
provided, win32service.pyd (which contains some basic message
placeholders) is used. Note that use of these placeholders will make
your event logs big, as the entire message source is held in the log.
If you want slimmer logs, you have to pass in the name of your own DLL
which contains the message definitions you want to use in the event log.
"""
def __init__(self, appname, dllname=None, logtype="Application"):
logging.Handler.__init__(self)
try:
import win32evtlogutil, win32evtlog
self.appname = appname
self._welu = win32evtlogutil
if not dllname:
dllname = os.path.split(self._welu.__file__)
dllname = os.path.split(dllname[0])
dllname = os.path.join(dllname[0], r'win32service.pyd')
self.dllname = dllname
self.logtype = logtype
self._welu.AddSourceToRegistry(appname, dllname, logtype)
self.deftype = win32evtlog.EVENTLOG_ERROR_TYPE
self.typemap = {
logging.DEBUG : win32evtlog.EVENTLOG_INFORMATION_TYPE,
logging.INFO : win32evtlog.EVENTLOG_INFORMATION_TYPE,
logging.WARNING : win32evtlog.EVENTLOG_WARNING_TYPE,
logging.ERROR : win32evtlog.EVENTLOG_ERROR_TYPE,
logging.CRITICAL: win32evtlog.EVENTLOG_ERROR_TYPE,
}
except ImportError:
print("The Python Win32 extensions for NT (service, event "\
"logging) appear not to be available.")
self._welu = None
def getMessageID(self, record):
"""
Return the message ID for the event record. If you are using your
own messages, you could do this by having the msg passed to the
logger being an ID rather than a formatting string. Then, in here,
you could use a dictionary lookup to get the message ID. This
version returns 1, which is the base message ID in win32service.pyd.
"""
return 1
def getEventCategory(self, record):
"""
Return the event category for the record.
Override this if you want to specify your own categories. This version
returns 0.
"""
return 0
def getEventType(self, record):
"""
Return the event type for the record.
Override this if you want to specify your own types. This version does
a mapping using the handler's typemap attribute, which is set up in
__init__() to a dictionary which contains mappings for DEBUG, INFO,
WARNING, ERROR and CRITICAL. If you are using your own levels you will
either need to override this method or place a suitable dictionary in
the handler's typemap attribute.
"""
return self.typemap.get(record.levelno, self.deftype)
def emit(self, record):
"""
Emit a record.
Determine the message ID, event category and event type. Then
log the message in the NT event log.
"""
if self._welu:
try:
id = self.getMessageID(record)
cat = self.getEventCategory(record)
type = self.getEventType(record)
msg = self.format(record)
self._welu.ReportEvent(self.appname, id, cat, type, [msg])
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
def close(self):
"""
Clean up this handler.
You can remove the application name from the registry as a
source of event log entries. However, if you do this, you will
not be able to see the events as you intended in the Event Log
Viewer - it needs to be able to access the registry to get the
DLL name.
"""
#self._welu.RemoveSourceFromRegistry(self.appname, self.logtype)
logging.Handler.close(self)
class HTTPHandler(logging.Handler):
"""
A class which sends records to a Web server, using either GET or
POST semantics.
"""
def __init__(self, host, url, method="GET"):
"""
Initialize the instance with the host, the request URL, and the method
("GET" or "POST")
"""
logging.Handler.__init__(self)
method = method.upper()
if method not in ["GET", "POST"]:
raise ValueError("method must be GET or POST")
self.host = host
self.url = url
self.method = method
def mapLogRecord(self, record):
"""
Default implementation of mapping the log record into a dict
that is sent as the CGI data. Overwrite in your class.
Contributed by Franz Glasner.
"""
return record.__dict__
def emit(self, record):
"""
Emit a record.
Send the record to the Web server as a percent-encoded dictionary
"""
try:
import httplib, urllib
host = self.host
h = httplib.HTTP(host)
url = self.url
data = urllib.urlencode(self.mapLogRecord(record))
if self.method == "GET":
if (url.find('?') >= 0):
sep = '&'
else:
sep = '?'
url = url + "%c%s" % (sep, data)
h.putrequest(self.method, url)
# support multiple hosts on one IP address...
# need to strip optional :port from host, if present
i = host.find(":")
if i >= 0:
host = host[:i]
h.putheader("Host", host)
if self.method == "POST":
h.putheader("Content-type",
"application/x-www-form-urlencoded")
h.putheader("Content-length", str(len(data)))
h.endheaders(data if self.method == "POST" else None)
h.getreply() #can't do anything with the result
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
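# A hypothetical subclass showing the usual customisation point:
# mapLogRecord() decides which record attributes become form fields in
# the GET query string or POST body.
class _ExampleHTTPHandler(HTTPHandler):
    def mapLogRecord(self, record):
        # Send a stable subset instead of the whole __dict__.
        return {"name": record.name,
                "level": record.levelname,
                "message": record.getMessage()}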
class BufferingHandler(logging.Handler):
"""
    A handler class which buffers logging records in memory. Whenever a
    record is added to the buffer, a check is made to see if the buffer
    should be flushed. If it should, then flush() is expected to do
    what's needed.
"""
def __init__(self, capacity):
"""
Initialize the handler with the buffer size.
"""
logging.Handler.__init__(self)
self.capacity = capacity
self.buffer = []
def shouldFlush(self, record):
"""
Should the handler flush its buffer?
Returns true if the buffer is up to capacity. This method can be
overridden to implement custom flushing strategies.
"""
return (len(self.buffer) >= self.capacity)
def emit(self, record):
"""
Emit a record.
Append the record. If shouldFlush() tells us to, call flush() to process
the buffer.
"""
self.buffer.append(record)
if self.shouldFlush(record):
self.flush()
def flush(self):
"""
Override to implement custom flushing behaviour.
This version just zaps the buffer to empty.
"""
self.acquire()
try:
self.buffer = []
finally:
self.release()
def close(self):
"""
Close the handler.
This version just flushes and chains to the parent class' close().
"""
try:
self.flush()
finally:
logging.Handler.close(self)
class MemoryHandler(BufferingHandler):
"""
A handler class which buffers logging records in memory, periodically
flushing them to a target handler. Flushing occurs whenever the buffer
is full, or when an event of a certain severity or greater is seen.
"""
def __init__(self, capacity, flushLevel=logging.ERROR, target=None):
"""
Initialize the handler with the buffer size, the level at which
flushing should occur and an optional target.
Note that without a target being set either here or via setTarget(),
        a MemoryHandler is of no use to anyone!
"""
BufferingHandler.__init__(self, capacity)
self.flushLevel = flushLevel
self.target = target
def shouldFlush(self, record):
"""
Check for buffer full or a record at the flushLevel or higher.
"""
return (len(self.buffer) >= self.capacity) or \
(record.levelno >= self.flushLevel)
def setTarget(self, target):
"""
Set the target handler for this handler.
"""
self.target = target
def flush(self):
"""
For a MemoryHandler, flushing means just sending the buffered
records to the target, if there is one. Override if you want
different behaviour.
"""
self.acquire()
try:
if self.target:
for record in self.buffer:
self.target.handle(record)
self.buffer = []
finally:
self.release()
def close(self):
"""
Flush, set the target to None and lose the buffer.
"""
try:
self.flush()
finally:
self.acquire()
try:
self.target = None
BufferingHandler.close(self)
finally:
self.release()
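# A hypothetical sketch of the usual MemoryHandler arrangement: buffer up
# to 100 records and only push them to the real target when an ERROR (the
# default flushLevel) arrives or the buffer fills.
def _example_memory_handler():
    import logging
    buffering = MemoryHandler(capacity=100,
                              target=logging.StreamHandler())
    logger = logging.getLogger("app")
    logger.addHandler(buffering)
    logger.debug("held in the buffer for now")
    logger.error("triggers a flush of everything buffered so far")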
# Copyright 2001-2014 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Logging package for Python. Based on PEP 282 and comments thereto in
comp.lang.python.
Copyright (C) 2001-2014 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging' and log away!
"""
import sys, os, time, cStringIO, traceback, warnings, weakref, collections
__all__ = ['BASIC_FORMAT', 'BufferingFormatter', 'CRITICAL', 'DEBUG', 'ERROR',
'FATAL', 'FileHandler', 'Filter', 'Formatter', 'Handler', 'INFO',
'LogRecord', 'Logger', 'LoggerAdapter', 'NOTSET', 'NullHandler',
'StreamHandler', 'WARN', 'WARNING', 'addLevelName', 'basicConfig',
'captureWarnings', 'critical', 'debug', 'disable', 'error',
'exception', 'fatal', 'getLevelName', 'getLogger', 'getLoggerClass',
'info', 'log', 'makeLogRecord', 'setLoggerClass', 'warn', 'warning']
try:
import codecs
except ImportError:
codecs = None
try:
import thread
import threading
except ImportError:
thread = None
__author__ = "Vinay Sajip <[email protected]>"
__status__ = "production"
# Note: the attributes below are no longer maintained.
__version__ = "0.5.1.2"
__date__ = "07 February 2010"
#---------------------------------------------------------------------------
# Miscellaneous module data
#---------------------------------------------------------------------------
try:
unicode
_unicode = True
except NameError:
_unicode = False
# next bit filched from 1.5.2's inspect.py
def currentframe():
"""Return the frame object for the caller's stack frame."""
try:
raise Exception
except:
return sys.exc_info()[2].tb_frame.f_back
if hasattr(sys, '_getframe'): currentframe = lambda: sys._getframe(3)
# done filching
#
# _srcfile is used when walking the stack to check when we've got the first
# caller stack frame.
#
_srcfile = os.path.normcase(currentframe.__code__.co_filename)
# _srcfile is only used in conjunction with sys._getframe().
# To provide compatibility with older versions of Python, set _srcfile
# to None if _getframe() is not available; this value will prevent
# findCaller() from being called.
#if not hasattr(sys, "_getframe"):
# _srcfile = None
#
#_startTime is used as the base when calculating the relative time of events
#
_startTime = time.time()
#
#raiseExceptions is used to see if exceptions during handling should be
#propagated
#
raiseExceptions = 1
#
# If you don't want threading information in the log, set this to zero
#
logThreads = 1
#
# If you don't want multiprocessing information in the log, set this to zero
#
logMultiprocessing = 1
#
# If you don't want process information in the log, set this to zero
#
logProcesses = 1
#---------------------------------------------------------------------------
# Level related stuff
#---------------------------------------------------------------------------
#
# Default levels and level names, these can be replaced with any positive set
# of values having corresponding names. There is a pseudo-level, NOTSET, which
# is only really there as a lower limit for user-defined levels. Handlers and
# loggers are initialized with NOTSET so that they will log all messages, even
# at user-defined levels.
#
CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0
_levelNames = {
CRITICAL : 'CRITICAL',
ERROR : 'ERROR',
WARNING : 'WARNING',
INFO : 'INFO',
DEBUG : 'DEBUG',
NOTSET : 'NOTSET',
'CRITICAL' : CRITICAL,
'ERROR' : ERROR,
'WARN' : WARNING,
'WARNING' : WARNING,
'INFO' : INFO,
'DEBUG' : DEBUG,
'NOTSET' : NOTSET,
}
def getLevelName(level):
"""
Return the textual representation of logging level 'level'.
If the level is one of the predefined levels (CRITICAL, ERROR, WARNING,
INFO, DEBUG) then you get the corresponding string. If you have
associated levels with names using addLevelName then the name you have
associated with 'level' is returned.
If a numeric value corresponding to one of the defined levels is passed
in, the corresponding string representation is returned.
Otherwise, the string "Level %s" % level is returned.
"""
return _levelNames.get(level, ("Level %s" % level))
def addLevelName(level, levelName):
"""
Associate 'levelName' with 'level'.
This is used when converting levels to text during message formatting.
"""
_acquireLock()
try: #unlikely to cause an exception, but you never know...
_levelNames[level] = levelName
_levelNames[levelName] = level
finally:
_releaseLock()
def _checkLevel(level):
if isinstance(level, (int, long)):
rv = level
elif str(level) == level:
if level not in _levelNames:
raise ValueError("Unknown level: %r" % level)
rv = _levelNames[level]
else:
raise TypeError("Level not an integer or a valid string: %r" % level)
return rv
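# A hypothetical sketch of the level machinery above: registering a
# custom TRACE level makes both directions of the _levelNames mapping
# work, including string validation in _checkLevel().
def _example_custom_level():
    addLevelName(5, "TRACE")
    assert getLevelName(5) == "TRACE"
    assert _checkLevel("TRACE") == 5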
#---------------------------------------------------------------------------
# Thread-related stuff
#---------------------------------------------------------------------------
#
#_lock is used to serialize access to shared data structures in this module.
#This needs to be an RLock because fileConfig() creates and configures
#Handlers, and so might arbitrary user threads. Since Handler code updates the
#shared dictionary _handlers, it needs to acquire the lock. But if configuring,
#the lock would already have been acquired - so we need an RLock.
#The same argument applies to Loggers and Manager.loggerDict.
#
if thread:
_lock = threading.RLock()
else:
_lock = None
def _acquireLock():
"""
Acquire the module-level lock for serializing access to shared data.
This should be released with _releaseLock().
"""
if _lock:
_lock.acquire()
def _releaseLock():
"""
Release the module-level lock acquired by calling _acquireLock().
"""
if _lock:
_lock.release()
#---------------------------------------------------------------------------
# The logging record
#---------------------------------------------------------------------------
class LogRecord(object):
"""
A LogRecord instance represents an event being logged.
LogRecord instances are created every time something is logged. They
contain all the information pertinent to the event being logged. The
main information passed in is in msg and args, which are combined
using str(msg) % args to create the message field of the record. The
record also includes information such as when the record was created,
the source line where the logging call was made, and any exception
information to be logged.
"""
def __init__(self, name, level, pathname, lineno,
msg, args, exc_info, func=None):
"""
Initialize a logging record with interesting information.
"""
ct = time.time()
self.name = name
self.msg = msg
#
# The following statement allows passing of a dictionary as a sole
# argument, so that you can do something like
# logging.debug("a %(a)d b %(b)s", {'a':1, 'b':2})
# Suggested by Stefan Behnel.
# Note that without the test for args[0], we get a problem because
# during formatting, we test to see if the arg is present using
# 'if self.args:'. If the event being logged is e.g. 'Value is %d'
# and if the passed arg fails 'if self.args:' then no formatting
# is done. For example, logger.warn('Value is %d', 0) would log
# 'Value is %d' instead of 'Value is 0'.
# For the use case of passing a dictionary, this should not be a
# problem.
# Issue #21172: a request was made to relax the isinstance check
# to hasattr(args[0], '__getitem__'). However, the docs on string
# formatting still seem to suggest a mapping object is required.
# Thus, while not removing the isinstance check, it does now look
# for collections.Mapping rather than, as before, dict.
if (args and len(args) == 1 and isinstance(args[0], collections.Mapping)
and args[0]):
args = args[0]
self.args = args
self.levelname = getLevelName(level)
self.levelno = level
self.pathname = pathname
try:
self.filename = os.path.basename(pathname)
self.module = os.path.splitext(self.filename)[0]
except (TypeError, ValueError, AttributeError):
self.filename = pathname
self.module = "Unknown module"
self.exc_info = exc_info
self.exc_text = None # used to cache the traceback text
self.lineno = lineno
self.funcName = func
self.created = ct
self.msecs = (ct - long(ct)) * 1000
self.relativeCreated = (self.created - _startTime) * 1000
if logThreads and thread:
self.thread = thread.get_ident()
self.threadName = threading.current_thread().name
else:
self.thread = None
self.threadName = None
if not logMultiprocessing:
self.processName = None
else:
self.processName = 'MainProcess'
mp = sys.modules.get('multiprocessing')
if mp is not None:
# Errors may occur if multiprocessing has not finished loading
# yet - e.g. if a custom import hook causes third-party code
# to run when multiprocessing calls import. See issue 8200
# for an example
try:
self.processName = mp.current_process().name
except StandardError:
pass
if logProcesses and hasattr(os, 'getpid'):
self.process = os.getpid()
else:
self.process = None
def __str__(self):
return '<LogRecord: %s, %s, %s, %s, "%s">'%(self.name, self.levelno,
self.pathname, self.lineno, self.msg)
def getMessage(self):
"""
Return the message for this LogRecord.
Return the message for this LogRecord after merging any user-supplied
arguments with the message.
"""
if not _unicode: #if no unicode support...
msg = str(self.msg)
else:
msg = self.msg
if not isinstance(msg, basestring):
try:
msg = str(self.msg)
except UnicodeError:
msg = self.msg #Defer encoding till later
if self.args:
msg = msg % self.args
return msg
def makeLogRecord(dict):
"""
    Make a LogRecord whose attributes are defined by the specified dictionary.
This function is useful for converting a logging event received over
a socket connection (which is sent as a dictionary) into a LogRecord
instance.
"""
rv = LogRecord(None, None, "", 0, "", (), None, None)
rv.__dict__.update(dict)
return rv
#---------------------------------------------------------------------------
# Formatter classes and functions
#---------------------------------------------------------------------------
class Formatter(object):
"""
Formatter instances are used to convert a LogRecord to text.
Formatters need to know how a LogRecord is constructed. They are
responsible for converting a LogRecord to (usually) a string which can
be interpreted by either a human or an external system. The base Formatter
    allows a formatting string to be specified. If none is supplied, the
    default value of "%(message)s" is used.
The Formatter can be initialized with a format string which makes use of
knowledge of the LogRecord attributes - e.g. the default value mentioned
above makes use of the fact that the user's message and arguments are pre-
formatted into a LogRecord's message attribute. Currently, the useful
attributes in a LogRecord are described by:
%(name)s Name of the logger (logging channel)
%(levelno)s Numeric logging level for the message (DEBUG, INFO,
WARNING, ERROR, CRITICAL)
%(levelname)s Text logging level for the message ("DEBUG", "INFO",
"WARNING", "ERROR", "CRITICAL")
%(pathname)s Full pathname of the source file where the logging
call was issued (if available)
%(filename)s Filename portion of pathname
%(module)s Module (name portion of filename)
%(lineno)d Source line number where the logging call was issued
(if available)
%(funcName)s Function name
%(created)f Time when the LogRecord was created (time.time()
return value)
%(asctime)s Textual time when the LogRecord was created
%(msecs)d Millisecond portion of the creation time
%(relativeCreated)d Time in milliseconds when the LogRecord was created,
relative to the time the logging module was loaded
(typically at application startup time)
%(thread)d Thread ID (if available)
%(threadName)s Thread name (if available)
%(process)d Process ID (if available)
%(message)s The result of record.getMessage(), computed just as
the record is emitted
"""
converter = time.localtime
def __init__(self, fmt=None, datefmt=None):
"""
Initialize the formatter with specified format strings.
Initialize the formatter either with the specified format string, or a
default as described above. Allow for specialized date formatting with
the optional datefmt argument (if omitted, you get the ISO8601 format).
"""
if fmt:
self._fmt = fmt
else:
self._fmt = "%(message)s"
self.datefmt = datefmt
def formatTime(self, record, datefmt=None):
"""
Return the creation time of the specified LogRecord as formatted text.
This method should be called from format() by a formatter which
wants to make use of a formatted time. This method can be overridden
in formatters to provide for any specific requirement, but the
basic behaviour is as follows: if datefmt (a string) is specified,
it is used with time.strftime() to format the creation time of the
record. Otherwise, the ISO8601 format is used. The resulting
string is returned. This function uses a user-configurable function
to convert the creation time to a tuple. By default, time.localtime()
is used; to change this for a particular formatter instance, set the
'converter' attribute to a function with the same signature as
time.localtime() or time.gmtime(). To change it for all formatters,
for example if you want all logging times to be shown in GMT,
set the 'converter' attribute in the Formatter class.
"""
ct = self.converter(record.created)
if datefmt:
s = time.strftime(datefmt, ct)
else:
t = time.strftime("%Y-%m-%d %H:%M:%S", ct)
s = "%s,%03d" % (t, record.msecs)
return s
def formatException(self, ei):
"""
Format and return the specified exception information as a string.
This default implementation just uses
traceback.print_exception()
"""
sio = cStringIO.StringIO()
traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
s = sio.getvalue()
sio.close()
if s[-1:] == "\n":
s = s[:-1]
return s
def usesTime(self):
"""
Check if the format uses the creation time of the record.
"""
return self._fmt.find("%(asctime)") >= 0
def format(self, record):
"""
Format the specified record as text.
The record's attribute dictionary is used as the operand to a
string formatting operation which yields the returned string.
Before formatting the dictionary, a couple of preparatory steps
are carried out. The message attribute of the record is computed
using LogRecord.getMessage(). If the formatting string uses the
        time (as determined by a call to usesTime()), formatTime() is
called to format the event time. If there is exception information,
it is formatted using formatException() and appended to the message.
"""
record.message = record.getMessage()
if self.usesTime():
record.asctime = self.formatTime(record, self.datefmt)
try:
s = self._fmt % record.__dict__
except UnicodeDecodeError as e:
# Issue 25664. The logger name may be Unicode. Try again ...
try:
record.name = record.name.decode('utf-8')
s = self._fmt % record.__dict__
except UnicodeDecodeError:
raise e
if record.exc_info:
# Cache the traceback text to avoid converting it multiple times
# (it's constant anyway)
if not record.exc_text:
record.exc_text = self.formatException(record.exc_info)
if record.exc_text:
if s[-1:] != "\n":
s = s + "\n"
try:
s = s + record.exc_text
except UnicodeError:
# Sometimes filenames have non-ASCII chars, which can lead
# to errors when s is Unicode and record.exc_text is str
# See issue 8924.
# We also use replace for when there are multiple
# encodings, e.g. UTF-8 for the filesystem and latin-1
# for a script. See issue 13232.
s = s + record.exc_text.decode(sys.getfilesystemencoding(),
'replace')
return s
#
# The default formatter to use when no other is specified
#
_defaultFormatter = Formatter()
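# A hypothetical sketch using the attribute names documented above; the
# format string and record contents are illustrative only.
def _example_formatter():
    fmt = Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S")
    record = LogRecord("app", INFO, "example.py", 1,
                       "hello %s", ("world",), None)
    # e.g. "2020-09-15 12:00:00 app INFO: hello world"
    return fmt.format(record)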
class BufferingFormatter(object):
"""
A formatter suitable for formatting a number of records.
"""
def __init__(self, linefmt=None):
"""
Optionally specify a formatter which will be used to format each
individual record.
"""
if linefmt:
self.linefmt = linefmt
else:
self.linefmt = _defaultFormatter
def formatHeader(self, records):
"""
Return the header string for the specified records.
"""
return ""
def formatFooter(self, records):
"""
Return the footer string for the specified records.
"""
return ""
def format(self, records):
"""
Format the specified records and return the result as a string.
"""
rv = ""
if len(records) > 0:
rv = rv + self.formatHeader(records)
for record in records:
rv = rv + self.linefmt.format(record)
rv = rv + self.formatFooter(records)
return rv
#---------------------------------------------------------------------------
# Filter classes and functions
#---------------------------------------------------------------------------
class Filter(object):
"""
Filter instances are used to perform arbitrary filtering of LogRecords.
Loggers and Handlers can optionally use Filter instances to filter
records as desired. The base filter class only allows events which are
below a certain point in the logger hierarchy. For example, a filter
initialized with "A.B" will allow events logged by loggers "A.B",
"A.B.C", "A.B.C.D", "A.B.D" etc. but not "A.BB", "B.A.B" etc. If
initialized with the empty string, all events are passed.
"""
def __init__(self, name=''):
"""
Initialize a filter.
Initialize with the name of the logger which, together with its
children, will have its events allowed through the filter. If no
name is specified, allow every event.
"""
self.name = name
self.nlen = len(name)
def filter(self, record):
"""
Determine if the specified record is to be logged.
Is the specified record to be logged? Returns 0 for no, nonzero for
yes. If deemed appropriate, the record may be modified in-place.
"""
if self.nlen == 0:
return 1
elif self.name == record.name:
return 1
elif record.name.find(self.name, 0, self.nlen) != 0:
return 0
return (record.name[self.nlen] == ".")
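# A hypothetical check of the hierarchy rule described above: "A.B"
# passes its own records and its children's, but not look-alike names.
def _example_filter():
    f = Filter("A.B")
    child = LogRecord("A.B.C", INFO, "x", 1, "m", (), None)
    lookalike = LogRecord("A.BB", INFO, "x", 1, "m", (), None)
    assert f.filter(child) and not f.filter(lookalike)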
class Filterer(object):
"""
A base class for loggers and handlers which allows them to share
common code.
"""
def __init__(self):
"""
Initialize the list of filters to be an empty list.
"""
self.filters = []
def addFilter(self, filter):
"""
Add the specified filter to this handler.
"""
if not (filter in self.filters):
self.filters.append(filter)
def removeFilter(self, filter):
"""
Remove the specified filter from this handler.
"""
if filter in self.filters:
self.filters.remove(filter)
def filter(self, record):
"""
Determine if a record is loggable by consulting all the filters.
The default is to allow the record to be logged; any filter can veto
this and the record is then dropped. Returns a zero value if a record
is to be dropped, else non-zero.
"""
rv = 1
for f in self.filters:
if not f.filter(record):
rv = 0
break
return rv
#---------------------------------------------------------------------------
# Handler classes and functions
#---------------------------------------------------------------------------
_handlers = weakref.WeakValueDictionary() #map of handler names to handlers
_handlerList = [] # added to allow handlers to be removed in reverse of order initialized
def _removeHandlerRef(wr):
"""
Remove a handler reference from the internal cleanup list.
"""
# This function can be called during module teardown, when globals are
# set to None. It can also be called from another thread. So we need to
# pre-emptively grab the necessary globals and check if they're None,
# to prevent race conditions and failures during interpreter shutdown.
acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
if acquire and release and handlers:
try:
acquire()
try:
if wr in handlers:
handlers.remove(wr)
finally:
release()
except TypeError:
# https://bugs.python.org/issue21149 - If the RLock object behind
# acquire() and release() has been partially finalized you may see
# an error about NoneType not being callable. Absolutely nothing
# we can do in this GC during process shutdown situation. Eat it.
pass
def _addHandlerRef(handler):
"""
Add a handler to the internal cleanup list using a weak reference.
"""
_acquireLock()
try:
_handlerList.append(weakref.ref(handler, _removeHandlerRef))
finally:
_releaseLock()
class Handler(Filterer):
"""
Handler instances dispatch logging events to specific destinations.
The base handler class. Acts as a placeholder which defines the Handler
interface. Handlers can optionally use Formatter instances to format
records as desired. By default, no formatter is specified; in this case,
the 'raw' message as determined by record.message is logged.
"""
def __init__(self, level=NOTSET):
"""
Initializes the instance - basically setting the formatter to None
and the filter list to empty.
"""
Filterer.__init__(self)
self._name = None
self.level = _checkLevel(level)
self.formatter = None
# Add the handler to the global _handlerList (for cleanup on shutdown)
_addHandlerRef(self)
self.createLock()
def get_name(self):
return self._name
def set_name(self, name):
_acquireLock()
try:
if self._name in _handlers:
del _handlers[self._name]
self._name = name
if name:
_handlers[name] = self
finally:
_releaseLock()
name = property(get_name, set_name)
def createLock(self):
"""
Acquire a thread lock for serializing access to the underlying I/O.
"""
if thread:
self.lock = threading.RLock()
else:
self.lock = None
def acquire(self):
"""
Acquire the I/O thread lock.
"""
if self.lock:
self.lock.acquire()
def release(self):
"""
Release the I/O thread lock.
"""
if self.lock:
self.lock.release()
def setLevel(self, level):
"""
Set the logging level of this handler.
"""
self.level = _checkLevel(level)
def format(self, record):
"""
Format the specified record.
If a formatter is set, use it. Otherwise, use the default formatter
for the module.
"""
if self.formatter:
fmt = self.formatter
else:
fmt = _defaultFormatter
return fmt.format(record)
def emit(self, record):
"""
Do whatever it takes to actually log the specified logging record.
This version is intended to be implemented by subclasses and so
raises a NotImplementedError.
"""
raise NotImplementedError('emit must be implemented '
'by Handler subclasses')
def handle(self, record):
"""
Conditionally emit the specified logging record.
Emission depends on filters which may have been added to the handler.
Wrap the actual emission of the record with acquisition/release of
the I/O thread lock. Returns whether the filter passed the record for
emission.
"""
rv = self.filter(record)
if rv:
self.acquire()
try:
self.emit(record)
finally:
self.release()
return rv
def setFormatter(self, fmt):
"""
Set the formatter for this handler.
"""
self.formatter = fmt
def flush(self):
"""
Ensure all logging output has been flushed.
This version does nothing and is intended to be implemented by
subclasses.
"""
pass
def close(self):
"""
Tidy up any resources used by the handler.
This version removes the handler from an internal map of handlers,
_handlers, which is used for handler lookup by name. Subclasses
should ensure that this gets called from overridden close()
methods.
"""
#get the module data lock, as we're updating a shared structure.
_acquireLock()
try: #unlikely to raise an exception, but you never know...
if self._name and self._name in _handlers:
del _handlers[self._name]
finally:
_releaseLock()
def handleError(self, record):
"""
Handle errors which occur during an emit() call.
This method should be called from handlers when an exception is
encountered during an emit() call. If raiseExceptions is false,
exceptions get silently ignored. This is what is mostly wanted
for a logging system - most users will not care about errors in
the logging system, they are more interested in application errors.
You could, however, replace this with a custom handler if you wish.
The record which was being processed is passed in to this method.
"""
if raiseExceptions and sys.stderr: # see issue 13807
ei = sys.exc_info()
try:
traceback.print_exception(ei[0], ei[1], ei[2],
None, sys.stderr)
sys.stderr.write('Logged from file %s, line %s\n' % (
record.filename, record.lineno))
except IOError:
pass # see issue 5971
finally:
del ei
class StreamHandler(Handler):
"""
A handler class which writes logging records, appropriately formatted,
to a stream. Note that this class does not close the stream, as
sys.stdout or sys.stderr may be used.
"""
def __init__(self, stream=None):
"""
Initialize the handler.
If stream is not specified, sys.stderr is used.
"""
Handler.__init__(self)
if stream is None:
stream = sys.stderr
self.stream = stream
def flush(self):
"""
Flushes the stream.
"""
self.acquire()
try:
if self.stream and hasattr(self.stream, "flush"):
self.stream.flush()
finally:
self.release()
def emit(self, record):
"""
Emit a record.
If a formatter is specified, it is used to format the record.
The record is then written to the stream with a trailing newline. If
exception information is present, it is formatted using
traceback.print_exception and appended to the stream. If the stream
has an 'encoding' attribute, it is used to determine how to do the
output to the stream.
"""
try:
msg = self.format(record)
stream = self.stream
fs = "%s\n"
if not _unicode: #if no unicode support...
stream.write(fs % msg)
else:
try:
if (isinstance(msg, unicode) and
getattr(stream, 'encoding', None)):
ufs = u'%s\n'
try:
stream.write(ufs % msg)
except UnicodeEncodeError:
#Printing to terminals sometimes fails. For example,
#with an encoding of 'cp1251', the above write will
#work if written to a stream opened or wrapped by
#the codecs module, but fail when writing to a
#terminal even when the codepage is set to cp1251.
#An extra encoding step seems to be needed.
stream.write((ufs % msg).encode(stream.encoding))
else:
stream.write(fs % msg)
except UnicodeError:
stream.write(fs % msg.encode("UTF-8"))
self.flush()
except (KeyboardInterrupt, SystemExit):
raise
except:
self.handleError(record)
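# A hypothetical sketch: any object with write() and flush() works as the
# stream, which is handy for capturing output in tests.
def _example_stream_handler():
    import io
    buf = io.BytesIO()
    handler = StreamHandler(buf)
    handler.emit(LogRecord("app", INFO, "x", 1, "captured", (), None))
    assert buf.getvalue() == b"captured\n"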
class FileHandler(StreamHandler):
"""
A handler class which writes formatted logging records to disk files.
"""
def __init__(self, filename, mode='a', encoding=None, delay=0):
"""
Open the specified file and use it as the stream for logging.
"""
#keep the absolute path, otherwise derived classes which use this
#may come a cropper when the current directory changes
if codecs is None:
encoding = None
self.baseFilename = os.path.abspath(filename)
self.mode = mode
self.encoding = encoding
self.delay = delay
if delay:
#We don't open the stream, but we still need to call the
#Handler constructor to set level, formatter, lock etc.
Handler.__init__(self)
self.stream = None
else:
StreamHandler.__init__(self, self._open())
def close(self):
"""
Closes the stream.
"""
self.acquire()
try:
try:
if self.stream:
try:
self.flush()
finally:
stream = self.stream
self.stream = None
if hasattr(stream, "close"):
stream.close()
finally:
# Issue #19523: call unconditionally to
# prevent a handler leak when delay is set
StreamHandler.close(self)
finally:
self.release()
def _open(self):
"""
Open the current base file with the (original) mode and encoding.
Return the resulting stream.
"""
if self.encoding is None:
stream = open(self.baseFilename, self.mode)
else:
stream = codecs.open(self.baseFilename, self.mode, self.encoding)
return stream
def emit(self, record):
"""
Emit a record.
If the stream was not opened because 'delay' was specified in the
constructor, open it before calling the superclass's emit.
"""
if self.stream is None:
self.stream = self._open()
StreamHandler.emit(self, record)
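# A hypothetical sketch of the delay flag handled above: with delay set,
# nothing is opened (or created) until the first record is emitted.
def _example_delayed_file():
    handler = FileHandler("late.log", delay=1)
    assert handler.stream is None  # not opened yet
    handler.emit(LogRecord("app", INFO, "x", 1,
                           "first emit opens the file", (), None))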
#---------------------------------------------------------------------------
# Manager classes and functions
#---------------------------------------------------------------------------
class PlaceHolder(object):
"""
PlaceHolder instances are used in the Manager logger hierarchy to take
the place of nodes for which no loggers have been defined. This class is
intended for internal use only and not as part of the public API.
"""
def __init__(self, alogger):
"""
Initialize with the specified logger being a child of this placeholder.
"""
#self.loggers = [alogger]
self.loggerMap = { alogger : None }
def append(self, alogger):
"""
Add the specified logger as a child of this placeholder.
"""
#if alogger not in self.loggers:
if alogger not in self.loggerMap:
#self.loggers.append(alogger)
self.loggerMap[alogger] = None
#
# Determine which class to use when instantiating loggers.
#
_loggerClass = None
def setLoggerClass(klass):
"""
Set the class to be used when instantiating a logger. The class should
define __init__() such that only a name argument is required, and the
__init__() should call Logger.__init__()
"""
if klass != Logger:
if not issubclass(klass, Logger):
raise TypeError("logger not derived from logging.Logger: "
+ klass.__name__)
global _loggerClass
_loggerClass = klass
def getLoggerClass():
"""
Return the class to be used when instantiating a logger.
"""
return _loggerClass
class Manager(object):
"""
There is [under normal circumstances] just one Manager instance, which
holds the hierarchy of loggers.
"""
def __init__(self, rootnode):
"""
Initialize the manager with the root node of the logger hierarchy.
"""
self.root = rootnode
self.disable = 0
self.emittedNoHandlerWarning = 0
self.loggerDict = {}
self.loggerClass = None
def getLogger(self, name):
"""
Get a logger with the specified name (channel name), creating it
if it doesn't yet exist. This name is a dot-separated hierarchical
name, such as "a", "a.b", "a.b.c" or similar.
If a PlaceHolder existed for the specified name [i.e. the logger
didn't exist but a child of it did], replace it with the created
logger and fix up the parent/child references which pointed to the
placeholder to now point to the logger.
"""
rv = None
if not isinstance(name, basestring):
raise TypeError('A logger name must be a string or Unicode')
if isinstance(name, unicode):
name = name.encode('utf-8')
_acquireLock()
try:
if name in self.loggerDict:
rv = self.loggerDict[name]
if isinstance(rv, PlaceHolder):
ph = rv
rv = (self.loggerClass or _loggerClass)(name)
rv.manager = self
self.loggerDict[name] = rv
self._fixupChildren(ph, rv)
self._fixupParents(rv)
else:
rv = (self.loggerClass or _loggerClass)(name)
rv.manager = self
self.loggerDict[name] = rv
self._fixupParents(rv)
finally:
_releaseLock()
return rv
def setLoggerClass(self, klass):
"""
Set the class to be used when instantiating a logger with this Manager.
"""
if klass != Logger:
if not issubclass(klass, Logger):
raise TypeError("logger not derived from logging.Logger: "
+ klass.__name__)
self.loggerClass = klass
def _fixupParents(self, alogger):
"""
Ensure that there are either loggers or placeholders all the way
from the specified logger to the root of the logger hierarchy.
"""
name = alogger.name
i = name.rfind(".")
rv = None
while (i > 0) and not rv:
substr = name[:i]
if substr not in self.loggerDict:
self.loggerDict[substr] = PlaceHolder(alogger)
else:
obj = self.loggerDict[substr]
if isinstance(obj, Logger):
rv = obj
else:
assert isinstance(obj, PlaceHolder)
obj.append(alogger)
i = name.rfind(".", 0, i - 1)
if not rv:
rv = self.root
alogger.parent = rv
def _fixupChildren(self, ph, alogger):
"""
Ensure that children of the placeholder ph are connected to the
specified logger.
"""
name = alogger.name
namelen = len(name)
for c in ph.loggerMap.keys():
#The if means ... if not c.parent.name.startswith(name)
if c.parent.name[:namelen] != name:
alogger.parent = c.parent
c.parent = alogger
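# --- Illustrative sketch, not part of the original module (hypothetical
# --- logger names): the PlaceHolder machinery lets children be created
# --- before their parents; getLogger() later swaps the placeholder for a
# --- real Logger and _fixupChildren() re-points the child at it.
def _demo_placeholder_fixup():
    child = getLogger('demo_ph.child')      # leaves a PlaceHolder at 'demo_ph'
    assert child.parent is root             # no real ancestor exists yet
    parent = getLogger('demo_ph')           # placeholder replaced by a Logger
    assert child.parent is parent           # fixed up by _fixupChildren()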
#---------------------------------------------------------------------------
# Logger classes and functions
#---------------------------------------------------------------------------
class Logger(Filterer):
"""
Instances of the Logger class represent a single logging channel. A
"logging channel" indicates an area of an application. Exactly how an
"area" is defined is up to the application developer. Since an
application can have any number of areas, logging channels are identified
by a unique string. Application areas can be nested (e.g. an area
of "input processing" might include sub-areas "read CSV files", "read
XLS files" and "read Gnumeric files"). To cater for this natural nesting,
channel names are organized into a namespace hierarchy where levels are
separated by periods, much like the Java or Python package namespace. So
in the instance given above, channel names might be "input" for the upper
level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels.
There is no arbitrary limit to the depth of nesting.
"""
def __init__(self, name, level=NOTSET):
"""
Initialize the logger with a name and an optional level.
"""
Filterer.__init__(self)
self.name = name
self.level = _checkLevel(level)
self.parent = None
self.propagate = 1
self.handlers = []
self.disabled = 0
def setLevel(self, level):
"""
Set the logging level of this logger.
"""
self.level = _checkLevel(level)
def debug(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'DEBUG'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.debug("Houston, we have a %s", "thorny problem", exc_info=1)
"""
if self.isEnabledFor(DEBUG):
self._log(DEBUG, msg, args, **kwargs)
def info(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'INFO'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.info("Houston, we have a %s", "interesting problem", exc_info=1)
"""
if self.isEnabledFor(INFO):
self._log(INFO, msg, args, **kwargs)
def warning(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'WARNING'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.warning("Houston, we have a %s", "bit of a problem", exc_info=1)
"""
if self.isEnabledFor(WARNING):
self._log(WARNING, msg, args, **kwargs)
warn = warning
def error(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'ERROR'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.error("Houston, we have a %s", "major problem", exc_info=1)
"""
if self.isEnabledFor(ERROR):
self._log(ERROR, msg, args, **kwargs)
def exception(self, msg, *args, **kwargs):
"""
Convenience method for logging an ERROR with exception information.
"""
kwargs['exc_info'] = 1
self.error(msg, *args, **kwargs)
def critical(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'CRITICAL'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.critical("Houston, we have a %s", "major disaster", exc_info=1)
"""
if self.isEnabledFor(CRITICAL):
self._log(CRITICAL, msg, args, **kwargs)
fatal = critical
def log(self, level, msg, *args, **kwargs):
"""
Log 'msg % args' with the integer severity 'level'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.log(level, "We have a %s", "mysterious problem", exc_info=1)
"""
if not isinstance(level, (int, long)):
if raiseExceptions:
raise TypeError("level must be an integer")
else:
return
if self.isEnabledFor(level):
self._log(level, msg, args, **kwargs)
def findCaller(self):
"""
Find the stack frame of the caller so that we can note the source
file name, line number and function name.
"""
f = currentframe()
#On some versions of IronPython, currentframe() returns None if
#IronPython isn't run with -X:Frames.
if f is not None:
f = f.f_back
rv = "(unknown file)", 0, "(unknown function)"
while hasattr(f, "f_code"):
co = f.f_code
filename = os.path.normcase(co.co_filename)
if filename == _srcfile:
f = f.f_back
continue
rv = (co.co_filename, f.f_lineno, co.co_name)
break
return rv
def makeRecord(self, name, level, fn, lno, msg, args, exc_info, func=None, extra=None):
"""
A factory method which can be overridden in subclasses to create
specialized LogRecords.
"""
rv = LogRecord(name, level, fn, lno, msg, args, exc_info, func)
if extra is not None:
for key in extra:
if (key in ["message", "asctime"]) or (key in rv.__dict__):
raise KeyError("Attempt to overwrite %r in LogRecord" % key)
rv.__dict__[key] = extra[key]
return rv
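# --- Illustrative sketch, not part of the original module (hypothetical
# --- names): the 'extra' mapping becomes attributes on the LogRecord, and
# --- keys that would shadow existing attributes raise KeyError.
def _demo_make_record_extra():
    logger = getLogger('demo.extra')
    record = logger.makeRecord('demo.extra', INFO, __file__, 1,
                               'hello', (), None,
                               extra={'clientip': '127.0.0.1'})
    assert record.clientip == '127.0.0.1'
    try:
        logger.makeRecord('demo.extra', INFO, __file__, 1,
                          'hello', (), None, extra={'message': 'clash'})
    except KeyError:
        pass                                # 'message' is reserved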
def _log(self, level, msg, args, exc_info=None, extra=None):
"""
Low-level logging routine which creates a LogRecord and then calls
all the handlers of this logger to handle the record.
"""
if _srcfile:
#IronPython doesn't track Python frames, so findCaller raises an
#exception on some versions of IronPython. We trap it here so that
#IronPython can use logging.
try:
fn, lno, func = self.findCaller()
except ValueError:
fn, lno, func = "(unknown file)", 0, "(unknown function)"
else:
fn, lno, func = "(unknown file)", 0, "(unknown function)"
if exc_info:
if not isinstance(exc_info, tuple):
exc_info = sys.exc_info()
record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, func, extra)
self.handle(record)
def handle(self, record):
"""
Call the handlers for the specified record.
This method is used for unpickled records received from a socket, as
well as those created locally. Logger-level filtering is applied.
"""
if (not self.disabled) and self.filter(record):
self.callHandlers(record)
def addHandler(self, hdlr):
"""
Add the specified handler to this logger.
"""
_acquireLock()
try:
if not (hdlr in self.handlers):
self.handlers.append(hdlr)
finally:
_releaseLock()
def removeHandler(self, hdlr):
"""
Remove the specified handler from this logger.
"""
_acquireLock()
try:
if hdlr in self.handlers:
self.handlers.remove(hdlr)
finally:
_releaseLock()
def callHandlers(self, record):
"""
Pass a record to all relevant handlers.
Loop through all handlers for this logger and its parents in the
logger hierarchy. If no handler was found, output a one-off error
message to sys.stderr. Stop searching up the hierarchy whenever a
logger with the "propagate" attribute set to zero is found - that
will be the last logger whose handlers are called.
"""
c = self
found = 0
while c:
for hdlr in c.handlers:
found = found + 1
if record.levelno >= hdlr.level:
hdlr.handle(record)
if not c.propagate:
c = None #break out
else:
c = c.parent
if (found == 0) and raiseExceptions and not self.manager.emittedNoHandlerWarning:
sys.stderr.write("No handlers could be found for logger"
" \"%s\"\n" % self.name)
self.manager.emittedNoHandlerWarning = 1
def getEffectiveLevel(self):
"""
Get the effective level for this logger.
Loop through this logger and its parents in the logger hierarchy,
looking for a non-zero logging level. Return the first one found.
"""
logger = self
while logger:
if logger.level:
return logger.level
logger = logger.parent
return NOTSET
def isEnabledFor(self, level):
"""
Is this logger enabled for level 'level'?
"""
if self.manager.disable >= level:
return 0
return level >= self.getEffectiveLevel()
def getChild(self, suffix):
"""
Get a logger which is a descendant of this one.
This is a convenience method, such that
logging.getLogger('abc').getChild('def.ghi')
is the same as
logging.getLogger('abc.def.ghi')
It's useful, for example, when the parent logger is named using
__name__ rather than a literal string.
"""
if self.root is not self:
suffix = '.'.join((self.name, suffix))
return self.manager.getLogger(suffix)
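# --- Illustrative sketch, not part of the original module: the equivalence
# --- stated in the getChild() docstring (logger names are hypothetical).
def _demo_get_child():
    parent = getLogger('demo.pkg')
    assert parent.getChild('sub.mod') is getLogger('demo.pkg.sub.mod')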
class RootLogger(Logger):
"""
A root logger is not that different to any other logger, except that
it must have a logging level and there is only one instance of it in
the hierarchy.
"""
def __init__(self, level):
"""
Initialize the logger with the name "root".
"""
Logger.__init__(self, "root", level)
_loggerClass = Logger
class LoggerAdapter(object):
"""
An adapter for loggers which makes it easier to specify contextual
information in logging output.
"""
def __init__(self, logger, extra):
"""
Initialize the adapter with a logger and a dict-like object which
provides contextual information. This constructor signature allows
easy stacking of LoggerAdapters, if so desired.
You can effectively pass keyword arguments as shown in the
following example:
adapter = LoggerAdapter(someLogger, dict(p1=v1, p2="v2"))
"""
self.logger = logger
self.extra = extra
def process(self, msg, kwargs):
"""
Process the logging message and keyword arguments passed in to
a logging call to insert contextual information. You can either
manipulate the message itself, the keyword args or both. Return
the message and kwargs modified (or not) to suit your needs.
Normally, you'll only need to override this one method in a
LoggerAdapter subclass for your specific needs.
"""
kwargs["extra"] = self.extra
return msg, kwargs
def debug(self, msg, *args, **kwargs):
"""
Delegate a debug call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.debug(msg, *args, **kwargs)
def info(self, msg, *args, **kwargs):
"""
Delegate an info call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.info(msg, *args, **kwargs)
def warning(self, msg, *args, **kwargs):
"""
Delegate a warning call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.warning(msg, *args, **kwargs)
def error(self, msg, *args, **kwargs):
"""
Delegate an error call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.error(msg, *args, **kwargs)
def exception(self, msg, *args, **kwargs):
"""
Delegate an exception call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
kwargs["exc_info"] = 1
self.logger.error(msg, *args, **kwargs)
def critical(self, msg, *args, **kwargs):
"""
Delegate a critical call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.critical(msg, *args, **kwargs)
def log(self, level, msg, *args, **kwargs):
"""
Delegate a log call to the underlying logger, after adding
contextual information from this adapter instance.
"""
msg, kwargs = self.process(msg, kwargs)
self.logger.log(level, msg, *args, **kwargs)
def isEnabledFor(self, level):
"""
See if the underlying logger is enabled for the specified level.
"""
return self.logger.isEnabledFor(level)
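# --- Illustrative sketch, not part of the original module (hypothetical
# --- names): a LoggerAdapter injecting contextual data via process(), with
# --- a format string that references the injected 'clientip' key.
def _demo_logger_adapter():
    handler = StreamHandler()               # defaults to sys.stderr
    handler.setFormatter(Formatter('%(clientip)s %(message)s'))
    logger = getLogger('demo.adapter')
    logger.addHandler(handler)
    adapter = LoggerAdapter(logger, {'clientip': '10.0.0.1'})
    adapter.warning('connection reset')     # emits '10.0.0.1 connection reset'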
root = RootLogger(WARNING)
Logger.root = root
Logger.manager = Manager(Logger.root)
#---------------------------------------------------------------------------
# Configuration classes and functions
#---------------------------------------------------------------------------
BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"
def basicConfig(**kwargs):
"""
Do basic configuration for the logging system.
This function does nothing if the root logger already has handlers
configured. It is a convenience method intended for use by simple scripts
to do one-shot configuration of the logging package.
The default behaviour is to create a StreamHandler which writes to
sys.stderr, set a formatter using the BASIC_FORMAT format string, and
add the handler to the root logger.
A number of optional keyword arguments may be specified, which can alter
the default behaviour.
filename Specifies that a FileHandler be created, using the specified
filename, rather than a StreamHandler.
filemode Specifies the mode to open the file, if filename is specified
(if filemode is unspecified, it defaults to 'a').
format Use the specified format string for the handler.
datefmt Use the specified date/time format.
level Set the root logger level to the specified level.
stream Use the specified stream to initialize the StreamHandler. Note
that this argument is incompatible with 'filename' - if both
are present, 'stream' is ignored.
Note that you could specify a stream created using open(filename, mode)
rather than passing the filename and mode in. However, it should be
remembered that StreamHandler does not close its stream (since it may be
using sys.stdout or sys.stderr), whereas FileHandler closes its stream
when the handler is closed.
"""
# Add thread safety in case someone mistakenly calls
# basicConfig() from multiple threads
_acquireLock()
try:
if len(root.handlers) == 0:
filename = kwargs.get("filename")
if filename:
mode = kwargs.get("filemode", 'a')
hdlr = FileHandler(filename, mode)
else:
stream = kwargs.get("stream")
hdlr = StreamHandler(stream)
fs = kwargs.get("format", BASIC_FORMAT)
dfs = kwargs.get("datefmt", None)
fmt = Formatter(fs, dfs)
hdlr.setFormatter(fmt)
root.addHandler(hdlr)
level = kwargs.get("level")
if level is not None:
root.setLevel(level)
finally:
_releaseLock()
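# --- Illustrative sketch, not part of the original module: one-shot
# --- configuration via basicConfig(); format and level are example values.
def _demo_basic_config():
    basicConfig(format='%(asctime)s %(levelname)s %(message)s', level=DEBUG)
    getLogger().debug('root logger now has a StreamHandler')
    basicConfig(level=CRITICAL)             # no-op: root already has a handler
    assert root.level == DEBUG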
#---------------------------------------------------------------------------
# Utility functions at module level.
# Basically delegate everything to the root logger.
#---------------------------------------------------------------------------
def getLogger(name=None):
"""
Return a logger with the specified name, creating it if necessary.
If no name is specified, return the root logger.
"""
if name:
return Logger.manager.getLogger(name)
else:
return root
#def getRootLogger():
# """
# Return the root logger.
#
# Note that getLogger('') now does the same thing, so this function is
# deprecated and may disappear in the future.
# """
# return root
def critical(msg, *args, **kwargs):
"""
Log a message with severity 'CRITICAL' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.critical(msg, *args, **kwargs)
fatal = critical
def error(msg, *args, **kwargs):
"""
Log a message with severity 'ERROR' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.error(msg, *args, **kwargs)
def exception(msg, *args, **kwargs):
"""
Log a message with severity 'ERROR' on the root logger,
with exception information.
"""
kwargs['exc_info'] = 1
error(msg, *args, **kwargs)
def warning(msg, *args, **kwargs):
"""
Log a message with severity 'WARNING' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.warning(msg, *args, **kwargs)
warn = warning
def info(msg, *args, **kwargs):
"""
Log a message with severity 'INFO' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.info(msg, *args, **kwargs)
def debug(msg, *args, **kwargs):
"""
Log a message with severity 'DEBUG' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.debug(msg, *args, **kwargs)
def log(level, msg, *args, **kwargs):
"""
Log 'msg % args' with the integer severity 'level' on the root logger.
"""
if len(root.handlers) == 0:
basicConfig()
root.log(level, msg, *args, **kwargs)
def disable(level):
"""
Disable all logging calls of severity 'level' and below.
"""
root.manager.disable = level
def shutdown(handlerList=_handlerList):
"""
Perform any cleanup actions in the logging system (e.g. flushing
buffers).
Should be called at application exit.
"""
for wr in reversed(handlerList[:]):
#errors might occur, for example, if files are locked
#we just ignore them if raiseExceptions is not set
try:
h = wr()
if h:
try:
h.acquire()
h.flush()
h.close()
except (IOError, ValueError):
# Ignore errors which might be caused
# because handlers have been closed but
# references to them are still around at
# application exit.
pass
finally:
h.release()
except:
if raiseExceptions:
raise
#else, swallow
#Let's try and shutdown automatically on application exit...
import atexit
atexit.register(shutdown)
# Null handler
class NullHandler(Handler):
"""
This handler does nothing. It's intended to be used to avoid the
"No handlers could be found for logger XXX" one-off warning. This is
important for library code, which may contain code to log events. If a user
of the library does not configure logging, the one-off warning might be
produced; to avoid this, the library developer simply needs to instantiate
a NullHandler and add it to the top-level logger of the library module or
package.
"""
def handle(self, record):
pass
def emit(self, record):
pass
def createLock(self):
self.lock = None
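# --- Illustrative sketch, not part of the original module (hypothetical
# --- package name): the library pattern the NullHandler docstring describes.
def _demo_null_handler():
    lib_logger = getLogger('demo_library')
    lib_logger.addHandler(NullHandler())
    lib_logger.info('discarded silently unless the app configures logging')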
# Warnings integration
_warnings_showwarning = None
def _showwarning(message, category, filename, lineno, file=None, line=None):
"""
Implementation of showwarning which redirects to logging, which will first
check to see if the file parameter is None. If a file is specified, it will
delegate to the original warnings implementation of showwarning. Otherwise,
it will call warnings.formatwarning and will log the resulting string to a
warnings logger named "py.warnings" with level logging.WARNING.
"""
if file is not None:
if _warnings_showwarning is not None:
_warnings_showwarning(message, category, filename, lineno, file, line)
else:
s = warnings.formatwarning(message, category, filename, lineno, line)
logger = getLogger("py.warnings")
if not logger.handlers:
logger.addHandler(NullHandler())
logger.warning("%s", s)
def captureWarnings(capture):
"""
If capture is true, redirect all warnings to the logging package.
If capture is False, ensure that warnings are not redirected to logging
but to their original destinations.
"""
global _warnings_showwarning
if capture:
if _warnings_showwarning is None:
_warnings_showwarning = warnings.showwarning
warnings.showwarning = _showwarning
else:
if _warnings_showwarning is not None:
warnings.showwarning = _warnings_showwarning
_warnings_showwarning = None
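# --- Illustrative sketch, not part of the original module: routing warnings
# --- through the 'py.warnings' logger and restoring the original hook.
def _demo_capture_warnings():
    captureWarnings(True)                   # warnings now go to 'py.warnings'
    warnings.warn('deprecated demo feature')
    captureWarnings(False)                  # original showwarning restored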
"""Pathname and path-related operations for the Macintosh."""
import os
import warnings
from stat import *
import genericpath
from genericpath import *
from genericpath import _unicode
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime", "islink","exists","lexists","isdir","isfile",
"walk","expanduser","expandvars","normpath","abspath",
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
"devnull","realpath","supports_unicode_filenames"]
# strings representing various path-related bits and pieces
curdir = ':'
pardir = '::'
extsep = '.'
sep = ':'
pathsep = '\n'
defpath = ':'
altsep = None
devnull = 'Dev:Null'
# Normalize the case of a pathname. Dummy in Posix, but <s>.lower() here.
def normcase(path):
return path.lower()
def isabs(s):
"""Return true if a path is absolute.
On the Mac, relative paths begin with a colon,
but as a special case, paths with no colons at all are also relative.
Anything else is absolute (the string up to the first colon is the
volume name)."""
return ':' in s and s[0] != ':'
def join(s, *p):
path = s
for t in p:
if (not path) or isabs(t):
path = t
continue
if t[:1] == ':':
t = t[1:]
if ':' not in path:
path = ':' + path
if path[-1:] != ':':
path = path + ':'
path = path + t
return path
def split(s):
"""Split a pathname into two parts: the directory leading up to the final
bit, and the basename (the filename, without colons, in that directory).
The result (s, t) is such that join(s, t) yields the original argument."""
if ':' not in s: return '', s
colon = 0
for i in range(len(s)):
if s[i] == ':': colon = i + 1
path, file = s[:colon-1], s[colon:]
if path and not ':' in path:
path = path + ':'
return path, file
def splitext(p):
return genericpath._splitext(p, sep, altsep, extsep)
splitext.__doc__ = genericpath._splitext.__doc__
def splitdrive(p):
"""Split a pathname into a drive specification and the rest of the
path. Useful on DOS/Windows/NT; on the Mac, the drive is always
empty (don't use the volume name -- it doesn't have the same
syntactic and semantic oddities as DOS drive letters, such as there
being a separate current directory per drive)."""
return '', p
# Short interfaces to split()
def dirname(s): return split(s)[0]
def basename(s): return split(s)[1]
def ismount(s):
if not isabs(s):
return False
components = split(s)
return len(components) == 2 and components[1] == ''
def islink(s):
"""Return true if the pathname refers to a symbolic link."""
try:
import Carbon.File
return Carbon.File.ResolveAliasFile(s, 0)[2]
except:
return False
# Is `stat`/`lstat` a meaningful difference on the Mac? This is safe in any
# case.
def lexists(path):
"""Test whether a path exists. Returns True for broken symbolic links"""
try:
st = os.lstat(path)
except os.error:
return False
return True
def expandvars(path):
"""Dummy to retain interface-compatibility with other operating systems."""
return path
def expanduser(path):
"""Dummy to retain interface-compatibility with other operating systems."""
return path
class norm_error(Exception):
"""Path cannot be normalized"""
def normpath(s):
"""Normalize a pathname. Will return the same result for
equivalent paths."""
if ":" not in s:
return ":"+s
comps = s.split(":")
i = 1
while i < len(comps)-1:
if comps[i] == "" and comps[i-1] != "":
if i > 1:
del comps[i-1:i+1]
i = i - 1
else:
# best way to handle this is to raise an exception
raise norm_error, 'Cannot use :: immediately after volume name'
else:
i = i + 1
s = ":".join(comps)
# remove trailing ":" except for ":" and "Volume:"
if s[-1] == ":" and len(comps) > 2 and s != ":"*len(s):
s = s[:-1]
return s
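# --- Illustrative sketch, not part of the original module: classic Mac path
# --- behaviour of the helpers above ('Disk' is a hypothetical volume name).
def _demo_macpath_basics():
    assert join('Disk:Folder', 'file.txt') == 'Disk:Folder:file.txt'
    assert split('Disk:Folder:file.txt') == ('Disk:Folder', 'file.txt')
    assert normpath('Disk:Folder::file.txt') == 'Disk:file.txt'
    assert isabs('Disk:Folder') and not isabs(':Folder')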
def walk(top, func, arg):
"""Directory tree walk with callback function.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), call func(arg, dirname, fnames).
dirname is the name of the directory, and fnames a list of the names of
the files and subdirectories in dirname (excluding '.' and '..'). func
may modify the fnames list in-place (e.g. via del or slice assignment),
and walk will only recurse into the subdirectories whose names remain in
fnames; this can be used to implement a filter, or to impose a specific
order of visiting. No semantics are defined for, or required of, arg,
beyond that arg is always passed to func. It can be used, e.g., to pass
a filename pattern, or a mutable object designed to accumulate
statistics. Passing None for arg is common."""
warnings.warnpy3k("In 3.x, os.path.walk is removed in favor of os.walk.",
stacklevel=2)
try:
names = os.listdir(top)
except os.error:
return
func(arg, top, names)
for name in names:
name = join(top, name)
if isdir(name) and not islink(name):
walk(name, func, arg)
def abspath(path):
"""Return an absolute path."""
if not isabs(path):
if isinstance(path, _unicode):
cwd = os.getcwdu()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
# realpath is a no-op on systems without islink support
def realpath(path):
path = abspath(path)
try:
import Carbon.File
except ImportError:
return path
if not path:
return path
components = path.split(':')
path = components[0] + ':'
for c in components[1:]:
path = join(path, c)
try:
path = Carbon.File.FSResolveAliasFile(path, 1)[0].as_pathname()
except Carbon.File.Error:
pass
return path
supports_unicode_filenames = True
"""Macintosh-specific module for conversion between pathnames and URLs.
Do not import directly; use urllib instead."""
import urllib
import os
__all__ = ["url2pathname","pathname2url"]
def url2pathname(pathname):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
#
# XXXX The .. handling should be fixed...
#
tp = urllib.splittype(pathname)[0]
if tp and tp != 'file':
raise RuntimeError, 'Cannot convert non-local URL to pathname'
# Turn starting /// into /, an empty hostname means current host
if pathname[:3] == '///':
pathname = pathname[2:]
elif pathname[:2] == '//':
raise RuntimeError, 'Cannot convert non-local URL to pathname'
components = pathname.split('/')
# Remove . and embedded ..
i = 0
while i < len(components):
if components[i] == '.':
del components[i]
elif components[i] == '..' and i > 0 and \
components[i-1] not in ('', '..'):
del components[i-1:i+1]
i = i-1
elif components[i] == '' and i > 0 and components[i-1] != '':
del components[i]
else:
i = i+1
if not components[0]:
# Absolute unix path, don't start with colon
rv = ':'.join(components[1:])
else:
# relative unix path, start with colon. First replace
# leading .. by empty strings (giving ::file)
i = 0
while i < len(components) and components[i] == '..':
components[i] = ''
i = i + 1
rv = ':' + ':'.join(components)
# and finally unquote slashes and other funny characters
return urllib.unquote(rv)
def pathname2url(pathname):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
if '/' in pathname:
raise RuntimeError, "Cannot convert pathname containing slashes"
components = pathname.split(':')
# Remove empty first and/or last component
if components[0] == '':
del components[0]
if components[-1] == '':
del components[-1]
# Replace empty string ('::') by .. (will result in '/../' later)
for i in range(len(components)):
if components[i] == '':
components[i] = '..'
# Truncate names longer than 31 bytes
components = map(_pncomp2url, components)
if os.path.isabs(pathname):
return '/' + '/'.join(components)
else:
return '/'.join(components)
def _pncomp2url(component):
component = urllib.quote(component[:31], safe='') # We want to quote slashes
return component
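# --- Illustrative sketch, not part of the original module, assuming classic
# --- MacOS semantics (os.path is macpath, so isabs() understands colons);
# --- 'Disk' is a hypothetical volume name.
def _demo_mac_url_round_trip():
    assert pathname2url('Disk:Folder:file.txt') == '/Disk/Folder/file.txt'
    assert url2pathname('/Disk/Folder/file.txt') == 'Disk:Folder:file.txt'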
"""Read/write support for Maildir, mbox, MH, Babyl, and MMDF mailboxes."""
# Notes for authors of new mailbox subclasses:
#
# Remember to fsync() changes to disk before closing a modified file
# or returning from a flush() method. See functions _sync_flush() and
# _sync_close().
import sys
import os
import time
import calendar
import socket
import errno
import copy
import email
import email.message
import email.generator
import StringIO
try:
if sys.platform == 'os2emx':
# OS/2 EMX fcntl() not adequate
raise ImportError
import fcntl
except ImportError:
fcntl = None
import warnings
with warnings.catch_warnings():
if sys.py3kwarning:
warnings.filterwarnings("ignore", ".*rfc822 has been removed",
DeprecationWarning)
import rfc822
__all__ = [ 'Mailbox', 'Maildir', 'mbox', 'MH', 'Babyl', 'MMDF',
'Message', 'MaildirMessage', 'mboxMessage', 'MHMessage',
'BabylMessage', 'MMDFMessage', 'UnixMailbox',
'PortableUnixMailbox', 'MmdfMailbox', 'MHMailbox', 'BabylMailbox' ]
class Mailbox:
"""A group of messages in a particular place."""
def __init__(self, path, factory=None, create=True):
"""Initialize a Mailbox instance."""
self._path = os.path.abspath(os.path.expanduser(path))
self._factory = factory
def add(self, message):
"""Add message and return assigned key."""
raise NotImplementedError('Method must be implemented by subclass')
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
raise NotImplementedError('Method must be implemented by subclass')
def __delitem__(self, key):
self.remove(key)
def discard(self, key):
"""If the keyed message exists, remove it."""
try:
self.remove(key)
except KeyError:
pass
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
raise NotImplementedError('Method must be implemented by subclass')
def get(self, key, default=None):
"""Return the keyed message, or default if it doesn't exist."""
try:
return self.__getitem__(key)
except KeyError:
return default
def __getitem__(self, key):
"""Return the keyed message; raise KeyError if it doesn't exist."""
if not self._factory:
return self.get_message(key)
else:
return self._factory(self.get_file(key))
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def get_string(self, key):
"""Return a string representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def iterkeys(self):
"""Return an iterator over keys."""
raise NotImplementedError('Method must be implemented by subclass')
def keys(self):
"""Return a list of keys."""
return list(self.iterkeys())
def itervalues(self):
"""Return an iterator over all messages."""
for key in self.iterkeys():
try:
value = self[key]
except KeyError:
continue
yield value
def __iter__(self):
return self.itervalues()
def values(self):
"""Return a list of messages. Memory intensive."""
return list(self.itervalues())
def iteritems(self):
"""Return an iterator over (key, message) tuples."""
for key in self.iterkeys():
try:
value = self[key]
except KeyError:
continue
yield (key, value)
def items(self):
"""Return a list of (key, message) tuples. Memory intensive."""
return list(self.iteritems())
def has_key(self, key):
"""Return True if the keyed message exists, False otherwise."""
raise NotImplementedError('Method must be implemented by subclass')
def __contains__(self, key):
return self.has_key(key)
def __len__(self):
"""Return a count of messages in the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
def clear(self):
"""Delete all messages."""
for key in self.iterkeys():
self.discard(key)
def pop(self, key, default=None):
"""Delete the keyed message and return it, or default."""
try:
result = self[key]
except KeyError:
return default
self.discard(key)
return result
def popitem(self):
"""Delete an arbitrary (key, message) pair and return it."""
for key in self.iterkeys():
return (key, self.pop(key)) # This is only run once.
else:
raise KeyError('No messages in mailbox')
def update(self, arg=None):
"""Change the messages that correspond to certain keys."""
if hasattr(arg, 'iteritems'):
source = arg.iteritems()
elif hasattr(arg, 'items'):
source = arg.items()
else:
source = arg
bad_key = False
for key, message in source:
try:
self[key] = message
except KeyError:
bad_key = True
if bad_key:
raise KeyError('No message with key(s)')
def flush(self):
"""Write any pending changes to the disk."""
raise NotImplementedError('Method must be implemented by subclass')
def lock(self):
"""Lock the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
def unlock(self):
"""Unlock the mailbox if it is locked."""
raise NotImplementedError('Method must be implemented by subclass')
def close(self):
"""Flush and close the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
# Whether each message must end in a newline
_append_newline = False
def _dump_message(self, message, target, mangle_from_=False):
# Most files are opened in binary mode to allow predictable seeking.
# To get native line endings on disk, the user-friendly \n line endings
# used in strings and by email.Message are translated here.
"""Dump message contents to target file."""
if isinstance(message, email.message.Message):
buffer = StringIO.StringIO()
gen = email.generator.Generator(buffer, mangle_from_, 0)
gen.flatten(message)
buffer.seek(0)
data = buffer.read().replace('\n', os.linesep)
target.write(data)
if self._append_newline and not data.endswith(os.linesep):
# Make sure the message ends with a newline
target.write(os.linesep)
elif isinstance(message, str):
if mangle_from_:
message = message.replace('\nFrom ', '\n>From ')
message = message.replace('\n', os.linesep)
target.write(message)
if self._append_newline and not message.endswith(os.linesep):
# Make sure the message ends with a newline
target.write(os.linesep)
elif hasattr(message, 'read'):
lastline = None
while True:
line = message.readline()
if line == '':
break
if mangle_from_ and line.startswith('From '):
line = '>From ' + line[5:]
line = line.replace('\n', os.linesep)
target.write(line)
lastline = line
if self._append_newline and lastline and not lastline.endswith(os.linesep):
# Make sure the message ends with a newline
target.write(os.linesep)
else:
raise TypeError('Invalid message type: %s' % type(message))
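# --- Illustrative sketch, not part of the original module (hypothetical
# --- path): every concrete subclass below exposes this base class's
# --- dict-like interface; mbox is used here only as one such subclass.
def _demo_mailbox_mapping_api():
    box = mbox('/tmp/demo.mbox')            # created on demand (create=True)
    key = box.add('From: a@example.com\n\nhello\n')
    assert key in box                       # __contains__ via has_key()
    text = box.get_string(key)
    box.discard(key)                        # removal tolerant of KeyError
    box.flush()
    box.close()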
class Maildir(Mailbox):
"""A qmail-style Maildir mailbox."""
colon = ':'
def __init__(self, dirname, factory=rfc822.Message, create=True):
"""Initialize a Maildir instance."""
Mailbox.__init__(self, dirname, factory, create)
self._paths = {
'tmp': os.path.join(self._path, 'tmp'),
'new': os.path.join(self._path, 'new'),
'cur': os.path.join(self._path, 'cur'),
}
if not os.path.exists(self._path):
if create:
os.mkdir(self._path, 0700)
for path in self._paths.values():
os.mkdir(path, 0700)
else:
raise NoSuchMailboxError(self._path)
self._toc = {}
self._toc_mtimes = {'cur': 0, 'new': 0}
self._last_read = 0 # Records last time we read cur/new
self._skewfactor = 0.1 # Adjust if os/fs clocks are skewing
def add(self, message):
"""Add message and return assigned key."""
tmp_file = self._create_tmp()
try:
self._dump_message(message, tmp_file)
except BaseException:
tmp_file.close()
os.remove(tmp_file.name)
raise
_sync_close(tmp_file)
if isinstance(message, MaildirMessage):
subdir = message.get_subdir()
suffix = self.colon + message.get_info()
if suffix == self.colon:
suffix = ''
else:
subdir = 'new'
suffix = ''
uniq = os.path.basename(tmp_file.name).split(self.colon)[0]
dest = os.path.join(self._path, subdir, uniq + suffix)
if isinstance(message, MaildirMessage):
os.utime(tmp_file.name,
(os.path.getatime(tmp_file.name), message.get_date()))
# No file modification should be done after the file is moved to its
# final position in order to prevent race conditions with changes
# from other programs
try:
if hasattr(os, 'link'):
os.link(tmp_file.name, dest)
os.remove(tmp_file.name)
else:
os.rename(tmp_file.name, dest)
except OSError, e:
os.remove(tmp_file.name)
if e.errno == errno.EEXIST:
raise ExternalClashError('Name clash with existing message: %s'
% dest)
else:
raise
return uniq
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
os.remove(os.path.join(self._path, self._lookup(key)))
def discard(self, key):
"""If the keyed message exists, remove it."""
# This overrides an inapplicable implementation in the superclass.
try:
self.remove(key)
except KeyError:
pass
except OSError, e:
if e.errno != errno.ENOENT:
raise
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
old_subpath = self._lookup(key)
temp_key = self.add(message)
temp_subpath = self._lookup(temp_key)
if isinstance(message, MaildirMessage):
# temp's subdir and suffix were specified by message.
dominant_subpath = temp_subpath
else:
# temp's subdir and suffix were defaults from add().
dominant_subpath = old_subpath
subdir = os.path.dirname(dominant_subpath)
if self.colon in dominant_subpath:
suffix = self.colon + dominant_subpath.split(self.colon)[-1]
else:
suffix = ''
self.discard(key)
tmp_path = os.path.join(self._path, temp_subpath)
new_path = os.path.join(self._path, subdir, key + suffix)
if isinstance(message, MaildirMessage):
os.utime(tmp_path,
(os.path.getatime(tmp_path), message.get_date()))
# No file modification should be done after the file is moved to its
# final position in order to prevent race conditions with changes
# from other programs
os.rename(tmp_path, new_path)
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
subpath = self._lookup(key)
f = open(os.path.join(self._path, subpath), 'r')
try:
if self._factory:
msg = self._factory(f)
else:
msg = MaildirMessage(f)
finally:
f.close()
subdir, name = os.path.split(subpath)
msg.set_subdir(subdir)
if self.colon in name:
msg.set_info(name.split(self.colon)[-1])
msg.set_date(os.path.getmtime(os.path.join(self._path, subpath)))
return msg
def get_string(self, key):
"""Return a string representation or raise a KeyError."""
f = open(os.path.join(self._path, self._lookup(key)), 'r')
try:
return f.read()
finally:
f.close()
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
f = open(os.path.join(self._path, self._lookup(key)), 'rb')
return _ProxyFile(f)
def iterkeys(self):
"""Return an iterator over keys."""
self._refresh()
for key in self._toc:
try:
self._lookup(key)
except KeyError:
continue
yield key
def has_key(self, key):
"""Return True if the keyed message exists, False otherwise."""
self._refresh()
return key in self._toc
def __len__(self):
"""Return a count of messages in the mailbox."""
self._refresh()
return len(self._toc)
def flush(self):
"""Write any pending changes to disk."""
# Maildir changes are always written immediately, so there's nothing
# to do.
pass
def lock(self):
"""Lock the mailbox."""
return
def unlock(self):
"""Unlock the mailbox if it is locked."""
return
def close(self):
"""Flush and close the mailbox."""
return
def list_folders(self):
"""Return a list of folder names."""
result = []
for entry in os.listdir(self._path):
if len(entry) > 1 and entry[0] == '.' and \
os.path.isdir(os.path.join(self._path, entry)):
result.append(entry[1:])
return result
def get_folder(self, folder):
"""Return a Maildir instance for the named folder."""
return Maildir(os.path.join(self._path, '.' + folder),
factory=self._factory,
create=False)
def add_folder(self, folder):
"""Create a folder and return a Maildir instance representing it."""
path = os.path.join(self._path, '.' + folder)
result = Maildir(path, factory=self._factory)
maildirfolder_path = os.path.join(path, 'maildirfolder')
if not os.path.exists(maildirfolder_path):
os.close(os.open(maildirfolder_path, os.O_CREAT | os.O_WRONLY,
0666))
return result
def remove_folder(self, folder):
"""Delete the named folder, which must be empty."""
path = os.path.join(self._path, '.' + folder)
for entry in os.listdir(os.path.join(path, 'new')) + \
os.listdir(os.path.join(path, 'cur')):
if len(entry) < 1 or entry[0] != '.':
raise NotEmptyError('Folder contains message(s): %s' % folder)
for entry in os.listdir(path):
if entry != 'new' and entry != 'cur' and entry != 'tmp' and \
os.path.isdir(os.path.join(path, entry)):
raise NotEmptyError("Folder contains subdirectory '%s': %s" %
(folder, entry))
for root, dirs, files in os.walk(path, topdown=False):
for entry in files:
os.remove(os.path.join(root, entry))
for entry in dirs:
os.rmdir(os.path.join(root, entry))
os.rmdir(path)
def clean(self):
"""Delete old files in "tmp"."""
now = time.time()
for entry in os.listdir(os.path.join(self._path, 'tmp')):
path = os.path.join(self._path, 'tmp', entry)
if now - os.path.getatime(path) > 129600: # 60 * 60 * 36
os.remove(path)
_count = 1 # This is used to generate unique file names.
def _create_tmp(self):
"""Create a file in the tmp subdirectory and open and return it."""
now = time.time()
hostname = socket.gethostname()
if '/' in hostname:
hostname = hostname.replace('/', r'\057')
if ':' in hostname:
hostname = hostname.replace(':', r'\072')
uniq = "%s.M%sP%sQ%s.%s" % (int(now), int(now % 1 * 1e6), os.getpid(),
Maildir._count, hostname)
path = os.path.join(self._path, 'tmp', uniq)
try:
os.stat(path)
except OSError, e:
if e.errno == errno.ENOENT:
Maildir._count += 1
try:
return _create_carefully(path)
except OSError, e:
if e.errno != errno.EEXIST:
raise
else:
raise
# Fall through to here if stat succeeded or open raised EEXIST.
raise ExternalClashError('Name clash prevented file creation: %s' %
path)
def _refresh(self):
"""Update table of contents mapping."""
# If it has been less than two seconds since the last _refresh() call,
# we have to unconditionally re-read the mailbox just in case it has
# been modified, because os.path.getmtime() has a 2 sec resolution in the
# most common worst case (FAT) and a 1 sec resolution typically. This
# results in a few unnecessary re-reads when _refresh() is called
# multiple times in that interval, but once the clock ticks over, we
# will only re-read as needed. Because the filesystem might be being
# served by an independent system with its own clock, we record and
# compare with the mtimes from the filesystem. Because the other
# system's clock might be skewing relative to our clock, we add an
# extra delta to our wait. The default is one tenth second, but is an
# instance variable and so can be adjusted if dealing with a
# particularly skewed or irregular system.
if time.time() - self._last_read > 2 + self._skewfactor:
refresh = False
for subdir in self._toc_mtimes:
mtime = os.path.getmtime(self._paths[subdir])
if mtime > self._toc_mtimes[subdir]:
refresh = True
self._toc_mtimes[subdir] = mtime
if not refresh:
return
# Refresh toc
self._toc = {}
for subdir in self._toc_mtimes:
path = self._paths[subdir]
for entry in os.listdir(path):
p = os.path.join(path, entry)
if os.path.isdir(p):
continue
uniq = entry.split(self.colon)[0]
self._toc[uniq] = os.path.join(subdir, entry)
self._last_read = time.time()
def _lookup(self, key):
"""Use TOC to return subpath for given key, or raise a KeyError."""
try:
if os.path.exists(os.path.join(self._path, self._toc[key])):
return self._toc[key]
except KeyError:
pass
self._refresh()
try:
return self._toc[key]
except KeyError:
raise KeyError('No message with key: %s' % key)
# This method is for backward compatibility only.
def next(self):
"""Return the next message in a one-time iteration."""
if not hasattr(self, '_onetime_keys'):
self._onetime_keys = self.iterkeys()
while True:
try:
return self[self._onetime_keys.next()]
except StopIteration:
return None
except KeyError:
continue
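# --- Illustrative sketch, not part of the original module (hypothetical
# --- directory): Maildir keys are the unique part of the delivered file
# --- name, and new messages land in 'new' by default.
def _demo_maildir_usage():
    md = Maildir('/tmp/demo_maildir', factory=None)   # creates tmp/new/cur
    key = md.add('From: a@example.com\n\nbody\n')
    msg = md.get_message(key)
    assert msg.get_subdir() == 'new'                  # default from add()
    md.remove(key)
    md.clean()                                        # prune stale 'tmp' files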
class _singlefileMailbox(Mailbox):
"""A single-file mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize a single-file mailbox."""
Mailbox.__init__(self, path, factory, create)
try:
f = open(self._path, 'rb+')
except IOError, e:
if e.errno == errno.ENOENT:
if create:
f = open(self._path, 'wb+')
else:
raise NoSuchMailboxError(self._path)
elif e.errno in (errno.EACCES, errno.EROFS):
f = open(self._path, 'rb')
else:
raise
self._file = f
self._toc = None
self._next_key = 0
self._pending = False # No changes require rewriting the file.
self._pending_sync = False # No need to sync the file
self._locked = False
self._file_length = None # Used to record mailbox size
def add(self, message):
"""Add message and return assigned key."""
self._lookup()
self._toc[self._next_key] = self._append_message(message)
self._next_key += 1
# _append_message appends the message to the mailbox file. We
# don't need a full rewrite + rename, sync is enough.
self._pending_sync = True
return self._next_key - 1
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
self._lookup(key)
del self._toc[key]
self._pending = True
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
self._lookup(key)
self._toc[key] = self._append_message(message)
self._pending = True
def iterkeys(self):
"""Return an iterator over keys."""
self._lookup()
for key in self._toc.keys():
yield key
def has_key(self, key):
"""Return True if the keyed message exists, False otherwise."""
self._lookup()
return key in self._toc
def __len__(self):
"""Return a count of messages in the mailbox."""
self._lookup()
return len(self._toc)
def lock(self):
"""Lock the mailbox."""
if not self._locked:
_lock_file(self._file)
self._locked = True
def unlock(self):
"""Unlock the mailbox if it is locked."""
if self._locked:
_unlock_file(self._file)
self._locked = False
def flush(self):
"""Write any pending changes to disk."""
if not self._pending:
if self._pending_sync:
# Messages have only been added, so syncing the file
# is enough.
_sync_flush(self._file)
self._pending_sync = False
return
# In order to be writing anything out at all, self._toc must
# already have been generated (and presumably has been modified
# by adding or deleting an item).
assert self._toc is not None
# Check length of self._file; if it's changed, some other process
# has modified the mailbox since we scanned it.
self._file.seek(0, 2)
cur_len = self._file.tell()
if cur_len != self._file_length:
raise ExternalClashError('Size of mailbox file changed '
'(expected %i, found %i)' %
(self._file_length, cur_len))
new_file = _create_temporary(self._path)
try:
new_toc = {}
self._pre_mailbox_hook(new_file)
for key in sorted(self._toc.keys()):
start, stop = self._toc[key]
self._file.seek(start)
self._pre_message_hook(new_file)
new_start = new_file.tell()
while True:
buffer = self._file.read(min(4096,
stop - self._file.tell()))
if buffer == '':
break
new_file.write(buffer)
new_toc[key] = (new_start, new_file.tell())
self._post_message_hook(new_file)
self._file_length = new_file.tell()
except:
new_file.close()
os.remove(new_file.name)
raise
_sync_close(new_file)
# self._file is about to get replaced, so no need to sync.
self._file.close()
# Make sure the new file's mode is the same as the old file's
mode = os.stat(self._path).st_mode
os.chmod(new_file.name, mode)
try:
os.rename(new_file.name, self._path)
except OSError, e:
if e.errno == errno.EEXIST or \
(os.name == 'os2' and e.errno == errno.EACCES):
os.remove(self._path)
os.rename(new_file.name, self._path)
else:
raise
self._file = open(self._path, 'rb+')
self._toc = new_toc
self._pending = False
self._pending_sync = False
if self._locked:
_lock_file(self._file, dotlock=False)
def _pre_mailbox_hook(self, f):
"""Called before writing the mailbox to file f."""
return
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
return
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
return
def close(self):
"""Flush and close the mailbox."""
try:
self.flush()
finally:
try:
if self._locked:
self.unlock()
finally:
self._file.close() # Sync has been done by self.flush() above.
def _lookup(self, key=None):
"""Return (start, stop) or raise KeyError."""
if self._toc is None:
self._generate_toc()
if key is not None:
try:
return self._toc[key]
except KeyError:
raise KeyError('No message with key: %s' % key)
def _append_message(self, message):
"""Append message to mailbox and return (start, stop) offsets."""
self._file.seek(0, 2)
before = self._file.tell()
if len(self._toc) == 0 and not self._pending:
# This is the first message, and the _pre_mailbox_hook
# hasn't yet been called. If self._pending is True,
# messages have been removed, so _pre_mailbox_hook must
# have been called already.
self._pre_mailbox_hook(self._file)
try:
self._pre_message_hook(self._file)
offsets = self._install_message(message)
self._post_message_hook(self._file)
except BaseException:
self._file.truncate(before)
raise
self._file.flush()
self._file_length = self._file.tell() # Record current length of mailbox
return offsets
class _mboxMMDF(_singlefileMailbox):
"""An mbox or MMDF mailbox."""
_mangle_from_ = True
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
from_line = self._file.readline().replace(os.linesep, '')
string = self._file.read(stop - self._file.tell())
msg = self._message_factory(string.replace(os.linesep, '\n'))
msg.set_from(from_line[5:])
return msg
def get_string(self, key, from_=False):
"""Return a string representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
if not from_:
self._file.readline()
string = self._file.read(stop - self._file.tell())
return string.replace(os.linesep, '\n')
def get_file(self, key, from_=False):
"""Return a file-like representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
if not from_:
self._file.readline()
return _PartialFile(self._file, self._file.tell(), stop)
def _install_message(self, message):
"""Format a message and blindly write to self._file."""
from_line = None
if isinstance(message, str) and message.startswith('From '):
newline = message.find('\n')
if newline != -1:
from_line = message[:newline]
message = message[newline + 1:]
else:
from_line = message
message = ''
elif isinstance(message, _mboxMMDFMessage):
from_line = 'From ' + message.get_from()
elif isinstance(message, email.message.Message):
from_line = message.get_unixfrom() # May be None.
if from_line is None:
from_line = 'From MAILER-DAEMON %s' % time.asctime(time.gmtime())
start = self._file.tell()
self._file.write(from_line + os.linesep)
self._dump_message(message, self._file, self._mangle_from_)
stop = self._file.tell()
return (start, stop)
class mbox(_mboxMMDF):
"""A classic mbox mailbox."""
_mangle_from_ = True
# All messages must end in a newline character, and
# _post_message_hooks outputs an empty line between messages.
_append_newline = True
def __init__(self, path, factory=None, create=True):
"""Initialize an mbox mailbox."""
self._message_factory = mboxMessage
_mboxMMDF.__init__(self, path, factory, create)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(os.linesep)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
last_was_empty = False
self._file.seek(0)
while True:
line_pos = self._file.tell()
line = self._file.readline()
if line.startswith('From '):
if len(stops) < len(starts):
if last_was_empty:
stops.append(line_pos - len(os.linesep))
else:
# The last line before the "From " line wasn't
# blank, but we consider it a start of a
# message anyway.
stops.append(line_pos)
starts.append(line_pos)
last_was_empty = False
elif not line:
if last_was_empty:
stops.append(line_pos - len(os.linesep))
else:
stops.append(line_pos)
break
elif line == os.linesep:
last_was_empty = True
else:
last_was_empty = False
self._toc = dict(enumerate(zip(starts, stops)))
self._next_key = len(self._toc)
self._file_length = self._file.tell()
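# --- Illustrative sketch, not part of the original module (hypothetical
# --- path): hold the file lock across mutations so the flush() rewrite
# --- machinery above never races a concurrent writer.
def _demo_mbox_locking():
    box = mbox('/tmp/demo_locked.mbox')
    box.lock()
    try:
        box.add('From: a@example.com\n\nfirst\n')
        box.flush()                         # flush while still holding the lock
    finally:
        box.unlock()
        box.close()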
class MMDF(_mboxMMDF):
"""An MMDF mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize an MMDF mailbox."""
self._message_factory = MMDFMessage
_mboxMMDF.__init__(self, path, factory, create)
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
f.write('\001\001\001\001' + os.linesep)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(os.linesep + '\001\001\001\001' + os.linesep)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
self._file.seek(0)
next_pos = 0
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line.startswith('\001\001\001\001' + os.linesep):
starts.append(next_pos)
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line == '\001\001\001\001' + os.linesep:
stops.append(line_pos - len(os.linesep))
break
elif line == '':
stops.append(line_pos)
break
elif line == '':
break
self._toc = dict(enumerate(zip(starts, stops)))
self._next_key = len(self._toc)
self._file.seek(0, 2)
self._file_length = self._file.tell()
class MH(Mailbox):
"""An MH mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize an MH instance."""
Mailbox.__init__(self, path, factory, create)
if not os.path.exists(self._path):
if create:
os.mkdir(self._path, 0700)
os.close(os.open(os.path.join(self._path, '.mh_sequences'),
os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0600))
else:
raise NoSuchMailboxError(self._path)
self._locked = False
def add(self, message):
"""Add message and return assigned key."""
keys = self.keys()
if len(keys) == 0:
new_key = 1
else:
new_key = max(keys) + 1
new_path = os.path.join(self._path, str(new_key))
f = _create_carefully(new_path)
closed = False
try:
if self._locked:
_lock_file(f)
try:
try:
self._dump_message(message, f)
except BaseException:
# Unlock and close so it can be deleted on Windows
if self._locked:
_unlock_file(f)
_sync_close(f)
closed = True
os.remove(new_path)
raise
if isinstance(message, MHMessage):
self._dump_sequences(message, new_key)
finally:
if self._locked:
_unlock_file(f)
finally:
if not closed:
_sync_close(f)
return new_key
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
path = os.path.join(self._path, str(key))
try:
f = open(path, 'rb+')
except IOError, e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
else:
f.close()
os.remove(path)
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
path = os.path.join(self._path, str(key))
try:
f = open(path, 'rb+')
except IOError, e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
try:
if self._locked:
_lock_file(f)
try:
os.close(os.open(path, os.O_WRONLY | os.O_TRUNC))
self._dump_message(message, f)
if isinstance(message, MHMessage):
self._dump_sequences(message, key)
finally:
if self._locked:
_unlock_file(f)
finally:
_sync_close(f)
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
try:
if self._locked:
f = open(os.path.join(self._path, str(key)), 'r+')
else:
f = open(os.path.join(self._path, str(key)), 'r')
except IOError, e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
try:
if self._locked:
_lock_file(f)
try:
msg = MHMessage(f)
finally:
if self._locked:
_unlock_file(f)
finally:
f.close()
for name, key_list in self.get_sequences().iteritems():
if key in key_list:
msg.add_sequence(name)
return msg
def get_string(self, key):
"""Return a string representation or raise a KeyError."""
try:
if self._locked:
f = open(os.path.join(self._path, str(key)), 'r+')
else:
f = open(os.path.join(self._path, str(key)), 'r')
except IOError, e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
try:
if self._locked:
_lock_file(f)
try:
return f.read()
finally:
if self._locked:
_unlock_file(f)
finally:
f.close()
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
try:
f = open(os.path.join(self._path, str(key)), 'rb')
except IOError, e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
return _ProxyFile(f)
def iterkeys(self):
"""Return an iterator over keys."""
return iter(sorted(int(entry) for entry in os.listdir(self._path)
if entry.isdigit()))
def has_key(self, key):
"""Return True if the keyed message exists, False otherwise."""
return os.path.exists(os.path.join(self._path, str(key)))
def __len__(self):
"""Return a count of messages in the mailbox."""
return len(list(self.iterkeys()))
def lock(self):
"""Lock the mailbox."""
if not self._locked:
self._file = open(os.path.join(self._path, '.mh_sequences'), 'rb+')
_lock_file(self._file)
self._locked = True
def unlock(self):
"""Unlock the mailbox if it is locked."""
if self._locked:
_unlock_file(self._file)
_sync_close(self._file)
del self._file
self._locked = False
def flush(self):
"""Write any pending changes to the disk."""
return
def close(self):
"""Flush and close the mailbox."""
if self._locked:
self.unlock()
def list_folders(self):
"""Return a list of folder names."""
result = []
for entry in os.listdir(self._path):
if os.path.isdir(os.path.join(self._path, entry)):
result.append(entry)
return result
def get_folder(self, folder):
"""Return an MH instance for the named folder."""
return MH(os.path.join(self._path, folder),
factory=self._factory, create=False)
def add_folder(self, folder):
"""Create a folder and return an MH instance representing it."""
return MH(os.path.join(self._path, folder),
factory=self._factory)
def remove_folder(self, folder):
"""Delete the named folder, which must be empty."""
path = os.path.join(self._path, folder)
entries = os.listdir(path)
if entries == ['.mh_sequences']:
os.remove(os.path.join(path, '.mh_sequences'))
elif entries == []:
pass
else:
raise NotEmptyError('Folder not empty: %s' % self._path)
os.rmdir(path)
def get_sequences(self):
"""Return a name-to-key-list dictionary to define each sequence."""
results = {}
f = open(os.path.join(self._path, '.mh_sequences'), 'r')
try:
all_keys = set(self.keys())
for line in f:
try:
name, contents = line.split(':')
keys = set()
for spec in contents.split():
if spec.isdigit():
keys.add(int(spec))
else:
start, stop = (int(x) for x in spec.split('-'))
keys.update(range(start, stop + 1))
results[name] = [key for key in sorted(keys) \
if key in all_keys]
if len(results[name]) == 0:
del results[name]
except ValueError:
raise FormatError('Invalid sequence specification: %s' %
line.rstrip())
finally:
f.close()
return results
def set_sequences(self, sequences):
"""Set sequences using the given name-to-key-list dictionary."""
f = open(os.path.join(self._path, '.mh_sequences'), 'r+')
try:
os.close(os.open(f.name, os.O_WRONLY | os.O_TRUNC))
for name, keys in sequences.iteritems():
if len(keys) == 0:
continue
f.write('%s:' % name)
prev = None
completing = False
for key in sorted(set(keys)):
if key - 1 == prev:
if not completing:
completing = True
f.write('-')
elif completing:
completing = False
f.write('%s %s' % (prev, key))
else:
f.write(' %s' % key)
prev = key
if completing:
f.write(str(prev) + '\n')
else:
f.write('\n')
finally:
_sync_close(f)
def pack(self):
"""Re-name messages to eliminate numbering gaps. Invalidates keys."""
sequences = self.get_sequences()
prev = 0
changes = []
for key in self.iterkeys():
if key - 1 != prev:
changes.append((key, prev + 1))
if hasattr(os, 'link'):
os.link(os.path.join(self._path, str(key)),
os.path.join(self._path, str(prev + 1)))
os.unlink(os.path.join(self._path, str(key)))
else:
os.rename(os.path.join(self._path, str(key)),
os.path.join(self._path, str(prev + 1)))
prev += 1
self._next_key = prev + 1
if len(changes) == 0:
return
for name, key_list in sequences.items():
for old, new in changes:
if old in key_list:
key_list[key_list.index(old)] = new
self.set_sequences(sequences)
def _dump_sequences(self, message, key):
"""Inspect a new MHMessage and update sequences appropriately."""
pending_sequences = message.get_sequences()
all_sequences = self.get_sequences()
for name, key_list in all_sequences.iteritems():
if name in pending_sequences:
key_list.append(key)
elif key in key_list:
del key_list[key_list.index(key)]
for sequence in pending_sequences:
if sequence not in all_sequences:
all_sequences[sequence] = [key]
self.set_sequences(all_sequences)
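# A minimal sketch (not part of the module) of the '.mh_sequences' line
# format that get_sequences()/set_sequences() above read and write: one
# 'name: keys' line per sequence, with runs of consecutive keys collapsed
# into 'start-stop' ranges.
def _demo_parse_sequence_line(line):
    # e.g. _demo_parse_sequence_line('unseen: 1-3 7') -> ('unseen', [1, 2, 3, 7])
    name, contents = line.split(':')
    keys = set()
    for spec in contents.split():
        if spec.isdigit():
            keys.add(int(spec))
        else:
            start, stop = (int(x) for x in spec.split('-'))
            keys.update(range(start, stop + 1))
    return name, sorted(keys)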
class Babyl(_singlefileMailbox):
"""An Rmail-style Babyl mailbox."""
_special_labels = frozenset(('unseen', 'deleted', 'filed', 'answered',
'forwarded', 'edited', 'resent'))
def __init__(self, path, factory=None, create=True):
"""Initialize a Babyl mailbox."""
_singlefileMailbox.__init__(self, path, factory, create)
self._labels = {}
def add(self, message):
"""Add message and return assigned key."""
key = _singlefileMailbox.add(self, message)
if isinstance(message, BabylMessage):
self._labels[key] = message.get_labels()
return key
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
_singlefileMailbox.remove(self, key)
if key in self._labels:
del self._labels[key]
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
_singlefileMailbox.__setitem__(self, key, message)
if isinstance(message, BabylMessage):
self._labels[key] = message.get_labels()
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
self._file.readline() # Skip '1,' line specifying labels.
original_headers = StringIO.StringIO()
while True:
line = self._file.readline()
if line == '*** EOOH ***' + os.linesep or line == '':
break
original_headers.write(line.replace(os.linesep, '\n'))
visible_headers = StringIO.StringIO()
while True:
line = self._file.readline()
if line == os.linesep or line == '':
break
visible_headers.write(line.replace(os.linesep, '\n'))
body = self._file.read(stop - self._file.tell()).replace(os.linesep,
'\n')
msg = BabylMessage(original_headers.getvalue() + body)
msg.set_visible(visible_headers.getvalue())
if key in self._labels:
msg.set_labels(self._labels[key])
return msg
def get_string(self, key):
"""Return a string representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
self._file.readline() # Skip '1,' line specifying labels.
original_headers = StringIO.StringIO()
while True:
line = self._file.readline()
if line == '*** EOOH ***' + os.linesep or line == '':
break
original_headers.write(line.replace(os.linesep, '\n'))
while True:
line = self._file.readline()
if line == os.linesep or line == '':
break
return original_headers.getvalue() + \
self._file.read(stop - self._file.tell()).replace(os.linesep,
'\n')
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
return StringIO.StringIO(self.get_string(key).replace('\n',
os.linesep))
def get_labels(self):
"""Return a list of user-defined labels in the mailbox."""
self._lookup()
labels = set()
for label_list in self._labels.values():
labels.update(label_list)
labels.difference_update(self._special_labels)
return list(labels)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
self._file.seek(0)
next_pos = 0
label_lists = []
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line == '\037\014' + os.linesep:
if len(stops) < len(starts):
stops.append(line_pos - len(os.linesep))
starts.append(next_pos)
labels = [label.strip() for label
in self._file.readline()[1:].split(',')
if label.strip() != '']
label_lists.append(labels)
elif line == '\037' or line == '\037' + os.linesep:
if len(stops) < len(starts):
stops.append(line_pos - len(os.linesep))
elif line == '':
stops.append(line_pos - len(os.linesep))
break
self._toc = dict(enumerate(zip(starts, stops)))
self._labels = dict(enumerate(label_lists))
self._next_key = len(self._toc)
self._file.seek(0, 2)
self._file_length = self._file.tell()
def _pre_mailbox_hook(self, f):
"""Called before writing the mailbox to file f."""
f.write('BABYL OPTIONS:%sVersion: 5%sLabels:%s%s\037' %
(os.linesep, os.linesep, ','.join(self.get_labels()),
os.linesep))
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
f.write('\014' + os.linesep)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(os.linesep + '\037')
def _install_message(self, message):
"""Write message contents and return (start, stop)."""
start = self._file.tell()
if isinstance(message, BabylMessage):
special_labels = []
labels = []
for label in message.get_labels():
if label in self._special_labels:
special_labels.append(label)
else:
labels.append(label)
self._file.write('1')
for label in special_labels:
self._file.write(', ' + label)
self._file.write(',,')
for label in labels:
self._file.write(' ' + label + ',')
self._file.write(os.linesep)
else:
self._file.write('1,,' + os.linesep)
if isinstance(message, email.message.Message):
orig_buffer = StringIO.StringIO()
orig_generator = email.generator.Generator(orig_buffer, False, 0)
orig_generator.flatten(message)
orig_buffer.seek(0)
while True:
line = orig_buffer.readline()
self._file.write(line.replace('\n', os.linesep))
if line == '\n' or line == '':
break
self._file.write('*** EOOH ***' + os.linesep)
if isinstance(message, BabylMessage):
vis_buffer = StringIO.StringIO()
vis_generator = email.generator.Generator(vis_buffer, False, 0)
vis_generator.flatten(message.get_visible())
while True:
line = vis_buffer.readline()
self._file.write(line.replace('\n', os.linesep))
if line == '\n' or line == '':
break
else:
orig_buffer.seek(0)
while True:
line = orig_buffer.readline()
self._file.write(line.replace('\n', os.linesep))
if line == '\n' or line == '':
break
while True:
buffer = orig_buffer.read(4096) # Buffer size is arbitrary.
if buffer == '':
break
self._file.write(buffer.replace('\n', os.linesep))
elif isinstance(message, str):
body_start = message.find('\n\n') + 2
if body_start - 2 != -1:
self._file.write(message[:body_start].replace('\n',
os.linesep))
self._file.write('*** EOOH ***' + os.linesep)
self._file.write(message[:body_start].replace('\n',
os.linesep))
self._file.write(message[body_start:].replace('\n',
os.linesep))
else:
self._file.write('*** EOOH ***' + os.linesep + os.linesep)
self._file.write(message.replace('\n', os.linesep))
elif hasattr(message, 'readline'):
original_pos = message.tell()
first_pass = True
while True:
line = message.readline()
self._file.write(line.replace('\n', os.linesep))
if line == '\n' or line == '':
if first_pass:
first_pass = False
self._file.write('*** EOOH ***' + os.linesep)
message.seek(original_pos)
else:
break
while True:
buffer = message.read(4096) # Buffer size is arbitrary.
if buffer == '':
break
self._file.write(buffer.replace('\n', os.linesep))
else:
raise TypeError('Invalid message type: %s' % type(message))
stop = self._file.tell()
return (start, stop)
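# A minimal sketch (not part of the module) of the on-disk section layout the
# hooks and _install_message() above produce, with '\n' standing in for
# os.linesep and an empty label list (the '1,,' line). Together with the
# preceding section's trailing '\037', the leading '\014\n' forms the
# '\037\014' message delimiter that _generate_toc() scans for; the headers
# are repeated after '*** EOOH ***' as the "visible" header section.
def _demo_babyl_section(headers, body):
    return ('\014\n'
            '1,,\n' +
            headers + '\n\n' +
            '*** EOOH ***\n' +
            headers + '\n\n' +
            body + '\n\037')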
class Message(email.message.Message):
"""Message with mailbox-format-specific properties."""
def __init__(self, message=None):
"""Initialize a Message instance."""
if isinstance(message, email.message.Message):
self._become_message(copy.deepcopy(message))
if isinstance(message, Message):
message._explain_to(self)
elif isinstance(message, str):
self._become_message(email.message_from_string(message))
elif hasattr(message, "read"):
self._become_message(email.message_from_file(message))
elif message is None:
email.message.Message.__init__(self)
else:
raise TypeError('Invalid message type: %s' % type(message))
def _become_message(self, message):
"""Assume the non-format-specific state of message."""
for name in ('_headers', '_unixfrom', '_payload', '_charset',
'preamble', 'epilogue', 'defects', '_default_type'):
self.__dict__[name] = message.__dict__[name]
def _explain_to(self, message):
"""Copy format-specific state to message insofar as possible."""
if isinstance(message, Message):
return # There's nothing format-specific to explain.
else:
raise TypeError('Cannot convert to specified type')
class MaildirMessage(Message):
"""Message with Maildir-specific properties."""
def __init__(self, message=None):
"""Initialize a MaildirMessage instance."""
self._subdir = 'new'
self._info = ''
self._date = time.time()
Message.__init__(self, message)
def get_subdir(self):
"""Return 'new' or 'cur'."""
return self._subdir
def set_subdir(self, subdir):
"""Set subdir to 'new' or 'cur'."""
if subdir == 'new' or subdir == 'cur':
self._subdir = subdir
else:
raise ValueError("subdir must be 'new' or 'cur': %s" % subdir)
def get_flags(self):
"""Return as a string the flags that are set."""
if self._info.startswith('2,'):
return self._info[2:]
else:
return ''
def set_flags(self, flags):
"""Set the given flags and unset all others."""
self._info = '2,' + ''.join(sorted(flags))
def add_flag(self, flag):
"""Set the given flag(s) without changing others."""
self.set_flags(''.join(set(self.get_flags()) | set(flag)))
def remove_flag(self, flag):
"""Unset the given string flag(s) without changing others."""
if self.get_flags() != '':
self.set_flags(''.join(set(self.get_flags()) - set(flag)))
def get_date(self):
"""Return delivery date of message, in seconds since the epoch."""
return self._date
def set_date(self, date):
"""Set delivery date of message, in seconds since the epoch."""
try:
self._date = float(date)
except ValueError:
raise TypeError("can't convert to float: %s" % date)
def get_info(self):
"""Get the message's "info" as a string."""
return self._info
def set_info(self, info):
"""Set the message's "info" string."""
if isinstance(info, str):
self._info = info
else:
raise TypeError('info must be a string: %s' % type(info))
def _explain_to(self, message):
"""Copy Maildir-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
message.set_flags(self.get_flags())
message.set_subdir(self.get_subdir())
message.set_date(self.get_date())
elif isinstance(message, _mboxMMDFMessage):
flags = set(self.get_flags())
if 'S' in flags:
message.add_flag('R')
if self.get_subdir() == 'cur':
message.add_flag('O')
if 'T' in flags:
message.add_flag('D')
if 'F' in flags:
message.add_flag('F')
if 'R' in flags:
message.add_flag('A')
message.set_from('MAILER-DAEMON', time.gmtime(self.get_date()))
elif isinstance(message, MHMessage):
flags = set(self.get_flags())
if 'S' not in flags:
message.add_sequence('unseen')
if 'R' in flags:
message.add_sequence('replied')
if 'F' in flags:
message.add_sequence('flagged')
elif isinstance(message, BabylMessage):
flags = set(self.get_flags())
if 'S' not in flags:
message.add_label('unseen')
if 'T' in flags:
message.add_label('deleted')
if 'R' in flags:
message.add_label('answered')
if 'P' in flags:
message.add_label('forwarded')
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
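# A minimal sketch (not part of the module): MaildirMessage above keeps its
# flags in the "info" string as '2,' followed by the flag letters in sorted
# order, so add_flag()/remove_flag() reduce to set operations on that suffix.
def _demo_maildir_flags():
    info = '2,' + ''.join(sorted(set('S') | set('F')))        # add 'F' to {'S'}
    assert info == '2,FS'
    info = '2,' + ''.join(sorted(set(info[2:]) - set('S')))   # remove 'S'
    assert info == '2,F'
    return info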
class _mboxMMDFMessage(Message):
"""Message with mbox- or MMDF-specific properties."""
def __init__(self, message=None):
"""Initialize an mboxMMDFMessage instance."""
self.set_from('MAILER-DAEMON', True)
if isinstance(message, email.message.Message):
unixfrom = message.get_unixfrom()
if unixfrom is not None and unixfrom.startswith('From '):
self.set_from(unixfrom[5:])
Message.__init__(self, message)
def get_from(self):
"""Return contents of "From " line."""
return self._from
def set_from(self, from_, time_=None):
"""Set "From " line, formatting and appending time_ if specified."""
if time_ is not None:
if time_ is True:
time_ = time.gmtime()
from_ += ' ' + time.asctime(time_)
self._from = from_
def get_flags(self):
"""Return as a string the flags that are set."""
return self.get('Status', '') + self.get('X-Status', '')
def set_flags(self, flags):
"""Set the given flags and unset all others."""
flags = set(flags)
status_flags, xstatus_flags = '', ''
for flag in ('R', 'O'):
if flag in flags:
status_flags += flag
flags.remove(flag)
for flag in ('D', 'F', 'A'):
if flag in flags:
xstatus_flags += flag
flags.remove(flag)
xstatus_flags += ''.join(sorted(flags))
try:
self.replace_header('Status', status_flags)
except KeyError:
self.add_header('Status', status_flags)
try:
self.replace_header('X-Status', xstatus_flags)
except KeyError:
self.add_header('X-Status', xstatus_flags)
def add_flag(self, flag):
"""Set the given flag(s) without changing others."""
self.set_flags(''.join(set(self.get_flags()) | set(flag)))
def remove_flag(self, flag):
"""Unset the given string flag(s) without changing others."""
if 'Status' in self or 'X-Status' in self:
self.set_flags(''.join(set(self.get_flags()) - set(flag)))
def _explain_to(self, message):
"""Copy mbox- or MMDF-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
flags = set(self.get_flags())
if 'O' in flags:
message.set_subdir('cur')
if 'F' in flags:
message.add_flag('F')
if 'A' in flags:
message.add_flag('R')
if 'R' in flags:
message.add_flag('S')
if 'D' in flags:
message.add_flag('T')
del message['status']
del message['x-status']
maybe_date = ' '.join(self.get_from().split()[-5:])
try:
message.set_date(calendar.timegm(time.strptime(maybe_date,
'%a %b %d %H:%M:%S %Y')))
except (ValueError, OverflowError):
pass
elif isinstance(message, _mboxMMDFMessage):
message.set_flags(self.get_flags())
message.set_from(self.get_from())
elif isinstance(message, MHMessage):
flags = set(self.get_flags())
if 'R' not in flags:
message.add_sequence('unseen')
if 'A' in flags:
message.add_sequence('replied')
if 'F' in flags:
message.add_sequence('flagged')
del message['status']
del message['x-status']
elif isinstance(message, BabylMessage):
flags = set(self.get_flags())
if 'R' not in flags:
message.add_label('unseen')
if 'D' in flags:
message.add_label('deleted')
if 'A' in flags:
message.add_label('answered')
del message['status']
del message['x-status']
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
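# A minimal sketch (not part of the module) mirroring how set_flags() above
# distributes flags between the two mbox headers: 'R' and 'O' go to Status:,
# while 'D', 'F', 'A' and any unrecognized letters go to X-Status:.
def _demo_split_mbox_flags(flags):
    # e.g. _demo_split_mbox_flags('ROA') -> ('RO', 'A')
    flags = set(flags)
    status = ''.join(f for f in 'RO' if f in flags)
    xstatus = ''.join(f for f in 'DFA' if f in flags)
    xstatus += ''.join(sorted(flags - set('RODFA')))
    return status, xstatus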
class mboxMessage(_mboxMMDFMessage):
"""Message with mbox-specific properties."""
class MHMessage(Message):
"""Message with MH-specific properties."""
def __init__(self, message=None):
"""Initialize an MHMessage instance."""
self._sequences = []
Message.__init__(self, message)
def get_sequences(self):
"""Return a list of sequences that include the message."""
return self._sequences[:]
def set_sequences(self, sequences):
"""Set the list of sequences that include the message."""
self._sequences = list(sequences)
def add_sequence(self, sequence):
"""Add sequence to list of sequences including the message."""
if isinstance(sequence, str):
if not sequence in self._sequences:
self._sequences.append(sequence)
else:
raise TypeError('sequence must be a string: %s' % type(sequence))
def remove_sequence(self, sequence):
"""Remove sequence from the list of sequences including the message."""
try:
self._sequences.remove(sequence)
except ValueError:
pass
def _explain_to(self, message):
"""Copy MH-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
sequences = set(self.get_sequences())
if 'unseen' in sequences:
message.set_subdir('cur')
else:
message.set_subdir('cur')
message.add_flag('S')
if 'flagged' in sequences:
message.add_flag('F')
if 'replied' in sequences:
message.add_flag('R')
elif isinstance(message, _mboxMMDFMessage):
sequences = set(self.get_sequences())
if 'unseen' not in sequences:
message.add_flag('RO')
else:
message.add_flag('O')
if 'flagged' in sequences:
message.add_flag('F')
if 'replied' in sequences:
message.add_flag('A')
elif isinstance(message, MHMessage):
for sequence in self.get_sequences():
message.add_sequence(sequence)
elif isinstance(message, BabylMessage):
sequences = set(self.get_sequences())
if 'unseen' in sequences:
message.add_label('unseen')
if 'replied' in sequences:
message.add_label('answered')
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
class BabylMessage(Message):
"""Message with Babyl-specific properties."""
def __init__(self, message=None):
"""Initialize a BabylMessage instance."""
self._labels = []
self._visible = Message()
Message.__init__(self, message)
def get_labels(self):
"""Return a list of labels on the message."""
return self._labels[:]
def set_labels(self, labels):
"""Set the list of labels on the message."""
self._labels = list(labels)
def add_label(self, label):
"""Add label to list of labels on the message."""
if isinstance(label, str):
if label not in self._labels:
self._labels.append(label)
else:
raise TypeError('label must be a string: %s' % type(label))
def remove_label(self, label):
"""Remove label from the list of labels on the message."""
try:
self._labels.remove(label)
except ValueError:
pass
def get_visible(self):
"""Return a Message representation of visible headers."""
return Message(self._visible)
def set_visible(self, visible):
"""Set the Message representation of visible headers."""
self._visible = Message(visible)
def update_visible(self):
"""Update and/or sensibly generate a set of visible headers."""
for header in self._visible.keys():
if header in self:
self._visible.replace_header(header, self[header])
else:
del self._visible[header]
for header in ('Date', 'From', 'Reply-To', 'To', 'CC', 'Subject'):
if header in self and header not in self._visible:
self._visible[header] = self[header]
def _explain_to(self, message):
"""Copy Babyl-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
labels = set(self.get_labels())
if 'unseen' in labels:
message.set_subdir('cur')
else:
message.set_subdir('cur')
message.add_flag('S')
if 'forwarded' in labels or 'resent' in labels:
message.add_flag('P')
if 'answered' in labels:
message.add_flag('R')
if 'deleted' in labels:
message.add_flag('T')
elif isinstance(message, _mboxMMDFMessage):
labels = set(self.get_labels())
if 'unseen' not in labels:
message.add_flag('RO')
else:
message.add_flag('O')
if 'deleted' in labels:
message.add_flag('D')
if 'answered' in labels:
message.add_flag('A')
elif isinstance(message, MHMessage):
labels = set(self.get_labels())
if 'unseen' in labels:
message.add_sequence('unseen')
if 'answered' in labels:
message.add_sequence('replied')
elif isinstance(message, BabylMessage):
message.set_visible(self.get_visible())
for label in self.get_labels():
message.add_label(label)
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
class MMDFMessage(_mboxMMDFMessage):
"""Message with MMDF-specific properties."""
class _ProxyFile:
"""A read-only wrapper of a file."""
def __init__(self, f, pos=None):
"""Initialize a _ProxyFile."""
self._file = f
if pos is None:
self._pos = f.tell()
else:
self._pos = pos
def read(self, size=None):
"""Read bytes."""
return self._read(size, self._file.read)
def readline(self, size=None):
"""Read a line."""
return self._read(size, self._file.readline)
def readlines(self, sizehint=None):
"""Read multiple lines."""
result = []
for line in self:
result.append(line)
if sizehint is not None:
sizehint -= len(line)
if sizehint <= 0:
break
return result
def __iter__(self):
"""Iterate over lines."""
return iter(self.readline, "")
def tell(self):
"""Return the position."""
return self._pos
def seek(self, offset, whence=0):
"""Change position."""
if whence == 1:
self._file.seek(self._pos)
self._file.seek(offset, whence)
self._pos = self._file.tell()
def close(self):
"""Close the file."""
if hasattr(self, '_file'):
if hasattr(self._file, 'close'):
self._file.close()
del self._file
def _read(self, size, read_method):
"""Read size bytes using read_method."""
if size is None:
size = -1
self._file.seek(self._pos)
result = read_method(size)
self._pos = self._file.tell()
return result
class _PartialFile(_ProxyFile):
"""A read-only wrapper of part of a file."""
def __init__(self, f, start=None, stop=None):
"""Initialize a _PartialFile."""
_ProxyFile.__init__(self, f, start)
self._start = start
self._stop = stop
def tell(self):
"""Return the position with respect to start."""
return _ProxyFile.tell(self) - self._start
def seek(self, offset, whence=0):
"""Change position, possibly with respect to start or stop."""
if whence == 0:
self._pos = self._start
whence = 1
elif whence == 2:
self._pos = self._stop
whence = 1
_ProxyFile.seek(self, offset, whence)
def _read(self, size, read_method):
"""Read size bytes using read_method, honoring start and stop."""
remaining = self._stop - self._pos
if remaining <= 0:
return ''
if size is None or size < 0 or size > remaining:
size = remaining
return _ProxyFile._read(self, size, read_method)
def close(self):
# do *not* close the underlying file object for partial files,
# since it's global to the mailbox object
if hasattr(self, '_file'):
del self._file
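# A minimal sketch (not part of the module) of what _PartialFile above
# provides: a byte window [start, stop) over a shared file object that
# behaves like a complete file, with tell()/seek() relative to start and
# reads clipped at stop.
def _demo_partial_file():
    import StringIO
    f = StringIO.StringIO('aaaHELLObbb')
    part = _PartialFile(f, 3, 8)       # window over 'HELLO'
    assert part.read() == 'HELLO'      # reads stop at the window's end
    part.seek(0)                       # seek(0) is relative to start
    assert part.read(2) == 'HE'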
def _lock_file(f, dotlock=True):
"""Lock file f using lockf and dot locking."""
dotlock_done = False
try:
if fcntl:
try:
fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError, e:
if e.errno in (errno.EAGAIN, errno.EACCES, errno.EROFS):
raise ExternalClashError('lockf: lock unavailable: %s' %
f.name)
else:
raise
if dotlock:
try:
pre_lock = _create_temporary(f.name + '.lock')
pre_lock.close()
except IOError, e:
if e.errno in (errno.EACCES, errno.EROFS):
return # Without write access, just skip dotlocking.
else:
raise
try:
if hasattr(os, 'link'):
os.link(pre_lock.name, f.name + '.lock')
dotlock_done = True
os.unlink(pre_lock.name)
else:
os.rename(pre_lock.name, f.name + '.lock')
dotlock_done = True
except OSError, e:
if e.errno == errno.EEXIST or \
(os.name == 'os2' and e.errno == errno.EACCES):
os.remove(pre_lock.name)
raise ExternalClashError('dot lock unavailable: %s' %
f.name)
else:
raise
except:
if fcntl:
fcntl.lockf(f, fcntl.LOCK_UN)
if dotlock_done:
os.remove(f.name + '.lock')
raise
def _unlock_file(f):
"""Unlock file f using lockf and dot locking."""
if fcntl:
fcntl.lockf(f, fcntl.LOCK_UN)
if os.path.exists(f.name + '.lock'):
os.remove(f.name + '.lock')
def _create_carefully(path):
"""Create a file if it doesn't exist and open for reading and writing."""
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0666)
try:
return open(path, 'rb+')
finally:
os.close(fd)
def _create_temporary(path):
"""Create a temp file based on path and open for reading and writing."""
return _create_carefully('%s.%s.%s.%s' % (path, int(time.time()),
socket.gethostname(),
os.getpid()))
def _sync_flush(f):
"""Ensure changes to file f are physically on disk."""
f.flush()
if hasattr(os, 'fsync'):
os.fsync(f.fileno())
def _sync_close(f):
"""Close file f, ensuring all changes are physically on disk."""
_sync_flush(f)
f.close()
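# A minimal sketch (not part of the module) of the dot-locking claim step in
# _lock_file() above: create a uniquely named temporary file, hard-link it to
# '<mailbox>.lock', and treat EEXIST as "another process holds the lock"
# (the real code falls back to os.rename() where os.link is unavailable).
# Path arguments are hypothetical.
def _demo_claim_dotlock(lockname, tmpname):
    import os, errno
    open(tmpname, 'w').close()           # unique temporary file
    try:
        try:
            os.link(tmpname, lockname)   # atomic claim of the lock name
            return True
        except OSError, e:
            if e.errno == errno.EEXIST:
                return False             # lock already held elsewhere
            raise
    finally:
        os.unlink(tmpname)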
## Start: classes from the original module (for backward compatibility).
# Note that the Maildir class, whose name is unchanged, itself offers a next()
# method for backward compatibility.
class _Mailbox:
def __init__(self, fp, factory=rfc822.Message):
self.fp = fp
self.seekp = 0
self.factory = factory
def __iter__(self):
return iter(self.next, None)
def next(self):
while 1:
self.fp.seek(self.seekp)
try:
self._search_start()
except EOFError:
self.seekp = self.fp.tell()
return None
start = self.fp.tell()
self._search_end()
self.seekp = stop = self.fp.tell()
if start != stop:
break
return self.factory(_PartialFile(self.fp, start, stop))
# Recommended to use PortableUnixMailbox instead!
class UnixMailbox(_Mailbox):
def _search_start(self):
while 1:
pos = self.fp.tell()
line = self.fp.readline()
if not line:
raise EOFError
if line[:5] == 'From ' and self._isrealfromline(line):
self.fp.seek(pos)
return
def _search_end(self):
self.fp.readline() # Throw away header line
while 1:
pos = self.fp.tell()
line = self.fp.readline()
if not line:
return
if line[:5] == 'From ' and self._isrealfromline(line):
self.fp.seek(pos)
return
# An overridable mechanism to test for From-line-ness. You can either
# specify a different regular expression or define a whole new
# _isrealfromline() method. Note that this only gets called for lines
# starting with the 5 characters "From ".
#
# BAW: According to
# http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html
# the only portable, reliable way to find message delimiters in a BSD (i.e.
# Unix mailbox) style folder is to search for "\n\nFrom .*\n", or at the
# beginning of the file, "^From .*\n". While _fromlinepattern below seems
# like a good idea, in practice, there are too many variations for more
# strict parsing of the line to be completely accurate.
#
# _strict_isrealfromline() is the old version which tries to do stricter
# parsing of the From_ line. _portable_isrealfromline() simply returns
# true, since it's never called if the line doesn't already start with
# "From ".
#
# This algorithm, and the way it interacts with _search_start() and
# _search_end() may not be completely correct, because it doesn't check
# that the two characters preceding "From " are \n\n or the beginning of
# the file. Fixing this would require a more extensive rewrite than is
# necessary. For convenience, we've added a PortableUnixMailbox class
# which does no checking of the format of the 'From' line.
_fromlinepattern = (r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*"
r"[^\s]*\s*"
"$")
_regexp = None
def _strict_isrealfromline(self, line):
if not self._regexp:
import re
self._regexp = re.compile(self._fromlinepattern)
return self._regexp.match(line)
def _portable_isrealfromline(self, line):
return True
_isrealfromline = _strict_isrealfromline
class PortableUnixMailbox(UnixMailbox):
_isrealfromline = UnixMailbox._portable_isrealfromline
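# A minimal sketch (not part of the module): the old-style classes above are
# simple iterators of factory-wrapped messages. Reading a two-message mbox
# held in memory with PortableUnixMailbox (the sample text is made up):
def _demo_portable_unix_mailbox():
    import StringIO
    text = ('From alice Sat Jan  1 00:00:00 2000\n'
            'Subject: one\n'
            '\n'
            'body one\n'
            'From bob Sat Jan  1 00:00:01 2000\n'
            'Subject: two\n'
            '\n'
            'body two\n')
    box = PortableUnixMailbox(StringIO.StringIO(text))
    return [msg['subject'] for msg in box]    # -> ['one', 'two']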
class MmdfMailbox(_Mailbox):
def _search_start(self):
while 1:
line = self.fp.readline()
if not line:
raise EOFError
if line[:5] == '\001\001\001\001\n':
return
def _search_end(self):
while 1:
pos = self.fp.tell()
line = self.fp.readline()
if not line:
return
if line == '\001\001\001\001\n':
self.fp.seek(pos)
return
class MHMailbox:
def __init__(self, dirname, factory=rfc822.Message):
import re
pat = re.compile('^[1-9][0-9]*$')
self.dirname = dirname
# the following three lines could be combined into:
# list = map(long, filter(pat.match, os.listdir(self.dirname)))
list = os.listdir(self.dirname)
list = filter(pat.match, list)
list = map(long, list)
list.sort()
# This only works in Python 1.6 or later;
# before that str() added 'L':
self.boxes = map(str, list)
self.boxes.reverse()
self.factory = factory
def __iter__(self):
return iter(self.next, None)
def next(self):
if not self.boxes:
return None
fn = self.boxes.pop()
fp = open(os.path.join(self.dirname, fn))
msg = self.factory(fp)
try:
msg._mh_msgno = fn
except (AttributeError, TypeError):
pass
return msg
class BabylMailbox(_Mailbox):
def _search_start(self):
while 1:
line = self.fp.readline()
if not line:
raise EOFError
if line == '*** EOOH ***\n':
return
def _search_end(self):
while 1:
pos = self.fp.tell()
line = self.fp.readline()
if not line:
return
if line == '\037\014\n' or line == '\037':
self.fp.seek(pos)
return
## End: classes from the original module (for backward compatibility).
class Error(Exception):
"""Raised for module-specific errors."""
class NoSuchMailboxError(Error):
"""The specified mailbox does not exist and won't be created."""
class NotEmptyError(Error):
"""The specified mailbox is not empty and deletion was requested."""
class ExternalClashError(Error):
"""Another process caused an action to fail."""
class FormatError(Error):
"""A file appears to have an invalid format."""
"""Mailcap file handling. See RFC 1524."""
import os
__all__ = ["getcaps","findmatch"]
# Part 1: top-level interface.
def getcaps():
"""Return a dictionary containing the mailcap database.
The dictionary maps a MIME type (in all lowercase, e.g. 'text/plain')
to a list of dictionaries corresponding to mailcap entries. The list
collects all the entries for that MIME type from all available mailcap
files. Each dictionary contains key-value pairs for that MIME type,
where the viewing command is stored with the key "view".
"""
caps = {}
for mailcap in listmailcapfiles():
try:
fp = open(mailcap, 'r')
except IOError:
continue
with fp:
morecaps = readmailcapfile(fp)
for key, value in morecaps.iteritems():
if not key in caps:
caps[key] = value
else:
caps[key] = caps[key] + value
return caps
def listmailcapfiles():
"""Return a list of all mailcap files found on the system."""
# XXX Actually, this is Unix-specific
if 'MAILCAPS' in os.environ:
str = os.environ['MAILCAPS']
mailcaps = str.split(':')
else:
if 'HOME' in os.environ:
home = os.environ['HOME']
else:
# Don't bother with getpwuid()
home = '.' # Last resort
mailcaps = [home + '/.mailcap', '/etc/mailcap',
'/usr/etc/mailcap', '/usr/local/etc/mailcap']
return mailcaps
# Part 2: the parser.
def readmailcapfile(fp):
"""Read a mailcap file and return a dictionary keyed by MIME type.
Each MIME type is mapped to an entry consisting of a list of
dictionaries; the list will contain more than one such dictionary
if a given MIME type appears more than once in the mailcap file.
Each dictionary contains key-value pairs for that MIME type, where
the viewing command is stored with the key "view".
"""
caps = {}
while 1:
line = fp.readline()
if not line: break
# Ignore comments and blank lines
if line[0] == '#' or line.strip() == '':
continue
nextline = line
# Join continuation lines
while nextline[-2:] == '\\\n':
nextline = fp.readline()
if not nextline: nextline = '\n'
line = line[:-2] + nextline
# Parse the line
key, fields = parseline(line)
if not (key and fields):
continue
# Normalize the key
types = key.split('/')
for j in range(len(types)):
types[j] = types[j].strip()
key = '/'.join(types).lower()
# Update the database
if key in caps:
caps[key].append(fields)
else:
caps[key] = [fields]
return caps
def parseline(line):
"""Parse one entry in a mailcap file and return a dictionary.
The viewing command is stored as the value with the key "view",
and the rest of the fields produce key-value pairs in the dict.
"""
fields = []
i, n = 0, len(line)
while i < n:
field, i = parsefield(line, i, n)
fields.append(field)
i = i+1 # Skip semicolon
if len(fields) < 2:
return None, None
key, view, rest = fields[0], fields[1], fields[2:]
fields = {'view': view}
for field in rest:
i = field.find('=')
if i < 0:
fkey = field
fvalue = ""
else:
fkey = field[:i].strip()
fvalue = field[i+1:].strip()
if fkey in fields:
# Ignore it
pass
else:
fields[fkey] = fvalue
return key, fields
def parsefield(line, i, n):
"""Separate one key-value pair in a mailcap entry."""
start = i
while i < n:
c = line[i]
if c == ';':
break
elif c == '\\':
i = i+2
else:
i = i+1
return line[start:i].strip(), i
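# A minimal sketch (not part of the module): a mailcap entry line is a
# semicolon-separated record whose first two fields are the MIME type and the
# view command; any later 'key=value' fields become extra dictionary entries.
# The sample entry is made up.
def _demo_parseline():
    key, fields = parseline('video/mpeg; xmpeg %s; test=test -n "$DISPLAY"')
    # key    -> 'video/mpeg'
    # fields -> {'view': 'xmpeg %s', 'test': 'test -n "$DISPLAY"'}
    return key, fields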
# Part 3: using the database.
def findmatch(caps, MIMEtype, key='view', filename="/dev/null", plist=[]):
"""Find a match for a mailcap entry.
Return a tuple containing the command line, and the mailcap entry
used; (None, None) if no match is found. This may invoke the
'test' command of several matching entries before deciding which
entry to use.
"""
entries = lookup(caps, MIMEtype, key)
# XXX This code should somehow check for the needsterminal flag.
for e in entries:
if 'test' in e:
test = subst(e['test'], filename, plist)
if test and os.system(test) != 0:
continue
command = subst(e[key], MIMEtype, filename, plist)
return command, e
return None, None
def lookup(caps, MIMEtype, key=None):
entries = []
if MIMEtype in caps:
entries = entries + caps[MIMEtype]
MIMEtypes = MIMEtype.split('/')
MIMEtype = MIMEtypes[0] + '/*'
if MIMEtype in caps:
entries = entries + caps[MIMEtype]
if key is not None:
entries = filter(lambda e, key=key: key in e, entries)
return entries
def subst(field, MIMEtype, filename, plist=[]):
# XXX Actually, this is Unix-specific
res = ''
i, n = 0, len(field)
while i < n:
c = field[i]; i = i+1
if c != '%':
if c == '\\':
c = field[i:i+1]; i = i+1
res = res + c
else:
c = field[i]; i = i+1
if c == '%':
res = res + c
elif c == 's':
res = res + filename
elif c == 't':
res = res + MIMEtype
elif c == '{':
start = i
while i < n and field[i] != '}':
i = i+1
name = field[start:i]
i = i+1
res = res + findparam(name, plist)
# XXX To do:
# %n == number of parts if type is multipart/*
# %F == list of alternating type and filename for parts
else:
res = res + '%' + c
return res
def findparam(name, plist):
name = name.lower() + '='
n = len(name)
for p in plist:
if p[:n].lower() == name:
return p[n:]
return ''
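# A minimal sketch (not part of the module) of the template expansion that
# subst() above performs: %s becomes the file name, %t the MIME type, and
# %{name} a content-type parameter resolved through findparam(). The
# template and values here are made up.
def _demo_subst():
    cmd = subst('viewer -t %t -c %{charset} %s', 'text/plain',
                '/tmp/body.txt', ['charset=utf-8'])
    return cmd    # -> 'viewer -t text/plain -c utf-8 /tmp/body.txt'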
# Part 4: test program.
def test():
import sys
caps = getcaps()
if not sys.argv[1:]:
show(caps)
return
for i in range(1, len(sys.argv), 2):
args = sys.argv[i:i+2]
if len(args) < 2:
print "usage: mailcap [MIMEtype file] ..."
return
MIMEtype = args[0]
file = args[1]
command, e = findmatch(caps, MIMEtype, 'view', file)
if not command:
print "No viewer found for", type
else:
print "Executing:", command
sts = os.system(command)
if sts:
print "Exit status:", sts
def show(caps):
print "Mailcap files:"
for fn in listmailcapfiles(): print "\t" + fn
print
if not caps: caps = getcaps()
print "Mailcap entries:"
print
ckeys = caps.keys()
ckeys.sort()
for type in ckeys:
print type
entries = caps[type]
for e in entries:
keys = e.keys()
keys.sort()
for k in keys:
print " %-15s" % k, e[k]
print
if __name__ == '__main__':
test()
"""Shared support for scanning document type declarations in HTML and XHTML.
This module is used as a foundation for the HTMLParser and sgmllib
modules (indirectly, for htmllib as well). It has no documented
public API and should not be used directly.
"""
import re
_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
_declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match
_commentclose = re.compile(r'--\s*>')
_markedsectionclose = re.compile(r']\s*]\s*>')
# An analysis of the MS-Word extensions is available at
# http://www.planetpublish.com/xmlarena/xap/Thursday/WordtoXML.pdf
_msmarkedsectionclose = re.compile(r']\s*>')
del re
class ParserBase:
"""Parser base class which provides some common support methods used
by the SGML/HTML and XHTML parsers."""
def __init__(self):
if self.__class__ is ParserBase:
raise RuntimeError(
"markupbase.ParserBase must be subclassed")
def error(self, message):
raise NotImplementedError(
"subclasses of ParserBase must override error()")
def reset(self):
self.lineno = 1
self.offset = 0
def getpos(self):
"""Return current line number and offset."""
return self.lineno, self.offset
# Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
def updatepos(self, i, j):
if i >= j:
return j
rawdata = self.rawdata
nlines = rawdata.count("\n", i, j)
if nlines:
self.lineno = self.lineno + nlines
pos = rawdata.rindex("\n", i, j) # Should not fail
self.offset = j-(pos+1)
else:
self.offset = self.offset + j-i
return j
_decl_otherchars = ''
# Internal -- parse declaration (for use by subclasses).
def parse_declaration(self, i):
# This is some sort of declaration; in "HTML as
# deployed," this should only be the document type
# declaration ("<!DOCTYPE html...>").
# ISO 8879:1986, however, has more complex
# declaration syntax for elements in <!...>, including:
# --comment--
# [marked section]
# name in the following list: ENTITY, DOCTYPE, ELEMENT,
# ATTLIST, NOTATION, SHORTREF, USEMAP,
# LINKTYPE, LINK, IDLINK, USELINK, SYSTEM
rawdata = self.rawdata
j = i + 2
assert rawdata[i:j] == "<!", "unexpected call to parse_declaration"
if rawdata[j:j+1] == ">":
# the empty comment <!>
return j + 1
if rawdata[j:j+1] in ("-", ""):
# Start of comment followed by buffer boundary,
# or just a buffer boundary.
return -1
# A simple, practical version could look like: ((name|stringlit) S*) + '>'
n = len(rawdata)
if rawdata[j:j+2] == '--': #comment
# Locate --.*-- as the body of the comment
return self.parse_comment(i)
elif rawdata[j] == '[': #marked section
# Locate [statusWord [...arbitrary SGML...]] as the body of the marked section
# Where statusWord is one of TEMP, CDATA, IGNORE, INCLUDE, RCDATA
# Note that this is extended by Microsoft Office "Save as Web" function
# to include [if...] and [endif].
return self.parse_marked_section(i)
else: #all other declaration elements
decltype, j = self._scan_name(j, i)
if j < 0:
return j
if decltype == "doctype":
self._decl_otherchars = ''
while j < n:
c = rawdata[j]
if c == ">":
# end of declaration syntax
data = rawdata[i+2:j]
if decltype == "doctype":
self.handle_decl(data)
else:
# According to the HTML5 specs sections "8.2.4.44 Bogus
# comment state" and "8.2.4.45 Markup declaration open
# state", a comment token should be emitted.
# Calling unknown_decl provides more flexibility though.
self.unknown_decl(data)
return j + 1
if c in "\"'":
m = _declstringlit_match(rawdata, j)
if not m:
return -1 # incomplete
j = m.end()
elif c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ":
name, j = self._scan_name(j, i)
elif c in self._decl_otherchars:
j = j + 1
elif c == "[":
# this could be handled in a separate doctype parser
if decltype == "doctype":
j = self._parse_doctype_subset(j + 1, i)
elif decltype in ("attlist", "linktype", "link", "element"):
# must tolerate []'d groups in a content model in an element declaration
# also in data attribute specifications of attlist declaration
# also link type declaration subsets in linktype declarations
# also link attribute specification lists in link declarations
self.error("unsupported '[' char in %s declaration" % decltype)
else:
self.error("unexpected '[' char in declaration")
else:
self.error(
"unexpected %r char in declaration" % rawdata[j])
if j < 0:
return j
return -1 # incomplete
# Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
def parse_marked_section(self, i, report=1):
rawdata= self.rawdata
assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
sectName, j = self._scan_name( i+3, i )
if j < 0:
return j
if sectName in ("temp", "cdata", "ignore", "include", "rcdata"):
# look for standard ]]> ending
match= _markedsectionclose.search(rawdata, i+3)
elif sectName in ("if", "else", "endif"):
# look for MS Office ]> ending
match= _msmarkedsectionclose.search(rawdata, i+3)
else:
self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
if not match:
return -1
if report:
j = match.start(0)
self.unknown_decl(rawdata[i+3: j])
return match.end(0)
# Internal -- parse comment, return length or -1 if not terminated
def parse_comment(self, i, report=1):
rawdata = self.rawdata
if rawdata[i:i+4] != '<!--':
self.error('unexpected call to parse_comment()')
match = _commentclose.search(rawdata, i+4)
if not match:
return -1
if report:
j = match.start(0)
self.handle_comment(rawdata[i+4: j])
return match.end(0)
# Internal -- scan past the internal subset in a <!DOCTYPE declaration,
# returning the index just past any whitespace following the trailing ']'.
def _parse_doctype_subset(self, i, declstartpos):
rawdata = self.rawdata
n = len(rawdata)
j = i
while j < n:
c = rawdata[j]
if c == "<":
s = rawdata[j:j+2]
if s == "<":
# end of buffer; incomplete
return -1
if s != "<!":
self.updatepos(declstartpos, j + 1)
self.error("unexpected char in internal subset (in %r)" % s)
if (j + 2) == n:
# end of buffer; incomplete
return -1
if (j + 4) > n:
# end of buffer; incomplete
return -1
if rawdata[j:j+4] == "<!--":
j = self.parse_comment(j, report=0)
if j < 0:
return j
continue
name, j = self._scan_name(j + 2, declstartpos)
if j == -1:
return -1
if name not in ("attlist", "element", "entity", "notation"):
self.updatepos(declstartpos, j + 2)
self.error(
"unknown declaration %r in internal subset" % name)
# handle the individual names
meth = getattr(self, "_parse_doctype_" + name)
j = meth(j, declstartpos)
if j < 0:
return j
elif c == "%":
# parameter entity reference
if (j + 1) == n:
# end of buffer; incomplete
return -1
s, j = self._scan_name(j + 1, declstartpos)
if j < 0:
return j
if rawdata[j] == ";":
j = j + 1
elif c == "]":
j = j + 1
while j < n and rawdata[j].isspace():
j = j + 1
if j < n:
if rawdata[j] == ">":
return j
self.updatepos(declstartpos, j)
self.error("unexpected char after internal subset")
else:
return -1
elif c.isspace():
j = j + 1
else:
self.updatepos(declstartpos, j)
self.error("unexpected char %r in internal subset" % c)
# end of buffer reached
return -1
# Internal -- scan past <!ELEMENT declarations
def _parse_doctype_element(self, i, declstartpos):
name, j = self._scan_name(i, declstartpos)
if j == -1:
return -1
# style content model; just skip until '>'
rawdata = self.rawdata
if '>' in rawdata[j:]:
return rawdata.find(">", j) + 1
return -1
# Internal -- scan past <!ATTLIST declarations
def _parse_doctype_attlist(self, i, declstartpos):
rawdata = self.rawdata
name, j = self._scan_name(i, declstartpos)
c = rawdata[j:j+1]
if c == "":
return -1
if c == ">":
return j + 1
while 1:
# scan a series of attribute descriptions; simplified:
# name type [value] [#constraint]
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
c = rawdata[j:j+1]
if c == "":
return -1
if c == "(":
# an enumerated type; look for ')'
if ")" in rawdata[j:]:
j = rawdata.find(")", j) + 1
else:
return -1
while rawdata[j:j+1].isspace():
j = j + 1
if not rawdata[j:]:
# end of buffer, incomplete
return -1
else:
name, j = self._scan_name(j, declstartpos)
c = rawdata[j:j+1]
if not c:
return -1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if m:
j = m.end()
else:
return -1
c = rawdata[j:j+1]
if not c:
return -1
if c == "#":
if rawdata[j:] == "#":
# end of buffer
return -1
name, j = self._scan_name(j + 1, declstartpos)
if j < 0:
return j
c = rawdata[j:j+1]
if not c:
return -1
if c == '>':
# all done
return j + 1
# Internal -- scan past <!NOTATION declarations
def _parse_doctype_notation(self, i, declstartpos):
name, j = self._scan_name(i, declstartpos)
if j < 0:
return j
rawdata = self.rawdata
while 1:
c = rawdata[j:j+1]
if not c:
# end of buffer; incomplete
return -1
if c == '>':
return j + 1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if not m:
return -1
j = m.end()
else:
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
# Internal -- scan past <!ENTITY declarations
def _parse_doctype_entity(self, i, declstartpos):
rawdata = self.rawdata
if rawdata[i:i+1] == "%":
j = i + 1
while 1:
c = rawdata[j:j+1]
if not c:
return -1
if c.isspace():
j = j + 1
else:
break
else:
j = i
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
while 1:
c = self.rawdata[j:j+1]
if not c:
return -1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if m:
j = m.end()
else:
return -1 # incomplete
elif c == ">":
return j + 1
else:
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
# Internal -- scan a name token; return the name and the new position,
# or (None, -1) if we've reached the end of the buffer.
def _scan_name(self, i, declstartpos):
rawdata = self.rawdata
n = len(rawdata)
if i == n:
return None, -1
m = _declname_match(rawdata, i)
if m:
s = m.group()
name = s.strip()
if (i + len(s)) == n:
return None, -1 # end of buffer
return name.lower(), m.end()
else:
self.updatepos(declstartpos, i)
self.error("expected name token at %r"
% rawdata[declstartpos:declstartpos+20])
# To be overridden -- handlers for unknown objects
def unknown_decl(self, data):
pass
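# A minimal sketch (not part of the module): ParserBase expects a subclass to
# supply rawdata, error(), and the handle_*/unknown_* callbacks; with those
# in place, parse_declaration() consumes one declaration and returns the
# index just past it. The demo class and input are made up.
class _DemoDeclParser(ParserBase):
    def __init__(self, rawdata):
        ParserBase.__init__(self)
        self.reset()
        self.rawdata = rawdata
        self.decls = []
    def error(self, message):
        raise ValueError(message)
    def handle_decl(self, data):
        self.decls.append(data)

def _demo_parse_doctype():
    p = _DemoDeclParser('<!DOCTYPE html>')
    end = p.parse_declaration(0)
    return p.decls, end    # -> (['DOCTYPE html'], 15)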
# $Id$
#
# Copyright (C) 2005 Gregory P. Smith ([email protected])
# Licensed to PSF under a Contributor Agreement.
import warnings
warnings.warn("the md5 module is deprecated; use hashlib instead",
DeprecationWarning, 2)
from hashlib import md5
new = md5
blocksize = 1 # legacy value (wrong in any useful sense)
digest_size = 16
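# A minimal sketch (not part of the module): the shim above keeps legacy
# "md5.new(...)" callers working on top of hashlib, so the two spellings
# below hash identically ('abc' is the RFC 1321 test vector).
def _demo_md5_shim():
    import hashlib
    digest = new('abc').hexdigest()
    assert digest == hashlib.md5('abc').hexdigest()
    return digest    # -> '900150983cd24fb0d6963f7d28e17f72'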
"""MH interface -- purely object-oriented (well, almost)
Executive summary:
import mhlib
mh = mhlib.MH() # use default mailbox directory and profile
mh = mhlib.MH(mailbox) # override mailbox location (default from profile)
mh = mhlib.MH(mailbox, profile) # override mailbox and profile
mh.error(format, ...) # print error message -- can be overridden
s = mh.getprofile(key) # profile entry (None if not set)
path = mh.getpath() # mailbox pathname
name = mh.getcontext() # name of current folder
mh.setcontext(name) # set name of current folder
list = mh.listfolders() # names of top-level folders
list = mh.listallfolders() # names of all folders, including subfolders
list = mh.listsubfolders(name) # direct subfolders of given folder
list = mh.listallsubfolders(name) # all subfolders of given folder
mh.makefolder(name) # create new folder
mh.deletefolder(name) # delete folder -- must have no subfolders
f = mh.openfolder(name) # new open folder object
f.error(format, ...) # same as mh.error(format, ...)
path = f.getfullname() # folder's full pathname
path = f.getsequencesfilename() # full pathname of folder's sequences file
path = f.getmessagefilename(n) # full pathname of message n in folder
list = f.listmessages() # list of messages in folder (as numbers)
n = f.getcurrent() # get current message
f.setcurrent(n) # set current message
list = f.parsesequence(seq) # parse msgs syntax into list of messages
n = f.getlast() # get last message (0 if no messages)
f.setlast(n) # set last message (internal use only)
dict = f.getsequences() # dictionary of sequences in folder {name: list}
f.putsequences(dict) # write sequences back to folder
f.createmessage(n, fp) # add message from file f as number n
f.removemessages(list) # remove messages in list from folder
f.refilemessages(list, tofolder) # move messages in list to other folder
f.movemessage(n, tofolder, ton) # move one message to a given destination
f.copymessage(n, tofolder, ton) # copy one message to a given destination
m = f.openmessage(n) # new open message object (costs a file descriptor)
m is a derived class of mimetools.Message(rfc822.Message), with:
s = m.getheadertext() # text of message's headers
s = m.getheadertext(pred) # text of message's headers, filtered by pred
s = m.getbodytext() # text of message's body, decoded
s = m.getbodytext(0) # text of message's body, not decoded
"""
from warnings import warnpy3k
warnpy3k("the mhlib module has been removed in Python 3.0; use the mailbox "
"module instead", stacklevel=2)
del warnpy3k
# XXX To do, functionality:
# - annotate messages
# - send messages
#
# XXX To do, organization:
# - move IntSet to separate file
# - move most Message functionality to module mimetools
# Customizable defaults
MH_PROFILE = '~/.mh_profile'
PATH = '~/Mail'
MH_SEQUENCES = '.mh_sequences'
FOLDER_PROTECT = 0700
# Imported modules
import os
import sys
import re
import mimetools
import multifile
import shutil
from bisect import bisect
__all__ = ["MH","Error","Folder","Message"]
# Exported constants
class Error(Exception):
pass
class MH:
"""Class representing a particular collection of folders.
Optional constructor arguments are the pathname for the directory
containing the collection, and the MH profile to use.
If either is omitted or empty a default is used; the default
directory is taken from the MH profile if it is specified there."""
def __init__(self, path = None, profile = None):
"""Constructor."""
if profile is None: profile = MH_PROFILE
self.profile = os.path.expanduser(profile)
if path is None: path = self.getprofile('Path')
if not path: path = PATH
if not os.path.isabs(path) and path[0] != '~':
path = os.path.join('~', path)
path = os.path.expanduser(path)
if not os.path.isdir(path): raise Error, 'MH() path not found'
self.path = path
def __repr__(self):
"""String representation."""
return 'MH(%r, %r)' % (self.path, self.profile)
def error(self, msg, *args):
"""Routine to print an error. May be overridden by a derived class."""
sys.stderr.write('MH error: %s\n' % (msg % args))
def getprofile(self, key):
"""Return a profile entry, None if not found."""
return pickline(self.profile, key)
def getpath(self):
"""Return the path (the name of the collection's directory)."""
return self.path
def getcontext(self):
"""Return the name of the current folder."""
context = pickline(os.path.join(self.getpath(), 'context'),
'Current-Folder')
if not context: context = 'inbox'
return context
def setcontext(self, context):
"""Set the name of the current folder."""
fn = os.path.join(self.getpath(), 'context')
f = open(fn, "w")
f.write("Current-Folder: %s\n" % context)
f.close()
def listfolders(self):
"""Return the names of the top-level folders."""
folders = []
path = self.getpath()
for name in os.listdir(path):
fullname = os.path.join(path, name)
if os.path.isdir(fullname):
folders.append(name)
folders.sort()
return folders
def listsubfolders(self, name):
"""Return the names of the subfolders in a given folder
(prefixed with the given folder name)."""
fullname = os.path.join(self.path, name)
# Get the link count so we can avoid listing folders
# that have no subfolders.
nlinks = os.stat(fullname).st_nlink
if nlinks == 2:
return []
subfolders = []
subnames = os.listdir(fullname)
for subname in subnames:
fullsubname = os.path.join(fullname, subname)
if os.path.isdir(fullsubname):
name_subname = os.path.join(name, subname)
subfolders.append(name_subname)
# Stop looking for subfolders when
# we've seen them all
nlinks = nlinks - 1
if nlinks == 2:
break
subfolders.sort()
return subfolders
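# A minimal sketch (not part of the module) of the link-count shortcut used
# above: on traditional Unix filesystems a directory has 2 + N hard links,
# where N is its number of subdirectories ('.', '..', and each child's '..'),
# so st_nlink == 2 means "no subfolders" without reading the directory. This
# is a heuristic; some filesystems do not maintain the convention.
def _demo_has_subdirs(path):
    import os
    return os.stat(path).st_nlink > 2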
def listallfolders(self):
"""Return the names of all folders and subfolders, recursively."""
return self.listallsubfolders('')
def listallsubfolders(self, name):
"""Return the names of subfolders in a given folder, recursively."""
fullname = os.path.join(self.path, name)
# Get the link count so we can avoid listing folders
# that have no subfolders.
nlinks = os.stat(fullname).st_nlink
if nlinks == 2:
return []
subfolders = []
subnames = os.listdir(fullname)
for subname in subnames:
if subname[0] == ',' or isnumeric(subname): continue
fullsubname = os.path.join(fullname, subname)
if os.path.isdir(fullsubname):
name_subname = os.path.join(name, subname)
subfolders.append(name_subname)
if not os.path.islink(fullsubname):
subsubfolders = self.listallsubfolders(
name_subname)
subfolders = subfolders + subsubfolders
# Stop looking for subfolders when
# we've seen them all
nlinks = nlinks - 1
if nlinks == 2:
break
subfolders.sort()
return subfolders
def openfolder(self, name):
"""Return a new Folder object for the named folder."""
return Folder(self, name)
def makefolder(self, name):
"""Create a new folder (or raise os.error if it cannot be created)."""
protect = pickline(self.profile, 'Folder-Protect')
if protect and isnumeric(protect):
mode = int(protect, 8)
else:
mode = FOLDER_PROTECT
os.mkdir(os.path.join(self.getpath(), name), mode)
def deletefolder(self, name):
"""Delete a folder. This removes files in the folder but not
subdirectories. Raise os.error if deleting the folder itself fails."""
fullname = os.path.join(self.getpath(), name)
for subname in os.listdir(fullname):
fullsubname = os.path.join(fullname, subname)
try:
os.unlink(fullsubname)
except os.error:
self.error('%s not deleted, continuing...' %
fullsubname)
os.rmdir(fullname)
numericprog = re.compile('^[1-9][0-9]*$')
def isnumeric(str):
return numericprog.match(str) is not None
class Folder:
"""Class representing a particular folder."""
def __init__(self, mh, name):
"""Constructor."""
self.mh = mh
self.name = name
if not os.path.isdir(self.getfullname()):
raise Error, 'no folder %s' % name
def __repr__(self):
"""String representation."""
return 'Folder(%r, %r)' % (self.mh, self.name)
def error(self, *args):
"""Error message handler."""
self.mh.error(*args)
def getfullname(self):
"""Return the full pathname of the folder."""
return os.path.join(self.mh.path, self.name)
def getsequencesfilename(self):
"""Return the full pathname of the folder's sequences file."""
return os.path.join(self.getfullname(), MH_SEQUENCES)
def getmessagefilename(self, n):
"""Return the full pathname of a message in the folder."""
return os.path.join(self.getfullname(), str(n))
def listsubfolders(self):
"""Return list of direct subfolders."""
return self.mh.listsubfolders(self.name)
def listallsubfolders(self):
"""Return list of all subfolders."""
return self.mh.listallsubfolders(self.name)
def listmessages(self):
"""Return the list of messages currently present in the folder.
As a side effect, set self.last to the last message (or 0)."""
messages = []
match = numericprog.match
append = messages.append
for name in os.listdir(self.getfullname()):
if match(name):
append(name)
messages = map(int, messages)
messages.sort()
if messages:
self.last = messages[-1]
else:
self.last = 0
return messages
def getsequences(self):
"""Return the set of sequences for the folder."""
sequences = {}
fullname = self.getsequencesfilename()
try:
f = open(fullname, 'r')
except IOError:
return sequences
while 1:
line = f.readline()
if not line: break
fields = line.split(':')
if len(fields) != 2:
self.error('bad sequence in %s: %s' %
(fullname, line.strip()))
key = fields[0].strip()
value = IntSet(fields[1].strip(), ' ').tolist()
sequences[key] = value
return sequences
def putsequences(self, sequences):
"""Write the set of sequences back to the folder."""
fullname = self.getsequencesfilename()
f = None
for key, seq in sequences.iteritems():
s = IntSet('', ' ')
s.fromlist(seq)
if not f: f = open(fullname, 'w')
f.write('%s: %s\n' % (key, s.tostring()))
if not f:
try:
os.unlink(fullname)
except os.error:
pass
else:
f.close()
def getcurrent(self):
"""Return the current message. Raise Error when there is none."""
seqs = self.getsequences()
try:
return max(seqs['cur'])
except (ValueError, KeyError):
raise Error, "no cur message"
def setcurrent(self, n):
"""Set the current message."""
updateline(self.getsequencesfilename(), 'cur', str(n), 0)
def parsesequence(self, seq):
"""Parse an MH sequence specification into a message list.
Attempt to mimic mh-sequence(5) as closely as possible.
Also attempt to mimic observed behavior regarding which
conditions cause which error messages."""
# XXX Still not complete (see mh-format(5)).
# Missing are:
# - 'prev', 'next' as count
# - Sequence-Negation option
all = self.listmessages()
# Observed behavior: test for empty folder is done first
if not all:
raise Error, "no messages in %s" % self.name
# Common case first: all is frequently the default
if seq == 'all':
return all
# Test for X:Y before X-Y because 'seq:-n' matches both
i = seq.find(':')
if i >= 0:
head, dir, tail = seq[:i], '', seq[i+1:]
if tail[:1] in '-+':
dir, tail = tail[:1], tail[1:]
if not isnumeric(tail):
raise Error, "bad message list %s" % seq
try:
count = int(tail)
except (ValueError, OverflowError):
# Can't use sys.maxint because of i+count below
count = len(all)
try:
anchor = self._parseindex(head, all)
except Error, msg:
seqs = self.getsequences()
if not head in seqs:
if not msg:
msg = "bad message list %s" % seq
raise Error, msg, sys.exc_info()[2]
msgs = seqs[head]
if not msgs:
raise Error, "sequence %s empty" % head
if dir == '-':
return msgs[-count:]
else:
return msgs[:count]
else:
if not dir:
if head in ('prev', 'last'):
dir = '-'
if dir == '-':
i = bisect(all, anchor)
return all[max(0, i-count):i]
else:
i = bisect(all, anchor-1)
return all[i:i+count]
# Test for X-Y next
i = seq.find('-')
if i >= 0:
begin = self._parseindex(seq[:i], all)
end = self._parseindex(seq[i+1:], all)
i = bisect(all, begin-1)
j = bisect(all, end)
r = all[i:j]
if not r:
raise Error, "bad message list %s" % seq
return r
# Neither X:Y nor X-Y; must be a number or a (pseudo-)sequence
try:
n = self._parseindex(seq, all)
except Error, msg:
seqs = self.getsequences()
if not seq in seqs:
if not msg:
msg = "bad message list %s" % seq
raise Error, msg
return seqs[seq]
else:
if n not in all:
if isnumeric(seq):
raise Error, "message %d doesn't exist" % n
else:
raise Error, "no %s message" % seq
else:
return [n]
def _parseindex(self, seq, all):
"""Internal: parse a message number (or cur, first, etc.)."""
if isnumeric(seq):
try:
return int(seq)
except (OverflowError, ValueError):
return sys.maxint
if seq in ('cur', '.'):
return self.getcurrent()
if seq == 'first':
return all[0]
if seq == 'last':
return all[-1]
if seq == 'next':
n = self.getcurrent()
i = bisect(all, n)
try:
return all[i]
except IndexError:
raise Error, "no next message"
if seq == 'prev':
n = self.getcurrent()
i = bisect(all, n-1)
if i == 0:
raise Error, "no prev message"
try:
return all[i-1]
except IndexError:
raise Error, "no prev message"
raise Error, None
def openmessage(self, n):
"""Open a message -- returns a Message object."""
return Message(self, n)
def removemessages(self, list):
"""Remove one or more messages -- may raise os.error."""
errors = []
deleted = []
for n in list:
path = self.getmessagefilename(n)
commapath = self.getmessagefilename(',' + str(n))
try:
os.unlink(commapath)
except os.error:
pass
try:
os.rename(path, commapath)
except os.error, msg:
errors.append(msg)
else:
deleted.append(n)
if deleted:
self.removefromallsequences(deleted)
if errors:
if len(errors) == 1:
raise os.error, errors[0]
else:
raise os.error, ('multiple errors:', errors)
def refilemessages(self, list, tofolder, keepsequences=0):
"""Refile one or more messages -- may raise os.error.
'tofolder' is an open folder object."""
errors = []
refiled = {}
for n in list:
ton = tofolder.getlast() + 1
path = self.getmessagefilename(n)
topath = tofolder.getmessagefilename(ton)
try:
os.rename(path, topath)
except os.error:
# Try copying
try:
shutil.copy2(path, topath)
os.unlink(path)
except (IOError, os.error), msg:
errors.append(msg)
try:
os.unlink(topath)
except os.error:
pass
continue
tofolder.setlast(ton)
refiled[n] = ton
if refiled:
if keepsequences:
tofolder._copysequences(self, refiled.items())
self.removefromallsequences(refiled.keys())
if errors:
if len(errors) == 1:
raise os.error, errors[0]
else:
raise os.error, ('multiple errors:', errors)
def _copysequences(self, fromfolder, refileditems):
"""Helper for refilemessages() to copy sequences."""
fromsequences = fromfolder.getsequences()
tosequences = self.getsequences()
changed = 0
for name, seq in fromsequences.items():
try:
toseq = tosequences[name]
new = 0
except KeyError:
toseq = []
new = 1
for fromn, ton in refileditems:
if fromn in seq:
toseq.append(ton)
changed = 1
if new and toseq:
tosequences[name] = toseq
if changed:
self.putsequences(tosequences)
def movemessage(self, n, tofolder, ton):
"""Move one message over a specific destination message,
which may or may not already exist."""
path = self.getmessagefilename(n)
# Open it to check that it exists
f = open(path)
f.close()
del f
topath = tofolder.getmessagefilename(ton)
backuptopath = tofolder.getmessagefilename(',%d' % ton)
try:
os.rename(topath, backuptopath)
except os.error:
pass
try:
os.rename(path, topath)
except os.error:
# Try copying
ok = 0
try:
tofolder.setlast(None)
shutil.copy2(path, topath)
ok = 1
finally:
if not ok:
try:
os.unlink(topath)
except os.error:
pass
os.unlink(path)
self.removefromallsequences([n])
def copymessage(self, n, tofolder, ton):
"""Copy one message over a specific destination message,
which may or may not already exist."""
path = self.getmessagefilename(n)
# Open it to check that it exists
f = open(path)
f.close()
del f
topath = tofolder.getmessagefilename(ton)
backuptopath = tofolder.getmessagefilename(',%d' % ton)
try:
os.rename(topath, backuptopath)
except os.error:
pass
ok = 0
try:
tofolder.setlast(None)
shutil.copy2(path, topath)
ok = 1
finally:
if not ok:
try:
os.unlink(topath)
except os.error:
pass
def createmessage(self, n, txt):
"""Create a message, with text from the open file txt."""
path = self.getmessagefilename(n)
backuppath = self.getmessagefilename(',%d' % n)
try:
os.rename(path, backuppath)
except os.error:
pass
ok = 0
BUFSIZE = 16*1024
try:
f = open(path, "w")
while 1:
buf = txt.read(BUFSIZE)
if not buf:
break
f.write(buf)
f.close()
ok = 1
finally:
if not ok:
try:
os.unlink(path)
except os.error:
pass
def removefromallsequences(self, list):
"""Remove one or more messages from all sequences (including last)
-- but not from 'cur'!!!"""
if hasattr(self, 'last') and self.last in list:
del self.last
sequences = self.getsequences()
changed = 0
for name, seq in sequences.items():
if name == 'cur':
continue
for n in list:
if n in seq:
seq.remove(n)
changed = 1
if not seq:
del sequences[name]
if changed:
self.putsequences(sequences)
def getlast(self):
"""Return the last message number."""
if not hasattr(self, 'last'):
self.listmessages() # Set self.last
return self.last
def setlast(self, last):
"""Set the last message number."""
if last is None:
if hasattr(self, 'last'):
del self.last
else:
self.last = last
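# A minimal usage sketch (ours, not part of the original module).  It assumes
# an existing MH mail tree under ~/Mail with a folder named 'inbox'; the
# folder name and message numbers are hypothetical.
def _folder_demo():
    mh = MH()                          # reads ~/.mh_profile if present
    f = mh.openfolder('inbox')         # hypothetical folder name
    print f.listmessages()             # e.g. [1, 2, 3]; also sets f.last
    print f.parsesequence('last:3')    # the three highest-numbered messages
    print f.getsequences()             # e.g. {'cur': [2]}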
class Message(mimetools.Message):
def __init__(self, f, n, fp = None):
"""Constructor."""
self.folder = f
self.number = n
if fp is None:
path = f.getmessagefilename(n)
fp = open(path, 'r')
mimetools.Message.__init__(self, fp)
def __repr__(self):
"""String representation."""
return 'Message(%s, %s)' % (repr(self.folder), self.number)
def getheadertext(self, pred = None):
"""Return the message's header text as a string. If an
argument is specified, it is used as a filter predicate to
decide which headers to return (its argument is the header
name converted to lower case)."""
if pred is None:
return ''.join(self.headers)
headers = []
hit = 0
for line in self.headers:
if not line[0].isspace():
i = line.find(':')
if i > 0:
hit = pred(line[:i].lower())
if hit: headers.append(line)
return ''.join(headers)
def getbodytext(self, decode = 1):
"""Return the message's body text as string. This undoes a
Content-Transfer-Encoding, but does not interpret other MIME
features (e.g. multipart messages). To suppress decoding,
pass 0 as an argument."""
self.fp.seek(self.startofbody)
encoding = self.getencoding()
if not decode or encoding in ('', '7bit', '8bit', 'binary'):
return self.fp.read()
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
output = StringIO()
mimetools.decode(self.fp, output, encoding)
return output.getvalue()
def getbodyparts(self):
"""Only for multipart messages: return the message's body as a
list of SubMessage objects. Each submessage object behaves
(almost) as a Message object."""
if self.getmaintype() != 'multipart':
raise Error, 'Content-Type is not multipart/*'
bdry = self.getparam('boundary')
if not bdry:
raise Error, 'multipart/* without boundary param'
self.fp.seek(self.startofbody)
mf = multifile.MultiFile(self.fp)
mf.push(bdry)
parts = []
while mf.next():
n = "%s.%r" % (self.number, 1 + len(parts))
part = SubMessage(self.folder, n, mf)
parts.append(part)
mf.pop()
return parts
def getbody(self):
"""Return body, either a string or a list of messages."""
if self.getmaintype() == 'multipart':
return self.getbodyparts()
else:
return self.getbodytext()
class SubMessage(Message):
def __init__(self, f, n, fp):
"""Constructor."""
Message.__init__(self, f, n, fp)
if self.getmaintype() == 'multipart':
self.body = Message.getbodyparts(self)
else:
self.body = Message.getbodytext(self)
self.bodyencoded = Message.getbodytext(self, decode=0)
# XXX If this is big, should remember file pointers
def __repr__(self):
"""String representation."""
f, n, fp = self.folder, self.number, self.fp
return 'SubMessage(%s, %s, %s)' % (f, n, fp)
def getbodytext(self, decode = 1):
if not decode:
return self.bodyencoded
if type(self.body) == type(''):
return self.body
def getbodyparts(self):
if type(self.body) == type([]):
return self.body
def getbody(self):
return self.body
class IntSet:
"""Class implementing sets of integers.
This is an efficient representation for sets consisting of several
    contiguous ranges, e.g. 1-100,200-400,402-1000 is represented
internally as a list of three pairs: [(1,100), (200,400),
(402,1000)]. The internal representation is always kept normalized.
The constructor has up to three arguments:
- the string used to initialize the set (default ''),
- the separator between ranges (default ',')
- the separator between begin and end of a range (default '-')
    The separators must be strings (not regexps) and should be different.
The tostring() function yields a string that can be passed to another
IntSet constructor; __repr__() is a valid IntSet constructor itself.
"""
# XXX The default begin/end separator means that negative numbers are
# not supported very well.
#
# XXX There are currently no operations to remove set elements.
def __init__(self, data = None, sep = ',', rng = '-'):
self.pairs = []
self.sep = sep
self.rng = rng
if data: self.fromstring(data)
def reset(self):
self.pairs = []
def __cmp__(self, other):
return cmp(self.pairs, other.pairs)
def __hash__(self):
return hash(self.pairs)
def __repr__(self):
return 'IntSet(%r, %r, %r)' % (self.tostring(), self.sep, self.rng)
def normalize(self):
self.pairs.sort()
i = 1
while i < len(self.pairs):
alo, ahi = self.pairs[i-1]
blo, bhi = self.pairs[i]
if ahi >= blo-1:
self.pairs[i-1:i+1] = [(alo, max(ahi, bhi))]
else:
i = i+1
def tostring(self):
s = ''
for lo, hi in self.pairs:
if lo == hi: t = repr(lo)
else: t = repr(lo) + self.rng + repr(hi)
if s: s = s + (self.sep + t)
else: s = t
return s
def tolist(self):
l = []
for lo, hi in self.pairs:
m = range(lo, hi+1)
l = l + m
return l
def fromlist(self, list):
for i in list:
self.append(i)
def clone(self):
new = IntSet()
new.pairs = self.pairs[:]
return new
def min(self):
return self.pairs[0][0]
def max(self):
return self.pairs[-1][-1]
def contains(self, x):
for lo, hi in self.pairs:
if lo <= x <= hi: return True
return False
def append(self, x):
for i in range(len(self.pairs)):
lo, hi = self.pairs[i]
if x < lo: # Need to insert before
if x+1 == lo:
self.pairs[i] = (x, hi)
else:
self.pairs.insert(i, (x, x))
if i > 0 and x-1 == self.pairs[i-1][1]:
# Merge with previous
self.pairs[i-1:i+1] = [
(self.pairs[i-1][0],
self.pairs[i][1])
]
return
if x <= hi: # Already in set
return
i = len(self.pairs) - 1
if i >= 0:
lo, hi = self.pairs[i]
if x-1 == hi:
self.pairs[i] = lo, x
return
self.pairs.append((x, x))
def addpair(self, xlo, xhi):
if xlo > xhi: return
self.pairs.append((xlo, xhi))
self.normalize()
def fromstring(self, data):
new = []
for part in data.split(self.sep):
list = []
for subp in part.split(self.rng):
s = subp.strip()
list.append(int(s))
if len(list) == 1:
new.append((list[0], list[0]))
elif len(list) == 2 and list[0] <= list[1]:
new.append((list[0], list[1]))
else:
raise ValueError, 'bad data passed to IntSet'
self.pairs = self.pairs + new
self.normalize()
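# A round-trip sketch (ours) of the class docstring's example; the
# _intset_demo name is added here for illustration only.
def _intset_demo():
    s = IntSet('1-100,200-400,402-1000')
    assert s.pairs == [(1, 100), (200, 400), (402, 1000)]
    assert s.contains(250) and not s.contains(150)
    s.append(401)                      # fills the gap and merges two ranges
    assert s.tostring() == '1-100,200-1000'
    # tostring() output is accepted by another IntSet constructor
    assert IntSet(s.tostring()).pairs == s.pairs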
# Subroutines to read/write entries in .mh_profile and .mh_sequences
def pickline(file, key, casefold = 1):
try:
f = open(file, 'r')
except IOError:
return None
pat = re.escape(key) + ':'
prog = re.compile(pat, casefold and re.IGNORECASE)
while 1:
line = f.readline()
if not line: break
if prog.match(line):
text = line[len(key)+1:]
while 1:
line = f.readline()
if not line or not line[0].isspace():
break
text = text + line
return text.strip()
return None
def updateline(file, key, value, casefold = 1):
try:
f = open(file, 'r')
lines = f.readlines()
f.close()
except IOError:
lines = []
pat = re.escape(key) + ':(.*)\n'
prog = re.compile(pat, casefold and re.IGNORECASE)
if value is None:
newline = None
else:
newline = '%s: %s\n' % (key, value)
for i in range(len(lines)):
line = lines[i]
if prog.match(line):
if newline is None:
del lines[i]
else:
lines[i] = newline
break
else:
if newline is not None:
lines.append(newline)
tempfile = file + "~"
f = open(tempfile, 'w')
for line in lines:
f.write(line)
f.close()
os.rename(tempfile, file)
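# A sketch (ours) that round-trips a profile entry through updateline() and
# pickline(); the profile path is a throwaway temporary file.
def _profile_demo():
    import tempfile, shutil
    d = tempfile.mkdtemp()
    prof = os.path.join(d, '.mh_profile')
    updateline(prof, 'Editor', 'vi')          # appends "Editor: vi"
    assert pickline(prof, 'editor') == 'vi'   # casefold matching by default
    updateline(prof, 'Editor', None)          # None deletes the entry
    assert pickline(prof, 'Editor') is None
    shutil.rmtree(d)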
# Test program
def test():
global mh, f
os.system('rm -rf $HOME/Mail/@test')
mh = MH()
def do(s): print s; print eval(s)
do('mh.listfolders()')
do('mh.listallfolders()')
testfolders = ['@test', '@test/test1', '@test/test2',
'@test/test1/test11', '@test/test1/test12',
'@test/test1/test11/test111']
for t in testfolders: do('mh.makefolder(%r)' % (t,))
do('mh.listsubfolders(\'@test\')')
do('mh.listallsubfolders(\'@test\')')
f = mh.openfolder('@test')
do('f.listsubfolders()')
do('f.listallsubfolders()')
do('f.getsequences()')
seqs = f.getsequences()
seqs['foo'] = IntSet('1-10 12-20', ' ').tolist()
print seqs
f.putsequences(seqs)
do('f.getsequences()')
for t in reversed(testfolders): do('mh.deletefolder(%r)' % (t,))
do('mh.getcontext()')
context = mh.getcontext()
f = mh.openfolder(context)
do('f.getcurrent()')
for seq in ('first', 'last', 'cur', '.', 'prev', 'next',
'first:3', 'last:3', 'cur:3', 'cur:-3',
'prev:3', 'next:3',
'1:3', '1:-3', '100:3', '100:-3', '10000:3', '10000:-3',
'all'):
try:
do('f.parsesequence(%r)' % (seq,))
except Error, msg:
print "Error:", msg
stuff = os.popen("pick %r 2>/dev/null" % (seq,)).read()
list = map(int, stuff.split())
print list, "<-- pick"
do('f.listmessages()')
if __name__ == '__main__':
test()
"""Various tools used by MIME-reading or MIME-writing programs."""
import os
import sys
import tempfile
from warnings import filterwarnings, catch_warnings
with catch_warnings():
if sys.py3kwarning:
filterwarnings("ignore", ".*rfc822 has been removed", DeprecationWarning)
import rfc822
from warnings import warnpy3k
warnpy3k("in 3.x, mimetools has been removed in favor of the email package",
stacklevel=2)
__all__ = ["Message","choose_boundary","encode","decode","copyliteral",
"copybinary"]
class Message(rfc822.Message):
"""A derived class of rfc822.Message that knows about MIME headers and
contains some hooks for decoding encoded and multipart messages."""
def __init__(self, fp, seekable = 1):
rfc822.Message.__init__(self, fp, seekable)
self.encodingheader = \
self.getheader('content-transfer-encoding')
self.typeheader = \
self.getheader('content-type')
self.parsetype()
self.parseplist()
def parsetype(self):
str = self.typeheader
if str is None:
str = 'text/plain'
if ';' in str:
i = str.index(';')
self.plisttext = str[i:]
str = str[:i]
else:
self.plisttext = ''
fields = str.split('/')
for i in range(len(fields)):
fields[i] = fields[i].strip().lower()
self.type = '/'.join(fields)
self.maintype = fields[0]
self.subtype = '/'.join(fields[1:])
def parseplist(self):
str = self.plisttext
self.plist = []
while str[:1] == ';':
str = str[1:]
if ';' in str:
# XXX Should parse quotes!
end = str.index(';')
else:
end = len(str)
f = str[:end]
if '=' in f:
i = f.index('=')
f = f[:i].strip().lower() + \
'=' + f[i+1:].strip()
self.plist.append(f.strip())
str = str[end:]
def getplist(self):
return self.plist
def getparam(self, name):
name = name.lower() + '='
n = len(name)
for p in self.plist:
if p[:n] == name:
return rfc822.unquote(p[n:])
return None
def getparamnames(self):
result = []
for p in self.plist:
i = p.find('=')
if i >= 0:
result.append(p[:i].lower())
return result
def getencoding(self):
if self.encodingheader is None:
return '7bit'
return self.encodingheader.lower()
def gettype(self):
return self.type
def getmaintype(self):
return self.maintype
def getsubtype(self):
return self.subtype
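# A short sketch (ours) of what parsetype() and parseplist() extract; Message
# here is the mimetools Message class defined just above.
def _mimetools_message_demo():
    from StringIO import StringIO
    m = Message(StringIO('Content-Type: multipart/mixed; boundary="XYZ"\n\n'))
    assert m.gettype() == 'multipart/mixed'
    assert m.getmaintype() == 'multipart' and m.getsubtype() == 'mixed'
    assert m.getparam('boundary') == 'XYZ'
    assert m.getencoding() == '7bit'   # the default when no C-T-E header is set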
# Utility functions
# -----------------
try:
import thread
except ImportError:
import dummy_thread as thread
_counter_lock = thread.allocate_lock()
del thread
_counter = 0
def _get_next_counter():
global _counter
_counter_lock.acquire()
_counter += 1
result = _counter
_counter_lock.release()
return result
_prefix = None
def choose_boundary():
"""Return a string usable as a multipart boundary.
The string chosen is unique within a single program run, and
incorporates the user id (if available), process id (if available),
and current time. So it's very unlikely the returned string appears
in message text, but there's no guarantee.
The boundary contains dots so you have to quote it in the header."""
global _prefix
import time
if _prefix is None:
import socket
try:
hostid = socket.gethostbyname(socket.gethostname())
except socket.gaierror:
hostid = '127.0.0.1'
try:
uid = repr(os.getuid())
except AttributeError:
uid = '1'
try:
pid = repr(os.getpid())
except AttributeError:
pid = '1'
_prefix = hostid + '.' + uid + '.' + pid
return "%s.%.3f.%d" % (_prefix, time.time(), _get_next_counter())
# Subroutines for decoding some common content-transfer-types
def decode(input, output, encoding):
"""Decode common content-transfer-encodings (base64, quopri, uuencode)."""
if encoding == 'base64':
import base64
return base64.decode(input, output)
if encoding == 'quoted-printable':
import quopri
return quopri.decode(input, output)
if encoding in ('uuencode', 'x-uuencode', 'uue', 'x-uue'):
import uu
return uu.decode(input, output)
if encoding in ('7bit', '8bit'):
return output.write(input.read())
if encoding in decodetab:
pipethrough(input, decodetab[encoding], output)
else:
raise ValueError, \
'unknown Content-Transfer-Encoding: %s' % encoding
def encode(input, output, encoding):
"""Encode common content-transfer-encodings (base64, quopri, uuencode)."""
if encoding == 'base64':
import base64
return base64.encode(input, output)
if encoding == 'quoted-printable':
import quopri
return quopri.encode(input, output, 0)
if encoding in ('uuencode', 'x-uuencode', 'uue', 'x-uue'):
import uu
return uu.encode(input, output)
if encoding in ('7bit', '8bit'):
return output.write(input.read())
if encoding in encodetab:
pipethrough(input, encodetab[encoding], output)
else:
raise ValueError, \
'unknown Content-Transfer-Encoding: %s' % encoding
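# A round-trip sketch (ours) for one of the inline codecs; base64 and
# quoted-printable need no external mmencode/uudecode binaries.
def _codec_demo():
    from StringIO import StringIO
    raw, enc, dec = StringIO('hello world\n'), StringIO(), StringIO()
    encode(raw, enc, 'base64')
    enc.seek(0)
    decode(enc, dec, 'base64')
    assert dec.getvalue() == 'hello world\n'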
# The following is no longer used for standard encodings
# XXX This requires that uudecode and mmencode are in $PATH
uudecode_pipe = '''(
TEMP=/tmp/@uu.$$
sed "s%^begin [0-7][0-7]* .*%begin 600 $TEMP%" | uudecode
cat $TEMP
rm $TEMP
)'''
decodetab = {
'uuencode': uudecode_pipe,
'x-uuencode': uudecode_pipe,
'uue': uudecode_pipe,
'x-uue': uudecode_pipe,
'quoted-printable': 'mmencode -u -q',
'base64': 'mmencode -u -b',
}
encodetab = {
'x-uuencode': 'uuencode tempfile',
'uuencode': 'uuencode tempfile',
'x-uue': 'uuencode tempfile',
'uue': 'uuencode tempfile',
'quoted-printable': 'mmencode -q',
'base64': 'mmencode -b',
}
def pipeto(input, command):
pipe = os.popen(command, 'w')
copyliteral(input, pipe)
pipe.close()
def pipethrough(input, command, output):
(fd, tempname) = tempfile.mkstemp()
temp = os.fdopen(fd, 'w')
copyliteral(input, temp)
temp.close()
pipe = os.popen(command + ' <' + tempname, 'r')
copybinary(pipe, output)
pipe.close()
os.unlink(tempname)
def copyliteral(input, output):
while 1:
line = input.readline()
if not line: break
output.write(line)
def copybinary(input, output):
BUFSIZE = 8192
while 1:
line = input.read(BUFSIZE)
if not line: break
output.write(line)
"""Guess the MIME type of a file.
This module defines two useful functions:
guess_type(url, strict=1) -- guess the MIME type and encoding of a URL.
guess_extension(type, strict=1) -- guess the extension for a given MIME type.
It also contains the following, for tuning the behavior:
Data:
knownfiles -- list of files to parse
inited -- flag set when init() has been called
suffix_map -- dictionary mapping suffixes to suffixes
encodings_map -- dictionary mapping suffixes to encodings
types_map -- dictionary mapping suffixes to types
Functions:
init([files]) -- parse a list of files, default knownfiles (on Windows, the
default values are taken from the registry)
read_mime_types(file) -- parse one file, return a dictionary or None
"""
import os
import sys
import posixpath
import urllib
try:
import _winreg
except ImportError:
_winreg = None
__all__ = [
"guess_type","guess_extension","guess_all_extensions",
"add_type","read_mime_types","init"
]
knownfiles = [
"/etc/mime.types",
"/etc/httpd/mime.types", # Mac OS X
"/etc/httpd/conf/mime.types", # Apache
"/etc/apache/mime.types", # Apache 1
"/etc/apache2/mime.types", # Apache 2
"/usr/local/etc/httpd/conf/mime.types",
"/usr/local/lib/netscape/mime.types",
"/usr/local/etc/httpd/conf/mime.types", # Apache 1.2
"/usr/local/etc/mime.types", # Apache 1.3
]
inited = False
_db = None
class MimeTypes:
"""MIME-types datastore.
    This datastore can handle information from mime.types-style files.  It
    supports basic determination of the MIME type from a filename or URL,
    and can guess a reasonable extension for a given MIME type.
"""
def __init__(self, filenames=(), strict=True):
if not inited:
init()
self.encodings_map = encodings_map.copy()
self.suffix_map = suffix_map.copy()
self.types_map = ({}, {}) # dict for (non-strict, strict)
self.types_map_inv = ({}, {})
for (ext, type) in types_map.items():
self.add_type(type, ext, True)
for (ext, type) in common_types.items():
self.add_type(type, ext, False)
for name in filenames:
self.read(name, strict)
def add_type(self, type, ext, strict=True):
"""Add a mapping between a type and an extension.
        When the extension is already known, the new type will replace the
        old one.  When the type is already known, the extension will be
        added to the list of known extensions.
        If strict is true, the information will be added to the list of
        standard types, else to the list of non-standard types.
"""
self.types_map[strict][ext] = type
exts = self.types_map_inv[strict].setdefault(type, [])
if ext not in exts:
exts.append(ext)
def guess_type(self, url, strict=True):
"""Guess the type of a file based on its URL.
Return value is a tuple (type, encoding) where type is None if
the type can't be guessed (no or unknown suffix) or a string
of the form type/subtype, usable for a MIME Content-type
header; and encoding is None for no encoding or the name of
the program used to encode (e.g. compress or gzip). The
mappings are table driven. Encoding suffixes are case
sensitive; type suffixes are first tried case sensitive, then
case insensitive.
The suffixes .tgz, .taz and .tz (case sensitive!) are all
mapped to '.tar.gz'. (This is table-driven too, using the
dictionary suffix_map.)
        The optional `strict' argument, when False, also adds a number of
        commonly found but non-standard types.
"""
scheme, url = urllib.splittype(url)
if scheme == 'data':
# syntax of data URLs:
# dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
# mediatype := [ type "/" subtype ] *( ";" parameter )
# data := *urlchar
# parameter := attribute "=" value
# type/subtype defaults to "text/plain"
comma = url.find(',')
if comma < 0:
# bad data URL
return None, None
semi = url.find(';', 0, comma)
if semi >= 0:
type = url[:semi]
else:
type = url[:comma]
if '=' in type or '/' not in type:
type = 'text/plain'
return type, None # never compressed, so encoding is None
base, ext = posixpath.splitext(url)
while ext in self.suffix_map:
base, ext = posixpath.splitext(base + self.suffix_map[ext])
if ext in self.encodings_map:
encoding = self.encodings_map[ext]
base, ext = posixpath.splitext(base)
else:
encoding = None
types_map = self.types_map[True]
if ext in types_map:
return types_map[ext], encoding
elif ext.lower() in types_map:
return types_map[ext.lower()], encoding
elif strict:
return None, encoding
types_map = self.types_map[False]
if ext in types_map:
return types_map[ext], encoding
elif ext.lower() in types_map:
return types_map[ext.lower()], encoding
else:
return None, encoding
def guess_all_extensions(self, type, strict=True):
"""Guess the extensions for a file based on its MIME type.
Return value is a list of strings giving the possible filename
extensions, including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data stream,
but would be mapped to the MIME type `type' by guess_type().
        The optional `strict' argument, when false, also adds a number of
        commonly found but non-standard types.
"""
type = type.lower()
extensions = self.types_map_inv[True].get(type, [])
if not strict:
for ext in self.types_map_inv[False].get(type, []):
if ext not in extensions:
extensions.append(ext)
return extensions
def guess_extension(self, type, strict=True):
"""Guess the extension for a file based on its MIME type.
Return value is a string giving a filename extension,
including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data
stream, but would be mapped to the MIME type `type' by
guess_type(). If no extension can be guessed for `type', None
is returned.
        The optional `strict' argument, when false, also adds a number of
        commonly found but non-standard types.
"""
extensions = self.guess_all_extensions(type, strict)
if not extensions:
return None
return extensions[0]
def read(self, filename, strict=True):
"""
Read a single mime.types-format file, specified by pathname.
        If strict is true, the information will be added to the list of
        standard types, else to the list of non-standard types.
"""
with open(filename) as fp:
self.readfp(fp, strict)
def readfp(self, fp, strict=True):
"""
Read a single mime.types-format file.
        If strict is true, the information will be added to the list of
        standard types, else to the list of non-standard types.
"""
while 1:
line = fp.readline()
if not line:
break
words = line.split()
for i in range(len(words)):
if words[i][0] == '#':
del words[i:]
break
if not words:
continue
type, suffixes = words[0], words[1:]
for suff in suffixes:
self.add_type(type, '.' + suff, strict)
def read_windows_registry(self, strict=True):
"""
Load the MIME types database from Windows registry.
        If strict is true, the information will be added to the list of
        standard types, else to the list of non-standard types.
"""
# Windows only
if not _winreg:
return
def enum_types(mimedb):
i = 0
while True:
try:
ctype = _winreg.EnumKey(mimedb, i)
except EnvironmentError:
break
else:
if '\0' not in ctype:
yield ctype
i += 1
default_encoding = sys.getdefaultencoding()
with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT, '') as hkcr:
for subkeyname in enum_types(hkcr):
with _winreg.OpenKey(hkcr, subkeyname) as subkey:
# Only check file extensions
if not subkeyname.startswith("."):
continue
try:
# raises EnvironmentError if no 'Content Type' value
mimetype, datatype = _winreg.QueryValueEx(
subkey, 'Content Type')
except EnvironmentError:
pass
else:
if datatype != _winreg.REG_SZ:
continue
try:
mimetype = mimetype.encode(default_encoding)
except UnicodeEncodeError:
pass
else:
self.add_type(mimetype, subkeyname, strict)
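# A sketch (ours) of the mime.types line format readfp() parses:
# "type/subtype suffix suffix  # comment".  The type below is hypothetical.
def _readfp_demo():
    from StringIO import StringIO
    db = MimeTypes()
    db.readfp(StringIO('text/x-demo  demo dmo  # hypothetical type\n'))
    assert db.guess_type('file.demo') == ('text/x-demo', None)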
def guess_type(url, strict=True):
"""Guess the type of a file based on its URL.
Return value is a tuple (type, encoding) where type is None if the
type can't be guessed (no or unknown suffix) or a string of the
form type/subtype, usable for a MIME Content-type header; and
encoding is None for no encoding or the name of the program used
to encode (e.g. compress or gzip). The mappings are table
driven. Encoding suffixes are case sensitive; type suffixes are
first tried case sensitive, then case insensitive.
The suffixes .tgz, .taz and .tz (case sensitive!) are all mapped
to ".tar.gz". (This is table-driven too, using the dictionary
suffix_map).
    The optional `strict' argument, when false, also adds a number of
    commonly found but non-standard types.
"""
if _db is None:
init()
return _db.guess_type(url, strict)
def guess_all_extensions(type, strict=True):
"""Guess the extensions for a file based on its MIME type.
Return value is a list of strings giving the possible filename
extensions, including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data
stream, but would be mapped to the MIME type `type' by
guess_type(). If no extension can be guessed for `type', None
is returned.
    The optional `strict' argument, when false, also adds a number of
    commonly found but non-standard types.
"""
if _db is None:
init()
return _db.guess_all_extensions(type, strict)
def guess_extension(type, strict=True):
"""Guess the extension for a file based on its MIME type.
Return value is a string giving a filename extension, including the
leading dot ('.'). The extension is not guaranteed to have been
associated with any particular data stream, but would be mapped to the
MIME type `type' by guess_type(). If no extension can be guessed for
`type', None is returned.
    The optional `strict' argument, when false, also adds a number of
    commonly found but non-standard types.
"""
if _db is None:
init()
return _db.guess_extension(type, strict)
def add_type(type, ext, strict=True):
"""Add a mapping between a type and an extension.
When the extension is already known, the new
type will replace the old one. When the type
is already known the extension will be added
to the list of known extensions.
    If strict is true, the information will be added to the list of
    standard types, else to the list of non-standard types.
"""
if _db is None:
init()
return _db.add_type(type, ext, strict)
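# A few illustrative calls (ours).  With the default tables these hold; a
# system mime.types file could in principle override individual mappings.
def _mimetypes_demo():
    assert guess_type('photo.JPG') == ('image/jpeg', None)   # case-insensitive retry
    assert guess_type('archive.tar.gz') == ('application/x-tar', 'gzip')
    assert guess_type('data:text/plain;base64,aGk=') == ('text/plain', None)
    assert guess_extension('text/html') in ('.htm', '.html')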
def init(files=None):
global suffix_map, types_map, encodings_map, common_types
global inited, _db
inited = True # so that MimeTypes.__init__() doesn't call us again
db = MimeTypes()
if files is None:
if _winreg:
db.read_windows_registry()
files = knownfiles
for file in files:
if os.path.isfile(file):
db.read(file)
encodings_map = db.encodings_map
suffix_map = db.suffix_map
types_map = db.types_map[True]
common_types = db.types_map[False]
# Make the DB a global variable now that it is fully initialized
_db = db
def read_mime_types(file):
try:
f = open(file)
except IOError:
return None
with f:
db = MimeTypes()
db.readfp(f, True)
return db.types_map[True]
def _default_mime_types():
global suffix_map
global encodings_map
global types_map
global common_types
suffix_map = {
'.svgz': '.svg.gz',
'.tgz': '.tar.gz',
'.taz': '.tar.gz',
'.tz': '.tar.gz',
'.tbz2': '.tar.bz2',
'.txz': '.tar.xz',
}
encodings_map = {
'.gz': 'gzip',
'.Z': 'compress',
'.bz2': 'bzip2',
'.xz': 'xz',
}
# Before adding new types, make sure they are either registered with IANA,
# at http://www.isi.edu/in-notes/iana/assignments/media-types
# or extensions, i.e. using the x- prefix
# If you add to these, please keep them sorted!
types_map = {
'.a' : 'application/octet-stream',
'.ai' : 'application/postscript',
'.aif' : 'audio/x-aiff',
'.aifc' : 'audio/x-aiff',
'.aiff' : 'audio/x-aiff',
'.au' : 'audio/basic',
'.avi' : 'video/x-msvideo',
'.bat' : 'text/plain',
'.bcpio' : 'application/x-bcpio',
'.bin' : 'application/octet-stream',
'.bmp' : 'image/x-ms-bmp',
'.c' : 'text/plain',
# Duplicates :(
'.cdf' : 'application/x-cdf',
'.cdf' : 'application/x-netcdf',
'.cpio' : 'application/x-cpio',
'.csh' : 'application/x-csh',
'.css' : 'text/css',
'.csv' : 'text/csv',
'.dll' : 'application/octet-stream',
'.doc' : 'application/msword',
'.dot' : 'application/msword',
'.dvi' : 'application/x-dvi',
'.eml' : 'message/rfc822',
'.eps' : 'application/postscript',
'.etx' : 'text/x-setext',
'.exe' : 'application/octet-stream',
'.gif' : 'image/gif',
'.gtar' : 'application/x-gtar',
'.h' : 'text/plain',
'.hdf' : 'application/x-hdf',
'.htm' : 'text/html',
'.html' : 'text/html',
'.ico' : 'image/vnd.microsoft.icon',
'.ief' : 'image/ief',
'.jpe' : 'image/jpeg',
'.jpeg' : 'image/jpeg',
'.jpg' : 'image/jpeg',
'.js' : 'application/javascript',
'.json' : 'application/json',
'.ksh' : 'text/plain',
'.latex' : 'application/x-latex',
'.m1v' : 'video/mpeg',
'.man' : 'application/x-troff-man',
'.me' : 'application/x-troff-me',
'.mht' : 'message/rfc822',
'.mhtml' : 'message/rfc822',
'.mif' : 'application/x-mif',
'.mjs' : 'application/javascript',
'.mov' : 'video/quicktime',
'.movie' : 'video/x-sgi-movie',
'.mp2' : 'audio/mpeg',
'.mp3' : 'audio/mpeg',
'.mp4' : 'video/mp4',
'.mpa' : 'video/mpeg',
'.mpe' : 'video/mpeg',
'.mpeg' : 'video/mpeg',
'.mpg' : 'video/mpeg',
'.ms' : 'application/x-troff-ms',
'.nc' : 'application/x-netcdf',
'.nws' : 'message/rfc822',
'.o' : 'application/octet-stream',
'.obj' : 'application/octet-stream',
'.oda' : 'application/oda',
'.p12' : 'application/x-pkcs12',
'.p7c' : 'application/pkcs7-mime',
'.pbm' : 'image/x-portable-bitmap',
'.pdf' : 'application/pdf',
'.pfx' : 'application/x-pkcs12',
'.pgm' : 'image/x-portable-graymap',
'.pl' : 'text/plain',
'.png' : 'image/png',
'.pnm' : 'image/x-portable-anymap',
'.pot' : 'application/vnd.ms-powerpoint',
'.ppa' : 'application/vnd.ms-powerpoint',
'.ppm' : 'image/x-portable-pixmap',
'.pps' : 'application/vnd.ms-powerpoint',
'.ppt' : 'application/vnd.ms-powerpoint',
'.ps' : 'application/postscript',
'.pwz' : 'application/vnd.ms-powerpoint',
'.py' : 'text/x-python',
'.pyc' : 'application/x-python-code',
'.pyo' : 'application/x-python-code',
'.qt' : 'video/quicktime',
'.ra' : 'audio/x-pn-realaudio',
'.ram' : 'application/x-pn-realaudio',
'.ras' : 'image/x-cmu-raster',
'.rdf' : 'application/xml',
'.rgb' : 'image/x-rgb',
'.roff' : 'application/x-troff',
'.rtx' : 'text/richtext',
'.sgm' : 'text/x-sgml',
'.sgml' : 'text/x-sgml',
'.sh' : 'application/x-sh',
'.shar' : 'application/x-shar',
'.snd' : 'audio/basic',
'.so' : 'application/octet-stream',
'.src' : 'application/x-wais-source',
'.sv4cpio': 'application/x-sv4cpio',
'.sv4crc' : 'application/x-sv4crc',
'.svg' : 'image/svg+xml',
'.swf' : 'application/x-shockwave-flash',
'.t' : 'application/x-troff',
'.tar' : 'application/x-tar',
'.tcl' : 'application/x-tcl',
'.tex' : 'application/x-tex',
'.texi' : 'application/x-texinfo',
'.texinfo': 'application/x-texinfo',
'.tif' : 'image/tiff',
'.tiff' : 'image/tiff',
'.tr' : 'application/x-troff',
'.tsv' : 'text/tab-separated-values',
'.txt' : 'text/plain',
'.ustar' : 'application/x-ustar',
'.vcf' : 'text/x-vcard',
'.wav' : 'audio/x-wav',
'.webm' : 'video/webm',
'.wiz' : 'application/msword',
'.wsdl' : 'application/xml',
'.xbm' : 'image/x-xbitmap',
'.xlb' : 'application/vnd.ms-excel',
# Duplicates :(
'.xls' : 'application/excel',
'.xls' : 'application/vnd.ms-excel',
'.xml' : 'text/xml',
'.xpdl' : 'application/xml',
'.xpm' : 'image/x-xpixmap',
'.xsl' : 'application/xml',
'.xwd' : 'image/x-xwindowdump',
'.zip' : 'application/zip',
}
    # These are non-standard types, commonly found in the wild.  They will
    # only match if the strict=0 flag is given to the API methods.
    # Please sort these too.
common_types = {
'.jpg' : 'image/jpg',
'.mid' : 'audio/midi',
'.midi': 'audio/midi',
'.pct' : 'image/pict',
'.pic' : 'image/pict',
'.pict': 'image/pict',
'.rtf' : 'application/rtf',
'.xul' : 'text/xul'
}
_default_mime_types()
if __name__ == '__main__':
import getopt
USAGE = """\
Usage: mimetypes.py [options] type
Options:
--help / -h -- print this message and exit
    --lenient / -l -- additionally search some common, but non-standard
types.
--extension / -e -- guess extension instead of type
More than one type argument may be given.
"""
def usage(code, msg=''):
print USAGE
if msg: print msg
sys.exit(code)
try:
opts, args = getopt.getopt(sys.argv[1:], 'hle',
['help', 'lenient', 'extension'])
except getopt.error, msg:
usage(1, msg)
strict = 1
extension = 0
for opt, arg in opts:
if opt in ('-h', '--help'):
usage(0)
elif opt in ('-l', '--lenient'):
strict = 0
elif opt in ('-e', '--extension'):
extension = 1
for gtype in args:
if extension:
guess = guess_extension(gtype, strict)
if not guess: print "I don't know anything about type", gtype
else: print guess
else:
guess, encoding = guess_type(gtype, strict)
if not guess: print "I don't know anything about type", gtype
else: print 'type:', guess, 'encoding:', encoding
"""Generic MIME writer.
This module defines the class MimeWriter. The MimeWriter class implements
a basic formatter for creating MIME multi-part files. It doesn't seek around
the output file, nor does it use large amounts of buffer space.  You must write
the parts out in the order that they should occur in the final file.
MimeWriter does buffer the headers you add, allowing you to rearrange their
order.
"""
import mimetools
__all__ = ["MimeWriter"]
import warnings
warnings.warn("the MimeWriter module is deprecated; use the email package instead",
DeprecationWarning, 2)
class MimeWriter:
"""Generic MIME writer.
Methods:
__init__()
addheader()
flushheaders()
startbody()
startmultipartbody()
nextpart()
lastpart()
A MIME writer is much more primitive than a MIME parser. It
doesn't seek around on the output file, and it doesn't use large
amounts of buffer space, so you have to write the parts in the
order they should occur on the output file. It does buffer the
headers you add, allowing you to rearrange their order.
General usage is:
f = <open the output file>
w = MimeWriter(f)
...call w.addheader(key, value) 0 or more times...
followed by either:
f = w.startbody(content_type)
...call f.write(data) for body data...
or:
w.startmultipartbody(subtype)
for each part:
subwriter = w.nextpart()
...use the subwriter's methods to create the subpart...
w.lastpart()
The subwriter is another MimeWriter instance, and should be
treated in the same way as the toplevel MimeWriter. This way,
writing recursive body parts is easy.
Warning: don't forget to call lastpart()!
XXX There should be more state so calls made in the wrong order
are detected.
Some special cases:
- startbody() just returns the file passed to the constructor;
but don't use this knowledge, as it may be changed.
- startmultipartbody() actually returns a file as well;
this can be used to write the initial 'if you can read this your
mailer is not MIME-aware' message.
- If you call flushheaders(), the headers accumulated so far are
written out (and forgotten); this is useful if you don't need a
body part at all, e.g. for a subpart of type message/rfc822
that's (mis)used to store some header-like information.
- Passing a keyword argument 'prefix=<flag>' to addheader(),
start*body() affects where the header is inserted; 0 means
append at the end, 1 means insert at the start; default is
append for addheader(), but insert for start*body(), which use
it to determine where the Content-Type header goes.
"""
def __init__(self, fp):
self._fp = fp
self._headers = []
def addheader(self, key, value, prefix=0):
"""Add a header line to the MIME message.
        The key is the name of the header; the value provides the content of
        the header.  The optional argument prefix determines where the header
        is inserted; 0 means append at the end, 1 means insert at the start.
        The default is to append.
"""
lines = value.split("\n")
while lines and not lines[-1]: del lines[-1]
while lines and not lines[0]: del lines[0]
for i in range(1, len(lines)):
lines[i] = " " + lines[i].strip()
value = "\n".join(lines) + "\n"
line = key + ": " + value
if prefix:
self._headers.insert(0, line)
else:
self._headers.append(line)
def flushheaders(self):
"""Writes out and forgets all headers accumulated so far.
This is useful if you don't need a body part at all; for example,
for a subpart of type message/rfc822 that's (mis)used to store some
header-like information.
"""
self._fp.writelines(self._headers)
self._headers = []
def startbody(self, ctype, plist=[], prefix=1):
"""Returns a file-like object for writing the body of the message.
The content-type is set to the provided ctype, and the optional
parameter, plist, provides additional parameters for the
content-type declaration. The optional argument prefix determines
where the header is inserted; 0 means append at the end, 1 means
insert at the start. The default is to insert at the start.
"""
for name, value in plist:
ctype = ctype + ';\n %s=\"%s\"' % (name, value)
self.addheader("Content-Type", ctype, prefix=prefix)
self.flushheaders()
self._fp.write("\n")
return self._fp
def startmultipartbody(self, subtype, boundary=None, plist=[], prefix=1):
"""Returns a file-like object for writing the body of the message.
Additionally, this method initializes the multi-part code, where the
subtype parameter provides the multipart subtype, the boundary
parameter may provide a user-defined boundary specification, and the
plist parameter provides optional parameters for the subtype. The
optional argument, prefix, determines where the header is inserted;
0 means append at the end, 1 means insert at the start. The default
is to insert at the start. Subparts should be created using the
nextpart() method.
"""
self._boundary = boundary or mimetools.choose_boundary()
return self.startbody("multipart/" + subtype,
[("boundary", self._boundary)] + plist,
prefix=prefix)
def nextpart(self):
"""Returns a new instance of MimeWriter which represents an
individual part in a multipart message.
        This may be used to write the part, as well as for creating
        recursively nested multipart messages.  The message must first be
initialized with the startmultipartbody() method before using the
nextpart() method.
"""
self._fp.write("\n--" + self._boundary + "\n")
return self.__class__(self._fp)
def lastpart(self):
"""This is used to designate the last part of a multipart message.
It should always be used when writing multipart messages.
"""
self._fp.write("\n--" + self._boundary + "--\n")
if __name__ == '__main__':
import test.test_MimeWriter
# ---------------------------------------------------------------------------
# mimify.py
# ---------------------------------------------------------------------------
#! /usr/bin/env python
"""Mimification and unmimification of mail messages.
Decode quoted-printable parts of a mail message or encode using
quoted-printable.
Usage:
mimify(input, output)
unmimify(input, output, decode_base64 = 0)
to encode and decode respectively. Input and output may be the name
of a file or an open file object. Only a readline() method is used
on the input file, and only a write() method on the output file.
When using file names, the input and output file names may be the
same.
Interactive usage:
mimify.py -e [infile [outfile]]
mimify.py -d [infile [outfile]]
to encode and decode respectively. Infile defaults to standard
input and outfile to standard output.
"""
# Configure
MAXLEN = 200 # if lines longer than this, encode as quoted-printable
CHARSET = 'ISO-8859-1' # default charset for non-US-ASCII mail
QUOTE = '> ' # string replies are quoted with
# End configure
import re
import warnings
warnings.warn("the mimify module is deprecated; use the email package instead",
DeprecationWarning, 2)
__all__ = ["mimify","unmimify","mime_encode_header","mime_decode_header"]
qp = re.compile('^content-transfer-encoding:\\s*quoted-printable', re.I)
base64_re = re.compile('^content-transfer-encoding:\\s*base64', re.I)
mp = re.compile('^content-type:.*multipart/.*boundary="?([^;"\n]*)', re.I|re.S)
chrset = re.compile('^(content-type:.*charset=")(us-ascii|iso-8859-[0-9]+)(".*)', re.I|re.S)
he = re.compile('^-*\n')
mime_code = re.compile('=([0-9a-f][0-9a-f])', re.I)
mime_head = re.compile('=\\?iso-8859-1\\?q\\?([^? \t\n]+)\\?=', re.I)
repl = re.compile('^subject:\\s+re: ', re.I)
class File:
"""A simple fake file object that knows about limited read-ahead and
boundaries. The only supported method is readline()."""
def __init__(self, file, boundary):
self.file = file
self.boundary = boundary
self.peek = None
def readline(self):
if self.peek is not None:
return ''
line = self.file.readline()
if not line:
return line
if self.boundary:
if line == self.boundary + '\n':
self.peek = line
return ''
if line == self.boundary + '--\n':
self.peek = line
return ''
return line
class HeaderFile:
def __init__(self, file):
self.file = file
self.peek = None
def readline(self):
if self.peek is not None:
line = self.peek
self.peek = None
else:
line = self.file.readline()
if not line:
return line
if he.match(line):
return line
while 1:
self.peek = self.file.readline()
if len(self.peek) == 0 or \
(self.peek[0] != ' ' and self.peek[0] != '\t'):
return line
line = line + self.peek
self.peek = None
def mime_decode(line):
"""Decode a single line of quoted-printable text to 8bit."""
newline = ''
pos = 0
while 1:
res = mime_code.search(line, pos)
if res is None:
break
newline = newline + line[pos:res.start(0)] + \
chr(int(res.group(1), 16))
pos = res.end(0)
return newline + line[pos:]
def mime_decode_header(line):
"""Decode a header line to 8bit."""
newline = ''
pos = 0
while 1:
res = mime_head.search(line, pos)
if res is None:
break
match = res.group(1)
# convert underscores to spaces (before =XX conversion!)
match = ' '.join(match.split('_'))
newline = newline + line[pos:res.start(0)] + mime_decode(match)
pos = res.end(0)
return newline + line[pos:]
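# Two concrete decodes (ours): '=E9' is quoted-printable for byte 0xE9
# (e-acute in ISO-8859-1), and =?iso-8859-1?q?...?= is an encoded header word.
def _mime_decode_demo():
    assert mime_decode('caf=E9') == 'caf\xe9'
    assert mime_decode_header('Subject: =?iso-8859-1?q?caf=E9?=\n') == \
           'Subject: caf\xe9\n'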
def unmimify_part(ifile, ofile, decode_base64 = 0):
"""Convert a quoted-printable part of a MIME mail message to 8bit."""
multipart = None
quoted_printable = 0
is_base64 = 0
is_repl = 0
if ifile.boundary and ifile.boundary[:2] == QUOTE:
prefix = QUOTE
else:
prefix = ''
# read header
hfile = HeaderFile(ifile)
while 1:
line = hfile.readline()
if not line:
return
if prefix and line[:len(prefix)] == prefix:
line = line[len(prefix):]
pref = prefix
else:
pref = ''
line = mime_decode_header(line)
if qp.match(line):
quoted_printable = 1
continue # skip this header
if decode_base64 and base64_re.match(line):
is_base64 = 1
continue
ofile.write(pref + line)
if not prefix and repl.match(line):
# we're dealing with a reply message
is_repl = 1
mp_res = mp.match(line)
if mp_res:
multipart = '--' + mp_res.group(1)
if he.match(line):
break
if is_repl and (quoted_printable or multipart):
is_repl = 0
# read body
while 1:
line = ifile.readline()
if not line:
return
line = re.sub(mime_head, '\\1', line)
if prefix and line[:len(prefix)] == prefix:
line = line[len(prefix):]
pref = prefix
else:
pref = ''
## if is_repl and len(line) >= 4 and line[:4] == QUOTE+'--' and line[-3:] != '--\n':
## multipart = line[:-1]
while multipart:
if line == multipart + '--\n':
ofile.write(pref + line)
multipart = None
line = None
break
if line == multipart + '\n':
ofile.write(pref + line)
nifile = File(ifile, multipart)
unmimify_part(nifile, ofile, decode_base64)
line = nifile.peek
if not line:
# premature end of file
break
continue
# not a boundary between parts
break
if line and quoted_printable:
while line[-2:] == '=\n':
line = line[:-2]
newline = ifile.readline()
if newline[:len(QUOTE)] == QUOTE:
newline = newline[len(QUOTE):]
line = line + newline
line = mime_decode(line)
if line and is_base64 and not pref:
import base64
line = base64.decodestring(line)
if line:
ofile.write(pref + line)
def unmimify(infile, outfile, decode_base64 = 0):
"""Convert quoted-printable parts of a MIME mail message to 8bit."""
if type(infile) == type(''):
ifile = open(infile)
if type(outfile) == type('') and infile == outfile:
import os
d, f = os.path.split(infile)
os.rename(infile, os.path.join(d, ',' + f))
else:
ifile = infile
if type(outfile) == type(''):
ofile = open(outfile, 'w')
else:
ofile = outfile
nifile = File(ifile, None)
unmimify_part(nifile, ofile, decode_base64)
ofile.flush()
mime_char = re.compile('[=\177-\377]') # quote these chars in body
mime_header_char = re.compile('[=?\177-\377]') # quote these in header
def mime_encode(line, header):
"""Code a single line as quoted-printable.
If header is set, quote some extra characters."""
if header:
reg = mime_header_char
else:
reg = mime_char
newline = ''
pos = 0
if len(line) >= 5 and line[:5] == 'From ':
# quote 'From ' at the start of a line for stupid mailers
newline = ('=%02x' % ord('F')).upper()
pos = 1
while 1:
res = reg.search(line, pos)
if res is None:
break
newline = newline + line[pos:res.start(0)] + \
('=%02x' % ord(res.group(0))).upper()
pos = res.end(0)
line = newline + line[pos:]
newline = ''
while len(line) >= 75:
i = 73
while line[i] == '=' or line[i-1] == '=':
i = i - 1
i = i + 1
newline = newline + line[:i] + '=\n'
line = line[i:]
return newline + line
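# The inverse sketch (ours): 8bit bytes are quoted, and a leading 'From '
# is escaped as '=46rom' so Unix mailbox software cannot mistake it.
def _mime_encode_demo():
    assert mime_encode('caf\xe9', 0) == 'caf=E9'
    assert mime_encode('From here\n', 0) == '=46rom here\n'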
mime_header = re.compile('([ \t(]|^)([-a-zA-Z0-9_+]*[\177-\377][-a-zA-Z0-9_+\177-\377]*)(?=[ \t)]|\n)')
def mime_encode_header(line):
"""Code a single header line as quoted-printable."""
newline = ''
pos = 0
while 1:
res = mime_header.search(line, pos)
if res is None:
break
newline = '%s%s%s=?%s?Q?%s?=' % \
(newline, line[pos:res.start(0)], res.group(1),
CHARSET, mime_encode(res.group(2), 1))
pos = res.end(0)
return newline + line[pos:]
mv = re.compile('^mime-version:', re.I)
cte = re.compile('^content-transfer-encoding:', re.I)
iso_char = re.compile('[\177-\377]')
def mimify_part(ifile, ofile, is_mime):
"""Convert an 8bit part of a MIME mail message to quoted-printable."""
has_cte = is_qp = is_base64 = 0
multipart = None
must_quote_body = must_quote_header = has_iso_chars = 0
header = []
header_end = ''
message = []
message_end = ''
# read header
hfile = HeaderFile(ifile)
while 1:
line = hfile.readline()
if not line:
break
if not must_quote_header and iso_char.search(line):
must_quote_header = 1
if mv.match(line):
is_mime = 1
if cte.match(line):
has_cte = 1
if qp.match(line):
is_qp = 1
elif base64_re.match(line):
is_base64 = 1
mp_res = mp.match(line)
if mp_res:
multipart = '--' + mp_res.group(1)
if he.match(line):
header_end = line
break
header.append(line)
# read body
while 1:
line = ifile.readline()
if not line:
break
if multipart:
if line == multipart + '--\n':
message_end = line
break
if line == multipart + '\n':
message_end = line
break
if is_base64:
message.append(line)
continue
if is_qp:
while line[-2:] == '=\n':
line = line[:-2]
newline = ifile.readline()
if newline[:len(QUOTE)] == QUOTE:
newline = newline[len(QUOTE):]
line = line + newline
line = mime_decode(line)
message.append(line)
if not has_iso_chars:
if iso_char.search(line):
has_iso_chars = must_quote_body = 1
if not must_quote_body:
if len(line) > MAXLEN:
must_quote_body = 1
# convert and output header and body
for line in header:
if must_quote_header:
line = mime_encode_header(line)
chrset_res = chrset.match(line)
if chrset_res:
if has_iso_chars:
# change us-ascii into iso-8859-1
if chrset_res.group(2).lower() == 'us-ascii':
line = '%s%s%s' % (chrset_res.group(1),
CHARSET,
chrset_res.group(3))
else:
# change iso-8859-* into us-ascii
line = '%sus-ascii%s' % chrset_res.group(1, 3)
if has_cte and cte.match(line):
line = 'Content-Transfer-Encoding: '
if is_base64:
line = line + 'base64\n'
elif must_quote_body:
line = line + 'quoted-printable\n'
else:
line = line + '7bit\n'
ofile.write(line)
if (must_quote_header or must_quote_body) and not is_mime:
ofile.write('Mime-Version: 1.0\n')
ofile.write('Content-Type: text/plain; ')
if has_iso_chars:
ofile.write('charset="%s"\n' % CHARSET)
else:
ofile.write('charset="us-ascii"\n')
if must_quote_body and not has_cte:
ofile.write('Content-Transfer-Encoding: quoted-printable\n')
ofile.write(header_end)
for line in message:
if must_quote_body:
line = mime_encode(line, 0)
ofile.write(line)
ofile.write(message_end)
line = message_end
while multipart:
if line == multipart + '--\n':
# read bit after the end of the last part
while 1:
line = ifile.readline()
if not line:
return
if must_quote_body:
line = mime_encode(line, 0)
ofile.write(line)
if line == multipart + '\n':
nifile = File(ifile, multipart)
mimify_part(nifile, ofile, 1)
line = nifile.peek
if not line:
# premature end of file
break
ofile.write(line)
continue
# unexpectedly no multipart separator--copy rest of file
while 1:
line = ifile.readline()
if not line:
return
if must_quote_body:
line = mime_encode(line, 0)
ofile.write(line)
def mimify(infile, outfile):
"""Convert 8bit parts of a MIME mail message to quoted-printable."""
if type(infile) == type(''):
ifile = open(infile)
if type(outfile) == type('') and infile == outfile:
import os
d, f = os.path.split(infile)
os.rename(infile, os.path.join(d, ',' + f))
else:
ifile = infile
if type(outfile) == type(''):
ofile = open(outfile, 'w')
else:
ofile = outfile
nifile = File(ifile, None)
mimify_part(nifile, ofile, 0)
ofile.flush()
import sys
if __name__ == '__main__' or (len(sys.argv) > 0 and sys.argv[0] == 'mimify'):
import getopt
usage = 'Usage: mimify [-l len] -[ed] [infile [outfile]]'
decode_base64 = 0
opts, args = getopt.getopt(sys.argv[1:], 'l:edb')
if len(args) not in (0, 1, 2):
print usage
sys.exit(1)
if (('-e', '') in opts) == (('-d', '') in opts) or \
((('-b', '') in opts) and (('-d', '') not in opts)):
print usage
sys.exit(1)
for o, a in opts:
if o == '-e':
encode = mimify
elif o == '-d':
encode = unmimify
elif o == '-l':
try:
MAXLEN = int(a)
except (ValueError, OverflowError):
print usage
sys.exit(1)
elif o == '-b':
decode_base64 = 1
if len(args) == 0:
encode_args = (sys.stdin, sys.stdout)
elif len(args) == 1:
encode_args = (args[0], sys.stdout)
else:
encode_args = (args[0], args[1])
if decode_base64:
encode_args = encode_args + (decode_base64,)
encode(*encode_args)
"""Find modules used by a script, using introspection."""
from __future__ import generators
import dis
import imp
import marshal
import os
import sys
import types
import struct
if hasattr(sys.__stdout__, "newlines"):
READ_MODE = "U" # universal line endings
else:
# Python < 2.3 compatibility, no longer strictly required
READ_MODE = "r"
LOAD_CONST = dis.opmap['LOAD_CONST']
IMPORT_NAME = dis.opmap['IMPORT_NAME']
STORE_NAME = dis.opmap['STORE_NAME']
STORE_GLOBAL = dis.opmap['STORE_GLOBAL']
STORE_OPS = STORE_NAME, STORE_GLOBAL
HAVE_ARGUMENT = dis.HAVE_ARGUMENT
EXTENDED_ARG = dis.EXTENDED_ARG
def _unpack_opargs(code):
# enumerate() is not an option, since we sometimes process
# multiple elements on a single pass through the loop
extended_arg = 0
n = len(code)
i = 0
while i < n:
op = ord(code[i])
offset = i
i = i+1
arg = None
if op >= HAVE_ARGUMENT:
arg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
extended_arg = 0
i = i+2
if op == EXTENDED_ARG:
extended_arg = arg*65536
yield (offset, op, arg)
# Modulefinder does a good job of simulating Python's import machinery, but
# it cannot handle __path__ modifications that packages make at runtime.
# Therefore there is a mechanism whereby you can register extra paths in
# this map for a package, and it will be honored.
# Note: this is a mapping from package names to lists of paths.
packagePathMap = {}
# A public interface
def AddPackagePath(packagename, path):
paths = packagePathMap.get(packagename, [])
paths.append(path)
packagePathMap[packagename] = paths
replacePackageMap = {}
# This ReplacePackage mechanism allows modulefinder to work around the
# way the _xmlplus package injects itself under the name "xml" into
# sys.modules at runtime by calling ReplacePackage("_xmlplus", "xml")
# before running ModuleFinder.
def ReplacePackage(oldname, newname):
replacePackageMap[oldname] = newname
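# A small end-to-end sketch (ours): generate a throwaway script and report
# the modules ModuleFinder (defined below) discovers for it.
def _modulefinder_demo():
    import tempfile
    fd, path = tempfile.mkstemp(suffix='.py')
    os.write(fd, "import re\n")
    os.close(fd)
    try:
        mf = ModuleFinder()
        mf.run_script(path)
        print sorted(mf.modules.keys())   # '__main__', 're', and re's imports
    finally:
        os.unlink(path)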
class Module:
def __init__(self, name, file=None, path=None):
self.__name__ = name
self.__file__ = file
self.__path__ = path
self.__code__ = None
# The set of global names that are assigned to in the module.
# This includes those names imported through starimports of
# Python modules.
self.globalnames = {}
# The set of starimports this module did that could not be
        # resolved, i.e. a starimport from a non-Python module.
self.starimports = {}
def __repr__(self):
s = "Module(%r" % (self.__name__,)
if self.__file__ is not None:
s = s + ", %r" % (self.__file__,)
if self.__path__ is not None:
s = s + ", %r" % (self.__path__,)
s = s + ")"
return s
class ModuleFinder:
def __init__(self, path=None, debug=0, excludes=[], replace_paths=[]):
if path is None:
path = sys.path
self.path = path
self.modules = {}
self.badmodules = {}
self.debug = debug
self.indent = 0
self.excludes = excludes
self.replace_paths = replace_paths
self.processed_paths = [] # Used in debugging only
def msg(self, level, str, *args):
if level <= self.debug:
for i in range(self.indent):
print " ",
print str,
for arg in args:
print repr(arg),
print
def msgin(self, *args):
level = args[0]
if level <= self.debug:
self.indent = self.indent + 1
self.msg(*args)
def msgout(self, *args):
level = args[0]
if level <= self.debug:
self.indent = self.indent - 1
self.msg(*args)
def run_script(self, pathname):
self.msg(2, "run_script", pathname)
with open(pathname, READ_MODE) as fp:
stuff = ("", "r", imp.PY_SOURCE)
self.load_module('__main__', fp, pathname, stuff)
def load_file(self, pathname):
dir, name = os.path.split(pathname)
name, ext = os.path.splitext(name)
with open(pathname, READ_MODE) as fp:
stuff = (ext, "r", imp.PY_SOURCE)
self.load_module(name, fp, pathname, stuff)
def import_hook(self, name, caller=None, fromlist=None, level=-1):
self.msg(3, "import_hook", name, caller, fromlist, level)
parent = self.determine_parent(caller, level=level)
q, tail = self.find_head_package(parent, name)
m = self.load_tail(q, tail)
if not fromlist:
return q
if m.__path__:
self.ensure_fromlist(m, fromlist)
return None
def determine_parent(self, caller, level=-1):
self.msgin(4, "determine_parent", caller, level)
if not caller or level == 0:
self.msgout(4, "determine_parent -> None")
return None
pname = caller.__name__
if level >= 1: # relative import
if caller.__path__:
level -= 1
if level == 0:
parent = self.modules[pname]
assert parent is caller
self.msgout(4, "determine_parent ->", parent)
return parent
if pname.count(".") < level:
raise ImportError, "relative import path too deep"
pname = ".".join(pname.split(".")[:-level])
parent = self.modules[pname]
self.msgout(4, "determine_parent ->", parent)
return parent
if caller.__path__:
parent = self.modules[pname]
assert caller is parent
self.msgout(4, "determine_parent ->", parent)
return parent
if '.' in pname:
i = pname.rfind('.')
pname = pname[:i]
parent = self.modules[pname]
assert parent.__name__ == pname
self.msgout(4, "determine_parent ->", parent)
return parent
self.msgout(4, "determine_parent -> None")
return None
def find_head_package(self, parent, name):
self.msgin(4, "find_head_package", parent, name)
if '.' in name:
i = name.find('.')
head = name[:i]
tail = name[i+1:]
else:
head = name
tail = ""
if parent:
qname = "%s.%s" % (parent.__name__, head)
else:
qname = head
q = self.import_module(head, qname, parent)
if q:
self.msgout(4, "find_head_package ->", (q, tail))
return q, tail
if parent:
qname = head
parent = None
q = self.import_module(head, qname, parent)
if q:
self.msgout(4, "find_head_package ->", (q, tail))
return q, tail
self.msgout(4, "raise ImportError: No module named", qname)
raise ImportError, "No module named " + qname
def load_tail(self, q, tail):
self.msgin(4, "load_tail", q, tail)
m = q
while tail:
i = tail.find('.')
if i < 0: i = len(tail)
head, tail = tail[:i], tail[i+1:]
mname = "%s.%s" % (m.__name__, head)
m = self.import_module(head, mname, m)
if not m:
self.msgout(4, "raise ImportError: No module named", mname)
raise ImportError, "No module named " + mname
self.msgout(4, "load_tail ->", m)
return m
def ensure_fromlist(self, m, fromlist, recursive=0):
self.msg(4, "ensure_fromlist", m, fromlist, recursive)
for sub in fromlist:
if sub == "*":
if not recursive:
all = self.find_all_submodules(m)
if all:
self.ensure_fromlist(m, all, 1)
elif not hasattr(m, sub):
subname = "%s.%s" % (m.__name__, sub)
submod = self.import_module(sub, subname, m)
if not submod:
raise ImportError, "No module named " + subname
def find_all_submodules(self, m):
if not m.__path__:
return
modules = {}
# 'suffixes' used to be a list hardcoded to [".py", ".pyc", ".pyo"].
# But we must also collect Python extension modules - although
# we cannot separate normal DLLs from Python extensions.
suffixes = []
for triple in imp.get_suffixes():
suffixes.append(triple[0])
for dir in m.__path__:
try:
names = os.listdir(dir)
except os.error:
self.msg(2, "can't list directory", dir)
continue
for name in names:
mod = None
for suff in suffixes:
n = len(suff)
if name[-n:] == suff:
mod = name[:-n]
break
if mod and mod != "__init__":
modules[mod] = mod
return modules.keys()
def import_module(self, partname, fqname, parent):
self.msgin(3, "import_module", partname, fqname, parent)
try:
m = self.modules[fqname]
except KeyError:
pass
else:
self.msgout(3, "import_module ->", m)
return m
if fqname in self.badmodules:
self.msgout(3, "import_module -> None")
return None
if parent and parent.__path__ is None:
self.msgout(3, "import_module -> None")
return None
try:
fp, pathname, stuff = self.find_module(partname,
parent and parent.__path__, parent)
except ImportError:
self.msgout(3, "import_module ->", None)
return None
try:
m = self.load_module(fqname, fp, pathname, stuff)
finally:
if fp: fp.close()
if parent:
setattr(parent, partname, m)
self.msgout(3, "import_module ->", m)
return m
def load_module(self, fqname, fp, pathname, file_info):
suffix, mode, type = file_info
self.msgin(2, "load_module", fqname, fp and "fp", pathname)
if type == imp.PKG_DIRECTORY:
m = self.load_package(fqname, pathname)
self.msgout(2, "load_module ->", m)
return m
if type == imp.PY_SOURCE:
co = compile(fp.read()+'\n', pathname, 'exec')
elif type == imp.PY_COMPILED:
if fp.read(4) != imp.get_magic():
self.msgout(2, "raise ImportError: Bad magic number", pathname)
raise ImportError, "Bad magic number in %s" % pathname
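# skip the 4-byte timestamp that follows the magic number in a .pyc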
fp.read(4)
co = marshal.load(fp)
else:
co = None
m = self.add_module(fqname)
m.__file__ = pathname
if co:
if self.replace_paths:
co = self.replace_paths_in_code(co)
m.__code__ = co
self.scan_code(co, m)
self.msgout(2, "load_module ->", m)
return m
def _add_badmodule(self, name, caller):
if name not in self.badmodules:
self.badmodules[name] = {}
if caller:
self.badmodules[name][caller.__name__] = 1
else:
self.badmodules[name]["-"] = 1
def _safe_import_hook(self, name, caller, fromlist, level=-1):
# wrapper for self.import_hook() that won't raise ImportError
if name in self.badmodules:
self._add_badmodule(name, caller)
return
try:
self.import_hook(name, caller, level=level)
except ImportError, msg:
self.msg(2, "ImportError:", str(msg))
self._add_badmodule(name, caller)
else:
if fromlist:
for sub in fromlist:
if sub in self.badmodules:
self._add_badmodule(sub, caller)
continue
try:
self.import_hook(name, caller, [sub], level=level)
except ImportError, msg:
self.msg(2, "ImportError:", str(msg))
fullname = name + "." + sub
self._add_badmodule(fullname, caller)
def scan_opcodes_cli(self, co):
import ast
with open(co.co_filename, 'rU') as f:
nodes = ast.parse(f.read(), co.co_filename)
items = []
class ModuleFinderVisitor(ast.NodeVisitor):
def visit_Assign(self, node):
for x in node.targets:
if isinstance(x, ast.Subscript):
if isinstance(x.value, ast.Name):
items.append(("store", (x.value.id, )))
elif isinstance(x.value, ast.Attribute):
items.append(("store", (x.value.attr, )))
else:
print 'Unknown in store: %s' % type(x.value).__name__
elif isinstance(x, ast.Name):
items.append(("store", (x.id, )))
def visit_Import(self, node):
items.extend([("import", (None, x.name)) for x in node.names])
def visit_ImportFrom(self, node):
if node.level == 1:
items.append(("relative_import", (node.level, [x.name for x in node.names], node.module)))
else:
items.extend([("import", ([x.name for x in node.names], node.module))])
v = ModuleFinderVisitor()
v.visit(nodes)
for what, args in items:
yield what, args
def scan_opcodes(self, co,
unpack = struct.unpack):
# Scan the code, and yield 'interesting' opcode combinations
# Version for Python 2.4 and older
code = co.co_code
names = co.co_names
consts = co.co_consts
opargs = [(op, arg) for _, op, arg in _unpack_opargs(code)
if op != EXTENDED_ARG]
for i, (op, oparg) in enumerate(opargs):
if op in STORE_OPS:
yield "store", (names[oparg],)
continue
if (op == IMPORT_NAME and i >= 1
and opargs[i-1][0] == LOAD_CONST):
fromlist = consts[opargs[i-1][1]]
yield "import", (fromlist, names[oparg])
continue
def scan_opcodes_25(self, co):
# Scan the code, and yield 'interesting' opcode combinations
code = co.co_code
names = co.co_names
consts = co.co_consts
opargs = [(op, arg) for _, op, arg in _unpack_opargs(code)
if op != EXTENDED_ARG]
for i, (op, oparg) in enumerate(opargs):
if op in STORE_OPS:
yield "store", (names[oparg],)
continue
if (op == IMPORT_NAME and i >= 2
and opargs[i-1][0] == opargs[i-2][0] == LOAD_CONST):
level = consts[opargs[i-2][1]]
fromlist = consts[opargs[i-1][1]]
if level == -1: # normal import
yield "import", (fromlist, names[oparg])
elif level == 0: # absolute import
yield "absolute_import", (fromlist, names[oparg])
else: # relative import
yield "relative_import", (level, fromlist, names[oparg])
continue
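# For example, "from . import foo" compiles to LOAD_CONST 1,
# LOAD_CONST ('foo',), IMPORT_NAME '', so scan_opcodes_25 yields
# ("relative_import", (1, ('foo',), '')); a plain "import os" compiles
# with level -1 and yields ("import", (None, 'os')).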
def scan_code(self, co, m):
code = co.co_code
if sys.platform == 'cli':
scanner = self.scan_opcodes_cli
elif sys.version_info >= (2, 5):
scanner = self.scan_opcodes_25
else:
scanner = self.scan_opcodes
for what, args in scanner(co):
if what == "store":
name, = args
m.globalnames[name] = 1
elif what in ("import", "absolute_import"):
fromlist, name = args
have_star = 0
if fromlist is not None:
if "*" in fromlist:
have_star = 1
fromlist = [f for f in fromlist if f != "*"]
if what == "absolute_import": level = 0
else: level = -1
self._safe_import_hook(name, m, fromlist, level=level)
if have_star:
# We've encountered an "import *". If it is a Python module,
# the code has already been parsed and we can suck out the
# global names.
mm = None
if m.__path__:
# At this point we don't know whether 'name' is a
# submodule of 'm' or a global module. Let's just try
# the full name first.
mm = self.modules.get(m.__name__ + "." + name)
if mm is None:
mm = self.modules.get(name)
if mm is not None:
m.globalnames.update(mm.globalnames)
m.starimports.update(mm.starimports)
if mm.__code__ is None:
m.starimports[name] = 1
else:
m.starimports[name] = 1
elif what == "relative_import":
level, fromlist, name = args
if name:
self._safe_import_hook(name, m, fromlist, level=level)
else:
parent = self.determine_parent(m, level=level)
self._safe_import_hook(parent.__name__, None, fromlist, level=0)
else:
# We don't expect anything else from the generator.
raise RuntimeError(what)
for c in co.co_consts:
if isinstance(c, type(co)):
self.scan_code(c, m)
def load_package(self, fqname, pathname):
self.msgin(2, "load_package", fqname, pathname)
newname = replacePackageMap.get(fqname)
if newname:
fqname = newname
m = self.add_module(fqname)
m.__file__ = pathname
m.__path__ = [pathname]
# As per comment at top of file, simulate runtime __path__ additions.
m.__path__ = m.__path__ + packagePathMap.get(fqname, [])
fp, buf, stuff = self.find_module("__init__", m.__path__)
self.load_module(fqname, fp, buf, stuff)
self.msgout(2, "load_package ->", m)
if fp:
fp.close()
return m
def add_module(self, fqname):
if fqname in self.modules:
return self.modules[fqname]
self.modules[fqname] = m = Module(fqname)
return m
def find_module(self, name, path, parent=None):
if parent is not None:
# assert path is not None
fullname = parent.__name__+'.'+name
else:
fullname = name
if fullname in self.excludes:
self.msgout(3, "find_module -> Excluded", fullname)
raise ImportError, name
if path is None:
if name in sys.builtin_module_names:
return (None, None, ("", "", imp.C_BUILTIN))
path = self.path
return imp.find_module(name, path)
def report(self):
"""Print a report to stdout, listing the found modules with their
paths, as well as modules that are missing, or seem to be missing.
"""
print
print " %-25s %s" % ("Name", "File")
print " %-25s %s" % ("----", "----")
# Print modules found
keys = self.modules.keys()
keys.sort()
for key in keys:
m = self.modules[key]
if m.__path__:
print "P",
else:
print "m",
print "%-25s" % key, m.__file__ or ""
# Print missing modules
missing, maybe = self.any_missing_maybe()
if missing:
print
print "Missing modules:"
for name in missing:
mods = self.badmodules[name].keys()
mods.sort()
print "?", name, "imported from", ', '.join(mods)
# Print modules that may be missing, but then again, maybe not...
if maybe:
print
print "Submodules that appear to be missing, but could also be",
print "global names in the parent package:"
for name in maybe:
mods = self.badmodules[name].keys()
mods.sort()
print "?", name, "imported from", ', '.join(mods)
def any_missing(self):
"""Return a list of modules that appear to be missing. Use
any_missing_maybe() if you want to know which modules are
certain to be missing, and which *may* be missing.
"""
missing, maybe = self.any_missing_maybe()
return missing + maybe
def any_missing_maybe(self):
"""Return two lists, one with modules that are certainly missing
and one with modules that *may* be missing. The latter names could
either be submodules *or* just global names in the package.
The reason it can't always be determined is that it's impossible to
tell which names are imported when "from module import *" is done
with an extension module, short of actually importing it.
"""
missing = []
maybe = []
for name in self.badmodules:
if name in self.excludes:
continue
i = name.rfind(".")
if i < 0:
missing.append(name)
continue
subname = name[i+1:]
pkgname = name[:i]
pkg = self.modules.get(pkgname)
if pkg is not None:
if pkgname in self.badmodules[name]:
# The package tried to import this module itself and
# failed. It's definitely missing.
missing.append(name)
elif subname in pkg.globalnames:
# It's a global in the package: definitely not missing.
pass
elif pkg.starimports:
# It could be missing, but the package did an "import *"
# from a non-Python module, so we simply can't be sure.
maybe.append(name)
else:
# It's not a global in the package, the package didn't
# do funny star imports, it's very likely to be missing.
# The symbol could be inserted into the package from the
# outside, but since that's not good style we simply list
# it missing.
missing.append(name)
else:
missing.append(name)
missing.sort()
maybe.sort()
return missing, maybe
def replace_paths_in_code(self, co):
new_filename = original_filename = os.path.normpath(co.co_filename)
for f, r in self.replace_paths:
if original_filename.startswith(f):
new_filename = r + original_filename[len(f):]
break
if self.debug and original_filename not in self.processed_paths:
if new_filename != original_filename:
self.msgout(2, "co_filename %r changed to %r" \
% (original_filename,new_filename,))
else:
self.msgout(2, "co_filename %r remains unchanged" \
% (original_filename,))
self.processed_paths.append(original_filename)
consts = list(co.co_consts)
for i in range(len(consts)):
if isinstance(consts[i], type(co)):
consts[i] = self.replace_paths_in_code(consts[i])
return types.CodeType(co.co_argcount, co.co_nlocals, co.co_stacksize,
co.co_flags, co.co_code, tuple(consts), co.co_names,
co.co_varnames, new_filename, co.co_name,
co.co_firstlineno, co.co_lnotab,
co.co_freevars, co.co_cellvars)
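# A minimal usage sketch (the script path and exclude list below are
# hypothetical); this mirrors what the test() driver below does:
def _example_find_modules():
    finder = ModuleFinder(excludes=["Tkinter"])
    finder.run_script("myscript.py")   # hypothetical script to analyze
    finder.report()                    # print found and missing modules
    return finder.any_missing()        # names that could not be resolved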
def test():
# Parse command line
import getopt
try:
opts, args = getopt.getopt(sys.argv[1:], "dmp:qx:")
except getopt.error, msg:
print msg
return
# Process options
debug = 1
domods = 0
addpath = []
exclude = []
for o, a in opts:
if o == '-d':
debug = debug + 1
if o == '-m':
domods = 1
if o == '-p':
addpath = addpath + a.split(os.pathsep)
if o == '-q':
debug = 0
if o == '-x':
exclude.append(a)
# Provide default arguments
if not args:
script = "hello.py"
else:
script = args[0]
# Set the path based on sys.path and the script directory
path = sys.path[:]
path[0] = os.path.dirname(script)
path = addpath + path
if debug > 1:
print "path:"
for item in path:
print " ", repr(item)
# Create the module finder and turn its crank
mf = ModuleFinder(path, debug, exclude)
for arg in args[1:]:
if arg == '-m':
domods = 1
continue
if domods:
if arg[-2:] == '.*':
mf.import_hook(arg[:-2], None, ["*"])
else:
mf.import_hook(arg)
else:
mf.load_file(arg)
mf.run_script(script)
mf.report()
return mf # for -i debugging
if __name__ == '__main__':
try:
mf = test()
except KeyboardInterrupt:
print "\n[interrupt]"
"""A readline()-style interface to the parts of a multipart message.
The MultiFile class makes each part of a multipart message "feel" like
an ordinary file, as long as you use fp.readline(). Allows recursive
use, for nested multipart messages. Probably best used together
with module mimetools.
Suggested use:
real_fp = open(...)
fp = MultiFile(real_fp)
"read some lines from fp"
fp.push(separator)
while 1:
"read lines from fp until it returns an empty string" (A)
if not fp.next(): break
fp.pop()
"read remaining lines from fp until it returns an empty string"
The latter sequence may be used recursively at (A).
It is also allowed to use multiple push()...pop() sequences.
If seekable is given as 0, the class code will not do the bookkeeping
it normally attempts in order to make seeks relative to the beginning of the
current file part. This may be useful when using MultiFile with a non-
seekable stream object.
"""
from warnings import warn
warn("the multifile module has been deprecated since Python 2.5",
DeprecationWarning, stacklevel=2)
del warn
__all__ = ["MultiFile","Error"]
class Error(Exception):
pass
class MultiFile:
seekable = 0
def __init__(self, fp, seekable=1):
self.fp = fp
self.stack = []
self.level = 0
self.last = 0
if seekable:
self.seekable = 1
self.start = self.fp.tell()
self.posstack = []
def tell(self):
if self.level > 0:
return self.lastpos
return self.fp.tell() - self.start
def seek(self, pos, whence=0):
here = self.tell()
if whence:
if whence == 1:
pos = pos + here
elif whence == 2:
if self.level > 0:
pos = pos + self.lastpos
else:
raise Error, "can't use whence=2 yet"
if not 0 <= pos <= here or \
self.level > 0 and pos > self.lastpos:
raise Error, 'bad MultiFile.seek() call'
self.fp.seek(pos + self.start)
self.level = 0
self.last = 0
def readline(self):
if self.level > 0:
return ''
line = self.fp.readline()
# Real EOF?
if not line:
self.level = len(self.stack)
self.last = (self.level > 0)
if self.last:
raise Error, 'sudden EOF in MultiFile.readline()'
return ''
assert self.level == 0
# Fast check to see if this is just data
if self.is_data(line):
return line
else:
# Ignore trailing whitespace on marker lines
marker = line.rstrip()
# No? OK, try to match a boundary.
# Return the line (unstripped) if we don't.
for i, sep in enumerate(reversed(self.stack)):
if marker == self.section_divider(sep):
self.last = 0
break
elif marker == self.end_marker(sep):
self.last = 1
break
else:
return line
# We only get here if we see a section divider or EOM line
if self.seekable:
self.lastpos = self.tell() - len(line)
self.level = i+1
if self.level > 1:
raise Error,'Missing endmarker in MultiFile.readline()'
return ''
def readlines(self):
list = []
while 1:
line = self.readline()
if not line: break
list.append(line)
return list
def read(self): # Note: no size argument -- read until EOF only!
return ''.join(self.readlines())
def next(self):
while self.readline(): pass
if self.level > 1 or self.last:
return 0
self.level = 0
self.last = 0
if self.seekable:
self.start = self.fp.tell()
return 1
def push(self, sep):
if self.level > 0:
raise Error, 'bad MultiFile.push() call'
self.stack.append(sep)
if self.seekable:
self.posstack.append(self.start)
self.start = self.fp.tell()
def pop(self):
if self.stack == []:
raise Error, 'bad MultiFile.pop() call'
if self.level <= 1:
self.last = 0
else:
abslastpos = self.lastpos + self.start
self.level = max(0, self.level - 1)
self.stack.pop()
if self.seekable:
self.start = self.posstack.pop()
if self.level > 0:
self.lastpos = abslastpos - self.start
def is_data(self, line):
return line[:2] != '--'
def section_divider(self, str):
return "--" + str
def end_marker(self, str):
return "--" + str + "--"
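# A concrete sketch of the "Suggested use" pattern from the module
# docstring, assuming real_fp wraps a multipart message whose boundary
# is "BOUNDARY" (both names are hypothetical):
def _example_read_parts(real_fp):
    mf = MultiFile(real_fp)
    mf.push("BOUNDARY")   # divider is "--BOUNDARY", end marker "--BOUNDARY--"
    mf.read()             # skip any preamble before the first part
    parts = []
    while mf.next():      # advance to the next part, if any
        parts.append(mf.read())
    mf.pop()
    return parts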
#
# A higher level module for using sockets (or Windows named pipes)
#
# multiprocessing/connection.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = [ 'Client', 'Listener', 'Pipe' ]
import os
import sys
import socket
import errno
import time
import tempfile
import itertools
import _multiprocessing
from multiprocessing import current_process, AuthenticationError
from multiprocessing.util import get_temp_dir, Finalize, sub_debug, debug
from multiprocessing.forking import duplicate, close
#
#
#
BUFSIZE = 8192
# A very generous timeout when it comes to local connections...
CONNECTION_TIMEOUT = 20.
_mmap_counter = itertools.count()
default_family = 'AF_INET'
families = ['AF_INET']
if hasattr(socket, 'AF_UNIX'):
default_family = 'AF_UNIX'
families += ['AF_UNIX']
if sys.platform == 'win32':
default_family = 'AF_PIPE'
families += ['AF_PIPE']
def _init_timeout(timeout=CONNECTION_TIMEOUT):
return time.time() + timeout
def _check_timeout(t):
return time.time() > t
#
#
#
def arbitrary_address(family):
'''
Return an arbitrary free address for the given family
'''
if family == 'AF_INET':
return ('localhost', 0)
elif family == 'AF_UNIX':
return tempfile.mktemp(prefix='listener-', dir=get_temp_dir())
elif family == 'AF_PIPE':
return tempfile.mktemp(prefix=r'\\.\pipe\pyc-%d-%d-' %
(os.getpid(), _mmap_counter.next()), dir="")
else:
raise ValueError('unrecognized family')
def address_type(address):
'''
Return the type of the address
This can be 'AF_INET', 'AF_UNIX', or 'AF_PIPE'
'''
if type(address) == tuple:
return 'AF_INET'
elif type(address) is str and address.startswith('\\\\'):
return 'AF_PIPE'
elif type(address) is str:
return 'AF_UNIX'
else:
raise ValueError('address type of %r unrecognized' % address)
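# For example (hypothetical addresses):
#     address_type(('localhost', 8080))   -> 'AF_INET'
#     address_type(r'\\.\pipe\somepipe')  -> 'AF_PIPE'
#     address_type('/tmp/listener-abc')   -> 'AF_UNIX'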
#
# Public functions
#
class Listener(object):
'''
Returns a listener object.
This is a wrapper for a bound socket which is 'listening' for
connections, or for a Windows named pipe.
'''
def __init__(self, address=None, family=None, backlog=1, authkey=None):
family = family or (address and address_type(address)) \
or default_family
address = address or arbitrary_address(family)
if family == 'AF_PIPE':
self._listener = PipeListener(address, backlog)
else:
self._listener = SocketListener(address, family, backlog)
if authkey is not None and not isinstance(authkey, bytes):
raise TypeError, 'authkey should be a byte string'
self._authkey = authkey
def accept(self):
'''
Accept a connection on the bound socket or named pipe of `self`.
Returns a `Connection` object.
'''
c = self._listener.accept()
if self._authkey:
deliver_challenge(c, self._authkey)
answer_challenge(c, self._authkey)
return c
def close(self):
'''
Close the bound socket or named pipe of `self`.
'''
return self._listener.close()
address = property(lambda self: self._listener._address)
last_accepted = property(lambda self: self._listener._last_accepted)
def Client(address, family=None, authkey=None):
'''
Returns a connection to the address of a `Listener`
'''
family = family or address_type(address)
if family == 'AF_PIPE':
c = PipeClient(address)
else:
c = SocketClient(address)
if authkey is not None and not isinstance(authkey, bytes):
raise TypeError, 'authkey should be a byte string'
if authkey is not None:
answer_challenge(c, authkey)
deliver_challenge(c, authkey)
return c
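# A minimal same-process round trip; no authkey is passed, so no
# challenge handshake blocks the single thread (real deployments put
# the two ends in different processes):
def _example_round_trip():
    listener = Listener(('localhost', 0))   # port 0 picks a free port
    client = Client(listener.address)       # connects via the listen backlog
    server_end = listener.accept()
    client.send({'msg': 'hello'})           # any picklable object
    obj = server_end.recv()                 # -> {'msg': 'hello'}
    client.close(); server_end.close(); listener.close()
    return obj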
if sys.platform != 'win32':
def Pipe(duplex=True):
'''
Returns a pair of connection objects at either end of a pipe
'''
if duplex:
s1, s2 = socket.socketpair()
s1.setblocking(True)
s2.setblocking(True)
c1 = _multiprocessing.Connection(os.dup(s1.fileno()))
c2 = _multiprocessing.Connection(os.dup(s2.fileno()))
s1.close()
s2.close()
else:
fd1, fd2 = os.pipe()
c1 = _multiprocessing.Connection(fd1, writable=False)
c2 = _multiprocessing.Connection(fd2, readable=False)
return c1, c2
else:
from _multiprocessing import win32
def Pipe(duplex=True):
'''
Returns a pair of connection objects at either end of a pipe
'''
address = arbitrary_address('AF_PIPE')
if duplex:
openmode = win32.PIPE_ACCESS_DUPLEX
access = win32.GENERIC_READ | win32.GENERIC_WRITE
obsize, ibsize = BUFSIZE, BUFSIZE
else:
openmode = win32.PIPE_ACCESS_INBOUND
access = win32.GENERIC_WRITE
obsize, ibsize = 0, BUFSIZE
h1 = win32.CreateNamedPipe(
address, openmode,
win32.PIPE_TYPE_MESSAGE | win32.PIPE_READMODE_MESSAGE |
win32.PIPE_WAIT,
1, obsize, ibsize, win32.NMPWAIT_WAIT_FOREVER, win32.NULL
)
h2 = win32.CreateFile(
address, access, 0, win32.NULL, win32.OPEN_EXISTING, 0, win32.NULL
)
win32.SetNamedPipeHandleState(
h2, win32.PIPE_READMODE_MESSAGE, None, None
)
try:
win32.ConnectNamedPipe(h1, win32.NULL)
except WindowsError, e:
if e.args[0] != win32.ERROR_PIPE_CONNECTED:
raise
c1 = _multiprocessing.PipeConnection(h1, writable=duplex)
c2 = _multiprocessing.PipeConnection(h2, readable=duplex)
return c1, c2
#
# Definitions for connections based on sockets
#
class SocketListener(object):
'''
Representation of a socket which is bound to an address and listening
'''
def __init__(self, address, family, backlog=1):
self._socket = socket.socket(getattr(socket, family))
try:
self._socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self._socket.setblocking(True)
self._socket.bind(address)
self._socket.listen(backlog)
self._address = self._socket.getsockname()
except socket.error:
self._socket.close()
raise
self._family = family
self._last_accepted = None
if family == 'AF_UNIX':
self._unlink = Finalize(
self, os.unlink, args=(address,), exitpriority=0
)
else:
self._unlink = None
def accept(self):
while True:
try:
s, self._last_accepted = self._socket.accept()
except socket.error as e:
if e.args[0] != errno.EINTR:
raise
else:
break
s.setblocking(True)
fd = duplicate(s.fileno())
conn = _multiprocessing.Connection(fd)
s.close()
return conn
def close(self):
try:
self._socket.close()
finally:
unlink = self._unlink
if unlink is not None:
self._unlink = None
unlink()
def SocketClient(address):
'''
Return a connection object connected to the socket given by `address`
'''
family = getattr(socket, address_type(address))
t = _init_timeout()
while 1:
s = socket.socket(family)
s.setblocking(True)
try:
s.connect(address)
except socket.error, e:
s.close()
if e.args[0] != errno.ECONNREFUSED or _check_timeout(t):
debug('failed to connect to address %s', address)
raise
time.sleep(0.01)
else:
break
else:
raise
fd = duplicate(s.fileno())
conn = _multiprocessing.Connection(fd)
s.close()
return conn
#
# Definitions for connections based on named pipes
#
if sys.platform == 'win32':
class PipeListener(object):
'''
Representation of a named pipe
'''
def __init__(self, address, backlog=None):
self._address = address
handle = win32.CreateNamedPipe(
address, win32.PIPE_ACCESS_DUPLEX,
win32.PIPE_TYPE_MESSAGE | win32.PIPE_READMODE_MESSAGE |
win32.PIPE_WAIT,
win32.PIPE_UNLIMITED_INSTANCES, BUFSIZE, BUFSIZE,
win32.NMPWAIT_WAIT_FOREVER, win32.NULL
)
self._handle_queue = [handle]
self._last_accepted = None
sub_debug('listener created with address=%r', self._address)
self.close = Finalize(
self, PipeListener._finalize_pipe_listener,
args=(self._handle_queue, self._address), exitpriority=0
)
def accept(self):
newhandle = win32.CreateNamedPipe(
self._address, win32.PIPE_ACCESS_DUPLEX,
win32.PIPE_TYPE_MESSAGE | win32.PIPE_READMODE_MESSAGE |
win32.PIPE_WAIT,
win32.PIPE_UNLIMITED_INSTANCES, BUFSIZE, BUFSIZE,
win32.NMPWAIT_WAIT_FOREVER, win32.NULL
)
self._handle_queue.append(newhandle)
handle = self._handle_queue.pop(0)
try:
win32.ConnectNamedPipe(handle, win32.NULL)
except WindowsError, e:
# ERROR_NO_DATA can occur if a client has already connected,
# written data and then disconnected -- see Issue 14725.
if e.args[0] not in (win32.ERROR_PIPE_CONNECTED,
win32.ERROR_NO_DATA):
raise
return _multiprocessing.PipeConnection(handle)
@staticmethod
def _finalize_pipe_listener(queue, address):
sub_debug('closing listener with address=%r', address)
for handle in queue:
close(handle)
def PipeClient(address):
'''
Return a connection object connected to the pipe given by `address`
'''
t = _init_timeout()
while 1:
try:
win32.WaitNamedPipe(address, 1000)
h = win32.CreateFile(
address, win32.GENERIC_READ | win32.GENERIC_WRITE,
0, win32.NULL, win32.OPEN_EXISTING, 0, win32.NULL
)
except WindowsError, e:
if e.args[0] not in (win32.ERROR_SEM_TIMEOUT,
win32.ERROR_PIPE_BUSY) or _check_timeout(t):
raise
else:
break
else:
raise
win32.SetNamedPipeHandleState(
h, win32.PIPE_READMODE_MESSAGE, None, None
)
return _multiprocessing.PipeConnection(h)
#
# Authentication stuff
#
MESSAGE_LENGTH = 20
CHALLENGE = b'#CHALLENGE#'
WELCOME = b'#WELCOME#'
FAILURE = b'#FAILURE#'
def deliver_challenge(connection, authkey):
import hmac
assert isinstance(authkey, bytes)
message = os.urandom(MESSAGE_LENGTH)
connection.send_bytes(CHALLENGE + message)
digest = hmac.new(authkey, message).digest()
response = connection.recv_bytes(256) # reject large message
if response == digest:
connection.send_bytes(WELCOME)
else:
connection.send_bytes(FAILURE)
raise AuthenticationError('digest received was wrong')
def answer_challenge(connection, authkey):
import hmac
assert isinstance(authkey, bytes)
message = connection.recv_bytes(256) # reject large message
assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
message = message[len(CHALLENGE):]
digest = hmac.new(authkey, message).digest()
connection.send_bytes(digest)
response = connection.recv_bytes(256) # reject large message
if response != WELCOME:
raise AuthenticationError('digest sent was rejected')
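# A worked illustration of the handshake above: the listener sends
# CHALLENGE plus MESSAGE_LENGTH random bytes, the client replies with an
# HMAC of those bytes keyed by authkey, and the listener compares. A
# self-contained sketch with a hypothetical key and message:
def _example_challenge_digest():
    import hmac
    authkey = b'not-a-real-key'
    message = b'0123456789abcdefghij'           # stands in for os.urandom(20)
    return hmac.new(authkey, message).digest()  # hmac defaults to MD5 here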
#
# Support for using xmlrpclib for serialization
#
class ConnectionWrapper(object):
def __init__(self, conn, dumps, loads):
self._conn = conn
self._dumps = dumps
self._loads = loads
for attr in ('fileno', 'close', 'poll', 'recv_bytes', 'send_bytes'):
obj = getattr(conn, attr)
setattr(self, attr, obj)
def send(self, obj):
s = self._dumps(obj)
self._conn.send_bytes(s)
def recv(self):
s = self._conn.recv_bytes()
return self._loads(s)
def _xml_dumps(obj):
return xmlrpclib.dumps((obj,), None, None, None, 1)
def _xml_loads(s):
(obj,), method = xmlrpclib.loads(s)
return obj
class XmlListener(Listener):
def accept(self):
global xmlrpclib
import xmlrpclib
obj = Listener.accept(self)
return ConnectionWrapper(obj, _xml_dumps, _xml_loads)
def XmlClient(*args, **kwds):
global xmlrpclib
import xmlrpclib
return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
#
# Analogue of `multiprocessing.connection` which uses queues instead of sockets
#
# multiprocessing/dummy/connection.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = [ 'Client', 'Listener', 'Pipe' ]
from Queue import Queue
families = [None]
class Listener(object):
def __init__(self, address=None, family=None, backlog=1):
self._backlog_queue = Queue(backlog)
def accept(self):
return Connection(*self._backlog_queue.get())
def close(self):
self._backlog_queue = None
address = property(lambda self: self._backlog_queue)
def Client(address):
_in, _out = Queue(), Queue()
address.put((_out, _in))
return Connection(_in, _out)
def Pipe(duplex=True):
a, b = Queue(), Queue()
return Connection(a, b), Connection(b, a)
class Connection(object):
def __init__(self, _in, _out):
self._out = _out
self._in = _in
self.send = self.send_bytes = _out.put
self.recv = self.recv_bytes = _in.get
def poll(self, timeout=0.0):
if self._in.qsize() > 0:
return True
if timeout <= 0.0:
return False
self._in.not_empty.acquire()
self._in.not_empty.wait(timeout)
self._in.not_empty.release()
return self._in.qsize() > 0
def close(self):
pass
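# Sketch: the queue-backed Pipe behaves like its socket counterpart for
# in-process use -- whatever one end sends, the other end receives:
def _example_dummy_pipe():
    a, b = Pipe()
    a.send('ping')
    return b.recv()   # -> 'ping'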
#
# Support for the API of the multiprocessing package using threads
#
# multiprocessing/dummy/__init__.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = [
'Process', 'current_process', 'active_children', 'freeze_support',
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Condition',
'Event', 'Queue', 'Manager', 'Pipe', 'Pool', 'JoinableQueue'
]
#
# Imports
#
import threading
import sys
import weakref
import array
import itertools
from multiprocessing import TimeoutError, cpu_count
from multiprocessing.dummy.connection import Pipe
from threading import Lock, RLock, Semaphore, BoundedSemaphore
from threading import Event
from Queue import Queue
#
#
#
class DummyProcess(threading.Thread):
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
threading.Thread.__init__(self, group, target, name, args, kwargs)
self._pid = None
self._children = weakref.WeakKeyDictionary()
self._start_called = False
self._parent = current_process()
def start(self):
assert self._parent is current_process()
self._start_called = True
if hasattr(self._parent, '_children'):
self._parent._children[self] = None
threading.Thread.start(self)
@property
def exitcode(self):
if self._start_called and not self.is_alive():
return 0
else:
return None
#
#
#
class Condition(threading._Condition):
notify_all = threading._Condition.notify_all.im_func
#
#
#
Process = DummyProcess
current_process = threading.current_thread
current_process()._children = weakref.WeakKeyDictionary()
def active_children():
children = current_process()._children
for p in list(children):
if not p.is_alive():
children.pop(p, None)
return list(children)
def freeze_support():
pass
#
#
#
class Namespace(object):
def __init__(self, **kwds):
self.__dict__.update(kwds)
def __repr__(self):
items = self.__dict__.items()
temp = []
for name, value in items:
if not name.startswith('_'):
temp.append('%s=%r' % (name, value))
temp.sort()
return 'Namespace(%s)' % str.join(', ', temp)
dict = dict
list = list
def Array(typecode, sequence, lock=True):
return array.array(typecode, sequence)
class Value(object):
def __init__(self, typecode, value, lock=True):
self._typecode = typecode
self._value = value
def _get(self):
return self._value
def _set(self, value):
self._value = value
value = property(_get, _set)
def __repr__(self):
return '<%s(%r, %r)>'%(type(self).__name__,self._typecode,self._value)
def Manager():
return sys.modules[__name__]
def shutdown():
pass
def Pool(processes=None, initializer=None, initargs=()):
from multiprocessing.pool import ThreadPool
return ThreadPool(processes, initializer, initargs)
JoinableQueue = Queue
#
# Module for starting a process object using os.fork() or CreateProcess()
#
# multiprocessing/forking.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
import os
import sys
import signal
import errno
from multiprocessing import util, process
__all__ = ['Popen', 'assert_spawning', 'exit', 'duplicate', 'close', 'ForkingPickler']
#
# Check that the current thread is spawning a child process
#
def assert_spawning(self):
if not Popen.thread_is_spawning():
raise RuntimeError(
'%s objects should only be shared between processes'
' through inheritance' % type(self).__name__
)
#
# Try making some callable types picklable
#
from pickle import Pickler
class ForkingPickler(Pickler):
dispatch = Pickler.dispatch.copy()
@classmethod
def register(cls, type, reduce):
def dispatcher(self, obj):
rv = reduce(obj)
self.save_reduce(obj=obj, *rv)
cls.dispatch[type] = dispatcher
def _reduce_method(m):
if m.im_self is None:
return getattr, (m.im_class, m.im_func.func_name)
else:
return getattr, (m.im_self, m.im_func.func_name)
ForkingPickler.register(type(ForkingPickler.save), _reduce_method)
def _reduce_method_descriptor(m):
return getattr, (m.__objclass__, m.__name__)
ForkingPickler.register(type(list.append), _reduce_method_descriptor)
ForkingPickler.register(type(int.__add__), _reduce_method_descriptor)
#def _reduce_builtin_function_or_method(m):
# return getattr, (m.__self__, m.__name__)
#ForkingPickler.register(type(list().append), _reduce_builtin_function_or_method)
#ForkingPickler.register(type(int().__add__), _reduce_builtin_function_or_method)
try:
from functools import partial
except ImportError:
pass
else:
def _reduce_partial(p):
return _rebuild_partial, (p.func, p.args, p.keywords or {})
def _rebuild_partial(func, args, keywords):
return partial(func, *args, **keywords)
ForkingPickler.register(partial, _reduce_partial)
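# A sketch of registering a reducer for a hypothetical user type,
# following the same pattern used above for bound methods and partial
# (in real use the type would live in a module importable by the child):
def _example_register_reducer():
    class Point(object):
        def __init__(self, x, y):
            self.x, self.y = x, y
    def _reduce_point(p):
        # rebuild the object on the other side by calling Point(x, y)
        return Point, (p.x, p.y)
    ForkingPickler.register(Point, _reduce_point)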
#
# Unix
#
if sys.platform != 'win32':
import time
exit = os._exit
duplicate = os.dup
close = os.close
#
# We define a Popen class similar to the one from subprocess, but
# whose constructor takes a process object as its argument.
#
class Popen(object):
def __init__(self, process_obj):
sys.stdout.flush()
sys.stderr.flush()
self.returncode = None
self.pid = os.fork()
if self.pid == 0:
if 'random' in sys.modules:
import random
random.seed()
code = process_obj._bootstrap()
sys.stdout.flush()
sys.stderr.flush()
os._exit(code)
def poll(self, flag=os.WNOHANG):
if self.returncode is None:
while True:
try:
pid, sts = os.waitpid(self.pid, flag)
except os.error as e:
if e.errno == errno.EINTR:
continue
# Child process not yet created. See #1731717
# e.errno == errno.ECHILD == 10
return None
else:
break
if pid == self.pid:
if os.WIFSIGNALED(sts):
self.returncode = -os.WTERMSIG(sts)
else:
assert os.WIFEXITED(sts)
self.returncode = os.WEXITSTATUS(sts)
return self.returncode
def wait(self, timeout=None):
if timeout is None:
return self.poll(0)
deadline = time.time() + timeout
delay = 0.0005
while 1:
res = self.poll()
if res is not None:
break
remaining = deadline - time.time()
if remaining <= 0:
break
delay = min(delay * 2, remaining, 0.05)
time.sleep(delay)
return res
def terminate(self):
if self.returncode is None:
try:
os.kill(self.pid, signal.SIGTERM)
except OSError, e:
if self.wait(timeout=0.1) is None:
raise
@staticmethod
def thread_is_spawning():
return False
#
# Windows
#
else:
import thread
import msvcrt
import _subprocess
import time
from _multiprocessing import win32, Connection, PipeConnection
from .util import Finalize
#try:
# from cPickle import dump, load, HIGHEST_PROTOCOL
#except ImportError:
from pickle import load, HIGHEST_PROTOCOL
def dump(obj, file, protocol=None):
ForkingPickler(file, protocol).dump(obj)
#
#
#
TERMINATE = 0x10000
WINEXE = (sys.platform == 'win32' and getattr(sys, 'frozen', False))
WINSERVICE = sys.executable.lower().endswith("pythonservice.exe")
exit = win32.ExitProcess
close = win32.CloseHandle
#
# _python_exe is the assumed path to the python executable.
# People embedding Python want to modify it.
#
if WINSERVICE:
_python_exe = os.path.join(sys.exec_prefix, 'python.exe')
else:
_python_exe = sys.executable
def set_executable(exe):
global _python_exe
_python_exe = exe
#
#
#
def duplicate(handle, target_process=None, inheritable=False):
if target_process is None:
target_process = _subprocess.GetCurrentProcess()
return _subprocess.DuplicateHandle(
_subprocess.GetCurrentProcess(), handle, target_process,
0, inheritable, _subprocess.DUPLICATE_SAME_ACCESS
).Detach()
#
# We define a Popen class similar to the one from subprocess, but
# whose constructor takes a process object as its argument.
#
class Popen(object):
'''
Start a subprocess to run the code of a process object
'''
_tls = thread._local()
def __init__(self, process_obj):
# create pipe for communication with child
rfd, wfd = os.pipe()
# get handle for read end of the pipe and make it inheritable
rhandle = duplicate(msvcrt.get_osfhandle(rfd), inheritable=True)
os.close(rfd)
# start process
cmd = get_command_line() + [rhandle]
cmd = ' '.join('"%s"' % x for x in cmd)
hp, ht, pid, tid = _subprocess.CreateProcess(
_python_exe, cmd, None, None, 1, 0, None, None, None
)
ht.Close()
close(rhandle)
# set attributes of self
self.pid = pid
self.returncode = None
self._handle = hp
# send information to child
prep_data = get_preparation_data(process_obj._name)
to_child = os.fdopen(wfd, 'wb')
Popen._tls.process_handle = int(hp)
try:
dump(prep_data, to_child, HIGHEST_PROTOCOL)
dump(process_obj, to_child, HIGHEST_PROTOCOL)
finally:
del Popen._tls.process_handle
to_child.close()
@staticmethod
def thread_is_spawning():
return getattr(Popen._tls, 'process_handle', None) is not None
@staticmethod
def duplicate_for_child(handle):
return duplicate(handle, Popen._tls.process_handle)
def wait(self, timeout=None):
if self.returncode is None:
if timeout is None:
msecs = _subprocess.INFINITE
else:
msecs = max(0, int(timeout * 1000 + 0.5))
res = _subprocess.WaitForSingleObject(int(self._handle), msecs)
if res == _subprocess.WAIT_OBJECT_0:
code = _subprocess.GetExitCodeProcess(self._handle)
if code == TERMINATE:
code = -signal.SIGTERM
self.returncode = code
return self.returncode
def poll(self):
return self.wait(timeout=0)
def terminate(self):
if self.returncode is None:
try:
_subprocess.TerminateProcess(int(self._handle), TERMINATE)
except WindowsError:
if self.wait(timeout=0.1) is None:
raise
#
#
#
def is_forking(argv):
'''
Return whether the command line indicates we are forking
'''
if len(argv) >= 2 and argv[1] == '--multiprocessing-fork':
assert len(argv) == 3
return True
else:
return False
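# e.g. is_forking(['prog.exe', '--multiprocessing-fork', '1234']) -> True
# while is_forking(['prog.exe']) -> False; the third element is the
# inherited pipe handle consumed by main() below.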
def freeze_support():
'''
Run code for process object if this is not the main process
'''
if is_forking(sys.argv):
main()
sys.exit()
def get_command_line():
'''
Returns prefix of command line used for spawning a child process
'''
if getattr(process.current_process(), '_inheriting', False):
raise RuntimeError('''
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.''')
if getattr(sys, 'frozen', False):
return [sys.executable, '--multiprocessing-fork']
else:
prog = 'from multiprocessing.forking import main; main()'
opts = util._args_from_interpreter_flags()
return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
def main():
'''
Run code specified by data received over pipe
'''
assert is_forking(sys.argv)
handle = int(sys.argv[-1])
fd = msvcrt.open_osfhandle(handle, os.O_RDONLY)
from_parent = os.fdopen(fd, 'rb')
process.current_process()._inheriting = True
preparation_data = load(from_parent)
prepare(preparation_data)
self = load(from_parent)
process.current_process()._inheriting = False
from_parent.close()
exitcode = self._bootstrap()
exit(exitcode)
def get_preparation_data(name):
'''
Return info about parent needed by child to unpickle process object
'''
from .util import _logger, _log_to_stderr
d = dict(
name=name,
sys_path=sys.path,
sys_argv=sys.argv,
log_to_stderr=_log_to_stderr,
orig_dir=process.ORIGINAL_DIR,
authkey=process.current_process().authkey,
)
if _logger is not None:
d['log_level'] = _logger.getEffectiveLevel()
if not WINEXE and not WINSERVICE and \
not d['sys_argv'][0].lower().endswith('pythonservice.exe'):
main_path = getattr(sys.modules['__main__'], '__file__', None)
if not main_path and sys.argv[0] not in ('', '-c'):
main_path = sys.argv[0]
if main_path is not None:
if not os.path.isabs(main_path) and \
process.ORIGINAL_DIR is not None:
main_path = os.path.join(process.ORIGINAL_DIR, main_path)
d['main_path'] = os.path.normpath(main_path)
return d
#
# Make (Pipe)Connection picklable
#
def reduce_connection(conn):
if not Popen.thread_is_spawning():
raise RuntimeError(
'By default %s objects can only be shared between processes\n'
'using inheritance' % type(conn).__name__
)
return type(conn), (Popen.duplicate_for_child(conn.fileno()),
conn.readable, conn.writable)
ForkingPickler.register(Connection, reduce_connection)
ForkingPickler.register(PipeConnection, reduce_connection)
#
# Prepare current process
#
old_main_modules = []
def prepare(data):
'''
Try to get current process ready to unpickle process object
'''
old_main_modules.append(sys.modules['__main__'])
if 'name' in data:
process.current_process().name = data['name']
if 'authkey' in data:
process.current_process()._authkey = data['authkey']
if 'log_to_stderr' in data and data['log_to_stderr']:
util.log_to_stderr()
if 'log_level' in data:
util.get_logger().setLevel(data['log_level'])
if 'sys_path' in data:
sys.path = data['sys_path']
if 'sys_argv' in data:
sys.argv = data['sys_argv']
if 'dir' in data:
os.chdir(data['dir'])
if 'orig_dir' in data:
process.ORIGINAL_DIR = data['orig_dir']
if 'main_path' in data:
# XXX (ncoghlan): The following code makes several bogus
# assumptions regarding the relationship between __file__
# and a module's real name. See PEP 302 and issue #10845
# The problem is resolved properly in Python 3.4+, as
# described in issue #19946
main_path = data['main_path']
main_name = os.path.splitext(os.path.basename(main_path))[0]
if main_name == '__init__':
main_name = os.path.basename(os.path.dirname(main_path))
if main_name == '__main__':
# For directory and zipfile execution, we assume an implicit
# "if __name__ == '__main__':" around the module, and don't
# rerun the main module code in spawned processes
main_module = sys.modules['__main__']
main_module.__file__ = main_path
elif main_name != 'ipython':
# Main modules not actually called __main__.py may
# contain additional code that should still be executed
import imp
if main_path is None:
dirs = None
elif os.path.basename(main_path).startswith('__init__.py'):
dirs = [os.path.dirname(os.path.dirname(main_path))]
else:
dirs = [os.path.dirname(main_path)]
assert main_name not in sys.modules, main_name
file, path_name, etc = imp.find_module(main_name, dirs)
try:
# We would like to do "imp.load_module('__main__', ...)"
# here. However, that would cause 'if __name__ ==
# "__main__"' clauses to be executed.
main_module = imp.load_module(
'__parents_main__', file, path_name, etc
)
finally:
if file:
file.close()
sys.modules['__main__'] = main_module
main_module.__name__ = '__main__'
# Try to make the potentially picklable objects in
# sys.modules['__main__'] realize they are in the main
# module -- somewhat ugly.
for obj in main_module.__dict__.values():
try:
if obj.__module__ == '__parents_main__':
obj.__module__ = '__main__'
except Exception:
pass
#
# Module which supports allocation of memory from an mmap
#
# multiprocessing/heap.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
import bisect
import mmap
import tempfile
import os
import sys
import threading
import itertools
import _multiprocessing
from multiprocessing.util import Finalize, info
from multiprocessing.forking import assert_spawning
__all__ = ['BufferWrapper']
#
# Inheritable class which wraps an mmap, and from which blocks can be allocated
#
if sys.platform == 'win32':
from _multiprocessing import win32
class Arena(object):
_counter = itertools.count()
def __init__(self, size):
self.size = size
self.name = 'pym-%d-%d' % (os.getpid(), Arena._counter.next())
self.buffer = mmap.mmap(-1, self.size, tagname=self.name)
assert win32.GetLastError() == 0, 'tagname already in use'
self._state = (self.size, self.name)
def __getstate__(self):
assert_spawning(self)
return self._state
def __setstate__(self, state):
self.size, self.name = self._state = state
self.buffer = mmap.mmap(-1, self.size, tagname=self.name)
assert win32.GetLastError() == win32.ERROR_ALREADY_EXISTS
else:
class Arena(object):
def __init__(self, size):
self.buffer = mmap.mmap(-1, size)
self.size = size
self.name = None
#
# Class allowing allocation of chunks of memory from arenas
#
class Heap(object):
_alignment = 8
def __init__(self, size=mmap.PAGESIZE):
self._lastpid = os.getpid()
self._lock = threading.Lock()
self._size = size
self._lengths = []
self._len_to_seq = {}
self._start_to_block = {}
self._stop_to_block = {}
self._allocated_blocks = set()
self._arenas = []
# list of pending blocks to free - see free() comment below
self._pending_free_blocks = []
@staticmethod
def _roundup(n, alignment):
# alignment must be a power of 2
mask = alignment - 1
return (n + mask) & ~mask
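# Worked example: _roundup(13, 8) -> (13 + 7) & ~7 == 16, while values
# already aligned are unchanged: _roundup(16, 8) == 16.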
def _malloc(self, size):
# returns a large enough block -- it might be much larger
i = bisect.bisect_left(self._lengths, size)
if i == len(self._lengths):
length = self._roundup(max(self._size, size), mmap.PAGESIZE)
self._size *= 2
info('allocating a new mmap of length %d', length)
arena = Arena(length)
self._arenas.append(arena)
return (arena, 0, length)
else:
length = self._lengths[i]
seq = self._len_to_seq[length]
block = seq.pop()
if not seq:
del self._len_to_seq[length], self._lengths[i]
(arena, start, stop) = block
del self._start_to_block[(arena, start)]
del self._stop_to_block[(arena, stop)]
return block
def _free(self, block):
# free location and try to merge with neighbours
(arena, start, stop) = block
try:
prev_block = self._stop_to_block[(arena, start)]
except KeyError:
pass
else:
start, _ = self._absorb(prev_block)
try:
next_block = self._start_to_block[(arena, stop)]
except KeyError:
pass
else:
_, stop = self._absorb(next_block)
block = (arena, start, stop)
length = stop - start
try:
self._len_to_seq[length].append(block)
except KeyError:
self._len_to_seq[length] = [block]
bisect.insort(self._lengths, length)
self._start_to_block[(arena, start)] = block
self._stop_to_block[(arena, stop)] = block
def _absorb(self, block):
# deregister this block so it can be merged with a neighbour
(arena, start, stop) = block
del self._start_to_block[(arena, start)]
del self._stop_to_block[(arena, stop)]
length = stop - start
seq = self._len_to_seq[length]
seq.remove(block)
if not seq:
del self._len_to_seq[length]
self._lengths.remove(length)
return start, stop
def _free_pending_blocks(self):
# Free all the blocks in the pending list - called with the lock held.
while True:
try:
block = self._pending_free_blocks.pop()
except IndexError:
break
self._allocated_blocks.remove(block)
self._free(block)
def free(self, block):
# free a block returned by malloc()
# Since free() can be called asynchronously by the GC, it could happen
# that it's called while self._lock is held: in that case,
# self._lock.acquire() would deadlock (issue #12352). To avoid that, a
# trylock is used instead, and if the lock can't be acquired
# immediately, the block is added to a list of blocks to be freed
# synchronously sometime later from malloc() or free(), by calling
# _free_pending_blocks() (appending to and popping from a list is not
# strictly thread-safe, but under CPython it is atomic thanks to the GIL).
assert os.getpid() == self._lastpid
if not self._lock.acquire(False):
# can't acquire the lock right now, add the block to the list of
# pending blocks to free
self._pending_free_blocks.append(block)
else:
# we hold the lock
try:
self._free_pending_blocks()
self._allocated_blocks.remove(block)
self._free(block)
finally:
self._lock.release()
def malloc(self, size):
# return a block of the right size (possibly rounded up)
assert 0 <= size < sys.maxint
if os.getpid() != self._lastpid:
self.__init__() # reinitialize after fork
self._lock.acquire()
self._free_pending_blocks()
try:
size = self._roundup(max(size,1), self._alignment)
(arena, start, stop) = self._malloc(size)
new_stop = start + size
if new_stop < stop:
self._free((arena, new_stop, stop))
block = (arena, start, new_stop)
self._allocated_blocks.add(block)
return block
finally:
self._lock.release()
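#
# Editor's sketch (illustration only, not part of the original module):
# malloc() above rounds every request to the heap's alignment with the
# bit-mask trick from _roundup().  Assuming an alignment of 8 bytes:
#
def _roundup_example():
    alignment = 8                        # must be a power of 2
    mask = alignment - 1
    roundup = lambda n: (n + mask) & ~mask
    assert roundup(1) == 8               # the smallest request uses one unit
    assert roundup(13) == 16             # rounds up to the next multiple
    assert roundup(16) == 16             # exact multiples are unchanged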
#
# Class representing a chunk of an mmap -- can be inherited
#
class BufferWrapper(object):
_heap = Heap()
def __init__(self, size):
assert 0 <= size < sys.maxint
block = BufferWrapper._heap.malloc(size)
self._state = (block, size)
Finalize(self, BufferWrapper._heap.free, args=(block,))
def get_address(self):
(arena, start, stop), size = self._state
address, length = _multiprocessing.address_of_buffer(arena.buffer)
assert size <= length
return address + start
def get_size(self):
return self._state[1]
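#
# Editor's sketch (illustration only): a BufferWrapper reserves a block on
# the shared heap; get_address() resolves to an absolute address inside the
# backing mmap, which multiprocessing.sharedctypes builds ctypes objects on.
#
def _bufferwrapper_example():
    wrapper = BufferWrapper(64)          # reserve 64 bytes of shared memory
    assert wrapper.get_size() == 64
    return wrapper.get_address()         # address within the mmap'd arena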
#
# Module providing the `SyncManager` class for dealing
# with shared objects
#
# multiprocessing/managers.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token' ]
#
# Imports
#
import os
import sys
import weakref
import threading
import array
import Queue
from traceback import format_exc
from multiprocessing import Process, current_process, active_children, Pool, util, connection
from multiprocessing.process import AuthenticationString
from multiprocessing.forking import exit, Popen, assert_spawning, ForkingPickler
from multiprocessing.util import Finalize, info
try:
from cPickle import PicklingError
except ImportError:
from pickle import PicklingError
#
# Register some things for pickling
#
def reduce_array(a):
return array.array, (a.typecode, a.tostring())
ForkingPickler.register(array.array, reduce_array)
view_types = [type(getattr({}, name)()) for name in ('items','keys','values')]
#
# Type for identifying shared objects
#
class Token(object):
'''
    Type to uniquely identify a shared object
'''
__slots__ = ('typeid', 'address', 'id')
def __init__(self, typeid, address, id):
(self.typeid, self.address, self.id) = (typeid, address, id)
def __getstate__(self):
return (self.typeid, self.address, self.id)
def __setstate__(self, state):
(self.typeid, self.address, self.id) = state
def __repr__(self):
return 'Token(typeid=%r, address=%r, id=%r)' % \
(self.typeid, self.address, self.id)
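#
# Editor's sketch (illustration only): a Token is plain picklable state
# naming one shared object on one server; the typeid, address and id below
# are made-up values.
#
def _token_example():
    t = Token('dict', ('127.0.0.1', 5000), '7f2a')
    t2 = Token(None, None, None)
    t2.__setstate__(t.__getstate__())    # the pickling round-trip
    assert repr(t) == repr(t2)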
#
# Function for communication with a manager's server process
#
def dispatch(c, id, methodname, args=(), kwds={}):
'''
    Send a message to the manager using connection `c` and return the response
'''
c.send((id, methodname, args, kwds))
kind, result = c.recv()
if kind == '#RETURN':
return result
raise convert_to_error(kind, result)
def convert_to_error(kind, result):
if kind == '#ERROR':
return result
elif kind == '#TRACEBACK':
assert type(result) is str
return RemoteError(result)
elif kind == '#UNSERIALIZABLE':
assert type(result) is str
return RemoteError('Unserializable message: %s\n' % result)
else:
return ValueError('Unrecognized message type')
class RemoteError(Exception):
def __str__(self):
return ('\n' + '-'*75 + '\n' + str(self.args[0]) + '-'*75)
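#
# Editor's sketch (illustration only): the server answers every request
# with a (kind, result) pair; convert_to_error() turns the failure kinds
# back into exception objects for the caller of dispatch().
#
def _convert_to_error_example():
    err = convert_to_error('#ERROR', ValueError('bad value'))
    assert isinstance(err, ValueError)   # server-side exceptions pass through
    tb = convert_to_error('#TRACEBACK', 'Traceback (most recent call last)...')
    assert isinstance(tb, RemoteError)   # tracebacks arrive as RemoteError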
#
# Functions for finding the method names of an object
#
def all_methods(obj):
'''
Return a list of names of methods of `obj`
'''
temp = []
for name in dir(obj):
func = getattr(obj, name)
if hasattr(func, '__call__'):
temp.append(name)
return temp
def public_methods(obj):
'''
Return a list of names of methods of `obj` which do not start with '_'
'''
return [name for name in all_methods(obj) if name[0] != '_']
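#
# Editor's sketch (illustration only): public_methods() is how the server
# computes the default `exposed` list when a typeid is registered without
# an explicit one.
#
def _public_methods_example():
    names = public_methods([])           # methods of a plain list
    assert 'append' in names and 'sort' in names
    assert '__len__' not in names        # underscore names are filtered out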
#
# Server which is run in a process controlled by a manager
#
class Server(object):
'''
Server class which runs in a process controlled by a manager object
'''
public = ['shutdown', 'create', 'accept_connection', 'get_methods',
'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref']
def __init__(self, registry, address, authkey, serializer):
assert isinstance(authkey, bytes)
self.registry = registry
self.authkey = AuthenticationString(authkey)
Listener, Client = listener_client[serializer]
# do authentication later
self.listener = Listener(address=address, backlog=16)
self.address = self.listener.address
self.id_to_obj = {'0': (None, ())}
self.id_to_refcount = {}
self.mutex = threading.RLock()
self.stop = 0
def serve_forever(self):
'''
Run the server forever
'''
current_process()._manager_server = self
try:
try:
while 1:
try:
c = self.listener.accept()
except (OSError, IOError):
continue
t = threading.Thread(target=self.handle_request, args=(c,))
t.daemon = True
t.start()
except (KeyboardInterrupt, SystemExit):
pass
finally:
self.stop = 999
self.listener.close()
def handle_request(self, c):
'''
Handle a new connection
'''
funcname = result = request = None
try:
connection.deliver_challenge(c, self.authkey)
connection.answer_challenge(c, self.authkey)
request = c.recv()
ignore, funcname, args, kwds = request
assert funcname in self.public, '%r unrecognized' % funcname
func = getattr(self, funcname)
except Exception:
msg = ('#TRACEBACK', format_exc())
else:
try:
result = func(c, *args, **kwds)
except Exception:
msg = ('#TRACEBACK', format_exc())
else:
msg = ('#RETURN', result)
try:
c.send(msg)
except Exception, e:
try:
c.send(('#TRACEBACK', format_exc()))
except Exception:
pass
util.info('Failure to send message: %r', msg)
util.info(' ... request was %r', request)
util.info(' ... exception was %r', e)
c.close()
def serve_client(self, conn):
'''
Handle requests from the proxies in a particular process/thread
'''
util.debug('starting server thread to service %r',
threading.current_thread().name)
recv = conn.recv
send = conn.send
id_to_obj = self.id_to_obj
while not self.stop:
try:
methodname = obj = None
request = recv()
ident, methodname, args, kwds = request
obj, exposed, gettypeid = id_to_obj[ident]
if methodname not in exposed:
raise AttributeError(
'method %r of %r object is not in exposed=%r' %
(methodname, type(obj), exposed)
)
function = getattr(obj, methodname)
try:
res = function(*args, **kwds)
except Exception, e:
msg = ('#ERROR', e)
else:
typeid = gettypeid and gettypeid.get(methodname, None)
if typeid:
rident, rexposed = self.create(conn, typeid, res)
token = Token(typeid, self.address, rident)
msg = ('#PROXY', (rexposed, token))
else:
msg = ('#RETURN', res)
except AttributeError:
if methodname is None:
msg = ('#TRACEBACK', format_exc())
else:
try:
fallback_func = self.fallback_mapping[methodname]
result = fallback_func(
self, conn, ident, obj, *args, **kwds
)
msg = ('#RETURN', result)
except Exception:
msg = ('#TRACEBACK', format_exc())
except EOFError:
util.debug('got EOF -- exiting thread serving %r',
threading.current_thread().name)
sys.exit(0)
except Exception:
msg = ('#TRACEBACK', format_exc())
try:
try:
send(msg)
except Exception, e:
send(('#UNSERIALIZABLE', format_exc()))
except Exception, e:
util.info('exception in thread serving %r',
threading.current_thread().name)
util.info(' ... message was %r', msg)
util.info(' ... exception was %r', e)
conn.close()
sys.exit(1)
def fallback_getvalue(self, conn, ident, obj):
return obj
def fallback_str(self, conn, ident, obj):
return str(obj)
def fallback_repr(self, conn, ident, obj):
return repr(obj)
fallback_mapping = {
'__str__':fallback_str,
'__repr__':fallback_repr,
'#GETVALUE':fallback_getvalue
}
def dummy(self, c):
pass
def debug_info(self, c):
'''
Return some info --- useful to spot problems with refcounting
'''
self.mutex.acquire()
try:
result = []
keys = self.id_to_obj.keys()
keys.sort()
for ident in keys:
if ident != '0':
result.append(' %s: refcount=%s\n %s' %
(ident, self.id_to_refcount[ident],
str(self.id_to_obj[ident][0])[:75]))
return '\n'.join(result)
finally:
self.mutex.release()
def number_of_objects(self, c):
'''
Number of shared objects
'''
return len(self.id_to_obj) - 1 # don't count ident='0'
def shutdown(self, c):
'''
        Shut down this process
'''
try:
try:
util.debug('manager received shutdown message')
c.send(('#RETURN', None))
if sys.stdout != sys.__stdout__:
util.debug('resetting stdout, stderr')
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
util._run_finalizers(0)
for p in active_children():
util.debug('terminating a child process of manager')
p.terminate()
for p in active_children():
                    util.debug('joining a child process of manager')
p.join()
util._run_finalizers()
util.info('manager exiting with exitcode 0')
except:
import traceback
traceback.print_exc()
finally:
exit(0)
def create(self, c, typeid, *args, **kwds):
'''
Create a new shared object and return its id
'''
self.mutex.acquire()
try:
callable, exposed, method_to_typeid, proxytype = \
self.registry[typeid]
if callable is None:
assert len(args) == 1 and not kwds
obj = args[0]
else:
obj = callable(*args, **kwds)
if exposed is None:
exposed = public_methods(obj)
if method_to_typeid is not None:
assert type(method_to_typeid) is dict
exposed = list(exposed) + list(method_to_typeid)
ident = '%x' % id(obj) # convert to string because xmlrpclib
# only has 32 bit signed integers
util.debug('%r callable returned object with id %r', typeid, ident)
self.id_to_obj[ident] = (obj, set(exposed), method_to_typeid)
if ident not in self.id_to_refcount:
self.id_to_refcount[ident] = 0
# increment the reference count immediately, to avoid
# this object being garbage collected before a Proxy
# object for it can be created. The caller of create()
# is responsible for doing a decref once the Proxy object
# has been created.
self.incref(c, ident)
return ident, tuple(exposed)
finally:
self.mutex.release()
def get_methods(self, c, token):
'''
Return the methods of the shared object indicated by token
'''
return tuple(self.id_to_obj[token.id][1])
def accept_connection(self, c, name):
'''
Spawn a new thread to serve this connection
'''
threading.current_thread().name = name
c.send(('#RETURN', None))
self.serve_client(c)
def incref(self, c, ident):
self.mutex.acquire()
try:
self.id_to_refcount[ident] += 1
finally:
self.mutex.release()
def decref(self, c, ident):
self.mutex.acquire()
try:
assert self.id_to_refcount[ident] >= 1
self.id_to_refcount[ident] -= 1
if self.id_to_refcount[ident] == 0:
del self.id_to_obj[ident], self.id_to_refcount[ident]
util.debug('disposing of obj with id %r', ident)
finally:
self.mutex.release()
#
# Class to represent state of a manager
#
class State(object):
__slots__ = ['value']
INITIAL = 0
STARTED = 1
SHUTDOWN = 2
#
# Mapping from serializer name to Listener and Client types
#
listener_client = {
'pickle' : (connection.Listener, connection.Client),
'xmlrpclib' : (connection.XmlListener, connection.XmlClient)
}
#
# Definition of BaseManager
#
class BaseManager(object):
'''
Base class for managers
'''
_registry = {}
_Server = Server
def __init__(self, address=None, authkey=None, serializer='pickle'):
if authkey is None:
authkey = current_process().authkey
self._address = address # XXX not final address if eg ('', 0)
self._authkey = AuthenticationString(authkey)
self._state = State()
self._state.value = State.INITIAL
self._serializer = serializer
self._Listener, self._Client = listener_client[serializer]
def __reduce__(self):
return type(self).from_address, \
(self._address, self._authkey, self._serializer)
def get_server(self):
'''
Return server object with serve_forever() method and address attribute
'''
assert self._state.value == State.INITIAL
return Server(self._registry, self._address,
self._authkey, self._serializer)
def connect(self):
'''
Connect manager object to the server process
'''
Listener, Client = listener_client[self._serializer]
conn = Client(self._address, authkey=self._authkey)
dispatch(conn, None, 'dummy')
self._state.value = State.STARTED
def start(self, initializer=None, initargs=()):
'''
Spawn a server process for this manager object
'''
assert self._state.value == State.INITIAL
if initializer is not None and not hasattr(initializer, '__call__'):
raise TypeError('initializer must be a callable')
# pipe over which we will retrieve address of server
reader, writer = connection.Pipe(duplex=False)
# spawn process which runs a server
self._process = Process(
target=type(self)._run_server,
args=(self._registry, self._address, self._authkey,
self._serializer, writer, initializer, initargs),
)
ident = ':'.join(str(i) for i in self._process._identity)
self._process.name = type(self).__name__ + '-' + ident
self._process.start()
# get address of server
writer.close()
self._address = reader.recv()
reader.close()
# register a finalizer
self._state.value = State.STARTED
self.shutdown = util.Finalize(
self, type(self)._finalize_manager,
args=(self._process, self._address, self._authkey,
self._state, self._Client),
exitpriority=0
)
@classmethod
def _run_server(cls, registry, address, authkey, serializer, writer,
initializer=None, initargs=()):
'''
Create a server, report its address and run it
'''
if initializer is not None:
initializer(*initargs)
# create server
server = cls._Server(registry, address, authkey, serializer)
# inform parent process of the server's address
writer.send(server.address)
writer.close()
# run the manager
util.info('manager serving at %r', server.address)
server.serve_forever()
def _create(self, typeid, *args, **kwds):
'''
Create a new shared object; return the token and exposed tuple
'''
assert self._state.value == State.STARTED, 'server not yet started'
conn = self._Client(self._address, authkey=self._authkey)
try:
id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
finally:
conn.close()
return Token(typeid, self._address, id), exposed
def join(self, timeout=None):
'''
Join the manager process (if it has been spawned)
'''
self._process.join(timeout)
def _debug_info(self):
'''
        Return some info about the server's shared objects and connections
'''
conn = self._Client(self._address, authkey=self._authkey)
try:
return dispatch(conn, None, 'debug_info')
finally:
conn.close()
def _number_of_objects(self):
'''
Return the number of shared objects
'''
conn = self._Client(self._address, authkey=self._authkey)
try:
return dispatch(conn, None, 'number_of_objects')
finally:
conn.close()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.shutdown()
@staticmethod
def _finalize_manager(process, address, authkey, state, _Client):
'''
        Shut down the manager process; will be registered as a finalizer
'''
if process.is_alive():
util.info('sending shutdown message to manager')
try:
conn = _Client(address, authkey=authkey)
try:
dispatch(conn, None, 'shutdown')
finally:
conn.close()
except Exception:
pass
process.join(timeout=0.2)
if process.is_alive():
util.info('manager still alive')
if hasattr(process, 'terminate'):
util.info('trying to `terminate()` manager process')
process.terminate()
process.join(timeout=0.1)
if process.is_alive():
util.info('manager still alive after terminate')
state.value = State.SHUTDOWN
try:
del BaseProxy._address_to_local[address]
except KeyError:
pass
address = property(lambda self: self._address)
@classmethod
def register(cls, typeid, callable=None, proxytype=None, exposed=None,
method_to_typeid=None, create_method=True):
'''
Register a typeid with the manager type
'''
if '_registry' not in cls.__dict__:
cls._registry = cls._registry.copy()
if proxytype is None:
proxytype = AutoProxy
exposed = exposed or getattr(proxytype, '_exposed_', None)
method_to_typeid = method_to_typeid or \
getattr(proxytype, '_method_to_typeid_', None)
if method_to_typeid:
for key, value in method_to_typeid.items():
assert type(key) is str, '%r is not a string' % key
assert type(value) is str, '%r is not a string' % value
cls._registry[typeid] = (
callable, exposed, method_to_typeid, proxytype
)
if create_method:
def temp(self, *args, **kwds):
util.debug('requesting creation of a shared %r object', typeid)
token, exp = self._create(typeid, *args, **kwds)
proxy = proxytype(
token, self._serializer, manager=self,
authkey=self._authkey, exposed=exp
)
conn = self._Client(token.address, authkey=self._authkey)
dispatch(conn, None, 'decref', (token.id,))
return proxy
temp.__name__ = typeid
setattr(cls, typeid, temp)
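#
# Editor's sketch (illustration only): registering a custom class against a
# BaseManager subclass; _Calculator and _CalcManager are hypothetical names
# used for this example only.
#
class _Calculator(object):
    def add(self, a, b):
        return a + b

class _CalcManager(BaseManager):
    pass

_CalcManager.register('Calculator', _Calculator)

def _custom_manager_example():
    manager = _CalcManager()
    manager.start()                      # spawn the server process
    calc = manager.Calculator()          # returns an AutoProxy
    result = calc.add(2, 3)              # runs in the server process
    manager.shutdown()
    return result                        # 5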
#
# Subclass of set which gets cleared after a fork
#
class ProcessLocalSet(set):
def __init__(self):
util.register_after_fork(self, lambda obj: obj.clear())
def __reduce__(self):
return type(self), ()
#
# Definition of BaseProxy
#
class BaseProxy(object):
'''
A base for proxies of shared objects
'''
_address_to_local = {}
_mutex = util.ForkAwareThreadLock()
def __init__(self, token, serializer, manager=None,
authkey=None, exposed=None, incref=True):
BaseProxy._mutex.acquire()
try:
tls_idset = BaseProxy._address_to_local.get(token.address, None)
if tls_idset is None:
tls_idset = util.ForkAwareLocal(), ProcessLocalSet()
BaseProxy._address_to_local[token.address] = tls_idset
finally:
BaseProxy._mutex.release()
# self._tls is used to record the connection used by this
# thread to communicate with the manager at token.address
self._tls = tls_idset[0]
# self._idset is used to record the identities of all shared
# objects for which the current process owns references and
# which are in the manager at token.address
self._idset = tls_idset[1]
self._token = token
self._id = self._token.id
self._manager = manager
self._serializer = serializer
self._Client = listener_client[serializer][1]
if authkey is not None:
self._authkey = AuthenticationString(authkey)
elif self._manager is not None:
self._authkey = self._manager._authkey
else:
self._authkey = current_process().authkey
if incref:
self._incref()
util.register_after_fork(self, BaseProxy._after_fork)
def _connect(self):
util.debug('making connection to manager')
name = current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
conn = self._Client(self._token.address, authkey=self._authkey)
dispatch(conn, None, 'accept_connection', (name,))
self._tls.connection = conn
def _callmethod(self, methodname, args=(), kwds={}):
'''
        Try to call a method of the referent and return a copy of the result
'''
try:
conn = self._tls.connection
except AttributeError:
util.debug('thread %r does not own a connection',
threading.current_thread().name)
self._connect()
conn = self._tls.connection
conn.send((self._id, methodname, args, kwds))
kind, result = conn.recv()
if kind == '#RETURN':
return result
elif kind == '#PROXY':
exposed, token = result
proxytype = self._manager._registry[token.typeid][-1]
token.address = self._token.address
proxy = proxytype(
token, self._serializer, manager=self._manager,
authkey=self._authkey, exposed=exposed
)
conn = self._Client(token.address, authkey=self._authkey)
dispatch(conn, None, 'decref', (token.id,))
return proxy
raise convert_to_error(kind, result)
def _getvalue(self):
'''
Get a copy of the value of the referent
'''
return self._callmethod('#GETVALUE')
def _incref(self):
conn = self._Client(self._token.address, authkey=self._authkey)
dispatch(conn, None, 'incref', (self._id,))
util.debug('INCREF %r', self._token.id)
self._idset.add(self._id)
state = self._manager and self._manager._state
self._close = util.Finalize(
self, BaseProxy._decref,
args=(self._token, self._authkey, state,
self._tls, self._idset, self._Client),
exitpriority=10
)
@staticmethod
def _decref(token, authkey, state, tls, idset, _Client):
idset.discard(token.id)
# check whether manager is still alive
if state is None or state.value == State.STARTED:
# tell manager this process no longer cares about referent
try:
util.debug('DECREF %r', token.id)
conn = _Client(token.address, authkey=authkey)
dispatch(conn, None, 'decref', (token.id,))
except Exception, e:
util.debug('... decref failed %s', e)
else:
util.debug('DECREF %r -- manager already shutdown', token.id)
# check whether we can close this thread's connection because
# the process owns no more references to objects for this manager
if not idset and hasattr(tls, 'connection'):
util.debug('thread %r has no more proxies so closing conn',
threading.current_thread().name)
tls.connection.close()
del tls.connection
def _after_fork(self):
self._manager = None
try:
self._incref()
except Exception, e:
# the proxy may just be for a manager which has shutdown
util.info('incref failed: %s' % e)
def __reduce__(self):
kwds = {}
if Popen.thread_is_spawning():
kwds['authkey'] = self._authkey
if getattr(self, '_isauto', False):
kwds['exposed'] = self._exposed_
return (RebuildProxy,
(AutoProxy, self._token, self._serializer, kwds))
else:
return (RebuildProxy,
(type(self), self._token, self._serializer, kwds))
def __deepcopy__(self, memo):
return self._getvalue()
def __repr__(self):
return '<%s object, typeid %r at %s>' % \
(type(self).__name__, self._token.typeid, '0x%x' % id(self))
def __str__(self):
'''
Return representation of the referent (or a fall-back if that fails)
'''
try:
return self._callmethod('__repr__')
except Exception:
return repr(self)[:-1] + "; '__str__()' failed>"
#
# Function used for unpickling
#
def RebuildProxy(func, token, serializer, kwds):
'''
Function used for unpickling proxy objects.
If possible the shared object is returned, or otherwise a proxy for it.
'''
server = getattr(current_process(), '_manager_server', None)
if server and server.address == token.address:
return server.id_to_obj[token.id][0]
else:
incref = (
kwds.pop('incref', True) and
not getattr(current_process(), '_inheriting', False)
)
return func(token, serializer, incref=incref, **kwds)
#
# Functions to create proxies and proxy types
#
def MakeProxyType(name, exposed, _cache={}):
'''
Return a proxy type whose methods are given by `exposed`
'''
exposed = tuple(exposed)
try:
return _cache[(name, exposed)]
except KeyError:
pass
dic = {}
for meth in exposed:
exec '''def %s(self, *args, **kwds):
return self._callmethod(%r, args, kwds)''' % (meth, meth) in dic
ProxyType = type(name, (BaseProxy,), dic)
ProxyType._exposed_ = exposed
_cache[(name, exposed)] = ProxyType
return ProxyType
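#
# Editor's sketch (illustration only): MakeProxyType() manufactures a
# BaseProxy subclass whose methods just forward to _callmethod(), and
# caches the result per (name, exposed) pair.
#
def _makeproxytype_example():
    CounterProxy = MakeProxyType('CounterProxy', ('increment', 'value'))
    assert issubclass(CounterProxy, BaseProxy)
    assert CounterProxy._exposed_ == ('increment', 'value')
    # the same name/exposed pair yields the cached type, not a new one
    assert MakeProxyType('CounterProxy', ('increment', 'value')) is CounterProxy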
def AutoProxy(token, serializer, manager=None, authkey=None,
exposed=None, incref=True):
'''
Return an auto-proxy for `token`
'''
_Client = listener_client[serializer][1]
if exposed is None:
conn = _Client(token.address, authkey=authkey)
try:
exposed = dispatch(conn, None, 'get_methods', (token,))
finally:
conn.close()
if authkey is None and manager is not None:
authkey = manager._authkey
if authkey is None:
authkey = current_process().authkey
ProxyType = MakeProxyType('AutoProxy[%s]' % token.typeid, exposed)
proxy = ProxyType(token, serializer, manager=manager, authkey=authkey,
incref=incref)
proxy._isauto = True
return proxy
#
# Types/callables which we will register with SyncManager
#
class Namespace(object):
def __init__(self, **kwds):
self.__dict__.update(kwds)
def __repr__(self):
items = self.__dict__.items()
temp = []
for name, value in items:
if not name.startswith('_'):
temp.append('%s=%r' % (name, value))
temp.sort()
return 'Namespace(%s)' % str.join(', ', temp)
class Value(object):
def __init__(self, typecode, value, lock=True):
self._typecode = typecode
self._value = value
def get(self):
return self._value
def set(self, value):
self._value = value
def __repr__(self):
return '%s(%r, %r)'%(type(self).__name__, self._typecode, self._value)
value = property(get, set)
def Array(typecode, sequence, lock=True):
return array.array(typecode, sequence)
#
# Proxy types used by SyncManager
#
class IteratorProxy(BaseProxy):
# XXX remove methods for Py3.0 and Py2.6
_exposed_ = ('__next__', 'next', 'send', 'throw', 'close')
def __iter__(self):
return self
def __next__(self, *args):
return self._callmethod('__next__', args)
def next(self, *args):
return self._callmethod('next', args)
def send(self, *args):
return self._callmethod('send', args)
def throw(self, *args):
return self._callmethod('throw', args)
def close(self, *args):
return self._callmethod('close', args)
class AcquirerProxy(BaseProxy):
_exposed_ = ('acquire', 'release')
def acquire(self, blocking=True):
return self._callmethod('acquire', (blocking,))
def release(self):
return self._callmethod('release')
def __enter__(self):
return self._callmethod('acquire')
def __exit__(self, exc_type, exc_val, exc_tb):
return self._callmethod('release')
class ConditionProxy(AcquirerProxy):
    # XXX will Condition.notifyAll() name be available in Py3.0?
_exposed_ = ('acquire', 'release', 'wait', 'notify', 'notify_all')
def wait(self, timeout=None):
return self._callmethod('wait', (timeout,))
def notify(self):
return self._callmethod('notify')
def notify_all(self):
return self._callmethod('notify_all')
class EventProxy(BaseProxy):
_exposed_ = ('is_set', 'set', 'clear', 'wait')
def is_set(self):
return self._callmethod('is_set')
def set(self):
return self._callmethod('set')
def clear(self):
return self._callmethod('clear')
def wait(self, timeout=None):
return self._callmethod('wait', (timeout,))
class NamespaceProxy(BaseProxy):
_exposed_ = ('__getattribute__', '__setattr__', '__delattr__')
def __getattr__(self, key):
if key[0] == '_':
return object.__getattribute__(self, key)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__getattribute__', (key,))
def __setattr__(self, key, value):
if key[0] == '_':
return object.__setattr__(self, key, value)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__setattr__', (key, value))
def __delattr__(self, key):
if key[0] == '_':
return object.__delattr__(self, key)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__delattr__', (key,))
class ValueProxy(BaseProxy):
_exposed_ = ('get', 'set')
def get(self):
return self._callmethod('get')
def set(self, value):
return self._callmethod('set', (value,))
value = property(get, set)
BaseListProxy = MakeProxyType('BaseListProxy', (
'__add__', '__contains__', '__delitem__', '__delslice__',
'__getitem__', '__getslice__', '__len__', '__mul__',
'__reversed__', '__rmul__', '__setitem__', '__setslice__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort', '__imul__'
)) # XXX __getslice__ and __setslice__ unneeded in Py3.0
class ListProxy(BaseListProxy):
def __iadd__(self, value):
self._callmethod('extend', (value,))
return self
def __imul__(self, value):
self._callmethod('__imul__', (value,))
return self
DictProxy = MakeProxyType('DictProxy', (
'__contains__', '__delitem__', '__getitem__', '__iter__', '__len__',
'__setitem__', 'clear', 'copy', 'get', 'has_key', 'items',
'keys', 'pop', 'popitem', 'setdefault', 'update', 'values'
))
DictProxy._method_to_typeid_ = {
'__iter__': 'Iterator',
}
ArrayProxy = MakeProxyType('ArrayProxy', (
'__len__', '__getitem__', '__setitem__', '__getslice__', '__setslice__'
)) # XXX __getslice__ and __setslice__ unneeded in Py3.0
PoolProxy = MakeProxyType('PoolProxy', (
'apply', 'apply_async', 'close', 'imap', 'imap_unordered', 'join',
'map', 'map_async', 'terminate'
))
PoolProxy._method_to_typeid_ = {
'apply_async': 'AsyncResult',
'map_async': 'AsyncResult',
'imap': 'Iterator',
'imap_unordered': 'Iterator'
}
#
# Definition of SyncManager
#
class SyncManager(BaseManager):
'''
Subclass of `BaseManager` which supports a number of shared object types.
The types registered are those intended for the synchronization
of threads, plus `dict`, `list` and `Namespace`.
The `multiprocessing.Manager()` function creates started instances of
this class.
'''
SyncManager.register('Queue', Queue.Queue)
SyncManager.register('JoinableQueue', Queue.Queue)
SyncManager.register('Event', threading.Event, EventProxy)
SyncManager.register('Lock', threading.Lock, AcquirerProxy)
SyncManager.register('RLock', threading.RLock, AcquirerProxy)
SyncManager.register('Semaphore', threading.Semaphore, AcquirerProxy)
SyncManager.register('BoundedSemaphore', threading.BoundedSemaphore,
AcquirerProxy)
SyncManager.register('Condition', threading.Condition, ConditionProxy)
SyncManager.register('Pool', Pool, PoolProxy)
SyncManager.register('list', list, ListProxy)
SyncManager.register('dict', dict, DictProxy)
SyncManager.register('Value', Value, ValueProxy)
SyncManager.register('Array', Array, ArrayProxy)
SyncManager.register('Namespace', Namespace, NamespaceProxy)
# types returned by methods of PoolProxy
SyncManager.register('Iterator', proxytype=IteratorProxy, create_method=False)
SyncManager.register('AsyncResult', create_method=False)
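#
# Editor's sketch (illustration only): the usual entry point is
# multiprocessing.Manager(), which returns a started SyncManager; the
# proxies registered above forward to objects living in the server process.
#
def _syncmanager_example():
    from multiprocessing import Manager
    manager = Manager()
    d = manager.dict()                   # a DictProxy
    l = manager.list()                   # a ListProxy
    d['answer'] = 42
    l.extend([1, 2, 3])
    assert d['answer'] == 42 and len(l) == 3
    manager.shutdown()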
#
# Module providing the `Pool` class for managing a process pool
#
# multiprocessing/pool.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = ['Pool']
#
# Imports
#
import threading
import Queue
import itertools
import collections
import time
from multiprocessing import Process, cpu_count, TimeoutError
from multiprocessing.util import Finalize, debug
#
# Constants representing the state of a pool
#
RUN = 0
CLOSE = 1
TERMINATE = 2
#
# Miscellaneous
#
job_counter = itertools.count()
def mapstar(args):
return map(*args)
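#
# Editor's sketch (illustration only): mapstar() unpacks a (func, chunk)
# pair so that a whole chunk of inputs can travel through the task queue as
# a single task.
#
def _mapstar_example():
    assert mapstar((abs, (-1, -2, 3))) == [1, 2, 3]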
#
# Code run by worker processes
#
class MaybeEncodingError(Exception):
"""Wraps possible unpickleable errors, so they can be
safely sent through the socket."""
def __init__(self, exc, value):
self.exc = repr(exc)
self.value = repr(value)
super(MaybeEncodingError, self).__init__(self.exc, self.value)
def __str__(self):
return "Error sending result: '%s'. Reason: '%s'" % (self.value,
self.exc)
def __repr__(self):
return "<MaybeEncodingError: %s>" % str(self)
def worker(inqueue, outqueue, initializer=None, initargs=(), maxtasks=None):
assert maxtasks is None or (type(maxtasks) in (int, long) and maxtasks > 0)
put = outqueue.put
get = inqueue.get
if hasattr(inqueue, '_writer'):
inqueue._writer.close()
outqueue._reader.close()
if initializer is not None:
initializer(*initargs)
completed = 0
while maxtasks is None or (maxtasks and completed < maxtasks):
try:
task = get()
except (EOFError, IOError):
debug('worker got EOFError or IOError -- exiting')
break
if task is None:
debug('worker got sentinel -- exiting')
break
job, i, func, args, kwds = task
try:
result = (True, func(*args, **kwds))
except Exception, e:
result = (False, e)
try:
put((job, i, result))
except Exception as e:
wrapped = MaybeEncodingError(e, result[1])
debug("Possible encoding error while sending result: %s" % (
wrapped))
put((job, i, (False, wrapped)))
task = job = result = func = args = kwds = None
completed += 1
debug('worker exiting after %d tasks' % completed)
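#
# Editor's sketch (illustration only): each task reaches worker() as
# (job, i, func, args, kwds) and the reply travels back as
# (job, i, (success, value)); the handler threads in Pool route replies by
# job id.
#
def _task_protocol_example():
    job, i, func, args, kwds = 0, 0, pow, (2, 10), {}
    try:
        result = (True, func(*args, **kwds))
    except Exception, e:
        result = (False, e)
    assert (job, i, result) == (0, 0, (True, 1024))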
#
# Class representing a process pool
#
class Pool(object):
'''
Class which supports an async version of the `apply()` builtin
'''
Process = Process
def __init__(self, processes=None, initializer=None, initargs=(),
maxtasksperchild=None):
self._setup_queues()
self._taskqueue = Queue.Queue()
self._cache = {}
self._state = RUN
self._maxtasksperchild = maxtasksperchild
self._initializer = initializer
self._initargs = initargs
if processes is None:
try:
processes = cpu_count()
except NotImplementedError:
processes = 1
if processes < 1:
raise ValueError("Number of processes must be at least 1")
if initializer is not None and not hasattr(initializer, '__call__'):
raise TypeError('initializer must be a callable')
self._processes = processes
self._pool = []
self._repopulate_pool()
self._worker_handler = threading.Thread(
target=Pool._handle_workers,
args=(self, )
)
self._worker_handler.daemon = True
self._worker_handler._state = RUN
self._worker_handler.start()
self._task_handler = threading.Thread(
target=Pool._handle_tasks,
args=(self._taskqueue, self._quick_put, self._outqueue,
self._pool, self._cache)
)
self._task_handler.daemon = True
self._task_handler._state = RUN
self._task_handler.start()
self._result_handler = threading.Thread(
target=Pool._handle_results,
args=(self._outqueue, self._quick_get, self._cache)
)
self._result_handler.daemon = True
self._result_handler._state = RUN
self._result_handler.start()
self._terminate = Finalize(
self, self._terminate_pool,
args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
self._worker_handler, self._task_handler,
self._result_handler, self._cache),
exitpriority=15
)
def _join_exited_workers(self):
"""Cleanup after any worker processes which have exited due to reaching
their specified lifetime. Returns True if any workers were cleaned up.
"""
cleaned = False
for i in reversed(range(len(self._pool))):
worker = self._pool[i]
if worker.exitcode is not None:
# worker exited
debug('cleaning up worker %d' % i)
worker.join()
cleaned = True
del self._pool[i]
return cleaned
def _repopulate_pool(self):
"""Bring the number of pool processes up to the specified number,
for use after reaping workers which have exited.
"""
for i in range(self._processes - len(self._pool)):
w = self.Process(target=worker,
args=(self._inqueue, self._outqueue,
self._initializer,
self._initargs, self._maxtasksperchild)
)
self._pool.append(w)
w.name = w.name.replace('Process', 'PoolWorker')
w.daemon = True
w.start()
debug('added worker')
def _maintain_pool(self):
"""Clean up any exited workers and start replacements for them.
"""
if self._join_exited_workers():
self._repopulate_pool()
def _setup_queues(self):
from .queues import SimpleQueue
self._inqueue = SimpleQueue()
self._outqueue = SimpleQueue()
self._quick_put = self._inqueue._writer.send
self._quick_get = self._outqueue._reader.recv
def apply(self, func, args=(), kwds={}):
'''
Equivalent of `apply()` builtin
'''
assert self._state == RUN
return self.apply_async(func, args, kwds).get()
def map(self, func, iterable, chunksize=None):
'''
Equivalent of `map()` builtin
'''
assert self._state == RUN
return self.map_async(func, iterable, chunksize).get()
def imap(self, func, iterable, chunksize=1):
'''
Equivalent of `itertools.imap()` -- can be MUCH slower than `Pool.map()`
'''
assert self._state == RUN
if chunksize == 1:
result = IMapIterator(self._cache)
self._taskqueue.put((((result._job, i, func, (x,), {})
for i, x in enumerate(iterable)), result._set_length))
return result
else:
assert chunksize > 1
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = IMapIterator(self._cache)
self._taskqueue.put((((result._job, i, mapstar, (x,), {})
for i, x in enumerate(task_batches)), result._set_length))
return (item for chunk in result for item in chunk)
def imap_unordered(self, func, iterable, chunksize=1):
'''
        Like the `imap()` method, but the ordering of results is arbitrary
'''
assert self._state == RUN
if chunksize == 1:
result = IMapUnorderedIterator(self._cache)
self._taskqueue.put((((result._job, i, func, (x,), {})
for i, x in enumerate(iterable)), result._set_length))
return result
else:
assert chunksize > 1
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = IMapUnorderedIterator(self._cache)
self._taskqueue.put((((result._job, i, mapstar, (x,), {})
for i, x in enumerate(task_batches)), result._set_length))
return (item for chunk in result for item in chunk)
def apply_async(self, func, args=(), kwds={}, callback=None):
'''
Asynchronous equivalent of `apply()` builtin
'''
assert self._state == RUN
result = ApplyResult(self._cache, callback)
self._taskqueue.put(([(result._job, None, func, args, kwds)], None))
return result
def map_async(self, func, iterable, chunksize=None, callback=None):
'''
Asynchronous equivalent of `map()` builtin
'''
assert self._state == RUN
if not hasattr(iterable, '__len__'):
iterable = list(iterable)
if chunksize is None:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
chunksize += 1
if len(iterable) == 0:
chunksize = 0
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = MapResult(self._cache, chunksize, len(iterable), callback)
self._taskqueue.put((((result._job, i, mapstar, (x,), {})
for i, x in enumerate(task_batches)), None))
return result
@staticmethod
def _handle_workers(pool):
thread = threading.current_thread()
# Keep maintaining workers until the cache gets drained, unless the pool
# is terminated.
while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
pool._maintain_pool()
time.sleep(0.1)
# send sentinel to stop workers
pool._taskqueue.put(None)
debug('worker handler exiting')
@staticmethod
def _handle_tasks(taskqueue, put, outqueue, pool, cache):
thread = threading.current_thread()
for taskseq, set_length in iter(taskqueue.get, None):
task = None
i = -1
try:
for i, task in enumerate(taskseq):
if thread._state:
debug('task handler found thread._state != RUN')
break
try:
put(task)
except Exception as e:
job, ind = task[:2]
try:
cache[job]._set(ind, (False, e))
except KeyError:
pass
else:
if set_length:
debug('doing set_length()')
set_length(i+1)
continue
break
except Exception as ex:
job, ind = task[:2] if task else (0, 0)
if job in cache:
cache[job]._set(ind + 1, (False, ex))
if set_length:
debug('doing set_length()')
set_length(i+1)
finally:
task = taskseq = job = None
else:
debug('task handler got sentinel')
try:
# tell result handler to finish when cache is empty
debug('task handler sending sentinel to result handler')
outqueue.put(None)
# tell workers there is no more work
debug('task handler sending sentinel to workers')
for p in pool:
put(None)
except IOError:
debug('task handler got IOError when sending sentinels')
debug('task handler exiting')
@staticmethod
def _handle_results(outqueue, get, cache):
thread = threading.current_thread()
while 1:
try:
task = get()
except (IOError, EOFError):
debug('result handler got EOFError/IOError -- exiting')
return
if thread._state:
assert thread._state == TERMINATE
debug('result handler found thread._state=TERMINATE')
break
if task is None:
debug('result handler got sentinel')
break
job, i, obj = task
try:
cache[job]._set(i, obj)
except KeyError:
pass
task = job = obj = None
while cache and thread._state != TERMINATE:
try:
task = get()
except (IOError, EOFError):
debug('result handler got EOFError/IOError -- exiting')
return
if task is None:
debug('result handler ignoring extra sentinel')
continue
job, i, obj = task
try:
cache[job]._set(i, obj)
except KeyError:
pass
task = job = obj = None
if hasattr(outqueue, '_reader'):
debug('ensuring that outqueue is not full')
# If we don't make room available in outqueue then
# attempts to add the sentinel (None) to outqueue may
# block. There is guaranteed to be no more than 2 sentinels.
try:
for i in range(10):
if not outqueue._reader.poll():
break
get()
except (IOError, EOFError):
pass
debug('result handler exiting: len(cache)=%s, thread._state=%s',
len(cache), thread._state)
@staticmethod
def _get_tasks(func, it, size):
it = iter(it)
while 1:
x = tuple(itertools.islice(it, size))
if not x:
return
yield (func, x)
def __reduce__(self):
raise NotImplementedError(
'pool objects cannot be passed between processes or pickled'
)
def close(self):
debug('closing pool')
if self._state == RUN:
self._state = CLOSE
self._worker_handler._state = CLOSE
def terminate(self):
debug('terminating pool')
self._state = TERMINATE
self._worker_handler._state = TERMINATE
self._terminate()
def join(self):
debug('joining pool')
assert self._state in (CLOSE, TERMINATE)
self._worker_handler.join()
self._task_handler.join()
self._result_handler.join()
for p in self._pool:
p.join()
@staticmethod
def _help_stuff_finish(inqueue, task_handler, size):
# task_handler may be blocked trying to put items on inqueue
debug('removing tasks from inqueue until task handler finished')
inqueue._rlock.acquire()
while task_handler.is_alive() and inqueue._reader.poll():
inqueue._reader.recv()
time.sleep(0)
@classmethod
def _terminate_pool(cls, taskqueue, inqueue, outqueue, pool,
worker_handler, task_handler, result_handler, cache):
# this is guaranteed to only be called once
debug('finalizing pool')
worker_handler._state = TERMINATE
task_handler._state = TERMINATE
debug('helping task handler/workers to finish')
cls._help_stuff_finish(inqueue, task_handler, len(pool))
assert result_handler.is_alive() or len(cache) == 0
result_handler._state = TERMINATE
outqueue.put(None) # sentinel
# We must wait for the worker handler to exit before terminating
# workers because we don't want workers to be restarted behind our back.
debug('joining worker handler')
if threading.current_thread() is not worker_handler:
worker_handler.join(1e100)
# Terminate workers which haven't already finished.
if pool and hasattr(pool[0], 'terminate'):
debug('terminating workers')
for p in pool:
if p.exitcode is None:
p.terminate()
debug('joining task handler')
if threading.current_thread() is not task_handler:
task_handler.join(1e100)
debug('joining result handler')
if threading.current_thread() is not result_handler:
result_handler.join(1e100)
if pool and hasattr(pool[0], 'terminate'):
debug('joining pool workers')
for p in pool:
if p.is_alive():
# worker has not yet exited
debug('cleaning up worker %d' % p.pid)
p.join()
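#
# Editor's sketch (illustration only): typical Pool usage.  The callable
# must be picklable (defined at module level), so the hypothetical _square
# below stands in for real work.
#
def _square(x):
    return x * x

def _pool_example():
    pool = Pool(processes=2, maxtasksperchild=10)
    try:
        assert pool.map(_square, range(5)) == [0, 1, 4, 9, 16]
        async_result = pool.apply_async(_square, (7,))
        assert async_result.get(timeout=10) == 49
    finally:
        pool.close()                     # no more tasks will be submitted
        pool.join()                      # wait for the workers to drain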
#
# Class whose instances are returned by `Pool.apply_async()`
#
class ApplyResult(object):
def __init__(self, cache, callback):
self._cond = threading.Condition(threading.Lock())
self._job = job_counter.next()
self._cache = cache
self._ready = False
self._callback = callback
cache[self._job] = self
def ready(self):
return self._ready
def successful(self):
assert self._ready
return self._success
def wait(self, timeout=None):
self._cond.acquire()
try:
if not self._ready:
self._cond.wait(timeout)
finally:
self._cond.release()
def get(self, timeout=None):
self.wait(timeout)
if not self._ready:
raise TimeoutError
if self._success:
return self._value
else:
raise self._value
def _set(self, i, obj):
self._success, self._value = obj
if self._callback and self._success:
self._callback(self._value)
self._cond.acquire()
try:
self._ready = True
self._cond.notify()
finally:
self._cond.release()
del self._cache[self._job]
AsyncResult = ApplyResult # create alias -- see #17805
#
# Class whose instances are returned by `Pool.map_async()`
#
class MapResult(ApplyResult):
def __init__(self, cache, chunksize, length, callback):
ApplyResult.__init__(self, cache, callback)
self._success = True
self._value = [None] * length
self._chunksize = chunksize
if chunksize <= 0:
self._number_left = 0
self._ready = True
del cache[self._job]
else:
self._number_left = length//chunksize + bool(length % chunksize)
def _set(self, i, success_result):
success, result = success_result
if success:
self._value[i*self._chunksize:(i+1)*self._chunksize] = result
self._number_left -= 1
if self._number_left == 0:
if self._callback:
self._callback(self._value)
del self._cache[self._job]
self._cond.acquire()
try:
self._ready = True
self._cond.notify()
finally:
self._cond.release()
else:
self._success = False
self._value = result
del self._cache[self._job]
self._cond.acquire()
try:
self._ready = True
self._cond.notify()
finally:
self._cond.release()
#
# Class whose instances are returned by `Pool.imap()`
#
class IMapIterator(object):
def __init__(self, cache):
self._cond = threading.Condition(threading.Lock())
self._job = job_counter.next()
self._cache = cache
self._items = collections.deque()
self._index = 0
self._length = None
self._unsorted = {}
cache[self._job] = self
def __iter__(self):
return self
def next(self, timeout=None):
self._cond.acquire()
try:
try:
item = self._items.popleft()
except IndexError:
if self._index == self._length:
raise StopIteration
self._cond.wait(timeout)
try:
item = self._items.popleft()
except IndexError:
if self._index == self._length:
raise StopIteration
raise TimeoutError
finally:
self._cond.release()
success, value = item
if success:
return value
raise value
__next__ = next # XXX
def _set(self, i, obj):
self._cond.acquire()
try:
if self._index == i:
self._items.append(obj)
self._index += 1
while self._index in self._unsorted:
obj = self._unsorted.pop(self._index)
self._items.append(obj)
self._index += 1
self._cond.notify()
else:
self._unsorted[i] = obj
if self._index == self._length:
del self._cache[self._job]
finally:
self._cond.release()
def _set_length(self, length):
self._cond.acquire()
try:
self._length = length
if self._index == self._length:
self._cond.notify()
del self._cache[self._job]
finally:
self._cond.release()
#
# Class whose instances are returned by `Pool.imap_unordered()`
#
class IMapUnorderedIterator(IMapIterator):
def _set(self, i, obj):
self._cond.acquire()
try:
self._items.append(obj)
self._index += 1
self._cond.notify()
if self._index == self._length:
del self._cache[self._job]
finally:
self._cond.release()
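#
# Editor's sketch (illustration only, reusing the hypothetical _square from
# the Pool sketch above): imap() yields results in input order, parking
# early arrivals in _unsorted, while imap_unordered() yields them as soon
# as any worker produces them.
#
def _imap_example():
    pool = Pool(processes=2)
    try:
        ordered = list(pool.imap(_square, range(4)))
        unordered = sorted(pool.imap_unordered(_square, range(4)))
        assert ordered == unordered == [0, 1, 4, 9]
    finally:
        pool.close()
        pool.join()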
#
#
#
class ThreadPool(Pool):
from .dummy import Process
def __init__(self, processes=None, initializer=None, initargs=()):
Pool.__init__(self, processes, initializer, initargs)
def _setup_queues(self):
self._inqueue = Queue.Queue()
self._outqueue = Queue.Queue()
self._quick_put = self._inqueue.put
self._quick_get = self._outqueue.get
@staticmethod
def _help_stuff_finish(inqueue, task_handler, size):
# put sentinels at head of inqueue to make workers finish
inqueue.not_empty.acquire()
try:
inqueue.queue.clear()
inqueue.queue.extend([None] * size)
inqueue.not_empty.notify_all()
finally:
inqueue.not_empty.release()
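#
# Editor's sketch (illustration only): ThreadPool reuses the Pool machinery
# with threads instead of processes, so tasks need not be picklable.
#
def _threadpool_example():
    pool = ThreadPool(processes=4)
    try:
        assert pool.map(lambda x: x + 1, range(5)) == [1, 2, 3, 4, 5]
    finally:
        pool.close()
        pool.join()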
#
# Module providing the `Process` class which emulates `threading.Thread`
#
# multiprocessing/process.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = ['Process', 'current_process', 'active_children']
#
# Imports
#
import os
import sys
import signal
import itertools
#
#
#
try:
ORIGINAL_DIR = os.path.abspath(os.getcwd())
except OSError:
ORIGINAL_DIR = None
#
# Public functions
#
def current_process():
'''
Return process object representing the current process
'''
return _current_process
def active_children():
'''
Return list of process objects corresponding to live child processes
'''
_cleanup()
return list(_current_process._children)
#
#
#
def _cleanup():
# check for processes which have finished
for p in list(_current_process._children):
if p._popen.poll() is not None:
_current_process._children.discard(p)
#
# The `Process` class
#
class Process(object):
'''
Process objects represent activity that is run in a separate process
    The class is analogous to `threading.Thread`
'''
_Popen = None
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
assert group is None, 'group argument must be None for now'
count = _current_process._counter.next()
self._identity = _current_process._identity + (count,)
self._authkey = _current_process._authkey
self._daemonic = _current_process._daemonic
self._tempdir = _current_process._tempdir
self._parent_pid = os.getpid()
self._popen = None
self._target = target
self._args = tuple(args)
self._kwargs = dict(kwargs)
self._name = name or type(self).__name__ + '-' + \
':'.join(str(i) for i in self._identity)
def run(self):
'''
Method to be run in sub-process; can be overridden in sub-class
'''
if self._target:
self._target(*self._args, **self._kwargs)
def start(self):
'''
Start child process
'''
assert self._popen is None, 'cannot start a process twice'
assert self._parent_pid == os.getpid(), \
'can only start a process object created by current process'
assert not _current_process._daemonic, \
'daemonic processes are not allowed to have children'
_cleanup()
if self._Popen is not None:
Popen = self._Popen
else:
from .forking import Popen
self._popen = Popen(self)
# Avoid a refcycle if the target function holds an indirect
# reference to the process object (see bpo-30775)
del self._target, self._args, self._kwargs
_current_process._children.add(self)
def terminate(self):
'''
Terminate process; sends SIGTERM signal or uses TerminateProcess()
'''
self._popen.terminate()
def join(self, timeout=None):
'''
Wait until child process terminates
'''
assert self._parent_pid == os.getpid(), 'can only join a child process'
assert self._popen is not None, 'can only join a started process'
res = self._popen.wait(timeout)
if res is not None:
_current_process._children.discard(self)
def is_alive(self):
'''
Return whether process is alive
'''
if self is _current_process:
return True
assert self._parent_pid == os.getpid(), 'can only test a child process'
if self._popen is None:
return False
returncode = self._popen.poll()
if returncode is None:
return True
else:
_current_process._children.discard(self)
return False
@property
def name(self):
return self._name
@name.setter
def name(self, name):
assert isinstance(name, basestring), 'name must be a string'
self._name = name
@property
def daemon(self):
'''
Return whether process is a daemon
'''
return self._daemonic
@daemon.setter
def daemon(self, daemonic):
'''
Set whether process is a daemon
'''
assert self._popen is None, 'process has already started'
self._daemonic = daemonic
@property
def authkey(self):
return self._authkey
@authkey.setter
def authkey(self, authkey):
'''
Set authorization key of process
'''
self._authkey = AuthenticationString(authkey)
@property
def exitcode(self):
'''
Return exit code of process or `None` if it has yet to stop
'''
if self._popen is None:
return self._popen
return self._popen.poll()
@property
def ident(self):
'''
Return identifier (PID) of process or `None` if it has yet to start
'''
if self is _current_process:
return os.getpid()
else:
return self._popen and self._popen.pid
pid = ident
def __repr__(self):
if self is _current_process:
status = 'started'
elif self._parent_pid != os.getpid():
status = 'unknown'
elif self._popen is None:
status = 'initial'
else:
if self._popen.poll() is not None:
status = self.exitcode
else:
status = 'started'
if type(status) in (int, long):
if status == 0:
status = 'stopped'
else:
status = 'stopped[%s]' % _exitcode_to_name.get(status, status)
return '<%s(%s, %s%s)>' % (type(self).__name__, self._name,
status, self._daemonic and ' daemon' or '')
##
def _bootstrap(self):
from . import util
global _current_process
try:
self._children = set()
self._counter = itertools.count(1)
try:
sys.stdin.close()
sys.stdin = open(os.devnull)
except (OSError, ValueError):
pass
_current_process = self
util._finalizer_registry.clear()
util._run_after_forkers()
util.info('child process calling self.run()')
try:
self.run()
exitcode = 0
finally:
util._exit_function()
except SystemExit, e:
if not e.args:
exitcode = 1
elif isinstance(e.args[0], (int, long)):
exitcode = int(e.args[0])
else:
sys.stderr.write(str(e.args[0]) + '\n')
sys.stderr.flush()
exitcode = 1
except:
exitcode = 1
import traceback
sys.stderr.write('Process %s:\n' % self.name)
sys.stderr.flush()
traceback.print_exc()
util.info('process exiting with exitcode %d' % exitcode)
return exitcode
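#
# Editor's sketch (illustration only): the two usual ways to run code in a
# child process -- pass a `target` to Process(), or override run() in a
# subclass, as the hypothetical _Greeter does here.
#
class _Greeter(Process):
    def run(self):
        print 'hello from', self.name

def _process_example():
    p = _Greeter()
    p.start()                            # fork/spawn the child
    p.join()                             # wait for it to finish
    assert p.exitcode == 0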
#
# We subclass bytes to avoid accidental transmission of auth keys over the network
#
class AuthenticationString(bytes):
def __reduce__(self):
from .forking import Popen
if not Popen.thread_is_spawning():
raise TypeError(
'Pickling an AuthenticationString object is '
'disallowed for security reasons'
)
return AuthenticationString, (bytes(self),)
#
# Create object representing the main process
#
class _MainProcess(Process):
def __init__(self):
self._identity = ()
self._daemonic = False
self._name = 'MainProcess'
self._parent_pid = None
self._popen = None
self._counter = itertools.count(1)
self._children = set()
self._authkey = AuthenticationString(os.urandom(32))
self._tempdir = None
_current_process = _MainProcess()
del _MainProcess
#
# Give names to some return codes
#
_exitcode_to_name = {}
for name, signum in signal.__dict__.items():
if name[:3]=='SIG' and '_' not in name:
_exitcode_to_name[-signum] = name
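#
# Editor's sketch (illustration only): a negative exitcode means the child
# was killed by a signal, and the table above maps it back to a name.
#
def _exitcode_name_example():
    assert _exitcode_to_name.get(-signal.SIGTERM) == 'SIGTERM'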
#
# Module implementing queues
#
# multiprocessing/queues.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = ['Queue', 'SimpleQueue', 'JoinableQueue']
import sys
import os
import threading
import collections
import time
import atexit
import weakref
from Queue import Empty, Full
import _multiprocessing
from . import Pipe
from .synchronize import Lock, BoundedSemaphore, Semaphore, Condition
from .util import debug, info, Finalize, register_after_fork, is_exiting
from .forking import assert_spawning
#
# Queue type using a pipe, buffer and thread
#
class Queue(object):
def __init__(self, maxsize=0):
if maxsize <= 0:
maxsize = _multiprocessing.SemLock.SEM_VALUE_MAX
self._maxsize = maxsize
self._reader, self._writer = Pipe(duplex=False)
self._rlock = Lock()
self._opid = os.getpid()
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = Lock()
self._sem = BoundedSemaphore(maxsize)
self._after_fork()
if sys.platform != 'win32':
register_after_fork(self, Queue._after_fork)
def __getstate__(self):
assert_spawning(self)
return (self._maxsize, self._reader, self._writer,
self._rlock, self._wlock, self._sem, self._opid)
def __setstate__(self, state):
(self._maxsize, self._reader, self._writer,
self._rlock, self._wlock, self._sem, self._opid) = state
self._after_fork()
def _after_fork(self):
debug('Queue._after_fork()')
self._notempty = threading.Condition(threading.Lock())
self._buffer = collections.deque()
self._thread = None
self._jointhread = None
self._joincancelled = False
self._closed = False
self._close = None
self._send = self._writer.send
self._recv = self._reader.recv
self._poll = self._reader.poll
def put(self, obj, block=True, timeout=None):
assert not self._closed
if not self._sem.acquire(block, timeout):
raise Full
self._notempty.acquire()
try:
if self._thread is None:
self._start_thread()
self._buffer.append(obj)
self._notempty.notify()
finally:
self._notempty.release()
def get(self, block=True, timeout=None):
if block and timeout is None:
self._rlock.acquire()
try:
res = self._recv()
self._sem.release()
return res
finally:
self._rlock.release()
else:
if block:
deadline = time.time() + timeout
if not self._rlock.acquire(block, timeout):
raise Empty
try:
if block:
timeout = deadline - time.time()
if not self._poll(timeout):
raise Empty
elif not self._poll():
raise Empty
res = self._recv()
self._sem.release()
return res
finally:
self._rlock.release()
def qsize(self):
        # Raises NotImplementedError on Mac OS X because of broken sem_getvalue()
return self._maxsize - self._sem._semlock._get_value()
def empty(self):
return not self._poll()
def full(self):
return self._sem._semlock._is_zero()
def get_nowait(self):
return self.get(False)
def put_nowait(self, obj):
return self.put(obj, False)
def close(self):
self._closed = True
try:
self._reader.close()
finally:
close = self._close
if close:
self._close = None
close()
def join_thread(self):
debug('Queue.join_thread()')
assert self._closed
if self._jointhread:
self._jointhread()
def cancel_join_thread(self):
debug('Queue.cancel_join_thread()')
self._joincancelled = True
try:
self._jointhread.cancel()
except AttributeError:
pass
def _start_thread(self):
debug('Queue._start_thread()')
# Start thread which transfers data from buffer to pipe
self._buffer.clear()
self._thread = threading.Thread(
target=Queue._feed,
args=(self._buffer, self._notempty, self._send,
self._wlock, self._writer.close),
name='QueueFeederThread'
)
self._thread.daemon = True
debug('doing self._thread.start()')
self._thread.start()
debug('... done self._thread.start()')
# On process exit we will wait for data to be flushed to pipe.
if not self._joincancelled:
self._jointhread = Finalize(
self._thread, Queue._finalize_join,
[weakref.ref(self._thread)],
exitpriority=-5
)
        # Send sentinel to the feeder thread when this queue object is garbage collected
self._close = Finalize(
self, Queue._finalize_close,
[self._buffer, self._notempty],
exitpriority=10
)
@staticmethod
def _finalize_join(twr):
debug('joining queue thread')
thread = twr()
if thread is not None:
thread.join()
debug('... queue thread joined')
else:
debug('... queue thread already dead')
@staticmethod
def _finalize_close(buffer, notempty):
debug('telling queue thread to quit')
notempty.acquire()
try:
buffer.append(_sentinel)
notempty.notify()
finally:
notempty.release()
@staticmethod
def _feed(buffer, notempty, send, writelock, close):
debug('starting thread to feed data to pipe')
nacquire = notempty.acquire
nrelease = notempty.release
nwait = notempty.wait
bpopleft = buffer.popleft
sentinel = _sentinel
if sys.platform != 'win32':
wacquire = writelock.acquire
wrelease = writelock.release
else:
wacquire = None
while 1:
try:
nacquire()
try:
if not buffer:
nwait()
finally:
nrelease()
try:
while 1:
obj = bpopleft()
if obj is sentinel:
debug('feeder thread got sentinel -- exiting')
close()
return
if wacquire is None:
send(obj)
else:
wacquire()
try:
send(obj)
finally:
wrelease()
except IndexError:
pass
except Exception as e:
                # Since this runs in a daemon thread the resources it uses
                # may become unusable while the process is cleaning up.
                # We ignore errors which happen after the process has
                # started to clean up.
if is_exiting():
info('error in queue thread: %s', e)
return
else:
import traceback
traceback.print_exc()
_sentinel = object()
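#
# Hedged usage sketch (not in the original source): a Queue is shared
# between a parent and its children; objects put in one process can be
# retrieved in another. The demo function and its worker are hypothetical.
#
def _queue_demo_worker(q):
    q.put('hello from child')
def _queue_demo():
    from multiprocessing import Process
    q = Queue()
    p = Process(target=_queue_demo_worker, args=(q,))
    p.start()
    print q.get()                # 'hello from child'
    p.join()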
#
# A queue type which also supports join() and task_done() methods
#
# Note that if you do not call task_done() for each finished task then
# eventually the counter's semaphore may overflow causing Bad Things
# to happen.
#
class JoinableQueue(Queue):
def __init__(self, maxsize=0):
Queue.__init__(self, maxsize)
self._unfinished_tasks = Semaphore(0)
self._cond = Condition()
def __getstate__(self):
return Queue.__getstate__(self) + (self._cond, self._unfinished_tasks)
def __setstate__(self, state):
Queue.__setstate__(self, state[:-2])
self._cond, self._unfinished_tasks = state[-2:]
def put(self, obj, block=True, timeout=None):
assert not self._closed
if not self._sem.acquire(block, timeout):
raise Full
self._notempty.acquire()
self._cond.acquire()
try:
if self._thread is None:
self._start_thread()
self._buffer.append(obj)
self._unfinished_tasks.release()
self._notempty.notify()
finally:
self._cond.release()
self._notempty.release()
def task_done(self):
self._cond.acquire()
try:
if not self._unfinished_tasks.acquire(False):
raise ValueError('task_done() called too many times')
if self._unfinished_tasks._semlock._is_zero():
self._cond.notify_all()
finally:
self._cond.release()
def join(self):
self._cond.acquire()
try:
if not self._unfinished_tasks._semlock._is_zero():
self._cond.wait()
finally:
self._cond.release()
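#
# Hedged usage sketch: call task_done() once per retrieved item so that
# join() unblocks only when every item has been processed. The demo
# function below is illustrative, not part of the module.
#
def _joinable_queue_demo():
    q = JoinableQueue()
    for i in range(3):
        q.put(i)
    for i in range(3):           # in real code a worker process does this
        q.get()
        q.task_done()
    q.join()                     # returns immediately: all tasks are done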
#
# Simplified Queue type -- really just a locked pipe
#
class SimpleQueue(object):
def __init__(self):
self._reader, self._writer = Pipe(duplex=False)
self._rlock = Lock()
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = Lock()
self._make_methods()
def empty(self):
return not self._reader.poll()
def __getstate__(self):
assert_spawning(self)
return (self._reader, self._writer, self._rlock, self._wlock)
def __setstate__(self, state):
(self._reader, self._writer, self._rlock, self._wlock) = state
self._make_methods()
def _make_methods(self):
recv = self._reader.recv
racquire, rrelease = self._rlock.acquire, self._rlock.release
def get():
racquire()
try:
return recv()
finally:
rrelease()
self.get = get
if self._wlock is None:
            # writes to a message-oriented win32 pipe are atomic
self.put = self._writer.send
else:
send = self._writer.send
wacquire, wrelease = self._wlock.acquire, self._wlock.release
def put(obj):
wacquire()
try:
return send(obj)
finally:
wrelease()
self.put = put
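#
# Hedged sketch: SimpleQueue has no feeder thread and no maxsize -- put()
# writes straight to the pipe under a lock and get() reads under a lock.
# The demo function below is illustrative only.
#
def _simple_queue_demo():
    q = SimpleQueue()
    q.put({'answer': 42})
    print q.get()                # {'answer': 42}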
#
# Module to allow connection and socket objects to be transferred
# between processes
#
# multiprocessing/reduction.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = []
import os
import sys
import socket
import threading
import _multiprocessing
from multiprocessing import current_process
from multiprocessing.forking import Popen, duplicate, close, ForkingPickler
from multiprocessing.util import register_after_fork, debug, sub_debug
from multiprocessing.connection import Client, Listener
#
#
#
if not(sys.platform == 'win32' or hasattr(_multiprocessing, 'recvfd')):
raise ImportError('pickling of connections not supported')
#
# Platform specific definitions
#
if sys.platform == 'win32':
import _subprocess
from _multiprocessing import win32
def send_handle(conn, handle, destination_pid):
process_handle = win32.OpenProcess(
win32.PROCESS_ALL_ACCESS, False, destination_pid
)
try:
new_handle = duplicate(handle, process_handle)
conn.send(new_handle)
finally:
close(process_handle)
def recv_handle(conn):
return conn.recv()
else:
def send_handle(conn, handle, destination_pid):
_multiprocessing.sendfd(conn.fileno(), handle)
def recv_handle(conn):
return _multiprocessing.recvfd(conn.fileno())
#
# Support for a per-process server thread which caches pickled handles
#
_cache = set()
def _reset(obj):
global _lock, _listener, _cache
for h in _cache:
close(h)
_cache.clear()
_lock = threading.Lock()
_listener = None
_reset(None)
register_after_fork(_reset, _reset)
def _get_listener():
global _listener
if _listener is None:
_lock.acquire()
try:
if _listener is None:
debug('starting listener and thread for sending handles')
_listener = Listener(authkey=current_process().authkey)
t = threading.Thread(target=_serve)
t.daemon = True
t.start()
finally:
_lock.release()
return _listener
def _serve():
from .util import is_exiting, sub_warning
while 1:
try:
conn = _listener.accept()
handle_wanted, destination_pid = conn.recv()
_cache.remove(handle_wanted)
send_handle(conn, handle_wanted, destination_pid)
close(handle_wanted)
conn.close()
except:
if not is_exiting():
import traceback
sub_warning(
                'thread for sharing handles raised exception:\n' +
'-'*79 + '\n' + traceback.format_exc() + '-'*79
)
#
# Functions to be used for pickling/unpickling objects with handles
#
def reduce_handle(handle):
if Popen.thread_is_spawning():
return (None, Popen.duplicate_for_child(handle), True)
dup_handle = duplicate(handle)
_cache.add(dup_handle)
sub_debug('reducing handle %d', handle)
return (_get_listener().address, dup_handle, False)
def rebuild_handle(pickled_data):
address, handle, inherited = pickled_data
if inherited:
return handle
sub_debug('rebuilding handle %d', handle)
conn = Client(address, authkey=current_process().authkey)
conn.send((handle, os.getpid()))
new_handle = recv_handle(conn)
conn.close()
return new_handle
#
# Register `_multiprocessing.Connection` with `ForkingPickler`
#
def reduce_connection(conn):
rh = reduce_handle(conn.fileno())
return rebuild_connection, (rh, conn.readable, conn.writable)
def rebuild_connection(reduced_handle, readable, writable):
handle = rebuild_handle(reduced_handle)
return _multiprocessing.Connection(
handle, readable=readable, writable=writable
)
ForkingPickler.register(_multiprocessing.Connection, reduce_connection)
#
# Register `socket.socket` with `ForkingPickler`
#
def fromfd(fd, family, type_, proto=0):
s = socket.fromfd(fd, family, type_, proto)
if s.__class__ is not socket.socket:
s = socket.socket(_sock=s)
return s
def reduce_socket(s):
reduced_handle = reduce_handle(s.fileno())
return rebuild_socket, (reduced_handle, s.family, s.type, s.proto)
def rebuild_socket(reduced_handle, family, type_, proto):
fd = rebuild_handle(reduced_handle)
_sock = fromfd(fd, family, type_, proto)
close(fd)
return _sock
ForkingPickler.register(socket.socket, reduce_socket)
#
# Register `_multiprocessing.PipeConnection` with `ForkingPickler`
#
if sys.platform == 'win32':
def reduce_pipe_connection(conn):
rh = reduce_handle(conn.fileno())
return rebuild_pipe_connection, (rh, conn.readable, conn.writable)
def rebuild_pipe_connection(reduced_handle, readable, writable):
handle = rebuild_handle(reduced_handle)
return _multiprocessing.PipeConnection(
handle, readable=readable, writable=writable
)
ForkingPickler.register(_multiprocessing.PipeConnection, reduce_pipe_connection)
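#
# Hedged usage sketch (illustrative, not in the original source): the
# reduce/rebuild pair lets one end of a pipe be pickled and recreated;
# here the round trip stays inside one process purely for demonstration.
# In practice the reduced form is what gets pickled and sent to a peer.
#
def _reduction_demo():
    from multiprocessing import Pipe
    a, b = Pipe()
    rebuild, args = reduce_connection(b)
    b2 = rebuild(*args)          # new Connection around a duplicated handle
    a.send('ping')
    print b2.recv()              # 'ping'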
#
# Module which supports allocation of ctypes objects from shared memory
#
# multiprocessing/sharedctypes.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
import sys
import ctypes
import weakref
from multiprocessing import heap, RLock
from multiprocessing.forking import assert_spawning, ForkingPickler
__all__ = ['RawValue', 'RawArray', 'Value', 'Array', 'copy', 'synchronized']
#
#
#
typecode_to_type = {
'c': ctypes.c_char,
'b': ctypes.c_byte, 'B': ctypes.c_ubyte,
'h': ctypes.c_short, 'H': ctypes.c_ushort,
'i': ctypes.c_int, 'I': ctypes.c_uint,
'l': ctypes.c_long, 'L': ctypes.c_ulong,
'f': ctypes.c_float, 'd': ctypes.c_double
}
try:
typecode_to_type['u'] = ctypes.c_wchar
except AttributeError:
pass
#
#
#
def _new_value(type_):
size = ctypes.sizeof(type_)
wrapper = heap.BufferWrapper(size)
return rebuild_ctype(type_, wrapper, None)
def RawValue(typecode_or_type, *args):
'''
Returns a ctypes object allocated from shared memory
'''
type_ = typecode_to_type.get(typecode_or_type, typecode_or_type)
obj = _new_value(type_)
ctypes.memset(ctypes.addressof(obj), 0, ctypes.sizeof(obj))
obj.__init__(*args)
return obj
def RawArray(typecode_or_type, size_or_initializer):
'''
Returns a ctypes array allocated from shared memory
'''
type_ = typecode_to_type.get(typecode_or_type, typecode_or_type)
if isinstance(size_or_initializer, (int, long)):
type_ = type_ * size_or_initializer
obj = _new_value(type_)
ctypes.memset(ctypes.addressof(obj), 0, ctypes.sizeof(obj))
return obj
else:
type_ = type_ * len(size_or_initializer)
result = _new_value(type_)
result.__init__(*size_or_initializer)
return result
def Value(typecode_or_type, *args, **kwds):
'''
Return a synchronization wrapper for a Value
'''
lock = kwds.pop('lock', None)
if kwds:
raise ValueError('unrecognized keyword argument(s): %s' % kwds.keys())
obj = RawValue(typecode_or_type, *args)
if lock is False:
return obj
if lock in (True, None):
lock = RLock()
if not hasattr(lock, 'acquire'):
raise AttributeError("'%r' has no method 'acquire'" % lock)
return synchronized(obj, lock)
def Array(typecode_or_type, size_or_initializer, **kwds):
'''
Return a synchronization wrapper for a RawArray
'''
lock = kwds.pop('lock', None)
if kwds:
raise ValueError('unrecognized keyword argument(s): %s' % kwds.keys())
obj = RawArray(typecode_or_type, size_or_initializer)
if lock is False:
return obj
if lock in (True, None):
lock = RLock()
if not hasattr(lock, 'acquire'):
raise AttributeError("'%r' has no method 'acquire'" % lock)
return synchronized(obj, lock)
def copy(obj):
new_obj = _new_value(type(obj))
ctypes.pointer(new_obj)[0] = obj
return new_obj
def synchronized(obj, lock=None):
assert not isinstance(obj, SynchronizedBase), 'object already synchronized'
if isinstance(obj, ctypes._SimpleCData):
return Synchronized(obj, lock)
elif isinstance(obj, ctypes.Array):
if obj._type_ is ctypes.c_char:
return SynchronizedString(obj, lock)
return SynchronizedArray(obj, lock)
else:
cls = type(obj)
try:
scls = class_cache[cls]
except KeyError:
names = [field[0] for field in cls._fields_]
d = dict((name, make_property(name)) for name in names)
classname = 'Synchronized' + cls.__name__
scls = class_cache[cls] = type(classname, (SynchronizedBase,), d)
return scls(obj, lock)
#
# Functions for pickling/unpickling
#
def reduce_ctype(obj):
assert_spawning(obj)
if isinstance(obj, ctypes.Array):
return rebuild_ctype, (obj._type_, obj._wrapper, obj._length_)
else:
return rebuild_ctype, (type(obj), obj._wrapper, None)
def rebuild_ctype(type_, wrapper, length):
if length is not None:
type_ = type_ * length
ForkingPickler.register(type_, reduce_ctype)
obj = type_.from_address(wrapper.get_address())
obj._wrapper = wrapper
return obj
#
# Function to create properties
#
def make_property(name):
try:
return prop_cache[name]
except KeyError:
d = {}
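        # the template below contains seven %s slots; all are filled with
        # the attribute name, and exec-ing the result defines the getter,
        # the setter and the property object inside d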
exec template % ((name,)*7) in d
prop_cache[name] = d[name]
return d[name]
template = '''
def get%s(self):
self.acquire()
try:
return self._obj.%s
finally:
self.release()
def set%s(self, value):
self.acquire()
try:
self._obj.%s = value
finally:
self.release()
%s = property(get%s, set%s)
'''
prop_cache = {}
class_cache = weakref.WeakKeyDictionary()
#
# Synchronized wrappers
#
class SynchronizedBase(object):
def __init__(self, obj, lock=None):
self._obj = obj
self._lock = lock or RLock()
self.acquire = self._lock.acquire
self.release = self._lock.release
def __reduce__(self):
assert_spawning(self)
return synchronized, (self._obj, self._lock)
def get_obj(self):
return self._obj
def get_lock(self):
return self._lock
def __repr__(self):
return '<%s wrapper for %s>' % (type(self).__name__, self._obj)
class Synchronized(SynchronizedBase):
value = make_property('value')
class SynchronizedArray(SynchronizedBase):
def __len__(self):
return len(self._obj)
def __getitem__(self, i):
self.acquire()
try:
return self._obj[i]
finally:
self.release()
def __setitem__(self, i, value):
self.acquire()
try:
self._obj[i] = value
finally:
self.release()
def __getslice__(self, start, stop):
self.acquire()
try:
return self._obj[start:stop]
finally:
self.release()
def __setslice__(self, start, stop, values):
self.acquire()
try:
self._obj[start:stop] = values
finally:
self.release()
class SynchronizedString(SynchronizedArray):
value = make_property('value')
raw = make_property('raw')
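#
# Hedged usage sketch: Value() and Array() return synchronized wrappers;
# get_lock() can guard compound updates from several processes. The demo
# function below is illustrative only.
#
def _sharedctypes_demo():
    counter = Value('i', 0)      # synchronized c_int
    arr = Array('d', [0.0, 1.0, 2.0])
    with counter.get_lock():
        counter.value += 1       # read-modify-write needs the lock held
    arr[0] = 3.14                # single item access locks internally
    print counter.value, arr[:]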
#
# Module implementing synchronization primitives
#
# multiprocessing/synchronize.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__all__ = [
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Condition', 'Event'
]
import threading
import os
import sys
from time import time as _time, sleep as _sleep
import _multiprocessing
from multiprocessing.process import current_process
from multiprocessing.util import Finalize, register_after_fork, debug
from multiprocessing.forking import assert_spawning, Popen
# Try to import the SemLock primitive cleanly; if that fails, raise
# ImportError for platforms lacking a working sem_open implementation.
# See issue 3770.
try:
    from _multiprocessing import SemLock
except ImportError:
    raise ImportError("This platform lacks a functioning sem_open"
                      " implementation; therefore, the required"
                      " synchronization primitives will not"
                      " function. See issue 3770.")
#
# Constants
#
RECURSIVE_MUTEX, SEMAPHORE = range(2)
SEM_VALUE_MAX = _multiprocessing.SemLock.SEM_VALUE_MAX
#
# Base class for semaphores and mutexes; wraps `_multiprocessing.SemLock`
#
class SemLock(object):
def __init__(self, kind, value, maxvalue):
sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
debug('created semlock with handle %s' % sl.handle)
self._make_methods()
if sys.platform != 'win32':
def _after_fork(obj):
obj._semlock._after_fork()
register_after_fork(self, _after_fork)
def _make_methods(self):
self.acquire = self._semlock.acquire
self.release = self._semlock.release
def __enter__(self):
return self._semlock.__enter__()
def __exit__(self, *args):
return self._semlock.__exit__(*args)
def __getstate__(self):
assert_spawning(self)
sl = self._semlock
return (Popen.duplicate_for_child(sl.handle), sl.kind, sl.maxvalue)
def __setstate__(self, state):
self._semlock = _multiprocessing.SemLock._rebuild(*state)
debug('recreated blocker with handle %r' % state[0])
self._make_methods()
#
# Semaphore
#
class Semaphore(SemLock):
def __init__(self, value=1):
SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX)
def get_value(self):
return self._semlock._get_value()
def __repr__(self):
try:
value = self._semlock._get_value()
except Exception:
value = 'unknown'
return '<Semaphore(value=%s)>' % value
#
# Bounded semaphore
#
class BoundedSemaphore(Semaphore):
def __init__(self, value=1):
SemLock.__init__(self, SEMAPHORE, value, value)
def __repr__(self):
try:
value = self._semlock._get_value()
except Exception:
value = 'unknown'
return '<BoundedSemaphore(value=%s, maxvalue=%s)>' % \
(value, self._semlock.maxvalue)
#
# Non-recursive lock
#
class Lock(SemLock):
def __init__(self):
SemLock.__init__(self, SEMAPHORE, 1, 1)
def __repr__(self):
try:
if self._semlock._is_mine():
name = current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
elif self._semlock._get_value() == 1:
name = 'None'
elif self._semlock._count() > 0:
name = 'SomeOtherThread'
else:
name = 'SomeOtherProcess'
except Exception:
name = 'unknown'
return '<Lock(owner=%s)>' % name
#
# Recursive lock
#
class RLock(SemLock):
def __init__(self):
SemLock.__init__(self, RECURSIVE_MUTEX, 1, 1)
def __repr__(self):
try:
if self._semlock._is_mine():
name = current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
count = self._semlock._count()
elif self._semlock._get_value() == 1:
name, count = 'None', 0
elif self._semlock._count() > 0:
name, count = 'SomeOtherThread', 'nonzero'
else:
name, count = 'SomeOtherProcess', 'nonzero'
except Exception:
name, count = 'unknown', 'unknown'
return '<RLock(%s, %s)>' % (name, count)
#
# Condition variable
#
class Condition(object):
def __init__(self, lock=None):
self._lock = lock or RLock()
self._sleeping_count = Semaphore(0)
self._woken_count = Semaphore(0)
self._wait_semaphore = Semaphore(0)
self._make_methods()
def __getstate__(self):
assert_spawning(self)
return (self._lock, self._sleeping_count,
self._woken_count, self._wait_semaphore)
def __setstate__(self, state):
(self._lock, self._sleeping_count,
self._woken_count, self._wait_semaphore) = state
self._make_methods()
def __enter__(self):
return self._lock.__enter__()
def __exit__(self, *args):
return self._lock.__exit__(*args)
def _make_methods(self):
self.acquire = self._lock.acquire
self.release = self._lock.release
def __repr__(self):
try:
num_waiters = (self._sleeping_count._semlock._get_value() -
self._woken_count._semlock._get_value())
except Exception:
num_waiters = 'unknown'
return '<Condition(%s, %s)>' % (self._lock, num_waiters)
def wait(self, timeout=None):
assert self._lock._semlock._is_mine(), \
'must acquire() condition before using wait()'
# indicate that this thread is going to sleep
self._sleeping_count.release()
# release lock
count = self._lock._semlock._count()
for i in xrange(count):
self._lock.release()
try:
# wait for notification or timeout
self._wait_semaphore.acquire(True, timeout)
finally:
# indicate that this thread has woken
self._woken_count.release()
# reacquire lock
for i in xrange(count):
self._lock.acquire()
def notify(self):
assert self._lock._semlock._is_mine(), 'lock is not owned'
assert not self._wait_semaphore.acquire(False)
        # to take account of timeouts since the last notify() we subtract
        # woken_count from sleeping_count and rezero woken_count
while self._woken_count.acquire(False):
res = self._sleeping_count.acquire(False)
assert res
if self._sleeping_count.acquire(False): # try grabbing a sleeper
self._wait_semaphore.release() # wake up one sleeper
self._woken_count.acquire() # wait for the sleeper to wake
# rezero _wait_semaphore in case a timeout just happened
self._wait_semaphore.acquire(False)
def notify_all(self):
assert self._lock._semlock._is_mine(), 'lock is not owned'
assert not self._wait_semaphore.acquire(False)
        # to take account of timeouts since the last notify*() we subtract
        # woken_count from sleeping_count and rezero woken_count
while self._woken_count.acquire(False):
res = self._sleeping_count.acquire(False)
assert res
sleepers = 0
while self._sleeping_count.acquire(False):
self._wait_semaphore.release() # wake up one sleeper
sleepers += 1
if sleepers:
for i in xrange(sleepers):
self._woken_count.acquire() # wait for a sleeper to wake
# rezero wait_semaphore in case some timeouts just happened
while self._wait_semaphore.acquire(False):
pass
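#
# Hedged usage sketch: the classic wait-for-predicate loop, guarding
# against spurious or stale wakeups. The demo functions are hypothetical.
#
def _condition_demo_worker(cond, flag):
    cond.acquire()
    try:
        while flag.value == 0:   # re-check the predicate after each wakeup
            cond.wait()
        print 'flag is set'
    finally:
        cond.release()
def _condition_demo():
    from multiprocessing import Process, Value
    cond = Condition()
    flag = Value('i', 0)
    p = Process(target=_condition_demo_worker, args=(cond, flag))
    p.start()
    cond.acquire()
    try:
        flag.value = 1
        cond.notify()
    finally:
        cond.release()
    p.join()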
#
# Event
#
class Event(object):
def __init__(self):
self._cond = Condition(Lock())
self._flag = Semaphore(0)
def is_set(self):
self._cond.acquire()
try:
if self._flag.acquire(False):
self._flag.release()
return True
return False
finally:
self._cond.release()
def set(self):
self._cond.acquire()
try:
self._flag.acquire(False)
self._flag.release()
self._cond.notify_all()
finally:
self._cond.release()
def clear(self):
self._cond.acquire()
try:
self._flag.acquire(False)
finally:
self._cond.release()
def wait(self, timeout=None):
self._cond.acquire()
try:
if self._flag.acquire(False):
self._flag.release()
else:
self._cond.wait(timeout)
if self._flag.acquire(False):
self._flag.release()
return True
return False
finally:
self._cond.release()
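#
# Hedged usage sketch: an Event lets one process signal any number of
# waiters. The worker and demo functions are hypothetical.
#
def _event_demo_worker(ev):
    ev.wait()                    # blocks until set() is called
    print 'event observed'
def _event_demo():
    from multiprocessing import Process
    ev = Event()
    p = Process(target=_event_demo_worker, args=(ev,))
    p.start()
    ev.set()
    p.join()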
#
# Module providing various facilities to other parts of the package
#
# multiprocessing/util.py
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
import os
import itertools
import weakref
import atexit
import threading            # we want threading to install its
                            # cleanup function before multiprocessing does
from subprocess import _args_from_interpreter_flags
from multiprocessing.process import current_process, active_children
__all__ = [
'sub_debug', 'debug', 'info', 'sub_warning', 'get_logger',
'log_to_stderr', 'get_temp_dir', 'register_after_fork',
'is_exiting', 'Finalize', 'ForkAwareThreadLock', 'ForkAwareLocal',
'SUBDEBUG', 'SUBWARNING',
]
#
# Logging
#
NOTSET = 0
SUBDEBUG = 5
DEBUG = 10
INFO = 20
SUBWARNING = 25
LOGGER_NAME = 'multiprocessing'
DEFAULT_LOGGING_FORMAT = '[%(levelname)s/%(processName)s] %(message)s'
_logger = None
_log_to_stderr = False
def sub_debug(msg, *args):
if _logger:
_logger.log(SUBDEBUG, msg, *args)
def debug(msg, *args):
if _logger:
_logger.log(DEBUG, msg, *args)
def info(msg, *args):
if _logger:
_logger.log(INFO, msg, *args)
def sub_warning(msg, *args):
if _logger:
_logger.log(SUBWARNING, msg, *args)
def get_logger():
'''
Returns logger used by multiprocessing
'''
global _logger
import logging, atexit
logging._acquireLock()
try:
if not _logger:
_logger = logging.getLogger(LOGGER_NAME)
_logger.propagate = 0
logging.addLevelName(SUBDEBUG, 'SUBDEBUG')
logging.addLevelName(SUBWARNING, 'SUBWARNING')
# XXX multiprocessing should cleanup before logging
if hasattr(atexit, 'unregister'):
atexit.unregister(_exit_function)
atexit.register(_exit_function)
else:
atexit._exithandlers.remove((_exit_function, (), {}))
atexit._exithandlers.append((_exit_function, (), {}))
finally:
logging._releaseLock()
return _logger
def log_to_stderr(level=None):
'''
Turn on logging and add a handler which prints to stderr
'''
global _log_to_stderr
import logging
logger = get_logger()
formatter = logging.Formatter(DEFAULT_LOGGING_FORMAT)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
if level:
logger.setLevel(level)
_log_to_stderr = True
return _logger
#
# Function returning a temp directory which will be removed on exit
#
def get_temp_dir():
# get name of a temp directory which will be automatically cleaned up
if current_process()._tempdir is None:
import shutil, tempfile
tempdir = tempfile.mkdtemp(prefix='pymp-')
info('created temp directory %s', tempdir)
Finalize(None, shutil.rmtree, args=[tempdir], exitpriority=-100)
current_process()._tempdir = tempdir
return current_process()._tempdir
#
# Support for reinitialization of objects when bootstrapping a child process
#
_afterfork_registry = weakref.WeakValueDictionary()
_afterfork_counter = itertools.count()
def _run_after_forkers():
items = list(_afterfork_registry.items())
items.sort()
for (index, ident, func), obj in items:
try:
func(obj)
except Exception, e:
info('after forker raised exception %s', e)
def register_after_fork(obj, func):
_afterfork_registry[(_afterfork_counter.next(), id(obj), func)] = obj
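#
# Hedged sketch: register_after_fork() re-runs per-object setup in child
# processes; the registry holds the object weakly, so registration does
# not keep it alive. The class below is hypothetical.
#
class _ForkSensitive(object):
    def __init__(self):
        self.pid = os.getpid()
        register_after_fork(self, _ForkSensitive._refresh)
    def _refresh(self):
        self.pid = os.getpid()   # runs in the child just after fork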
#
# Finalization using weakrefs
#
_finalizer_registry = {}
_finalizer_counter = itertools.count()
class Finalize(object):
'''
Class which supports object finalization using weakrefs
'''
def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
assert exitpriority is None or type(exitpriority) in (int, long)
if obj is not None:
self._weakref = weakref.ref(obj, self)
else:
assert exitpriority is not None
self._callback = callback
self._args = args
self._kwargs = kwargs or {}
self._key = (exitpriority, _finalizer_counter.next())
self._pid = os.getpid()
_finalizer_registry[self._key] = self
def __call__(self, wr=None):
'''
Run the callback unless it has already been called or cancelled
'''
try:
del _finalizer_registry[self._key]
except KeyError:
sub_debug('finalizer no longer registered')
else:
if self._pid != os.getpid():
sub_debug('finalizer ignored because different process')
res = None
else:
sub_debug('finalizer calling %s with args %s and kwargs %s',
self._callback, self._args, self._kwargs)
res = self._callback(*self._args, **self._kwargs)
self._weakref = self._callback = self._args = \
self._kwargs = self._key = None
return res
def cancel(self):
'''
Cancel finalization of the object
'''
try:
del _finalizer_registry[self._key]
except KeyError:
pass
else:
self._weakref = self._callback = self._args = \
self._kwargs = self._key = None
def still_active(self):
'''
Return whether this finalizer is still waiting to invoke callback
'''
return self._key in _finalizer_registry
def __repr__(self):
try:
obj = self._weakref()
except (AttributeError, TypeError):
obj = None
if obj is None:
return '<Finalize object, dead>'
x = '<Finalize object, callback=%s' % \
getattr(self._callback, '__name__', self._callback)
if self._args:
x += ', args=' + str(self._args)
if self._kwargs:
x += ', kwargs=' + str(self._kwargs)
if self._key[0] is not None:
            x += ', exitpriority=' + str(self._key[0])
return x + '>'
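#
# Hedged usage sketch: a Finalize callback fires either when the tracked
# object is garbage collected or during exit processing, ordered by
# exitpriority. The demo function is hypothetical.
#
def _finalize_demo():
    def report():
        print 'cleaned up'
    class Resource(object):
        pass
    r = Resource()
    Finalize(r, report, exitpriority=0)
    del r                        # weakref dies -> callback runs at once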
def _run_finalizers(minpriority=None):
'''
Run all finalizers whose exit priority is not None and at least minpriority
Finalizers with highest priority are called first; finalizers with
the same priority will be called in reverse order of creation.
'''
if _finalizer_registry is None:
# This function may be called after this module's globals are
# destroyed. See the _exit_function function in this module for more
# notes.
return
if minpriority is None:
f = lambda p : p[0][0] is not None
else:
f = lambda p : p[0][0] is not None and p[0][0] >= minpriority
# Careful: _finalizer_registry may be mutated while this function
# is running (either by a GC run or by another thread).
items = [x for x in _finalizer_registry.items() if f(x)]
items.sort(reverse=True)
for key, finalizer in items:
sub_debug('calling %s', finalizer)
try:
finalizer()
except Exception:
import traceback
traceback.print_exc()
if minpriority is None:
_finalizer_registry.clear()
#
# Clean up on exit
#
def is_exiting():
'''
Returns true if the process is shutting down
'''
return _exiting or _exiting is None
_exiting = False
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
active_children=active_children,
current_process=current_process):
# NB: we hold on to references to functions in the arglist due to the
# situation described below, where this function is called after this
# module's globals are destroyed.
global _exiting
info('process shutting down')
debug('running all "atexit" finalizers with priority >= 0')
_run_finalizers(0)
if current_process() is not None:
# NB: we check if the current process is None here because if
# it's None, any call to ``active_children()`` will throw an
# AttributeError (active_children winds up trying to get
# attributes from util._current_process). This happens in a
# variety of shutdown circumstances that are not well-understood
        # because module-scope variables are apparently not supposed to
# be destroyed until after this function is called. However,
# they are indeed destroyed before this function is called. See
# issues 9775 and 15881. Also related: 4106, 9205, and 9207.
for p in active_children():
if p._daemonic:
info('calling terminate() for daemon %s', p.name)
p._popen.terminate()
for p in active_children():
info('calling join() for process %s', p.name)
p.join()
debug('running the remaining "atexit" finalizers')
_run_finalizers()
atexit.register(_exit_function)
#
# Some fork aware types
#
class ForkAwareThreadLock(object):
def __init__(self):
self._reset()
register_after_fork(self, ForkAwareThreadLock._reset)
def _reset(self):
self._lock = threading.Lock()
self.acquire = self._lock.acquire
self.release = self._lock.release
class ForkAwareLocal(threading.local):
def __init__(self):
register_after_fork(self, lambda obj : obj.__dict__.clear())
def __reduce__(self):
return type(self), ()
#
# Package analogous to 'threading.py' but using processes
#
# multiprocessing/__init__.py
#
# This package is intended to duplicate the functionality (and much of
# the API) of threading.py but uses processes instead of threads. A
# subpackage 'multiprocessing.dummy' has the same API but is a simple
# wrapper for 'threading'.
#
# Try calling `multiprocessing.doc.main()` to read the HTML
# documentation in a web browser.
#
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. Neither the name of author nor the names of any contributors may be
# used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
__version__ = '0.70a1'
__all__ = [
'Process', 'current_process', 'active_children', 'freeze_support',
'Manager', 'Pipe', 'cpu_count', 'log_to_stderr', 'get_logger',
'allow_connection_pickling', 'BufferTooShort', 'TimeoutError',
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Condition',
'Event', 'Queue', 'JoinableQueue', 'Pool', 'Value', 'Array',
'RawValue', 'RawArray', 'SUBDEBUG', 'SUBWARNING',
]
__author__ = 'R. Oudkerk ([email protected])'
#
# Imports
#
import os
import sys
from multiprocessing.process import Process, current_process, active_children
from multiprocessing.util import SUBDEBUG, SUBWARNING
#
# Exceptions
#
class ProcessError(Exception):
pass
class BufferTooShort(ProcessError):
pass
class TimeoutError(ProcessError):
pass
class AuthenticationError(ProcessError):
pass
# This is down here because _multiprocessing uses BufferTooShort
try:
# IronPython does not provide _multiprocessing
import _multiprocessing
except ImportError:
pass
#
# Definitions not depending on native semaphores
#
def Manager():
'''
    Returns a manager associated with a running server process.
    The manager's methods such as `Lock()`, `Condition()` and `Queue()`
can be used to create shared objects.
'''
from multiprocessing.managers import SyncManager
m = SyncManager()
m.start()
return m
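# Hedged usage sketch (hypothetical demo function): the manager process
# owns the shared objects and hands back proxies that children can use.
def _manager_demo():
    m = Manager()
    d = m.dict()
    d['answer'] = 42
    print d['answer']            # 42
    m.shutdown()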
def Pipe(duplex=True):
'''
    Returns two connection objects connected by a pipe
'''
from multiprocessing.connection import Pipe
return Pipe(duplex)
def cpu_count():
'''
Returns the number of CPUs in the system
'''
if sys.platform == 'win32' or (sys.platform == 'cli' and os.name == 'nt'):
try:
num = int(os.environ['NUMBER_OF_PROCESSORS'])
except (ValueError, KeyError):
num = 0
elif 'bsd' in sys.platform or sys.platform == 'darwin':
comm = '/sbin/sysctl -n hw.ncpu'
if sys.platform == 'darwin':
comm = '/usr' + comm
try:
with os.popen(comm) as p:
num = int(p.read())
except ValueError:
num = 0
else:
try:
num = os.sysconf('SC_NPROCESSORS_ONLN')
except (ValueError, OSError, AttributeError):
num = 0
if num >= 1:
return num
else:
raise NotImplementedError('cannot determine number of cpus')
def freeze_support():
'''
    Check whether this is a fake forked process in a frozen executable.
    If so, run the code specified by the command line and exit.
'''
if sys.platform == 'win32' and getattr(sys, 'frozen', False):
from multiprocessing.forking import freeze_support
freeze_support()
def get_logger():
'''
Return package logger -- if it does not already exist then it is created
'''
from multiprocessing.util import get_logger
return get_logger()
def log_to_stderr(level=None):
'''
Turn on logging and add a handler which prints to stderr
'''
from multiprocessing.util import log_to_stderr
return log_to_stderr(level)
def allow_connection_pickling():
'''
Install support for sending connections and sockets between processes
'''
from multiprocessing import reduction
#
# Definitions depending on native semaphores
#
def Lock():
'''
Returns a non-recursive lock object
'''
from multiprocessing.synchronize import Lock
return Lock()
def RLock():
'''
Returns a recursive lock object
'''
from multiprocessing.synchronize import RLock
return RLock()
def Condition(lock=None):
'''
Returns a condition object
'''
from multiprocessing.synchronize import Condition
return Condition(lock)
def Semaphore(value=1):
'''
Returns a semaphore object
'''
from multiprocessing.synchronize import Semaphore
return Semaphore(value)
def BoundedSemaphore(value=1):
'''
Returns a bounded semaphore object
'''
from multiprocessing.synchronize import BoundedSemaphore
return BoundedSemaphore(value)
def Event():
'''
Returns an event object
'''
from multiprocessing.synchronize import Event
return Event()
def Queue(maxsize=0):
'''
Returns a queue object
'''
from multiprocessing.queues import Queue
return Queue(maxsize)
def JoinableQueue(maxsize=0):
'''
Returns a queue object
'''
from multiprocessing.queues import JoinableQueue
return JoinableQueue(maxsize)
def Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None):
'''
Returns a process pool object
'''
from multiprocessing.pool import Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
def RawValue(typecode_or_type, *args):
'''
Returns a shared object
'''
from multiprocessing.sharedctypes import RawValue
return RawValue(typecode_or_type, *args)
def RawArray(typecode_or_type, size_or_initializer):
'''
Returns a shared array
'''
from multiprocessing.sharedctypes import RawArray
return RawArray(typecode_or_type, size_or_initializer)
def Value(typecode_or_type, *args, **kwds):
'''
Returns a synchronized shared object
'''
from multiprocessing.sharedctypes import Value
return Value(typecode_or_type, *args, **kwds)
def Array(typecode_or_type, size_or_initializer, **kwds):
'''
Returns a synchronized shared array
'''
from multiprocessing.sharedctypes import Array
return Array(typecode_or_type, size_or_initializer, **kwds)
#
#
#
if sys.platform == 'win32':
def set_executable(executable):
'''
Sets the path to a python.exe or pythonw.exe binary used to run
child processes on Windows instead of sys.executable.
Useful for people embedding Python.
'''
from multiprocessing.forking import set_executable
set_executable(executable)
__all__ += ['set_executable']
"""Mutual exclusion -- for use with module sched
A mutex has two pieces of state -- a 'locked' bit and a queue.
When the mutex is not locked, the queue is empty.
Otherwise, the queue contains 0 or more (function, argument) pairs
representing functions (or methods) waiting to acquire the lock.
When the mutex is unlocked while the queue is not empty,
the first queue entry is removed and its function(argument) pair called,
implying it now has the lock.
Of course, no multi-threading is implied -- hence the funny interface
for lock, where a function is called once the lock is acquired.
"""
from warnings import warnpy3k
warnpy3k("the mutex module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
from collections import deque
class mutex:
def __init__(self):
"""Create a new mutex -- initially unlocked."""
self.locked = False
self.queue = deque()
def test(self):
"""Test the locked bit of the mutex."""
return self.locked
def testandset(self):
"""Atomic test-and-set -- grab the lock if it is not set,
return True if it succeeded."""
if not self.locked:
self.locked = True
return True
else:
return False
def lock(self, function, argument):
"""Lock a mutex, call the function with supplied argument
when it is acquired. If the mutex is already locked, place
function and argument in the queue."""
if self.testandset():
function(argument)
else:
self.queue.append((function, argument))
def unlock(self):
"""Unlock a mutex. If the queue is not empty, call the next
function with its argument."""
if self.queue:
function, argument = self.queue.popleft()
function(argument)
else:
self.locked = False
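# Hedged usage sketch (not part of the original module): callers queue
# functions that run as the lock becomes free.
def _mutex_demo():
    m = mutex()
    def job(tag):
        print 'running', tag
    m.lock(job, 'first')         # lock was free: runs job('first') now
    m.lock(job, 'second')        # lock is held, so this call is queued
    m.unlock()                   # dequeues and runs job('second')
    m.unlock()                   # queue empty: clears the locked bit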
"""An object-oriented interface to .netrc files."""
# Module and documentation by Eric S. Raymond, 21 Dec 1998
import os, stat, shlex
if os.name == 'posix':
import pwd
__all__ = ["netrc", "NetrcParseError"]
class NetrcParseError(Exception):
"""Exception raised on syntax errors in the .netrc file."""
def __init__(self, msg, filename=None, lineno=None):
self.filename = filename
self.lineno = lineno
self.msg = msg
Exception.__init__(self, msg)
def __str__(self):
return "%s (%s, line %s)" % (self.msg, self.filename, self.lineno)
class netrc:
def __init__(self, file=None):
default_netrc = file is None
if file is None:
try:
file = os.path.join(os.environ['HOME'], ".netrc")
except KeyError:
raise IOError("Could not find .netrc: $HOME is not set")
self.hosts = {}
self.macros = {}
with open(file) as fp:
self._parse(file, fp, default_netrc)
def _parse(self, file, fp, default_netrc):
lexer = shlex.shlex(fp)
lexer.wordchars += r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
lexer.commenters = lexer.commenters.replace('#', '')
while 1:
# Look for a machine, default, or macdef top-level keyword
toplevel = tt = lexer.get_token()
if not tt:
break
elif tt[0] == '#':
# seek to beginning of comment, in case reading the token put
# us on a new line, and then skip the rest of the line.
pos = len(tt) + 1
lexer.instream.seek(-pos, 1)
lexer.instream.readline()
continue
elif tt == 'machine':
entryname = lexer.get_token()
elif tt == 'default':
entryname = 'default'
elif tt == 'macdef': # Just skip to end of macdefs
entryname = lexer.get_token()
self.macros[entryname] = []
lexer.whitespace = ' \t'
while 1:
line = lexer.instream.readline()
if not line or line == '\012':
lexer.whitespace = ' \t\r\n'
break
self.macros[entryname].append(line)
continue
else:
raise NetrcParseError(
"bad toplevel token %r" % tt, file, lexer.lineno)
# We're looking at start of an entry for a named machine or default.
login = ''
account = password = None
self.hosts[entryname] = {}
while 1:
tt = lexer.get_token()
if (tt.startswith('#') or
tt in {'', 'machine', 'default', 'macdef'}):
if password:
self.hosts[entryname] = (login, account, password)
lexer.push_token(tt)
break
else:
raise NetrcParseError(
"malformed %s entry %s terminated by %s"
% (toplevel, entryname, repr(tt)),
file, lexer.lineno)
elif tt == 'login' or tt == 'user':
login = lexer.get_token()
elif tt == 'account':
account = lexer.get_token()
elif tt == 'password':
if os.name == 'posix' and default_netrc:
prop = os.fstat(fp.fileno())
if prop.st_uid != os.getuid():
try:
fowner = pwd.getpwuid(prop.st_uid)[0]
except KeyError:
fowner = 'uid %s' % prop.st_uid
try:
user = pwd.getpwuid(os.getuid())[0]
except KeyError:
user = 'uid %s' % os.getuid()
raise NetrcParseError(
("~/.netrc file owner (%s) does not match"
" current user (%s)") % (fowner, user),
file, lexer.lineno)
if (prop.st_mode & (stat.S_IRWXG | stat.S_IRWXO)):
raise NetrcParseError(
"~/.netrc access too permissive: access"
" permissions must restrict access to only"
" the owner", file, lexer.lineno)
password = lexer.get_token()
else:
raise NetrcParseError("bad follower token %r" % tt,
file, lexer.lineno)
def authenticators(self, host):
"""Return a (user, account, password) tuple for given host."""
if host in self.hosts:
return self.hosts[host]
elif 'default' in self.hosts:
return self.hosts['default']
else:
return None
def __repr__(self):
"""Dump the class data in the format of a .netrc file."""
rep = ""
for host in self.hosts.keys():
attrs = self.hosts[host]
rep += "machine {host}\n\tlogin {attrs[0]}\n".format(host=host, attrs=attrs)
if attrs[1]:
rep += "\taccount {attrs[1]}\n".format(attrs=attrs)
rep += "\tpassword {attrs[2]}\n".format(attrs=attrs)
for macro in self.macros.keys():
rep += "macdef {macro}\n".format(macro=macro)
for line in self.macros[macro]:
rep += line
rep += "\n"
return rep
if __name__ == '__main__':
print netrc()
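# Hedged usage sketch (hypothetical host name and demo function): look up
# stored credentials, falling back to the 'default' entry if present.
def _netrc_demo():
    auth = netrc().authenticators('news.example.com')
    if auth is not None:
        login, account, password = auth
        print 'authenticating as', login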
"""Create new objects of various types. Deprecated.
This module is no longer required except for backward compatibility.
Objects of most types can now be created by calling the type object.
"""
from warnings import warnpy3k
warnpy3k("The 'new' module has been removed in Python 3.0; use the 'types' "
"module instead.", stacklevel=2)
del warnpy3k
from types import ClassType as classobj
from types import FunctionType as function
from types import InstanceType as instance
from types import MethodType as instancemethod
from types import ModuleType as module
from types import CodeType as code
"""An NNTP client class based on RFC 977: Network News Transfer Protocol.
Example:
>>> from nntplib import NNTP
>>> s = NNTP('news')
>>> resp, count, first, last, name = s.group('comp.lang.python')
>>> print 'Group', name, 'has', count, 'articles, range', first, 'to', last
Group comp.lang.python has 51 articles, range 5770 to 5821
>>> resp, subs = s.xhdr('subject', first + '-' + last)
>>> resp = s.quit()
>>>
Here 'resp' is the server response line.
Error responses are turned into exceptions.
To post an article from a file:
>>> f = open(filename, 'r') # file containing article, including header
>>> resp = s.post(f)
>>>
For descriptions of all methods, read the comments in the code below.
Note that all arguments and return values representing article numbers
are strings, not numbers, since they are rarely used for calculations.
"""
# RFC 977 by Brian Kantor and Phil Lapsley.
# xover, xgtitle, xpath, date methods by Kevan Heydon
# Imports
import re
import socket
__all__ = ["NNTP","NNTPReplyError","NNTPTemporaryError",
"NNTPPermanentError","NNTPProtocolError","NNTPDataError",
"error_reply","error_temp","error_perm","error_proto",
"error_data",]
# Maximum line length when calling readline(). This prevents reading
# arbitrary-length lines. RFC 3977 limits NNTP line length to
# 512 characters, including CRLF. We have selected 2048 just to be on
# the safe side.
_MAXLINE = 2048
# Exceptions raised when an error or invalid response is received
class NNTPError(Exception):
"""Base class for all nntplib exceptions"""
def __init__(self, *args):
Exception.__init__(self, *args)
try:
self.response = args[0]
except IndexError:
self.response = 'No response given'
class NNTPReplyError(NNTPError):
"""Unexpected [123]xx reply"""
pass
class NNTPTemporaryError(NNTPError):
"""4xx errors"""
pass
class NNTPPermanentError(NNTPError):
"""5xx errors"""
pass
class NNTPProtocolError(NNTPError):
"""Response does not begin with [1-5]"""
pass
class NNTPDataError(NNTPError):
"""Error in response data"""
pass
# for backwards compatibility
error_reply = NNTPReplyError
error_temp = NNTPTemporaryError
error_perm = NNTPPermanentError
error_proto = NNTPProtocolError
error_data = NNTPDataError
# Standard port used by NNTP servers
NNTP_PORT = 119
# Response numbers that are followed by additional text (e.g. article)
LONGRESP = ['100', '215', '220', '221', '222', '224', '230', '231', '282']
# Line terminators (we always output CRLF, but accept any of CRLF, CR, LF)
CRLF = '\r\n'
# The class itself
class NNTP:
def __init__(self, host, port=NNTP_PORT, user=None, password=None,
readermode=None, usenetrc=True):
"""Initialize an instance. Arguments:
- host: hostname to connect to
- port: port to connect to (default the standard NNTP port)
- user: username to authenticate with
- password: password to use with username
- readermode: if true, send 'mode reader' command after
connecting.
readermode is sometimes necessary if you are connecting to an
NNTP server on the local machine and intend to call
reader-specific commands, such as `group'. If you get
unexpected NNTPPermanentErrors, you might need to set
readermode.
"""
self.host = host
self.port = port
self.sock = socket.create_connection((host, port))
self.file = self.sock.makefile('rb')
self.debugging = 0
self.welcome = self.getresp()
# 'mode reader' is sometimes necessary to enable 'reader' mode.
# However, the order in which 'mode reader' and 'authinfo' need to
# arrive differs between some NNTP servers. Try to send
        # 'mode reader', and if it fails with an 'authorization failed'
# error, try again after sending authinfo.
readermode_afterauth = 0
if readermode:
try:
self.welcome = self.shortcmd('mode reader')
except NNTPPermanentError:
# error 500, probably 'not implemented'
pass
except NNTPTemporaryError, e:
if user and e.response[:3] == '480':
# Need authorization before 'mode reader'
readermode_afterauth = 1
else:
raise
# If no login/password was specified, try to get them from ~/.netrc
        # Presume that if .netrc has an entry, NNRP authentication is required.
try:
if usenetrc and not user:
import netrc
credentials = netrc.netrc()
auth = credentials.authenticators(host)
if auth:
user = auth[0]
password = auth[2]
except IOError:
pass
# Perform NNRP authentication if needed.
if user:
resp = self.shortcmd('authinfo user '+user)
if resp[:3] == '381':
if not password:
raise NNTPReplyError(resp)
else:
resp = self.shortcmd(
'authinfo pass '+password)
if resp[:3] != '281':
raise NNTPPermanentError(resp)
if readermode_afterauth:
try:
self.welcome = self.shortcmd('mode reader')
except NNTPPermanentError:
# error 500, probably 'not implemented'
pass
def getwelcome(self):
"""Get the welcome message from the server
(this is read and squirreled away by __init__()).
If the response code is 200, posting is allowed;
        if it is 201, posting is not allowed."""
if self.debugging: print '*welcome*', repr(self.welcome)
return self.welcome
def set_debuglevel(self, level):
"""Set the debugging level. Argument 'level' means:
0: no debugging output (default)
1: print commands and responses but not body text etc.
2: also print raw lines read and sent before stripping CR/LF"""
self.debugging = level
debug = set_debuglevel
def putline(self, line):
"""Internal: send one line to the server, appending CRLF."""
line = line + CRLF
if self.debugging > 1: print '*put*', repr(line)
self.sock.sendall(line)
def putcmd(self, line):
"""Internal: send one command to the server (through putline())."""
if self.debugging: print '*cmd*', repr(line)
self.putline(line)
def getline(self):
"""Internal: return one line from the server, stripping CRLF.
Raise EOFError if the connection is closed."""
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise NNTPDataError('line too long')
if self.debugging > 1:
print '*get*', repr(line)
if not line: raise EOFError
if line[-2:] == CRLF: line = line[:-2]
elif line[-1:] in CRLF: line = line[:-1]
return line
def getresp(self):
"""Internal: get a response from the server.
Raise various errors if the response indicates an error."""
resp = self.getline()
if self.debugging: print '*resp*', repr(resp)
c = resp[:1]
if c == '4':
raise NNTPTemporaryError(resp)
if c == '5':
raise NNTPPermanentError(resp)
if c not in '123':
raise NNTPProtocolError(resp)
return resp
def getlongresp(self, file=None):
"""Internal: get a response plus following text from the server.
Raise various errors if the response indicates an error."""
openedFile = None
try:
# If a string was passed then open a file with that name
if isinstance(file, str):
openedFile = file = open(file, "w")
resp = self.getresp()
if resp[:3] not in LONGRESP:
raise NNTPReplyError(resp)
list = []
while 1:
line = self.getline()
if line == '.':
break
if line[:2] == '..':
line = line[1:]
if file:
file.write(line + "\n")
else:
list.append(line)
finally:
# If this method created the file, then it must close it
if openedFile:
openedFile.close()
return resp, list
def shortcmd(self, line):
"""Internal: send a command and get the response."""
self.putcmd(line)
return self.getresp()
def longcmd(self, line, file=None):
"""Internal: send a command and get the response plus following text."""
self.putcmd(line)
return self.getlongresp(file)
def newgroups(self, date, time, file=None):
"""Process a NEWGROUPS command. Arguments:
- date: string 'yymmdd' indicating the date
- time: string 'hhmmss' indicating the time
Return:
- resp: server response if successful
- list: list of newsgroup names"""
return self.longcmd('NEWGROUPS ' + date + ' ' + time, file)
def newnews(self, group, date, time, file=None):
"""Process a NEWNEWS command. Arguments:
- group: group name or '*'
- date: string 'yymmdd' indicating the date
- time: string 'hhmmss' indicating the time
Return:
- resp: server response if successful
- list: list of message ids"""
cmd = 'NEWNEWS ' + group + ' ' + date + ' ' + time
return self.longcmd(cmd, file)
def list(self, file=None):
"""Process a LIST command. Return:
- resp: server response if successful
- list: list of (group, last, first, flag) (strings)"""
resp, list = self.longcmd('LIST', file)
for i in range(len(list)):
# Parse lines into "group last first flag"
list[i] = tuple(list[i].split())
return resp, list
def description(self, group):
"""Get a description for a single group. If more than one
group matches ('group' is a pattern), return the first. If no
group matches, return an empty string.
This elides the response code from the server, since it can
only be '215' or '285' (for xgtitle) anyway. If the response
code is needed, use the 'descriptions' method.
NOTE: This neither checks for a wildcard in 'group' nor does
it check whether the group actually exists."""
resp, lines = self.descriptions(group)
if len(lines) == 0:
return ""
else:
return lines[0][1]
def descriptions(self, group_pattern):
"""Get descriptions for a range of groups."""
line_pat = re.compile("^(?P<group>[^ \t]+)[ \t]+(.*)$")
# Try the more standard (per RFC 2980) LIST NEWSGROUPS first.
resp, raw_lines = self.longcmd('LIST NEWSGROUPS ' + group_pattern)
if resp[:3] != "215":
# Now the deprecated XGTITLE. This either raises an error
# or succeeds with the same output structure as LIST
# NEWSGROUPS.
resp, raw_lines = self.longcmd('XGTITLE ' + group_pattern)
lines = []
for raw_line in raw_lines:
match = line_pat.search(raw_line.strip())
if match:
lines.append(match.group(1, 2))
return resp, lines
def group(self, name):
"""Process a GROUP command. Argument:
- group: the group name
Returns:
- resp: server response if successful
- count: number of articles (string)
- first: first article number (string)
- last: last article number (string)
- name: the group name"""
resp = self.shortcmd('GROUP ' + name)
if resp[:3] != '211':
raise NNTPReplyError(resp)
words = resp.split()
count = first = last = 0
n = len(words)
if n > 1:
count = words[1]
if n > 2:
first = words[2]
if n > 3:
last = words[3]
if n > 4:
name = words[4].lower()
return resp, count, first, last, name
def help(self, file=None):
"""Process a HELP command. Returns:
- resp: server response if successful
- list: list of strings"""
return self.longcmd('HELP',file)
def statparse(self, resp):
"""Internal: parse the response of a STAT, NEXT or LAST command."""
if resp[:2] != '22':
raise NNTPReplyError(resp)
words = resp.split()
nr = 0
id = ''
n = len(words)
if n > 1:
nr = words[1]
if n > 2:
id = words[2]
return resp, nr, id
def statcmd(self, line):
"""Internal: process a STAT, NEXT or LAST command."""
resp = self.shortcmd(line)
return self.statparse(resp)
def stat(self, id):
"""Process a STAT command. Argument:
- id: article number or message id
Returns:
- resp: server response if successful
- nr: the article number
- id: the message id"""
return self.statcmd('STAT ' + id)
def next(self):
"""Process a NEXT command. No arguments. Return as for STAT."""
return self.statcmd('NEXT')
def last(self):
"""Process a LAST command. No arguments. Return as for STAT."""
return self.statcmd('LAST')
def artcmd(self, line, file=None):
"""Internal: process a HEAD, BODY or ARTICLE command."""
resp, list = self.longcmd(line, file)
resp, nr, id = self.statparse(resp)
return resp, nr, id, list
def head(self, id):
"""Process a HEAD command. Argument:
- id: article number or message id
Returns:
- resp: server response if successful
- nr: article number
- id: message id
- list: the lines of the article's header"""
return self.artcmd('HEAD ' + id)
def body(self, id, file=None):
"""Process a BODY command. Argument:
- id: article number or message id
- file: Filename string or file object to store the article in
Returns:
- resp: server response if successful
- nr: article number
- id: message id
- list: the lines of the article's body or an empty list
if file was used"""
return self.artcmd('BODY ' + id, file)
def article(self, id):
"""Process an ARTICLE command. Argument:
- id: article number or message id
Returns:
- resp: server response if successful
- nr: article number
- id: message id
- list: the lines of the article"""
return self.artcmd('ARTICLE ' + id)
def slave(self):
"""Process a SLAVE command. Returns:
- resp: server response if successful"""
return self.shortcmd('SLAVE')
def xhdr(self, hdr, str, file=None):
"""Process an XHDR command (optional server extension). Arguments:
- hdr: the header type (e.g. 'subject')
- str: an article nr, a message id, or a range nr1-nr2
Returns:
- resp: server response if successful
- list: list of (nr, value) strings"""
pat = re.compile('^([0-9]+) ?(.*)\n?')
resp, lines = self.longcmd('XHDR ' + hdr + ' ' + str, file)
for i in range(len(lines)):
line = lines[i]
m = pat.match(line)
if m:
lines[i] = m.group(1, 2)
return resp, lines
def xover(self, start, end, file=None):
"""Process an XOVER command (optional server extension) Arguments:
- start: start of range
- end: end of range
Returns:
- resp: server response if successful
- list: list of (art-nr, subject, poster, date,
id, references, size, lines)"""
resp, lines = self.longcmd('XOVER ' + start + '-' + end, file)
xover_lines = []
for line in lines:
elem = line.split("\t")
try:
xover_lines.append((elem[0],
elem[1],
elem[2],
elem[3],
elem[4],
elem[5].split(),
elem[6],
elem[7]))
except IndexError:
raise NNTPDataError(line)
return resp,xover_lines
def xgtitle(self, group, file=None):
"""Process an XGTITLE command (optional server extension) Arguments:
- group: group name wildcard (i.e. news.*)
Returns:
- resp: server response if successful
- list: list of (name,title) strings"""
line_pat = re.compile("^([^ \t]+)[ \t]+(.*)$")
resp, raw_lines = self.longcmd('XGTITLE ' + group, file)
lines = []
for raw_line in raw_lines:
match = line_pat.search(raw_line.strip())
if match:
lines.append(match.group(1, 2))
return resp, lines
def xpath(self,id):
"""Process an XPATH command (optional server extension) Arguments:
- id: Message id of article
Returns:
resp: server response if successful
path: directory path to article"""
resp = self.shortcmd("XPATH " + id)
if resp[:3] != '223':
raise NNTPReplyError(resp)
try:
[resp_num, path] = resp.split()
except ValueError:
raise NNTPReplyError(resp)
else:
return resp, path
def date (self):
"""Process the DATE command. Arguments:
None
Returns:
resp: server response if successful
date: Date suitable for newnews/newgroups commands etc.
time: Time suitable for newnews/newgroups commands etc."""
resp = self.shortcmd("DATE")
if resp[:3] != '111':
raise NNTPReplyError(resp)
elem = resp.split()
if len(elem) != 2:
raise NNTPDataError(resp)
date = elem[1][2:8]
time = elem[1][-6:]
if len(date) != 6 or len(time) != 6:
raise NNTPDataError(resp)
return resp, date, time
def post(self, f):
"""Process a POST command. Arguments:
- f: file containing the article
Returns:
- resp: server response if successful"""
resp = self.shortcmd('POST')
# Raises error_??? if posting is not allowed
if resp[0] != '3':
raise NNTPReplyError(resp)
while 1:
line = f.readline()
if not line:
break
if line[-1] == '\n':
line = line[:-1]
if line[:1] == '.':
line = '.' + line
self.putline(line)
self.putline('.')
return self.getresp()
def ihave(self, id, f):
"""Process an IHAVE command. Arguments:
- id: message-id of the article
- f: file containing the article
Returns:
- resp: server response if successful
Note that if the server refuses the article an exception is raised."""
resp = self.shortcmd('IHAVE ' + id)
# Raises error_??? if the server already has it
if resp[0] != '3':
raise NNTPReplyError(resp)
while 1:
line = f.readline()
if not line:
break
if line[-1] == '\n':
line = line[:-1]
if line[:1] == '.':
line = '.' + line
self.putline(line)
self.putline('.')
return self.getresp()
def quit(self):
"""Process a QUIT command and close the socket. Returns:
- resp: server response if successful"""
resp = self.shortcmd('QUIT')
self.file.close()
self.sock.close()
del self.file, self.sock
return resp
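# A minimal usage sketch of the class above (illustrative only: the server
# name is hypothetical, the function is never called here, and a similar
# live test against $NNTPSERVER follows below).
def _example_subjects(server='news.example.com', group='comp.lang.python'):
    s = NNTP(server, readermode=True)       # connect and enable reader mode
    resp, count, first, last, name = s.group(group)
    resp, subjects = s.xhdr('subject', first + '-' + last)
    s.quit()
    return subjects                         # list of (article-nr, subject)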
# Test retrieval when run as a script.
# Assumption: if there's a local news server, it's called 'news'.
# Assumption: if user queries a remote news server, it's named
# in the environment variable NNTPSERVER (used by slrn and kin)
# and we want readermode off.
if __name__ == '__main__':
import os
newshost = os.environ.get("NNTPSERVER", "news")
if newshost.find('.') == -1:
mode = 'readermode'
else:
mode = None
s = NNTP(newshost, readermode=mode)
resp, count, first, last, name = s.group('comp.lang.python')
print resp
print 'Group', name, 'has', count, 'articles, range', first, 'to', last
resp, subs = s.xhdr('subject', first + '-' + last)
print resp
for item in subs:
print "%7s %s" % item
resp = s.quit()
print resp
# Module 'ntpath' -- common operations on WinNT/Win95 pathnames
"""Common pathname manipulations, WindowsNT/95 version.
Instead of importing this module directly, import os and refer to this
module as os.path.
"""
import os
import sys
import stat
import genericpath
import warnings
from genericpath import *
from genericpath import _unicode
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime", "islink","exists","lexists","isdir","isfile",
"ismount","walk","expanduser","expandvars","normpath","abspath",
"splitunc","curdir","pardir","sep","pathsep","defpath","altsep",
"extsep","devnull","realpath","supports_unicode_filenames","relpath"]
# strings representing various path-related bits and pieces
curdir = '.'
pardir = '..'
extsep = '.'
sep = '\\'
pathsep = ';'
altsep = '/'
defpath = '.;C:\\bin'
if 'ce' in sys.builtin_module_names:
defpath = '\\Windows'
elif 'os2' in sys.builtin_module_names:
# OS/2 w/ VACPP
altsep = '/'
devnull = 'nul'
# Normalize the case of a pathname and map slashes to backslashes.
# Other normalizations (such as optimizing '../' away) are not done
# (this is done by normpath).
def normcase(s):
"""Normalize case of pathname.
Makes all characters lowercase and all slashes into backslashes."""
return s.replace("/", "\\").lower()
# Return whether a path is absolute.
# Trivial in Posix, harder on the Mac or MS-DOS.
# For DOS it is absolute if it starts with a slash or backslash (current
# volume), or if a pathname after the volume letter and colon / UNC resource
# starts with a slash or backslash.
def isabs(s):
"""Test whether a path is absolute"""
s = splitdrive(s)[1]
return s != '' and s[:1] in '/\\'
# Join two (or more) paths.
def join(path, *paths):
"""Join two or more pathname components, inserting "\\" as needed."""
result_drive, result_path = splitdrive(path)
for p in paths:
p_drive, p_path = splitdrive(p)
if p_path and p_path[0] in '\\/':
# Second path is absolute
if p_drive or not result_drive:
result_drive = p_drive
result_path = p_path
continue
elif p_drive and p_drive != result_drive:
if p_drive.lower() != result_drive.lower():
# Different drives => ignore the first path entirely
result_drive = p_drive
result_path = p_path
continue
# Same drive in different case
result_drive = p_drive
# Second path is relative to the first
if result_path and result_path[-1] not in '\\/':
result_path = result_path + '\\'
result_path = result_path + p_path
## add separator between UNC and non-absolute path
if (result_path and result_path[0] not in '\\/' and
result_drive and result_drive[-1:] != ':'):
return result_drive + sep + result_path
return result_drive + result_path
# Split a path in a drive specification (a drive letter followed by a
# colon) and the path specification.
# It is always true that drivespec + pathspec == p
def splitdrive(p):
"""Split a pathname into drive/UNC sharepoint and relative path specifiers.
Returns a 2-tuple (drive_or_unc, path); either part may be empty.
If you assign
result = splitdrive(p)
It is always true that:
result[0] + result[1] == p
If the path contained a drive letter, drive_or_unc will contain everything
up to and including the colon. e.g. splitdrive("c:/dir") returns ("c:", "/dir")
If the path contained a UNC path, the drive_or_unc will contain the host name
and share up to but not including the fourth directory separator character.
e.g. splitdrive("//host/computer/dir") returns ("//host/computer", "/dir")
Paths cannot contain both a drive letter and a UNC path.
"""
if len(p) > 1:
normp = p.replace(altsep, sep)
if (normp[0:2] == sep*2) and (normp[2:3] != sep):
# is a UNC path:
# vvvvvvvvvvvvvvvvvvvv drive letter or UNC path
# \\machine\mountpoint\directory\etc\...
# directory ^^^^^^^^^^^^^^^
index = normp.find(sep, 2)
if index == -1:
return '', p
index2 = normp.find(sep, index + 1)
# a UNC path can't have two slashes in a row
# (after the initial two)
if index2 == index + 1:
return '', p
if index2 == -1:
index2 = len(p)
return p[:index2], p[index2:]
if normp[1] == ':':
return p[:2], p[2:]
return '', p
# Parse UNC paths
def splitunc(p):
"""Split a pathname into UNC mount point and relative path specifiers.
Return a 2-tuple (unc, rest); either part may be empty.
If unc is not empty, it has the form '//host/mount' (or similar
using backslashes). unc+rest is always the input path.
Paths containing drive letters never have a UNC part.
"""
if p[1:2] == ':':
return '', p # Drive letter present
firstTwo = p[0:2]
if firstTwo == '//' or firstTwo == '\\\\':
# is a UNC path:
# vvvvvvvvvvvvvvvvvvvv equivalent to drive letter
# \\machine\mountpoint\directories...
# directory ^^^^^^^^^^^^^^^
normp = p.replace('\\', '/')
index = normp.find('/', 2)
if index <= 2:
return '', p
index2 = normp.find('/', index + 1)
# a UNC path can't have two slashes in a row
# (after the initial two)
if index2 == index + 1:
return '', p
if index2 == -1:
index2 = len(p)
return p[:index2], p[index2:]
return '', p
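# Illustrative assertions for splitdrive() and splitunc() (hypothetical
# helper, not part of ntpath; the paths are arbitrary examples):
def _split_examples():
    assert splitdrive("c:/dir") == ("c:", "/dir")
    assert splitdrive("//host/share/dir") == ("//host/share", "/dir")
    assert splitdrive("dir/file") == ("", "dir/file")
    assert splitunc("\\\\host\\share\\dir") == ("\\\\host\\share", "\\dir")
    assert splitunc("c:\\dir") == ("", "c:\\dir")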
# Split a path in head (everything up to the last '/') and tail (the
# rest). After the trailing '/' is stripped, the invariant
# join(head, tail) == p holds.
# The resulting head won't end in '/' unless it is the root.
def split(p):
"""Split a pathname.
Return tuple (head, tail) where tail is everything after the final slash.
Either part may be empty."""
d, p = splitdrive(p)
# set i to index beyond p's last slash
i = len(p)
while i and p[i-1] not in '/\\':
i = i - 1
head, tail = p[:i], p[i:] # now tail has no slashes
# remove trailing slashes from head, unless it's all slashes
head2 = head
while head2 and head2[-1] in '/\\':
head2 = head2[:-1]
head = head2 or head
return d + head, tail
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
def splitext(p):
return genericpath._splitext(p, sep, altsep, extsep)
splitext.__doc__ = genericpath._splitext.__doc__
# Return the tail (basename) part of a path.
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
# Return the head (dirname) part of a path.
def dirname(p):
"""Returns the directory component of a pathname"""
return split(p)[0]
# Is a path a symbolic link?
# This will always return false on systems where posix.lstat doesn't exist.
def islink(path):
"""Test for symbolic link.
On WindowsNT/95 and OS/2 always returns false
"""
return False
# alias exists to lexists
lexists = exists
# Is a path a mount point? Either a root (with or without drive letter)
# or a UNC path with at most a / or \ after the mount point.
def ismount(path):
"""Test whether a path is a mount point (defined as root of drive)"""
unc, rest = splitunc(path)
if unc:
return rest in ("", "/", "\\")
p = splitdrive(path)[1]
return len(p) == 1 and p[0] in '/\\'
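# Quick illustration of ismount() (hypothetical helper, arbitrary paths):
def _ismount_examples():
    assert ismount("C:\\")              # root of a drive
    assert ismount("\\\\host\\share")   # bare UNC mount point
    assert not ismount("C:\\dir")       # ordinary directory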
# Directory tree walk.
# For each directory under top (including top itself, but excluding
# '.' and '..'), func(arg, dirname, filenames) is called, where
# dirname is the name of the directory and filenames is the list
# of files (and subdirectories etc.) in the directory.
# The func may modify the filenames list, to implement a filter,
# or to impose a different order of visiting.
def walk(top, func, arg):
"""Directory tree walk with callback function.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), call func(arg, dirname, fnames).
dirname is the name of the directory, and fnames a list of the names of
the files and subdirectories in dirname (excluding '.' and '..'). func
may modify the fnames list in-place (e.g. via del or slice assignment),
and walk will only recurse into the subdirectories whose names remain in
fnames; this can be used to implement a filter, or to impose a specific
order of visiting. No semantics are defined for, or required of, arg,
beyond that arg is always passed to func. It can be used, e.g., to pass
a filename pattern, or a mutable object designed to accumulate
statistics. Passing None for arg is common."""
warnings.warnpy3k("In 3.x, os.path.walk is removed in favor of os.walk.",
stacklevel=2)
try:
names = os.listdir(top)
except os.error:
return
func(arg, top, names)
for name in names:
name = join(top, name)
if isdir(name):
walk(name, func, arg)
# Expand paths beginning with '~' or '~user'.
# '~' means $HOME; '~user' means that user's home directory.
# If the path doesn't begin with '~', or if the user or $HOME is unknown,
# the path is returned unchanged (leaving error reporting to whatever
# function is called with the expanded path as argument).
# See also module 'glob' for expansion of *, ? and [...] in pathnames.
# (A function should also be defined to do full *sh-style environment
# variable expansion.)
def expanduser(path):
"""Expand ~ and ~user constructs.
If user or $HOME is unknown, do nothing."""
if path[:1] != '~':
return path
i, n = 1, len(path)
while i < n and path[i] not in '/\\':
i = i + 1
if 'HOME' in os.environ:
userhome = os.environ['HOME']
elif 'USERPROFILE' in os.environ:
userhome = os.environ['USERPROFILE']
elif not 'HOMEPATH' in os.environ:
return path
else:
try:
drive = os.environ['HOMEDRIVE']
except KeyError:
drive = ''
userhome = join(drive, os.environ['HOMEPATH'])
if i != 1: #~user
userhome = join(dirname(userhome), path[1:i])
return userhome + path[i:]
# Expand paths containing shell variable substitutions.
# The following rules apply:
# - no expansion within single quotes
# - '$$' is translated into '$'
# - '%%' is translated into '%' (except when the two '%' characters
#   belong to adjacent references, as in %var1%%var2%)
# - ${varname} is accepted.
# - $varname is accepted.
# - %varname% is accepted.
# - varnames can be made out of letters, digits and the characters '_-'
# (though this is not verified in the ${varname} and %varname% cases)
# XXX With COMMAND.COM you can use any characters in a variable name,
# XXX except '^|<>='.
def expandvars(path):
"""Expand shell variables of the forms $var, ${var} and %var%.
Unknown variables are left unchanged."""
if '$' not in path and '%' not in path:
return path
import string
varchars = string.ascii_letters + string.digits + '_-'
if sys.platform != 'cli' and isinstance(path, _unicode):
encoding = sys.getfilesystemencoding()
def getenv(var):
return os.environ[var.encode(encoding)].decode(encoding)
else:
def getenv(var):
return os.environ[var]
res = ''
index = 0
pathlen = len(path)
while index < pathlen:
c = path[index]
if c == '\'': # no expansion within single quotes
path = path[index + 1:]
pathlen = len(path)
try:
index = path.index('\'')
res = res + '\'' + path[:index + 1]
except ValueError:
res = res + c + path
index = pathlen - 1
elif c == '%': # variable or '%'
if path[index + 1:index + 2] == '%':
res = res + c
index = index + 1
else:
path = path[index+1:]
pathlen = len(path)
try:
index = path.index('%')
except ValueError:
res = res + '%' + path
index = pathlen - 1
else:
var = path[:index]
try:
res = res + getenv(var)
except KeyError:
res = res + '%' + var + '%'
elif c == '$': # variable or '$$'
if path[index + 1:index + 2] == '$':
res = res + c
index = index + 1
elif path[index + 1:index + 2] == '{':
path = path[index+2:]
pathlen = len(path)
try:
index = path.index('}')
var = path[:index]
try:
res = res + getenv(var)
except KeyError:
res = res + '${' + var + '}'
except ValueError:
res = res + '${' + path
index = pathlen - 1
else:
var = ''
index = index + 1
c = path[index:index + 1]
while c != '' and c in varchars:
var = var + c
index = index + 1
c = path[index:index + 1]
try:
res = res + getenv(var)
except KeyError:
res = res + '$' + var
if c != '':
index = index - 1
else:
res = res + c
index = index + 1
return res
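# Illustrative sketch of the rules above (hypothetical helper; MYVAR is an
# arbitrary variable set just for the demonstration, and MYVAR_UNSET is
# assumed not to exist in the environment):
def _expandvars_examples():
    os.environ['MYVAR'] = 'value'
    assert expandvars("$MYVAR/x") == "value/x"              # $var form
    assert expandvars("${MYVAR}/x") == "value/x"            # ${var} form
    assert expandvars("%MYVAR%\\x") == "value\\x"           # %var% form
    assert expandvars("$$") == "$"                          # escaped dollar
    assert expandvars("$MYVAR_UNSET") == "$MYVAR_UNSET"     # unknown vars kept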
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A\B.
# Previously, this function also truncated pathnames to 8+3 format,
# but as this module is called "ntpath", that's obviously wrong!
def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
# Preserve unicode (if path is unicode)
backslash, dot = (u'\\', u'.') if isinstance(path, _unicode) else ('\\', '.')
if path.startswith(('\\\\.\\', '\\\\?\\')):
# in the case of paths with these prefixes:
# \\.\ -> device names
# \\?\ -> literal paths
# do not do any normalization, but return the path unchanged
return path
path = path.replace("/", "\\")
prefix, path = splitdrive(path)
# We need to be careful here. If the prefix is empty, and the path starts
# with a backslash, it could either be an absolute path on the current
# drive (\dir1\dir2\file) or a UNC filename (\\server\mount\dir1\file). It
# is therefore imperative NOT to collapse multiple backslashes blindly in
# that case.
# The code below preserves multiple backslashes when there is no drive
# letter. This means that the invalid filename \\\a\b is preserved
unchanged, whereas a\\\b is normalised to a\b. It's not clear that there
# is any better behaviour for such edge cases.
if prefix == '':
# No drive letter - preserve initial backslashes
while path[:1] == "\\":
prefix = prefix + backslash
path = path[1:]
else:
# We have a drive letter - collapse initial backslashes
if path.startswith("\\"):
prefix = prefix + backslash
path = path.lstrip("\\")
comps = path.split("\\")
i = 0
while i < len(comps):
if comps[i] in ('.', ''):
del comps[i]
elif comps[i] == '..':
if i > 0 and comps[i-1] != '..':
del comps[i-1:i+1]
i -= 1
elif i == 0 and prefix.endswith("\\"):
del comps[i]
else:
i += 1
else:
i += 1
# If the path is now empty, substitute '.'
if not prefix and not comps:
comps.append(dot)
return prefix + backslash.join(comps)
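# Illustrative assertions for normpath() (hypothetical helper):
def _normpath_examples():
    assert normpath("A//B/./C/../D") == "A\\B\\D"
    assert normpath("C:/foo/../bar") == "C:\\bar"
    assert normpath("\\\\.\\device") == "\\\\.\\device"   # device paths untouched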
# Return an absolute path.
try:
from nt import _getfullpathname
except ImportError: # not running on Windows - mock up something sensible
def abspath(path):
"""Return the absolute version of a path."""
if not isabs(path):
if isinstance(path, _unicode):
cwd = os.getcwdu()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
else: # use native Windows method on Windows
def abspath(path):
"""Return the absolute version of a path."""
if path: # Empty path must return current working directory.
try:
path = _getfullpathname(path)
except WindowsError:
pass # Bad path - return unchanged.
elif isinstance(path, _unicode):
path = os.getcwdu()
else:
path = os.getcwd()
return normpath(path)
# realpath is a no-op on systems without islink support
realpath = abspath
# Win9x family and earlier have no Unicode filename support.
supports_unicode_filenames = (hasattr(sys, "getwindowsversion") and
sys.getwindowsversion()[3] >= 2)
def _abspath_split(path):
abs = abspath(normpath(path))
prefix, rest = splitunc(abs)
is_unc = bool(prefix)
if not is_unc:
prefix, rest = splitdrive(abs)
return is_unc, prefix, [x for x in rest.split(sep) if x]
def relpath(path, start=curdir):
"""Return a relative version of a path"""
if not path:
raise ValueError("no path specified")
start_is_unc, start_prefix, start_list = _abspath_split(start)
path_is_unc, path_prefix, path_list = _abspath_split(path)
if path_is_unc ^ start_is_unc:
raise ValueError("Cannot mix UNC and non-UNC paths (%s and %s)"
% (path, start))
if path_prefix.lower() != start_prefix.lower():
if path_is_unc:
raise ValueError("path is on UNC root %s, start on UNC root %s"
% (path_prefix, start_prefix))
else:
raise ValueError("path is on drive %s, start on drive %s"
% (path_prefix, start_prefix))
# Work out how much of the filepath is shared by start and path.
i = 0
for e1, e2 in zip(start_list, path_list):
if e1.lower() != e2.lower():
break
i += 1
rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
if not rel_list:
return curdir
return join(*rel_list)
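# Illustrative assertions for relpath() (hypothetical helper; drive-absolute
# inputs keep the results independent of the current directory):
def _relpath_examples():
    assert relpath("C:\\a\\b", "C:\\a") == "b"
    assert relpath("C:\\a", "C:\\a\\b") == ".."
    assert relpath("C:\\a", "C:\\a") == "."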
try:
# The genericpath.isdir implementation uses os.stat and checks the mode
# attribute to tell whether or not the path is a directory.
# This is overkill on Windows - just pass the path to GetFileAttributes
# and check the attribute from there.
from nt import _isdir as isdir
except ImportError:
# Use genericpath.isdir as imported above.
pass
"""Convert a NT pathname to a file URL and vice versa."""
def url2pathname(url):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
# e.g.
# ///C|/foo/bar/spam.foo
# and
# ///C:/foo/bar/spam.foo
# become
# C:\foo\bar\spam.foo
import string, urllib
# Windows itself uses ":" even in URLs.
url = url.replace(':', '|')
if not '|' in url:
# No drive specifier, just convert slashes
if url[:4] == '////':
# path is something like ////host/path/on/remote/host
# convert this to \\host\path\on\remote\host
# (notice halving of slashes at the start of the path)
url = url[2:]
components = url.split('/')
# make sure not to convert quoted slashes :-)
return urllib.unquote('\\'.join(components))
comp = url.split('|')
if len(comp) != 2 or comp[0][-1] not in string.ascii_letters:
error = 'Bad URL: ' + url
raise IOError, error
drive = comp[0][-1].upper()
path = drive + ':'
components = comp[1].split('/')
for comp in components:
if comp:
path = path + '\\' + urllib.unquote(comp)
# Issue #11474: url like '/C|/' should convert into 'C:\\'
if path.endswith(':') and url.endswith('/'):
path += '\\'
return path
def pathname2url(p):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
# e.g.
# C:\foo\bar\spam.foo
# becomes
# ///C:/foo/bar/spam.foo
import urllib
if not ':' in p:
# No drive specifier, just convert slashes and quote the name
if p[:2] == '\\\\':
# path is something like \\host\path\on\remote\host
# convert this to ////host/path/on/remote/host
# (notice doubling of slashes at the start of the path)
p = '\\\\' + p
components = p.split('\\')
return urllib.quote('/'.join(components))
comp = p.split(':')
if len(comp) != 2 or len(comp[0]) > 1:
error = 'Bad path: ' + p
raise IOError, error
drive = urllib.quote(comp[0].upper())
components = comp[1].split('\\')
path = '///' + drive + ':'
for comp in components:
if comp:
path = path + '/' + urllib.quote(comp)
return path
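# Round-trip sketch for the two converters above (hypothetical helper):
def _url_examples():
    assert url2pathname('///C:/foo/bar') == 'C:\\foo\\bar'
    assert url2pathname('///C|/foo/bar') == 'C:\\foo\\bar'
    assert pathname2url('C:\\foo\\bar') == '///C:/foo/bar'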
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) for numbers, according to PEP 3141.
TODO: Fill out more detailed documentation on the operators."""
from __future__ import division
from abc import ABCMeta, abstractmethod, abstractproperty
__all__ = ["Number", "Complex", "Real", "Rational", "Integral"]
class Number(object):
"""All numbers inherit from this class.
If you just want to check if an argument x is a number, without
caring what kind, use isinstance(x, Number).
"""
__metaclass__ = ABCMeta
__slots__ = ()
# Concrete numeric types must provide their own hash implementation
__hash__ = None
## Notes on Decimal
## ----------------
## Decimal has all of the methods specified by the Real abc, but it should
## not be registered as a Real because decimals do not interoperate with
## binary floats (i.e. Decimal('3.14') + 2.71828 is undefined). But,
## abstract reals are expected to interoperate (i.e. R1 + R2 should be
## expected to work if R1 and R2 are both Reals).
class Complex(Number):
"""Complex defines the operations that work on the builtin complex type.
In short, those are: a conversion to complex, .real, .imag, +, -,
*, /, abs(), .conjugate, ==, and !=.
If it is given heterogeneous arguments, and doesn't have special
knowledge about them, it should fall back to the builtin complex
type as described below.
"""
__slots__ = ()
@abstractmethod
def __complex__(self):
"""Return a builtin complex instance. Called for complex(self)."""
# Will be __bool__ in 3.0.
def __nonzero__(self):
"""True if self != 0. Called for bool(self)."""
return self != 0
@abstractproperty
def real(self):
"""Retrieve the real component of this number.
This should subclass Real.
"""
raise NotImplementedError
@abstractproperty
def imag(self):
"""Retrieve the imaginary component of this number.
This should subclass Real.
"""
raise NotImplementedError
@abstractmethod
def __add__(self, other):
"""self + other"""
raise NotImplementedError
@abstractmethod
def __radd__(self, other):
"""other + self"""
raise NotImplementedError
@abstractmethod
def __neg__(self):
"""-self"""
raise NotImplementedError
@abstractmethod
def __pos__(self):
"""+self"""
raise NotImplementedError
def __sub__(self, other):
"""self - other"""
return self + -other
def __rsub__(self, other):
"""other - self"""
return -self + other
@abstractmethod
def __mul__(self, other):
"""self * other"""
raise NotImplementedError
@abstractmethod
def __rmul__(self, other):
"""other * self"""
raise NotImplementedError
@abstractmethod
def __div__(self, other):
"""self / other without __future__ division
May promote to float.
"""
raise NotImplementedError
@abstractmethod
def __rdiv__(self, other):
"""other / self without __future__ division"""
raise NotImplementedError
@abstractmethod
def __truediv__(self, other):
"""self / other with __future__ division.
Should promote to float when necessary.
"""
raise NotImplementedError
@abstractmethod
def __rtruediv__(self, other):
"""other / self with __future__ division"""
raise NotImplementedError
@abstractmethod
def __pow__(self, exponent):
"""self**exponent; should promote to float or complex when necessary."""
raise NotImplementedError
@abstractmethod
def __rpow__(self, base):
"""base ** self"""
raise NotImplementedError
@abstractmethod
def __abs__(self):
"""Returns the Real distance from 0. Called for abs(self)."""
raise NotImplementedError
@abstractmethod
def conjugate(self):
"""(x+y*i).conjugate() returns (x-y*i)."""
raise NotImplementedError
@abstractmethod
def __eq__(self, other):
"""self == other"""
raise NotImplementedError
def __ne__(self, other):
"""self != other"""
# The default __ne__ doesn't negate __eq__ until 3.0.
return not (self == other)
Complex.register(complex)
class Real(Complex):
"""To Complex, Real adds the operations that work on real numbers.
In short, those are: a conversion to float, trunc(), divmod,
%, <, <=, >, and >=.
Real also provides defaults for the derived operations.
"""
__slots__ = ()
@abstractmethod
def __float__(self):
"""Any Real can be converted to a native float object.
Called for float(self)."""
raise NotImplementedError
@abstractmethod
def __trunc__(self):
"""trunc(self): Truncates self to an Integral.
Returns an Integral i such that:
* i>0 iff self>0;
* abs(i) <= abs(self);
* for any Integral j satisfying the first two conditions,
abs(i) >= abs(j) [i.e. i has "maximal" abs among those].
i.e. "truncate towards 0".
"""
raise NotImplementedError
def __divmod__(self, other):
"""divmod(self, other): The pair (self // other, self % other).
Sometimes this can be computed faster than the pair of
operations.
"""
return (self // other, self % other)
def __rdivmod__(self, other):
"""divmod(other, self): The pair (self // other, self % other).
Sometimes this can be computed faster than the pair of
operations.
"""
return (other // self, other % self)
@abstractmethod
def __floordiv__(self, other):
"""self // other: The floor() of self/other."""
raise NotImplementedError
@abstractmethod
def __rfloordiv__(self, other):
"""other // self: The floor() of other/self."""
raise NotImplementedError
@abstractmethod
def __mod__(self, other):
"""self % other"""
raise NotImplementedError
@abstractmethod
def __rmod__(self, other):
"""other % self"""
raise NotImplementedError
@abstractmethod
def __lt__(self, other):
"""self < other
< on Reals defines a total ordering, except perhaps for NaN."""
raise NotImplementedError
@abstractmethod
def __le__(self, other):
"""self <= other"""
raise NotImplementedError
# Concrete implementations of Complex abstract methods.
def __complex__(self):
"""complex(self) == complex(float(self), 0)"""
return complex(float(self))
@property
def real(self):
"""Real numbers are their real component."""
return +self
@property
def imag(self):
"""Real numbers have no imaginary component."""
return 0
def conjugate(self):
"""Conjugate is a no-op for Reals."""
return +self
Real.register(float)
class Rational(Real):
""".numerator and .denominator should be in lowest terms."""
__slots__ = ()
@abstractproperty
def numerator(self):
raise NotImplementedError
@abstractproperty
def denominator(self):
raise NotImplementedError
# Concrete implementation of Real's conversion to float.
def __float__(self):
"""float(self) = self.numerator / self.denominator
It's important that this conversion use the integer's "true"
division rather than casting one side to float before dividing
so that ratios of huge integers convert without overflowing.
"""
return self.numerator / self.denominator
class Integral(Rational):
"""Integral adds a conversion to long and the bit-string operations."""
__slots__ = ()
@abstractmethod
def __long__(self):
"""long(self)"""
raise NotImplementedError
def __index__(self):
"""Called whenever an index is needed, such as in slicing"""
return long(self)
@abstractmethod
def __pow__(self, exponent, modulus=None):
"""self ** exponent % modulus, but maybe faster.
Accept the modulus argument if you want to support the
3-argument version of pow(). Raise a TypeError if exponent < 0
or any argument isn't Integral. Otherwise, just implement the
2-argument version described in Complex.
"""
raise NotImplementedError
@abstractmethod
def __lshift__(self, other):
"""self << other"""
raise NotImplementedError
@abstractmethod
def __rlshift__(self, other):
"""other << self"""
raise NotImplementedError
@abstractmethod
def __rshift__(self, other):
"""self >> other"""
raise NotImplementedError
@abstractmethod
def __rrshift__(self, other):
"""other >> self"""
raise NotImplementedError
@abstractmethod
def __and__(self, other):
"""self & other"""
raise NotImplementedError
@abstractmethod
def __rand__(self, other):
"""other & self"""
raise NotImplementedError
@abstractmethod
def __xor__(self, other):
"""self ^ other"""
raise NotImplementedError
@abstractmethod
def __rxor__(self, other):
"""other ^ self"""
raise NotImplementedError
@abstractmethod
def __or__(self, other):
"""self | other"""
raise NotImplementedError
@abstractmethod
def __ror__(self, other):
"""other | self"""
raise NotImplementedError
@abstractmethod
def __invert__(self):
"""~self"""
raise NotImplementedError
# Concrete implementations of Rational and Real abstract methods.
def __float__(self):
"""float(self) == float(long(self))"""
return float(long(self))
@property
def numerator(self):
"""Integers are their own numerators."""
return +self
@property
def denominator(self):
"""Integers have a denominator of 1."""
return 1
Integral.register(int)
Integral.register(long)
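# Virtual-subclass relationships established by the register() calls above,
# shown as assertions (hypothetical helper; 3L is a Python 2 long literal):
def _numbers_examples():
    assert isinstance(3, Integral) and isinstance(3L, Integral)
    assert isinstance(3, Rational)          # Integral < Rational < Real
    assert isinstance(2.5, Real) and not isinstance(2.5, Integral)
    assert isinstance(1j, Complex) and isinstance(1j, Number)
    assert not isinstance(1j, Real)         # complex numbers are not ordered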
"""
opcode module - potentially shared between dis and other modules which
operate on bytecodes (e.g. peephole optimizers).
"""
__all__ = ["cmp_op", "hasconst", "hasname", "hasjrel", "hasjabs",
"haslocal", "hascompare", "hasfree", "opname", "opmap",
"HAVE_ARGUMENT", "EXTENDED_ARG"]
cmp_op = ('<', '<=', '==', '!=', '>', '>=', 'in', 'not in', 'is',
'is not', 'exception match', 'BAD')
hasconst = []
hasname = []
hasjrel = []
hasjabs = []
haslocal = []
hascompare = []
hasfree = []
opmap = {}
opname = [''] * 256
for op in range(256): opname[op] = '<%r>' % (op,)
del op
def def_op(name, op):
opname[op] = name
opmap[name] = op
def name_op(name, op):
def_op(name, op)
hasname.append(op)
def jrel_op(name, op):
def_op(name, op)
hasjrel.append(op)
def jabs_op(name, op):
def_op(name, op)
hasjabs.append(op)
# Instruction opcodes for compiled code
# Blank lines correspond to available opcodes
def_op('STOP_CODE', 0)
def_op('POP_TOP', 1)
def_op('ROT_TWO', 2)
def_op('ROT_THREE', 3)
def_op('DUP_TOP', 4)
def_op('ROT_FOUR', 5)
def_op('NOP', 9)
def_op('UNARY_POSITIVE', 10)
def_op('UNARY_NEGATIVE', 11)
def_op('UNARY_NOT', 12)
def_op('UNARY_CONVERT', 13)
def_op('UNARY_INVERT', 15)
def_op('BINARY_POWER', 19)
def_op('BINARY_MULTIPLY', 20)
def_op('BINARY_DIVIDE', 21)
def_op('BINARY_MODULO', 22)
def_op('BINARY_ADD', 23)
def_op('BINARY_SUBTRACT', 24)
def_op('BINARY_SUBSCR', 25)
def_op('BINARY_FLOOR_DIVIDE', 26)
def_op('BINARY_TRUE_DIVIDE', 27)
def_op('INPLACE_FLOOR_DIVIDE', 28)
def_op('INPLACE_TRUE_DIVIDE', 29)
def_op('SLICE+0', 30)
def_op('SLICE+1', 31)
def_op('SLICE+2', 32)
def_op('SLICE+3', 33)
def_op('STORE_SLICE+0', 40)
def_op('STORE_SLICE+1', 41)
def_op('STORE_SLICE+2', 42)
def_op('STORE_SLICE+3', 43)
def_op('DELETE_SLICE+0', 50)
def_op('DELETE_SLICE+1', 51)
def_op('DELETE_SLICE+2', 52)
def_op('DELETE_SLICE+3', 53)
def_op('STORE_MAP', 54)
def_op('INPLACE_ADD', 55)
def_op('INPLACE_SUBTRACT', 56)
def_op('INPLACE_MULTIPLY', 57)
def_op('INPLACE_DIVIDE', 58)
def_op('INPLACE_MODULO', 59)
def_op('STORE_SUBSCR', 60)
def_op('DELETE_SUBSCR', 61)
def_op('BINARY_LSHIFT', 62)
def_op('BINARY_RSHIFT', 63)
def_op('BINARY_AND', 64)
def_op('BINARY_XOR', 65)
def_op('BINARY_OR', 66)
def_op('INPLACE_POWER', 67)
def_op('GET_ITER', 68)
def_op('PRINT_EXPR', 70)
def_op('PRINT_ITEM', 71)
def_op('PRINT_NEWLINE', 72)
def_op('PRINT_ITEM_TO', 73)
def_op('PRINT_NEWLINE_TO', 74)
def_op('INPLACE_LSHIFT', 75)
def_op('INPLACE_RSHIFT', 76)
def_op('INPLACE_AND', 77)
def_op('INPLACE_XOR', 78)
def_op('INPLACE_OR', 79)
def_op('BREAK_LOOP', 80)
def_op('WITH_CLEANUP', 81)
def_op('LOAD_LOCALS', 82)
def_op('RETURN_VALUE', 83)
def_op('IMPORT_STAR', 84)
def_op('EXEC_STMT', 85)
def_op('YIELD_VALUE', 86)
def_op('POP_BLOCK', 87)
def_op('END_FINALLY', 88)
def_op('BUILD_CLASS', 89)
HAVE_ARGUMENT = 90 # Opcodes from here have an argument:
name_op('STORE_NAME', 90) # Index in name list
name_op('DELETE_NAME', 91) # ""
def_op('UNPACK_SEQUENCE', 92) # Number of tuple items
jrel_op('FOR_ITER', 93)
def_op('LIST_APPEND', 94)
name_op('STORE_ATTR', 95) # Index in name list
name_op('DELETE_ATTR', 96) # ""
name_op('STORE_GLOBAL', 97) # ""
name_op('DELETE_GLOBAL', 98) # ""
def_op('DUP_TOPX', 99) # number of items to duplicate
def_op('LOAD_CONST', 100) # Index in const list
hasconst.append(100)
name_op('LOAD_NAME', 101) # Index in name list
def_op('BUILD_TUPLE', 102) # Number of tuple items
def_op('BUILD_LIST', 103) # Number of list items
def_op('BUILD_SET', 104) # Number of set items
def_op('BUILD_MAP', 105) # Number of dict entries (up to 255)
name_op('LOAD_ATTR', 106) # Index in name list
def_op('COMPARE_OP', 107) # Comparison operator
hascompare.append(107)
name_op('IMPORT_NAME', 108) # Index in name list
name_op('IMPORT_FROM', 109) # Index in name list
jrel_op('JUMP_FORWARD', 110) # Number of bytes to skip
jabs_op('JUMP_IF_FALSE_OR_POP', 111) # Target byte offset from beginning of code
jabs_op('JUMP_IF_TRUE_OR_POP', 112) # ""
jabs_op('JUMP_ABSOLUTE', 113) # ""
jabs_op('POP_JUMP_IF_FALSE', 114) # ""
jabs_op('POP_JUMP_IF_TRUE', 115) # ""
name_op('LOAD_GLOBAL', 116) # Index in name list
jabs_op('CONTINUE_LOOP', 119) # Target address
jrel_op('SETUP_LOOP', 120) # Distance to target address
jrel_op('SETUP_EXCEPT', 121) # ""
jrel_op('SETUP_FINALLY', 122) # ""
def_op('LOAD_FAST', 124) # Local variable number
haslocal.append(124)
def_op('STORE_FAST', 125) # Local variable number
haslocal.append(125)
def_op('DELETE_FAST', 126) # Local variable number
haslocal.append(126)
def_op('RAISE_VARARGS', 130) # Number of raise arguments (1, 2, or 3)
def_op('CALL_FUNCTION', 131) # #args + (#kwargs << 8)
def_op('MAKE_FUNCTION', 132) # Number of args with default values
def_op('BUILD_SLICE', 133) # Number of items
def_op('MAKE_CLOSURE', 134)
def_op('LOAD_CLOSURE', 135)
hasfree.append(135)
def_op('LOAD_DEREF', 136)
hasfree.append(136)
def_op('STORE_DEREF', 137)
hasfree.append(137)
def_op('CALL_FUNCTION_VAR', 140) # #args + (#kwargs << 8)
def_op('CALL_FUNCTION_KW', 141) # #args + (#kwargs << 8)
def_op('CALL_FUNCTION_VAR_KW', 142) # #args + (#kwargs << 8)
jrel_op('SETUP_WITH', 143)
def_op('EXTENDED_ARG', 145)
EXTENDED_ARG = 145
def_op('SET_ADD', 146)
def_op('MAP_ADD', 147)
del def_op, name_op, jrel_op, jabs_op
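# A sketch of how these tables are typically consumed (hypothetical helper;
# walks CPython 2 bytecode and ignores EXTENDED_ARG for brevity):
def _instructions(code):
    """Yield (opname, oparg) pairs from a code object's co_code string."""
    bytecode, i, n = code.co_code, 0, len(code.co_code)
    while i < n:
        op = ord(bytecode[i])
        i = i + 1
        if op >= HAVE_ARGUMENT:
            # Arguments are stored little-endian in the next two bytes.
            arg = ord(bytecode[i]) + ord(bytecode[i+1]) * 256
            i = i + 2
            yield opname[op], arg
        else:
            yield opname[op], None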
"""A powerful, extensible, and easy-to-use option parser.
By Greg Ward <[email protected]>
Originally distributed as Optik.
For support, use the [email protected] mailing list
(http://lists.sourceforge.net/lists/listinfo/optik-users).
Simple usage example:
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
action="store_false", dest="verbose", default=True,
help="don't print status messages to stdout")
(options, args) = parser.parse_args()
"""
__version__ = "1.5.3"
__all__ = ['Option',
'make_option',
'SUPPRESS_HELP',
'SUPPRESS_USAGE',
'Values',
'OptionContainer',
'OptionGroup',
'OptionParser',
'HelpFormatter',
'IndentedHelpFormatter',
'TitledHelpFormatter',
'OptParseError',
'OptionError',
'OptionConflictError',
'OptionValueError',
'BadOptionError']
__copyright__ = """
Copyright (c) 2001-2006 Gregory P. Ward. All rights reserved.
Copyright (c) 2002-2006 Python Software Foundation. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
import sys, os
import types
import textwrap
def _repr(self):
return "<%s at 0x%x: %s>" % (self.__class__.__name__, id(self), self)
# This file was generated from:
# Id: option_parser.py 527 2006-07-23 15:21:30Z greg
# Id: option.py 522 2006-06-11 16:22:03Z gward
# Id: help.py 527 2006-07-23 15:21:30Z greg
# Id: errors.py 509 2006-04-20 00:58:24Z gward
try:
from gettext import gettext
except ImportError:
def gettext(message):
return message
_ = gettext
class OptParseError (Exception):
def __init__(self, msg):
self.msg = msg
def __str__(self):
return self.msg
class OptionError (OptParseError):
"""
Raised if an Option instance is created with invalid or
inconsistent arguments.
"""
def __init__(self, msg, option):
self.msg = msg
self.option_id = str(option)
def __str__(self):
if self.option_id:
return "option %s: %s" % (self.option_id, self.msg)
else:
return self.msg
class OptionConflictError (OptionError):
"""
Raised if conflicting options are added to an OptionParser.
"""
class OptionValueError (OptParseError):
"""
Raised if an invalid option value is encountered on the command
line.
"""
class BadOptionError (OptParseError):
"""
Raised if an invalid option is seen on the command line.
"""
def __init__(self, opt_str):
self.opt_str = opt_str
def __str__(self):
return _("no such option: %s") % self.opt_str
class AmbiguousOptionError (BadOptionError):
"""
Raised if an ambiguous option is seen on the command line.
"""
def __init__(self, opt_str, possibilities):
BadOptionError.__init__(self, opt_str)
self.possibilities = possibilities
def __str__(self):
return (_("ambiguous option: %s (%s?)")
% (self.opt_str, ", ".join(self.possibilities)))
class HelpFormatter:
"""
Abstract base class for formatting option help. OptionParser
instances should use one of the HelpFormatter subclasses for
formatting help; by default IndentedHelpFormatter is used.
Instance attributes:
parser : OptionParser
the controlling OptionParser instance
indent_increment : int
the number of columns to indent per nesting level
max_help_position : int
the maximum starting column for option help text
help_position : int
the calculated starting column for option help text;
initially the same as the maximum
width : int
total number of columns for output (pass None to constructor for
this value to be taken from the $COLUMNS environment variable)
level : int
current indentation level
current_indent : int
current indentation level (in columns)
help_width : int
number of columns available for option help text (calculated)
default_tag : str
text to replace with each option's default value, "%default"
by default. Set to false value to disable default value expansion.
option_strings : { Option : str }
maps Option instances to the snippet of help text explaining
the syntax of that option, e.g. "-h, --help" or
"-fFILE, --file=FILE"
_short_opt_fmt : str
format string controlling how short options with values are
printed in help text. Must be either "%s%s" ("-fFILE") or
"%s %s" ("-f FILE"), because those are the two syntaxes that
Optik supports.
_long_opt_fmt : str
similar but for long options; must be either "%s %s" ("--file FILE")
or "%s=%s" ("--file=FILE").
"""
NO_DEFAULT_VALUE = "none"
def __init__(self,
indent_increment,
max_help_position,
width,
short_first):
self.parser = None
self.indent_increment = indent_increment
if width is None:
try:
width = int(os.environ['COLUMNS'])
except (KeyError, ValueError):
width = 80
width -= 2
self.width = width
self.help_position = self.max_help_position = \
min(max_help_position, max(width - 20, indent_increment * 2))
self.current_indent = 0
self.level = 0
self.help_width = None # computed later
self.short_first = short_first
self.default_tag = "%default"
self.option_strings = {}
self._short_opt_fmt = "%s %s"
self._long_opt_fmt = "%s=%s"
def set_parser(self, parser):
self.parser = parser
def set_short_opt_delimiter(self, delim):
if delim not in ("", " "):
raise ValueError(
"invalid metavar delimiter for short options: %r" % delim)
self._short_opt_fmt = "%s" + delim + "%s"
def set_long_opt_delimiter(self, delim):
if delim not in ("=", " "):
raise ValueError(
"invalid metavar delimiter for long options: %r" % delim)
self._long_opt_fmt = "%s" + delim + "%s"
def indent(self):
self.current_indent += self.indent_increment
self.level += 1
def dedent(self):
self.current_indent -= self.indent_increment
assert self.current_indent >= 0, "Indent decreased below 0."
self.level -= 1
def format_usage(self, usage):
raise NotImplementedError, "subclasses must implement"
def format_heading(self, heading):
raise NotImplementedError, "subclasses must implement"
def _format_text(self, text):
"""
Format a paragraph of free-form text for inclusion in the
help output at the current indentation level.
"""
text_width = max(self.width - self.current_indent, 11)
indent = " "*self.current_indent
return textwrap.fill(text,
text_width,
initial_indent=indent,
subsequent_indent=indent)
def format_description(self, description):
if description:
return self._format_text(description) + "\n"
else:
return ""
def format_epilog(self, epilog):
if epilog:
return "\n" + self._format_text(epilog) + "\n"
else:
return ""
def expand_default(self, option):
if self.parser is None or not self.default_tag:
return option.help
default_value = self.parser.defaults.get(option.dest)
if default_value is NO_DEFAULT or default_value is None:
default_value = self.NO_DEFAULT_VALUE
return option.help.replace(self.default_tag, str(default_value))
def format_option(self, option):
# The help for each option consists of two parts:
# * the opt strings and metavars
# eg. ("-x", or "-fFILENAME, --file=FILENAME")
# * the user-supplied help string
# eg. ("turn on expert mode", "read data from FILENAME")
#
# If possible, we write both of these on the same line:
# -x turn on expert mode
#
# But if the opt string list is too long, we put the help
# string on a second line, indented to the same column it would
# start in if it fit on the first line.
# -fFILENAME, --file=FILENAME
# read data from FILENAME
result = []
opts = self.option_strings[option]
opt_width = self.help_position - self.current_indent - 2
if len(opts) > opt_width:
opts = "%*s%s\n" % (self.current_indent, "", opts)
indent_first = self.help_position
else: # start help on same line as opts
opts = "%*s%-*s " % (self.current_indent, "", opt_width, opts)
indent_first = 0
result.append(opts)
if option.help:
help_text = self.expand_default(option)
help_lines = textwrap.wrap(help_text, self.help_width)
result.append("%*s%s\n" % (indent_first, "", help_lines[0]))
result.extend(["%*s%s\n" % (self.help_position, "", line)
for line in help_lines[1:]])
elif opts[-1] != "\n":
result.append("\n")
return "".join(result)
def store_option_strings(self, parser):
self.indent()
max_len = 0
for opt in parser.option_list:
strings = self.format_option_strings(opt)
self.option_strings[opt] = strings
max_len = max(max_len, len(strings) + self.current_indent)
self.indent()
for group in parser.option_groups:
for opt in group.option_list:
strings = self.format_option_strings(opt)
self.option_strings[opt] = strings
max_len = max(max_len, len(strings) + self.current_indent)
self.dedent()
self.dedent()
self.help_position = min(max_len + 2, self.max_help_position)
self.help_width = max(self.width - self.help_position, 11)
def format_option_strings(self, option):
"""Return a comma-separated list of option strings & metavariables."""
if option.takes_value():
metavar = option.metavar or option.dest.upper()
short_opts = [self._short_opt_fmt % (sopt, metavar)
for sopt in option._short_opts]
long_opts = [self._long_opt_fmt % (lopt, metavar)
for lopt in option._long_opts]
else:
short_opts = option._short_opts
long_opts = option._long_opts
if self.short_first:
opts = short_opts + long_opts
else:
opts = long_opts + short_opts
return ", ".join(opts)
class IndentedHelpFormatter (HelpFormatter):
"""Format help with indented section bodies.
"""
def __init__(self,
indent_increment=2,
max_help_position=24,
width=None,
short_first=1):
HelpFormatter.__init__(
self, indent_increment, max_help_position, width, short_first)
def format_usage(self, usage):
return _("Usage: %s\n") % usage
def format_heading(self, heading):
return "%*s%s:\n" % (self.current_indent, "", heading)
class TitledHelpFormatter (HelpFormatter):
"""Format help with underlined section headers.
"""
def __init__(self,
indent_increment=0,
max_help_position=24,
width=None,
short_first=0):
HelpFormatter.__init__ (
self, indent_increment, max_help_position, width, short_first)
def format_usage(self, usage):
return "%s %s\n" % (self.format_heading(_("Usage")), usage)
def format_heading(self, heading):
return "%s\n%s\n" % (heading, "=-"[self.level] * len(heading))
def _parse_num(val, type):
if val[:2].lower() == "0x": # hexadecimal
radix = 16
elif val[:2].lower() == "0b": # binary
radix = 2
val = val[2:] or "0" # have to remove "0b" prefix
elif val[:1] == "0": # octal
radix = 8
else: # decimal
radix = 10
return type(val, radix)
def _parse_int(val):
return _parse_num(val, int)
def _parse_long(val):
return _parse_num(val, long)
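# Illustrative values for the numeric parsers above (hypothetical helper):
def _parse_num_examples():
    assert _parse_int("0x1f") == 31     # hexadecimal
    assert _parse_int("0b101") == 5     # binary
    assert _parse_int("010") == 8       # octal
    assert _parse_int("42") == 42       # decimal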
_builtin_cvt = { "int" : (_parse_int, _("integer")),
"long" : (_parse_long, _("long integer")),
"float" : (float, _("floating-point")),
"complex" : (complex, _("complex")) }
def check_builtin(option, opt, value):
(cvt, what) = _builtin_cvt[option.type]
try:
return cvt(value)
except ValueError:
raise OptionValueError(
_("option %s: invalid %s value: %r") % (opt, what, value))
def check_choice(option, opt, value):
if value in option.choices:
return value
else:
choices = ", ".join(map(repr, option.choices))
raise OptionValueError(
_("option %s: invalid choice: %r (choose from %s)")
% (opt, value, choices))
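# A sketch of a custom checking function in the same style as the builtins
# (hypothetical, not part of optparse; to use it, an Option subclass would
# list "even" in TYPES and map it to this function in a copy of TYPE_CHECKER):
def _check_even(option, opt, value):
    try:
        number = int(value)
    except ValueError:
        raise OptionValueError(
            _("option %s: invalid even value: %r") % (opt, value))
    if number % 2:
        raise OptionValueError(
            _("option %s: %r is not an even number") % (opt, value))
    return number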
# Not supplying a default is different from a default of None,
# so we need an explicit "not supplied" value.
NO_DEFAULT = ("NO", "DEFAULT")
class Option:
"""
Instance attributes:
_short_opts : [string]
_long_opts : [string]
action : string
type : string
dest : string
default : any
nargs : int
const : any
choices : [string]
callback : function
callback_args : (any*)
callback_kwargs : { string : any }
help : string
metavar : string
"""
# The list of instance attributes that may be set through
# keyword args to the constructor.
ATTRS = ['action',
'type',
'dest',
'default',
'nargs',
'const',
'choices',
'callback',
'callback_args',
'callback_kwargs',
'help',
'metavar']
# The set of actions allowed by option parsers. Explicitly listed
# here so the constructor can validate its arguments.
ACTIONS = ("store",
"store_const",
"store_true",
"store_false",
"append",
"append_const",
"count",
"callback",
"help",
"version")
# The set of actions that involve storing a value somewhere;
# also listed just for constructor argument validation. (If
# the action is one of these, there must be a destination.)
STORE_ACTIONS = ("store",
"store_const",
"store_true",
"store_false",
"append",
"append_const",
"count")
# The set of actions for which it makes sense to supply a value
# type, ie. which may consume an argument from the command line.
TYPED_ACTIONS = ("store",
"append",
"callback")
# The set of actions which *require* a value type, ie. that
# always consume an argument from the command line.
ALWAYS_TYPED_ACTIONS = ("store",
"append")
# The set of actions which take a 'const' attribute.
CONST_ACTIONS = ("store_const",
"append_const")
# The set of known types for option parsers. Again, listed here for
# constructor argument validation.
TYPES = ("string", "int", "long", "float", "complex", "choice")
# Dictionary of argument checking functions, which convert and
# validate option arguments according to the option type.
#
# Signature of checking functions is:
# check(option : Option, opt : string, value : string) -> any
# where
# option is the Option instance calling the checker
# opt is the actual option seen on the command-line
# (eg. "-a", "--file")
# value is the option argument seen on the command-line
#
# The return value should be in the appropriate Python type
# for option.type -- eg. an integer if option.type == "int".
#
# If no checker is defined for a type, arguments will be
# unchecked and remain strings.
TYPE_CHECKER = { "int" : check_builtin,
"long" : check_builtin,
"float" : check_builtin,
"complex": check_builtin,
"choice" : check_choice,
}
# CHECK_METHODS is a list of unbound method objects; they are called
# by the constructor, in order, after all attributes are
# initialized. The list is created and filled in later, after all
# the methods are actually defined. (I just put it here because I
# like to define and document all class attributes in the same
# place.) Subclasses that add another _check_*() method should
# define their own CHECK_METHODS list that adds their check method
# to those from this class.
CHECK_METHODS = None
# -- Constructor/initialization methods ----------------------------
def __init__(self, *opts, **attrs):
# Set _short_opts, _long_opts attrs from 'opts' tuple.
# Have to be set now, in case no option strings are supplied.
self._short_opts = []
self._long_opts = []
opts = self._check_opt_strings(opts)
self._set_opt_strings(opts)
# Set all other attrs (action, type, etc.) from 'attrs' dict
self._set_attrs(attrs)
# Check all the attributes we just set. There are lots of
# complicated interdependencies, but luckily they can be farmed
# out to the _check_*() methods listed in CHECK_METHODS -- which
# could be handy for subclasses! The one thing these all share
# is that they raise OptionError if they discover a problem.
for checker in self.CHECK_METHODS:
checker(self)
def _check_opt_strings(self, opts):
# Filter out None because early versions of Optik had exactly
# one short option and one long option, either of which
# could be None.
opts = filter(None, opts)
if not opts:
raise TypeError("at least one option string must be supplied")
return opts
def _set_opt_strings(self, opts):
for opt in opts:
if len(opt) < 2:
raise OptionError(
"invalid option string %r: "
"must be at least two characters long" % opt, self)
elif len(opt) == 2:
if not (opt[0] == "-" and opt[1] != "-"):
raise OptionError(
"invalid short option string %r: "
"must be of the form -x, (x any non-dash char)" % opt,
self)
self._short_opts.append(opt)
else:
if not (opt[0:2] == "--" and opt[2] != "-"):
raise OptionError(
"invalid long option string %r: "
"must start with --, followed by non-dash" % opt,
self)
self._long_opts.append(opt)
def _set_attrs(self, attrs):
for attr in self.ATTRS:
if attr in attrs:
setattr(self, attr, attrs[attr])
del attrs[attr]
else:
if attr == 'default':
setattr(self, attr, NO_DEFAULT)
else:
setattr(self, attr, None)
if attrs:
attrs = attrs.keys()
attrs.sort()
raise OptionError(
"invalid keyword arguments: %s" % ", ".join(attrs),
self)
# -- Constructor validation methods --------------------------------
def _check_action(self):
if self.action is None:
self.action = "store"
elif self.action not in self.ACTIONS:
raise OptionError("invalid action: %r" % self.action, self)
def _check_type(self):
if self.type is None:
if self.action in self.ALWAYS_TYPED_ACTIONS:
if self.choices is not None:
# The "choices" attribute implies "choice" type.
self.type = "choice"
else:
# No type given? "string" is the most sensible default.
self.type = "string"
else:
# Allow type objects or builtin type conversion functions
# (int, str, etc.) as an alternative to their names. (The
# complicated check of __builtin__ is only necessary for
# Python 2.1 and earlier, and is short-circuited by the
# first check on modern Pythons.)
import __builtin__
if ( type(self.type) is types.TypeType or
(hasattr(self.type, "__name__") and
getattr(__builtin__, self.type.__name__, None) is self.type) ):
self.type = self.type.__name__
if self.type == "str":
self.type = "string"
if self.type not in self.TYPES:
raise OptionError("invalid option type: %r" % self.type, self)
if self.action not in self.TYPED_ACTIONS:
raise OptionError(
"must not supply a type for action %r" % self.action, self)
def _check_choice(self):
if self.type == "choice":
if self.choices is None:
raise OptionError(
"must supply a list of choices for type 'choice'", self)
elif type(self.choices) not in (types.TupleType, types.ListType):
raise OptionError(
"choices must be a list of strings ('%s' supplied)"
% str(type(self.choices)).split("'")[1], self)
elif self.choices is not None:
raise OptionError(
"must not supply choices for type %r" % self.type, self)
def _check_dest(self):
# No destination given, and we need one for this action. The
# self.type check is for callbacks that take a value.
takes_value = (self.action in self.STORE_ACTIONS or
self.type is not None)
if self.dest is None and takes_value:
# Glean a destination from the first long option string,
# or from the first short option string if no long options.
if self._long_opts:
# eg. "--foo-bar" -> "foo_bar"
self.dest = self._long_opts[0][2:].replace('-', '_')
else:
self.dest = self._short_opts[0][1]
def _check_const(self):
if self.action not in self.CONST_ACTIONS and self.const is not None:
raise OptionError(
"'const' must not be supplied for action %r" % self.action,
self)
def _check_nargs(self):
if self.action in self.TYPED_ACTIONS:
if self.nargs is None:
self.nargs = 1
elif self.nargs is not None:
raise OptionError(
"'nargs' must not be supplied for action %r" % self.action,
self)
def _check_callback(self):
if self.action == "callback":
if not hasattr(self.callback, '__call__'):
raise OptionError(
"callback not callable: %r" % self.callback, self)
if (self.callback_args is not None and
type(self.callback_args) is not types.TupleType):
raise OptionError(
"callback_args, if supplied, must be a tuple: not %r"
% self.callback_args, self)
if (self.callback_kwargs is not None and
type(self.callback_kwargs) is not types.DictType):
raise OptionError(
"callback_kwargs, if supplied, must be a dict: not %r"
% self.callback_kwargs, self)
else:
if self.callback is not None:
raise OptionError(
"callback supplied (%r) for non-callback option"
% self.callback, self)
if self.callback_args is not None:
raise OptionError(
"callback_args supplied for non-callback option", self)
if self.callback_kwargs is not None:
raise OptionError(
"callback_kwargs supplied for non-callback option", self)
CHECK_METHODS = [_check_action,
_check_type,
_check_choice,
_check_dest,
_check_const,
_check_nargs,
_check_callback]
# -- Miscellaneous methods -----------------------------------------
def __str__(self):
return "/".join(self._short_opts + self._long_opts)
__repr__ = _repr
def takes_value(self):
return self.type is not None
def get_opt_string(self):
if self._long_opts:
return self._long_opts[0]
else:
return self._short_opts[0]
# -- Processing methods --------------------------------------------
def check_value(self, opt, value):
checker = self.TYPE_CHECKER.get(self.type)
if checker is None:
return value
else:
return checker(self, opt, value)
def convert_value(self, opt, value):
if value is not None:
if self.nargs == 1:
return self.check_value(opt, value)
else:
return tuple([self.check_value(opt, v) for v in value])
def process(self, opt, value, values, parser):
# First, convert the value(s) to the right type. Howl if any
# value(s) are bogus.
value = self.convert_value(opt, value)
# And then take whatever action is expected of us.
# This is a separate method to make life easier for
# subclasses to add new actions.
return self.take_action(
self.action, self.dest, opt, value, values, parser)
def take_action(self, action, dest, opt, value, values, parser):
if action == "store":
setattr(values, dest, value)
elif action == "store_const":
setattr(values, dest, self.const)
elif action == "store_true":
setattr(values, dest, True)
elif action == "store_false":
setattr(values, dest, False)
elif action == "append":
values.ensure_value(dest, []).append(value)
elif action == "append_const":
values.ensure_value(dest, []).append(self.const)
elif action == "count":
setattr(values, dest, values.ensure_value(dest, 0) + 1)
elif action == "callback":
args = self.callback_args or ()
kwargs = self.callback_kwargs or {}
self.callback(self, opt, value, parser, *args, **kwargs)
elif action == "help":
parser.print_help()
parser.exit()
elif action == "version":
parser.print_version()
parser.exit()
else:
raise ValueError("unknown action %r" % self.action)
return 1
# class Option
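# A small sketch (illustrative; the option names are made up) of Option in
# isolation: construction runs the CHECK_METHODS validators, and
# check_value() routes through TYPE_CHECKER for the declared type.
def _option_demo():
    opt = Option("-n", "--num", type="int", dest="num")
    assert opt.takes_value()
    assert opt.check_value("--num", "0x10") == 16   # check_builtin -> _parse_int
    try:
        Option("-q", action="store_true", type="int")
    except OptionError:
        pass   # _check_type rejects this: store_true is not a TYPED_ACTION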
SUPPRESS_HELP = "SUPPRESS"+"HELP"
SUPPRESS_USAGE = "SUPPRESS"+"USAGE"
try:
basestring
except NameError:
def isbasestring(x):
return isinstance(x, (types.StringType, types.UnicodeType))
else:
def isbasestring(x):
return isinstance(x, basestring)
class Values:
def __init__(self, defaults=None):
if defaults:
for (attr, val) in defaults.items():
setattr(self, attr, val)
def __str__(self):
return str(self.__dict__)
__repr__ = _repr
def __cmp__(self, other):
if isinstance(other, Values):
return cmp(self.__dict__, other.__dict__)
elif isinstance(other, types.DictType):
return cmp(self.__dict__, other)
else:
return -1
def _update_careful(self, dict):
"""
Update the option values from an arbitrary dictionary, but only
use keys from dict that already have a corresponding attribute
in self. Any keys in dict without a corresponding attribute
are silently ignored.
"""
for attr in dir(self):
if attr in dict:
dval = dict[attr]
if dval is not None:
setattr(self, attr, dval)
def _update_loose(self, dict):
"""
Update the option values from an arbitrary dictionary,
using all keys from the dictionary regardless of whether
they have a corresponding attribute in self or not.
"""
self.__dict__.update(dict)
def _update(self, dict, mode):
if mode == "careful":
self._update_careful(dict)
elif mode == "loose":
self._update_loose(dict)
else:
raise ValueError, "invalid update mode: %r" % mode
def read_module(self, modname, mode="careful"):
__import__(modname)
mod = sys.modules[modname]
self._update(vars(mod), mode)
def read_file(self, filename, mode="careful"):
vars = {}
execfile(filename, vars)
self._update(vars, mode)
def ensure_value(self, attr, value):
if not hasattr(self, attr) or getattr(self, attr) is None:
setattr(self, attr, value)
return getattr(self, attr)
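# A sketch (illustrative) of the Values attribute bag above: ensure_value()
# creates missing or None attributes on first use, which is what the
# "append" and "count" actions rely on in Option.take_action().
def _values_demo():
    v = Values({"verbose": None})
    assert v.ensure_value("verbose", 0) == 0    # None gets replaced
    v.ensure_value("hits", []).append("x")      # created, then mutated in place
    assert v.hits == ["x"]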
class OptionContainer:
"""
Abstract base class.
Class attributes:
standard_option_list : [Option]
list of standard options that will be accepted by all instances
of this parser class (intended to be overridden by subclasses).
Instance attributes:
option_list : [Option]
the list of Option objects contained by this OptionContainer
_short_opt : { string : Option }
dictionary mapping short option strings, eg. "-f" or "-X",
to the Option instances that implement them. If an Option
has multiple short option strings, it will appear in this
dictionary multiple times. [1]
_long_opt : { string : Option }
dictionary mapping long option strings, eg. "--file" or
"--exclude", to the Option instances that implement them.
Again, a given Option can occur multiple times in this
dictionary. [1]
defaults : { string : any }
dictionary mapping option destination names to default
values for each destination [1]
[1] These mappings are common to (shared by) all components of the
controlling OptionParser, where they are initially created.
"""
def __init__(self, option_class, conflict_handler, description):
# Initialize the option list and related data structures.
# This method must be provided by subclasses, and it must
# initialize at least the following instance attributes:
# option_list, _short_opt, _long_opt, defaults.
self._create_option_list()
self.option_class = option_class
self.set_conflict_handler(conflict_handler)
self.set_description(description)
def _create_option_mappings(self):
# For use by OptionParser constructor -- create the master
# option mappings used by this OptionParser and all
# OptionGroups that it owns.
self._short_opt = {} # single letter -> Option instance
self._long_opt = {} # long option -> Option instance
self.defaults = {} # maps option dest -> default value
def _share_option_mappings(self, parser):
# For use by OptionGroup constructor -- use shared option
# mappings from the OptionParser that owns this OptionGroup.
self._short_opt = parser._short_opt
self._long_opt = parser._long_opt
self.defaults = parser.defaults
def set_conflict_handler(self, handler):
if handler not in ("error", "resolve"):
raise ValueError, "invalid conflict_resolution value %r" % handler
self.conflict_handler = handler
def set_description(self, description):
self.description = description
def get_description(self):
return self.description
def destroy(self):
"""see OptionParser.destroy()."""
del self._short_opt
del self._long_opt
del self.defaults
# -- Option-adding methods -----------------------------------------
def _check_conflict(self, option):
conflict_opts = []
for opt in option._short_opts:
if opt in self._short_opt:
conflict_opts.append((opt, self._short_opt[opt]))
for opt in option._long_opts:
if opt in self._long_opt:
conflict_opts.append((opt, self._long_opt[opt]))
if conflict_opts:
handler = self.conflict_handler
if handler == "error":
raise OptionConflictError(
"conflicting option string(s): %s"
% ", ".join([co[0] for co in conflict_opts]),
option)
elif handler == "resolve":
for (opt, c_option) in conflict_opts:
if opt.startswith("--"):
c_option._long_opts.remove(opt)
del self._long_opt[opt]
else:
c_option._short_opts.remove(opt)
del self._short_opt[opt]
if not (c_option._short_opts or c_option._long_opts):
c_option.container.option_list.remove(c_option)
def add_option(self, *args, **kwargs):
"""add_option(Option)
add_option(opt_str, ..., kwarg=val, ...)
"""
if type(args[0]) in types.StringTypes:
option = self.option_class(*args, **kwargs)
elif len(args) == 1 and not kwargs:
option = args[0]
if not isinstance(option, Option):
raise TypeError, "not an Option instance: %r" % option
else:
raise TypeError, "invalid arguments"
self._check_conflict(option)
self.option_list.append(option)
option.container = self
for opt in option._short_opts:
self._short_opt[opt] = option
for opt in option._long_opts:
self._long_opt[opt] = option
if option.dest is not None: # option has a dest, we need a default
if option.default is not NO_DEFAULT:
self.defaults[option.dest] = option.default
elif option.dest not in self.defaults:
self.defaults[option.dest] = None
return option
def add_options(self, option_list):
for option in option_list:
self.add_option(option)
# -- Option query/removal methods ----------------------------------
def get_option(self, opt_str):
return (self._short_opt.get(opt_str) or
self._long_opt.get(opt_str))
def has_option(self, opt_str):
return (opt_str in self._short_opt or
opt_str in self._long_opt)
def remove_option(self, opt_str):
option = self._short_opt.get(opt_str)
if option is None:
option = self._long_opt.get(opt_str)
if option is None:
raise ValueError("no such option %r" % opt_str)
for opt in option._short_opts:
del self._short_opt[opt]
for opt in option._long_opts:
del self._long_opt[opt]
option.container.option_list.remove(option)
# -- Help-formatting methods ---------------------------------------
def format_option_help(self, formatter):
if not self.option_list:
return ""
result = []
for option in self.option_list:
            if option.help is not SUPPRESS_HELP:
result.append(formatter.format_option(option))
return "".join(result)
def format_description(self, formatter):
return formatter.format_description(self.get_description())
def format_help(self, formatter):
result = []
if self.description:
result.append(self.format_description(formatter))
if self.option_list:
result.append(self.format_option_help(formatter))
return "\n".join(result)
class OptionGroup (OptionContainer):
def __init__(self, parser, title, description=None):
self.parser = parser
OptionContainer.__init__(
self, parser.option_class, parser.conflict_handler, description)
self.title = title
def _create_option_list(self):
self.option_list = []
self._share_option_mappings(self.parser)
def set_title(self, title):
self.title = title
def destroy(self):
"""see OptionParser.destroy()."""
OptionContainer.destroy(self)
del self.option_list
# -- Help-formatting methods ---------------------------------------
def format_help(self, formatter):
result = formatter.format_heading(self.title)
formatter.indent()
result += OptionContainer.format_help(self, formatter)
formatter.dedent()
return result
class OptionParser (OptionContainer):
"""
Class attributes:
standard_option_list : [Option]
list of standard options that will be accepted by all instances
of this parser class (intended to be overridden by subclasses).
Instance attributes:
usage : string
a usage string for your program. Before it is displayed
to the user, "%prog" will be expanded to the name of
your program (self.prog or os.path.basename(sys.argv[0])).
prog : string
the name of the current program (to override
os.path.basename(sys.argv[0])).
description : string
A paragraph of text giving a brief overview of your program.
optparse reformats this paragraph to fit the current terminal
width and prints it when the user requests help (after usage,
but before the list of options).
epilog : string
paragraph of help text to print after option help
option_groups : [OptionGroup]
list of option groups in this parser (option groups are
irrelevant for parsing the command-line, but very useful
for generating help)
allow_interspersed_args : bool = true
if true, positional arguments may be interspersed with options.
Assuming -a and -b each take a single argument, the command-line
-ablah foo bar -bboo baz
will be interpreted the same as
-ablah -bboo -- foo bar baz
If this flag were false, that command line would be interpreted as
-ablah -- foo bar -bboo baz
-- ie. we stop processing options as soon as we see the first
non-option argument. (This is the tradition followed by
Python's getopt module, Perl's Getopt::Std, and other argument-
parsing libraries, but it is generally annoying to users.)
process_default_values : bool = true
if true, option default values are processed similarly to option
values from the command line: that is, they are passed to the
type-checking function for the option's type (as long as the
default value is a string). (This really only matters if you
have defined custom types; see SF bug #955889.) Set it to false
to restore the behaviour of Optik 1.4.1 and earlier.
rargs : [string]
the argument list currently being parsed. Only set when
parse_args() is active, and continually trimmed down as
we consume arguments. Mainly there for the benefit of
callback options.
largs : [string]
the list of leftover arguments that we have skipped while
parsing options. If allow_interspersed_args is false, this
list is always empty.
values : Values
the set of option values currently being accumulated. Only
set when parse_args() is active. Also mainly for callbacks.
Because of the 'rargs', 'largs', and 'values' attributes,
OptionParser is not thread-safe. If, for some perverse reason, you
need to parse command-line arguments simultaneously in different
threads, use different OptionParser instances.
"""
standard_option_list = []
def __init__(self,
usage=None,
option_list=None,
option_class=Option,
version=None,
conflict_handler="error",
description=None,
formatter=None,
add_help_option=True,
prog=None,
epilog=None):
OptionContainer.__init__(
self, option_class, conflict_handler, description)
self.set_usage(usage)
self.prog = prog
self.version = version
self.allow_interspersed_args = True
self.process_default_values = True
if formatter is None:
formatter = IndentedHelpFormatter()
self.formatter = formatter
self.formatter.set_parser(self)
self.epilog = epilog
# Populate the option list; initial sources are the
# standard_option_list class attribute, the 'option_list'
# argument, and (if applicable) the _add_version_option() and
# _add_help_option() methods.
self._populate_option_list(option_list,
add_help=add_help_option)
self._init_parsing_state()
def destroy(self):
"""
Declare that you are done with this OptionParser. This cleans up
reference cycles so the OptionParser (and all objects referenced by
it) can be garbage-collected promptly. After calling destroy(), the
OptionParser is unusable.
"""
OptionContainer.destroy(self)
for group in self.option_groups:
group.destroy()
del self.option_list
del self.option_groups
del self.formatter
# -- Private methods -----------------------------------------------
# (used by our or OptionContainer's constructor)
def _create_option_list(self):
self.option_list = []
self.option_groups = []
self._create_option_mappings()
def _add_help_option(self):
self.add_option("-h", "--help",
action="help",
help=_("show this help message and exit"))
def _add_version_option(self):
self.add_option("--version",
action="version",
help=_("show program's version number and exit"))
def _populate_option_list(self, option_list, add_help=True):
if self.standard_option_list:
self.add_options(self.standard_option_list)
if option_list:
self.add_options(option_list)
if self.version:
self._add_version_option()
if add_help:
self._add_help_option()
def _init_parsing_state(self):
# These are set in parse_args() for the convenience of callbacks.
self.rargs = None
self.largs = None
self.values = None
# -- Simple modifier methods ---------------------------------------
def set_usage(self, usage):
if usage is None:
self.usage = _("%prog [options]")
elif usage is SUPPRESS_USAGE:
self.usage = None
# For backwards compatibility with Optik 1.3 and earlier.
elif usage.lower().startswith("usage: "):
self.usage = usage[7:]
else:
self.usage = usage
def enable_interspersed_args(self):
"""Set parsing to not stop on the first non-option, allowing
interspersing switches with command arguments. This is the
default behavior. See also disable_interspersed_args() and the
class documentation description of the attribute
allow_interspersed_args."""
self.allow_interspersed_args = True
def disable_interspersed_args(self):
"""Set parsing to stop on the first non-option. Use this if
you have a command processor which runs another command that
has options of its own and you want to make sure these options
don't get confused.
"""
self.allow_interspersed_args = False
def set_process_default_values(self, process):
self.process_default_values = process
def set_default(self, dest, value):
self.defaults[dest] = value
def set_defaults(self, **kwargs):
self.defaults.update(kwargs)
def _get_all_options(self):
options = self.option_list[:]
for group in self.option_groups:
options.extend(group.option_list)
return options
def get_default_values(self):
if not self.process_default_values:
# Old, pre-Optik 1.5 behaviour.
return Values(self.defaults)
defaults = self.defaults.copy()
for option in self._get_all_options():
default = defaults.get(option.dest)
if isbasestring(default):
opt_str = option.get_opt_string()
defaults[option.dest] = option.check_value(opt_str, default)
return Values(defaults)
# -- OptionGroup methods -------------------------------------------
def add_option_group(self, *args, **kwargs):
# XXX lots of overlap with OptionContainer.add_option()
if type(args[0]) is types.StringType:
group = OptionGroup(self, *args, **kwargs)
elif len(args) == 1 and not kwargs:
group = args[0]
if not isinstance(group, OptionGroup):
raise TypeError, "not an OptionGroup instance: %r" % group
if group.parser is not self:
raise ValueError, "invalid OptionGroup (wrong parser)"
else:
raise TypeError, "invalid arguments"
self.option_groups.append(group)
return group
def get_option_group(self, opt_str):
option = (self._short_opt.get(opt_str) or
self._long_opt.get(opt_str))
if option and option.container is not self:
return option.container
return None
# -- Option-parsing methods ----------------------------------------
def _get_args(self, args):
if args is None:
return sys.argv[1:]
else:
return args[:] # don't modify caller's list
def parse_args(self, args=None, values=None):
"""
parse_args(args : [string] = sys.argv[1:],
values : Values = None)
-> (values : Values, args : [string])
Parse the command-line options found in 'args' (default:
sys.argv[1:]). Any errors result in a call to 'error()', which
by default prints the usage message to stderr and calls
sys.exit() with an error message. On success returns a pair
(values, args) where 'values' is a Values instance (with all
your option values) and 'args' is the list of arguments left
over after parsing options.
"""
rargs = self._get_args(args)
if values is None:
values = self.get_default_values()
# Store the halves of the argument list as attributes for the
# convenience of callbacks:
# rargs
# the rest of the command-line (the "r" stands for
# "remaining" or "right-hand")
# largs
# the leftover arguments -- ie. what's left after removing
# options and their arguments (the "l" stands for "leftover"
# or "left-hand")
self.rargs = rargs
self.largs = largs = []
self.values = values
try:
stop = self._process_args(largs, rargs, values)
except (BadOptionError, OptionValueError), err:
self.error(str(err))
args = largs + rargs
return self.check_values(values, args)
def check_values(self, values, args):
"""
check_values(values : Values, args : [string])
-> (values : Values, args : [string])
Check that the supplied option values and leftover arguments are
valid. Returns the option values and leftover arguments
(possibly adjusted, possibly completely new -- whatever you
like). Default implementation just returns the passed-in
values; subclasses may override as desired.
"""
return (values, args)
def _process_args(self, largs, rargs, values):
"""_process_args(largs : [string],
rargs : [string],
values : Values)
Process command-line arguments and populate 'values', consuming
options and arguments from 'rargs'. If 'allow_interspersed_args' is
false, stop at the first non-option argument. If true, accumulate any
interspersed non-option arguments in 'largs'.
"""
while rargs:
arg = rargs[0]
# We handle bare "--" explicitly, and bare "-" is handled by the
# standard arg handler since the short arg case ensures that the
# len of the opt string is greater than 1.
if arg == "--":
del rargs[0]
return
elif arg[0:2] == "--":
# process a single long option (possibly with value(s))
self._process_long_opt(rargs, values)
elif arg[:1] == "-" and len(arg) > 1:
# process a cluster of short options (possibly with
# value(s) for the last one only)
self._process_short_opts(rargs, values)
elif self.allow_interspersed_args:
largs.append(arg)
del rargs[0]
else:
return # stop now, leave this arg in rargs
# Say this is the original argument list:
# [arg0, arg1, ..., arg(i-1), arg(i), arg(i+1), ..., arg(N-1)]
# ^
# (we are about to process arg(i)).
#
# Then rargs is [arg(i), ..., arg(N-1)] and largs is a *subset* of
# [arg0, ..., arg(i-1)] (any options and their arguments will have
# been removed from largs).
#
# The while loop will usually consume 1 or more arguments per pass.
# If it consumes 1 (eg. arg is an option that takes no arguments),
# then after _process_arg() is done the situation is:
#
# largs = subset of [arg0, ..., arg(i)]
# rargs = [arg(i+1), ..., arg(N-1)]
#
# If allow_interspersed_args is false, largs will always be
# *empty* -- still a subset of [arg0, ..., arg(i-1)], but
# not a very interesting subset!
def _match_long_opt(self, opt):
"""_match_long_opt(opt : string) -> string
Determine which long option string 'opt' matches, ie. which one
it is an unambiguous abbreviation for. Raises BadOptionError if
'opt' doesn't unambiguously match any long option string.
"""
return _match_abbrev(opt, self._long_opt)
def _process_long_opt(self, rargs, values):
arg = rargs.pop(0)
# Value explicitly attached to arg? Pretend it's the next
# argument.
if "=" in arg:
(opt, next_arg) = arg.split("=", 1)
rargs.insert(0, next_arg)
had_explicit_value = True
else:
opt = arg
had_explicit_value = False
opt = self._match_long_opt(opt)
option = self._long_opt[opt]
if option.takes_value():
nargs = option.nargs
if len(rargs) < nargs:
if nargs == 1:
self.error(_("%s option requires an argument") % opt)
else:
self.error(_("%s option requires %d arguments")
% (opt, nargs))
elif nargs == 1:
value = rargs.pop(0)
else:
value = tuple(rargs[0:nargs])
del rargs[0:nargs]
elif had_explicit_value:
self.error(_("%s option does not take a value") % opt)
else:
value = None
option.process(opt, value, values, self)
def _process_short_opts(self, rargs, values):
arg = rargs.pop(0)
stop = False
i = 1
for ch in arg[1:]:
opt = "-" + ch
option = self._short_opt.get(opt)
i += 1 # we have consumed a character
if not option:
raise BadOptionError(opt)
if option.takes_value():
# Any characters left in arg? Pretend they're the
# next arg, and stop consuming characters of arg.
if i < len(arg):
rargs.insert(0, arg[i:])
stop = True
nargs = option.nargs
if len(rargs) < nargs:
if nargs == 1:
self.error(_("%s option requires an argument") % opt)
else:
self.error(_("%s option requires %d arguments")
% (opt, nargs))
elif nargs == 1:
value = rargs.pop(0)
else:
value = tuple(rargs[0:nargs])
del rargs[0:nargs]
else: # option doesn't take a value
value = None
option.process(opt, value, values, self)
if stop:
break
# -- Feedback methods ----------------------------------------------
def get_prog_name(self):
if self.prog is None:
return os.path.basename(sys.argv[0])
else:
return self.prog
def expand_prog_name(self, s):
return s.replace("%prog", self.get_prog_name())
def get_description(self):
return self.expand_prog_name(self.description)
def exit(self, status=0, msg=None):
if msg:
sys.stderr.write(msg)
sys.exit(status)
def error(self, msg):
"""error(msg : string)
Print a usage message incorporating 'msg' to stderr and exit.
If you override this in a subclass, it should not return -- it
should either exit or raise an exception.
"""
self.print_usage(sys.stderr)
self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
def get_usage(self):
if self.usage:
return self.formatter.format_usage(
self.expand_prog_name(self.usage))
else:
return ""
def print_usage(self, file=None):
"""print_usage(file : file = stdout)
Print the usage message for the current program (self.usage) to
'file' (default stdout). Any occurrence of the string "%prog" in
self.usage is replaced with the name of the current program
(basename of sys.argv[0]). Does nothing if self.usage is empty
or not defined.
"""
if self.usage:
print >>file, self.get_usage()
def get_version(self):
if self.version:
return self.expand_prog_name(self.version)
else:
return ""
def print_version(self, file=None):
"""print_version(file : file = stdout)
Print the version message for this program (self.version) to
'file' (default stdout). As with print_usage(), any occurrence
of "%prog" in self.version is replaced by the current program's
name. Does nothing if self.version is empty or undefined.
"""
if self.version:
print >>file, self.get_version()
def format_option_help(self, formatter=None):
if formatter is None:
formatter = self.formatter
formatter.store_option_strings(self)
result = []
result.append(formatter.format_heading(_("Options")))
formatter.indent()
if self.option_list:
result.append(OptionContainer.format_option_help(self, formatter))
result.append("\n")
for group in self.option_groups:
result.append(group.format_help(formatter))
result.append("\n")
formatter.dedent()
# Drop the last "\n", or the header if no options or option groups:
return "".join(result[:-1])
def format_epilog(self, formatter):
return formatter.format_epilog(self.epilog)
def format_help(self, formatter=None):
if formatter is None:
formatter = self.formatter
result = []
if self.usage:
result.append(self.get_usage() + "\n")
if self.description:
result.append(self.format_description(formatter) + "\n")
result.append(self.format_option_help(formatter))
result.append(self.format_epilog(formatter))
return "".join(result)
# used by test suite
def _get_encoding(self, file):
encoding = getattr(file, "encoding", None)
if not encoding:
encoding = sys.getdefaultencoding()
return encoding
def print_help(self, file=None):
"""print_help(file : file = stdout)
Print an extended help message, listing all options and any
help text provided with them, to 'file' (default stdout).
"""
if file is None:
file = sys.stdout
encoding = self._get_encoding(file)
file.write(self.format_help().encode(encoding, "replace"))
# class OptionParser
def _match_abbrev(s, wordmap):
"""_match_abbrev(s : string, wordmap : {string : Option}) -> string
Return the string key in 'wordmap' for which 's' is an unambiguous
abbreviation. If 's' is found to be ambiguous or doesn't match any of
    'wordmap', raise BadOptionError.
"""
# Is there an exact match?
if s in wordmap:
return s
else:
# Isolate all words with s as a prefix.
possibilities = [word for word in wordmap.keys()
if word.startswith(s)]
# No exact match, so there had better be just one possibility.
if len(possibilities) == 1:
return possibilities[0]
elif not possibilities:
raise BadOptionError(s)
else:
# More than one possible completion: ambiguous prefix.
possibilities.sort()
raise AmbiguousOptionError(s, possibilities)
# Some day, there might be many Option classes. As of Optik 1.3, the
# preferred way to instantiate Options is indirectly, via make_option(),
# which will become a factory function when there are many Option
# classes.
make_option = Option
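# An end-to-end sketch (illustrative; the option names are made up) tying the
# classes above together: OptionParser, an OptionGroup, typed options, and
# the (values, leftover_args) pair returned by parse_args().
def _optparse_demo():
    parser = OptionParser(usage="%prog [options] args", version="%prog 1.0")
    parser.add_option("-q", "--quiet", action="store_false", dest="verbose",
                      default=True, help="suppress progress output")
    group = OptionGroup(parser, "Tuning", "Rarely needed knobs.")
    group.add_option("--count", type="int", dest="count", default=1)
    parser.add_option_group(group)
    options, args = parser.parse_args(["-q", "--count", "3", "file.txt"])
    assert options.verbose is False
    assert options.count == 3
    assert args == ["file.txt"]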
r"""OS routines for NT or Posix depending on what system we're on.
This exports:
- all functions from posix, nt, os2, or ce, e.g. unlink, stat, etc.
  - os.path is one of the modules posixpath or ntpath
- os.name is 'posix', 'nt', 'os2', 'ce' or 'riscos'
- os.curdir is a string representing the current directory ('.' or ':')
- os.pardir is a string representing the parent directory ('..' or '::')
- os.sep is the (or a most common) pathname separator ('/' or ':' or '\\')
- os.extsep is the extension separator ('.' or '/')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
- os.devnull is the file path of the null device ('/dev/null', etc.)
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).
"""
#'
import sys, errno
_names = sys.builtin_module_names
# Note: more names are added to __all__ later.
__all__ = ["altsep", "curdir", "pardir", "sep", "extsep", "pathsep", "linesep",
"defpath", "name", "path", "devnull",
"SEEK_SET", "SEEK_CUR", "SEEK_END"]
def _get_exports_list(module):
try:
return list(module.__all__)
except AttributeError:
return [n for n in dir(module) if n[0] != '_']
if 'posix' in _names:
name = 'posix'
linesep = '\n'
from posix import *
try:
from posix import _exit
except ImportError:
pass
import posixpath as path
import posix
__all__.extend(_get_exports_list(posix))
del posix
elif 'nt' in _names:
name = 'nt'
linesep = '\r\n'
from nt import *
try:
from nt import _exit
except ImportError:
pass
import ntpath as path
import nt
__all__.extend(_get_exports_list(nt))
del nt
elif 'os2' in _names:
name = 'os2'
linesep = '\r\n'
from os2 import *
try:
from os2 import _exit
except ImportError:
pass
if sys.version.find('EMX GCC') == -1:
import ntpath as path
else:
import os2emxpath as path
from _emx_link import link
import os2
__all__.extend(_get_exports_list(os2))
del os2
elif 'ce' in _names:
name = 'ce'
linesep = '\r\n'
from ce import *
try:
from ce import _exit
except ImportError:
pass
# We can use the standard Windows path.
import ntpath as path
import ce
__all__.extend(_get_exports_list(ce))
del ce
elif 'riscos' in _names:
name = 'riscos'
linesep = '\n'
from riscos import *
try:
from riscos import _exit
except ImportError:
pass
import riscospath as path
import riscos
__all__.extend(_get_exports_list(riscos))
del riscos
else:
raise ImportError, 'no os specific module found'
sys.modules['os.path'] = path
from os.path import (curdir, pardir, sep, pathsep, defpath, extsep, altsep,
devnull)
del _names
# Python uses fixed values for the SEEK_ constants; they are mapped
# to native constants if necessary in posixmodule.c
SEEK_SET = 0
SEEK_CUR = 1
SEEK_END = 2
#'
# Super directory utilities.
# (Inspired by Eric Raymond; the doc strings are mostly his)
def makedirs(name, mode=0777):
"""makedirs(path [, mode=0777])
Super-mkdir; create a leaf directory and all intermediate ones.
Works like mkdir, except that any intermediate path segment (not
just the rightmost) will be created if it does not exist. This is
recursive.
"""
head, tail = path.split(name)
if not tail:
head, tail = path.split(head)
if head and tail and not path.exists(head):
try:
makedirs(head, mode)
except OSError, e:
# be happy if someone already created the path
if e.errno != errno.EEXIST:
raise
if tail == curdir: # xxx/newdir/. exists if xxx/newdir exists
return
mkdir(name, mode)
def removedirs(name):
"""removedirs(path)
Super-rmdir; remove a leaf directory and all empty intermediate
ones. Works like rmdir except that, if the leaf directory is
successfully removed, directories corresponding to rightmost path
segments will be pruned away until either the whole path is
consumed or an error occurs. Errors during this latter phase are
ignored -- they generally mean that a directory was not empty.
"""
rmdir(name)
head, tail = path.split(name)
if not tail:
head, tail = path.split(head)
while head and tail:
try:
rmdir(head)
except error:
break
head, tail = path.split(head)
def renames(old, new):
"""renames(old, new)
Super-rename; create directories as necessary and delete any left
empty. Works like rename, except creation of any intermediate
directories needed to make the new pathname good is attempted
first. After the rename, directories corresponding to rightmost
path segments of the old name will be pruned until either the
whole path is consumed or a nonempty directory is found.
Note: this function can fail with the new directory structure made
if you lack permissions needed to unlink the leaf directory or
file.
"""
head, tail = path.split(new)
if head and tail and not path.exists(head):
makedirs(head)
rename(old, new)
head, tail = path.split(old)
if head and tail:
try:
removedirs(head)
except error:
pass
__all__.extend(["makedirs", "removedirs", "renames"])
def walk(top, topdown=True, onerror=None, followlinks=False):
"""Directory tree generator.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), yields a 3-tuple
dirpath, dirnames, filenames
dirpath is a string, the path to the directory. dirnames is a list of
the names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists are just names, with no path components.
To get a full path (which begins with top) to a file or directory in
dirpath, do os.path.join(dirpath, name).
If optional arg 'topdown' is true or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top down). If topdown is false, the triple
for a directory is generated after the triples for all of its
subdirectories (directories are generated bottom up).
When topdown is true, the caller can modify the dirnames list in-place
(e.g., via del or slice assignment), and walk will only recurse into the
subdirectories whose names remain in dirnames; this can be used to prune the
search, or to impose a specific order of visiting. Modifying dirnames when
topdown is false is ineffective, since the directories in dirnames have
already been generated by the time dirnames itself is generated. No matter
the value of topdown, the list of subdirectories is retrieved before the
tuples for the directory and its subdirectories are generated.
By default errors from the os.listdir() call are ignored. If
optional arg 'onerror' is specified, it should be a function; it
will be called with one argument, an os.error instance. It can
report the error to continue with the walk, or raise the exception
to abort the walk. Note that the filename is available as the
filename attribute of the exception object.
By default, os.walk does not follow symbolic links to subdirectories on
systems that support them. In order to get this functionality, set the
optional argument 'followlinks' to true.
Caution: if you pass a relative pathname for top, don't change the
current working directory between resumptions of walk. walk never
changes the current directory, and assumes that the client doesn't
either.
Example:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
print root, "consumes",
print sum([getsize(join(root, name)) for name in files]),
print "bytes in", len(files), "non-directory files"
if 'CVS' in dirs:
dirs.remove('CVS') # don't visit CVS directories
"""
islink, join, isdir = path.islink, path.join, path.isdir
# We may not have read permission for top, in which case we can't
# get a list of the files the directory contains. os.path.walk
# always suppressed the exception then, rather than blow up for a
# minor reason when (say) a thousand readable directories are still
# left to visit. That logic is copied here.
try:
# Note that listdir and error are globals in this module due
# to earlier import-*.
names = listdir(top)
except error, err:
if onerror is not None:
onerror(err)
return
dirs, nondirs = [], []
for name in names:
if isdir(join(top, name)):
dirs.append(name)
else:
nondirs.append(name)
if topdown:
yield top, dirs, nondirs
for name in dirs:
new_path = join(top, name)
if followlinks or not islink(new_path):
for x in walk(new_path, topdown, onerror, followlinks):
yield x
if not topdown:
yield top, dirs, nondirs
__all__.append("walk")
# Make sure os.environ exists, at least
try:
environ
except NameError:
environ = {}
def execl(file, *args):
"""execl(file, *args)
Execute the executable file with argument list args, replacing the
current process. """
execv(file, args)
def execle(file, *args):
"""execle(file, *args, env)
Execute the executable file with argument list args and
environment env, replacing the current process. """
env = args[-1]
execve(file, args[:-1], env)
def execlp(file, *args):
"""execlp(file, *args)
Execute the executable file (which is searched for along $PATH)
with argument list args, replacing the current process. """
execvp(file, args)
def execlpe(file, *args):
"""execlpe(file, *args, env)
Execute the executable file (which is searched for along $PATH)
with argument list args and environment env, replacing the current
process. """
env = args[-1]
execvpe(file, args[:-1], env)
def execvp(file, args):
"""execvp(file, args)
Execute the executable file (which is searched for along $PATH)
with argument list args, replacing the current process.
args may be a list or tuple of strings. """
_execvpe(file, args)
def execvpe(file, args, env):
"""execvpe(file, args, env)
Execute the executable file (which is searched for along $PATH)
    with argument list args and environment env, replacing the
current process.
args may be a list or tuple of strings. """
_execvpe(file, args, env)
__all__.extend(["execl","execle","execlp","execlpe","execvp","execvpe"])
def _execvpe(file, args, env=None):
if env is not None:
func = execve
argrest = (args, env)
else:
func = execv
argrest = (args,)
env = environ
head, tail = path.split(file)
if head:
func(file, *argrest)
return
if 'PATH' in env:
envpath = env['PATH']
else:
envpath = defpath
PATH = envpath.split(pathsep)
saved_exc = None
saved_tb = None
for dir in PATH:
fullname = path.join(dir, file)
try:
func(fullname, *argrest)
except error, e:
tb = sys.exc_info()[2]
if (e.errno != errno.ENOENT and e.errno != errno.ENOTDIR
and saved_exc is None):
saved_exc = e
saved_tb = tb
if saved_exc:
raise error, saved_exc, saved_tb
raise error, e, tb
# Change environ to automatically call putenv() if it exists
try:
# This will fail if there's no putenv
putenv
except NameError:
pass
else:
import UserDict
    # Fake unsetenv() for Windows; not sure about OS/2, but it is
    # assumed to behave the same.
if name in ('os2', 'nt'):
def unsetenv(key):
putenv(key, "")
if name == "riscos":
# On RISC OS, all env access goes through getenv and putenv
from riscosenviron import _Environ
elif name in ('os2', 'nt'): # Where Env Var Names Must Be UPPERCASE
# But we store them as upper case
class _Environ(UserDict.IterableUserDict):
def __init__(self, environ):
UserDict.UserDict.__init__(self)
data = self.data
for k, v in environ.items():
data[k.upper()] = v
def __setitem__(self, key, item):
putenv(key, item)
self.data[key.upper()] = item
def __getitem__(self, key):
return self.data[key.upper()]
try:
unsetenv
except NameError:
def __delitem__(self, key):
del self.data[key.upper()]
else:
def __delitem__(self, key):
unsetenv(key)
del self.data[key.upper()]
def clear(self):
for key in self.data.keys():
unsetenv(key)
del self.data[key]
def pop(self, key, *args):
unsetenv(key)
return self.data.pop(key.upper(), *args)
def has_key(self, key):
return key.upper() in self.data
def __contains__(self, key):
return key.upper() in self.data
def get(self, key, failobj=None):
return self.data.get(key.upper(), failobj)
def update(self, dict=None, **kwargs):
if dict:
try:
keys = dict.keys()
except AttributeError:
# List of (key, value)
for k, v in dict:
self[k] = v
else:
# got keys
# cannot use items(), since mappings
# may not have them.
for k in keys:
self[k] = dict[k]
if kwargs:
self.update(kwargs)
def copy(self):
return dict(self)
else: # Where Env Var Names Can Be Mixed Case
class _Environ(UserDict.IterableUserDict):
def __init__(self, environ):
UserDict.UserDict.__init__(self)
self.data = environ
def __setitem__(self, key, item):
putenv(key, item)
self.data[key] = item
def update(self, dict=None, **kwargs):
if dict:
try:
keys = dict.keys()
except AttributeError:
# List of (key, value)
for k, v in dict:
self[k] = v
else:
# got keys
# cannot use items(), since mappings
# may not have them.
for k in keys:
self[k] = dict[k]
if kwargs:
self.update(kwargs)
try:
unsetenv
except NameError:
pass
else:
def __delitem__(self, key):
unsetenv(key)
del self.data[key]
def clear(self):
for key in self.data.keys():
unsetenv(key)
del self.data[key]
def pop(self, key, *args):
unsetenv(key)
return self.data.pop(key, *args)
def copy(self):
return dict(self)
environ = _Environ(environ)
def getenv(key, default=None):
"""Get an environment variable, return None if it doesn't exist.
The optional second argument can specify an alternate default."""
return environ.get(key, default)
__all__.append("getenv")
def _exists(name):
return name in globals()
# Supply spawn*() (probably only for Unix)
if _exists("fork") and not _exists("spawnv") and _exists("execv"):
P_WAIT = 0
P_NOWAIT = P_NOWAITO = 1
# XXX Should we support P_DETACH? I suppose it could fork()**2
# and close the std I/O streams. Also, P_OVERLAY is the same
# as execv*()?
def _spawnvef(mode, file, args, env, func):
# Internal helper; func is the exec*() function to use
pid = fork()
if not pid:
# Child
try:
if env is None:
func(file, args)
else:
func(file, args, env)
except:
_exit(127)
else:
# Parent
if mode == P_NOWAIT:
return pid # Caller is responsible for waiting!
while 1:
wpid, sts = waitpid(pid, 0)
if WIFSTOPPED(sts):
continue
elif WIFSIGNALED(sts):
return -WTERMSIG(sts)
elif WIFEXITED(sts):
return WEXITSTATUS(sts)
else:
raise error, "Not stopped, signaled or exited???"
def spawnv(mode, file, args):
"""spawnv(mode, file, args) -> integer
Execute file with arguments from args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, None, execv)
def spawnve(mode, file, args, env):
"""spawnve(mode, file, args, env) -> integer
Execute file with arguments from args in a subprocess with the
specified environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, env, execve)
    # Note: spawnvp[e] isn't currently supported on Windows
def spawnvp(mode, file, args):
"""spawnvp(mode, file, args) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, None, execvp)
def spawnvpe(mode, file, args, env):
"""spawnvpe(mode, file, args, env) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess with the supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, env, execvpe)
if _exists("spawnv"):
# These aren't supplied by the basic Windows code
# but can be easily implemented in Python
def spawnl(mode, file, *args):
"""spawnl(mode, file, *args) -> integer
Execute file with arguments from args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return spawnv(mode, file, args)
def spawnle(mode, file, *args):
"""spawnle(mode, file, *args, env) -> integer
Execute file with arguments from args in a subprocess with the
supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
env = args[-1]
return spawnve(mode, file, args[:-1], env)
__all__.extend(["spawnv", "spawnve", "spawnl", "spawnle",])
if _exists("spawnvp"):
# At the moment, Windows doesn't implement spawnvp[e],
# so it won't have spawnlp[e] either.
def spawnlp(mode, file, *args):
"""spawnlp(mode, file, *args) -> integer
Execute file (which is looked for along $PATH) with arguments from
        args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return spawnvp(mode, file, args)
def spawnlpe(mode, file, *args):
"""spawnlpe(mode, file, *args, env) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess with the supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
env = args[-1]
return spawnvpe(mode, file, args[:-1], env)
__all__.extend(["spawnvp", "spawnvpe", "spawnlp", "spawnlpe",])
# Supply popen2 etc. (for Unix)
if _exists("fork"):
if not _exists("popen2"):
def popen2(cmd, mode="t", bufsize=-1):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd'
may be a sequence, in which case arguments will be passed directly to
the program without shell intervention (as with os.spawnv()). If 'cmd'
is a string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdin, child_stdout) are returned."""
import warnings
msg = "os.popen2 is deprecated. Use the subprocess module."
warnings.warn(msg, DeprecationWarning, stacklevel=2)
import subprocess
PIPE = subprocess.PIPE
p = subprocess.Popen(cmd, shell=isinstance(cmd, basestring),
bufsize=bufsize, stdin=PIPE, stdout=PIPE,
close_fds=True)
return p.stdin, p.stdout
__all__.append("popen2")
if not _exists("popen3"):
def popen3(cmd, mode="t", bufsize=-1):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd'
may be a sequence, in which case arguments will be passed directly to
the program without shell intervention (as with os.spawnv()). If 'cmd'
is a string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdin, child_stdout, child_stderr) are returned."""
import warnings
msg = "os.popen3 is deprecated. Use the subprocess module."
warnings.warn(msg, DeprecationWarning, stacklevel=2)
import subprocess
PIPE = subprocess.PIPE
p = subprocess.Popen(cmd, shell=isinstance(cmd, basestring),
bufsize=bufsize, stdin=PIPE, stdout=PIPE,
stderr=PIPE, close_fds=True)
return p.stdin, p.stdout, p.stderr
__all__.append("popen3")
if not _exists("popen4"):
def popen4(cmd, mode="t", bufsize=-1):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd'
may be a sequence, in which case arguments will be passed directly to
the program without shell intervention (as with os.spawnv()). If 'cmd'
is a string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdin, child_stdout_stderr) are returned."""
import warnings
msg = "os.popen4 is deprecated. Use the subprocess module."
warnings.warn(msg, DeprecationWarning, stacklevel=2)
import subprocess
PIPE = subprocess.PIPE
p = subprocess.Popen(cmd, shell=isinstance(cmd, basestring),
bufsize=bufsize, stdin=PIPE, stdout=PIPE,
stderr=subprocess.STDOUT, close_fds=True)
return p.stdin, p.stdout
__all__.append("popen4")
import copy_reg as _copy_reg
def _make_stat_result(tup, dict):
return stat_result(tup, dict)
def _pickle_stat_result(sr):
(type, args) = sr.__reduce__()
return (_make_stat_result, args)
try:
_copy_reg.pickle(stat_result, _pickle_stat_result, _make_stat_result)
except NameError: # stat_result may not exist
pass
def _make_statvfs_result(tup, dict):
return statvfs_result(tup, dict)
def _pickle_statvfs_result(sr):
(type, args) = sr.__reduce__()
return (_make_statvfs_result, args)
try:
_copy_reg.pickle(statvfs_result, _pickle_statvfs_result,
_make_statvfs_result)
except NameError: # statvfs_result may not exist
pass
# Module 'os2emxpath' -- common operations on OS/2 pathnames
"""Common pathname manipulations, OS/2 EMX version.
Instead of importing this module directly, import os and refer to this
module as os.path.
"""
import os
import stat
from genericpath import *
from genericpath import _unicode
from ntpath import (expanduser, expandvars, isabs, islink, splitdrive,
splitext, split, walk)
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime", "islink","exists","lexists","isdir","isfile",
"ismount","walk","expanduser","expandvars","normpath","abspath",
"splitunc","curdir","pardir","sep","pathsep","defpath","altsep",
"extsep","devnull","realpath","supports_unicode_filenames"]
# strings representing various path-related bits and pieces
curdir = '.'
pardir = '..'
extsep = '.'
sep = '/'
altsep = '\\'
pathsep = ';'
defpath = '.;C:\\bin'
devnull = 'nul'
# Normalize the case of a pathname and map slashes to backslashes.
# Other normalizations (such as optimizing '../' away) are not done
# (this is done by normpath).
def normcase(s):
"""Normalize case of pathname.
Makes all characters lowercase and all altseps into seps."""
return s.replace('\\', '/').lower()
# Join two (or more) paths.
def join(a, *p):
"""Join two or more pathname components, inserting sep as needed"""
path = a
for b in p:
if isabs(b):
path = b
elif path == '' or path[-1:] in '/\\:':
path = path + b
else:
path = path + '/' + b
return path
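# A sketch (illustrative) of the join() rules above: an absolute component
# resets the result, and a separator is inserted only when needed.
def _join_demo():
    assert join("a", "b") == "a/b"
    assert join("a/", "b") == "a/b"     # trailing separator is reused
    assert join("a", "/b") == "/b"      # absolute component wins
    assert normcase("A\\B") == "a/b"    # normcase: lowercase, \ becomes /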
# Parse UNC paths
def splitunc(p):
"""Split a pathname into UNC mount point and relative path specifiers.
Return a 2-tuple (unc, rest); either part may be empty.
If unc is not empty, it has the form '//host/mount' (or similar
using backslashes). unc+rest is always the input path.
Paths containing drive letters never have a UNC part.
"""
if p[1:2] == ':':
return '', p # Drive letter present
firstTwo = p[0:2]
if firstTwo == '/' * 2 or firstTwo == '\\' * 2:
# is a UNC path:
# vvvvvvvvvvvvvvvvvvvv equivalent to drive letter
# \\machine\mountpoint\directories...
# directory ^^^^^^^^^^^^^^^
normp = normcase(p)
index = normp.find('/', 2)
if index == -1:
##raise RuntimeError, 'illegal UNC path: "' + p + '"'
return ("", p)
index = normp.find('/', index + 1)
if index == -1:
index = len(p)
return p[:index], p[index:]
return '', p
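# A sketch (illustrative; the host and mount names are made up) of
# splitunc() above: drive-letter paths have no UNC part, while //host/mount
# splits right after the mount component.
def _splitunc_demo():
    assert splitunc("C:/dir") == ("", "C:/dir")
    assert splitunc("//host/mount/dir") == ("//host/mount", "/dir")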
# Return the tail (basename) part of a path.
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
# Return the head (dirname) part of a path.
def dirname(p):
"""Returns the directory component of a pathname"""
return split(p)[0]
# alias exists to lexists
lexists = exists
# Is a path a directory?
# Is a path a mount point? Either a root (with or without drive letter)
# or a UNC path with at most a / or \ after the mount point.
def ismount(path):
"""Test whether a path is a mount point (defined as root of drive)"""
unc, rest = splitunc(path)
if unc:
return rest in ("", "/", "\\")
p = splitdrive(path)[1]
return len(p) == 1 and p[0] in '/\\'
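# Illustrative examples (not in the original source):
#
#     >>> ismount('C:/')
#     True
#     >>> ismount('//host/share')
#     True
#     >>> ismount('C:/tmp')
#     False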
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A/B.
def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
path = path.replace('\\', '/')
prefix, path = splitdrive(path)
while path[:1] == '/':
prefix = prefix + '/'
path = path[1:]
comps = path.split('/')
i = 0
while i < len(comps):
if comps[i] == '.':
del comps[i]
elif comps[i] == '..' and i > 0 and comps[i-1] not in ('', '..'):
del comps[i-1:i+1]
i = i - 1
elif comps[i] == '' and i > 0 and comps[i-1] != '':
del comps[i]
else:
i = i + 1
# If the path is now empty, substitute '.'
if not prefix and not comps:
comps.append('.')
return prefix + '/'.join(comps)
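# Illustrative examples (not in the original source):
#
#     >>> normpath('A//B')
#     'A/B'
#     >>> normpath('A/./B')
#     'A/B'
#     >>> normpath('A/foo/../B')
#     'A/B'
#     >>> normpath('C:\\foo\\..')
#     'C:/'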
# Return an absolute path.
def abspath(path):
"""Return the absolute version of a path"""
if not isabs(path):
if isinstance(path, _unicode):
cwd = os.getcwdu()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
# realpath is a no-op on systems without islink support
realpath = abspath
supports_unicode_filenames = False
#! /usr/bin/env python
"""A Python debugger."""
# (See pdb.doc for documentation.)
import sys
import linecache
import cmd
import bdb
from repr import Repr
import os
import re
import pprint
import traceback
class Restart(Exception):
"""Causes a debugger to be restarted for the debugged python program."""
pass
# Create a custom safe Repr instance and increase its maxstring.
# The default of 30 truncates error messages too easily.
_repr = Repr()
_repr.maxstring = 200
_saferepr = _repr.repr
__all__ = ["run", "pm", "Pdb", "runeval", "runctx", "runcall", "set_trace",
"post_mortem", "help"]
def find_function(funcname, filename):
cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
try:
fp = open(filename)
except IOError:
return None
# consumer of this info expects the first line to be 1
lineno = 1
answer = None
while 1:
line = fp.readline()
if line == '':
break
if cre.match(line):
answer = funcname, filename, lineno
break
lineno = lineno + 1
fp.close()
return answer
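# Illustrative sketch (hypothetical file name): if /tmp/example.py had
# "def main():" starting on its third line, then
# find_function('main', '/tmp/example.py') would return
# ('main', '/tmp/example.py', 3); it returns None when the file cannot
# be opened or no matching definition is found.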
# Interaction prompt line will separate file and call info from code
# text using the value of the line_prefix string. A newline and arrow may
# be more to your liking. You can set it once pdb is imported using the
# command "pdb.line_prefix = '\n% '".
# line_prefix = ': ' # Use this to get the old situation back
line_prefix = '\n-> ' # Probably a better default
class Pdb(bdb.Bdb, cmd.Cmd):
def __init__(self, completekey='tab', stdin=None, stdout=None, skip=None):
bdb.Bdb.__init__(self, skip=skip)
cmd.Cmd.__init__(self, completekey, stdin, stdout)
if stdout:
self.use_rawinput = 0
self.prompt = '(Pdb) '
self.aliases = {}
self.mainpyfile = ''
self._wait_for_mainpyfile = 0
# Try to load readline if it exists
try:
import readline
except ImportError:
pass
# Read $HOME/.pdbrc and ./.pdbrc
self.rcLines = []
if 'HOME' in os.environ:
envHome = os.environ['HOME']
try:
rcFile = open(os.path.join(envHome, ".pdbrc"))
except IOError:
pass
else:
for line in rcFile.readlines():
self.rcLines.append(line)
rcFile.close()
try:
rcFile = open(".pdbrc")
except IOError:
pass
else:
for line in rcFile.readlines():
self.rcLines.append(line)
rcFile.close()
self.commands = {} # associates a command list to breakpoint numbers
self.commands_doprompt = {} # for each bp num, tells if the prompt
# must be disp. after execing the cmd list
self.commands_silent = {} # for each bp num, tells if the stack trace
# must be disp. after execing the cmd list
self.commands_defining = False # True while in the process of defining
# a command list
self.commands_bnum = None # The breakpoint number for which we are
# defining a list
def reset(self):
bdb.Bdb.reset(self)
self.forget()
def forget(self):
self.lineno = None
self.stack = []
self.curindex = 0
self.curframe = None
def setup(self, f, t):
self.forget()
self.stack, self.curindex = self.get_stack(f, t)
self.curframe = self.stack[self.curindex][0]
# The f_locals dictionary is updated from the actual frame
# locals whenever the .f_locals accessor is called, so we
# cache it here to ensure that modifications are not overwritten.
self.curframe_locals = self.curframe.f_locals
self.execRcLines()
# Can be executed earlier than 'setup' if desired
def execRcLines(self):
if self.rcLines:
# Make local copy because of recursion
rcLines = self.rcLines
# executed only once
self.rcLines = []
for line in rcLines:
line = line[:-1]
if len(line) > 0 and line[0] != '#':
self.onecmd(line)
# Override Bdb methods
def user_call(self, frame, argument_list):
"""This method is called when there is the remote possibility
that we ever need to stop in this function."""
if self._wait_for_mainpyfile:
return
if self.stop_here(frame):
print >>self.stdout, '--Call--'
self.interaction(frame, None)
def user_line(self, frame):
"""This function is called when we stop or break at this line."""
if self._wait_for_mainpyfile:
if (self.mainpyfile != self.canonic(frame.f_code.co_filename)
or frame.f_lineno <= 0):
return
self._wait_for_mainpyfile = 0
if self.bp_commands(frame):
self.interaction(frame, None)
def bp_commands(self,frame):
"""Call every command that was set for the current active breakpoint
(if there is one).
Returns True if the normal interaction function must be called,
False otherwise."""
# self.currentbp is set in bdb in Bdb.break_here if a breakpoint was hit
if getattr(self, "currentbp", False) and \
self.currentbp in self.commands:
currentbp = self.currentbp
self.currentbp = 0
lastcmd_back = self.lastcmd
self.setup(frame, None)
for line in self.commands[currentbp]:
self.onecmd(line)
self.lastcmd = lastcmd_back
if not self.commands_silent[currentbp]:
self.print_stack_entry(self.stack[self.curindex])
if self.commands_doprompt[currentbp]:
self.cmdloop()
self.forget()
return
return 1
def user_return(self, frame, return_value):
"""This function is called when a return trap is set here."""
if self._wait_for_mainpyfile:
return
frame.f_locals['__return__'] = return_value
print >>self.stdout, '--Return--'
self.interaction(frame, None)
def user_exception(self, frame, exc_info):
"""This function is called if an exception occurs,
but only if we are to stop at or just below this level."""
if self._wait_for_mainpyfile:
return
exc_type, exc_value, exc_traceback = exc_info
frame.f_locals['__exception__'] = exc_type, exc_value
if type(exc_type) == type(''):
exc_type_name = exc_type
else: exc_type_name = exc_type.__name__
print >>self.stdout, exc_type_name + ':', _saferepr(exc_value)
self.interaction(frame, exc_traceback)
# General interaction function
def interaction(self, frame, traceback):
self.setup(frame, traceback)
self.print_stack_entry(self.stack[self.curindex])
self.cmdloop()
self.forget()
def displayhook(self, obj):
"""Custom displayhook for the exec in default(), which prevents
assignment of the _ variable in the builtins.
"""
# reproduce the behavior of the standard displayhook, not printing None
if obj is not None:
print repr(obj)
def default(self, line):
if line[:1] == '!': line = line[1:]
locals = self.curframe_locals
globals = self.curframe.f_globals
try:
code = compile(line + '\n', '<stdin>', 'single')
save_stdout = sys.stdout
save_stdin = sys.stdin
save_displayhook = sys.displayhook
try:
sys.stdin = self.stdin
sys.stdout = self.stdout
sys.displayhook = self.displayhook
exec code in globals, locals
finally:
sys.stdout = save_stdout
sys.stdin = save_stdin
sys.displayhook = save_displayhook
except:
t, v = sys.exc_info()[:2]
if type(t) == type(''):
exc_type_name = t
else: exc_type_name = t.__name__
print >>self.stdout, '***', exc_type_name + ':', v
def precmd(self, line):
"""Handle alias expansion and ';;' separator."""
if not line.strip():
return line
args = line.split()
while args[0] in self.aliases:
line = self.aliases[args[0]]
ii = 1
for tmpArg in args[1:]:
line = line.replace("%" + str(ii),
tmpArg)
ii = ii + 1
line = line.replace("%*", ' '.join(args[1:]))
args = line.split()
# split into ';;' separated commands
# unless it's an alias command
if args[0] != 'alias':
marker = line.find(';;')
if marker >= 0:
# queue up everything after marker
next = line[marker+2:].lstrip()
self.cmdqueue.append(next)
line = line[:marker].rstrip()
return line
def onecmd(self, line):
"""Interpret the argument as though it had been typed in response
to the prompt.
Checks whether this line is typed at the normal prompt or in
a breakpoint command list definition.
"""
if not self.commands_defining:
return cmd.Cmd.onecmd(self, line)
else:
return self.handle_command_def(line)
def handle_command_def(self,line):
"""Handles one command line during command list definition."""
cmd, arg, line = self.parseline(line)
if not cmd:
return
if cmd == 'silent':
self.commands_silent[self.commands_bnum] = True
return # continue to handle other cmd def in the cmd list
elif cmd == 'end':
self.cmdqueue = []
return 1 # end of cmd list
cmdlist = self.commands[self.commands_bnum]
if arg:
cmdlist.append(cmd+' '+arg)
else:
cmdlist.append(cmd)
# Determine if we must stop
try:
func = getattr(self, 'do_' + cmd)
except AttributeError:
func = self.default
# one of the resuming commands
if func.func_name in self.commands_resuming:
self.commands_doprompt[self.commands_bnum] = False
self.cmdqueue = []
return 1
return
# Command definitions, called by cmdloop()
# The argument is the remaining string on the command line
# Return true to exit from the command loop
do_h = cmd.Cmd.do_help
def do_commands(self, arg):
"""Defines a list of commands associated to a breakpoint.
Those commands will be executed whenever the breakpoint causes
the program to stop execution."""
if not arg:
bnum = len(bdb.Breakpoint.bpbynumber)-1
else:
try:
bnum = int(arg)
except:
print >>self.stdout, "Usage : commands [bnum]\n ..." \
"\n end"
return
self.commands_bnum = bnum
self.commands[bnum] = []
self.commands_doprompt[bnum] = True
self.commands_silent[bnum] = False
prompt_back = self.prompt
self.prompt = '(com) '
self.commands_defining = True
try:
self.cmdloop()
finally:
self.commands_defining = False
self.prompt = prompt_back
def do_break(self, arg, temporary = 0):
# break [ ([filename:]lineno | function) [, "condition"] ]
if not arg:
if self.breaks: # There's at least one
print >>self.stdout, "Num Type Disp Enb Where"
for bp in bdb.Breakpoint.bpbynumber:
if bp:
bp.bpprint(self.stdout)
return
# parse arguments; comma has lowest precedence
# and cannot occur in filename
filename = None
lineno = None
cond = None
comma = arg.find(',')
if comma > 0:
# parse stuff after comma: "condition"
cond = arg[comma+1:].lstrip()
arg = arg[:comma].rstrip()
# parse stuff before comma: [filename:]lineno | function
colon = arg.rfind(':')
funcname = None
if colon >= 0:
filename = arg[:colon].rstrip()
f = self.lookupmodule(filename)
if not f:
print >>self.stdout, '*** ', repr(filename),
print >>self.stdout, 'not found from sys.path'
return
else:
filename = f
arg = arg[colon+1:].lstrip()
try:
lineno = int(arg)
except ValueError, msg:
print >>self.stdout, '*** Bad lineno:', arg
return
else:
# no colon; can be lineno or function
try:
lineno = int(arg)
except ValueError:
try:
func = eval(arg,
self.curframe.f_globals,
self.curframe_locals)
except:
func = arg
try:
if hasattr(func, 'im_func'):
func = func.im_func
code = func.func_code
# Use co_name to identify the breakpoint (function names
# could be aliased, but co_name is invariant)
funcname = code.co_name
lineno = code.co_firstlineno
filename = code.co_filename
except:
# last thing to try
(ok, filename, ln) = self.lineinfo(arg)
if not ok:
print >>self.stdout, '*** The specified object',
print >>self.stdout, repr(arg),
print >>self.stdout, 'is not a function'
print >>self.stdout, 'or was not found along sys.path.'
return
funcname = ok # ok contains a function name
lineno = int(ln)
if not filename:
filename = self.defaultFile()
# Check for reasonable breakpoint
line = self.checkline(filename, lineno)
if line:
# now set the break point
err = self.set_break(filename, line, temporary, cond, funcname)
if err: print >>self.stdout, '***', err
else:
bp = self.get_breaks(filename, line)[-1]
print >>self.stdout, "Breakpoint %d at %s:%d" % (bp.number,
bp.file,
bp.line)
# To be overridden in derived debuggers
def defaultFile(self):
"""Produce a reasonable default."""
filename = self.curframe.f_code.co_filename
if filename == '<string>' and self.mainpyfile:
filename = self.mainpyfile
return filename
do_b = do_break
def do_tbreak(self, arg):
self.do_break(arg, 1)
def lineinfo(self, identifier):
failed = (None, None, None)
# Input is identifier, may be in single quotes
idstring = identifier.split("'")
if len(idstring) == 1:
# not in single quotes
id = idstring[0].strip()
elif len(idstring) == 3:
# quoted
id = idstring[1].strip()
else:
return failed
if id == '': return failed
parts = id.split('.')
# Protection for derived debuggers
if parts[0] == 'self':
del parts[0]
if len(parts) == 0:
return failed
# Best first guess at file to look at
fname = self.defaultFile()
if len(parts) == 1:
item = parts[0]
else:
# More than one part.
# First is module, second is method/class
f = self.lookupmodule(parts[0])
if f:
fname = f
item = parts[1]
answer = find_function(item, fname)
return answer or failed
def checkline(self, filename, lineno):
"""Check whether specified line seems to be executable.
Return `lineno` if it is, 0 if not (e.g. a docstring, comment, blank
line or EOF). Warning: testing is not comprehensive.
"""
# this method should be callable before starting debugging, so default
# to "no globals" if there is no current frame
globs = self.curframe.f_globals if hasattr(self, 'curframe') else None
line = linecache.getline(filename, lineno, globs)
if not line:
print >>self.stdout, 'End of file'
return 0
line = line.strip()
# Don't allow setting breakpoint at a blank line
if (not line or (line[0] == '#') or
(line[:3] == '"""') or line[:3] == "'''"):
print >>self.stdout, '*** Blank or comment'
return 0
return lineno
def do_enable(self, arg):
args = arg.split()
for i in args:
try:
i = int(i)
except ValueError:
print >>self.stdout, 'Breakpoint index %r is not a number' % i
continue
if not (0 <= i < len(bdb.Breakpoint.bpbynumber)):
print >>self.stdout, 'No breakpoint numbered', i
continue
bp = bdb.Breakpoint.bpbynumber[i]
if bp:
bp.enable()
def do_disable(self, arg):
args = arg.split()
for i in args:
try:
i = int(i)
except ValueError:
print >>self.stdout, 'Breakpoint index %r is not a number' % i
continue
if not (0 <= i < len(bdb.Breakpoint.bpbynumber)):
print >>self.stdout, 'No breakpoint numbered', i
continue
bp = bdb.Breakpoint.bpbynumber[i]
if bp:
bp.disable()
def do_condition(self, arg):
# arg is breakpoint number and condition
args = arg.split(' ', 1)
try:
bpnum = int(args[0].strip())
except ValueError:
# something went wrong
print >>self.stdout, \
'Breakpoint index %r is not a number' % args[0]
return
try:
cond = args[1]
except:
cond = None
try:
bp = bdb.Breakpoint.bpbynumber[bpnum]
except IndexError:
print >>self.stdout, 'Breakpoint index %r is not valid' % args[0]
return
if bp:
bp.cond = cond
if not cond:
print >>self.stdout, 'Breakpoint', bpnum,
print >>self.stdout, 'is now unconditional.'
def do_ignore(self,arg):
"""arg is bp number followed by ignore count."""
args = arg.split()
try:
bpnum = int(args[0].strip())
except ValueError:
# something went wrong
print >>self.stdout, \
'Breakpoint index %r is not a number' % args[0]
return
try:
count = int(args[1].strip())
except:
count = 0
try:
bp = bdb.Breakpoint.bpbynumber[bpnum]
except IndexError:
print >>self.stdout, 'Breakpoint index %r is not valid' % args[0]
return
if bp:
bp.ignore = count
if count > 0:
reply = 'Will ignore next '
if count > 1:
reply = reply + '%d crossings' % count
else:
reply = reply + '1 crossing'
print >>self.stdout, reply + ' of breakpoint %d.' % bpnum
else:
print >>self.stdout, 'Will stop next time breakpoint',
print >>self.stdout, bpnum, 'is reached.'
def do_clear(self, arg):
"""Three possibilities, tried in this order:
clear -> clear all breaks, ask for confirmation
clear file:lineno -> clear all breaks at file:lineno
clear bpno bpno ... -> clear breakpoints by number"""
if not arg:
try:
reply = raw_input('Clear all breaks? ')
except EOFError:
reply = 'no'
reply = reply.strip().lower()
if reply in ('y', 'yes'):
self.clear_all_breaks()
return
if ':' in arg:
# Make sure it works for "clear C:\foo\bar.py:12"
i = arg.rfind(':')
filename = arg[:i]
arg = arg[i+1:]
try:
lineno = int(arg)
except ValueError:
err = "Invalid line number (%s)" % arg
else:
err = self.clear_break(filename, lineno)
if err: print >>self.stdout, '***', err
return
numberlist = arg.split()
for i in numberlist:
try:
i = int(i)
except ValueError:
print >>self.stdout, 'Breakpoint index %r is not a number' % i
continue
if not (0 <= i < len(bdb.Breakpoint.bpbynumber)):
print >>self.stdout, 'No breakpoint numbered', i
continue
err = self.clear_bpbynumber(i)
if err:
print >>self.stdout, '***', err
else:
print >>self.stdout, 'Deleted breakpoint', i
do_cl = do_clear # 'c' is already an abbreviation for 'continue'
def do_where(self, arg):
self.print_stack_trace()
do_w = do_where
do_bt = do_where
def do_up(self, arg):
if self.curindex == 0:
print >>self.stdout, '*** Oldest frame'
else:
self.curindex = self.curindex - 1
self.curframe = self.stack[self.curindex][0]
self.curframe_locals = self.curframe.f_locals
self.print_stack_entry(self.stack[self.curindex])
self.lineno = None
do_u = do_up
def do_down(self, arg):
if self.curindex + 1 == len(self.stack):
print >>self.stdout, '*** Newest frame'
else:
self.curindex = self.curindex + 1
self.curframe = self.stack[self.curindex][0]
self.curframe_locals = self.curframe.f_locals
self.print_stack_entry(self.stack[self.curindex])
self.lineno = None
do_d = do_down
def do_until(self, arg):
self.set_until(self.curframe)
return 1
do_unt = do_until
def do_step(self, arg):
self.set_step()
return 1
do_s = do_step
def do_next(self, arg):
self.set_next(self.curframe)
return 1
do_n = do_next
def do_run(self, arg):
"""Restart program by raising an exception to be caught in the main
debugger loop. If arguments were given, set them in sys.argv."""
if arg:
import shlex
argv0 = sys.argv[0:1]
sys.argv = shlex.split(arg)
sys.argv[:0] = argv0
raise Restart
do_restart = do_run
def do_return(self, arg):
self.set_return(self.curframe)
return 1
do_r = do_return
def do_continue(self, arg):
self.set_continue()
return 1
do_c = do_cont = do_continue
def do_jump(self, arg):
if self.curindex + 1 != len(self.stack):
print >>self.stdout, "*** You can only jump within the bottom frame"
return
try:
arg = int(arg)
except ValueError:
print >>self.stdout, "*** The 'jump' command requires a line number."
else:
try:
# Do the jump, fix up our copy of the stack, and display the
# new position
self.curframe.f_lineno = arg
self.stack[self.curindex] = self.stack[self.curindex][0], arg
self.print_stack_entry(self.stack[self.curindex])
except ValueError, e:
print >>self.stdout, '*** Jump failed:', e
do_j = do_jump
def do_debug(self, arg):
sys.settrace(None)
globals = self.curframe.f_globals
locals = self.curframe_locals
p = Pdb(self.completekey, self.stdin, self.stdout)
p.prompt = "(%s) " % self.prompt.strip()
print >>self.stdout, "ENTERING RECURSIVE DEBUGGER"
sys.call_tracing(p.run, (arg, globals, locals))
print >>self.stdout, "LEAVING RECURSIVE DEBUGGER"
sys.settrace(self.trace_dispatch)
self.lastcmd = p.lastcmd
def do_quit(self, arg):
self._user_requested_quit = 1
self.set_quit()
return 1
do_q = do_quit
do_exit = do_quit
def do_EOF(self, arg):
print >>self.stdout
self._user_requested_quit = 1
self.set_quit()
return 1
def do_args(self, arg):
co = self.curframe.f_code
dict = self.curframe_locals
n = co.co_argcount
if co.co_flags & 4: n = n+1    # CO_VARARGS: include *args
if co.co_flags & 8: n = n+1    # CO_VARKEYWORDS: include **kwargs
for i in range(n):
name = co.co_varnames[i]
print >>self.stdout, name, '=',
if name in dict: print >>self.stdout, dict[name]
else: print >>self.stdout, "*** undefined ***"
do_a = do_args
def do_retval(self, arg):
if '__return__' in self.curframe_locals:
print >>self.stdout, self.curframe_locals['__return__']
else:
print >>self.stdout, '*** Not yet returned!'
do_rv = do_retval
def _getval(self, arg):
try:
return eval(arg, self.curframe.f_globals,
self.curframe_locals)
except:
t, v = sys.exc_info()[:2]
if isinstance(t, str):
exc_type_name = t
else: exc_type_name = t.__name__
print >>self.stdout, '***', exc_type_name + ':', repr(v)
raise
def do_p(self, arg):
try:
print >>self.stdout, repr(self._getval(arg))
except:
pass
def do_pp(self, arg):
try:
pprint.pprint(self._getval(arg), self.stdout)
except:
pass
def do_list(self, arg):
self.lastcmd = 'list'
last = None
if arg:
try:
x = eval(arg, {}, {})
if type(x) == type(()):
first, last = x
first = int(first)
last = int(last)
if last < first:
# Assume it's a count
last = first + last
else:
first = max(1, int(x) - 5)
except:
print >>self.stdout, '*** Error in argument:', repr(arg)
return
elif self.lineno is None:
first = max(1, self.curframe.f_lineno - 5)
else:
first = self.lineno + 1
if last is None:
last = first + 10
filename = self.curframe.f_code.co_filename
breaklist = self.get_file_breaks(filename)
try:
for lineno in range(first, last+1):
line = linecache.getline(filename, lineno,
self.curframe.f_globals)
if not line:
print >>self.stdout, '[EOF]'
break
else:
s = repr(lineno).rjust(3)
if len(s) < 4: s = s + ' '
if lineno in breaklist: s = s + 'B'
else: s = s + ' '
if lineno == self.curframe.f_lineno:
s = s + '->'
print >>self.stdout, s + '\t' + line,
self.lineno = lineno
except KeyboardInterrupt:
pass
do_l = do_list
def do_whatis(self, arg):
try:
value = eval(arg, self.curframe.f_globals,
self.curframe_locals)
except:
t, v = sys.exc_info()[:2]
if type(t) == type(''):
exc_type_name = t
else: exc_type_name = t.__name__
print >>self.stdout, '***', exc_type_name + ':', repr(v)
return
code = None
# Is it a function?
try: code = value.func_code
except: pass
if code:
print >>self.stdout, 'Function', code.co_name
return
# Is it an instance method?
try: code = value.im_func.func_code
except: pass
if code:
print >>self.stdout, 'Method', code.co_name
return
# None of the above...
print >>self.stdout, type(value)
def do_alias(self, arg):
args = arg.split()
if len(args) == 0:
keys = self.aliases.keys()
keys.sort()
for alias in keys:
print >>self.stdout, "%s = %s" % (alias, self.aliases[alias])
return
if args[0] in self.aliases and len(args) == 1:
print >>self.stdout, "%s = %s" % (args[0], self.aliases[args[0]])
else:
self.aliases[args[0]] = ' '.join(args[1:])
def do_unalias(self, arg):
args = arg.split()
if len(args) == 0: return
if args[0] in self.aliases:
del self.aliases[args[0]]
# List of all the commands that make the program resume execution.
commands_resuming = ['do_continue', 'do_step', 'do_next', 'do_return',
'do_quit', 'do_jump']
# Print a traceback starting at the top stack frame.
# The most recently entered frame is printed last;
# this is different from dbx and gdb, but consistent with
# the Python interpreter's stack trace.
# It is also consistent with the up/down commands (which are
# compatible with dbx and gdb: up moves towards 'main()'
# and down moves towards the most recent stack frame).
def print_stack_trace(self):
try:
for frame_lineno in self.stack:
self.print_stack_entry(frame_lineno)
except KeyboardInterrupt:
pass
def print_stack_entry(self, frame_lineno, prompt_prefix=line_prefix):
frame, lineno = frame_lineno
if frame is self.curframe:
print >>self.stdout, '>',
else:
print >>self.stdout, ' ',
print >>self.stdout, self.format_stack_entry(frame_lineno,
prompt_prefix)
# Help methods (derived from pdb.doc)
def help_help(self):
self.help_h()
def help_h(self):
print >>self.stdout, """h(elp)
Without argument, print the list of available commands.
With a command name as argument, print help about that command
"help pdb" pipes the full documentation file to the $PAGER
"help exec" gives help on the ! command"""
def help_where(self):
self.help_w()
def help_w(self):
print >>self.stdout, """w(here)
Print a stack trace, with the most recent frame at the bottom.
An arrow indicates the "current frame", which determines the
context of most commands. 'bt' is an alias for this command."""
help_bt = help_w
def help_down(self):
self.help_d()
def help_d(self):
print >>self.stdout, """d(own)
Move the current frame one level down in the stack trace
(to a newer frame)."""
def help_up(self):
self.help_u()
def help_u(self):
print >>self.stdout, """u(p)
Move the current frame one level up in the stack trace
(to an older frame)."""
def help_break(self):
self.help_b()
def help_b(self):
print >>self.stdout, """b(reak) ([file:]lineno | function) [, condition]
With a line number argument, set a break there in the current
file. With a function name, set a break at the first executable line
of that function. Without argument, list all breaks. If a second
argument is present, it is a string specifying an expression
which must evaluate to true before the breakpoint is honored.
The line number may be prefixed with a filename and a colon,
to specify a breakpoint in another file (probably one that
hasn't been loaded yet). The file is searched for on sys.path;
the .py suffix may be omitted."""
def help_clear(self):
self.help_cl()
def help_cl(self):
print >>self.stdout, "cl(ear) filename:lineno"
print >>self.stdout, """cl(ear) [bpnumber [bpnumber...]]
With a space separated list of breakpoint numbers, clear
those breakpoints. Without argument, clear all breaks (but
first ask confirmation). With a filename:lineno argument,
clear all breaks at that line in that file.
Note that the argument is different from previous versions of
the debugger (in Python distributions 1.5.1 and before) where
a line number was used instead of either filename:lineno or
breakpoint numbers."""
def help_tbreak(self):
print >>self.stdout, """tbreak same arguments as break, but breakpoint
is removed when first hit."""
def help_enable(self):
print >>self.stdout, """enable bpnumber [bpnumber ...]
Enables the breakpoints given as a space separated list of
bp numbers."""
def help_disable(self):
print >>self.stdout, """disable bpnumber [bpnumber ...]
Disables the breakpoints given as a space separated list of
bp numbers."""
def help_ignore(self):
print >>self.stdout, """ignore bpnumber count
Sets the ignore count for the given breakpoint number. A breakpoint
becomes active when the ignore count is zero. When non-zero, the
count is decremented each time the breakpoint is reached and the
breakpoint is not disabled and any associated condition evaluates
to true."""
def help_condition(self):
print >>self.stdout, """condition bpnumber str_condition
str_condition is a string specifying an expression which
must evaluate to true before the breakpoint is honored.
If str_condition is absent, any existing condition is removed;
i.e., the breakpoint is made unconditional."""
def help_step(self):
self.help_s()
def help_s(self):
print >>self.stdout, """s(tep)
Execute the current line, stop at the first possible occasion
(either in a function that is called or in the current function)."""
def help_until(self):
self.help_unt()
def help_unt(self):
print """unt(il)
Continue execution until the line with a number greater than the current
one is reached or until the current frame returns"""
def help_next(self):
self.help_n()
def help_n(self):
print >>self.stdout, """n(ext)
Continue execution until the next line in the current function
is reached or it returns."""
def help_return(self):
self.help_r()
def help_r(self):
print >>self.stdout, """r(eturn)
Continue execution until the current function returns."""
def help_continue(self):
self.help_c()
def help_cont(self):
self.help_c()
def help_c(self):
print >>self.stdout, """c(ont(inue))
Continue execution, only stop when a breakpoint is encountered."""
def help_jump(self):
self.help_j()
def help_j(self):
print >>self.stdout, """j(ump) lineno
Set the next line that will be executed."""
def help_debug(self):
print >>self.stdout, """debug code
Enter a recursive debugger that steps through the code argument
(which is an arbitrary expression or statement to be executed
in the current environment)."""
def help_list(self):
self.help_l()
def help_l(self):
print >>self.stdout, """l(ist) [first [,last]]
List source code for the current file.
Without arguments, list 11 lines around the current line
or continue the previous listing.
With one argument, list 11 lines starting at that line.
With two arguments, list the given range;
if the second argument is less than the first, it is a count."""
def help_args(self):
self.help_a()
def help_a(self):
print >>self.stdout, """a(rgs)
Print the arguments of the current function."""
def help_p(self):
print >>self.stdout, """p expression
Print the value of the expression."""
def help_pp(self):
print >>self.stdout, """pp expression
Pretty-print the value of the expression."""
def help_exec(self):
print >>self.stdout, """(!) statement
Execute the (one-line) statement in the context of
the current stack frame.
The exclamation point can be omitted unless the first word
of the statement resembles a debugger command.
To assign to a global variable you must always prefix the
command with a 'global' command, e.g.:
(Pdb) global list_options; list_options = ['-l']
(Pdb)"""
def help_run(self):
print """run [args...]
Restart the debugged python program. If a string is supplied, it is
split with "shlex" and the result is used as the new sys.argv.
History, breakpoints, actions and debugger options are preserved.
"restart" is an alias for "run"."""
help_restart = help_run
def help_quit(self):
self.help_q()
def help_q(self):
print >>self.stdout, """q(uit) or exit - Quit from the debugger.
The program being executed is aborted."""
help_exit = help_q
def help_whatis(self):
print >>self.stdout, """whatis arg
Prints the type of the argument."""
def help_EOF(self):
print >>self.stdout, """EOF
Handles the receipt of EOF as a command."""
def help_alias(self):
print >>self.stdout, """alias [name [command [parameter parameter ...]]]
Creates an alias called 'name' that executes 'command'. The command
must *not* be enclosed in quotes. Replaceable parameters are
indicated by %1, %2, and so on, while %* is replaced by all the
parameters. If no command is given, the current alias for name
is shown. If no name is given, all aliases are listed.
Aliases may be nested and can contain anything that can be
legally typed at the pdb prompt. Note! You *can* override
internal pdb commands with aliases! Those internal commands
are then hidden until the alias is removed. Aliasing is recursively
applied to the first word of the command line; all other words
in the line are left alone.
Some useful aliases (especially when placed in the .pdbrc file) are:
#Print instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print "%1.",k,"=",%1.__dict__[k]
#Print instance variables in self
alias ps pi self
"""
def help_unalias(self):
print >>self.stdout, """unalias name
Deletes the specified alias."""
def help_commands(self):
print >>self.stdout, """commands [bpnumber]
(com) ...
(com) end
(Pdb)
Specify a list of commands for breakpoint number bpnumber. The
commands themselves appear on the following lines. Type a line
containing just 'end' to terminate the commands.
To remove all commands from a breakpoint, type commands and
follow it immediately with end; that is, give no commands.
With no bpnumber argument, commands refers to the last
breakpoint set.
You can use breakpoint commands to start your program up again.
Simply use the continue command, or step, or any other
command that resumes execution.
Specifying any command resuming execution (currently continue,
step, next, return, jump, quit and their abbreviations) terminates
the command list (as if that command was immediately followed by end).
This is because any time you resume execution
(even with a simple next or step), you may encounter
another breakpoint--which could have its own command list, leading to
ambiguities about which list to execute.
If you use the 'silent' command in the command list, the
usual message about stopping at a breakpoint is not printed. This may
be desirable for breakpoints that are to print a specific message and
then continue. If none of the other commands print anything, you
see no sign that the breakpoint was reached.
"""
def help_pdb(self):
help()
def lookupmodule(self, filename):
"""Helper function for break/clear parsing -- may be overridden.
lookupmodule() translates (possibly incomplete) file or module name
into an absolute file name.
"""
if os.path.isabs(filename) and os.path.exists(filename):
return filename
f = os.path.join(sys.path[0], filename)
if os.path.exists(f) and self.canonic(f) == self.mainpyfile:
return f
root, ext = os.path.splitext(filename)
if ext == '':
filename = filename + '.py'
if os.path.isabs(filename):
return filename
for dirname in sys.path:
while os.path.islink(dirname):
dirname = os.readlink(dirname)
fullname = os.path.join(dirname, filename)
if os.path.exists(fullname):
return fullname
return None
def _runscript(self, filename):
# The script has to run in __main__ namespace (or imports from
# __main__ will break).
#
# So we clear up the __main__ and set several special variables
# (this gets rid of pdb's globals and cleans old variables on restarts).
import __main__
__main__.__dict__.clear()
__main__.__dict__.update({"__name__" : "__main__",
"__file__" : filename,
"__builtins__": __builtins__,
})
# When bdb sets tracing, a number of call and line events happens
# BEFORE debugger even reaches user's code (and the exact sequence of
# events depends on python version). So we take special measures to
# avoid stopping before we reach the main script (see user_line and
# user_call for details).
self._wait_for_mainpyfile = 1
self.mainpyfile = self.canonic(filename)
self._user_requested_quit = 0
statement = 'execfile(%r)' % filename
self.run(statement)
# Simplified interface
def run(statement, globals=None, locals=None):
Pdb().run(statement, globals, locals)
def runeval(expression, globals=None, locals=None):
return Pdb().runeval(expression, globals, locals)
def runctx(statement, globals, locals):
# B/W compatibility
run(statement, globals, locals)
def runcall(*args, **kwds):
return Pdb().runcall(*args, **kwds)
def set_trace():
Pdb().set_trace(sys._getframe().f_back)
# Post-Mortem interface
def post_mortem(t=None):
# handling the default
if t is None:
# sys.exc_info() returns (type, value, traceback) if an exception is
# being handled, otherwise it returns None
t = sys.exc_info()[2]
if t is None:
raise ValueError("A valid traceback must be passed if no "
"exception is being handled")
p = Pdb()
p.reset()
p.interaction(None, t)
def pm():
post_mortem(sys.last_traceback)
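# Illustrative usage of the simplified interface (not part of this
# module); "mymodule" and "buggy" are hypothetical names:
#
#     import pdb
#     pdb.run('mymodule.buggy()')   # debug a statement from the start
#     pdb.set_trace()               # drop into the debugger at this point
#     pdb.pm()                      # post-mortem on sys.last_traceback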
# Main program for testing
TESTCMD = 'import x; x.main()'
def test():
run(TESTCMD)
# print help
def help():
for dirname in sys.path:
fullname = os.path.join(dirname, 'pdb.doc')
if os.path.exists(fullname):
sts = os.system('${PAGER-more} '+fullname)
if sts: print '*** Pager exit status:', sts
break
else:
print 'Sorry, can\'t find the help file "pdb.doc"',
print 'along the Python search path'
def main():
if not sys.argv[1:] or sys.argv[1] in ("--help", "-h"):
print "usage: pdb.py scriptfile [arg] ..."
sys.exit(2)
mainpyfile = sys.argv[1] # Get script filename
if not os.path.exists(mainpyfile):
print 'Error:', mainpyfile, 'does not exist'
sys.exit(1)
del sys.argv[0] # Hide "pdb.py" from argument list
# Replace pdb's dir with script's dir in front of module search path.
sys.path[0] = os.path.dirname(mainpyfile)
# Note on saving/restoring sys.argv: it's a good idea when sys.argv was
# modified by the script being debugged. It's a bad idea when it was
# changed by the user from the command line. There is a "restart" command
# which allows explicit specification of command line arguments.
pdb = Pdb()
while True:
try:
pdb._runscript(mainpyfile)
if pdb._user_requested_quit:
break
print "The program finished and will be restarted"
except Restart:
print "Restarting", mainpyfile, "with arguments:"
print "\t" + " ".join(sys.argv[1:])
except SystemExit:
# In most cases SystemExit does not warrant a post-mortem session.
print "The program exited via sys.exit(). Exit status: ",
print sys.exc_info()[1]
except SyntaxError:
traceback.print_exc()
sys.exit(1)
except:
traceback.print_exc()
print "Uncaught exception. Entering post mortem debugging"
print "Running 'cont' or 'step' will restart the program"
t = sys.exc_info()[2]
pdb.interaction(None, t)
print "Post mortem debugger finished. The " + mainpyfile + \
" will be restarted"
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
import pdb
pdb.main()
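# Illustrative command-line usage (hypothetical script name):
#
#     python pdb.py myscript.py arg1 arg2
#
# or, with the installed module:
#
#     python -m pdb myscript.py arg1 arg2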
"""Create portable serialized representations of Python objects.
See module cPickle for a (much) faster implementation.
See module copy_reg for a mechanism for registering custom picklers.
See module pickletools source for extensive comments.
Classes:
Pickler
Unpickler
Functions:
dump(object, file)
dumps(object) -> string
load(file) -> object
loads(string) -> object
Misc variables:
__version__
format_version
compatible_formats
"""
__version__ = "$Revision: 72223 $" # Code version
from types import *
from copy_reg import dispatch_table
from copy_reg import _extension_registry, _inverted_registry, _extension_cache
import marshal
import sys
import struct
import re
__all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler",
"Unpickler", "dump", "dumps", "load", "loads"]
# These are purely informational; no code uses these.
format_version = "2.0" # File format version we write
compatible_formats = ["1.0", # Original protocol 0
"1.1", # Protocol 0 with INST added
"1.2", # Original protocol 1
"1.3", # Protocol 1 with BINFLOAT added
"2.0", # Protocol 2
] # Old format versions we can read
# Keep in synch with cPickle. This is the highest protocol number we
# know how to read.
HIGHEST_PROTOCOL = 2
# Why use struct.pack() for pickling but marshal.loads() for
# unpickling? struct.pack() is 40% faster than marshal.dumps(), but
# marshal.loads() is twice as fast as struct.unpack()!
mloads = marshal.loads
class PickleError(Exception):
"""A common base class for the other pickling exceptions."""
pass
class PicklingError(PickleError):
"""This exception is raised when an unpicklable object is passed to the
dump() method.
"""
pass
class UnpicklingError(PickleError):
"""This exception is raised when there is a problem unpickling an object,
such as a security violation.
Note that other exceptions may also be raised during unpickling, including
(but not necessarily limited to) AttributeError, EOFError, ImportError,
and IndexError.
"""
pass
# An instance of _Stop is raised by Unpickler.load_stop() in response to
# the STOP opcode, passing the object that is the result of unpickling.
class _Stop(Exception):
def __init__(self, value):
self.value = value
# Jython has PyStringMap; it's a dict subclass with string keys
try:
from org.python.core import PyStringMap
except ImportError:
PyStringMap = None
# UnicodeType may or may not be exported (normally imported from types)
try:
UnicodeType
except NameError:
UnicodeType = None
# Pickle opcodes. See pickletools.py for extensive docs. The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.
MARK = '(' # push special markobject on stack
STOP = '.' # every pickle ends with STOP
POP = '0' # discard topmost stack item
POP_MARK = '1' # discard stack top through topmost markobject
DUP = '2' # duplicate top stack item
FLOAT = 'F' # push float object; decimal string argument
INT = 'I' # push integer or bool; decimal string argument
BININT = 'J' # push four-byte signed int
BININT1 = 'K' # push 1-byte unsigned int
LONG = 'L' # push long; decimal string argument
BININT2 = 'M' # push 2-byte unsigned int
NONE = 'N' # push None
PERSID = 'P' # push persistent object; id is taken from string arg
BINPERSID = 'Q' # " " " ; " " " " stack
REDUCE = 'R' # apply callable to argtuple, both on stack
STRING = 'S' # push string; NL-terminated string argument
BINSTRING = 'T' # push string; counted binary string argument
SHORT_BINSTRING = 'U' # " " ; " " " " < 256 bytes
UNICODE = 'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = 'X' # " " " ; counted UTF-8 string argument
APPEND = 'a' # append stack top to list below it
BUILD = 'b' # call __setstate__ or __dict__.update()
GLOBAL = 'c' # push self.find_class(modname, name); 2 string args
DICT = 'd' # build a dict from stack items
EMPTY_DICT = '}' # push empty dict
APPENDS = 'e' # extend list on stack by topmost stack slice
GET = 'g' # push item from memo on stack; index is string arg
BINGET = 'h' # " " " " " " ; " " 1-byte arg
INST = 'i' # build & push class instance
LONG_BINGET = 'j' # push item from memo on stack; index is 4-byte arg
LIST = 'l' # build list from topmost stack items
EMPTY_LIST = ']' # push empty list
OBJ = 'o' # build & push class instance
PUT = 'p' # store stack top in memo; index is string arg
BINPUT = 'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = 'r' # " " " " " ; " " 4-byte arg
SETITEM = 's' # add key+value pair to dict
TUPLE = 't' # build tuple from topmost stack items
EMPTY_TUPLE = ')' # push empty tuple
SETITEMS = 'u' # modify dict by adding topmost key+value pairs
BINFLOAT = 'G' # push float; arg is 8-byte float encoding
TRUE = 'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = 'I00\n' # not an opcode; see INT docs in pickletools.py
# Protocol 2
PROTO = '\x80' # identify pickle protocol
NEWOBJ = '\x81' # build object by applying cls.__new__ to argtuple
EXT1 = '\x82' # push object from extension registry; 1-byte index
EXT2 = '\x83' # ditto, but 2-byte index
EXT4 = '\x84' # ditto, but 4-byte index
TUPLE1 = '\x85' # build 1-tuple from stack top
TUPLE2 = '\x86' # build 2-tuple from two topmost stack items
TUPLE3 = '\x87' # build 3-tuple from three topmost stack items
NEWTRUE = '\x88' # push True
NEWFALSE = '\x89' # push False
LONG1 = '\x8a' # push long from < 256 bytes
LONG4 = '\x8b' # push really big long
_tuplesize2code = [EMPTY_TUPLE, TUPLE1, TUPLE2, TUPLE3]
__all__.extend([x for x in dir() if re.match("[A-Z][A-Z0-9_]+$",x)])
del x
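# Illustrative examples (not in the original source) of how these opcodes
# appear in actual pickles:
#
#     >>> import pickle
#     >>> pickle.dumps(1)        # protocol 0: INT, then STOP
#     'I1\n.'
#     >>> pickle.dumps(1, 2)     # protocol 2: PROTO 2, BININT1 1, STOP
#     '\x80\x02K\x01.'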
# Pickling machinery
class Pickler:
def __init__(self, file, protocol=None):
"""This takes a file-like object for writing a pickle data stream.
The optional protocol argument tells the pickler to use the
given protocol; supported protocols are 0, 1, 2. The default
protocol is 0, to be backwards compatible. (Protocol 0 is the
only protocol that can be written to a file opened in text
mode and read back successfully. When using a protocol higher
than 0, make sure the file is opened in binary mode, both when
pickling and unpickling.)
Protocol 1 is more efficient than protocol 0; protocol 2 is
more efficient than protocol 1.
Specifying a negative protocol version selects the highest
protocol version supported. The higher the protocol used, the
more recent the version of Python needed to read the pickle
produced.
The file parameter must have a write() method that accepts a single
string argument. It can thus be an open file object, a StringIO
object, or any other custom object that meets this interface.
"""
if protocol is None:
protocol = 0
if protocol < 0:
protocol = HIGHEST_PROTOCOL
elif not 0 <= protocol <= HIGHEST_PROTOCOL:
raise ValueError("pickle protocol must be <= %d" % HIGHEST_PROTOCOL)
self.write = file.write
self.memo = {}
self.proto = int(protocol)
self.bin = protocol >= 1
self.fast = 0
def clear_memo(self):
"""Clears the pickler's "memo".
The memo is the data structure that remembers which objects the
pickler has already seen, so that shared or recursive objects are
pickled by reference and not by value. This method is useful when
re-using picklers.
"""
self.memo.clear()
def dump(self, obj):
"""Write a pickled representation of obj to the open file."""
if self.proto >= 2:
self.write(PROTO + chr(self.proto))
self.save(obj)
self.write(STOP)
def memoize(self, obj):
"""Store an object in the memo."""
# The Pickler memo is a dictionary mapping object ids to 2-tuples
# that contain the Unpickler memo key and the object being memoized.
# The memo key is written to the pickle and will become
# the key in the Unpickler's memo. The object is stored in the
# Pickler memo so that transient objects are kept alive during
# pickling.
# The use of the Unpickler memo length as the memo key is just a
# convention. The only requirement is that the memo values be unique.
# But there appears to be no advantage to any other scheme, and this
# scheme allows the Unpickler memo to be implemented as a plain (but
# growable) array, indexed by memo key.
if self.fast:
return
assert id(obj) not in self.memo
memo_len = len(self.memo)
self.write(self.put(memo_len))
self.memo[id(obj)] = memo_len, obj
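# Illustrative sketch (not in the original source): after save() visits
# a container, the memo maps id(obj) -> (memo_key, obj), e.g.:
#
#     >>> from StringIO import StringIO
#     >>> p = Pickler(StringIO(), 1)
#     >>> lst = [1, 2]
#     >>> p.save(lst)
#     >>> p.memo[id(lst)][1] is lst
#     True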
# Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
def put(self, i, pack=struct.pack):
if self.bin:
if i < 256:
return BINPUT + chr(i)
else:
return LONG_BINPUT + pack("<i", i)
return PUT + repr(i) + '\n'
# Return a GET (BINGET, LONG_BINGET) opcode string, with argument i.
def get(self, i, pack=struct.pack):
if self.bin:
if i < 256:
return BINGET + chr(i)
else:
return LONG_BINGET + pack("<i", i)
return GET + repr(i) + '\n'
def save(self, obj):
# Check for persistent id (defined by a subclass)
pid = self.persistent_id(obj)
if pid is not None:
self.save_pers(pid)
return
# Check the memo
x = self.memo.get(id(obj))
if x:
self.write(self.get(x[0]))
return
# Check the type dispatch table
t = type(obj)
f = self.dispatch.get(t)
if f:
f(self, obj) # Call unbound method with explicit self
return
# Check copy_reg.dispatch_table
reduce = dispatch_table.get(t)
if reduce:
rv = reduce(obj)
else:
# Check for a class with a custom metaclass; treat as regular class
try:
issc = issubclass(t, TypeType)
except TypeError: # t is not a class (old Boost; see SF #502085)
issc = 0
if issc:
self.save_global(obj)
return
# Check for a __reduce_ex__ method, fall back to __reduce__
reduce = getattr(obj, "__reduce_ex__", None)
if reduce:
rv = reduce(self.proto)
else:
reduce = getattr(obj, "__reduce__", None)
if reduce:
rv = reduce()
else:
raise PicklingError("Can't pickle %r object: %r" %
(t.__name__, obj))
# Check for string returned by reduce(), meaning "save as global"
if type(rv) is StringType:
self.save_global(obj, rv)
return
# Assert that reduce() returned a tuple
if type(rv) is not TupleType:
raise PicklingError("%s must return string or tuple" % reduce)
# Assert that it returned an appropriately sized tuple
l = len(rv)
if not (2 <= l <= 5):
raise PicklingError("Tuple returned by %s must have "
"two to five elements" % reduce)
# Save the reduce() output and finally memoize the object
self.save_reduce(obj=obj, *rv)
def persistent_id(self, obj):
# This exists so a subclass can override it
return None
def save_pers(self, pid):
# Save a persistent id reference
if self.bin:
self.save(pid)
self.write(BINPERSID)
else:
self.write(PERSID + str(pid) + '\n')
def save_reduce(self, func, args, state=None,
listitems=None, dictitems=None, obj=None):
# This API is called by some subclasses
# Assert that args is a tuple or None
if not isinstance(args, TupleType):
raise PicklingError("args from reduce() should be a tuple")
# Assert that func is callable
if not hasattr(func, '__call__'):
raise PicklingError("func from reduce should be callable")
save = self.save
write = self.write
# Protocol 2 special case: if func's name is __newobj__, use NEWOBJ
if self.proto >= 2 and getattr(func, "__name__", "") == "__newobj__":
# A __reduce__ implementation can direct protocol 2 to
# use the more efficient NEWOBJ opcode, while still
# allowing protocol 0 and 1 to work normally. For this to
# work, the function returned by __reduce__ should be
# called __newobj__, and its first argument should be a
# new-style class. The implementation for __newobj__
# should be as follows, although pickle has no way to
# verify this:
#
# def __newobj__(cls, *args):
# return cls.__new__(cls, *args)
#
# Protocols 0 and 1 will pickle a reference to __newobj__,
# while protocol 2 (and above) will pickle a reference to
# cls, the remaining args tuple, and the NEWOBJ code,
# which calls cls.__new__(cls, *args) at unpickling time
# (see load_newobj below). If __reduce__ returns a
# three-tuple, the state from the third tuple item will be
# pickled regardless of the protocol, calling __setstate__
# at unpickling time (see load_build below).
#
# Note that no standard __newobj__ implementation exists;
# you have to provide your own. This is to enforce
# compatibility with Python 2.2 (pickles written using
# protocol 0 or 1 in Python 2.3 should be unpicklable by
# Python 2.2).
cls = args[0]
if not hasattr(cls, "__new__"):
raise PicklingError(
"args[0] from __newobj__ args has no __new__")
if obj is not None and cls is not obj.__class__:
raise PicklingError(
"args[0] from __newobj__ args has the wrong class")
args = args[1:]
save(cls)
save(args)
write(NEWOBJ)
else:
save(func)
save(args)
write(REDUCE)
if obj is not None:
# If the object is already in the memo, this means it is
# recursive. In this case, throw away everything we put on the
# stack, and fetch the object back from the memo.
if id(obj) in self.memo:
write(POP + self.get(self.memo[id(obj)][0]))
else:
self.memoize(obj)
# More new special cases (that work with older protocols as
# well): when __reduce__ returns a tuple with 4 or 5 items,
# the 4th and 5th item should be iterators that provide list
# items and dict items (as (key, value) tuples), or None.
if listitems is not None:
self._batch_appends(listitems)
if dictitems is not None:
self._batch_setitems(dictitems)
if state is not None:
save(state)
write(BUILD)
# Methods below this point are dispatched through the dispatch table
dispatch = {}
def save_none(self, obj):
self.write(NONE)
dispatch[NoneType] = save_none
def save_bool(self, obj):
if self.proto >= 2:
self.write(obj and NEWTRUE or NEWFALSE)
else:
self.write(obj and TRUE or FALSE)
dispatch[bool] = save_bool
def save_int(self, obj, pack=struct.pack):
if self.bin:
# If the int is small enough to fit in a signed 4-byte 2's-comp
# format, we can store it more efficiently than the general
# case.
# First one- and two-byte unsigned ints:
if obj >= 0:
if obj <= 0xff:
self.write(BININT1 + chr(obj))
return
if obj <= 0xffff:
self.write("%c%c%c" % (BININT2, obj&0xff, obj>>8))
return
# Next check for 4-byte signed ints:
high_bits = obj >> 31 # note that Python shift sign-extends
if high_bits == 0 or high_bits == -1:
# All high bits are copies of bit 2**31, so the value
# fits in a 4-byte signed int.
self.write(BININT + pack("<i", obj))
return
# Text pickle, or int too big to fit in signed 4-byte format.
self.write(INT + repr(obj) + '\n')
dispatch[IntType] = save_int
def save_long(self, obj, pack=struct.pack):
if self.proto >= 2:
bytes = encode_long(obj)
n = len(bytes)
if n < 256:
self.write(LONG1 + chr(n) + bytes)
else:
self.write(LONG4 + pack("<i", n) + bytes)
return
self.write(LONG + repr(obj) + '\n')
dispatch[LongType] = save_long
def save_float(self, obj, pack=struct.pack):
if self.bin:
self.write(BINFLOAT + pack('>d', obj))
else:
self.write(FLOAT + repr(obj) + '\n')
dispatch[FloatType] = save_float
def save_string(self, obj, pack=struct.pack):
if self.bin:
n = len(obj)
if n < 256:
self.write(SHORT_BINSTRING + chr(n) + obj)
else:
self.write(BINSTRING + pack("<i", n) + obj)
else:
self.write(STRING + repr(obj) + '\n')
self.memoize(obj)
dispatch[StringType] = save_string
def save_unicode(self, obj, pack=struct.pack):
if self.bin:
encoding = obj.encode('utf-8')
n = len(encoding)
self.write(BINUNICODE + pack("<i", n) + encoding)
else:
obj = obj.replace("\\", "\\u005c")
obj = obj.replace("\n", "\\u000a")
self.write(UNICODE + obj.encode('raw-unicode-escape') + '\n')
self.memoize(obj)
dispatch[UnicodeType] = save_unicode
if StringType is UnicodeType:
# This is true for Jython
def save_string(self, obj, pack=struct.pack):
unicode = obj.isunicode()
if self.bin:
if unicode:
obj = obj.encode("utf-8")
l = len(obj)
if l < 256 and not unicode:
self.write(SHORT_BINSTRING + chr(l) + obj)
else:
s = pack("<i", l)
if unicode:
self.write(BINUNICODE + s + obj)
else:
self.write(BINSTRING + s + obj)
else:
if unicode:
obj = obj.replace("\\", "\\u005c")
obj = obj.replace("\n", "\\u000a")
obj = obj.encode('raw-unicode-escape')
self.write(UNICODE + obj + '\n')
else:
self.write(STRING + repr(obj) + '\n')
self.memoize(obj)
dispatch[StringType] = save_string
def save_tuple(self, obj):
write = self.write
proto = self.proto
n = len(obj)
if n == 0:
if proto:
write(EMPTY_TUPLE)
else:
write(MARK + TUPLE)
return
save = self.save
memo = self.memo
if n <= 3 and proto >= 2:
for element in obj:
save(element)
# Subtle. Same as in the big comment below.
if id(obj) in memo:
get = self.get(memo[id(obj)][0])
write(POP * n + get)
else:
write(_tuplesize2code[n])
self.memoize(obj)
return
# proto 0 or proto 1 and tuple isn't empty, or proto > 1 and tuple
# has more than 3 elements.
write(MARK)
for element in obj:
save(element)
if id(obj) in memo:
# Subtle. obj was not in memo when we entered save_tuple(), so
# the process of saving the tuple's elements must have saved
# the tuple itself: the tuple is recursive. The proper action
# now is to throw away everything we put on the stack, and
# simply GET the tuple (it's already constructed). This check
# could have been done in the "for element" loop instead, but
# recursive tuples are a rare thing.
get = self.get(memo[id(obj)][0])
if proto:
write(POP_MARK + get)
else: # proto 0 -- POP_MARK not available
write(POP * (n+1) + get)
return
# No recursion.
self.write(TUPLE)
self.memoize(obj)
dispatch[TupleType] = save_tuple
# save_empty_tuple() isn't used by anything in Python 2.3. However, I
# found a Pickler subclass in Zope3 that calls it, so it's not harmless
# to remove it.
def save_empty_tuple(self, obj):
self.write(EMPTY_TUPLE)
def save_list(self, obj):
write = self.write
if self.bin:
write(EMPTY_LIST)
else: # proto 0 -- can't use EMPTY_LIST
write(MARK + LIST)
self.memoize(obj)
self._batch_appends(iter(obj))
dispatch[ListType] = save_list
# Keep in synch with cPickle's BATCHSIZE. Nothing will break if it gets
# out of synch, though.
_BATCHSIZE = 1000
def _batch_appends(self, items):
# Helper to batch up APPENDS sequences
save = self.save
write = self.write
if not self.bin:
for x in items:
save(x)
write(APPEND)
return
r = xrange(self._BATCHSIZE)
while items is not None:
tmp = []
for i in r:
try:
x = items.next()
tmp.append(x)
except StopIteration:
items = None
break
n = len(tmp)
if n > 1:
write(MARK)
for x in tmp:
save(x)
write(APPENDS)
elif n:
save(tmp[0])
write(APPEND)
# else tmp is empty, and we're done
def save_dict(self, obj):
write = self.write
if self.bin:
write(EMPTY_DICT)
else: # proto 0 -- can't use EMPTY_DICT
write(MARK + DICT)
self.memoize(obj)
self._batch_setitems(obj.iteritems())
dispatch[DictionaryType] = save_dict
if PyStringMap is not None:
dispatch[PyStringMap] = save_dict
def _batch_setitems(self, items):
# Helper to batch up SETITEMS sequences; proto >= 1 only
save = self.save
write = self.write
if not self.bin:
for k, v in items:
save(k)
save(v)
write(SETITEM)
return
r = xrange(self._BATCHSIZE)
while items is not None:
tmp = []
for i in r:
try:
tmp.append(items.next())
except StopIteration:
items = None
break
n = len(tmp)
if n > 1:
write(MARK)
for k, v in tmp:
save(k)
save(v)
write(SETITEMS)
elif n:
k, v = tmp[0]
save(k)
save(v)
write(SETITEM)
# else tmp is empty, and we're done
def save_inst(self, obj):
cls = obj.__class__
memo = self.memo
write = self.write
save = self.save
if hasattr(obj, '__getinitargs__'):
args = obj.__getinitargs__()
len(args) # XXX Assert it's a sequence
_keep_alive(args, memo)
else:
args = ()
write(MARK)
if self.bin:
save(cls)
for arg in args:
save(arg)
write(OBJ)
else:
for arg in args:
save(arg)
write(INST + cls.__module__ + '\n' + cls.__name__ + '\n')
self.memoize(obj)
try:
getstate = obj.__getstate__
except AttributeError:
stuff = obj.__dict__
else:
stuff = getstate()
_keep_alive(stuff, memo)
save(stuff)
write(BUILD)
dispatch[InstanceType] = save_inst
def save_global(self, obj, name=None, pack=struct.pack):
write = self.write
memo = self.memo
if name is None:
name = obj.__name__
module = getattr(obj, "__module__", None)
if module is None:
module = whichmodule(obj, name)
try:
__import__(module)
mod = sys.modules[module]
klass = getattr(mod, name)
except (ImportError, KeyError, AttributeError):
raise PicklingError(
"Can't pickle %r: it's not found as %s.%s" %
(obj, module, name))
else:
if klass is not obj:
raise PicklingError(
"Can't pickle %r: it's not the same object as %s.%s" %
(obj, module, name))
if self.proto >= 2:
code = _extension_registry.get((module, name))
if code:
assert code > 0
if code <= 0xff:
write(EXT1 + chr(code))
elif code <= 0xffff:
write("%c%c%c" % (EXT2, code&0xff, code>>8))
else:
write(EXT4 + pack("<i", code))
return
write(GLOBAL + module + '\n' + name + '\n')
self.memoize(obj)
dispatch[ClassType] = save_global
dispatch[FunctionType] = save_global
dispatch[BuiltinFunctionType] = save_global
dispatch[TypeType] = save_global
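# --- Illustrative sketch, not part of the original module: the memo is what
# lets shared and self-referential objects survive a round trip. A list that
# contains itself exercises the recursion handling discussed in the
# save_tuple() comments above. Uses the dumps()/loads() helpers defined near
# the end of this file.
def _demo_memo_roundtrip():
    L = []
    L.append(L)              # L[0] is L
    L2 = loads(dumps(L, 2))
    assert L2[0] is L2       # identity, not just equality, is preserved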
# Pickling helpers
def _keep_alive(x, memo):
"""Keeps a reference to the object x in the memo.
Because we remember objects by their id, we have
to assure that possibly temporary objects are kept
alive by referencing them.
We store a reference at the id of the memo, which should
normally not be used unless someone tries to deepcopy
the memo itself...
"""
try:
memo[id(memo)].append(x)
except KeyError:
# aha, this is the first one :-)
memo[id(memo)]=[x]
# A cache for whichmodule(), mapping a function object to the name of
# the module in which the function was found.
classmap = {} # called classmap for backwards compatibility
def whichmodule(func, funcname):
"""Figure out the module in which a function occurs.
Search sys.modules for the module.
Cache in classmap.
Return a module name.
If the function cannot be found, return "__main__".
"""
# Python functions should always get an __module__ from their globals.
mod = getattr(func, "__module__", None)
if mod is not None:
return mod
if func in classmap:
return classmap[func]
for name, module in sys.modules.items():
if module is None:
continue # skip dummy package entries
if name != '__main__' and getattr(module, funcname, None) is func:
break
else:
name = '__main__'
classmap[func] = name
return name
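# Illustrative sketch, not part of the original module: whichmodule()
# prefers the function's own __module__ attribute and only falls back to
# scanning sys.modules when that attribute is missing.
def _demo_whichmodule():
    import os.path
    # os.path.join carries __module__, so no sys.modules scan is needed.
    assert whichmodule(os.path.join, 'join') in ('posixpath', 'ntpath',
                                                 'macpath')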
# Unpickling machinery
class Unpickler:
def __init__(self, file):
"""This takes a file-like object for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no
proto argument is needed.
The file-like object must have two methods, a read() method that
takes an integer argument, and a readline() method that requires no
arguments. Both methods should return a string. Thus the file-like
object can be a file object opened for reading, a StringIO object,
or any other custom object that meets this interface.
"""
self.readline = file.readline
self.read = file.read
self.memo = {}
def load(self):
"""Read a pickled object representation from the open file.
Return the reconstituted object hierarchy specified in the file.
"""
self.mark = object() # any new unique object
self.stack = []
self.append = self.stack.append
read = self.read
dispatch = self.dispatch
try:
while 1:
key = read(1)
dispatch[key](self)
except _Stop, stopinst:
return stopinst.value
# Return largest index k such that self.stack[k] is self.mark.
# If the stack doesn't contain a mark, eventually raises IndexError.
# This could be sped by maintaining another stack, of indices at which
# the mark appears. For that matter, the latter stack would suffice,
# and we wouldn't need to push mark objects on self.stack at all.
# Doing so is probably a good thing, though, since if the pickle is
# corrupt (or hostile) we may get a clue from finding self.mark embedded
# in unpickled objects.
def marker(self):
stack = self.stack
mark = self.mark
k = len(stack)-1
while stack[k] is not mark: k = k-1
return k
dispatch = {}
def load_eof(self):
raise EOFError
dispatch[''] = load_eof
def load_proto(self):
proto = ord(self.read(1))
if not 0 <= proto <= 2:
raise ValueError, "unsupported pickle protocol: %d" % proto
dispatch[PROTO] = load_proto
def load_persid(self):
pid = self.readline()[:-1]
self.append(self.persistent_load(pid))
dispatch[PERSID] = load_persid
def load_binpersid(self):
pid = self.stack.pop()
self.append(self.persistent_load(pid))
dispatch[BINPERSID] = load_binpersid
def load_none(self):
self.append(None)
dispatch[NONE] = load_none
def load_false(self):
self.append(False)
dispatch[NEWFALSE] = load_false
def load_true(self):
self.append(True)
dispatch[NEWTRUE] = load_true
def load_int(self):
data = self.readline()
if data == FALSE[1:]:
val = False
elif data == TRUE[1:]:
val = True
else:
try:
val = int(data)
except ValueError:
val = long(data)
self.append(val)
dispatch[INT] = load_int
def load_binint(self):
self.append(mloads('i' + self.read(4)))
dispatch[BININT] = load_binint
def load_binint1(self):
self.append(ord(self.read(1)))
dispatch[BININT1] = load_binint1
def load_binint2(self):
self.append(mloads('i' + self.read(2) + '\000\000'))
dispatch[BININT2] = load_binint2
def load_long(self):
self.append(long(self.readline()[:-1], 0))
dispatch[LONG] = load_long
def load_long1(self):
n = ord(self.read(1))
bytes = self.read(n)
self.append(decode_long(bytes))
dispatch[LONG1] = load_long1
def load_long4(self):
n = mloads('i' + self.read(4))
bytes = self.read(n)
self.append(decode_long(bytes))
dispatch[LONG4] = load_long4
def load_float(self):
self.append(float(self.readline()[:-1]))
dispatch[FLOAT] = load_float
def load_binfloat(self, unpack=struct.unpack):
self.append(unpack('>d', self.read(8))[0])
dispatch[BINFLOAT] = load_binfloat
def load_string(self):
rep = self.readline()[:-1]
for q in "\"'": # double or single quote
if rep.startswith(q):
if len(rep) < 2 or not rep.endswith(q):
raise ValueError, "insecure string pickle"
rep = rep[len(q):-len(q)]
break
else:
raise ValueError, "insecure string pickle"
self.append(rep.decode("string-escape"))
dispatch[STRING] = load_string
def load_binstring(self):
n = mloads('i' + self.read(4))
self.append(self.read(n))
dispatch[BINSTRING] = load_binstring
def load_unicode(self):
self.append(unicode(self.readline()[:-1],'raw-unicode-escape'))
dispatch[UNICODE] = load_unicode
def load_binunicode(self):
n = mloads('i' + self.read(4))
self.append(unicode(self.read(n), 'utf-8'))
dispatch[BINUNICODE] = load_binunicode
def load_short_binstring(self):
n = ord(self.read(1))
self.append(self.read(n))
dispatch[SHORT_BINSTRING] = load_short_binstring
def load_tuple(self):
k = self.marker()
self.stack[k:] = [tuple(self.stack[k+1:])]
dispatch[TUPLE] = load_tuple
def load_empty_tuple(self):
self.stack.append(())
dispatch[EMPTY_TUPLE] = load_empty_tuple
def load_tuple1(self):
self.stack[-1] = (self.stack[-1],)
dispatch[TUPLE1] = load_tuple1
def load_tuple2(self):
self.stack[-2:] = [(self.stack[-2], self.stack[-1])]
dispatch[TUPLE2] = load_tuple2
def load_tuple3(self):
self.stack[-3:] = [(self.stack[-3], self.stack[-2], self.stack[-1])]
dispatch[TUPLE3] = load_tuple3
def load_empty_list(self):
self.stack.append([])
dispatch[EMPTY_LIST] = load_empty_list
def load_empty_dictionary(self):
self.stack.append({})
dispatch[EMPTY_DICT] = load_empty_dictionary
def load_list(self):
k = self.marker()
self.stack[k:] = [self.stack[k+1:]]
dispatch[LIST] = load_list
def load_dict(self):
k = self.marker()
d = {}
items = self.stack[k+1:]
for i in range(0, len(items), 2):
key = items[i]
value = items[i+1]
d[key] = value
self.stack[k:] = [d]
dispatch[DICT] = load_dict
# INST and OBJ differ only in how they get a class object. It's not
# only sensible to do the rest in a common routine, the two routines
# previously diverged and grew different bugs.
# klass is the class to instantiate, and k points to the topmost mark
# object, following which are the arguments for klass.__init__.
def _instantiate(self, klass, k):
args = tuple(self.stack[k+1:])
del self.stack[k:]
instantiated = 0
if (not args and
type(klass) is ClassType and
not hasattr(klass, "__getinitargs__")):
try:
value = _EmptyClass()
value.__class__ = klass
instantiated = 1
except RuntimeError:
# In restricted execution, assignment to inst.__class__ is
# prohibited
pass
if not instantiated:
try:
value = klass(*args)
except TypeError, err:
raise TypeError, "in constructor for %s: %s" % (
klass.__name__, str(err)), sys.exc_info()[2]
self.append(value)
def load_inst(self):
module = self.readline()[:-1]
name = self.readline()[:-1]
klass = self.find_class(module, name)
self._instantiate(klass, self.marker())
dispatch[INST] = load_inst
def load_obj(self):
# Stack is ... markobject classobject arg1 arg2 ...
k = self.marker()
klass = self.stack.pop(k+1)
self._instantiate(klass, k)
dispatch[OBJ] = load_obj
def load_newobj(self):
args = self.stack.pop()
cls = self.stack[-1]
obj = cls.__new__(cls, *args)
self.stack[-1] = obj
dispatch[NEWOBJ] = load_newobj
def load_global(self):
module = self.readline()[:-1]
name = self.readline()[:-1]
klass = self.find_class(module, name)
self.append(klass)
dispatch[GLOBAL] = load_global
def load_ext1(self):
code = ord(self.read(1))
self.get_extension(code)
dispatch[EXT1] = load_ext1
def load_ext2(self):
code = mloads('i' + self.read(2) + '\000\000')
self.get_extension(code)
dispatch[EXT2] = load_ext2
def load_ext4(self):
code = mloads('i' + self.read(4))
self.get_extension(code)
dispatch[EXT4] = load_ext4
def get_extension(self, code):
nil = []
obj = _extension_cache.get(code, nil)
if obj is not nil:
self.append(obj)
return
key = _inverted_registry.get(code)
if not key:
raise ValueError("unregistered extension code %d" % code)
obj = self.find_class(*key)
_extension_cache[code] = obj
self.append(obj)
def find_class(self, module, name):
# Subclasses may override this
__import__(module)
mod = sys.modules[module]
klass = getattr(mod, name)
return klass
def load_reduce(self):
stack = self.stack
args = stack.pop()
func = stack[-1]
value = func(*args)
stack[-1] = value
dispatch[REDUCE] = load_reduce
def load_pop(self):
del self.stack[-1]
dispatch[POP] = load_pop
def load_pop_mark(self):
k = self.marker()
del self.stack[k:]
dispatch[POP_MARK] = load_pop_mark
def load_dup(self):
self.append(self.stack[-1])
dispatch[DUP] = load_dup
def load_get(self):
self.append(self.memo[self.readline()[:-1]])
dispatch[GET] = load_get
def load_binget(self):
i = ord(self.read(1))
self.append(self.memo[repr(i)])
dispatch[BINGET] = load_binget
def load_long_binget(self):
i = mloads('i' + self.read(4))
self.append(self.memo[repr(i)])
dispatch[LONG_BINGET] = load_long_binget
def load_put(self):
self.memo[self.readline()[:-1]] = self.stack[-1]
dispatch[PUT] = load_put
def load_binput(self):
i = ord(self.read(1))
self.memo[repr(i)] = self.stack[-1]
dispatch[BINPUT] = load_binput
def load_long_binput(self):
i = mloads('i' + self.read(4))
self.memo[repr(i)] = self.stack[-1]
dispatch[LONG_BINPUT] = load_long_binput
def load_append(self):
stack = self.stack
value = stack.pop()
list = stack[-1]
list.append(value)
dispatch[APPEND] = load_append
def load_appends(self):
stack = self.stack
mark = self.marker()
list = stack[mark - 1]
list.extend(stack[mark + 1:])
del stack[mark:]
dispatch[APPENDS] = load_appends
def load_setitem(self):
stack = self.stack
value = stack.pop()
key = stack.pop()
dict = stack[-1]
dict[key] = value
dispatch[SETITEM] = load_setitem
def load_setitems(self):
stack = self.stack
mark = self.marker()
dict = stack[mark - 1]
for i in range(mark + 1, len(stack), 2):
dict[stack[i]] = stack[i + 1]
del stack[mark:]
dispatch[SETITEMS] = load_setitems
def load_build(self):
stack = self.stack
state = stack.pop()
inst = stack[-1]
setstate = getattr(inst, "__setstate__", None)
if setstate:
setstate(state)
return
slotstate = None
if isinstance(state, tuple) and len(state) == 2:
state, slotstate = state
if state:
try:
d = inst.__dict__
try:
for k, v in state.iteritems():
d[intern(k)] = v
# keys in state don't have to be strings
# don't blow up, but don't go out of our way
except TypeError:
d.update(state)
except RuntimeError:
# XXX In restricted execution, the instance's __dict__
# is not accessible. Use the old way of unpickling
# the instance variables. This is a semantic
# difference when unpickling in restricted
# vs. unrestricted modes.
# Note, however, that cPickle has never tried to do the
# .update() business, and always uses
# PyObject_SetItem(inst.__dict__, key, value) in a
# loop over state.items().
for k, v in state.items():
setattr(inst, k, v)
if slotstate:
for k, v in slotstate.items():
setattr(inst, k, v)
dispatch[BUILD] = load_build
def load_mark(self):
self.append(self.mark)
dispatch[MARK] = load_mark
def load_stop(self):
value = self.stack.pop()
raise _Stop(value)
dispatch[STOP] = load_stop
# Helper class for load_inst/load_obj
class _EmptyClass:
pass
# Encode/decode longs in linear time.
import binascii as _binascii
def encode_long(x):
r"""Encode a long to a two's complement little-endian binary string.
Note that 0L is a special case, returning an empty string, to save a
byte in the LONG1 pickling context.
>>> encode_long(0L)
''
>>> encode_long(255L)
'\xff\x00'
>>> encode_long(32767L)
'\xff\x7f'
>>> encode_long(-256L)
'\x00\xff'
>>> encode_long(-32768L)
'\x00\x80'
>>> encode_long(-128L)
'\x80'
>>> encode_long(127L)
'\x7f'
>>>
"""
if x == 0:
return ''
if x > 0:
ashex = hex(x)
assert ashex.startswith("0x")
njunkchars = 2 + ashex.endswith('L')
nibbles = len(ashex) - njunkchars
if nibbles & 1:
# need an even # of nibbles for unhexlify
ashex = "0x0" + ashex[2:]
elif int(ashex[2], 16) >= 8:
# "looks negative", so need a byte of sign bits
ashex = "0x00" + ashex[2:]
else:
# Build the 256's-complement: (1L << nbytes) + x. The trick is
# to find the number of bytes in linear time (although that should
# really be a constant-time task).
ashex = hex(-x)
assert ashex.startswith("0x")
njunkchars = 2 + ashex.endswith('L')
nibbles = len(ashex) - njunkchars
if nibbles & 1:
# Extend to a full byte.
nibbles += 1
nbits = nibbles * 4
x += 1L << nbits
assert x > 0
ashex = hex(x)
njunkchars = 2 + ashex.endswith('L')
newnibbles = len(ashex) - njunkchars
if newnibbles < nibbles:
ashex = "0x" + "0" * (nibbles - newnibbles) + ashex[2:]
if int(ashex[2], 16) < 8:
# "looks positive", so need a byte of sign bits
ashex = "0xff" + ashex[2:]
if ashex.endswith('L'):
ashex = ashex[2:-1]
else:
ashex = ashex[2:]
assert len(ashex) & 1 == 0, (x, ashex)
binary = _binascii.unhexlify(ashex)
return binary[::-1]
def decode_long(data):
r"""Decode a long from a two's complement little-endian binary string.
>>> decode_long('')
0L
>>> decode_long("\xff\x00")
255L
>>> decode_long("\xff\x7f")
32767L
>>> decode_long("\x00\xff")
-256L
>>> decode_long("\x00\x80")
-32768L
>>> decode_long("\x80")
-128L
>>> decode_long("\x7f")
127L
"""
nbytes = len(data)
if nbytes == 0:
return 0L
ashex = _binascii.hexlify(data[::-1])
n = long(ashex, 16) # quadratic time before Python 2.3; linear now
if data[-1] >= '\x80':
n -= 1L << (nbytes * 8)
return n
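# Illustrative sketch, not part of the original module: encode_long() and
# decode_long() are exact inverses, with '' as the special spelling of 0L.
def _demo_long_codec():
    for x in (0L, 1L, -1L, 127L, -128L, 255L, 10L ** 50, -10L ** 50):
        assert decode_long(encode_long(x)) == x
    assert encode_long(0L) == ''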
# Shorthands
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
def dump(obj, file, protocol=None):
Pickler(file, protocol).dump(obj)
def dumps(obj, protocol=None):
file = StringIO()
Pickler(file, protocol).dump(obj)
return file.getvalue()
def load(file):
return Unpickler(file).load()
def loads(str):
file = StringIO(str)
return Unpickler(file).load()
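# Illustrative sketch, not part of the original module: the four helpers
# above are thin wrappers over Pickler and Unpickler; a round trip through
# a string and through a file-like object give the same result.
def _demo_roundtrip():
    obj = {'a': [1, 2, (3L, u'four')]}
    assert loads(dumps(obj)) == obj     # protocol 0 by default
    f = StringIO()
    dump(obj, f, 2)
    f.seek(0)
    assert load(f) == obj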
# Doctest
def _test():
import doctest
return doctest.testmod()
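# Illustrative sketch, not part of the original module: higher protocols
# generally pickle more compactly; longs in particular gain from the binary
# LONG1/LONG4 encodings added in protocol 2.
def _demo_protocol_sizes():
    big = 10L ** 100
    assert len(dumps(big, 2)) < len(dumps(big, 0))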
if __name__ == "__main__":
_test()
'''"Executable documentation" for the pickle module.
Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:
genops(pickle)
Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
dis(pickle, out=None, memo=None, indentlevel=4)
Print a symbolic disassembly of a pickle.
'''
__all__ = ['dis', 'genops', 'optimize']
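# Illustrative sketch, not part of the original module, assuming the
# genops() function defined later in this file: walking the opcode stream
# of a small protocol-0 pickle.
def _demo_genops():
    import pickle
    for opcode, arg, pos in genops(pickle.dumps((1, 2), 0)):
        print pos, opcode.name, arg
    # Expected output (roughly):
    #   0 MARK None
    #   1 INT 1
    #   4 INT 2
    #   7 TUPLE None
    #   8 PUT 0
    #   11 STOP None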
# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
# well-formedness. dis() does a lot of this already.
#
# - A protocol identifier: examine a pickle and return its protocol number
# (== the highest .proto attr value among all the opcodes in the pickle).
# dis() already prints this info at the end.
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
# elaborate than necessary, catering for the possibility that the tuple
# is recursive. Or lots of times a PUT is generated that's never accessed
# by a later GET.
"""
"A pickle" is a program for a virtual pickle machine (PM, but more accurately
called an unpickling machine). It's a sequence of opcodes, interpreted by the
PM, building an arbitrarily complex Python object.
For the most part, the PM is very simple: there are no looping, testing, or
conditional instructions, no arithmetic and no function calls. Opcodes are
executed once each, from first to last, until a STOP opcode is reached.
The PM has two data areas, "the stack" and "the memo".
Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
integer object on the stack, whose value is taken from a decimal string
literal immediately following the INT opcode in the pickle bytestream. Other
opcodes take Python objects off the stack. The result of unpickling is
whatever object is left on the stack when the final STOP opcode is executed.
The memo is simply an array of objects, or it can be implemented as a dict
mapping little integers to objects. The memo serves as the PM's "long term
memory", and the little integers indexing the memo are akin to variable
names. Some opcodes pop a stack object into the memo at a given index,
and others push a memo object at a given index onto the stack again.
At heart, that's all the PM has. Subtleties arise for these reasons:
+ Object identity. Objects can be arbitrarily complex, and subobjects
may be shared (for example, the list [a, a] refers to the same object a
twice). It can be vital that unpickling recreate an isomorphic object
graph, faithfully reproducing sharing.
+ Recursive objects. For example, after "L = []; L.append(L)", L is a
list, and L[0] is the same list. This is related to the object identity
point, and some sequences of pickle opcodes are subtle in order to
get the right result in all cases.
+ Things pickle doesn't know everything about. Examples of things pickle
does know everything about are Python's builtin scalar and container
types, like ints and tuples. They generally have opcodes dedicated to
them. For things like module references and instances of user-defined
classes, pickle's knowledge is limited. Historically, many enhancements
have been made to the pickle protocol in order to do a better (faster,
and/or more compact) job on those.
+ Backward compatibility and micro-optimization. As explained below,
pickle opcodes never go away, not even when better ways to do a thing
get invented. The repertoire of the PM just keeps growing over time.
For example, protocol 0 had two opcodes for building Python integers (INT
and LONG), protocol 1 added three more for more-efficient pickling of short
integers, and protocol 2 added two more for more-efficient pickling of
long integers (before protocol 2, the only ways to pickle a Python long
took time quadratic in the number of digits, for both pickling and
unpickling). "Opcode bloat" isn't so much a subtlety as a source of
wearying complication.
Pickle protocols:
For compatibility, the meaning of a pickle opcode never changes. Instead new
pickle opcodes get added, and each version's unpickler can handle all the
pickle opcodes in all protocol versions to date. So old pickles continue to
be readable forever. The pickler can generally be told to restrict itself to
the subset of opcodes available under previous protocol versions too, so that
users can create pickles under the current version readable by older
versions. However, a pickle does not contain its version number embedded
within it. If an older unpickler tries to read a pickle using a later
protocol, the result is most likely an exception due to seeing an unknown (in
the older unpickler) opcode.
The original pickle used what's now called "protocol 0", and what was called
"text mode" before Python 2.3. The entire pickle bytestream is made up of
printable 7-bit ASCII characters, plus the newline character, in protocol 0.
That's why it was called text mode. Protocol 0 is small and elegant, but
sometimes painfully inefficient.
The second major set of additions is now called "protocol 1", and was called
"binary mode" before Python 2.3. This added many opcodes with arguments
consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
bytes. Binary mode pickles can be substantially smaller than equivalent
text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
int as 4 bytes following the opcode, which is cheaper to unpickle than the
(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
a number of opcodes that operate on many stack elements at once (like APPENDS
and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
The third major set of additions came in Python 2.3, and is called "protocol
2". This added:
- A better way to pickle instances of new-style classes (NEWOBJ).
- A way for a pickle to identify its protocol (PROTO).
- Time- and space- efficient pickling of long ints (LONG{1,4}).
- Shortcuts for small tuples (TUPLE{1,2,3}).
- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
- The "extension registry", a vector of popular objects that can be pushed
efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
the registry contents are predefined (there's nothing akin to the memo's
PUT).
Another independent change with Python 2.3 is the abandonment of any
pretense that it might be safe to load pickles received from untrusted
parties -- no sufficient security analysis has been done to guarantee
this and there isn't a use case that warrants the expense of such an
analysis.
To this end, all tests for __safe_for_unpickling__ or for
copy_reg.safe_constructors are removed from the unpickling code.
References to these variables in the descriptions below are to be seen
as describing unpickling in Python 2.2 and before.
"""
# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.
##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.
# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1
# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
class ArgumentDescriptor(object):
__slots__ = (
# name of descriptor record, also a module global name; a string
'name',
# length of argument, in bytes; an int; UP_TO_NEWLINE and
# TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
# cases
'n',
# a function taking a file-like object, reading this kind of argument
# from the object at the current position, advancing the current
# position by n bytes, and returning the value of the argument
'reader',
# human-readable docs for this arg descriptor; a string
'doc',
)
def __init__(self, name, n, reader, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(n, (int, long)) and (n >= 0 or
n in (UP_TO_NEWLINE,
TAKEN_FROM_ARGUMENT1,
TAKEN_FROM_ARGUMENT4))
self.n = n
self.reader = reader
assert isinstance(doc, str)
self.doc = doc
from struct import unpack as _unpack
def read_uint1(f):
r"""
>>> import StringIO
>>> read_uint1(StringIO.StringIO('\xff'))
255
"""
data = f.read(1)
if data:
return ord(data)
raise ValueError("not enough data in stream to read uint1")
uint1 = ArgumentDescriptor(
name='uint1',
n=1,
reader=read_uint1,
doc="One-byte unsigned integer.")
def read_uint2(f):
r"""
>>> import StringIO
>>> read_uint2(StringIO.StringIO('\xff\x00'))
255
>>> read_uint2(StringIO.StringIO('\xff\xff'))
65535
"""
data = f.read(2)
if len(data) == 2:
return _unpack("<H", data)[0]
raise ValueError("not enough data in stream to read uint2")
uint2 = ArgumentDescriptor(
name='uint2',
n=2,
reader=read_uint2,
doc="Two-byte unsigned integer, little-endian.")
def read_int4(f):
r"""
>>> import StringIO
>>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))
255
>>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)
True
"""
data = f.read(4)
if len(data) == 4:
return _unpack("<i", data)[0]
raise ValueError("not enough data in stream to read int4")
int4 = ArgumentDescriptor(
name='int4',
n=4,
reader=read_int4,
doc="Four-byte signed integer, little-endian, 2's complement.")
def read_stringnl(f, decode=True, stripquotes=True):
r"""
>>> import StringIO
>>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
'abcd'
>>> read_stringnl(StringIO.StringIO("\n"))
Traceback (most recent call last):
...
ValueError: no string quotes around ''
>>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
''
>>> read_stringnl(StringIO.StringIO("''\n"))
''
>>> read_stringnl(StringIO.StringIO('"abcd"'))
Traceback (most recent call last):
...
ValueError: no newline found when trying to read stringnl
Embedded escapes are undone in the result.
>>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
'a\n\\b\x00c\td'
"""
data = f.readline()
if not data.endswith('\n'):
raise ValueError("no newline found when trying to read stringnl")
data = data[:-1] # lose the newline
if stripquotes:
for q in "'\"":
if data.startswith(q):
if not data.endswith(q):
raise ValueError("strinq quote %r not found at both "
"ends of %r" % (q, data))
data = data[1:-1]
break
else:
raise ValueError("no string quotes around %r" % data)
# I'm not sure when 'string_escape' was added to the std codecs; it's
# crazy not to use it if it's there.
if decode:
data = data.decode('string_escape')
return data
stringnl = ArgumentDescriptor(
name='stringnl',
n=UP_TO_NEWLINE,
reader=read_stringnl,
doc="""A newline-terminated string.
This is a repr-style string, with embedded escapes, and
bracketing quotes.
""")
def read_stringnl_noescape(f):
return read_stringnl(f, decode=False, stripquotes=False)
stringnl_noescape = ArgumentDescriptor(
name='stringnl_noescape',
n=UP_TO_NEWLINE,
reader=read_stringnl_noescape,
doc="""A newline-terminated string.
This is a str-style string, without embedded escapes,
or bracketing quotes. It should consist solely of
printable ASCII characters.
""")
def read_stringnl_noescape_pair(f):
r"""
>>> import StringIO
>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))
'Queue Empty'
"""
return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
stringnl_noescape_pair = ArgumentDescriptor(
name='stringnl_noescape_pair',
n=UP_TO_NEWLINE,
reader=read_stringnl_noescape_pair,
doc="""A pair of newline-terminated strings.
These are str-style strings, without embedded
escapes, or bracketing quotes. They should
consist solely of printable ASCII characters.
The pair is returned as a single string, with
a single blank separating the two strings.
""")
def read_string4(f):
r"""
>>> import StringIO
>>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))
''
>>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))
'abc'
>>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))
Traceback (most recent call last):
...
ValueError: expected 50331648 bytes in a string4, but only 6 remain
"""
n = read_int4(f)
if n < 0:
raise ValueError("string4 byte count < 0: %d" % n)
data = f.read(n)
if len(data) == n:
return data
raise ValueError("expected %d bytes in a string4, but only %d remain" %
(n, len(data)))
string4 = ArgumentDescriptor(
name="string4",
n=TAKEN_FROM_ARGUMENT4,
reader=read_string4,
doc="""A counted string.
The first argument is a 4-byte little-endian signed int giving
the number of bytes in the string, and the second argument is
that many bytes.
""")
def read_string1(f):
r"""
>>> import StringIO
>>> read_string1(StringIO.StringIO("\x00"))
''
>>> read_string1(StringIO.StringIO("\x03abcdef"))
'abc'
"""
n = read_uint1(f)
assert n >= 0
data = f.read(n)
if len(data) == n:
return data
raise ValueError("expected %d bytes in a string1, but only %d remain" %
(n, len(data)))
string1 = ArgumentDescriptor(
name="string1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_string1,
doc="""A counted string.
The first argument is a 1-byte unsigned int giving the number
of bytes in the string, and the second argument is that many
bytes.
""")
def read_unicodestringnl(f):
r"""
>>> import StringIO
>>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))
u'abc\uabcd'
"""
data = f.readline()
if not data.endswith('\n'):
raise ValueError("no newline found when trying to read "
"unicodestringnl")
data = data[:-1] # lose the newline
return unicode(data, 'raw-unicode-escape')
unicodestringnl = ArgumentDescriptor(
name='unicodestringnl',
n=UP_TO_NEWLINE,
reader=read_unicodestringnl,
doc="""A newline-terminated Unicode string.
This is raw-unicode-escape encoded, so consists of
printable ASCII characters, and may contain embedded
escape sequences.
""")
def read_unicodestring4(f):
r"""
>>> import StringIO
>>> s = u'abcd\uabcd'
>>> enc = s.encode('utf-8')
>>> enc
'abcd\xea\xaf\x8d'
>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
>>> s == t
True
>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
Traceback (most recent call last):
...
ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
"""
n = read_int4(f)
if n < 0:
raise ValueError("unicodestring4 byte count < 0: %d" % n)
data = f.read(n)
if len(data) == n:
return unicode(data, 'utf-8')
raise ValueError("expected %d bytes in a unicodestring4, but only %d "
"remain" % (n, len(data)))
unicodestring4 = ArgumentDescriptor(
name="unicodestring4",
n=TAKEN_FROM_ARGUMENT4,
reader=read_unicodestring4,
doc="""A counted Unicode string.
The first argument is a 4-byte little-endian signed int
giving the number of bytes in the string, and the second
argument-- the UTF-8 encoding of the Unicode string --
contains that many bytes.
""")
def read_decimalnl_short(f):
r"""
>>> import StringIO
>>> read_decimalnl_short(StringIO.StringIO("1234\n56"))
1234
>>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))
Traceback (most recent call last):
...
ValueError: trailing 'L' not allowed in '1234L'
"""
s = read_stringnl(f, decode=False, stripquotes=False)
if s.endswith("L"):
raise ValueError("trailing 'L' not allowed in %r" % s)
# It's not necessarily true that the result fits in a Python short int:
# the pickle may have been written on a 64-bit box. There's also a hack
# for True and False here.
if s == "00":
return False
elif s == "01":
return True
try:
return int(s)
except OverflowError:
return long(s)
def read_decimalnl_long(f):
r"""
>>> import StringIO
>>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
Traceback (most recent call last):
...
ValueError: trailing 'L' required in '1234'
Someday the trailing 'L' will probably go away from this output.
>>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
1234L
>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
123456789012345678901234L
"""
s = read_stringnl(f, decode=False, stripquotes=False)
if not s.endswith("L"):
raise ValueError("trailing 'L' required in %r" % s)
return long(s)
decimalnl_short = ArgumentDescriptor(
name='decimalnl_short',
n=UP_TO_NEWLINE,
reader=read_decimalnl_short,
doc="""A newline-terminated decimal integer literal.
This never has a trailing 'L', and the integer fit
in a short Python int on the box where the pickle
was written -- but there's no guarantee it will fit
in a short Python int on the box where the pickle
is read.
""")
decimalnl_long = ArgumentDescriptor(
name='decimalnl_long',
n=UP_TO_NEWLINE,
reader=read_decimalnl_long,
doc="""A newline-terminated decimal integer literal.
This has a trailing 'L', and can represent integers
of any size.
""")
def read_floatnl(f):
r"""
>>> import StringIO
>>> read_floatnl(StringIO.StringIO("-1.25\n6"))
-1.25
"""
s = read_stringnl(f, decode=False, stripquotes=False)
return float(s)
floatnl = ArgumentDescriptor(
name='floatnl',
n=UP_TO_NEWLINE,
reader=read_floatnl,
doc="""A newline-terminated decimal floating literal.
In general this requires 17 significant digits for roundtrip
identity, and pickling then unpickling infinities, NaNs, and
minus zero doesn't work across boxes, or on some boxes even
on itself (e.g., Windows can't read the strings it produces
for infinities or NaNs).
""")
def read_float8(f):
r"""
>>> import StringIO, struct
>>> raw = struct.pack(">d", -1.25)
>>> raw
'\xbf\xf4\x00\x00\x00\x00\x00\x00'
>>> read_float8(StringIO.StringIO(raw + "\n"))
-1.25
"""
data = f.read(8)
if len(data) == 8:
return _unpack(">d", data)[0]
raise ValueError("not enough data in stream to read float8")
float8 = ArgumentDescriptor(
name='float8',
n=8,
reader=read_float8,
doc="""An 8-byte binary representation of a float, big-endian.
The format is unique to Python, and shared with the struct
module (format string '>d') "in theory" (the struct and cPickle
implementations don't share the code -- they should). It's
strongly related to the IEEE-754 double format, and, in normal
cases, is in fact identical to the big-endian 754 double format.
On other boxes the dynamic range is limited to that of a 754
double, and "add a half and chop" rounding is used to reduce
the precision to 53 bits. However, even on a 754 box,
infinities, NaNs, and minus zero may not be handled correctly
(may not survive roundtrip pickling intact).
""")
# Protocol 2 formats
from pickle import decode_long
def read_long1(f):
r"""
>>> import StringIO
>>> read_long1(StringIO.StringIO("\x00"))
0L
>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
255L
>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
32767L
>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
-256L
>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
-32768L
"""
n = read_uint1(f)
data = f.read(n)
if len(data) != n:
raise ValueError("not enough data in stream to read long1")
return decode_long(data)
long1 = ArgumentDescriptor(
name="long1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_long1,
doc="""A binary long, little-endian, using 1-byte size.
This first reads one byte as an unsigned size, then reads that
many bytes and interprets them as a little-endian 2's-complement long.
If the size is 0, that's taken as a shortcut for the long 0L.
""")
def read_long4(f):
r"""
>>> import StringIO
>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
255L
>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
32767L
>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
-256L
>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
-32768L
>>> read_long4(StringIO.StringIO("\x00\x00\x00\x00"))
0L
"""
n = read_int4(f)
if n < 0:
raise ValueError("long4 byte count < 0: %d" % n)
data = f.read(n)
if len(data) != n:
raise ValueError("not enough data in stream to read long4")
return decode_long(data)
long4 = ArgumentDescriptor(
name="long4",
n=TAKEN_FROM_ARGUMENT4,
reader=read_long4,
doc="""A binary representation of a long, little-endian.
This first reads four bytes as a signed size (but requires the
size to be >= 0), then reads that many bytes and interprets them
as a little-endian 2's-complement long. If the size is 0, that's taken
as a shortcut for the long 0L, although LONG1 should really be used
then instead (and in any case where # of bytes < 256).
""")
##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.
class StackObject(object):
__slots__ = (
# name of descriptor record, for info only
'name',
# type of object, or tuple of type objects (meaning the object can
# be of any type in the tuple)
'obtype',
# human-readable docs for this kind of stack object; a string
'doc',
)
def __init__(self, name, obtype, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(obtype, type) or isinstance(obtype, tuple)
if isinstance(obtype, tuple):
for contained in obtype:
assert isinstance(contained, type)
self.obtype = obtype
assert isinstance(doc, str)
self.doc = doc
def __repr__(self):
return self.name
pyint = StackObject(
name='int',
obtype=int,
doc="A short (as opposed to long) Python integer object.")
pylong = StackObject(
name='long',
obtype=long,
doc="A long (as opposed to short) Python integer object.")
pyinteger_or_bool = StackObject(
name='int_or_bool',
obtype=(int, long, bool),
doc="A Python integer object (short or long), or "
"a Python bool.")
pybool = StackObject(
name='bool',
obtype=(bool,),
doc="A Python bool object.")
pyfloat = StackObject(
name='float',
obtype=float,
doc="A Python float object.")
pystring = StackObject(
name='str',
obtype=str,
doc="A Python string object.")
pyunicode = StackObject(
name='unicode',
obtype=unicode,
doc="A Python Unicode string object.")
pynone = StackObject(
name="None",
obtype=type(None),
doc="The Python None object.")
pytuple = StackObject(
name="tuple",
obtype=tuple,
doc="A Python tuple object.")
pylist = StackObject(
name="list",
obtype=list,
doc="A Python list object.")
pydict = StackObject(
name="dict",
obtype=dict,
doc="A Python dict object.")
anyobject = StackObject(
name='any',
obtype=object,
doc="Any kind of object whatsoever.")
markobject = StackObject(
name="mark",
obtype=StackObject,
doc="""'The mark' is a unique object.
Opcodes that operate on a variable number of objects
generally don't embed the count of objects in the opcode,
or pull it off the stack. Instead the MARK opcode is used
to push a special marker object on the stack, and then
some other opcodes grab all the objects from the top of
the stack down to (but not including) the topmost marker
object.
""")
stackslice = StackObject(
name="stackslice",
obtype=StackObject,
doc="""An object representing a contiguous slice of the stack.
This is used in conjunction with markobject, to represent all
of the stack following the topmost markobject. For example,
the POP_MARK opcode changes the stack from
[..., markobject, stackslice]
to
[...]
No matter how many objects are on the stack after the topmost
markobject, POP_MARK gets rid of all of them (including the
topmost markobject too).
""")
##############################################################################
# Descriptors for pickle opcodes.
class OpcodeInfo(object):
__slots__ = (
# symbolic name of opcode; a string
'name',
# the code used in a bytestream to represent the opcode; a
# one-character string
'code',
# If the opcode has an argument embedded in the byte string, an
# instance of ArgumentDescriptor specifying its type. Note that
# arg.reader(s) can be used to read and decode the argument from
# the bytestream s, and arg.doc documents the format of the raw
# argument bytes. If the opcode doesn't have an argument embedded
# in the bytestream, arg should be None.
'arg',
# what the stack looks like before this opcode runs; a list
'stack_before',
# what the stack looks like after this opcode runs; a list
'stack_after',
# the protocol number in which this opcode was introduced; an int
'proto',
# human-readable docs for this opcode; a string
'doc',
)
def __init__(self, name, code, arg,
stack_before, stack_after, proto, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(code, str)
assert len(code) == 1
self.code = code
assert arg is None or isinstance(arg, ArgumentDescriptor)
self.arg = arg
assert isinstance(stack_before, list)
for x in stack_before:
assert isinstance(x, StackObject)
self.stack_before = stack_before
assert isinstance(stack_after, list)
for x in stack_after:
assert isinstance(x, StackObject)
self.stack_after = stack_after
assert isinstance(proto, (int, long)) and 0 <= proto <= 2
self.proto = proto
assert isinstance(doc, str)
self.doc = doc
I = OpcodeInfo
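# Illustrative sketch, not part of the original module: the opcodes list
# defined below is commonly flipped into a lookup table keyed by the
# one-byte code, which is how a disassembler finds the OpcodeInfo for each
# byte it reads.
def _demo_code2op():
    code2op = {}
    for op in opcodes:
        code2op[op.code] = op
    return code2op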
opcodes = [
# Ways to spell integers.
I(name='INT',
code='I',
arg=decimalnl_short,
stack_before=[],
stack_after=[pyinteger_or_bool],
proto=0,
doc="""Push an integer or bool.
The argument is a newline-terminated decimal literal string.
The intent may have been that this always fit in a short Python int,
but INT can be generated in pickles written on a 64-bit box that
require a Python long on a 32-bit box. The difference between this
and LONG then is that INT skips a trailing 'L', and produces a short
int whenever possible.
Another difference stems from the fact that, when bool was introduced as a
distinct type in 2.3, builtin names True and False were also added to
2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
Leading zeroes are never produced for a genuine integer. The 2.3
(and later) unpicklers special-case these and return bool instead;
earlier unpicklers ignore the leading "0" and return the int.
"""),
I(name='BININT',
code='J',
arg=int4,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a four-byte signed integer.
This handles the full range of Python (short) integers on a 32-bit
box, directly as binary bytes (1 for the opcode and 4 for the integer).
If the integer is non-negative and fits in 1 or 2 bytes, pickling via
BININT1 or BININT2 saves space.
"""),
I(name='BININT1',
code='K',
arg=uint1,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a one-byte unsigned integer.
This is a space optimization for pickling very small non-negative ints,
in range(256).
"""),
I(name='BININT2',
code='M',
arg=uint2,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a two-byte unsigned integer.
This is a space optimization for pickling small positive ints, in
range(256, 2**16). Integers in range(256) can also be pickled via
BININT2, but BININT1 instead saves a byte.
"""),
I(name='LONG',
code='L',
arg=decimalnl_long,
stack_before=[],
stack_after=[pylong],
proto=0,
doc="""Push a long integer.
The same as INT, except that the literal ends with 'L', and always
unpickles to a Python long. There doesn't seem to be a real purpose to
the trailing 'L'.
Note that LONG takes time quadratic in the number of digits when
unpickling (this is simply due to the nature of decimal->binary
conversion). Proto 2 added linear-time (in C; still quadratic-time
in Python) LONG1 and LONG4 opcodes.
"""),
I(name="LONG1",
code='\x8a',
arg=long1,
stack_before=[],
stack_after=[pylong],
proto=2,
doc="""Long integer using one-byte length.
A more efficient encoding of a Python long; the long1 encoding
says it all."""),
I(name="LONG4",
code='\x8b',
arg=long4,
stack_before=[],
stack_after=[pylong],
proto=2,
doc="""Long integer using found-byte length.
A more efficient encoding of a Python long; the long4 encoding
says it all."""),
# Ways to spell strings (8-bit, not Unicode).
I(name='STRING',
code='S',
arg=stringnl,
stack_before=[],
stack_after=[pystring],
proto=0,
doc="""Push a Python string object.
The argument is a repr-style string, with bracketing quote characters,
and perhaps embedded escapes. The argument extends until the next
newline character.
"""),
I(name='BINSTRING',
code='T',
arg=string4,
stack_before=[],
stack_after=[pystring],
proto=1,
doc="""Push a Python string object.
There are two arguments: the first is a 4-byte little-endian signed int
giving the number of bytes in the string, and the second is that many
bytes, which are taken literally as the string content.
"""),
I(name='SHORT_BINSTRING',
code='U',
arg=string1,
stack_before=[],
stack_after=[pystring],
proto=1,
doc="""Push a Python string object.
There are two arguments: the first is a 1-byte unsigned int giving
the number of bytes in the string, and the second is that many bytes,
which are taken literally as the string content.
"""),
# Ways to spell None.
I(name='NONE',
code='N',
arg=None,
stack_before=[],
stack_after=[pynone],
proto=0,
doc="Push None on the stack."),
# Ways to spell bools, starting with proto 2. See INT for how this was
# done before proto 2.
I(name='NEWTRUE',
code='\x88',
arg=None,
stack_before=[],
stack_after=[pybool],
proto=2,
doc="Push True onto the stack."),
I(name='NEWFALSE',
code='\x89',
arg=None,
stack_before=[],
stack_after=[pybool],
proto=2,
doc="Push False onto the stack."),
# Ways to spell Unicode strings.
I(name='UNICODE',
code='V',
arg=unicodestringnl,
stack_before=[],
stack_after=[pyunicode],
proto=0, # this may be pure-text, but it's a later addition
doc="""Push a Python Unicode string object.
The argument is a raw-unicode-escape encoding of a Unicode string,
and so may contain embedded escape sequences. The argument extends
until the next newline character.
"""),
I(name='BINUNICODE',
code='X',
arg=unicodestring4,
stack_before=[],
stack_after=[pyunicode],
proto=1,
doc="""Push a Python Unicode string object.
There are two arguments: the first is a 4-byte little-endian signed int
giving the number of bytes in the string. The second is that many
bytes, and is the UTF-8 encoding of the Unicode string.
"""),
# Ways to spell floats.
I(name='FLOAT',
code='F',
arg=floatnl,
stack_before=[],
stack_after=[pyfloat],
proto=0,
doc="""Newline-terminated decimal float literal.
The argument is repr(a_float), and in general requires 17 significant
digits for roundtrip conversion to be an identity (this is so for
IEEE-754 double precision values, which is what Python float maps to
on most boxes).
In general, FLOAT cannot be used to transport infinities, NaNs, or
minus zero across boxes (or even on a single box, if the platform C
library can't read the strings it produces for such things -- Windows
is like that), but may do less damage than BINFLOAT on boxes with
greater precision or dynamic range than IEEE-754 double.
"""),
I(name='BINFLOAT',
code='G',
arg=float8,
stack_before=[],
stack_after=[pyfloat],
proto=1,
doc="""Float stored in binary form, with 8 bytes of data.
This generally requires less than half the space of FLOAT encoding.
In general, BINFLOAT cannot be used to transport infinities, NaNs, or
minus zero, raises an exception if the exponent exceeds the range of
an IEEE-754 double, and retains no more than 53 bits of precision (if
there are more than that, "add a half and chop" rounding is used to
cut it back to 53 significant bits).
"""),
# Ways to build lists.
I(name='EMPTY_LIST',
code=']',
arg=None,
stack_before=[],
stack_after=[pylist],
proto=1,
doc="Push an empty list."),
I(name='APPEND',
code='a',
arg=None,
stack_before=[pylist, anyobject],
stack_after=[pylist],
proto=0,
doc="""Append an object to a list.
Stack before: ... pylist anyobject
Stack after: ... pylist+[anyobject]
although pylist is really extended in-place.
"""),
I(name='APPENDS',
code='e',
arg=None,
stack_before=[pylist, markobject, stackslice],
stack_after=[pylist],
proto=1,
doc="""Extend a list by a slice of stack objects.
Stack before: ... pylist markobject stackslice
Stack after: ... pylist+stackslice
although pylist is really extended in-place.
"""),
I(name='LIST',
code='l',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pylist],
proto=0,
doc="""Build a list out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python list, which single list object replaces all of the
stack from the topmost markobject onward. For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... [1, 2, 3, 'abc']
"""),
# Ways to build tuples.
I(name='EMPTY_TUPLE',
code=')',
arg=None,
stack_before=[],
stack_after=[pytuple],
proto=1,
doc="Push an empty tuple."),
I(name='TUPLE',
code='t',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pytuple],
proto=0,
doc="""Build a tuple out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python tuple, which single tuple object replaces all of the
stack from the topmost markobject onward. For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... (1, 2, 3, 'abc')
"""),
I(name='TUPLE1',
code='\x85',
arg=None,
stack_before=[anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a one-tuple out of the topmost item on the stack.
This code pops one value off the stack and pushes a tuple of
length 1 whose one item is that value back onto it. In other
words:
stack[-1] = tuple(stack[-1:])
"""),
I(name='TUPLE2',
code='\x86',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a two-tuple out of the top two items on the stack.
This code pops two values off the stack and pushes a tuple of
length 2 whose items are those values back onto it. In other
words:
stack[-2:] = [tuple(stack[-2:])]
"""),
I(name='TUPLE3',
code='\x87',
arg=None,
stack_before=[anyobject, anyobject, anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a three-tuple out of the top three items on the stack.
This code pops three values off the stack and pushes a tuple of
length 3 whose items are those values back onto it. In other
words:
stack[-3:] = [tuple(stack[-3:])]
"""),
# Ways to build dicts.
I(name='EMPTY_DICT',
code='}',
arg=None,
stack_before=[],
stack_after=[pydict],
proto=1,
doc="Push an empty dict."),
I(name='DICT',
code='d',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pydict],
proto=0,
doc="""Build a dict out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python dict, which single dict object replaces all of the
stack from the topmost markobject onward. The stack slice alternates
key, value, key, value, .... For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... {1: 2, 3: 'abc'}
"""),
I(name='SETITEM',
code='s',
arg=None,
stack_before=[pydict, anyobject, anyobject],
stack_after=[pydict],
proto=0,
doc="""Add a key+value pair to an existing dict.
Stack before: ... pydict key value
Stack after: ... pydict
where pydict has been modified via pydict[key] = value.
"""),
I(name='SETITEMS',
code='u',
arg=None,
stack_before=[pydict, markobject, stackslice],
stack_after=[pydict],
proto=1,
doc="""Add an arbitrary number of key+value pairs to an existing dict.
The slice of the stack following the topmost markobject is taken as
an alternating sequence of keys and values, added to the dict
immediately under the topmost markobject. Everything at and after the
topmost markobject is popped, leaving the mutated dict at the top
of the stack.
Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
Stack after: ... pydict
where pydict has been modified via pydict[key_i] = value_i for i in
1, 2, ..., n, and in that order.
"""),
# Stack manipulation.
I(name='POP',
code='0',
arg=None,
stack_before=[anyobject],
stack_after=[],
proto=0,
doc="Discard the top stack item, shrinking the stack by one item."),
I(name='DUP',
code='2',
arg=None,
stack_before=[anyobject],
stack_after=[anyobject, anyobject],
proto=0,
doc="Push the top stack item onto the stack again, duplicating it."),
I(name='MARK',
code='(',
arg=None,
stack_before=[],
stack_after=[markobject],
proto=0,
doc="""Push markobject onto the stack.
markobject is a unique object, used by other opcodes to identify a
region of the stack containing a variable number of objects for them
to work on. See markobject.doc for more detail.
"""),
I(name='POP_MARK',
code='1',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[],
proto=1,
doc="""Pop all the stack objects at and above the topmost markobject.
When an opcode using a variable number of stack objects is done,
POP_MARK is used to remove those objects, and to remove the markobject
that delimited their starting position on the stack.
"""),
# Memo manipulation. There are really only two operations (get and put),
# each in all-text, "short binary", and "long binary" flavors.
I(name='GET',
code='g',
arg=decimalnl_short,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the newline-terminated
decimal string following. BINGET and LONG_BINGET are space-optimized
versions.
"""),
I(name='BINGET',
code='h',
arg=uint1,
stack_before=[],
stack_after=[anyobject],
proto=1,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the 1-byte unsigned
integer following.
"""),
I(name='LONG_BINGET',
code='j',
arg=int4,
stack_before=[],
stack_after=[anyobject],
proto=1,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the 4-byte signed
little-endian integer following.
"""),
I(name='PUT',
code='p',
arg=decimalnl_short,
stack_before=[],
stack_after=[],
proto=0,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the newline-
terminated decimal string following. BINPUT and LONG_BINPUT are
space-optimized versions.
"""),
I(name='BINPUT',
code='q',
arg=uint1,
stack_before=[],
stack_after=[],
proto=1,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the 1-byte
unsigned integer following.
"""),
I(name='LONG_BINPUT',
code='r',
arg=int4,
stack_before=[],
stack_after=[],
proto=1,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the 4-byte
signed little-endian integer following.
"""),
# Access the extension registry (predefined objects). Akin to the GET
# family.
I(name='EXT1',
code='\x82',
arg=uint1,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
This code and the similar EXT2 and EXT4 allow using a registry
of popular objects that are pickled by name, typically classes.
It is envisioned that through a global negotiation and
registration process, third parties can set up a mapping between
ints and object names.
In order to guarantee pickle interchangeability, the extension
code registry ought to be global, although a range of codes may
be reserved for private use.
EXT1 has a 1-byte integer argument. This is used to index into the
extension registry, and the object at that index is pushed on the stack.
"""),
I(name='EXT2',
code='\x83',
arg=uint2,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
See EXT1. EXT2 has a two-byte integer argument.
"""),
I(name='EXT4',
code='\x84',
arg=int4,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
See EXT1. EXT4 has a four-byte integer argument.
"""),
# Push a class object, or module function, on the stack, via its module
# and name.
I(name='GLOBAL',
code='c',
arg=stringnl_noescape_pair,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Push a global object (module.attr) on the stack.
Two newline-terminated strings follow the GLOBAL opcode. The first is
taken as a module name, and the second as a class name. The class
object module.class is pushed on the stack. More accurately, the
object returned by self.find_class(module, class) is pushed on the
stack, so unpickling subclasses can override this form of lookup.
"""),
# Ways to build objects of classes pickle doesn't know about directly
# (user-defined classes). I despair of documenting this accurately
# and comprehensibly -- you really have to read the pickle code to
# find all the special cases.
I(name='REDUCE',
code='R',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=0,
doc="""Push an object built from a callable and an argument tuple.
The opcode is named after the __reduce__() method.
Stack before: ... callable pytuple
Stack after: ... callable(*pytuple)
The callable and the argument tuple are the first two items returned
by a __reduce__ method. Applying the callable to the argtuple is
supposed to reproduce the original object, or at least get it started.
If the __reduce__ method returns a 3-tuple, the last component is an
argument to be passed to the object's __setstate__, and then the REDUCE
opcode is followed by code to create setstate's argument, and then a
BUILD opcode to apply __setstate__ to that argument.
If type(callable) is not ClassType, REDUCE complains unless the
callable has been registered with the copy_reg module's
safe_constructors dict, or the callable has a magic
'__safe_for_unpickling__' attribute with a true value. I'm not sure
why it does this, but I've sure seen this complaint often enough when
I didn't want to <wink>.
"""),
I(name='BUILD',
code='b',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=0,
doc="""Finish building an object, via __setstate__ or dict update.
Stack before: ... anyobject argument
Stack after: ... anyobject
where anyobject may have been mutated, as follows:
If the object has a __setstate__ method,
anyobject.__setstate__(argument)
is called.
Else the argument must be a dict, the object must have a __dict__, and
the object is updated via
anyobject.__dict__.update(argument)
This may raise RuntimeError in restricted execution mode (which
disallows access to __dict__ directly); in that case, the object
is updated instead via
for k, v in argument.items():
anyobject[k] = v
"""),
I(name='INST',
code='i',
arg=stringnl_noescape_pair,
stack_before=[markobject, stackslice],
stack_after=[anyobject],
proto=0,
doc="""Build a class instance.
This is the protocol 0 version of protocol 1's OBJ opcode.
INST is followed by two newline-terminated strings, giving a
module and class name, just as for the GLOBAL opcode (and see
GLOBAL for more details about that). self.find_class(module, name)
is used to get a class object.
In addition, all the objects on the stack following the topmost
markobject are gathered into a tuple and popped (along with the
topmost markobject), just as for the TUPLE opcode.
Now it gets complicated. If all of these are true:
+ The argtuple is empty (markobject was at the top of the stack
at the start).
+ It's an old-style class object (the type of the class object is
ClassType).
+ The class object does not have a __getinitargs__ attribute.
then we want to create an old-style class instance without invoking
its __init__() method (pickle has waffled on this over the years; not
calling __init__() is current wisdom). In this case, an instance of
an old-style dummy class is created, and then we try to rebind its
__class__ attribute to the desired class object. If this succeeds,
the new instance object is pushed on the stack, and we're done. In
restricted execution mode it can fail (assignment to __class__ is
disallowed), and I'm not really sure what happens then -- it looks
like the code ends up calling the class object's __init__ anyway,
via falling into the next case.
Else (the argtuple is not empty, it's not an old-style class object,
or the class object does have a __getinitargs__ attribute), the code
first insists that the class object have a __safe_for_unpickling__
attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
it doesn't matter whether this attribute has a true or false value, it
only matters whether it exists (XXX this is a bug; cPickle
requires the attribute to be true). If __safe_for_unpickling__
doesn't exist, UnpicklingError is raised.
Else (the class object does have a __safe_for_unpickling__ attr),
the class object obtained from INST's arguments is applied to the
argtuple obtained from the stack, and the resulting instance object
is pushed on the stack.
NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
"""),
I(name='OBJ',
code='o',
arg=None,
stack_before=[markobject, anyobject, stackslice],
stack_after=[anyobject],
proto=1,
doc="""Build a class instance.
This is the protocol 1 version of protocol 0's INST opcode, and is
very much like it. The major difference is that the class object
is taken off the stack, allowing it to be retrieved from the memo
repeatedly if several instances of the same class are created. This
can be much more efficient (in both time and space) than repeatedly
embedding the module and class names in INST opcodes.
Unlike INST, OBJ takes no arguments from the opcode stream. Instead
the class object is taken off the stack, immediately above the
topmost markobject:
Stack before: ... markobject classobject stackslice
Stack after: ... new_instance_object
As for INST, the remainder of the stack above the markobject is
gathered into an argument tuple, and then the logic seems identical,
except that no __safe_for_unpickling__ check is done (XXX this is
a bug; cPickle does test __safe_for_unpickling__). See INST for
the gory details.
NOTE: In Python 2.3, INST and OBJ are identical except for how they
get the class object. That was always the intent; the implementations
had diverged for accidental reasons.
"""),
I(name='NEWOBJ',
code='\x81',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=2,
doc="""Build an object instance.
The stack before should be thought of as containing a class
object followed by an argument tuple (the tuple being the stack
top). Call these cls and args. They are popped off the stack,
and the value returned by cls.__new__(cls, *args) is pushed back
onto the stack.
"""),
# Machine control.
I(name='PROTO',
code='\x80',
arg=uint1,
stack_before=[],
stack_after=[],
proto=2,
doc="""Protocol version indicator.
For protocol 2 and above, a pickle must start with this opcode.
The argument is the protocol version, an int in range(2, 256).
"""),
I(name='STOP',
code='.',
arg=None,
stack_before=[anyobject],
stack_after=[],
proto=0,
doc="""Stop the unpickling machine.
Every pickle ends with this opcode. The object at the top of the stack
is popped, and that's the result of unpickling. The stack should be
empty then.
"""),
# Ways to deal with persistent IDs.
I(name='PERSID',
code='P',
arg=stringnl_noescape,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Push an object identified by a persistent ID.
The pickle module doesn't define what a persistent ID means. PERSID's
argument is a newline-terminated str-style (no embedded escapes, no
bracketing quote characters) string, which *is* "the persistent ID".
The unpickler passes this string to self.persistent_load(). Whatever
object that returns is pushed on the stack. There is no implementation
of persistent_load() in Python's unpickler: it must be supplied by an
unpickler subclass.
"""),
I(name='BINPERSID',
code='Q',
arg=None,
stack_before=[anyobject],
stack_after=[anyobject],
proto=1,
doc="""Push an object identified by a persistent ID.
Like PERSID, except the persistent ID is popped off the stack (instead
of being a string embedded in the opcode bytestream). The persistent
ID is passed to self.persistent_load(), and whatever object that
returns is pushed on the stack. See PERSID for more detail.
"""),
]
del I
# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}
for i, d in enumerate(opcodes):
if d.name in name2i:
raise ValueError("repeated name %r at indices %d and %d" %
(d.name, name2i[d.name], i))
if d.code in code2i:
raise ValueError("repeated code %r at indices %d and %d" %
(d.code, code2i[d.code], i))
name2i[d.name] = i
code2i[d.code] = i
del name2i, code2i, i, d
##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.
code2op = {}
for d in opcodes:
code2op[d.code] = d
del d
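# Example sketch (hedged): code2op maps each one-character opcode to its
# OpcodeInfo record, so symbolic lookups are trivial.
def _code2op_example():
    assert code2op['.'].name == 'STOP'
    assert code2op['R'].name == 'REDUCE'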
def assure_pickle_consistency(verbose=False):
import pickle, re
copy = code2op.copy()
for name in pickle.__all__:
if not re.match("[A-Z][A-Z0-9_]+$", name):
if verbose:
print "skipping %r: it doesn't look like an opcode name" % name
continue
picklecode = getattr(pickle, name)
if not isinstance(picklecode, str) or len(picklecode) != 1:
if verbose:
print ("skipping %r: value %r doesn't look like a pickle "
"code" % (name, picklecode))
continue
if picklecode in copy:
if verbose:
print "checking name %r w/ code %r for consistency" % (
name, picklecode)
d = copy[picklecode]
if d.name != name:
raise ValueError("for pickle code %r, pickle.py uses name %r "
"but we're using name %r" % (picklecode,
name,
d.name))
# Forget this one. Any left over in copy at the end are a problem
# of a different kind.
del copy[picklecode]
else:
raise ValueError("pickle.py appears to have a pickle opcode with "
"name %r and code %r, but we don't" %
(name, picklecode))
if copy:
msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
for code, d in copy.items():
msg.append(" name %r with code %r" % (d.name, code))
raise ValueError("\n".join(msg))
assure_pickle_consistency()
del assure_pickle_consistency
##############################################################################
# A pickle opcode generator.
def genops(pickle):
"""Generate all the opcodes in a pickle.
'pickle' is a file-like object, or string, containing the pickle.
Each opcode in the pickle is generated, from the current pickle position,
stopping after a STOP opcode is delivered. A triple is generated for
each opcode:
opcode, arg, pos
opcode is an OpcodeInfo record, describing the current opcode.
If the opcode has an argument embedded in the pickle, arg is its decoded
value, as a Python object. If the opcode doesn't have an argument, arg
is None.
If the pickle has a tell() method, pos was the value of pickle.tell()
before reading the current opcode. If the pickle is a string object,
it's wrapped in a StringIO object, and the latter's tell() result is
used. Else (the pickle doesn't have a tell(), and it's not obvious how
to query its current position) pos is None.
"""
import cStringIO as StringIO
if isinstance(pickle, str):
pickle = StringIO.StringIO(pickle)
if hasattr(pickle, "tell"):
getpos = pickle.tell
else:
getpos = lambda: None
while True:
pos = getpos()
code = pickle.read(1)
opcode = code2op.get(code)
if opcode is None:
if code == "":
raise ValueError("pickle exhausted before seeing STOP")
else:
raise ValueError("at position %s, opcode %r unknown" % (
pos is None and "<unknown>" or pos,
code))
if opcode.arg is None:
arg = None
else:
arg = opcode.arg.reader(pickle)
yield opcode, arg, pos
if code == '.':
assert opcode.name == 'STOP'
break
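# Example sketch (hedged): walk the opcodes of a small protocol-0 pickle,
# printing position, opcode name, and decoded argument.
def _genops_example():
    import pickle
    for opcode, arg, pos in genops(pickle.dumps([1, 2], 0)):
        print "%4d %-12s %r" % (pos, opcode.name, arg)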
##############################################################################
# A pickle optimizer.
def optimize(p):
'Optimize a pickle string by removing unused PUT opcodes'
gets = set() # set of args used by a GET opcode
puts = [] # (arg, startpos, stoppos) for the PUT opcodes
prevpos = None # set to pos if previous opcode was a PUT
for opcode, arg, pos in genops(p):
if prevpos is not None:
puts.append((prevarg, prevpos, pos))
prevpos = None
if 'PUT' in opcode.name:
prevarg, prevpos = arg, pos
elif 'GET' in opcode.name:
gets.add(arg)
    # Copy the pickle string except for PUT opcodes without a corresponding GET
s = []
i = 0
for arg, start, stop in puts:
j = stop if (arg in gets) else start
s.append(p[i:j])
i = stop
s.append(p[i:])
return ''.join(s)
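# Example sketch (hedged): a protocol-1 pickle of a small list carries PUT
# opcodes whose memo slots are never read back; optimize() strips them
# while leaving the unpickled result unchanged.
def _optimize_example():
    import pickle
    p = pickle.dumps([1, 2, 3], 1)
    shorter = optimize(p)
    assert pickle.loads(shorter) == [1, 2, 3]
    assert len(shorter) <= len(p)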
##############################################################################
# A symbolic pickle disassembler.
def dis(pickle, out=None, memo=None, indentlevel=4):
"""Produce a symbolic disassembly of a pickle.
'pickle' is a file-like object, or string, containing at least one
pickle. The pickle is disassembled from the current position, through
the first STOP opcode encountered.
Optional arg 'out' is a file-like object to which the disassembly is
printed. It defaults to sys.stdout.
Optional arg 'memo' is a Python dict, used as the pickle's memo. It
may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
Passing the same memo object to another dis() call then allows disassembly
to proceed across multiple pickles that were all created by the same
pickler with the same memo. Ordinarily you don't need to worry about this.
Optional arg indentlevel is the number of blanks by which to indent
a new MARK level. It defaults to 4.
In addition to printing the disassembly, some sanity checks are made:
+ All embedded opcode arguments "make sense".
+ Explicit and implicit pop operations have enough items on the stack.
+ When an opcode implicitly refers to a markobject, a markobject is
actually on the stack.
+ A memo entry isn't referenced before it's defined.
+ The markobject isn't stored in the memo.
+ A memo entry isn't redefined.
"""
# Most of the hair here is for sanity checks, but most of it is needed
# anyway to detect when a protocol 0 POP takes a MARK off the stack
# (which in turn is needed to indent MARK blocks correctly).
stack = [] # crude emulation of unpickler stack
if memo is None:
memo = {} # crude emulation of unpickler memo
maxproto = -1 # max protocol number seen
markstack = [] # bytecode positions of MARK opcodes
indentchunk = ' ' * indentlevel
errormsg = None
for opcode, arg, pos in genops(pickle):
if pos is not None:
print >> out, "%5d:" % pos,
line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
indentchunk * len(markstack),
opcode.name)
maxproto = max(maxproto, opcode.proto)
before = opcode.stack_before # don't mutate
after = opcode.stack_after # don't mutate
numtopop = len(before)
# See whether a MARK should be popped.
markmsg = None
if markobject in before or (opcode.name == "POP" and
stack and
stack[-1] is markobject):
assert markobject not in after
if __debug__:
if markobject in before:
assert before[-1] is stackslice
if markstack:
markpos = markstack.pop()
if markpos is None:
markmsg = "(MARK at unknown opcode offset)"
else:
markmsg = "(MARK at %d)" % markpos
# Pop everything at and after the topmost markobject.
while stack[-1] is not markobject:
stack.pop()
stack.pop()
# Stop later code from popping too much.
try:
numtopop = before.index(markobject)
except ValueError:
assert opcode.name == "POP"
numtopop = 0
else:
errormsg = markmsg = "no MARK exists on stack"
# Check for correct memo usage.
if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
assert arg is not None
if arg in memo:
errormsg = "memo key %r already defined" % arg
elif not stack:
errormsg = "stack is empty -- can't store into memo"
elif stack[-1] is markobject:
errormsg = "can't store markobject in the memo"
else:
memo[arg] = stack[-1]
elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
if arg in memo:
assert len(after) == 1
after = [memo[arg]] # for better stack emulation
else:
errormsg = "memo key %r has never been stored into" % arg
if arg is not None or markmsg:
# make a mild effort to align arguments
line += ' ' * (10 - len(opcode.name))
if arg is not None:
line += ' ' + repr(arg)
if markmsg:
line += ' ' + markmsg
print >> out, line
if errormsg:
# Note that we delayed complaining until the offending opcode
# was printed.
raise ValueError(errormsg)
# Emulate the stack effects.
if len(stack) < numtopop:
raise ValueError("tries to pop %d items from stack with "
"only %d items" % (numtopop, len(stack)))
if numtopop:
del stack[-numtopop:]
if markobject in after:
assert markobject not in before
markstack.append(pos)
stack.extend(after)
print >> out, "highest protocol among opcodes =", maxproto
if stack:
raise ValueError("stack not empty after STOP: %r" % stack)
# For use in the doctest, simply as an example of a class to pickle.
class _Example:
def __init__(self, value):
self.value = value
_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {'abc': u"def"}]
>>> pkl = pickle.dumps(x, 0)
>>> dis(pkl)
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: I INT 1
8: a APPEND
9: I INT 2
12: a APPEND
13: ( MARK
14: I INT 3
17: I INT 4
20: t TUPLE (MARK at 13)
21: p PUT 1
24: a APPEND
25: ( MARK
26: d DICT (MARK at 25)
27: p PUT 2
30: S STRING 'abc'
37: p PUT 3
40: V UNICODE u'def'
45: p PUT 4
48: s SETITEM
49: a APPEND
50: . STOP
highest protocol among opcodes = 0
Try again with a "binary" pickle.
>>> pkl = pickle.dumps(x, 1)
>>> dis(pkl)
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: K BININT1 1
6: K BININT1 2
8: ( MARK
9: K BININT1 3
11: K BININT1 4
13: t TUPLE (MARK at 8)
14: q BINPUT 1
16: } EMPTY_DICT
17: q BINPUT 2
19: U SHORT_BINSTRING 'abc'
24: q BINPUT 3
26: X BINUNICODE u'def'
34: q BINPUT 4
36: s SETITEM
37: e APPENDS (MARK at 3)
38: . STOP
highest protocol among opcodes = 1
Exercise the INST/OBJ/BUILD family.
>>> import pickletools
>>> dis(pickle.dumps(pickletools.dis, 0))
0: c GLOBAL 'pickletools dis'
17: p PUT 0
20: . STOP
highest protocol among opcodes = 0
>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: ( MARK
6: i INST 'pickletools _Example' (MARK at 5)
28: p PUT 1
31: ( MARK
32: d DICT (MARK at 31)
33: p PUT 2
36: S STRING 'value'
45: p PUT 3
48: I INT 42
52: s SETITEM
53: b BUILD
54: a APPEND
55: g GET 1
58: a APPEND
59: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(x, 1))
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: ( MARK
5: c GLOBAL 'pickletools _Example'
27: q BINPUT 1
29: o OBJ (MARK at 4)
30: q BINPUT 2
32: } EMPTY_DICT
33: q BINPUT 3
35: U SHORT_BINSTRING 'value'
42: q BINPUT 4
44: K BININT1 42
46: s SETITEM
47: b BUILD
48: h BINGET 2
50: e APPENDS (MARK at 3)
51: . STOP
highest protocol among opcodes = 1
Try "the canonical" recursive-object test.
>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: ( MARK
6: g GET 0
9: t TUPLE (MARK at 5)
10: p PUT 1
13: a APPEND
14: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(L, 1))
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: h BINGET 0
6: t TUPLE (MARK at 3)
7: q BINPUT 1
9: a APPEND
10: . STOP
highest protocol among opcodes = 1
Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.
>>> dis(pickle.dumps(T, 0))
0: ( MARK
1: ( MARK
2: l LIST (MARK at 1)
3: p PUT 0
6: ( MARK
7: g GET 0
10: t TUPLE (MARK at 6)
11: p PUT 1
14: a APPEND
15: 0 POP
16: 0 POP (MARK at 0)
17: g GET 1
20: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(T, 1))
0: ( MARK
1: ] EMPTY_LIST
2: q BINPUT 0
4: ( MARK
5: h BINGET 0
7: t TUPLE (MARK at 4)
8: q BINPUT 1
10: a APPEND
11: 1 POP_MARK (MARK at 0)
12: h BINGET 1
14: . STOP
highest protocol among opcodes = 1
Try protocol 2.
>>> dis(pickle.dumps(L, 2))
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: h BINGET 0
7: \x85 TUPLE1
8: q BINPUT 1
10: a APPEND
11: . STOP
highest protocol among opcodes = 2
>>> dis(pickle.dumps(T, 2))
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: h BINGET 0
7: \x85 TUPLE1
8: q BINPUT 1
10: a APPEND
11: 0 POP
12: h BINGET 1
14: . STOP
highest protocol among opcodes = 2
"""
_memo_test = r"""
>>> import pickle
>>> from StringIO import StringIO
>>> f = StringIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
>>> memo = {}
>>> dis(f, memo=memo)
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: ( MARK
6: K BININT1 1
8: K BININT1 2
10: K BININT1 3
12: e APPENDS (MARK at 5)
13: . STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
14: \x80 PROTO 2
16: h BINGET 0
18: . STOP
highest protocol among opcodes = 2
"""
__test__ = {'disassembler_test': _dis_test,
'disassembler_memo_test': _memo_test,
}
def _test():
import doctest
return doctest.testmod()
if __name__ == "__main__":
_test()
"""Conversion pipeline templates.
The problem:
------------
Suppose you have some data that you want to convert to another format,
such as from GIF image format to PPM image format. Maybe the
conversion involves several steps (e.g. piping it through compress or
uuencode). Some of the conversion steps may require that their input
is a disk file, others may be able to read standard input; similar for
their output. The input to the entire conversion may also be read
from a disk file or from an open file, and similar for its output.
The module lets you construct a pipeline template by sticking one or
more conversion steps together. It will take care of creating and
removing temporary files if they are necessary to hold intermediate
data. You can then use the template to do conversions from many
different sources to many different destinations. The temporary
file names used are different each time the template is used.
The templates are objects so you can create templates for many
different conversion steps and store them in a dictionary, for
instance.
Directions:
-----------
To create a template:
t = Template()
To add a conversion step to a template:
t.append(command, kind)
where kind is a string of two characters: the first is '-' if the
command reads its standard input or 'f' if it requires a file; the
second likewise for the output. The command must be valid /bin/sh
syntax. If input or output files are required, they are passed as
$IN and $OUT; otherwise, it must be possible to use the command in
a pipeline.
To add a conversion step at the beginning:
t.prepend(command, kind)
To convert a file to another file using a template:
sts = t.copy(infile, outfile)
If infile or outfile are the empty string, standard input is read or
standard output is written, respectively. The return value is the
exit status of the conversion pipeline.
To open a file for reading or writing through a conversion pipeline:
fp = t.open(file, mode)
where mode is 'r' to read the file, or 'w' to write it -- just like
for the built-in function open() or for os.popen().
To create a new template object initialized to a given one:
t2 = t.clone()
""" # '
import re
import os
import tempfile
import string
__all__ = ["Template"]
# Conversion step kinds
FILEIN_FILEOUT = 'ff' # Must read & write real files
STDIN_FILEOUT = '-f' # Must write a real file
FILEIN_STDOUT = 'f-' # Must read a real file
STDIN_STDOUT = '--' # Normal pipeline element
SOURCE = '.-' # Must be first, writes stdout
SINK = '-.' # Must be last, reads stdin
stepkinds = [FILEIN_FILEOUT, STDIN_FILEOUT, FILEIN_STDOUT, STDIN_STDOUT,
             SOURCE, SINK]
class Template:
"""Class representing a pipeline template."""
def __init__(self):
"""Template() returns a fresh pipeline template."""
self.debugging = 0
self.reset()
def __repr__(self):
"""t.__repr__() implements repr(t)."""
return '<Template instance, steps=%r>' % (self.steps,)
def reset(self):
"""t.reset() restores a pipeline template to its initial state."""
self.steps = []
def clone(self):
"""t.clone() returns a new pipeline template with identical
initial state as the current one."""
t = Template()
t.steps = self.steps[:]
t.debugging = self.debugging
return t
def debug(self, flag):
"""t.debug(flag) turns debugging on or off."""
self.debugging = flag
def append(self, cmd, kind):
"""t.append(cmd, kind) adds a new step at the end."""
if type(cmd) is not type(''):
raise TypeError, \
'Template.append: cmd must be a string'
if kind not in stepkinds:
raise ValueError, \
'Template.append: bad kind %r' % (kind,)
if kind == SOURCE:
raise ValueError, \
'Template.append: SOURCE can only be prepended'
if self.steps and self.steps[-1][1] == SINK:
raise ValueError, \
'Template.append: already ends with SINK'
if kind[0] == 'f' and not re.search(r'\$IN\b', cmd):
raise ValueError, \
'Template.append: missing $IN in cmd'
if kind[1] == 'f' and not re.search(r'\$OUT\b', cmd):
raise ValueError, \
'Template.append: missing $OUT in cmd'
self.steps.append((cmd, kind))
def prepend(self, cmd, kind):
"""t.prepend(cmd, kind) adds a new step at the front."""
if type(cmd) is not type(''):
raise TypeError, \
'Template.prepend: cmd must be a string'
if kind not in stepkinds:
raise ValueError, \
'Template.prepend: bad kind %r' % (kind,)
if kind == SINK:
raise ValueError, \
'Template.prepend: SINK can only be appended'
if self.steps and self.steps[0][1] == SOURCE:
raise ValueError, \
'Template.prepend: already begins with SOURCE'
if kind[0] == 'f' and not re.search(r'\$IN\b', cmd):
raise ValueError, \
'Template.prepend: missing $IN in cmd'
if kind[1] == 'f' and not re.search(r'\$OUT\b', cmd):
raise ValueError, \
'Template.prepend: missing $OUT in cmd'
self.steps.insert(0, (cmd, kind))
def open(self, file, rw):
"""t.open(file, rw) returns a pipe or file object open for
reading or writing; the file is the other end of the pipeline."""
if rw == 'r':
return self.open_r(file)
if rw == 'w':
return self.open_w(file)
raise ValueError, \
'Template.open: rw must be \'r\' or \'w\', not %r' % (rw,)
def open_r(self, file):
"""t.open_r(file) and t.open_w(file) implement
t.open(file, 'r') and t.open(file, 'w') respectively."""
if not self.steps:
return open(file, 'r')
if self.steps[-1][1] == SINK:
raise ValueError, \
'Template.open_r: pipeline ends with SINK'
cmd = self.makepipeline(file, '')
return os.popen(cmd, 'r')
def open_w(self, file):
if not self.steps:
return open(file, 'w')
if self.steps[0][1] == SOURCE:
raise ValueError, \
'Template.open_w: pipeline begins with SOURCE'
cmd = self.makepipeline('', file)
return os.popen(cmd, 'w')
def copy(self, infile, outfile):
return os.system(self.makepipeline(infile, outfile))
def makepipeline(self, infile, outfile):
cmd = makepipeline(infile, self.steps, outfile)
if self.debugging:
print cmd
cmd = 'set -x; ' + cmd
return cmd
def makepipeline(infile, steps, outfile):
# Build a list with for each command:
# [input filename or '', command string, kind, output filename or '']
list = []
for cmd, kind in steps:
list.append(['', cmd, kind, ''])
#
# Make sure there is at least one step
#
if not list:
list.append(['', 'cat', '--', ''])
#
# Take care of the input and output ends
#
[cmd, kind] = list[0][1:3]
if kind[0] == 'f' and not infile:
list.insert(0, ['', 'cat', '--', ''])
list[0][0] = infile
#
[cmd, kind] = list[-1][1:3]
if kind[1] == 'f' and not outfile:
list.append(['', 'cat', '--', ''])
list[-1][-1] = outfile
#
# Invent temporary files to connect stages that need files
#
garbage = []
for i in range(1, len(list)):
lkind = list[i-1][2]
rkind = list[i][2]
if lkind[1] == 'f' or rkind[0] == 'f':
(fd, temp) = tempfile.mkstemp()
os.close(fd)
garbage.append(temp)
list[i-1][-1] = list[i][0] = temp
#
for item in list:
[inf, cmd, kind, outf] = item
if kind[1] == 'f':
cmd = 'OUT=' + quote(outf) + '; ' + cmd
if kind[0] == 'f':
cmd = 'IN=' + quote(inf) + '; ' + cmd
if kind[0] == '-' and inf:
cmd = cmd + ' <' + quote(inf)
if kind[1] == '-' and outf:
cmd = cmd + ' >' + quote(outf)
item[1] = cmd
#
cmdlist = list[0][1]
for item in list[1:]:
[cmd, kind] = item[1:3]
if item[0] == '':
if 'f' in kind:
cmd = '{ ' + cmd + '; }'
cmdlist = cmdlist + ' |\n' + cmd
else:
cmdlist = cmdlist + '\n' + cmd
#
if garbage:
rmcmd = 'rm -f'
for file in garbage:
rmcmd = rmcmd + ' ' + quote(file)
trapcmd = 'trap ' + quote(rmcmd + '; exit') + ' 1 2 3 13 14 15'
cmdlist = trapcmd + '\n' + cmdlist + '\n' + rmcmd
#
return cmdlist
# Reliably quote a string as a single argument for /bin/sh
# Safe unquoted
_safechars = frozenset(string.ascii_letters + string.digits + '@%_-+=:,./')
def quote(file):
"""Return a shell-escaped version of the file string."""
for c in file:
if c not in _safechars:
break
else:
if not file:
return "''"
return file
# use single quotes, and put single quotes into double quotes
# the string $'b is then quoted as '$'"'"'b'
return "'" + file.replace("'", "'\"'\"'") + "'"
"""Utilities to support packages."""
import os
import sys
import imp
import os.path
from types import ModuleType
__all__ = [
'get_importer', 'iter_importers', 'get_loader', 'find_loader',
'walk_packages', 'iter_modules', 'get_data',
'ImpImporter', 'ImpLoader', 'read_code', 'extend_path',
]
def read_code(stream):
# This helper is needed in order for the PEP 302 emulation to
# correctly handle compiled files
import marshal
magic = stream.read(4)
if magic != imp.get_magic():
return None
stream.read(4) # Skip timestamp
return marshal.load(stream)
def simplegeneric(func):
"""Make a trivial single-dispatch generic function"""
registry = {}
def wrapper(*args, **kw):
ob = args[0]
try:
cls = ob.__class__
except AttributeError:
cls = type(ob)
try:
mro = cls.__mro__
except AttributeError:
try:
class cls(cls, object):
pass
mro = cls.__mro__[1:]
except TypeError:
mro = object, # must be an ExtensionClass or some such :(
for t in mro:
if t in registry:
return registry[t](*args, **kw)
else:
return func(*args, **kw)
try:
wrapper.__name__ = func.__name__
except (TypeError, AttributeError):
pass # Python 2.3 doesn't allow functions to be renamed
def register(typ, func=None):
if func is None:
return lambda f: register(typ, f)
registry[typ] = func
return func
wrapper.__dict__ = func.__dict__
wrapper.__doc__ = func.__doc__
wrapper.register = register
return wrapper
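# Example sketch (hedged; _describe and _describe_list are hypothetical
# names): the register() hook attached by simplegeneric() dispatches on the
# type of the first argument, falling back to the default implementation.
def _describe(obj):
    return 'object'
_describe = simplegeneric(_describe)
def _describe_list(obj):
    return 'list of %d items' % len(obj)
_describe.register(list, _describe_list)
# _describe([1, 2]) == 'list of 2 items'; _describe(0) == 'object'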
def walk_packages(path=None, prefix='', onerror=None):
"""Yields (module_loader, name, ispkg) for all modules recursively
on path, or, if path is None, all accessible modules.
'path' should be either None or a list of paths to look for
modules in.
'prefix' is a string to output on the front of every module name
on output.
Note that this function must import all *packages* (NOT all
modules!) on the given path, in order to access the __path__
attribute to find submodules.
'onerror' is a function which gets called with one argument (the
name of the package which was being imported) if any exception
occurs while trying to import a package. If no onerror function is
supplied, ImportErrors are caught and ignored, while all other
exceptions are propagated, terminating the search.
Examples:
# list all modules python can access
walk_packages()
# list all submodules of ctypes
walk_packages(ctypes.__path__, ctypes.__name__+'.')
"""
def seen(p, m={}):
if p in m:
return True
m[p] = True
for importer, name, ispkg in iter_modules(path, prefix):
yield importer, name, ispkg
if ispkg:
try:
__import__(name)
except ImportError:
if onerror is not None:
onerror(name)
except Exception:
if onerror is not None:
onerror(name)
else:
raise
else:
path = getattr(sys.modules[name], '__path__', None) or []
# don't traverse path items we've seen before
path = [p for p in path if not seen(p)]
for item in walk_packages(path, name+'.', onerror):
yield item
def iter_modules(path=None, prefix=''):
"""Yields (module_loader, name, ispkg) for all submodules on path,
or, if path is None, all top-level modules on sys.path.
'path' should be either None or a list of paths to look for
modules in.
'prefix' is a string to output on the front of every module name
on output.
"""
if path is None:
importers = iter_importers()
else:
importers = map(get_importer, path)
yielded = {}
for i in importers:
for name, ispkg in iter_importer_modules(i, prefix):
if name not in yielded:
yielded[name] = 1
yield i, name, ispkg
#@simplegeneric
def iter_importer_modules(importer, prefix=''):
if not hasattr(importer, 'iter_modules'):
return []
return importer.iter_modules(prefix)
iter_importer_modules = simplegeneric(iter_importer_modules)
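# Example sketch (hedged): list the top-level module names visible on
# sys.path without importing them.
def _toplevel_modules():
    return sorted(set(name for _, name, _ in iter_modules()))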
class ImpImporter:
"""PEP 302 Importer that wraps Python's "classic" import algorithm
ImpImporter(dirname) produces a PEP 302 importer that searches that
directory. ImpImporter(None) produces a PEP 302 importer that searches
the current sys.path, plus any modules that are frozen or built-in.
Note that ImpImporter does not currently support being used by placement
on sys.meta_path.
"""
def __init__(self, path=None):
self.path = path
def find_module(self, fullname, path=None):
# Note: we ignore 'path' argument since it is only used via meta_path
subname = fullname.split(".")[-1]
if subname != fullname and self.path is None:
return None
if self.path is None:
path = None
else:
path = [os.path.realpath(self.path)]
try:
file, filename, etc = imp.find_module(subname, path)
except ImportError:
return None
return ImpLoader(fullname, file, filename, etc)
def iter_modules(self, prefix=''):
if self.path is None or not os.path.isdir(self.path):
return
yielded = {}
import inspect
try:
filenames = os.listdir(self.path)
except OSError:
# ignore unreadable directories like import does
filenames = []
filenames.sort() # handle packages before same-named modules
for fn in filenames:
modname = inspect.getmodulename(fn)
if modname=='__init__' or modname in yielded:
continue
path = os.path.join(self.path, fn)
ispkg = False
if not modname and os.path.isdir(path) and '.' not in fn:
modname = fn
try:
dircontents = os.listdir(path)
except OSError:
# ignore unreadable directories like import does
dircontents = []
for fn in dircontents:
subname = inspect.getmodulename(fn)
if subname=='__init__':
ispkg = True
break
else:
continue # not a package
if modname and '.' not in modname:
yielded[modname] = 1
yield prefix + modname, ispkg
class ImpLoader:
"""PEP 302 Loader that wraps Python's "classic" import algorithm
"""
code = source = None
def __init__(self, fullname, file, filename, etc):
self.file = file
self.filename = filename
self.fullname = fullname
self.etc = etc
def load_module(self, fullname):
self._reopen()
try:
mod = imp.load_module(fullname, self.file, self.filename, self.etc)
finally:
if self.file:
self.file.close()
# Note: we don't set __loader__ because we want the module to look
# normal; i.e. this is just a wrapper for standard import machinery
return mod
def get_data(self, pathname):
with open(pathname, "rb") as file:
return file.read()
def _reopen(self):
if self.file and self.file.closed:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
self.file = open(self.filename, 'rU')
elif mod_type in (imp.PY_COMPILED, imp.C_EXTENSION):
self.file = open(self.filename, 'rb')
def _fix_name(self, fullname):
if fullname is None:
fullname = self.fullname
elif fullname != self.fullname:
raise ImportError("Loader for module %s cannot handle "
"module %s" % (self.fullname, fullname))
return fullname
def is_package(self, fullname):
fullname = self._fix_name(fullname)
return self.etc[2]==imp.PKG_DIRECTORY
def get_code(self, fullname=None):
fullname = self._fix_name(fullname)
if self.code is None:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
source = self.get_source(fullname)
self.code = compile(source, self.filename, 'exec')
elif mod_type==imp.PY_COMPILED:
self._reopen()
try:
self.code = read_code(self.file)
finally:
self.file.close()
elif mod_type==imp.PKG_DIRECTORY:
self.code = self._get_delegate().get_code()
return self.code
def get_source(self, fullname=None):
fullname = self._fix_name(fullname)
if self.source is None:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
self._reopen()
try:
self.source = self.file.read()
finally:
self.file.close()
elif mod_type==imp.PY_COMPILED:
if os.path.exists(self.filename[:-1]):
f = open(self.filename[:-1], 'rU')
self.source = f.read()
f.close()
elif mod_type==imp.PKG_DIRECTORY:
self.source = self._get_delegate().get_source()
return self.source
def _get_delegate(self):
return ImpImporter(self.filename).find_module('__init__')
def get_filename(self, fullname=None):
fullname = self._fix_name(fullname)
mod_type = self.etc[2]
if self.etc[2]==imp.PKG_DIRECTORY:
return self._get_delegate().get_filename()
elif self.etc[2] in (imp.PY_SOURCE, imp.PY_COMPILED, imp.C_EXTENSION):
return self.filename
return None
try:
import zipimport
from zipimport import zipimporter
def iter_zipimport_modules(importer, prefix=''):
dirlist = zipimport._zip_directory_cache[importer.archive].keys()
dirlist.sort()
_prefix = importer.prefix
plen = len(_prefix)
yielded = {}
import inspect
for fn in dirlist:
if not fn.startswith(_prefix):
continue
fn = fn[plen:].split(os.sep)
if len(fn)==2 and fn[1].startswith('__init__.py'):
if fn[0] not in yielded:
yielded[fn[0]] = 1
yield fn[0], True
if len(fn)!=1:
continue
modname = inspect.getmodulename(fn[0])
if modname=='__init__':
continue
if modname and '.' not in modname and modname not in yielded:
yielded[modname] = 1
yield prefix + modname, False
iter_importer_modules.register(zipimporter, iter_zipimport_modules)
except ImportError:
pass
def get_importer(path_item):
"""Retrieve a PEP 302 importer for the given path item
The returned importer is cached in sys.path_importer_cache
if it was newly created by a path hook.
If there is no importer, a wrapper around the basic import
machinery is returned. This wrapper is never inserted into
the importer cache (None is inserted instead).
The cache (or part of it) can be cleared manually if a
rescan of sys.path_hooks is necessary.
"""
try:
importer = sys.path_importer_cache[path_item]
except KeyError:
for path_hook in sys.path_hooks:
try:
importer = path_hook(path_item)
break
except ImportError:
pass
else:
importer = None
sys.path_importer_cache.setdefault(path_item, importer)
if importer is None:
try:
importer = ImpImporter(path_item)
except ImportError:
importer = None
return importer
def iter_importers(fullname=""):
"""Yield PEP 302 importers for the given module name
If fullname contains a '.', the importers will be for the package
containing fullname, otherwise they will be importers for sys.meta_path,
sys.path, and Python's "classic" import machinery, in that order. If
the named module is in a package, that package is imported as a side
effect of invoking this function.
Non PEP 302 mechanisms (e.g. the Windows registry) used by the
standard import machinery to find files in alternative locations
are partially supported, but are searched AFTER sys.path. Normally,
these locations are searched BEFORE sys.path, preventing sys.path
entries from shadowing them.
For this to cause a visible difference in behaviour, there must
be a module or package name that is accessible via both sys.path
and one of the non PEP 302 file system mechanisms. In this case,
the emulation will find the former version, while the builtin
import mechanism will find the latter.
Items of the following types can be affected by this discrepancy:
imp.C_EXTENSION, imp.PY_SOURCE, imp.PY_COMPILED, imp.PKG_DIRECTORY
"""
if fullname.startswith('.'):
raise ImportError("Relative module names not supported")
if '.' in fullname:
# Get the containing package's __path__
pkg = '.'.join(fullname.split('.')[:-1])
if pkg not in sys.modules:
__import__(pkg)
path = getattr(sys.modules[pkg], '__path__', None) or []
else:
for importer in sys.meta_path:
yield importer
path = sys.path
for item in path:
yield get_importer(item)
if '.' not in fullname:
yield ImpImporter()
def get_loader(module_or_name):
"""Get a PEP 302 "loader" object for module_or_name
If the module or package is accessible via the normal import
mechanism, a wrapper around the relevant part of that machinery
is returned. Returns None if the module cannot be found or imported.
If the named module is not already imported, its containing package
(if any) is imported, in order to establish the package __path__.
This function uses iter_importers(), and is thus subject to the same
limitations regarding platform-specific special import locations such
as the Windows registry.
"""
if module_or_name in sys.modules:
module_or_name = sys.modules[module_or_name]
if isinstance(module_or_name, ModuleType):
module = module_or_name
loader = getattr(module, '__loader__', None)
if loader is not None:
return loader
fullname = module.__name__
else:
fullname = module_or_name
return find_loader(fullname)
def find_loader(fullname):
"""Find a PEP 302 "loader" object for fullname
If fullname contains dots, path must be the containing package's __path__.
Returns None if the module cannot be found or imported. This function uses
iter_importers(), and is thus subject to the same limitations regarding
platform-specific special import locations such as the Windows registry.
"""
for importer in iter_importers(fullname):
loader = importer.find_module(fullname)
if loader is not None:
return loader
return None
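# Example sketch (hedged): fetch a loader for a stdlib module and pull its
# source text through the PEP 302-style API above.
def _loader_example():
    loader = get_loader('os')
    if loader is not None and hasattr(loader, 'get_source'):
        return loader.get_source('os').splitlines()[0]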
def extend_path(path, name):
"""Extend a package's path.
Intended use is to place the following code in a package's __init__.py:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
This will add to the package's __path__ all subdirectories of
directories on sys.path named after the package. This is useful
if one wants to distribute different parts of a single logical
package as multiple directories.
It also looks for *.pkg files beginning where * matches the name
argument. This feature is similar to *.pth files (see site.py),
except that it doesn't special-case lines starting with 'import'.
A *.pkg file is trusted at face value: apart from checking for
duplicates, all entries found in a *.pkg file are added to the
path, regardless of whether they exist on the filesystem. (This
is a feature.)
If the input path is not a list (as is the case for frozen
packages) it is returned unchanged. The input path is not
modified; an extended copy is returned. Items are only appended
to the copy at the end.
It is assumed that sys.path is a sequence. Items of sys.path that
are not (unicode or 8-bit) strings referring to existing
directories are ignored. Unicode items of sys.path that cause
errors when used as filenames may cause this function to raise an
exception (in line with os.path.isdir() behavior).
"""
if not isinstance(path, list):
# This could happen e.g. when this is called from inside a
# frozen package. Return the path unchanged in that case.
return path
pname = os.path.join(*name.split('.')) # Reconstitute as relative path
# Just in case os.extsep != '.'
sname = os.extsep.join(name.split('.'))
sname_pkg = sname + os.extsep + "pkg"
init_py = "__init__" + os.extsep + "py"
path = path[:] # Start with a copy of the existing path
for dir in sys.path:
if not isinstance(dir, basestring) or not os.path.isdir(dir):
continue
subdir = os.path.join(dir, pname)
# XXX This may still add duplicate entries to path on
# case-insensitive filesystems
initfile = os.path.join(subdir, init_py)
if subdir not in path and os.path.isfile(initfile):
path.append(subdir)
# XXX Is this the right thing for subpackages like zope.app?
# It looks for a file named "zope.app.pkg"
pkgfile = os.path.join(dir, sname_pkg)
if os.path.isfile(pkgfile):
try:
f = open(pkgfile)
except IOError, msg:
sys.stderr.write("Can't open %s: %s\n" %
(pkgfile, msg))
else:
for line in f:
line = line.rstrip('\n')
if not line or line.startswith('#'):
continue
path.append(line) # Don't check for existence!
f.close()
return path
def get_data(package, resource):
"""Get a resource from a package.
This is a wrapper round the PEP 302 loader get_data API. The package
argument should be the name of a package, in standard module format
(foo.bar). The resource argument should be in the form of a relative
filename, using '/' as the path separator. The parent directory name '..'
is not allowed, and nor is a rooted name (starting with a '/').
The function returns a binary string, which is the contents of the
specified resource.
For packages located in the filesystem, which have already been imported,
this is the rough equivalent of
d = os.path.dirname(sys.modules[package].__file__)
data = open(os.path.join(d, resource), 'rb').read()
If the package cannot be located or loaded, or it uses a PEP 302 loader
which does not support get_data(), then None is returned.
"""
loader = get_loader(package)
if loader is None or not hasattr(loader, 'get_data'):
return None
mod = sys.modules.get(package) or loader.load_module(package)
if mod is None or not hasattr(mod, '__file__'):
return None
# Modify the resource name to be compatible with the loader.get_data
# signature - an os.path format "filename" starting with the dirname of
# the package's __file__
parts = resource.split('/')
parts.insert(0, os.path.dirname(mod.__file__))
resource_name = os.path.join(*parts)
return loader.get_data(resource_name)
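# Example sketch (hedged; 'mypkg' and the resource path are hypothetical):
# read a data file shipped inside a package.
def _get_data_example():
    return get_data('mypkg', 'data/config.txt')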
#!/usr/bin/env python
""" This module tries to retrieve as much platform-identifying data as
possible. It makes this information available via function APIs.
If called from the command line, it prints the platform
information concatenated as single string to stdout. The output
format is usable as part of a filename.
"""
# This module is maintained by Marc-Andre Lemburg <[email protected]>.
# If you find problems, please submit bug reports/patches via the
# Python bug tracker (http://bugs.python.org) and assign them to "lemburg".
#
# Note: Please keep this module compatible with Python 1.5.2.
#
# Still needed:
# * more support for WinCE
# * support for MS-DOS (PythonDX ?)
# * support for Amiga and other still unsupported platforms running Python
# * support for additional Linux distributions
#
# Many thanks to all those who helped adding platform-specific
# checks (in no particular order):
#
# Charles G Waldman, David Arnold, Gordon McMillan, Ben Darnell,
# Jeff Bauer, Cliff Crawford, Ivan Van Laningham, Josef
# Betancourt, Randall Hopper, Karl Putland, John Farrell, Greg
# Andruk, Just van Rossum, Thomas Heller, Mark R. Levinson, Mark
# Hammond, Bill Tutt, Hans Nowak, Uwe Zessin (OpenVMS support),
# Colin Kong, Trent Mick, Guido van Rossum, Anthony Baxter, Steve
# Dower
#
# History:
#
# <see CVS and SVN checkin messages for history>
#
# 1.0.8 - changed Windows support to read version from kernel32.dll
# 1.0.7 - added DEV_NULL
# 1.0.6 - added linux_distribution()
# 1.0.5 - fixed Java support to allow running the module on Jython
# 1.0.4 - added IronPython support
# 1.0.3 - added normalization of Windows system name
# 1.0.2 - added more Windows support
# 1.0.1 - reformatted to make doc.py happy
# 1.0.0 - reformatted a bit and checked into Python CVS
# 0.8.0 - added sys.version parser and various new access
# APIs (python_version(), python_compiler(), etc.)
# 0.7.2 - fixed architecture() to use sizeof(pointer) where available
# 0.7.1 - added support for Caldera OpenLinux
# 0.7.0 - some fixes for WinCE; untabified the source file
# 0.6.2 - support for OpenVMS - requires version 1.5.2-V006 or higher and
# vms_lib.getsyi() configured
# 0.6.1 - added code to prevent 'uname -p' on platforms which are
# known not to support it
# 0.6.0 - fixed win32_ver() to hopefully work on Win95,98,NT and Win2k;
# did some cleanup of the interfaces - some APIs have changed
# 0.5.5 - fixed another typo in the MacOS code... should have
# used more coffee today ;-)
# 0.5.4 - fixed a few typos in the MacOS code
# 0.5.3 - added experimental MacOS support; added better popen()
# workarounds in _syscmd_ver() -- still not 100% elegant
# though
# 0.5.2 - fixed uname() to return '' instead of 'unknown' in all
# return values (the system uname command tends to return
# 'unknown' instead of just leaving the field empty)
# 0.5.1 - included code for slackware dist; added exception handlers
# to cover up situations where platforms don't have os.popen
# (e.g. Mac) or fail on socket.gethostname(); fixed libc
# detection RE
# 0.5.0 - changed the API names referring to system commands to *syscmd*;
# added java_ver(); made syscmd_ver() a private
# API (was system_ver() in previous versions) -- use uname()
# instead; extended the win32_ver() to also return processor
# type information
# 0.4.0 - added win32_ver() and modified the platform() output for WinXX
# 0.3.4 - fixed a bug in _follow_symlinks()
# 0.3.3 - fixed popen() and "file" command invocation bugs
# 0.3.2 - added architecture() API and support for it in platform()
# 0.3.1 - fixed syscmd_ver() RE to support Windows NT
# 0.3.0 - added system alias support
# 0.2.3 - removed 'wince' again... oh well.
# 0.2.2 - added 'wince' to syscmd_ver() supported platforms
# 0.2.1 - added cache logic and changed the platform string format
# 0.2.0 - changed the API to use functions instead of module globals
# since some actions take too long to be run on module import
# 0.1.0 - first release
#
# You can always get the latest version of this module at:
#
# http://www.egenix.com/files/python/platform.py
#
# If that URL should fail, try contacting the author.
__copyright__ = """
Copyright (c) 1999-2000, Marc-Andre Lemburg; mailto:[email protected]
Copyright (c) 2000-2010, eGenix.com Software GmbH; mailto:[email protected]
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee or royalty is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation or portions thereof, including modifications,
that you make.
EGENIX.COM SOFTWARE GMBH DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL,
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE !
"""
__version__ = '1.0.7'
import sys, string, os, re
### Globals & Constants
# Determine the platform's /dev/null device
try:
DEV_NULL = os.devnull
except AttributeError:
# os.devnull was added in Python 2.4, so emulate it for earlier
# Python versions
if sys.platform in ('dos','win32','win16','os2'):
# Use the old CP/M NUL as device name
DEV_NULL = 'NUL'
else:
# Standard Unix uses /dev/null
DEV_NULL = '/dev/null'
# Helper for comparing two version number strings.
# Based on the description of the PHP's version_compare():
# http://php.net/manual/en/function.version-compare.php
_ver_stages = {
# any string not found in this dict, will get 0 assigned
'dev': 10,
'alpha': 20, 'a': 20,
'beta': 30, 'b': 30,
'c': 40,
'RC': 50, 'rc': 50,
# number, will get 100 assigned
'pl': 200, 'p': 200,
}
_component_re = re.compile(r'([0-9]+|[._+-])')
def _comparable_version(version):
result = []
for v in _component_re.split(version):
if v not in '._+-':
try:
v = int(v, 10)
t = 100
except ValueError:
t = _ver_stages.get(v, 0)
result.extend((t, v))
return result
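# Example sketch (hedged): numeric components compare numerically, so
# '2.11.3' ranks above '2.9' even though plain string comparison would
# order them the other way around.
def _comparable_version_example():
    assert _comparable_version('2.11.3') > _comparable_version('2.9')
    assert '2.11.3' < '2.9'   # naive string comparison gets it wrong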
### Platform specific APIs
_libc_search = re.compile(r'(__libc_init)'
'|'
'(GLIBC_([0-9.]+))'
'|'
'(libc(_\w+)?\.so(?:\.(\d[0-9.]*))?)')
def libc_ver(executable=sys.executable,lib='',version='', chunksize=2048):
""" Tries to determine the libc version that the file executable
(which defaults to the Python interpreter) is linked against.
Returns a tuple of strings (lib,version) which default to the
given parameters in case the lookup fails.
Note that the function has intimate knowledge of how different
libc versions add symbols to the executable and thus is probably
only usable for executables compiled using gcc.
The file is read and scanned in chunks of chunksize bytes.
"""
V = _comparable_version
if hasattr(os.path, 'realpath'):
# Python 2.2 introduced os.path.realpath(); it is used
# here to work around problems with Cygwin not being
# able to open symlinks for reading
executable = os.path.realpath(executable)
with open(executable, 'rb') as f:
binary = f.read(chunksize)
pos = 0
while pos < len(binary):
if 'libc' in binary or 'GLIBC' in binary:
m = _libc_search.search(binary, pos)
else:
m = None
if not m or m.end() == len(binary):
chunk = f.read(chunksize)
if chunk:
binary = binary[max(pos, len(binary) - 1000):] + chunk
pos = 0
continue
if not m:
break
libcinit,glibc,glibcversion,so,threads,soversion = m.groups()
if libcinit and not lib:
lib = 'libc'
elif glibc:
if lib != 'glibc':
lib = 'glibc'
version = glibcversion
elif V(glibcversion) > V(version):
version = glibcversion
elif so:
if lib != 'glibc':
lib = 'libc'
if soversion and (not version or V(soversion) > V(version)):
version = soversion
if threads and version[-len(threads):] != threads:
version = version + threads
pos = m.end()
return lib,version
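# Example sketch (hedged): on a glibc-based Linux build this typically
# returns something like ('glibc', '2.x'); when nothing is found the given
# defaults come back unchanged.
def _libc_ver_example():
    return libc_ver()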
def _dist_try_harder(distname,version,id):
""" Tries some special tricks to get the distribution
information in case the default method fails.
Currently supports older SuSE Linux, Caldera OpenLinux and
Slackware Linux distributions.
"""
if os.path.exists('/var/adm/inst-log/info'):
# SuSE Linux stores distribution information in that file
info = open('/var/adm/inst-log/info').readlines()
distname = 'SuSE'
for line in info:
tv = string.split(line)
if len(tv) == 2:
tag,value = tv
else:
continue
if tag == 'MIN_DIST_VERSION':
version = string.strip(value)
elif tag == 'DIST_IDENT':
values = string.split(value,'-')
id = values[2]
return distname,version,id
if os.path.exists('/etc/.installed'):
# Caldera OpenLinux has some info in that file (thanks to Colin Kong)
info = open('/etc/.installed').readlines()
for line in info:
pkg = string.split(line,'-')
if len(pkg) >= 2 and pkg[0] == 'OpenLinux':
# XXX does Caldera support non Intel platforms ? If yes,
# where can we find the needed id ?
return 'OpenLinux',pkg[1],id
if os.path.isdir('/usr/lib/setup'):
# Check for slackware version tag file (thanks to Greg Andruk)
verfiles = os.listdir('/usr/lib/setup')
for n in range(len(verfiles)-1, -1, -1):
if verfiles[n][:14] != 'slack-version-':
del verfiles[n]
if verfiles:
verfiles.sort()
distname = 'slackware'
version = verfiles[-1][14:]
return distname,version,id
return distname,version,id
_release_filename = re.compile(r'(\w+)[-_](release|version)')
_lsb_release_version = re.compile(r'(.+)'
' release '
'([\d.]+)'
'[^(]*(?:\((.+)\))?')
_release_version = re.compile(r'([^0-9]+)'
'(?: release )?'
'([\d.]+)'
'[^(]*(?:\((.+)\))?')
# See also http://www.novell.com/coolsolutions/feature/11251.html
# and http://linuxmafia.com/faq/Admin/release-files.html
# and http://data.linux-ntfs.org/rpm/whichrpm
# and http://www.die.net/doc/linux/man/man1/lsb_release.1.html
_supported_dists = (
'SuSE', 'debian', 'fedora', 'redhat', 'centos',
'mandrake', 'mandriva', 'rocks', 'slackware', 'yellowdog', 'gentoo',
'UnitedLinux', 'turbolinux')
def _parse_release_file(firstline):
# Default to empty 'version' and 'id' strings. Both defaults are used
# when 'firstline' is empty. 'id' defaults to empty when an id can not
# be deduced.
version = ''
id = ''
# Parse the first line
m = _lsb_release_version.match(firstline)
if m is not None:
# LSB format: "distro release x.x (codename)"
return tuple(m.groups())
# Pre-LSB format: "distro x.x (codename)"
m = _release_version.match(firstline)
if m is not None:
return tuple(m.groups())
# Unknown format... take the first two words
l = string.split(string.strip(firstline))
if l:
version = l[0]
if len(l) > 1:
id = l[1]
return '', version, id
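# Example sketch (hedged): both release-file formats reduce to the same
# (distname, version, id) triple.
def _parse_release_example():
    assert (_parse_release_file('Fedora release 8 (Werewolf)')
            == ('Fedora', '8', 'Werewolf'))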
def linux_distribution(distname='', version='', id='',
supported_dists=_supported_dists,
full_distribution_name=1):
""" Tries to determine the name of the Linux OS distribution name.
The function first looks for a distribution release file in
/etc and then reverts to _dist_try_harder() in case no
suitable files are found.
supported_dists may be given to define the set of Linux
distributions to look for. It defaults to a list of currently
supported Linux distributions identified by their release file
name.
If full_distribution_name is true (default), the full
distribution read from the OS is returned. Otherwise the short
name taken from supported_dists is used.
Returns a tuple (distname,version,id) which default to the
args given as parameters.
"""
try:
etc = os.listdir('/etc')
except os.error:
# Probably not a Unix system
return distname,version,id
etc.sort()
for file in etc:
m = _release_filename.match(file)
if m is not None:
_distname,dummy = m.groups()
if _distname in supported_dists:
distname = _distname
break
else:
return _dist_try_harder(distname,version,id)
# Read the first line
f = open('/etc/'+file, 'r')
firstline = f.readline()
f.close()
_distname, _version, _id = _parse_release_file(firstline)
if _distname and full_distribution_name:
distname = _distname
if _version:
version = _version
if _id:
id = _id
return distname, version, id
# To maintain backwards compatibility:
def dist(distname='',version='',id='',
supported_dists=_supported_dists):
""" Tries to determine the name of the Linux OS distribution name.
The function first looks for a distribution release file in
/etc and then reverts to _dist_try_harder() in case no
suitable files are found.
Returns a tuple (distname,version,id) which default to the
args given as parameters.
"""
return linux_distribution(distname, version, id,
supported_dists=supported_dists,
full_distribution_name=0)
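# Example sketch (hedged): on a Linux system with a recognized release file
# this might return e.g. ('Fedora', '8', 'Werewolf'); elsewhere the given
# defaults are returned unchanged.
def _linux_distribution_example():
    return linux_distribution()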
class _popen:
""" Fairly portable (alternative) popen implementation.
This is mostly needed in case os.popen() is not available, or
doesn't work as advertised, e.g. in Win9X GUI programs like
PythonWin or IDLE.
Writing to the pipe is currently not supported.
"""
tmpfile = ''
pipe = None
bufsize = None
mode = 'r'
def __init__(self,cmd,mode='r',bufsize=None):
if mode != 'r':
raise ValueError,'popen()-emulation only supports read mode'
import tempfile
self.tmpfile = tmpfile = tempfile.mktemp()
os.system(cmd + ' > %s' % tmpfile)
self.pipe = open(tmpfile,'rb')
self.bufsize = bufsize
self.mode = mode
def read(self):
return self.pipe.read()
    def readlines(self):
        # bufsize is accepted for interface compatibility but otherwise
        # ignored; always return the pipe's lines rather than silently
        # returning None when bufsize is unset.
        return self.pipe.readlines()
def close(self,
remove=os.unlink,error=os.error):
if self.pipe:
rc = self.pipe.close()
else:
rc = 255
if self.tmpfile:
try:
remove(self.tmpfile)
except error:
pass
return rc
# Alias
__del__ = close
def popen(cmd, mode='r', bufsize=None):
""" Portable popen() interface.
"""
# Find a working popen implementation preferring win32pipe.popen
# over os.popen over _popen
popen = None
if os.environ.get('OS','') == 'Windows_NT':
# On NT win32pipe should work; on Win9x it hangs due to bugs
# in the MS C lib (see MS KnowledgeBase article Q150956)
try:
import win32pipe
except ImportError:
pass
else:
popen = win32pipe.popen
if popen is None:
if hasattr(os,'popen'):
popen = os.popen
# Check whether it works... it doesn't in GUI programs
# on Windows platforms
if sys.platform == 'win32': # XXX Others too ?
try:
popen('')
except os.error:
popen = _popen
else:
popen = _popen
if bufsize is None:
return popen(cmd,mode)
else:
return popen(cmd,mode,bufsize)
def _norm_version(version, build=''):
""" Normalize the version and build strings and return a single
version string using the format major.minor.build (or patchlevel).
"""
l = string.split(version,'.')
if build:
l.append(build)
try:
ints = map(int,l)
except ValueError:
strings = l
else:
strings = map(str,ints)
version = string.join(strings[:3],'.')
return version
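# _norm_version() is deterministic and can be checked in any shell:
#
#   >>> _norm_version('5.00.2195')
#   '5.0.2195'
#   >>> _norm_version('6.1', build='7601')
#   '6.1.7601'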
_ver_output = re.compile(r'(?:([\w ]+) ([\w.]+) '
r'.*'
r'\[.* ([\d.]+)\])')
# Examples of VER command output:
#
# Windows 2000: Microsoft Windows 2000 [Version 5.00.2195]
# Windows XP: Microsoft Windows XP [Version 5.1.2600]
# Windows Vista: Microsoft Windows [Version 6.0.6002]
#
# Note that the "Version" string gets localized on different
# Windows versions.
def _syscmd_ver(system='', release='', version='',
supported_platforms=('win32','win16','dos','os2')):
""" Tries to figure out the OS version used and returns
a tuple (system,release,version).
It uses the "ver" shell command for this which is known
to exist on Windows, DOS and OS/2. XXX Others too ?
In case this fails, the given parameters are used as
defaults.
"""
if sys.platform not in supported_platforms:
return system,release,version
# Try some common cmd strings
for cmd in ('ver','command /c ver','cmd /c ver'):
try:
pipe = popen(cmd)
info = pipe.read()
if pipe.close():
raise os.error,'command failed'
# XXX How can I suppress shell errors from being written
# to stderr ?
except os.error,why:
#print 'Command %s failed: %s' % (cmd,why)
continue
except IOError,why:
#print 'Command %s failed: %s' % (cmd,why)
continue
else:
break
else:
return system,release,version
# Parse the output
info = string.strip(info)
m = _ver_output.match(info)
if m is not None:
system,release,version = m.groups()
# Strip trailing dots from version and release
if release[-1] == '.':
release = release[:-1]
if version[-1] == '.':
version = version[:-1]
# Normalize the version and build strings (eliminating additional
# zeros)
version = _norm_version(version)
return system,release,version
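# The regex above parses canned "ver" output such as the examples in
# the comment block, without needing to run the shell command:
#
#   >>> _ver_output.match('Microsoft Windows XP [Version 5.1.2600]').groups()
#   ('Microsoft Windows', 'XP', '5.1.2600')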
_WIN32_CLIENT_RELEASES = {
(5, 0): "2000",
(5, 1): "XP",
# Strictly, 5.2 client is XP 64-bit, but platform.py historically
# has always called it 2003 Server
(5, 2): "2003Server",
(5, None): "post2003",
(6, 0): "Vista",
(6, 1): "7",
(6, 2): "8",
(6, 3): "8.1",
(6, None): "post8.1",
(10, 0): "10",
(10, None): "post10",
}
# Server release name lookup will default to client names if necessary
_WIN32_SERVER_RELEASES = {
(5, 2): "2003Server",
(6, 0): "2008Server",
(6, 1): "2008ServerR2",
(6, 2): "2012Server",
(6, 3): "2012ServerR2",
(6, None): "post2012ServerR2",
}
def _get_real_winver(maj, min, build):
if maj < 6 or (maj == 6 and min < 2):
return maj, min, build
from ctypes import (c_buffer, POINTER, byref, create_unicode_buffer,
Structure, WinDLL, _Pointer)
from ctypes.wintypes import DWORD, HANDLE
class VS_FIXEDFILEINFO(Structure):
_fields_ = [
("dwSignature", DWORD),
("dwStrucVersion", DWORD),
("dwFileVersionMS", DWORD),
("dwFileVersionLS", DWORD),
("dwProductVersionMS", DWORD),
("dwProductVersionLS", DWORD),
("dwFileFlagsMask", DWORD),
("dwFileFlags", DWORD),
("dwFileOS", DWORD),
("dwFileType", DWORD),
("dwFileSubtype", DWORD),
("dwFileDateMS", DWORD),
("dwFileDateLS", DWORD),
]
class PVS_FIXEDFILEINFO(_Pointer):
_type_ = VS_FIXEDFILEINFO
kernel32 = WinDLL('kernel32')
version = WinDLL('version')
# We will immediately double the length up to MAX_PATH, but the
# path may be longer, so we retry until the returned string is
# shorter than our buffer.
name_len = actual_len = 130
while actual_len == name_len:
name_len *= 2
name = create_unicode_buffer(name_len)
actual_len = kernel32.GetModuleFileNameW(HANDLE(kernel32._handle),
name, len(name))
if not actual_len:
return maj, min, build
size = version.GetFileVersionInfoSizeW(name, None)
if not size:
return maj, min, build
ver_block = c_buffer(size)
if (not version.GetFileVersionInfoW(name, None, size, ver_block) or
not ver_block):
return maj, min, build
pvi = PVS_FIXEDFILEINFO()
if not version.VerQueryValueW(ver_block, "", byref(pvi), byref(DWORD())):
return maj, min, build
maj = pvi.contents.dwProductVersionMS >> 16
min = pvi.contents.dwProductVersionMS & 0xFFFF
build = pvi.contents.dwProductVersionLS >> 16
return maj, min, build
def win32_ver(release='', version='', csd='', ptype=''):
try:
from sys import getwindowsversion
except ImportError:
return release, version, csd, ptype
try:
from winreg import OpenKeyEx, QueryValueEx, CloseKey, HKEY_LOCAL_MACHINE
except ImportError:
from _winreg import OpenKeyEx, QueryValueEx, CloseKey, HKEY_LOCAL_MACHINE
winver = getwindowsversion()
maj, min, build = _get_real_winver(*winver[:3])
version = '{0}.{1}.{2}'.format(maj, min, build)
release = (_WIN32_CLIENT_RELEASES.get((maj, min)) or
_WIN32_CLIENT_RELEASES.get((maj, None)) or
release)
# getwindowsversion() reflects the compatibility mode Python is
# running under, and so the service pack value is only going to be
# valid if the versions match.
if winver[:2] == (maj, min):
try:
csd = 'SP{}'.format(winver.service_pack_major)
except AttributeError:
if csd[:13] == 'Service Pack ':
csd = 'SP' + csd[13:]
# VER_NT_SERVER = 3
if getattr(winver, 'product_type', None) == 3:
release = (_WIN32_SERVER_RELEASES.get((maj, min)) or
_WIN32_SERVER_RELEASES.get((maj, None)) or
release)
key = None
try:
key = OpenKeyEx(HKEY_LOCAL_MACHINE,
r'SOFTWARE\Microsoft\Windows NT\CurrentVersion')
ptype = QueryValueEx(key, 'CurrentType')[0]
except:
pass
finally:
if key:
CloseKey(key)
return release, version, csd, ptype
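# Illustrative win32_ver() result on a hypothetical Windows 7 SP1 box;
# the csd and ptype fields depend on the installation:
#
#   >>> win32_ver()                             # doctest: +SKIP
#   ('7', '6.1.7601', 'SP1', 'Multiprocessor Free')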
def _mac_ver_lookup(selectors,default=None):
from gestalt import gestalt
import MacOS
l = []
append = l.append
for selector in selectors:
try:
append(gestalt(selector))
except (RuntimeError, MacOS.Error):
append(default)
return l
def _bcd2str(bcd):
return hex(bcd)[2:]
def _mac_ver_gestalt():
"""
Thanks to Mark R. Levinson for mailing documentation links and
code examples for this function. Documentation for the
gestalt() API is available online at:
http://www.rgaros.nl/gestalt/
"""
# Check whether the version info module is available
try:
import gestalt
import MacOS
except ImportError:
return None
# Get the infos
sysv,sysa = _mac_ver_lookup(('sysv','sysa'))
# Decode the infos
if sysv:
major = (sysv & 0xFF00) >> 8
minor = (sysv & 0x00F0) >> 4
patch = (sysv & 0x000F)
if (major, minor) >= (10, 4):
# the 'sysv' gestalt cannot return patchlevels
# higher than 9. Apple introduced 3 new
# gestalt codes in 10.4 to deal with this
# issue (needed because patch levels can
# run higher than 9, such as 10.4.11)
major,minor,patch = _mac_ver_lookup(('sys1','sys2','sys3'))
release = '%i.%i.%i' %(major, minor, patch)
else:
release = '%s.%i.%i' % (_bcd2str(major),minor,patch)
if sysa:
machine = {0x1: '68k',
0x2: 'PowerPC',
0xa: 'i386'}.get(sysa,'')
versioninfo=('', '', '')
return release,versioninfo,machine
def _mac_ver_xml():
fn = '/System/Library/CoreServices/SystemVersion.plist'
if not os.path.exists(fn):
return None
try:
import plistlib
except ImportError:
return None
pl = plistlib.readPlist(fn)
release = pl['ProductVersion']
versioninfo=('', '', '')
machine = os.uname()[4]
if machine in ('ppc', 'Power Macintosh'):
# for compatibility with the gestalt based code
machine = 'PowerPC'
return release,versioninfo,machine
def mac_ver(release='',versioninfo=('','',''),machine=''):
""" Get MacOS version information and return it as tuple (release,
versioninfo, machine) with versioninfo being a tuple (version,
dev_stage, non_release_version).
Entries which cannot be determined are set to the parameter values
which default to ''. All tuple entries are strings.
"""
# First try reading the information from an XML file which should
# always be present
info = _mac_ver_xml()
if info is not None:
return info
# If that doesn't work for some reason fall back to reading the
# information using gestalt calls.
info = _mac_ver_gestalt()
if info is not None:
return info
# If that also doesn't work return the default values
return release,versioninfo,machine
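# Illustrative mac_ver() result; the values are assumptions and come
# from SystemVersion.plist or gestalt, whichever is available:
#
#   >>> mac_ver()                               # doctest: +SKIP
#   ('10.15.7', ('', '', ''), 'x86_64')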
def _java_getprop(name,default):
from java.lang import System
try:
value = System.getProperty(name)
if value is None:
return default
return value
except AttributeError:
return default
def java_ver(release='',vendor='',vminfo=('','',''),osinfo=('','','')):
""" Version interface for Jython.
Returns a tuple (release,vendor,vminfo,osinfo) with vminfo being
a tuple (vm_name,vm_release,vm_vendor) and osinfo being a
tuple (os_name,os_version,os_arch).
Values which cannot be determined are set to the defaults
given as parameters (which all default to '').
"""
# Import the needed APIs
try:
import java.lang
except ImportError:
return release,vendor,vminfo,osinfo
vendor = _java_getprop('java.vendor', vendor)
release = _java_getprop('java.version', release)
vm_name, vm_release, vm_vendor = vminfo
vm_name = _java_getprop('java.vm.name', vm_name)
vm_vendor = _java_getprop('java.vm.vendor', vm_vendor)
vm_release = _java_getprop('java.vm.version', vm_release)
vminfo = vm_name, vm_release, vm_vendor
os_name, os_version, os_arch = osinfo
os_arch = _java_getprop('java.os.arch', os_arch)
os_name = _java_getprop('java.os.name', os_name)
os_version = _java_getprop('java.os.version', os_version)
osinfo = os_name, os_version, os_arch
return release, vendor, vminfo, osinfo
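# Illustrative java_ver() result under a hypothetical Jython on Linux;
# every field below is an assumption and depends on the JVM in use:
#
#   >>> java_ver()                              # doctest: +SKIP
#   ('1.8.0_252', 'Oracle Corporation',
#    ('OpenJDK 64-Bit Server VM', '25.252-b09', 'Oracle Corporation'),
#    ('Linux', '4.15.0', 'amd64'))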
### System name aliasing
def system_alias(system,release,version):
""" Returns (system,release,version) aliased to common
marketing names used for some systems.
It also does some reordering of the information in some cases
where it would otherwise cause confusion.
"""
if system == 'Rhapsody':
# Apple's BSD derivative
# XXX How can we determine the marketing release number ?
return 'MacOS X Server',system+release,version
elif system == 'SunOS':
# Sun's OS
if release < '5':
# These releases use the old name SunOS
return system,release,version
# Modify release (marketing release = SunOS release - 3)
l = string.split(release,'.')
if l:
try:
major = int(l[0])
except ValueError:
pass
else:
major = major - 3
l[0] = str(major)
release = string.join(l,'.')
if release < '6':
system = 'Solaris'
else:
# XXX Whatever the new SunOS marketing name is...
system = 'Solaris'
elif system == 'IRIX64':
# IRIX reports IRIX64 on platforms with 64-bit support; yet it
# is really a version and not a different platform, since 32-bit
# apps are also supported.
system = 'IRIX'
if version:
version = version + ' (64bit)'
else:
version = '64bit'
elif system in ('win32','win16'):
# In case one of the other tricks
system = 'Windows'
return system,release,version
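# The SunOS branch above maps SunOS 5.x onto the Solaris 2.x marketing
# name (marketing release = SunOS release - 3); deterministic example:
#
#   >>> system_alias('SunOS', '5.8', 'Generic_108528-15')
#   ('Solaris', '2.8', 'Generic_108528-15')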
### Various internal helpers
def _platform(*args):
""" Helper to format the platform string in a filename
compatible format e.g. "system-version-machine".
"""
# Format the platform string
platform = string.join(
map(string.strip,
filter(len, args)),
'-')
# Cleanup some possible filename obstacles...
replace = string.replace
platform = replace(platform,' ','_')
platform = replace(platform,'/','-')
platform = replace(platform,'\\','-')
platform = replace(platform,':','-')
platform = replace(platform,';','-')
platform = replace(platform,'"','-')
platform = replace(platform,'(','-')
platform = replace(platform,')','-')
# No need to report 'unknown' information...
platform = replace(platform,'unknown','')
# Fold '--'s and remove trailing '-'
while 1:
cleaned = replace(platform,'--','-')
if cleaned == platform:
break
platform = cleaned
while platform[-1] == '-':
platform = platform[:-1]
return platform
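# _platform() folds separators and drops 'unknown' entries, so the
# following is deterministic:
#
#   >>> _platform('Linux', '2.4.18', 'i686', 'unknown')
#   'Linux-2.4.18-i686'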
def _node(default=''):
""" Helper to determine the node name of this machine.
"""
try:
import socket
except ImportError:
# No sockets...
return default
try:
return socket.gethostname()
except socket.error:
# Still not working...
return default
# os.path.abspath is new in Python 1.5.2:
if not hasattr(os.path,'abspath'):
def _abspath(path,
isabs=os.path.isabs,join=os.path.join,getcwd=os.getcwd,
normpath=os.path.normpath):
if not isabs(path):
path = join(getcwd(), path)
return normpath(path)
else:
_abspath = os.path.abspath
def _follow_symlinks(filepath):
""" In case filepath is a symlink, follow it until a
real file is reached.
"""
filepath = _abspath(filepath)
while os.path.islink(filepath):
filepath = os.path.normpath(
os.path.join(os.path.dirname(filepath),os.readlink(filepath)))
return filepath
def _syscmd_uname(option,default=''):
""" Interface to the system's uname command.
"""
if sys.platform in ('dos','win32','win16','os2'):
# XXX Others too ?
return default
try:
f = os.popen('uname %s 2> %s' % (option, DEV_NULL))
except (AttributeError,os.error):
return default
output = string.strip(f.read())
rc = f.close()
if not output or rc:
return default
else:
return output
def _syscmd_file(target,default=''):
""" Interface to the system's file command.
The function uses the -b option of the file command to have it
omit the filename in its output and, if possible, the -L option
to have the command follow symlinks. It returns default in
case the command should fail.
"""
# We do the import here to avoid a bootstrap issue.
# See c73b90b6dadd changeset.
#
# [..]
# ranlib libpython2.7.a
# gcc -o python \
# Modules/python.o \
# libpython2.7.a -lsocket -lnsl -ldl -lm
# Traceback (most recent call last):
# File "./setup.py", line 8, in <module>
# from platform import machine as platform_machine
# File "[..]/build/Lib/platform.py", line 116, in <module>
# import sys,string,os,re,subprocess
# File "[..]/build/Lib/subprocess.py", line 429, in <module>
# import select
# ImportError: No module named select
import subprocess
if sys.platform in ('dos','win32','win16','os2'):
# XXX Others too ?
return default
target = _follow_symlinks(target)
try:
proc = subprocess.Popen(['file', target],
stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except (AttributeError,os.error):
return default
output = proc.communicate()[0]
rc = proc.wait()
if not output or rc:
return default
else:
return output
### Information about the used architecture
# Default values for architecture; non-empty strings override the
# defaults given as parameters
_default_architecture = {
'win32': ('','WindowsPE'),
'win16': ('','Windows'),
'dos': ('','MSDOS'),
}
_architecture_split = re.compile(r'[\s,]').split
def architecture(executable=sys.executable,bits='',linkage=''):
""" Queries the given executable (defaults to the Python interpreter
binary) for various architecture information.
Returns a tuple (bits,linkage) which contains information about
the bit architecture and the linkage format used for the
executable. Both values are returned as strings.
Values that cannot be determined are returned as given by the
parameter presets. If bits is given as '', the sizeof(pointer)
(or sizeof(long) on Python version < 1.5.2) is used as
indicator for the supported pointer size.
The function relies on the system's "file" command to do the
actual work. This is available on most if not all Unix
platforms. On some non-Unix platforms where the "file" command
does not exist and the executable is set to the Python interpreter
binary defaults from _default_architecture are used.
"""
# Use the sizeof(pointer) as default number of bits if nothing
# else is given as default.
if not bits:
import struct
try:
size = struct.calcsize('P')
except struct.error:
# Older installations can only query longs
size = struct.calcsize('l')
bits = str(size*8) + 'bit'
# Get data from the 'file' system command
if executable:
output = _syscmd_file(executable, '')
else:
output = ''
if not output and \
executable == sys.executable:
# "file" command did not return anything; we'll try to provide
# some sensible defaults then...
if sys.platform in _default_architecture:
b, l = _default_architecture[sys.platform]
if b:
bits = b
if l:
linkage = l
return bits, linkage
# Split the output into a list of strings omitting the filename
fileout = _architecture_split(output)[1:]
if 'executable' not in fileout:
# Format not supported
return bits,linkage
# Bits
if '32-bit' in fileout:
bits = '32bit'
elif 'N32' in fileout:
# On Irix only
bits = 'n32bit'
elif '64-bit' in fileout:
bits = '64bit'
# Linkage
if 'ELF' in fileout:
linkage = 'ELF'
elif 'PE' in fileout:
# E.g. Windows uses this format
if 'Windows' in fileout:
linkage = 'WindowsPE'
else:
linkage = 'PE'
elif 'COFF' in fileout:
linkage = 'COFF'
elif 'MS-DOS' in fileout:
linkage = 'MSDOS'
else:
# XXX the A.OUT format also falls under this class...
pass
return bits,linkage
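# Illustrative architecture() result; the linkage depends on what the
# "file" command reports for the interpreter binary:
#
#   >>> architecture()                          # doctest: +SKIP
#   ('64bit', 'ELF')
#
# On win32 the "file" command is usually absent, so the defaults from
# _default_architecture yield e.g. ('64bit', 'WindowsPE').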
### Portable uname() interface
_uname_cache = None
def uname():
""" Fairly portable uname interface. Returns a tuple
of strings (system,node,release,version,machine,processor)
identifying the underlying platform.
Note that unlike the os.uname function this also returns
possible processor information as an additional tuple entry.
Entries which cannot be determined are set to ''.
"""
global _uname_cache
no_os_uname = 0
if _uname_cache is not None:
return _uname_cache
processor = ''
# Get some infos from the builtin os.uname API...
try:
system,node,release,version,machine = os.uname()
except AttributeError:
no_os_uname = 1
if no_os_uname or not filter(None, (system, node, release, version, machine)):
# Hmm, there is either no uname or uname has returned
# 'unknowns'... we'll have to poke around the system then.
if no_os_uname:
system = sys.platform
release = ''
version = ''
node = _node()
machine = ''
use_syscmd_ver = 1
# Try win32_ver() on win32 platforms
if system == 'win32' or (system == 'cli' and os.name == 'nt'):
release,version,csd,ptype = win32_ver()
if release and version:
use_syscmd_ver = 0
# Try to use the PROCESSOR_* environment variables
# available on Win XP and later; see
# http://support.microsoft.com/kb/888731 and
# http://www.geocities.com/rick_lively/MANUALS/ENV/MSWIN/PROCESSI.HTM
if not machine:
# WOW64 processes mask the native architecture
if "PROCESSOR_ARCHITEW6432" in os.environ:
machine = os.environ.get("PROCESSOR_ARCHITEW6432", '')
else:
machine = os.environ.get('PROCESSOR_ARCHITECTURE', '')
if not processor:
processor = os.environ.get('PROCESSOR_IDENTIFIER', machine)
# Try the 'ver' system command available on some
# platforms
if use_syscmd_ver:
system,release,version = _syscmd_ver(system)
# Normalize system to what win32_ver() normally returns
# (_syscmd_ver() tends to return the vendor name as well)
if system == 'Microsoft Windows':
system = 'Windows'
elif system == 'Microsoft' and release == 'Windows':
# Under Windows Vista and Windows Server 2008,
# Microsoft changed the output of the ver command. The
# release is no longer printed. This causes the
# system and release to be misidentified.
system = 'Windows'
if '6.0' == version[:3]:
release = 'Vista'
else:
release = ''
# In case we still don't know anything useful, we'll try to
# help ourselves
if system in ('win32','win16'):
if not version:
if system == 'win32':
version = '32bit'
else:
version = '16bit'
system = 'Windows'
elif system[:4] == 'java':
release,vendor,vminfo,osinfo = java_ver()
system = 'Java'
version = string.join(vminfo,', ')
if not version:
version = vendor
elif system == 'cli':
# we know this is Windows, because uname exists on Linux and macOS
system = 'Windows'
# System specific extensions
if system == 'OpenVMS':
# OpenVMS seems to have release and version mixed up
if not release or release == '0':
release = version
version = ''
# Get processor information
try:
import vms_lib
except ImportError:
pass
else:
csid, cpu_number = vms_lib.getsyi('SYI$_CPU',0)
if (cpu_number >= 128):
processor = 'Alpha'
else:
processor = 'VAX'
if not processor:
# Get processor information from the uname system command
processor = _syscmd_uname('-p','')
# If any unknowns still exist, replace them with ''s, which are more portable
if system == 'unknown':
system = ''
if node == 'unknown':
node = ''
if release == 'unknown':
release = ''
if version == 'unknown':
version = ''
if machine == 'unknown':
machine = ''
if processor == 'unknown':
processor = ''
# normalize name
if system == 'Microsoft' and release == 'Windows':
system = 'Windows'
release = 'Vista'
_uname_cache = system,node,release,version,machine,processor
return _uname_cache
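# Illustrative uname() tuple on a hypothetical Linux host; every entry
# is an assumption, and entries that cannot be determined come back as '':
#
#   >>> uname()                                 # doctest: +SKIP
#   ('Linux', 'myhost', '4.15.0', '#1 SMP', 'x86_64', 'x86_64')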
### Direct interfaces to some of the uname() return values
def system():
""" Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.
An empty string is returned if the value cannot be determined.
"""
return uname()[0]
def node():
""" Returns the computer's network name (which may not be fully
qualified)
An empty string is returned if the value cannot be determined.
"""
return uname()[1]
def release():
""" Returns the system's release, e.g. '2.2.0' or 'NT'
An empty string is returned if the value cannot be determined.
"""
return uname()[2]
def version():
""" Returns the system's release version, e.g. '#3 on degas'
An empty string is returned if the value cannot be determined.
"""
return uname()[3]
def machine():
""" Returns the machine type, e.g. 'i386'
An empty string is returned if the value cannot be determined.
"""
return uname()[4]
def processor():
""" Returns the (true) processor name, e.g. 'amdk6'
An empty string is returned if the value cannot be
determined. Note that many platforms do not provide this
information or simply return the same value as for machine(),
e.g. NetBSD does this.
"""
return uname()[5]
### Various APIs for extracting information from sys.version
_sys_version_parser = re.compile(
r'([\w.+]+)\s*' # "version<space>"
r'(?: DEBUG)?\s*' # DEBUG - IronPython DEBUG builds only
r'\(#?([^,]+)' # "(#buildno"
r'(?:,\s*([\w ]*)' # ", builddate"
r'(?:,\s*([\w :]*))?)?\)\s*' # ", buildtime)<space>"
r'\[([^\]]+)\]?') # "[compiler]"
_ironpython_sys_version_parser = re.compile(
r'([\w\.]+)\s*\('
r'IronPython\s*[\w\.]+'
r'(?: (?:Alpha|Beta|RC) ?[\d\.]+)?(?: DEBUG)?'
r'(?: \(([\d\.]+)\))?'
r' on ((?:.NET|Mono) [\d\.]+ \((?:32|64)-bit\))\)')
_pypy_sys_version_parser = re.compile(
r'([\w.+]+)\s*'
r'\(#?([^,]+),\s*([\w ]+),\s*([\w :]+)\)\s*'
r'\[PyPy [^\]]+\]?')
_sys_version_cache = {}
def _sys_version(sys_version=None):
""" Returns a parsed version of Python's sys.version as tuple
(name, version, branch, revision, buildno, builddate, compiler)
referring to the Python implementation name, version, branch,
revision, build number, build date/time as string and the compiler
identification string.
Note that unlike the Python sys.version, the returned value
for the Python version will always include the patchlevel (it
defaults to '.0').
The function returns empty strings for tuple entries that
cannot be determined.
sys_version may be given to parse an alternative version
string, e.g. if the version was read from a different Python
interpreter.
"""
# Get the Python version
if sys_version is None:
sys_version = sys.version
# Try the cache first
result = _sys_version_cache.get(sys_version, None)
if result is not None:
return result
# Parse it
if 'IronPython' in sys_version:
# IronPython
name = 'IronPython'
match = _ironpython_sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse IronPython sys.version: %s' %
repr(sys_version))
version, alt_version, compiler = match.groups()
buildno = ''
builddate = ''
elif sys.platform == "cli":
# IronPython
name = 'IronPython'
match = _sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse IronPython sys.version: %s' %
repr(sys_version))
version, _, _, _, _ = match.groups()
buildno = ''
builddate = ''
compiler = ''
elif sys.platform.startswith('java'):
# Jython
name = 'Jython'
match = _sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse Jython sys.version: %s' %
repr(sys_version))
version, buildno, builddate, buildtime, _ = match.groups()
if builddate is None:
builddate = ''
compiler = sys.platform
elif "PyPy" in sys_version:
# PyPy
name = "PyPy"
match = _pypy_sys_version_parser.match(sys_version)
if match is None:
raise ValueError("failed to parse PyPy sys.version: %s" %
repr(sys_version))
version, buildno, builddate, buildtime = match.groups()
compiler = ""
else:
# CPython
match = _sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse CPython sys.version: %s' %
repr(sys_version))
version, buildno, builddate, buildtime, compiler = \
match.groups()
name = 'CPython'
if builddate is None:
builddate = ''
elif buildtime:
builddate = builddate + ' ' + buildtime
if hasattr(sys, 'subversion'):
# sys.subversion was added in Python 2.5
_, branch, revision = sys.subversion
else:
branch = ''
revision = ''
# Add the patchlevel version if missing
l = string.split(version, '.')
if len(l) == 2:
l.append('0')
version = string.join(l, '.')
# Build and cache the result
result = (name, version, branch, revision, buildno, builddate, compiler)
_sys_version_cache[sys_version] = result
return result
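# Parsing a canned IronPython banner is deterministic and shows the
# branch taken for sys.version strings containing 'IronPython':
#
#   >>> _sys_version('2.7.9 (IronPython 2.7.9 (2.7.9.0) on'
#   ...              ' .NET 4.0.30319.42000 (64-bit))')[:2]
#   ('IronPython', '2.7.9')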
def python_implementation():
""" Returns a string identifying the Python implementation.
Currently, the following implementations are identified:
'CPython' (C implementation of Python),
'IronPython' (.NET implementation of Python),
'Jython' (Java implementation of Python),
'PyPy' (Python implementation of Python).
"""
return _sys_version()[0]
def python_version():
""" Returns the Python version as string 'major.minor.patchlevel'
Note that unlike the Python sys.version, the returned value
will always include the patchlevel (it defaults to 0).
"""
return _sys_version()[1]
def python_version_tuple():
""" Returns the Python version as tuple (major, minor, patchlevel)
of strings.
Note that unlike the Python sys.version, the returned value
will always include the patchlevel (it defaults to 0).
"""
return tuple(string.split(_sys_version()[1], '.'))
def python_branch():
""" Returns a string identifying the Python implementation
branch.
For CPython this is the Subversion branch from which the
Python binary was built.
If not available, an empty string is returned.
"""
return _sys_version()[2]
def python_revision():
""" Returns a string identifying the Python implementation
revision.
For CPython this is the Subversion revision from which the
Python binary was built.
If not available, an empty string is returned.
"""
return _sys_version()[3]
def python_build():
""" Returns a tuple (buildno, builddate) stating the Python
build number and date as strings.
"""
return _sys_version()[4:6]
def python_compiler():
""" Returns a string identifying the compiler used for compiling
Python.
"""
return _sys_version()[6]
### The Opus Magnum of platform strings :-)
_platform_cache = {}
def platform(aliased=0, terse=0):
""" Returns a single string identifying the underlying platform
with as much useful information as possible (but no more :).
The output is intended to be human readable rather than
machine parseable. It may look different on different
platforms and this is intended.
If "aliased" is true, the function will use aliases for
various platforms that report system names which differ from
their common names, e.g. SunOS will be reported as
Solaris. The system_alias() function is used to implement
this.
Setting terse to true causes the function to return only the
absolute minimum information needed to identify the platform.
"""
result = _platform_cache.get((aliased, terse), None)
if result is not None:
return result
# Get uname information and then apply platform specific cosmetics
# to it...
system,node,release,version,machine,processor = uname()
if machine == processor:
processor = ''
if aliased:
system,release,version = system_alias(system,release,version)
if system == 'Windows':
# MS platforms
rel,vers,csd,ptype = win32_ver(version)
if terse:
platform = _platform(system,release)
else:
platform = _platform(system,release,version,csd)
elif system in ('Linux',):
# Linux based systems
distname,distversion,distid = dist('')
if distname and not terse:
platform = _platform(system,release,machine,processor,
'with',
distname,distversion,distid)
else:
# If the distribution name is unknown check for libc vs. glibc
libcname,libcversion = libc_ver(sys.executable)
platform = _platform(system,release,machine,processor,
'with',
libcname+libcversion)
elif system == 'Java':
# Java platforms
r,v,vminfo,(os_name,os_version,os_arch) = java_ver()
if terse or not os_name:
platform = _platform(system,release,version)
else:
platform = _platform(system,release,version,
'on',
os_name,os_version,os_arch)
elif system == 'MacOS':
# MacOS platforms
if terse:
platform = _platform(system,release)
else:
platform = _platform(system,release,machine)
else:
# Generic handler
if terse:
platform = _platform(system,release)
else:
bits,linkage = architecture(sys.executable)
platform = _platform(system,release,machine,processor,bits,linkage)
_platform_cache[(aliased, terse)] = platform
return platform
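# Illustrative platform() strings (assumptions; the output is
# intentionally platform dependent):
#
#   >>> platform()                              # doctest: +SKIP
#   'Windows-7-6.1.7601-SP1'
#   >>> platform(aliased=1, terse=1)            # doctest: +SKIP
#   'Solaris-2.8'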
### Command line interface
if __name__ == '__main__':
# Default is to print the aliased verbose platform string
terse = ('terse' in sys.argv or '--terse' in sys.argv)
aliased = (not 'nonaliased' in sys.argv and not '--nonaliased' in sys.argv)
print platform(aliased,terse)
sys.exit(0)
r"""plistlib.py -- a tool to generate and parse MacOSX .plist files.
The PropertyList (.plist) file format is a simple XML pickle supporting
basic object types, like dictionaries, lists, numbers and strings.
Usually the top level object is a dictionary.
To write out a plist file, use the writePlist(rootObject, pathOrFile)
function. 'rootObject' is the top level object, 'pathOrFile' is a
filename or a (writable) file object.
To parse a plist from a file, use the readPlist(pathOrFile) function,
with a file name or a (readable) file object as the only argument. It
returns the top level object (again, usually a dictionary).
To work with plist data in strings, you can use readPlistFromString()
and writePlistToString().
Values can be strings, integers, floats, booleans, tuples, lists,
dictionaries, Data or datetime.datetime objects. String values (including
dictionary keys) may be unicode strings -- they will be written out as
UTF-8.
The <data> plist type is supported through the Data class. This is a
thin wrapper around a Python string.
Generate Plist example:
pl = dict(
aString="Doodah",
aList=["A", "B", 12, 32.1, [1, 2, 3]],
aFloat=0.1,
anInt=728,
aDict=dict(
anotherString="<hello & hi there!>",
aUnicodeValue=u'M\xe4ssig, Ma\xdf',
aTrueValue=True,
aFalseValue=False,
),
someData=Data("<binary gunk>"),
someMoreData=Data("<lots of binary gunk>" * 10),
aDate=datetime.datetime.fromtimestamp(time.mktime(time.gmtime())),
)
# unicode keys are possible, but a little awkward to use:
pl[u'\xc5benraa'] = "That was a unicode key."
writePlist(pl, fileName)
Parse Plist example:
pl = readPlist(pathOrFile)
print pl["aKey"]
"""
__all__ = [
"readPlist", "writePlist", "readPlistFromString", "writePlistToString",
"readPlistFromResource", "writePlistToResource",
"Plist", "Data", "Dict"
]
# Note: the Plist and Dict classes have been deprecated.
import binascii
import datetime
from cStringIO import StringIO
import re
import warnings
def readPlist(pathOrFile):
"""Read a .plist file. 'pathOrFile' may either be a file name or a
(readable) file object. Return the unpacked root object (which
usually is a dictionary).
"""
didOpen = 0
if isinstance(pathOrFile, (str, unicode)):
pathOrFile = open(pathOrFile)
didOpen = 1
p = PlistParser()
rootObject = p.parse(pathOrFile)
if didOpen:
pathOrFile.close()
return rootObject
def writePlist(rootObject, pathOrFile):
"""Write 'rootObject' to a .plist file. 'pathOrFile' may either be a
file name or a (writable) file object.
"""
didOpen = 0
if isinstance(pathOrFile, (str, unicode)):
pathOrFile = open(pathOrFile, "w")
didOpen = 1
writer = PlistWriter(pathOrFile)
writer.writeln("<plist version=\"1.0\">")
writer.writeValue(rootObject)
writer.writeln("</plist>")
if didOpen:
pathOrFile.close()
def readPlistFromString(data):
"""Read a plist data from a string. Return the root object.
"""
return readPlist(StringIO(data))
def writePlistToString(rootObject):
"""Return 'rootObject' as a plist-formatted string.
"""
f = StringIO()
writePlist(rootObject, f)
return f.getvalue()
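# Round-trip sketch: writePlistToString() and readPlistFromString()
# are inverses for plain dicts of supported types:
#
#   >>> s = writePlistToString({'aKey': 'value', 'count': 3})
#   >>> readPlistFromString(s)['count']
#   3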
def readPlistFromResource(path, restype='plst', resid=0):
"""Read plst resource from the resource fork of path.
"""
warnings.warnpy3k("In 3.x, readPlistFromResource is removed.",
stacklevel=2)
from Carbon.File import FSRef, FSGetResourceForkName
from Carbon.Files import fsRdPerm
from Carbon import Res
fsRef = FSRef(path)
resNum = Res.FSOpenResourceFile(fsRef, FSGetResourceForkName(), fsRdPerm)
Res.UseResFile(resNum)
plistData = Res.Get1Resource(restype, resid).data
Res.CloseResFile(resNum)
return readPlistFromString(plistData)
def writePlistToResource(rootObject, path, restype='plst', resid=0):
"""Write 'rootObject' as a plst resource to the resource fork of path.
"""
warnings.warnpy3k("In 3.x, writePlistToResource is removed.", stacklevel=2)
from Carbon.File import FSRef, FSGetResourceForkName
from Carbon.Files import fsRdWrPerm
from Carbon import Res
plistData = writePlistToString(rootObject)
fsRef = FSRef(path)
resNum = Res.FSOpenResourceFile(fsRef, FSGetResourceForkName(), fsRdWrPerm)
Res.UseResFile(resNum)
try:
Res.Get1Resource(restype, resid).RemoveResource()
except Res.Error:
pass
res = Res.Resource(plistData)
res.AddResource(restype, resid, '')
res.WriteResource()
Res.CloseResFile(resNum)
class DumbXMLWriter:
def __init__(self, file, indentLevel=0, indent="\t"):
self.file = file
self.stack = []
self.indentLevel = indentLevel
self.indent = indent
def beginElement(self, element):
self.stack.append(element)
self.writeln("<%s>" % element)
self.indentLevel += 1
def endElement(self, element):
assert self.indentLevel > 0
assert self.stack.pop() == element
self.indentLevel -= 1
self.writeln("</%s>" % element)
def simpleElement(self, element, value=None):
if value is not None:
value = _escapeAndEncode(value)
self.writeln("<%s>%s</%s>" % (element, value, element))
else:
self.writeln("<%s/>" % element)
def writeln(self, line):
if line:
self.file.write(self.indentLevel * self.indent + line + "\n")
else:
self.file.write("\n")
# Contents should conform to a subset of ISO 8601
# (in particular, YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z'. Smaller units may be omitted with
# a loss of precision)
_dateParser = re.compile(r"(?P<year>\d\d\d\d)(?:-(?P<month>\d\d)(?:-(?P<day>\d\d)(?:T(?P<hour>\d\d)(?::(?P<minute>\d\d)(?::(?P<second>\d\d))?)?)?)?)?Z")
def _dateFromString(s):
order = ('year', 'month', 'day', 'hour', 'minute', 'second')
gd = _dateParser.match(s).groupdict()
lst = []
for key in order:
val = gd[key]
if val is None:
break
lst.append(int(val))
return datetime.datetime(*lst)
def _dateToString(d):
return '%04d-%02d-%02dT%02d:%02d:%02dZ' % (
d.year, d.month, d.day,
d.hour, d.minute, d.second
)
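# The two date helpers above are inverses for fully specified dates:
#
#   >>> _dateToString(_dateFromString('2020-09-15T12:30:45Z'))
#   '2020-09-15T12:30:45Z'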
# Regex to find any control chars, except for \t \n and \r
_controlCharPat = re.compile(
r"[\x00\x01\x02\x03\x04\x05\x06\x07\x08\x0b\x0c\x0e\x0f"
r"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f]")
def _escapeAndEncode(text):
m = _controlCharPat.search(text)
if m is not None:
raise ValueError("strings can't contains control characters; "
"use plistlib.Data instead")
text = text.replace("\r\n", "\n") # convert DOS line endings
text = text.replace("\r", "\n") # convert Mac line endings
text = text.replace("&", "&") # escape '&'
text = text.replace("<", "<") # escape '<'
text = text.replace(">", ">") # escape '>'
return text.encode("utf-8") # encode as UTF-8
PLISTHEADER = """\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
"""
class PlistWriter(DumbXMLWriter):
def __init__(self, file, indentLevel=0, indent="\t", writeHeader=1):
if writeHeader:
file.write(PLISTHEADER)
DumbXMLWriter.__init__(self, file, indentLevel, indent)
def writeValue(self, value):
if isinstance(value, (str, unicode)):
self.simpleElement("string", value)
elif isinstance(value, bool):
# must switch for bool before int, as bool is a
# subclass of int...
if value:
self.simpleElement("true")
else:
self.simpleElement("false")
elif isinstance(value, (int, long)):
self.simpleElement("integer", "%d" % value)
elif isinstance(value, float):
self.simpleElement("real", repr(value))
elif isinstance(value, dict):
self.writeDict(value)
elif isinstance(value, Data):
self.writeData(value)
elif isinstance(value, datetime.datetime):
self.simpleElement("date", _dateToString(value))
elif isinstance(value, (tuple, list)):
self.writeArray(value)
else:
raise TypeError("unsuported type: %s" % type(value))
def writeData(self, data):
self.beginElement("data")
self.indentLevel -= 1
maxlinelength = max(16, 76 - len(self.indent.replace("\t", " " * 8) *
self.indentLevel))
for line in data.asBase64(maxlinelength).split("\n"):
if line:
self.writeln(line)
self.indentLevel += 1
self.endElement("data")
def writeDict(self, d):
self.beginElement("dict")
items = d.items()
items.sort()
for key, value in items:
if not isinstance(key, (str, unicode)):
raise TypeError("keys must be strings")
self.simpleElement("key", key)
self.writeValue(value)
self.endElement("dict")
def writeArray(self, array):
self.beginElement("array")
for value in array:
self.writeValue(value)
self.endElement("array")
class _InternalDict(dict):
# This class is needed while Dict is scheduled for deprecation:
# we only need to warn when a *user* instantiates Dict or when
# the "attribute notation for dict keys" is used.
def __getattr__(self, attr):
try:
value = self[attr]
except KeyError:
raise AttributeError, attr
from warnings import warn
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", PendingDeprecationWarning, 2)
return value
def __setattr__(self, attr, value):
from warnings import warn
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", PendingDeprecationWarning, 2)
self[attr] = value
def __delattr__(self, attr):
try:
del self[attr]
except KeyError:
raise AttributeError, attr
from warnings import warn
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", PendingDeprecationWarning, 2)
class Dict(_InternalDict):
def __init__(self, **kwargs):
from warnings import warn
warn("The plistlib.Dict class is deprecated, use builtin dict instead",
PendingDeprecationWarning, 2)
super(Dict, self).__init__(**kwargs)
class Plist(_InternalDict):
"""This class has been deprecated. Use readPlist() and writePlist()
functions instead, together with regular dict objects.
"""
def __init__(self, **kwargs):
from warnings import warn
warn("The Plist class is deprecated, use the readPlist() and "
"writePlist() functions instead", PendingDeprecationWarning, 2)
super(Plist, self).__init__(**kwargs)
def fromFile(cls, pathOrFile):
"""Deprecated. Use the readPlist() function instead."""
rootObject = readPlist(pathOrFile)
plist = cls()
plist.update(rootObject)
return plist
fromFile = classmethod(fromFile)
def write(self, pathOrFile):
"""Deprecated. Use the writePlist() function instead."""
writePlist(self, pathOrFile)
def _encodeBase64(s, maxlinelength=76):
# copied from base64.encodestring(), with added maxlinelength argument
maxbinsize = (maxlinelength//4)*3
pieces = []
for i in range(0, len(s), maxbinsize):
chunk = s[i : i + maxbinsize]
pieces.append(binascii.b2a_base64(chunk))
return "".join(pieces)
class Data:
"""Wrapper for binary data."""
def __init__(self, data):
self.data = data
def fromBase64(cls, data):
# base64.decodestring just calls binascii.a2b_base64;
# it seems overkill to use both base64 and binascii.
return cls(binascii.a2b_base64(data))
fromBase64 = classmethod(fromBase64)
def asBase64(self, maxlinelength=76):
return _encodeBase64(self.data, maxlinelength)
def __cmp__(self, other):
if isinstance(other, self.__class__):
return cmp(self.data, other.data)
elif isinstance(other, str):
return cmp(self.data, other)
else:
return cmp(id(self), id(other))
def __repr__(self):
return "%s(%s)" % (self.__class__.__name__, repr(self.data))
class PlistParser:
def __init__(self):
self.stack = []
self.currentKey = None
self.root = None
def parse(self, fileobj):
from xml.parsers.expat import ParserCreate
parser = ParserCreate()
parser.StartElementHandler = self.handleBeginElement
parser.EndElementHandler = self.handleEndElement
parser.CharacterDataHandler = self.handleData
parser.ParseFile(fileobj)
return self.root
def handleBeginElement(self, element, attrs):
self.data = []
handler = getattr(self, "begin_" + element, None)
if handler is not None:
handler(attrs)
def handleEndElement(self, element):
handler = getattr(self, "end_" + element, None)
if handler is not None:
handler()
def handleData(self, data):
self.data.append(data)
def addObject(self, value):
if self.currentKey is not None:
self.stack[-1][self.currentKey] = value
self.currentKey = None
elif not self.stack:
# this is the root object
self.root = value
else:
self.stack[-1].append(value)
def getData(self):
data = "".join(self.data)
try:
data = data.encode("ascii")
except UnicodeError:
pass
self.data = []
return data
# element handlers
def begin_dict(self, attrs):
d = _InternalDict()
self.addObject(d)
self.stack.append(d)
def end_dict(self):
self.stack.pop()
def end_key(self):
self.currentKey = self.getData()
def begin_array(self, attrs):
a = []
self.addObject(a)
self.stack.append(a)
def end_array(self):
self.stack.pop()
def end_true(self):
self.addObject(True)
def end_false(self):
self.addObject(False)
def end_integer(self):
self.addObject(int(self.getData()))
def end_real(self):
self.addObject(float(self.getData()))
def end_string(self):
self.addObject(self.getData())
def end_data(self):
self.addObject(Data.fromBase64(self.getData()))
def end_date(self):
self.addObject(_dateFromString(self.getData()))
"""Spawn a command with pipes to its stdin, stdout, and optionally stderr.
The normal os.popen(cmd, mode) call spawns a shell command and provides a
file interface to just the input or output of the process depending on
whether mode is 'r' or 'w'. This module provides the functions popen2(cmd)
and popen3(cmd) which return two or three pipes to the spawned command.
"""
import os
import sys
import warnings
warnings.warn("The popen2 module is deprecated. Use the subprocess module.",
DeprecationWarning, stacklevel=2)
__all__ = ["popen2", "popen3", "popen4"]
try:
MAXFD = os.sysconf('SC_OPEN_MAX')
except (AttributeError, ValueError):
MAXFD = 256
_active = []
def _cleanup():
for inst in _active[:]:
if inst.poll(_deadstate=sys.maxint) >= 0:
try:
_active.remove(inst)
except ValueError:
# This can happen if two threads create a new Popen instance.
# It's harmless that it was already removed, so ignore.
pass
class Popen3:
"""Class representing a child process. Normally, instances are created
internally by the functions popen2() and popen3()."""
sts = -1 # Child not completed yet
def __init__(self, cmd, capturestderr=False, bufsize=-1):
"""The parameter 'cmd' is the shell command to execute in a
sub-process. On UNIX, 'cmd' may be a sequence, in which case arguments
will be passed directly to the program without shell intervention (as
with os.spawnv()). If 'cmd' is a string it will be passed to the shell
(as with os.system()). The 'capturestderr' flag, if true, specifies
that the object should capture standard error output of the child
process. The default is false. If the 'bufsize' parameter is
specified, it specifies the size of the I/O buffers to/from the child
process."""
_cleanup()
self.cmd = cmd
p2cread, p2cwrite = os.pipe()
c2pread, c2pwrite = os.pipe()
if capturestderr:
errout, errin = os.pipe()
self.pid = os.fork()
if self.pid == 0:
# Child
os.dup2(p2cread, 0)
os.dup2(c2pwrite, 1)
if capturestderr:
os.dup2(errin, 2)
self._run_child(cmd)
os.close(p2cread)
self.tochild = os.fdopen(p2cwrite, 'w', bufsize)
os.close(c2pwrite)
self.fromchild = os.fdopen(c2pread, 'r', bufsize)
if capturestderr:
os.close(errin)
self.childerr = os.fdopen(errout, 'r', bufsize)
else:
self.childerr = None
def __del__(self):
# In case the child hasn't been waited on, check if it's done.
self.poll(_deadstate=sys.maxint)
if self.sts < 0:
if _active is not None:
# Child is still running, keep us alive until we can wait on it.
_active.append(self)
def _run_child(self, cmd):
if isinstance(cmd, basestring):
cmd = ['/bin/sh', '-c', cmd]
os.closerange(3, MAXFD)
try:
os.execvp(cmd[0], cmd)
finally:
os._exit(1)
def poll(self, _deadstate=None):
"""Return the exit status of the child process if it has finished,
or -1 if it hasn't finished yet."""
if self.sts < 0:
try:
pid, sts = os.waitpid(self.pid, os.WNOHANG)
# pid will be 0 if self.pid hasn't terminated
if pid == self.pid:
self.sts = sts
except os.error:
if _deadstate is not None:
self.sts = _deadstate
return self.sts
def wait(self):
"""Wait for and return the exit status of the child process."""
if self.sts < 0:
pid, sts = os.waitpid(self.pid, 0)
# This used to be a test, but it is believed to be
# always true, so I changed it to an assertion - mvl
assert pid == self.pid
self.sts = sts
return self.sts
class Popen4(Popen3):
childerr = None
def __init__(self, cmd, bufsize=-1):
_cleanup()
self.cmd = cmd
p2cread, p2cwrite = os.pipe()
c2pread, c2pwrite = os.pipe()
self.pid = os.fork()
if self.pid == 0:
# Child
os.dup2(p2cread, 0)
os.dup2(c2pwrite, 1)
os.dup2(c2pwrite, 2)
self._run_child(cmd)
os.close(p2cread)
self.tochild = os.fdopen(p2cwrite, 'w', bufsize)
os.close(c2pwrite)
self.fromchild = os.fdopen(c2pread, 'r', bufsize)
if sys.platform[:3] == "win" or sys.platform == "os2emx":
# Some things don't make sense on non-Unix platforms.
del Popen3, Popen4
def popen2(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout, child_stdin) are returned."""
w, r = os.popen2(cmd, mode, bufsize)
return r, w
def popen3(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout, child_stdin, child_stderr) are returned."""
w, r, e = os.popen3(cmd, mode, bufsize)
return r, w, e
def popen4(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout_stderr, child_stdin) are returned."""
w, r = os.popen4(cmd, mode, bufsize)
return r, w
else:
def popen2(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout, child_stdin) are returned."""
inst = Popen3(cmd, False, bufsize)
return inst.fromchild, inst.tochild
def popen3(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout, child_stdin, child_stderr) are returned."""
inst = Popen3(cmd, True, bufsize)
return inst.fromchild, inst.tochild, inst.childerr
def popen4(cmd, bufsize=-1, mode='t'):
"""Execute the shell command 'cmd' in a sub-process. On UNIX, 'cmd' may
be a sequence, in which case arguments will be passed directly to the
program without shell intervention (as with os.spawnv()). If 'cmd' is a
string it will be passed to the shell (as with os.system()). If
'bufsize' is specified, it sets the buffer size for the I/O pipes. The
file objects (child_stdout_stderr, child_stdin) are returned."""
inst = Popen4(cmd, bufsize)
return inst.fromchild, inst.tochild
__all__.extend(["Popen3", "Popen4"])
"""A POP3 client class.
Based on the J. Myers POP3 draft, Jan. 96
"""
# Author: David Ascher <[email protected]>
# [heavily stealing from nntplib.py]
# Updated: Piers Lauder <[email protected]> [Jul '97]
# String method conversion and test jig improvements by ESR, February 2001.
# Added the POP3_SSL class. Methods loosely based on IMAP_SSL. Hector Urtubia <[email protected]> Aug 2003
# Example (see the test function at the end of this file)
# Imports
import re, socket
__all__ = ["POP3","error_proto"]
# Exception raised when an error or invalid response is received:
class error_proto(Exception): pass
# Standard Port
POP3_PORT = 110
# POP SSL PORT
POP3_SSL_PORT = 995
# Line terminators (we always output CRLF, but accept any of CRLF, LFCR, LF)
CR = '\r'
LF = '\n'
CRLF = CR+LF
# maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 1939 limits POP3 line length to
# 512 characters, including CRLF. We have selected 2048 just to be on
# the safe side.
_MAXLINE = 2048
class POP3:
"""This class supports both the minimal and optional command sets.
Arguments can be strings or integers (where appropriate)
(e.g.: retr(1) and retr('1') both work equally well).
Minimal Command Set:
USER name user(name)
PASS string pass_(string)
STAT stat()
LIST [msg] list(msg = None)
RETR msg retr(msg)
DELE msg dele(msg)
NOOP noop()
RSET rset()
QUIT quit()
Optional Commands (some servers support these):
RPOP name rpop(name)
APOP name digest apop(name, digest)
TOP msg n top(msg, n)
UIDL [msg] uidl(msg = None)
Raises one exception: 'error_proto'.
Instantiate with:
POP3(hostname, port=110)
NB: the POP protocol locks the mailbox from user
authorization until QUIT, so be sure to get in, suck
the messages, and quit, each time you access the
mailbox.
POP is a line-based protocol, which means large mail
messages consume lots of python cycles reading them
line-by-line.
If it's available on your mail server, use IMAP4
instead, it doesn't suffer from the two problems
above.
"""
def __init__(self, host, port=POP3_PORT,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
self.host = host
self.port = port
self.sock = socket.create_connection((host, port), timeout)
self.file = self.sock.makefile('rb')
self._debugging = 0
self.welcome = self._getresp()
def _putline(self, line):
if self._debugging > 1: print '*put*', repr(line)
self.sock.sendall('%s%s' % (line, CRLF))
# Internal: send one command to the server (through _putline())
def _putcmd(self, line):
if self._debugging: print '*cmd*', repr(line)
self._putline(line)
# Internal: return one line from the server, stripping CRLF.
# This is where all the CPU time of this module is consumed.
# Raise error_proto('-ERR EOF') if the connection is closed.
def _getline(self):
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise error_proto('line too long')
if self._debugging > 1: print '*get*', repr(line)
if not line: raise error_proto('-ERR EOF')
octets = len(line)
# server can send any combination of CR & LF
# however, 'readline()' returns lines ending in LF
# so only possibilities are ...LF, ...CRLF, CR...LF
if line[-2:] == CRLF:
return line[:-2], octets
if line[0] == CR:
return line[1:-1], octets
return line[:-1], octets
# Internal: get a response from the server.
# Raise 'error_proto' if the response doesn't start with '+'.
def _getresp(self):
resp, o = self._getline()
if self._debugging > 1: print '*resp*', repr(resp)
c = resp[:1]
if c != '+':
raise error_proto(resp)
return resp
# Internal: get a response plus following text from the server.
def _getlongresp(self):
resp = self._getresp()
list = []; octets = 0
line, o = self._getline()
while line != '.':
if line[:2] == '..':
o = o-1
line = line[1:]
octets = octets + o
list.append(line)
line, o = self._getline()
return resp, list, octets
# Internal: send a command and get the response
def _shortcmd(self, line):
self._putcmd(line)
return self._getresp()
# Internal: send a command and get the response plus following text
def _longcmd(self, line):
self._putcmd(line)
return self._getlongresp()
# These can be useful:
def getwelcome(self):
return self.welcome
def set_debuglevel(self, level):
self._debugging = level
# Here are all the POP commands:
def user(self, user):
"""Send user name, return response
(should indicate password required).
"""
return self._shortcmd('USER %s' % user)
def pass_(self, pswd):
"""Send password, return response
(response includes message count, mailbox size).
NB: mailbox is locked by server from here to 'quit()'
"""
return self._shortcmd('PASS %s' % pswd)
def stat(self):
"""Get mailbox status.
Result is tuple of 2 ints (message count, mailbox size)
"""
retval = self._shortcmd('STAT')
rets = retval.split()
if self._debugging: print '*stat*', repr(rets)
numMessages = int(rets[1])
sizeMessages = int(rets[2])
return (numMessages, sizeMessages)
def list(self, which=None):
"""Request listing, return result.
Result without a message number argument is in form
['response', ['mesg_num octets', ...], octets].
Result when a message number argument is given is a
single response: the "scan listing" for that message.
"""
if which is not None:
return self._shortcmd('LIST %s' % which)
return self._longcmd('LIST')
def retr(self, which):
"""Retrieve whole message number 'which'.
Result is in form ['response', ['line', ...], octets].
"""
return self._longcmd('RETR %s' % which)
def dele(self, which):
"""Delete message number 'which'.
Result is 'response'.
"""
return self._shortcmd('DELE %s' % which)
def noop(self):
"""Does nothing.
One supposes the response indicates the server is alive.
"""
return self._shortcmd('NOOP')
def rset(self):
"""Unmark all messages marked for deletion."""
return self._shortcmd('RSET')
def quit(self):
"""Signoff: commit changes on server, unlock mailbox, close connection."""
try:
resp = self._shortcmd('QUIT')
except error_proto, val:
resp = val
self.file.close()
self.sock.close()
del self.file, self.sock
return resp
#__del__ = quit
# optional commands:
def rpop(self, user):
"""Not sure what this does."""
return self._shortcmd('RPOP %s' % user)
timestamp = re.compile(br'\+OK.[^<]*(<.*>)')
def apop(self, user, secret):
"""Authorisation
- only possible if server has supplied a timestamp in initial greeting.
Args:
user - mailbox user;
secret - secret shared between client and server.
NB: mailbox is locked by server from here to 'quit()'
"""
m = self.timestamp.match(self.welcome)
if not m:
raise error_proto('-ERR APOP not supported by server')
import hashlib
digest = hashlib.md5(m.group(1)+secret).digest()
digest = ''.join(map(lambda x:'%02x'%ord(x), digest))
return self._shortcmd('APOP %s %s' % (user, digest))
def top(self, which, howmuch):
"""Retrieve message header of message number 'which'
and first 'howmuch' lines of message body.
Result is in form ['response', ['line', ...], octets].
"""
return self._longcmd('TOP %s %s' % (which, howmuch))
def uidl(self, which=None):
"""Return message digest (unique id) list.
If 'which', result contains unique id for that message
in the form 'response mesgnum uid', otherwise result is
the list ['response', ['mesgnum uid', ...], octets]
"""
if which is not None:
return self._shortcmd('UIDL %s' % which)
return self._longcmd('UIDL')
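# Minimal POP3 session sketch; the hostname and credentials below are
# placeholders, not real values:
#
#   M = POP3('pop.example.com')
#   M.user('someuser')
#   M.pass_('secret')
#   numMsgs, totalSize = M.stat()
#   for i in range(1, numMsgs + 1):
#       for line in M.retr(i)[1]:
#           print line
#   M.quit()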
try:
import ssl
except ImportError:
pass
else:
class POP3_SSL(POP3):
"""POP3 client class over SSL connection
Instantiate with: POP3_SSL(hostname, port=995, keyfile=None, certfile=None)
hostname - the hostname of the pop3 over ssl server
port - port number
keyfile - PEM formatted file that contains your private key
certfile - PEM formatted certificate chain file
See the methods of the parent class POP3 for more documentation.
"""
def __init__(self, host, port = POP3_SSL_PORT, keyfile = None, certfile = None):
self.host = host
self.port = port
self.keyfile = keyfile
self.certfile = certfile
self.buffer = ""
msg = "getaddrinfo returns an empty list"
self.sock = None
for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
try:
self.sock = socket.socket(af, socktype, proto)
self.sock.connect(sa)
except socket.error, msg:
if self.sock:
self.sock.close()
self.sock = None
continue
break
if not self.sock:
raise socket.error, msg
self.file = self.sock.makefile('rb')
self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
self._debugging = 0
self.welcome = self._getresp()
def _fillBuffer(self):
localbuf = self.sslobj.read()
if len(localbuf) == 0:
raise error_proto('-ERR EOF')
self.buffer += localbuf
def _getline(self):
line = ""
renewline = re.compile(r'.*?\n')
match = renewline.match(self.buffer)
while not match:
self._fillBuffer()
if len(self.buffer) > _MAXLINE:
raise error_proto('line too long')
match = renewline.match(self.buffer)
line = match.group(0)
self.buffer = renewline.sub('', self.buffer, 1)
if self._debugging > 1: print '*get*', repr(line)
octets = len(line)
if line[-2:] == CRLF:
return line[:-2], octets
if line[0] == CR:
return line[1:-1], octets
return line[:-1], octets
def _putline(self, line):
if self._debugging > 1: print '*put*', repr(line)
line += CRLF
bytes = len(line)
while bytes > 0:
sent = self.sslobj.write(line)
if sent == bytes:
break # avoid copy
line = line[sent:]
bytes = bytes - sent
def quit(self):
"""Signoff: commit changes on server, unlock mailbox, close connection."""
try:
resp = self._shortcmd('QUIT')
except error_proto, val:
resp = val
self.sock.close()
del self.sslobj, self.sock
return resp
__all__.append("POP3_SSL")
if __name__ == "__main__":
import sys
a = POP3(sys.argv[1])
print a.getwelcome()
a.user(sys.argv[2])
a.pass_(sys.argv[3])
a.list()
(numMsgs, totalSize) = a.stat()
for i in range(1, numMsgs + 1):
(header, msg, octets) = a.retr(i)
print "Message %d:" % i
for line in msg:
print ' ' + line
print '-----------------------'
a.quit()
"""Extended file operations available in POSIX.
f = posixfile.open(filename, [mode, [bufsize]])
will create a new posixfile object
f = posixfile.fileopen(fileobject)
will create a posixfile object from a builtin file object
f.file()
will return the original builtin file object
f.dup()
will return a new file object based on a new filedescriptor
f.dup2(fd)
will return a new file object based on the given filedescriptor
f.flags(mode)
will turn on the associated flag (merge)
mode can contain the following characters:
(character representing a flag)
a append only flag
c close on exec flag
n no delay flag
s synchronization flag
(modifiers)
! turn flags 'off' instead of default 'on'
= copy flags 'as is' instead of default 'merge'
? return a string in which the characters represent the flags
that are set
note: - the '!' and '=' modifiers are mutually exclusive.
- the '?' modifier will return the status of the flags after they
have been changed by other characters in the mode string
f.lock(mode [, len [, start [, whence]]])
will (un)lock a region
mode can contain the following characters:
(character representing type of lock)
u unlock
r read lock
w write lock
(modifiers)
| wait until the lock can be granted
? return the first lock conflicting with the requested lock
or 'None' if there is no conflict. The lock returned is in the
format (mode, len, start, whence, pid) where mode is a
character representing the type of lock ('r' or 'w')
note: - the '?' modifier prevents a region from being locked; it is
query only
"""
import warnings
warnings.warn("The posixfile module is deprecated; "
"fcntl.lockf() provides better locking", DeprecationWarning, 2)
class _posixfile_:
"""File wrapper class that provides extra POSIX file routines."""
states = ['open', 'closed']
#
# Internal routines
#
def __repr__(self):
file = self._file_
return "<%s posixfile '%s', mode '%s' at %s>" % \
(self.states[file.closed], file.name, file.mode, \
hex(id(self))[2:])
#
# Initialization routines
#
def open(self, name, mode='r', bufsize=-1):
import __builtin__
return self.fileopen(__builtin__.open(name, mode, bufsize))
def fileopen(self, file):
import types
if repr(type(file)) != "<type 'file'>":
raise TypeError, 'posixfile.fileopen() arg must be file object'
self._file_ = file
# Copy basic file methods
for maybemethod in dir(file):
if not maybemethod.startswith('_'):
attr = getattr(file, maybemethod)
if isinstance(attr, types.BuiltinMethodType):
setattr(self, maybemethod, attr)
return self
#
# New methods
#
def file(self):
return self._file_
def dup(self):
import posix
if not hasattr(posix, 'fdopen'):
raise AttributeError, 'dup() method unavailable'
return posix.fdopen(posix.dup(self._file_.fileno()), self._file_.mode)
def dup2(self, fd):
import posix
if not hasattr(posix, 'fdopen'):
raise AttributeError, 'dup() method unavailable'
posix.dup2(self._file_.fileno(), fd)
return posix.fdopen(fd, self._file_.mode)
def flags(self, *which):
import fcntl, os
if which:
if len(which) > 1:
raise TypeError, 'Too many arguments'
which = which[0]
else: which = '?'
l_flags = 0
if 'n' in which: l_flags = l_flags | os.O_NDELAY
if 'a' in which: l_flags = l_flags | os.O_APPEND
if 's' in which: l_flags = l_flags | os.O_SYNC
file = self._file_
if '=' not in which:
cur_fl = fcntl.fcntl(file.fileno(), fcntl.F_GETFL, 0)
if '!' in which: l_flags = cur_fl & ~ l_flags
else: l_flags = cur_fl | l_flags
l_flags = fcntl.fcntl(file.fileno(), fcntl.F_SETFL, l_flags)
if 'c' in which:
arg = ('!' not in which) # 0 is don't, 1 is do close on exec
l_flags = fcntl.fcntl(file.fileno(), fcntl.F_SETFD, arg)
if '?' in which:
which = '' # Return current flags
l_flags = fcntl.fcntl(file.fileno(), fcntl.F_GETFL, 0)
if os.O_APPEND & l_flags: which = which + 'a'
if fcntl.fcntl(file.fileno(), fcntl.F_GETFD, 0) & 1:
which = which + 'c'
if os.O_NDELAY & l_flags: which = which + 'n'
if os.O_SYNC & l_flags: which = which + 's'
return which
def lock(self, how, *args):
import struct, fcntl
if 'w' in how: l_type = fcntl.F_WRLCK
elif 'r' in how: l_type = fcntl.F_RDLCK
elif 'u' in how: l_type = fcntl.F_UNLCK
else: raise TypeError, 'no type of lock specified'
if '|' in how: cmd = fcntl.F_SETLKW
elif '?' in how: cmd = fcntl.F_GETLK
else: cmd = fcntl.F_SETLK
l_whence = 0
l_start = 0
l_len = 0
if len(args) == 1:
l_len = args[0]
elif len(args) == 2:
l_len, l_start = args
elif len(args) == 3:
l_len, l_start, l_whence = args
elif len(args) > 3:
raise TypeError, 'too many arguments'
# Hack by [email protected] to get locking to go on freebsd;
# additions for AIX by [email protected]
import sys, os
if sys.platform in ('netbsd1',
'openbsd2',
'freebsd2', 'freebsd3', 'freebsd4', 'freebsd5',
'freebsd6', 'freebsd7', 'freebsd8',
'bsdos2', 'bsdos3', 'bsdos4'):
flock = struct.pack('lxxxxlxxxxlhh', \
l_start, l_len, os.getpid(), l_type, l_whence)
elif sys.platform in ('aix3', 'aix4'):
flock = struct.pack('hhlllii', \
l_type, l_whence, l_start, l_len, 0, 0, 0)
else:
flock = struct.pack('hhllhh', \
l_type, l_whence, l_start, l_len, 0, 0)
flock = fcntl.fcntl(self._file_.fileno(), cmd, flock)
if '?' in how:
if sys.platform in ('netbsd1',
'openbsd2',
'freebsd2', 'freebsd3', 'freebsd4', 'freebsd5',
'bsdos2', 'bsdos3', 'bsdos4'):
l_start, l_len, l_pid, l_type, l_whence = \
struct.unpack('lxxxxlxxxxlhh', flock)
elif sys.platform in ('aix3', 'aix4'):
l_type, l_whence, l_start, l_len, l_sysid, l_pid, l_vfs = \
struct.unpack('hhlllii', flock)
elif sys.platform == "linux2":
l_type, l_whence, l_start, l_len, l_pid, l_sysid = \
struct.unpack('hhllhh', flock)
else:
l_type, l_whence, l_start, l_len, l_sysid, l_pid = \
struct.unpack('hhllhh', flock)
if l_type != fcntl.F_UNLCK:
if l_type == fcntl.F_RDLCK:
return 'r', l_len, l_start, l_whence, l_pid
else:
return 'w', l_len, l_start, l_whence, l_pid
def open(name, mode='r', bufsize=-1):
"""Public routine to open a file as a posixfile object."""
return _posixfile_().open(name, mode, bufsize)
def fileopen(file):
"""Public routine to get a posixfile object from a Python file object."""
return _posixfile_().fileopen(file)
#
# Constants
#
SEEK_SET = 0
SEEK_CUR = 1
SEEK_END = 2
#
# End of posixfile.py
#
"""Common operations on Posix pathnames.
Instead of importing this module directly, import os and refer to
this module as os.path. The "os.path" name is an alias for this
module on Posix systems; on other systems (e.g. Mac, Windows),
os.path provides the same operations in a manner specific to that
platform, and is an alias to another module (e.g. macpath, ntpath).
Some of this can actually be useful on non-Posix systems too, e.g.
for manipulation of the pathname component of URLs.
"""
import os
import sys
import stat
import genericpath
import warnings
from genericpath import *
from genericpath import _unicode
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime","islink","exists","lexists","isdir","isfile",
"ismount","walk","expanduser","expandvars","normpath","abspath",
"samefile","sameopenfile","samestat",
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
"devnull","realpath","supports_unicode_filenames","relpath"]
# strings representing various path-related bits and pieces
curdir = '.'
pardir = '..'
extsep = '.'
sep = '/'
pathsep = ':'
defpath = ':/bin:/usr/bin'
altsep = None
devnull = '/dev/null'
# Normalize the case of a pathname. Trivial in Posix, string.lower on Mac.
# On MS-DOS this may also turn slashes into backslashes; however, other
# normalizations (such as optimizing '../' away) are not allowed
# (another function should be defined to do that).
def normcase(s):
"""Normalize case of pathname. Has no effect under Posix"""
return s
# Return whether a path is absolute.
# Trivial in Posix, harder on the Mac or MS-DOS.
def isabs(s):
"""Test whether a path is absolute"""
return s.startswith('/')
# Join pathnames.
# Ignore the previous parts if a part is absolute.
# Insert a '/' unless the first part is empty or already ends in '/'.
def join(a, *p):
"""Join two or more pathname components, inserting '/' as needed.
If any component is an absolute path, all previous path components
will be discarded. An empty last part will result in a path that
ends with a separator."""
path = a
for b in p:
if b.startswith('/'):
path = b
elif path == '' or path.endswith('/'):
path += b
else:
path += '/' + b
return path
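# For example:
#   join('/usr', 'lib', 'python2.7')  ->  '/usr/lib/python2.7'
#   join('/usr', '/lib')              ->  '/lib'  (absolute part discards the rest)
#   join('usr', '')                   ->  'usr/'  (empty last part keeps the separator)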
# Split a path in head (everything up to the last '/') and tail (the
# rest). If the path ends in '/', tail will be empty. If there is no
# '/' in the path, head will be empty.
# Trailing '/'es are stripped from head unless it is the root.
def split(p):
"""Split a pathname. Returns tuple "(head, tail)" where "tail" is
everything after the final slash. Either part may be empty."""
i = p.rfind('/') + 1
head, tail = p[:i], p[i:]
if head and head != '/'*len(head):
head = head.rstrip('/')
return head, tail
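# For example:
#   split('/usr/lib/python')  ->  ('/usr/lib', 'python')
#   split('/usr/lib/')        ->  ('/usr/lib', '')
#   split('setup.py')         ->  ('', 'setup.py')
#   split('/')                ->  ('/', '')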
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
def splitext(p):
return genericpath._splitext(p, sep, altsep, extsep)
splitext.__doc__ = genericpath._splitext.__doc__
# Split a pathname into a drive specification and the rest of the
# path. Useful on DOS/Windows/NT; on Unix, the drive is always empty.
def splitdrive(p):
"""Split a pathname into drive and path. On Posix, drive is always
empty."""
return '', p
# Return the tail (basename) part of a path, same as split(path)[1].
def basename(p):
"""Returns the final component of a pathname"""
i = p.rfind('/') + 1
return p[i:]
# Return the head (dirname) part of a path, same as split(path)[0].
def dirname(p):
"""Returns the directory component of a pathname"""
i = p.rfind('/') + 1
head = p[:i]
if head and head != '/'*len(head):
head = head.rstrip('/')
return head
# Is a path a symbolic link?
# This will always return false on systems where os.lstat doesn't exist.
def islink(path):
"""Test whether a path is a symbolic link"""
try:
st = os.lstat(path)
except (os.error, AttributeError):
return False
return stat.S_ISLNK(st.st_mode)
# Being true for dangling symbolic links is also useful.
def lexists(path):
"""Test whether a path exists. Returns True for broken symbolic links"""
try:
os.lstat(path)
except os.error:
return False
return True
# Are two filenames really pointing to the same file?
def samefile(f1, f2):
"""Test whether two pathnames reference the same actual file"""
s1 = os.stat(f1)
s2 = os.stat(f2)
return samestat(s1, s2)
# Are two open files really referencing the same file?
# (Not necessarily the same file descriptor!)
def sameopenfile(fp1, fp2):
"""Test whether two open file objects reference the same file"""
s1 = os.fstat(fp1)
s2 = os.fstat(fp2)
return samestat(s1, s2)
# Are two stat buffers (obtained from stat, fstat or lstat)
# describing the same file?
def samestat(s1, s2):
"""Test whether two stat buffers reference the same file"""
return s1.st_ino == s2.st_ino and \
s1.st_dev == s2.st_dev
# Is a path a mount point?
# (Does this work for all UNIXes? Is it even guaranteed to work by Posix?)
def ismount(path):
"""Test whether a path is a mount point"""
if islink(path):
# A symlink can never be a mount point
return False
try:
s1 = os.lstat(path)
s2 = os.lstat(realpath(join(path, '..')))
except os.error:
return False # It doesn't exist -- so not a mount point :-)
dev1 = s1.st_dev
dev2 = s2.st_dev
if dev1 != dev2:
return True # path/.. on a different device as path
ino1 = s1.st_ino
ino2 = s2.st_ino
if ino1 == ino2:
return True # path/.. is the same i-node as path
return False
# Directory tree walk.
# For each directory under top (including top itself, but excluding
# '.' and '..'), func(arg, dirname, filenames) is called, where
# dirname is the name of the directory and filenames is the list
# of files (and subdirectories etc.) in the directory.
# The func may modify the filenames list, to implement a filter,
# or to impose a different order of visiting.
def walk(top, func, arg):
"""Directory tree walk with callback function.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), call func(arg, dirname, fnames).
dirname is the name of the directory, and fnames a list of the names of
the files and subdirectories in dirname (excluding '.' and '..'). func
may modify the fnames list in-place (e.g. via del or slice assignment),
and walk will only recurse into the subdirectories whose names remain in
fnames; this can be used to implement a filter, or to impose a specific
order of visiting. No semantics are defined for, or required of, arg,
beyond that arg is always passed to func. It can be used, e.g., to pass
a filename pattern, or a mutable object designed to accumulate
statistics. Passing None for arg is common."""
warnings.warnpy3k("In 3.x, os.path.walk is removed in favor of os.walk.",
stacklevel=2)
try:
names = os.listdir(top)
except os.error:
return
func(arg, top, names)
for name in names:
name = join(top, name)
try:
st = os.lstat(name)
except os.error:
continue
if stat.S_ISDIR(st.st_mode):
walk(name, func, arg)
# Expand paths beginning with '~' or '~user'.
# '~' means $HOME; '~user' means that user's home directory.
# If the path doesn't begin with '~', or if the user or $HOME is unknown,
# the path is returned unchanged (leaving error reporting to whatever
# function is called with the expanded path as argument).
# See also module 'glob' for expansion of *, ? and [...] in pathnames.
# (A function should also be defined to do full *sh-style environment
# variable expansion.)
def expanduser(path):
"""Expand ~ and ~user constructions. If user or $HOME is unknown,
do nothing."""
if not path.startswith('~'):
return path
i = path.find('/', 1)
if i < 0:
i = len(path)
if i == 1:
if 'HOME' not in os.environ:
import pwd
try:
userhome = pwd.getpwuid(os.getuid()).pw_dir
except KeyError:
# bpo-10496: if the current user identifier doesn't exist in the
# password database, return the path unchanged
return path
else:
userhome = os.environ['HOME']
else:
import pwd
try:
pwent = pwd.getpwnam(path[1:i])
except KeyError:
# bpo-10496: if the user name from the path doesn't exist in the
# password database, return the path unchanged
return path
userhome = pwent.pw_dir
userhome = userhome.rstrip('/')
return (userhome + path[i:]) or '/'
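# For example (assuming $HOME is '/home/guido' and user 'news' has the
# home directory '/var/spool/news'):
#   expanduser('~/bin')        ->  '/home/guido/bin'
#   expanduser('~news/run')    ->  '/var/spool/news/run'
#   expanduser('~nosuchuser')  ->  '~nosuchuser'  (unknown user: unchanged)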
# Expand paths containing shell variable substitutions.
# This expands the forms $variable and ${variable} only.
# Non-existent variables are left unchanged.
_varprog = None
_uvarprog = None
def expandvars(path):
"""Expand shell variables of form $var and ${var}. Unknown variables
are left unchanged."""
global _varprog, _uvarprog
if '$' not in path:
return path
if isinstance(path, _unicode):
if not _uvarprog:
import re
_uvarprog = re.compile(ur'\$(\w+|\{[^}]*\})', re.UNICODE)
varprog = _uvarprog
encoding = sys.getfilesystemencoding()
else:
if not _varprog:
import re
_varprog = re.compile(r'\$(\w+|\{[^}]*\})')
varprog = _varprog
encoding = None
i = 0
while True:
m = varprog.search(path, i)
if not m:
break
i, j = m.span(0)
name = m.group(1)
if name.startswith('{') and name.endswith('}'):
name = name[1:-1]
if encoding:
name = name.encode(encoding)
if name in os.environ:
tail = path[j:]
value = os.environ[name]
if encoding:
value = value.decode(encoding)
path = path[:i] + value
i = len(path)
path += tail
else:
i = j
return path
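# For example (assuming $HOME is '/home/guido' and $UNDEFINED is unset):
#   expandvars('$HOME/bin')     ->  '/home/guido/bin'
#   expandvars('${HOME}/bin')   ->  '/home/guido/bin'
#   expandvars('$UNDEFINED/x')  ->  '$UNDEFINED/x'  (left unchanged)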
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A/B.
# It should be understood that this may change the meaning of the path
# if it contains symbolic links!
def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
# Preserve unicode (if path is unicode)
slash, dot = (u'/', u'.') if isinstance(path, _unicode) else ('/', '.')
if path == '':
return dot
initial_slashes = path.startswith('/')
# POSIX allows one or two initial slashes, but treats three or more
# as single slash.
if (initial_slashes and
path.startswith('//') and not path.startswith('///')):
initial_slashes = 2
comps = path.split('/')
new_comps = []
for comp in comps:
if comp in ('', '.'):
continue
if (comp != '..' or (not initial_slashes and not new_comps) or
(new_comps and new_comps[-1] == '..')):
new_comps.append(comp)
elif new_comps:
new_comps.pop()
comps = new_comps
path = slash.join(comps)
if initial_slashes:
path = slash*initial_slashes + path
return path or dot
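# For example:
#   normpath('A//B/./C/../D')  ->  'A/B/D'
#   normpath('//usr')          ->  '//usr'  (two leading slashes are kept)
#   normpath('///usr')         ->  '/usr'   (three or more collapse to one)
#   normpath('')               ->  '.'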
def abspath(path):
"""Return an absolute path."""
if not isabs(path):
if isinstance(path, _unicode):
cwd = os.getcwdu()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
# Return a canonical path (i.e. the absolute location of a file on the
# filesystem).
def realpath(filename):
"""Return the canonical path of the specified filename, eliminating any
symbolic links encountered in the path."""
path, ok = _joinrealpath('', filename, {})
return abspath(path)
# Join two paths, normalizing and eliminating any symbolic links
# encountered in the second path.
def _joinrealpath(path, rest, seen):
if isabs(rest):
rest = rest[1:]
path = sep
while rest:
name, _, rest = rest.partition(sep)
if not name or name == curdir:
# current dir
continue
if name == pardir:
# parent dir
if path:
path, name = split(path)
if name == pardir:
path = join(path, pardir, pardir)
else:
path = pardir
continue
newpath = join(path, name)
if not islink(newpath):
path = newpath
continue
# Resolve the symbolic link
if newpath in seen:
# Already seen this path
path = seen[newpath]
if path is not None:
# use cached value
continue
# The symlink is not resolved, so we must have a symlink loop.
# Return already resolved part + rest of the path unchanged.
return join(newpath, rest), False
seen[newpath] = None # not resolved symlink
path, ok = _joinrealpath(path, os.readlink(newpath), seen)
if not ok:
return join(path, rest), False
seen[newpath] = path # resolved symlink
return path, True
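# For example, if '/tmp/link' is a symbolic link to '/etc', then
#   realpath('/tmp/link/passwd')  ->  '/etc/passwd'
# A symlink loop (e.g. 'loop' -> 'loop') is returned unresolved
# rather than recursing forever.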
supports_unicode_filenames = (sys.platform == 'darwin')
def relpath(path, start=curdir):
"""Return a relative version of a path"""
if not path:
raise ValueError("no path specified")
start_list = [x for x in abspath(start).split(sep) if x]
path_list = [x for x in abspath(path).split(sep) if x]
# Work out how much of the filepath is shared by start and path.
i = len(commonprefix([start_list, path_list]))
rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
if not rel_list:
return curdir
return join(*rel_list)
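# For example:
#   relpath('/a/b/c', '/a/d')  ->  '../b/c'
#   relpath('/a/b', '/a/b')    ->  '.'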
# Author: Fred L. Drake, Jr.
# [email protected]
#
# This is a simple little module I wrote to make life easier. I didn't
# see anything quite like it in the library, though I may have overlooked
# something. I wrote this when I was trying to read some heavily nested
# tuples with fairly non-descriptive content. This is modeled very much
# after Lisp/Scheme - style pretty-printing of lists. If you find it
# useful, thank small children who sleep at night.
"""Support to pretty-print lists, tuples, & dictionaries recursively.
Very simple, but useful, especially in debugging data structures.
Classes
-------
PrettyPrinter()
Handle pretty-printing operations onto a stream using a configured
set of formatting parameters.
Functions
---------
pformat()
Format a Python object into a pretty-printed representation.
pprint()
Pretty-print a Python object to a stream [default is sys.stdout].
saferepr()
Generate a 'standard' repr()-like value, but protect against recursive
data structures.
"""
import sys as _sys
import warnings
try:
from cStringIO import StringIO as _StringIO
except ImportError:
from StringIO import StringIO as _StringIO
__all__ = ["pprint","pformat","isreadable","isrecursive","saferepr",
"PrettyPrinter"]
# cache these for faster access:
_commajoin = ", ".join
_id = id
_len = len
_type = type
def pprint(object, stream=None, indent=1, width=80, depth=None):
"""Pretty-print a Python object to a stream [default is sys.stdout]."""
printer = PrettyPrinter(
stream=stream, indent=indent, width=width, depth=depth)
printer.pprint(object)
def pformat(object, indent=1, width=80, depth=None):
"""Format a Python object into a pretty-printed representation."""
return PrettyPrinter(indent=indent, width=width, depth=depth).pformat(object)
def saferepr(object):
"""Version of repr() which can handle recursive data structures."""
return _safe_repr(object, {}, None, 0)[0]
def isreadable(object):
"""Determine if saferepr(object) is readable by eval()."""
return _safe_repr(object, {}, None, 0)[1]
def isrecursive(object):
"""Determine if object requires a recursive representation."""
return _safe_repr(object, {}, None, 0)[2]
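# For example, a structure wider than the requested width is broken
# across lines, one entry per line:
#   >>> pprint({'a': [1, 2], 'b': {'c': 3}}, width=20)
#   {'a': [1, 2],
#    'b': {'c': 3}}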
def _sorted(iterable):
with warnings.catch_warnings():
if _sys.py3kwarning:
warnings.filterwarnings("ignore", "comparing unequal types "
"not supported", DeprecationWarning)
return sorted(iterable)
class PrettyPrinter:
def __init__(self, indent=1, width=80, depth=None, stream=None):
"""Handle pretty printing operations onto a stream using a set of
configured parameters.
indent
Number of spaces to indent for each level of nesting.
width
Attempted maximum number of columns in the output.
depth
The maximum depth to print out nested structures.
stream
The desired output stream. If omitted (or false), the standard
output stream available at construction will be used.
"""
indent = int(indent)
width = int(width)
assert indent >= 0, "indent must be >= 0"
assert depth is None or depth > 0, "depth must be > 0"
assert width, "width must be != 0"
self._depth = depth
self._indent_per_level = indent
self._width = width
if stream is not None:
self._stream = stream
else:
self._stream = _sys.stdout
def pprint(self, object):
self._format(object, self._stream, 0, 0, {}, 0)
self._stream.write("\n")
def pformat(self, object):
sio = _StringIO()
self._format(object, sio, 0, 0, {}, 0)
return sio.getvalue()
def isrecursive(self, object):
return self.format(object, {}, 0, 0)[2]
def isreadable(self, object):
s, readable, recursive = self.format(object, {}, 0, 0)
return readable and not recursive
def _format(self, object, stream, indent, allowance, context, level):
level = level + 1
objid = _id(object)
if objid in context:
stream.write(_recursion(object))
self._recursive = True
self._readable = False
return
rep = self._repr(object, context, level - 1)
typ = _type(object)
sepLines = _len(rep) > (self._width - 1 - indent - allowance)
write = stream.write
if self._depth and level > self._depth:
write(rep)
return
r = getattr(typ, "__repr__", None)
if issubclass(typ, dict) and r is dict.__repr__:
write('{')
if self._indent_per_level > 1:
write((self._indent_per_level - 1) * ' ')
length = _len(object)
if length:
context[objid] = 1
indent = indent + self._indent_per_level
items = _sorted(object.items())
key, ent = items[0]
rep = self._repr(key, context, level)
write(rep)
write(': ')
self._format(ent, stream, indent + _len(rep) + 2,
allowance + 1, context, level)
if length > 1:
for key, ent in items[1:]:
rep = self._repr(key, context, level)
if sepLines:
write(',\n%s%s: ' % (' '*indent, rep))
else:
write(', %s: ' % rep)
self._format(ent, stream, indent + _len(rep) + 2,
allowance + 1, context, level)
indent = indent - self._indent_per_level
del context[objid]
write('}')
return
if ((issubclass(typ, list) and r is list.__repr__) or
(issubclass(typ, tuple) and r is tuple.__repr__) or
(issubclass(typ, set) and r is set.__repr__) or
(issubclass(typ, frozenset) and r is frozenset.__repr__)
):
length = _len(object)
if issubclass(typ, list):
write('[')
endchar = ']'
elif issubclass(typ, tuple):
write('(')
endchar = ')'
else:
if not length:
write(rep)
return
write(typ.__name__)
write('([')
endchar = '])'
indent += len(typ.__name__) + 1
object = _sorted(object)
if self._indent_per_level > 1 and sepLines:
write((self._indent_per_level - 1) * ' ')
if length:
context[objid] = 1
indent = indent + self._indent_per_level
self._format(object[0], stream, indent, allowance + 1,
context, level)
if length > 1:
for ent in object[1:]:
if sepLines:
write(',\n' + ' '*indent)
else:
write(', ')
self._format(ent, stream, indent,
allowance + 1, context, level)
indent = indent - self._indent_per_level
del context[objid]
if issubclass(typ, tuple) and length == 1:
write(',')
write(endchar)
return
write(rep)
def _repr(self, object, context, level):
repr, readable, recursive = self.format(object, context.copy(),
self._depth, level)
if not readable:
self._readable = False
if recursive:
self._recursive = True
return repr
def format(self, object, context, maxlevels, level):
"""Format object for a specific context, returning a string
and flags indicating whether the representation is 'readable'
and whether the object represents a recursive construct.
"""
return _safe_repr(object, context, maxlevels, level)
# Return triple (repr_string, isreadable, isrecursive).
def _safe_repr(object, context, maxlevels, level):
typ = _type(object)
if typ is str:
if 'locale' not in _sys.modules:
return repr(object), True, False
if "'" in object and '"' not in object:
closure = '"'
quotes = {'"': '\\"'}
else:
closure = "'"
quotes = {"'": "\\'"}
qget = quotes.get
sio = _StringIO()
write = sio.write
for char in object:
if char.isalpha():
write(char)
else:
write(qget(char, repr(char)[1:-1]))
return ("%s%s%s" % (closure, sio.getvalue(), closure)), True, False
r = getattr(typ, "__repr__", None)
if issubclass(typ, dict) and r is dict.__repr__:
if not object:
return "{}", True, False
objid = _id(object)
if maxlevels and level >= maxlevels:
return "{...}", False, objid in context
if objid in context:
return _recursion(object), False, True
context[objid] = 1
readable = True
recursive = False
components = []
append = components.append
level += 1
saferepr = _safe_repr
for k, v in _sorted(object.items()):
krepr, kreadable, krecur = saferepr(k, context, maxlevels, level)
vrepr, vreadable, vrecur = saferepr(v, context, maxlevels, level)
append("%s: %s" % (krepr, vrepr))
readable = readable and kreadable and vreadable
if krecur or vrecur:
recursive = True
del context[objid]
return "{%s}" % _commajoin(components), readable, recursive
if (issubclass(typ, list) and r is list.__repr__) or \
(issubclass(typ, tuple) and r is tuple.__repr__):
if issubclass(typ, list):
if not object:
return "[]", True, False
format = "[%s]"
elif _len(object) == 1:
format = "(%s,)"
else:
if not object:
return "()", True, False
format = "(%s)"
objid = _id(object)
if maxlevels and level >= maxlevels:
return format % "...", False, objid in context
if objid in context:
return _recursion(object), False, True
context[objid] = 1
readable = True
recursive = False
components = []
append = components.append
level += 1
for o in object:
orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level)
append(orepr)
if not oreadable:
readable = False
if orecur:
recursive = True
del context[objid]
return format % _commajoin(components), readable, recursive
rep = repr(object)
return rep, (rep and not rep.startswith('<')), False
def _recursion(object):
return ("<Recursion on %s with id=%s>"
% (_type(object).__name__, _id(object)))
def _perfcheck(object=None):
import time
if object is None:
object = [("string", (1, 2), [3, 4], {5: 6, 7: 8})] * 100000
p = PrettyPrinter()
t1 = time.time()
_safe_repr(object, {}, None, 0)
t2 = time.time()
p.pformat(object)
t3 = time.time()
print "_safe_repr:", t2 - t1
print "pformat:", t3 - t2
if __name__ == "__main__":
_perfcheck()
#! /usr/bin/env python
#
# Class for profiling python code. rev 1.0 6/2/94
#
# Written by James Roskind
# Based on prior profile module by Sjoerd Mullender...
# which was hacked somewhat by: Guido van Rossum
"""Class for profiling Python code."""
# Copyright Disney Enterprises, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language
# governing permissions and limitations under the License.
import sys
import os
import time
import marshal
from optparse import OptionParser
__all__ = ["run", "runctx", "help", "Profile"]
# Sample timer for use with the profiler (kept commented out):
#i_count = 0
#def integer_timer():
# global i_count
# i_count = i_count + 1
# return i_count
#itimes = integer_timer # replace with C coded timer returning integers
#**************************************************************************
# The following are the static member functions for the profiler class
# Note that an instance of Profile() is *not* needed to call them.
#**************************************************************************
def run(statement, filename=None, sort=-1):
"""Run statement under profiler optionally saving results in filename
This function takes a single argument that can be passed to the
"exec" statement, and an optional file name. In all cases this
routine attempts to "exec" its first argument and gather profiling
statistics from the execution. If no file name is present, then this
function automatically prints a simple profiling report, sorted by the
standard name string (file/line/function-name) that is presented in
each line.
"""
prof = Profile()
try:
prof = prof.run(statement)
except SystemExit:
pass
if filename is not None:
prof.dump_stats(filename)
else:
return prof.print_stats(sort)
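# A minimal sketch of typical use: profile a statement and print the
# report sorted by internal time.
#   import profile
#   profile.run('sum(range(100000))', sort='time')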
def runctx(statement, globals, locals, filename=None, sort=-1):
"""Run statement under profiler, supplying your own globals and locals,
optionally saving results in filename.
statement and filename have the same semantics as profile.run
"""
prof = Profile()
try:
prof = prof.runctx(statement, globals, locals)
except SystemExit:
pass
if filename is not None:
prof.dump_stats(filename)
else:
return prof.print_stats(sort)
# Backwards compatibility.
def help():
print "Documentation for the profile module can be found "
print "in the Python Library Reference, section 'The Python Profiler'."
if hasattr(os, "times"):
def _get_time_times(timer=os.times):
t = timer()
return t[0] + t[1]
# Using getrusage(3) is better than clock(3) if available:
# on some systems (e.g. FreeBSD), getrusage has a higher resolution
# Furthermore, on a POSIX system, clock(3) returns microseconds, which
# wrap around after about 36 minutes.
_has_res = 0
try:
import resource
resgetrusage = lambda: resource.getrusage(resource.RUSAGE_SELF)
def _get_time_resource(timer=resgetrusage):
t = timer()
return t[0] + t[1]
_has_res = 1
except ImportError:
pass
class Profile:
"""Profiler class.
self.cur is always a tuple. Each such tuple corresponds to a stack
frame that is currently active (self.cur[-2]). The following are the
definitions of its members. We use this external "parallel stack" to
avoid contaminating the program that we are profiling. (The old profiler
used to write into the frame's local dictionary!) Derived classes
can change the definition of some entries, as long as they leave
[-2:] intact (frame and previous tuple). In case an internal error is
detected, the -3 element is used as the function name.
[ 0] = Time that needs to be charged to the parent frame's function.
It is used so that a function call will not have to access the
timing data for the parent frame.
[ 1] = Total time spent in this frame's function, excluding time in
subfunctions (this latter is tallied in cur[2]).
[ 2] = Total time spent in subfunctions, excluding time executing the
frame's function (this latter is tallied in cur[1]).
[-3] = Name of the function that corresponds to this frame.
[-2] = Actual frame that we correspond to (used to sync exception handling).
[-1] = Our parent 6-tuple (corresponds to frame.f_back).
Timing data for each function is stored as a 5-tuple in the dictionary
self.timings[]. The index is always the name stored in self.cur[-3].
The following are the definitions of the members:
[0] = The number of times this function was called, not counting direct
or indirect recursion,
[1] = Number of times this function appears on the stack, minus one
[2] = Total time spent internal to this function
[3] = Cumulative time that this function was present on the stack. In
non-recursive functions, this is the total execution time from start
to finish of each invocation of a function, including time spent in
all subfunctions.
[4] = A dictionary indicating for each function name, the number of times
it was called by us.
"""
bias = 0 # calibration constant
def __init__(self, timer=None, bias=None):
self.timings = {}
self.cur = None
self.cmd = ""
self.c_func_name = ""
if bias is None:
bias = self.bias
self.bias = bias # Materialize in local dict for lookup speed.
if not timer:
if _has_res:
self.timer = resgetrusage
self.dispatcher = self.trace_dispatch
self.get_time = _get_time_resource
elif hasattr(time, 'clock'):
self.timer = self.get_time = time.clock
self.dispatcher = self.trace_dispatch_i
elif hasattr(os, 'times'):
self.timer = os.times
self.dispatcher = self.trace_dispatch
self.get_time = _get_time_times
else:
self.timer = self.get_time = time.time
self.dispatcher = self.trace_dispatch_i
else:
self.timer = timer
t = self.timer() # test out timer function
try:
length = len(t)
except TypeError:
self.get_time = timer
self.dispatcher = self.trace_dispatch_i
else:
if length == 2:
self.dispatcher = self.trace_dispatch
else:
self.dispatcher = self.trace_dispatch_l
# This get_time() implementation needs to be defined
# here to capture the passed-in timer in the parameter
# list (for performance). Note that we can't assume
# the timer() result contains two values in all
# cases.
def get_time_timer(timer=timer, sum=sum):
return sum(timer())
self.get_time = get_time_timer
self.t = self.get_time()
self.simulate_call('profiler')
# Heavily optimized dispatch routine for os.times() timer
def trace_dispatch(self, frame, event, arg):
timer = self.timer
t = timer()
t = t[0] + t[1] - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame,t):
t = timer()
self.t = t[0] + t[1]
else:
r = timer()
self.t = r[0] + r[1] - t # put back unrecorded delta
# Dispatch routine for best timer program (return = scalar, fastest if
# an integer but float works too -- and time.clock() relies on that).
def trace_dispatch_i(self, frame, event, arg):
timer = self.timer
t = timer() - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = timer()
else:
self.t = timer() - t # put back unrecorded delta
# Dispatch routine for macintosh (timer returns time in ticks of
# 1/60th second)
def trace_dispatch_mac(self, frame, event, arg):
timer = self.timer
t = timer()/60.0 - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = timer()/60.0
else:
self.t = timer()/60.0 - t # put back unrecorded delta
# SLOW generic dispatch routine for timer returning lists of numbers
def trace_dispatch_l(self, frame, event, arg):
get_time = self.get_time
t = get_time() - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = get_time()
else:
self.t = get_time() - t # put back unrecorded delta
# In the event handlers, the first 3 elements of self.cur are unpacked
# into variables with 3-letter names. The last two characters are meant to be
# mnemonic:
# _pt self.cur[0] "parent time" time to be charged to parent frame
# _it self.cur[1] "internal time" time spent directly in the function
# _et self.cur[2] "external time" time spent in subfunctions
def trace_dispatch_exception(self, frame, t):
rpt, rit, ret, rfn, rframe, rcur = self.cur
if (rframe is not frame) and rcur:
return self.trace_dispatch_return(rframe, t)
self.cur = rpt, rit+t, ret, rfn, rframe, rcur
return 1
def trace_dispatch_call(self, frame, t):
if self.cur and frame.f_back is not self.cur[-2]:
rpt, rit, ret, rfn, rframe, rcur = self.cur
if not isinstance(rframe, Profile.fake_frame):
assert rframe.f_back is frame.f_back, ("Bad call", rfn,
rframe, rframe.f_back,
frame, frame.f_back)
self.trace_dispatch_return(rframe, 0)
assert (self.cur is None or \
frame.f_back is self.cur[-2]), ("Bad call",
self.cur[-3])
fcode = frame.f_code
fn = (fcode.co_filename, fcode.co_firstlineno, fcode.co_name)
self.cur = (t, 0, 0, fn, frame, self.cur)
timings = self.timings
if fn in timings:
cc, ns, tt, ct, callers = timings[fn]
timings[fn] = cc, ns + 1, tt, ct, callers
else:
timings[fn] = 0, 0, 0, 0, {}
return 1
def trace_dispatch_c_call (self, frame, t):
fn = ("", 0, self.c_func_name)
self.cur = (t, 0, 0, fn, frame, self.cur)
timings = self.timings
if fn in timings:
cc, ns, tt, ct, callers = timings[fn]
timings[fn] = cc, ns+1, tt, ct, callers
else:
timings[fn] = 0, 0, 0, 0, {}
return 1
def trace_dispatch_return(self, frame, t):
if frame is not self.cur[-2]:
assert frame is self.cur[-2].f_back, ("Bad return", self.cur[-3])
self.trace_dispatch_return(self.cur[-2], 0)
# Prefix "r" means part of the Returning or exiting frame.
# Prefix "p" means part of the Previous or Parent or older frame.
rpt, rit, ret, rfn, frame, rcur = self.cur
rit = rit + t
frame_total = rit + ret
ppt, pit, pet, pfn, pframe, pcur = rcur
self.cur = ppt, pit + rpt, pet + frame_total, pfn, pframe, pcur
timings = self.timings
cc, ns, tt, ct, callers = timings[rfn]
if not ns:
# This is the only occurrence of the function on the stack.
# Else this is a (directly or indirectly) recursive call, and
# its cumulative time will get updated when the topmost call to
# it returns.
ct = ct + frame_total
cc = cc + 1
if pfn in callers:
callers[pfn] = callers[pfn] + 1 # hack: gather more
# stats such as the amount of time added to ct courtesy
# of this specific call, and the contribution to cc
# courtesy of this call.
else:
callers[pfn] = 1
timings[rfn] = cc, ns - 1, tt + rit, ct, callers
return 1
dispatch = {
"call": trace_dispatch_call,
"exception": trace_dispatch_exception,
"return": trace_dispatch_return,
"c_call": trace_dispatch_c_call,
"c_exception": trace_dispatch_return, # the C function returned
"c_return": trace_dispatch_return,
}
# The next few functions play with self.cmd. By carefully preloading
# our parallel stack, we can force the profiled result to include
# an arbitrary string as the name of the calling function.
# We use self.cmd as that string, and the resulting stats look
# very nice :-).
def set_cmd(self, cmd):
if self.cur[-1]: return # already set
self.cmd = cmd
self.simulate_call(cmd)
class fake_code:
def __init__(self, filename, line, name):
self.co_filename = filename
self.co_line = line
self.co_name = name
self.co_firstlineno = 0
def __repr__(self):
return repr((self.co_filename, self.co_line, self.co_name))
class fake_frame:
def __init__(self, code, prior):
self.f_code = code
self.f_back = prior
def simulate_call(self, name):
code = self.fake_code('profile', 0, name)
if self.cur:
pframe = self.cur[-2]
else:
pframe = None
frame = self.fake_frame(code, pframe)
self.dispatch['call'](self, frame, 0)
# collect stats from pending stack, including getting final
# timings for self.cmd frame.
def simulate_cmd_complete(self):
get_time = self.get_time
t = get_time() - self.t
while self.cur[-1]:
# We *can* cause assertion errors here if
# dispatch_trace_return checks for a frame match!
self.dispatch['return'](self, self.cur[-2], t)
t = 0
self.t = get_time() - t
def print_stats(self, sort=-1):
import pstats
pstats.Stats(self).strip_dirs().sort_stats(sort). \
print_stats()
def dump_stats(self, file):
f = open(file, 'wb')
self.create_stats()
marshal.dump(self.stats, f)
f.close()
def create_stats(self):
self.simulate_cmd_complete()
self.snapshot_stats()
def snapshot_stats(self):
self.stats = {}
for func, (cc, ns, tt, ct, callers) in self.timings.iteritems():
callers = callers.copy()
nc = 0
for callcnt in callers.itervalues():
nc += callcnt
self.stats[func] = cc, nc, tt, ct, callers
# The following two methods can be called by clients to use
# a profiler to profile a statement, given as a string.
def run(self, cmd):
import __main__
dict = __main__.__dict__
return self.runctx(cmd, dict, dict)
def runctx(self, cmd, globals, locals):
self.set_cmd(cmd)
sys.setprofile(self.dispatcher)
try:
exec cmd in globals, locals
finally:
sys.setprofile(None)
return self
# This method is more useful to profile a single function call.
def runcall(self, func, *args, **kw):
self.set_cmd(repr(func))
sys.setprofile(self.dispatcher)
try:
return func(*args, **kw)
finally:
sys.setprofile(None)
#******************************************************************
# The following calculates the overhead for using a profiler. The
# problem is that it takes a fair amount of time for the profiler
# to stop the stopwatch (from the time it receives an event).
# Similarly, there is a delay from the time that the profiler
# re-starts the stopwatch before the user's code really gets to
# continue. The following code tries to measure the difference on
# a per-event basis.
#
# Note that this difference is only significant if there are a lot of
# events, and relatively little user code per event. For example,
# code with small functions will typically benefit from having the
# profiler calibrated for the current platform. This *could* be
# done on the fly during init() time, but it is not worth the
# effort. Also note that if too large a value is specified, then
# execution time on some functions will actually appear as a
# negative number. It is *normal* for some functions (with very
# low call counts) to have such negative stats, even if the
# calibration figure is "correct."
#
# One alternative to profile-time calibration adjustments (i.e.,
# adding in the magic little delta during each event) is to track
# more carefully the number of events (and cumulatively, the number
# of events during sub functions) that are seen. If this were
# done, then the arithmetic could be done after the fact (i.e., at
# display time). Currently, we track only call/return events.
# These values can be deduced by examining the callees and callers
# vectors for each function. Hence we *can* almost correct the
# internal time figure at print time (note that we currently don't
# track exception event processing counts). Unfortunately, there
# is currently no similar information for cumulative sub-function
# time. It would not be hard to "get all this info" at profiler
# time. Specifically, we would have to extend the tuples to keep
# counts of this in each frame, and then extend the defs of timing
# tuples to include the significant two figures. I'm a bit fearful
# that this additional feature will slow the heavily optimized
# event/time ratio (i.e., the profiler would run slower, for a very
# low "value added" feature.)
#**************************************************************
def calibrate(self, m, verbose=0):
if self.__class__ is not Profile:
raise TypeError("Subclasses must override .calibrate().")
saved_bias = self.bias
self.bias = 0
try:
return self._calibrate_inner(m, verbose)
finally:
self.bias = saved_bias
def _calibrate_inner(self, m, verbose):
get_time = self.get_time
# Set up a test case to be run with and without profiling. Include
# lots of calls, because we're trying to quantify stopwatch overhead.
# Do not raise any exceptions, though, because we want to know
# exactly how many profile events are generated (one call event, +
# one return event, per Python-level call).
def f1(n):
for i in range(n):
x = 1
def f(m, f1=f1):
for i in range(m):
f1(100)
f(m) # warm up the cache
# elapsed_noprofile <- time f(m) takes without profiling.
t0 = get_time()
f(m)
t1 = get_time()
elapsed_noprofile = t1 - t0
if verbose:
print "elapsed time without profiling =", elapsed_noprofile
# elapsed_profile <- time f(m) takes with profiling. The difference
# is profiling overhead, only some of which the profiler subtracts
# out on its own.
p = Profile()
t0 = get_time()
p.runctx('f(m)', globals(), locals())
t1 = get_time()
elapsed_profile = t1 - t0
if verbose:
print "elapsed time with profiling =", elapsed_profile
# reported_time <- "CPU seconds" the profiler charged to f and f1.
total_calls = 0.0
reported_time = 0.0
for (filename, line, funcname), (cc, ns, tt, ct, callers) in \
p.timings.items():
if funcname in ("f", "f1"):
total_calls += cc
reported_time += tt
if verbose:
print "'CPU seconds' profiler reported =", reported_time
print "total # calls =", total_calls
if total_calls != m + 1:
raise ValueError("internal error: total calls = %d" % total_calls)
# reported_time - elapsed_noprofile = overhead the profiler wasn't
# able to measure. Divide by twice the number of calls (since there
# are two profiler events per call in this test) to get the hidden
# overhead per event.
mean = (reported_time - elapsed_noprofile) / 2.0 / total_calls
if verbose:
print "mean stopwatch overhead per profile event =", mean
return mean
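# A typical calibration sketch; the constant is machine- and
# timer-specific, so it must be measured locally:
#   pr = Profile()
#   bias = pr.calibrate(10000)  # repeat a few times, pick a stable value
#   Profile.bias = bias         # applies to Profile objects created later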
#****************************************************************************
def Stats(*args):
print 'Report generating functions are in the "pstats" module\a'
def main():
usage = "profile.py [-o output_file_path] [-s sort] scriptfile [arg] ..."
parser = OptionParser(usage=usage)
parser.allow_interspersed_args = False
parser.add_option('-o', '--outfile', dest="outfile",
help="Save stats to <outfile>", default=None)
parser.add_option('-s', '--sort', dest="sort",
help="Sort order when printing to stdout, based on pstats.Stats class",
default=-1)
if not sys.argv[1:]:
parser.print_usage()
sys.exit(2)
(options, args) = parser.parse_args()
sys.argv[:] = args
if len(args) > 0:
progname = args[0]
sys.path.insert(0, os.path.dirname(progname))
with open(progname, 'rb') as fp:
code = compile(fp.read(), progname, 'exec')
globs = {
'__file__': progname,
'__name__': '__main__',
'__package__': None,
}
runctx(code, globs, None, options.outfile, options.sort)
else:
parser.print_usage()
return parser
# When invoked as main program, invoke the profiler on a script
if __name__ == '__main__':
main()
"""Class for printing reports on profiled python code."""
# Written by James Roskind
# Based on prior profile module by Sjoerd Mullender...
# which was hacked somewhat by: Guido van Rossum
# Copyright Disney Enterprises, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language
# governing permissions and limitations under the License.
import sys
import os
import time
import marshal
import re
from functools import cmp_to_key
__all__ = ["Stats"]
class Stats:
"""This class is used for creating reports from data generated by the
Profile class. It is a "friend" of that class, and imports data either
by direct access to members of Profile class, or by reading in a dictionary
that was emitted (via marshal) from the Profile class.
The big change from the previous Profiler (in terms of raw functionality)
is that an "add()" method has been provided to combine Stats from
several distinct profile runs. Both the constructor and the add()
method now take arbitrarily many file names as arguments.
All the print methods now take an argument that indicates how many lines
to print. If the arg is a floating point number between 0 and 1.0, then
it is taken as a decimal percentage of the available lines to be printed
(e.g., .1 means print 10% of all available lines). If it is an integer,
it is taken to mean the number of lines of data that you wish to have
printed.
The sort_stats() method now processes some additional options (i.e., in
addition to the old -1, 0, 1, or 2). It takes an arbitrary number of
quoted strings to select the sort order. For example sort_stats('time',
'name') sorts on the major key of 'internal function time', and on the
minor key of 'the name of the function'. Look at the two tables in
sort_stats() and get_sort_arg_defs(self) for more examples.
All methods return self, so you can string together commands like:
Stats('foo', 'goo').strip_dirs().sort_stats('calls').\
print_stats(5).print_callers(5)
"""
def __init__(self, *args, **kwds):
# I can't figure out how to explicitly specify a stream keyword arg
# with *args:
# def __init__(self, *args, stream=sys.stdout): ...
# so I use **kwds and squawk if something unexpected is passed in.
self.stream = sys.stdout
if "stream" in kwds:
self.stream = kwds["stream"]
del kwds["stream"]
if kwds:
keys = kwds.keys()
keys.sort()
extras = ", ".join(["%s=%s" % (k, kwds[k]) for k in keys])
raise ValueError, "unrecognized keyword args: %s" % extras
if not len(args):
arg = None
else:
arg = args[0]
args = args[1:]
self.init(arg)
self.add(*args)
def init(self, arg):
self.all_callees = None # calc only if needed
self.files = []
self.fcn_list = None
self.total_tt = 0
self.total_calls = 0
self.prim_calls = 0
self.max_name_len = 0
self.top_level = {}
self.stats = {}
self.sort_arg_dict = {}
self.load_stats(arg)
trouble = 1
try:
self.get_top_level_stats()
trouble = 0
finally:
if trouble:
print >> self.stream, "Invalid timing data",
if self.files: print >> self.stream, self.files[-1],
print >> self.stream
def load_stats(self, arg):
if not arg: self.stats = {}
elif isinstance(arg, basestring):
f = open(arg, 'rb')
self.stats = marshal.load(f)
f.close()
try:
file_stats = os.stat(arg)
arg = time.ctime(file_stats.st_mtime) + " " + arg
except: # in case this is not unix
pass
self.files = [ arg ]
elif hasattr(arg, 'create_stats'):
arg.create_stats()
self.stats = arg.stats
arg.stats = {}
if not self.stats:
raise TypeError("Cannot create or construct a %r object from %r"
% (self.__class__, arg))
return
def get_top_level_stats(self):
for func, (cc, nc, tt, ct, callers) in self.stats.items():
self.total_calls += nc
self.prim_calls += cc
self.total_tt += tt
if ("jprofile", 0, "profiler") in callers:
self.top_level[func] = None
if len(func_std_string(func)) > self.max_name_len:
self.max_name_len = len(func_std_string(func))
def add(self, *arg_list):
if not arg_list: return self
if len(arg_list) > 1: self.add(*arg_list[1:])
other = arg_list[0]
if type(self) != type(other) or self.__class__ != other.__class__:
other = Stats(other)
self.files += other.files
self.total_calls += other.total_calls
self.prim_calls += other.prim_calls
self.total_tt += other.total_tt
for func in other.top_level:
self.top_level[func] = None
if self.max_name_len < other.max_name_len:
self.max_name_len = other.max_name_len
self.fcn_list = None
for func, stat in other.stats.iteritems():
if func in self.stats:
old_func_stat = self.stats[func]
else:
old_func_stat = (0, 0, 0, 0, {},)
self.stats[func] = add_func_stats(old_func_stat, stat)
return self
def dump_stats(self, filename):
"""Write the profile data to a file we know how to load back."""
f = file(filename, 'wb')
try:
marshal.dump(self.stats, f)
finally:
f.close()
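# Stats saved this way can be reloaded by filename (a sketch;
# 'prof.out' is a hypothetical path):
#   s = Stats('prof.out')
#   s.strip_dirs().sort_stats('time').print_stats(10)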
# list the tuple indices and directions for sorting,
# along with some printable description
sort_arg_dict_default = {
"calls" : (((1,-1), ), "call count"),
"ncalls" : (((1,-1), ), "call count"),
"cumtime" : (((3,-1), ), "cumulative time"),
"cumulative": (((3,-1), ), "cumulative time"),
"file" : (((4, 1), ), "file name"),
"filename" : (((4, 1), ), "file name"),
"line" : (((5, 1), ), "line number"),
"module" : (((4, 1), ), "file name"),
"name" : (((6, 1), ), "function name"),
"nfl" : (((6, 1),(4, 1),(5, 1),), "name/file/line"),
"pcalls" : (((0,-1), ), "primitive call count"),
"stdname" : (((7, 1), ), "standard name"),
"time" : (((2,-1), ), "internal time"),
"tottime" : (((2,-1), ), "internal time"),
}
def get_sort_arg_defs(self):
"""Expand all abbreviations that are unique."""
if not self.sort_arg_dict:
self.sort_arg_dict = dict = {}
bad_list = {}
for word, tup in self.sort_arg_dict_default.iteritems():
fragment = word
while fragment:
if fragment in dict:
bad_list[fragment] = 0
break
dict[fragment] = tup
fragment = fragment[:-1]
for word in bad_list:
del dict[word]
return self.sort_arg_dict
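# For example, 'cumu' is an unambiguous prefix of 'cumulative', so
# sort_stats('cumu') sorts by cumulative time, while 'cum' prefixes
# both 'cumtime' and 'cumulative' and is discarded as ambiguous.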
def sort_stats(self, *field):
if not field:
self.fcn_list = 0
return self
if len(field) == 1 and isinstance(field[0], (int, long)):
# Be compatible with old profiler
field = [ {-1: "stdname",
0: "calls",
1: "time",
2: "cumulative"}[field[0]] ]
sort_arg_defs = self.get_sort_arg_defs()
sort_tuple = ()
self.sort_type = ""
connector = ""
for word in field:
sort_tuple = sort_tuple + sort_arg_defs[word][0]
self.sort_type += connector + sort_arg_defs[word][1]
connector = ", "
stats_list = []
for func, (cc, nc, tt, ct, callers) in self.stats.iteritems():
stats_list.append((cc, nc, tt, ct) + func +
(func_std_string(func), func))
stats_list.sort(key=cmp_to_key(TupleComp(sort_tuple).compare))
self.fcn_list = fcn_list = []
for tuple in stats_list:
fcn_list.append(tuple[-1])
return self
def reverse_order(self):
if self.fcn_list:
self.fcn_list.reverse()
return self
def strip_dirs(self):
oldstats = self.stats
self.stats = newstats = {}
max_name_len = 0
for func, (cc, nc, tt, ct, callers) in oldstats.iteritems():
newfunc = func_strip_path(func)
if len(func_std_string(newfunc)) > max_name_len:
max_name_len = len(func_std_string(newfunc))
newcallers = {}
for func2, caller in callers.iteritems():
newcallers[func_strip_path(func2)] = caller
if newfunc in newstats:
newstats[newfunc] = add_func_stats(
newstats[newfunc],
(cc, nc, tt, ct, newcallers))
else:
newstats[newfunc] = (cc, nc, tt, ct, newcallers)
old_top = self.top_level
self.top_level = new_top = {}
for func in old_top:
new_top[func_strip_path(func)] = None
self.max_name_len = max_name_len
self.fcn_list = None
self.all_callees = None
return self
def calc_callees(self):
if self.all_callees: return
self.all_callees = all_callees = {}
for func, (cc, nc, tt, ct, callers) in self.stats.iteritems():
if not func in all_callees:
all_callees[func] = {}
for func2, caller in callers.iteritems():
if not func2 in all_callees:
all_callees[func2] = {}
all_callees[func2][func] = caller
return
#******************************************************************
# The following functions support actual printing of reports
#******************************************************************
# Optional "amount" is either a line count, or a percentage of lines.
def eval_print_amount(self, sel, list, msg):
new_list = list
if isinstance(sel, basestring):
try:
rex = re.compile(sel)
except re.error:
msg += " <Invalid regular expression %r>\n" % sel
return new_list, msg
new_list = []
for func in list:
if rex.search(func_std_string(func)):
new_list.append(func)
else:
count = len(list)
if isinstance(sel, float) and 0.0 <= sel < 1.0:
count = int(count * sel + .5)
new_list = list[:count]
elif isinstance(sel, (int, long)) and 0 <= sel < count:
count = sel
new_list = list[:count]
if len(list) != len(new_list):
msg += " List reduced from %r to %r due to restriction <%r>\n" % (
len(list), len(new_list), sel)
return new_list, msg
def get_print_list(self, sel_list):
width = self.max_name_len
if self.fcn_list:
stat_list = self.fcn_list[:]
msg = " Ordered by: " + self.sort_type + '\n'
else:
stat_list = self.stats.keys()
msg = " Random listing order was used\n"
for selection in sel_list:
stat_list, msg = self.eval_print_amount(selection, stat_list, msg)
count = len(stat_list)
if not stat_list:
return 0, stat_list
print >> self.stream, msg
if count < len(self.stats):
width = 0
for func in stat_list:
if len(func_std_string(func)) > width:
width = len(func_std_string(func))
return width+2, stat_list
def print_stats(self, *amount):
for filename in self.files:
print >> self.stream, filename
if self.files: print >> self.stream
indent = ' ' * 8
for func in self.top_level:
print >> self.stream, indent, func_get_function_name(func)
print >> self.stream, indent, self.total_calls, "function calls",
if self.total_calls != self.prim_calls:
print >> self.stream, "(%d primitive calls)" % self.prim_calls,
print >> self.stream, "in %.3f seconds" % self.total_tt
print >> self.stream
width, list = self.get_print_list(amount)
if list:
self.print_title()
for func in list:
self.print_line(func)
print >> self.stream
print >> self.stream
return self
def print_callees(self, *amount):
width, list = self.get_print_list(amount)
if list:
self.calc_callees()
self.print_call_heading(width, "called...")
for func in list:
if func in self.all_callees:
self.print_call_line(width, func, self.all_callees[func])
else:
self.print_call_line(width, func, {})
print >> self.stream
print >> self.stream
return self
def print_callers(self, *amount):
width, list = self.get_print_list(amount)
if list:
self.print_call_heading(width, "was called by...")
for func in list:
cc, nc, tt, ct, callers = self.stats[func]
self.print_call_line(width, func, callers, "<-")
print >> self.stream
print >> self.stream
return self
def print_call_heading(self, name_size, column_title):
print >> self.stream, "Function ".ljust(name_size) + column_title
# print sub-header only if we have new-style callers
subheader = False
for cc, nc, tt, ct, callers in self.stats.itervalues():
if callers:
value = callers.itervalues().next()
subheader = isinstance(value, tuple)
break
if subheader:
print >> self.stream, " "*name_size + " ncalls tottime cumtime"
def print_call_line(self, name_size, source, call_dict, arrow="->"):
print >> self.stream, func_std_string(source).ljust(name_size) + arrow,
if not call_dict:
print >> self.stream
return
clist = call_dict.keys()
clist.sort()
indent = ""
for func in clist:
name = func_std_string(func)
value = call_dict[func]
if isinstance(value, tuple):
nc, cc, tt, ct = value
if nc != cc:
substats = '%d/%d' % (nc, cc)
else:
substats = '%d' % (nc,)
substats = '%s %s %s  %s' % (substats.rjust(7+2*len(indent)),
f8(tt), f8(ct), name)
left_width = name_size + 1
else:
substats = '%s(%r) %s' % (name, value, f8(self.stats[func][3]))
left_width = name_size + 3
print >> self.stream, indent*left_width + substats
indent = " "
def print_title(self):
print >> self.stream, '   ncalls  tottime  percall  cumtime  percall',
print >> self.stream, 'filename:lineno(function)'
def print_line(self, func): # hack : should print percentages
cc, nc, tt, ct, callers = self.stats[func]
c = str(nc)
if nc != cc:
c = c + '/' + str(cc)
print >> self.stream, c.rjust(9),
print >> self.stream, f8(tt),
if nc == 0:
print >> self.stream, ' '*8,
else:
print >> self.stream, f8(float(tt)/nc),
print >> self.stream, f8(ct),
if cc == 0:
print >> self.stream, ' '*8,
else:
print >> self.stream, f8(float(ct)/cc),
print >> self.stream, func_std_string(func)
class TupleComp:
"""This class provides a generic function for comparing any two tuples.
Each instance records a list of tuple-indices (from most significant
to least significant), and a sort direction (ascending or descending) for
each tuple-index. The compare functions can then be used as the function
argument to the system sort() function when a list of tuples needs to be
sorted in the instance's order."""
def __init__(self, comp_select_list):
self.comp_select_list = comp_select_list
def compare (self, left, right):
for index, direction in self.comp_select_list:
l = left[index]
r = right[index]
if l < r:
return -direction
if l > r:
return direction
return 0
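# Illustrative sketch (not in the original source): sort by field 2
# descending (direction -1), then field 0 ascending (direction 1):
#
#     rows = [(1, 'a', 5.0), (0, 'b', 9.0), (2, 'c', 5.0)]
#     rows.sort(key=cmp_to_key(TupleComp([(2, -1), (0, 1)]).compare))
#     # -> [(0, 'b', 9.0), (1, 'a', 5.0), (2, 'c', 5.0)]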
#**************************************************************************
# func_name is a triple (file:string, line:int, name:string)
def func_strip_path(func_name):
filename, line, name = func_name
return os.path.basename(filename), line, name
def func_get_function_name(func):
return func[2]
def func_std_string(func_name): # match what old profile produced
if func_name[:2] == ('~', 0):
# special case for built-in functions
name = func_name[2]
if name.startswith('<') and name.endswith('>'):
return '{%s}' % name[1:-1]
else:
return name
else:
return "%s:%d(%s)" % func_name
#**************************************************************************
# The following functions combine statistics for pairs of functions.
# The bulk of the processing involves correctly handling "call" lists,
# such as callers and callees.
#**************************************************************************
def add_func_stats(target, source):
"""Add together all the stats for two profile entries."""
cc, nc, tt, ct, callers = source
t_cc, t_nc, t_tt, t_ct, t_callers = target
return (cc+t_cc, nc+t_nc, tt+t_tt, ct+t_ct,
add_callers(t_callers, callers))
def add_callers(target, source):
"""Combine two caller lists in a single list."""
new_callers = {}
for func, caller in target.iteritems():
new_callers[func] = caller
for func, caller in source.iteritems():
if func in new_callers:
if isinstance(caller, tuple):
# format used by cProfile
new_callers[func] = tuple([i[0] + i[1] for i in
zip(caller, new_callers[func])])
else:
# format used by profile
new_callers[func] += caller
else:
new_callers[func] = caller
return new_callers
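# Illustrative sketch (not in the original source): in the cProfile format
# each caller entry is a (cc, nc, tt, ct) tuple, and entries for the same
# function are summed element-wise:
#
#     t = {('f.py', 1, 'f'): (2, 2, 1.0, 4.0)}
#     s = {('f.py', 1, 'f'): (1, 1, 2.0, 3.0)}
#     add_callers(t, s)  # -> {('f.py', 1, 'f'): (3, 3, 3.0, 7.0)}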
def count_calls(callers):
"""Sum the caller statistics to get total number of calls received."""
nc = 0
for calls in callers.itervalues():
nc += calls
return nc
#**************************************************************************
# The following functions support printing of reports
#**************************************************************************
def f8(x):
return "%8.3f" % x
#**************************************************************************
# Statistics browser added by ESR, April 2001
#**************************************************************************
if __name__ == '__main__':
import cmd
try:
import readline
except ImportError:
pass
class ProfileBrowser(cmd.Cmd):
def __init__(self, profile=None):
cmd.Cmd.__init__(self)
self.prompt = "% "
self.stats = None
self.stream = sys.stdout
if profile is not None:
self.do_read(profile)
def generic(self, fn, line):
args = line.split()
processed = []
for term in args:
try:
processed.append(int(term))
continue
except ValueError:
pass
try:
frac = float(term)
if frac > 1 or frac < 0:
print >> self.stream, "Fraction argument must be in [0, 1]"
continue
processed.append(frac)
continue
except ValueError:
pass
processed.append(term)
if self.stats:
getattr(self.stats, fn)(*processed)
else:
print >> self.stream, "No statistics object is loaded."
return 0
def generic_help(self):
print >> self.stream, "Arguments may be:"
print >> self.stream, "* An integer maximum number of entries to print."
print >> self.stream, "* A decimal fractional number between 0 and 1, controlling"
print >> self.stream, " what fraction of selected entries to print."
print >> self.stream, "* A regular expression; only entries with function names"
print >> self.stream, " that match it are printed."
def do_add(self, line):
if self.stats:
self.stats.add(line)
else:
print >> self.stream, "No statistics object is loaded."
return 0
def help_add(self):
print >> self.stream, "Add profile info from given file to current statistics object."
def do_callees(self, line):
return self.generic('print_callees', line)
def help_callees(self):
print >> self.stream, "Print callees statistics from the current stat object."
self.generic_help()
def do_callers(self, line):
return self.generic('print_callers', line)
def help_callers(self):
print >> self.stream, "Print callers statistics from the current stat object."
self.generic_help()
def do_EOF(self, line):
print >> self.stream, ""
return 1
def help_EOF(self):
print >> self.stream, "Leave the profile brower."
def do_quit(self, line):
return 1
def help_quit(self):
print >> self.stream, "Leave the profile brower."
def do_read(self, line):
if line:
try:
self.stats = Stats(line)
except IOError, args:
print >> self.stream, args[1]
return
except Exception as err:
print >> self.stream, err.__class__.__name__ + ':', err
return
self.prompt = line + "% "
elif len(self.prompt) > 2:
line = self.prompt[:-2]
self.do_read(line)
else:
print >> self.stream, "No statistics object is current -- cannot reload."
return 0
def help_read(self):
print >> self.stream, "Read in profile data from a specified file."
print >> self.stream, "Without argument, reload the current file."
def do_reverse(self, line):
if self.stats:
self.stats.reverse_order()
else:
print >> self.stream, "No statistics object is loaded."
return 0
def help_reverse(self):
print >> self.stream, "Reverse the sort order of the profiling report."
def do_sort(self, line):
if not self.stats:
print >> self.stream, "No statistics object is loaded."
return
abbrevs = self.stats.get_sort_arg_defs()
if line and all((x in abbrevs) for x in line.split()):
self.stats.sort_stats(*line.split())
else:
print >> self.stream, "Valid sort keys (unique prefixes are accepted):"
for (key, value) in Stats.sort_arg_dict_default.iteritems():
print >> self.stream, "%s -- %s" % (key, value[1])
return 0
def help_sort(self):
print >> self.stream, "Sort profile data according to specified keys."
print >> self.stream, "(Typing `sort' without arguments lists valid keys.)"
def complete_sort(self, text, *args):
return [a for a in Stats.sort_arg_dict_default if a.startswith(text)]
def do_stats(self, line):
return self.generic('print_stats', line)
def help_stats(self):
print >> self.stream, "Print statistics from the current stat object."
self.generic_help()
def do_strip(self, line):
if self.stats:
self.stats.strip_dirs()
else:
print >> self.stream, "No statistics object is loaded."
def help_strip(self):
print >> self.stream, "Strip leading path information from filenames in the report."
def help_help(self):
print >> self.stream, "Show help for a given command."
def postcmd(self, stop, line):
if stop:
return stop
return None
import sys
if len(sys.argv) > 1:
initprofile = sys.argv[1]
else:
initprofile = None
try:
browser = ProfileBrowser(initprofile)
print >> browser.stream, "Welcome to the profile statistics browser."
browser.cmdloop()
print >> browser.stream, "Goodbye."
except KeyboardInterrupt:
pass
# That's all, folks.
"""Parse a Python module and describe its classes and methods.
Parse enough of a Python file to recognize imports and class and
method definitions, and to find out the superclasses of a class.
The interface consists of a single function:
readmodule_ex(module [, path])
where module is the name of a Python module, and path is an optional
list of directories where the module is to be searched. If present,
path is prepended to the system search path sys.path. The return
value is a dictionary. The keys of the dictionary are the names of
the classes defined in the module (including classes that are defined
via the from XXX import YYY construct). The values are class
instances of the class Class defined here. One special key/value pair
is present for packages: the key '__path__' has a list as its value
which contains the package search path.
A class is described by the class Class in this module. Instances
of this class have the following instance variables:
module -- the module name
name -- the name of the class
super -- a list of super classes (Class instances)
methods -- a dictionary of methods
file -- the file in which the class was defined
lineno -- the line in the file on which the class statement occurred
The dictionary of methods uses the method names as keys and the line
numbers on which the method was defined as values.
If the name of a super class is not recognized, the corresponding
entry in the list of super classes is not a class instance but a
string giving the name of the super class. Since import statements
are recognized and imported modules are scanned as well, this
shouldn't happen often.
A function is described by the class Function in this module.
Instances of this class have the following instance variables:
module -- the module name
name -- the name of the function
file -- the file in which the function was defined
lineno -- the line in the file on which the def statement occurred
"""
import sys
import imp
import tokenize
from token import NAME, DEDENT, OP
from operator import itemgetter
__all__ = ["readmodule", "readmodule_ex", "Class", "Function"]
_modules = {} # cache of modules we've seen
# each Python class is represented by an instance of this class
class Class:
'''Class to represent a Python class.'''
def __init__(self, module, name, super, file, lineno):
self.module = module
self.name = name
if super is None:
super = []
self.super = super
self.methods = {}
self.file = file
self.lineno = lineno
def _addmethod(self, name, lineno):
self.methods[name] = lineno
class Function:
'''Class to represent a top-level Python function'''
def __init__(self, module, name, file, lineno):
self.module = module
self.name = name
self.file = file
self.lineno = lineno
def readmodule(module, path=None):
'''Backwards compatible interface.
Call readmodule_ex() and then only keep Class objects from the
resulting dictionary.'''
res = {}
for key, value in _readmodule(module, path or []).items():
if isinstance(value, Class):
res[key] = value
return res
def readmodule_ex(module, path=None):
'''Read a module file and return a dictionary of classes.
Search for MODULE in PATH and sys.path, read and parse the
module and return a dictionary with one entry for each class
found in the module.
'''
return _readmodule(module, path or [])
def _readmodule(module, path, inpackage=None):
'''Do the hard work for readmodule[_ex].
If INPACKAGE is given, it must be the dotted name of the package in
which we are searching for a submodule, and then PATH must be the
package search path; otherwise, we are searching for a top-level
module, and PATH is combined with sys.path.
'''
# Compute the full module name (prepending inpackage if set)
if inpackage is not None:
fullmodule = "%s.%s" % (inpackage, module)
else:
fullmodule = module
# Check in the cache
if fullmodule in _modules:
return _modules[fullmodule]
# Initialize the dict for this module's contents
dict = {}
# Check if it is a built-in module; we don't do much for these
if module in sys.builtin_module_names and inpackage is None:
_modules[module] = dict
return dict
# Check for a dotted module name
i = module.rfind('.')
if i >= 0:
package = module[:i]
submodule = module[i+1:]
parent = _readmodule(package, path, inpackage)
if inpackage is not None:
package = "%s.%s" % (inpackage, package)
if not '__path__' in parent:
raise ImportError('No package named {}'.format(package))
return _readmodule(submodule, parent['__path__'], package)
# Search the path for the module
f = None
if inpackage is not None:
f, fname, (_s, _m, ty) = imp.find_module(module, path)
else:
f, fname, (_s, _m, ty) = imp.find_module(module, path + sys.path)
if ty == imp.PKG_DIRECTORY:
dict['__path__'] = [fname]
path = [fname] + path
f, fname, (_s, _m, ty) = imp.find_module('__init__', [fname])
_modules[fullmodule] = dict
if ty != imp.PY_SOURCE:
# not Python source, can't do anything with this module
f.close()
return dict
stack = [] # stack of (class, indent) pairs
g = tokenize.generate_tokens(f.readline)
try:
for tokentype, token, start, _end, _line in g:
if tokentype == DEDENT:
lineno, thisindent = start
# close nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
elif token == 'def':
lineno, thisindent = start
# close previous nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
tokentype, meth_name, start = g.next()[0:3]
if tokentype != NAME:
continue # Syntax error
if stack:
cur_class = stack[-1][0]
if isinstance(cur_class, Class):
# it's a method
cur_class._addmethod(meth_name, lineno)
# else it's a nested def
else:
# it's a function
dict[meth_name] = Function(fullmodule, meth_name,
fname, lineno)
stack.append((None, thisindent)) # Marker for nested fns
elif token == 'class':
lineno, thisindent = start
# close previous nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
tokentype, class_name, start = g.next()[0:3]
if tokentype != NAME:
continue # Syntax error
# parse what follows the class name
tokentype, token, start = g.next()[0:3]
inherit = None
if token == '(':
names = [] # List of superclasses
# there's a list of superclasses
level = 1
super = [] # Tokens making up current superclass
while True:
tokentype, token, start = g.next()[0:3]
if token in (')', ',') and level == 1:
n = "".join(super)
if n in dict:
# we know this super class
n = dict[n]
else:
c = n.split('.')
if len(c) > 1:
# super class is of the form
# module.class: look in module for
# class
m = c[-2]
c = c[-1]
if m in _modules:
d = _modules[m]
if c in d:
n = d[c]
names.append(n)
super = []
if token == '(':
level += 1
elif token == ')':
level -= 1
if level == 0:
break
elif token == ',' and level == 1:
pass
# only use NAME and OP (== dot) tokens for type name
elif tokentype in (NAME, OP) and level == 1:
super.append(token)
# expressions in the base list are not supported
inherit = names
cur_class = Class(fullmodule, class_name, inherit,
fname, lineno)
if not stack:
dict[class_name] = cur_class
stack.append((cur_class, thisindent))
elif token == 'import' and start[1] == 0:
modules = _getnamelist(g)
for mod, _mod2 in modules:
try:
# Recursively read the imported module
if inpackage is None:
_readmodule(mod, path)
else:
try:
_readmodule(mod, path, inpackage)
except ImportError:
_readmodule(mod, [])
except:
# If we can't find or parse the imported module,
# too bad -- don't die here.
pass
elif token == 'from' and start[1] == 0:
mod, token = _getname(g)
if not mod or token != "import":
continue
names = _getnamelist(g)
try:
# Recursively read the imported module
d = _readmodule(mod, path, inpackage)
except:
# If we can't find or parse the imported module,
# too bad -- don't die here.
continue
# add any classes that were defined in the imported module
# to our name space if they were mentioned in the list
for n, n2 in names:
if n in d:
dict[n2 or n] = d[n]
elif n == '*':
# don't add names that start with _
for n in d:
if n[0] != '_':
dict[n] = d[n]
except StopIteration:
pass
f.close()
return dict
def _getnamelist(g):
# Helper to get a comma-separated list of dotted names plus 'as'
# clauses. Return a list of pairs (name, name2) where name2 is
# the 'as' name, or None if there is no 'as' clause.
names = []
while True:
name, token = _getname(g)
if not name:
break
if token == 'as':
name2, token = _getname(g)
else:
name2 = None
names.append((name, name2))
while token != "," and "\n" not in token:
token = g.next()[1]
if token != ",":
break
return names
def _getname(g):
# Helper to get a dotted name, return a pair (name, token) where
# name is the dotted name, or None if there was no dotted name,
# and token is the next input token.
parts = []
tokentype, token = g.next()[0:2]
if tokentype != NAME and token != '*':
return (None, token)
parts.append(token)
while True:
tokentype, token = g.next()[0:2]
if token != '.':
break
tokentype, token = g.next()[0:2]
if tokentype != NAME:
break
parts.append(token)
return (".".join(parts), token)
def _main():
# Main program for testing.
import os
mod = sys.argv[1]
if os.path.exists(mod):
path = [os.path.dirname(mod)]
mod = os.path.basename(mod)
if mod.lower().endswith(".py"):
mod = mod[:-3]
else:
path = []
dict = readmodule_ex(mod, path)
objs = dict.values()
objs.sort(lambda a, b: cmp(getattr(a, 'lineno', 0),
getattr(b, 'lineno', 0)))
for obj in objs:
if isinstance(obj, Class):
print "class", obj.name, obj.super, obj.lineno
methods = sorted(obj.methods.iteritems(), key=itemgetter(1))
for name, lineno in methods:
if name != "__path__":
print " def", name, lineno
elif isinstance(obj, Function):
print "def", obj.name, obj.lineno
if __name__ == "__main__":
_main()
#!/usr/bin/env python
# -*- coding: latin-1 -*-
"""Generate Python documentation in HTML or text for interactive use.
In the Python interpreter, do "from pydoc import help" to provide online
help. Calling help(thing) on a Python object documents the object.
Or, at the shell command line outside of Python:
Run "pydoc <name>" to show documentation on something. <name> may be
the name of a function, module, package, or a dotted reference to a
class or function within a module or module in a package. If the
argument contains a path segment delimiter (e.g. slash on Unix,
backslash on Windows) it is treated as the path to a Python source file.
Run "pydoc -k <keyword>" to search for a keyword in the synopsis lines
of all available modules.
Run "pydoc -p <port>" to start an HTTP server on a given port on the
local machine to generate documentation web pages. Port number 0 can be
used to get an arbitrary unused port.
For platforms without a command line, "pydoc -g" starts the HTTP server
and also pops up a little window for controlling it.
Run "pydoc -w <name>" to write out the HTML documentation for a module
to a file named "<name>.html".
Module docs for core modules are assumed to be in
https://docs.python.org/library/
This can be overridden by setting the PYTHONDOCS environment variable
to a different URL or to a local directory containing the Library
Reference Manual pages.
"""
__author__ = "Ka-Ping Yee <[email protected]>"
__date__ = "26 February 2001"
__version__ = "$Revision: 88564 $"
__credits__ = """Guido van Rossum, for an excellent programming language.
Tommy Burnette, the original creator of manpy.
Paul Prescod, for all his work on onlinehelp.
Richard Chamberlain, for the first implementation of textdoc.
"""
# Known bugs that can't be fixed here:
# - imp.load_module() cannot be prevented from clobbering existing
# loaded modules, so calling synopsis() on a binary module file
# changes the contents of any existing module with the same name.
# - If the __file__ attribute on a module is a relative path and
# the current directory is changed with os.chdir(), an incorrect
# path will be displayed.
import sys, imp, os, re, types, inspect, __builtin__, pkgutil, warnings
from repr import Repr
from string import expandtabs, find, join, lower, split, strip, rfind, rstrip
from traceback import extract_tb
try:
from collections import deque
except ImportError:
# Python 2.3 compatibility
class deque(list):
def popleft(self):
return self.pop(0)
# --------------------------------------------------------- common routines
def pathdirs():
"""Convert sys.path into a list of absolute, existing, unique paths."""
dirs = []
normdirs = []
for dir in sys.path:
dir = os.path.abspath(dir or '.')
normdir = os.path.normcase(dir)
if normdir not in normdirs and os.path.isdir(dir):
dirs.append(dir)
normdirs.append(normdir)
return dirs
def getdoc(object):
"""Get the doc string or comments for an object."""
result = inspect.getdoc(object) or inspect.getcomments(object)
result = _encode(result)
return result and re.sub('^ *\n', '', rstrip(result)) or ''
def splitdoc(doc):
"""Split a doc string into a synopsis line (if any) and the rest."""
lines = split(strip(doc), '\n')
if len(lines) == 1:
return lines[0], ''
elif len(lines) >= 2 and not rstrip(lines[1]):
return lines[0], join(lines[2:], '\n')
return '', join(lines, '\n')
def classname(object, modname):
"""Get a class name and qualify it with a module name if necessary."""
name = object.__name__
if object.__module__ != modname:
name = object.__module__ + '.' + name
return name
def isdata(object):
"""Check if an object is of a type that probably means it's data."""
return not (inspect.ismodule(object) or inspect.isclass(object) or
inspect.isroutine(object) or inspect.isframe(object) or
inspect.istraceback(object) or inspect.iscode(object))
def replace(text, *pairs):
"""Do a series of global replacements on a string."""
while pairs:
text = join(split(text, pairs[0]), pairs[1])
pairs = pairs[2:]
return text
def cram(text, maxlen):
"""Omit part of a string if needed to make it fit in a maximum length."""
if len(text) > maxlen:
pre = max(0, (maxlen-3)//2)
post = max(0, maxlen-3-pre)
return text[:pre] + '...' + text[len(text)-post:]
return text
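# For example (illustrative comment only):
#
#     cram('abcdefghij', 7)  # -> 'ab...ij'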
_re_stripid = re.compile(r' at 0x[0-9a-f]{6,16}(>+)$', re.IGNORECASE)
def stripid(text):
"""Remove the hexadecimal id from a Python object representation."""
# The behaviour of %p is implementation-dependent in terms of case.
return _re_stripid.sub(r'\1', text)
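# For example (illustrative comment only):
#
#     stripid('<function f at 0x00f3b250>')  # -> '<function f>'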
def _is_some_method(obj):
return inspect.ismethod(obj) or inspect.ismethoddescriptor(obj)
def allmethods(cl):
methods = {}
for key, value in inspect.getmembers(cl, _is_some_method):
methods[key] = 1
for base in cl.__bases__:
methods.update(allmethods(base)) # all your base are belong to us
for key in methods.keys():
methods[key] = getattr(cl, key)
return methods
def _split_list(s, predicate):
"""Split sequence s via predicate, and return pair ([true], [false]).
The return value is a 2-tuple of lists,
([x for x in s if predicate(x)],
[x for x in s if not predicate(x)])
"""
yes = []
no = []
for x in s:
if predicate(x):
yes.append(x)
else:
no.append(x)
return yes, no
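# For example (illustrative comment only):
#
#     _split_list([1, 2, 3, 4], lambda x: x % 2 == 0)  # -> ([2, 4], [1, 3])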
def visiblename(name, all=None, obj=None):
"""Decide whether to show documentation on a variable."""
# Certain special names are redundant.
_hidden_names = ('__builtins__', '__doc__', '__file__', '__path__',
'__module__', '__name__', '__slots__', '__package__')
if name in _hidden_names: return 0
# Private names are hidden, but special names are displayed.
if name.startswith('__') and name.endswith('__'): return 1
# Namedtuples have public fields and methods with a single leading underscore
if name.startswith('_') and hasattr(obj, '_fields'):
return 1
if all is not None:
# only document that which the programmer exported in __all__
return name in all
else:
return not name.startswith('_')
def classify_class_attrs(object):
"""Wrap inspect.classify_class_attrs, with fixup for data descriptors."""
def fixup(data):
name, kind, cls, value = data
if inspect.isdatadescriptor(value):
kind = 'data descriptor'
return name, kind, cls, value
return map(fixup, inspect.classify_class_attrs(object))
# ----------------------------------------------------- Unicode support helpers
try:
_unicode = unicode
except NameError:
# If Python is built without Unicode support, the unicode type
# will not exist. Fake one that nothing will match, and make
# the _encode function a no-op.
class _unicode(object):
pass
_encoding = 'ascii'
def _encode(text, encoding='ascii'):
return text
else:
import locale
_encoding = locale.getpreferredencoding()
def _encode(text, encoding=None):
if isinstance(text, unicode):
return text.encode(encoding or _encoding, 'xmlcharrefreplace')
else:
return text
def _binstr(obj):
# Ensure that we have an encoded (binary) string representation of obj,
# even if it is a unicode string.
if isinstance(obj, _unicode):
return obj.encode(_encoding, 'xmlcharrefreplace')
return str(obj)
# ----------------------------------------------------- module manipulation
def ispackage(path):
"""Guess whether a path refers to a package directory."""
if os.path.isdir(path):
for ext in ('.py', '.pyc', '.pyo'):
if os.path.isfile(os.path.join(path, '__init__' + ext)):
return True
return False
def source_synopsis(file):
line = file.readline()
while line[:1] == '#' or not strip(line):
line = file.readline()
if not line: break
line = strip(line)
if line[:4] == 'r"""': line = line[1:]
if line[:3] == '"""':
line = line[3:]
if line[-1:] == '\\': line = line[:-1]
while not strip(line):
line = file.readline()
if not line: break
result = strip(split(line, '"""')[0])
else: result = None
return result
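# Illustrative sketch (not part of the original source): leading comments
# and blank lines are skipped, then the first docstring line is returned.
#
#     import StringIO
#     f = StringIO.StringIO('# header\n"""Spam module.\n\nDetails."""\n')
#     source_synopsis(f)  # -> 'Spam module.'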
def synopsis(filename, cache={}):
"""Get the one-line summary out of a module file."""
mtime = os.stat(filename).st_mtime
lastupdate, result = cache.get(filename, (None, None))
if lastupdate is None or lastupdate < mtime:
info = inspect.getmoduleinfo(filename)
try:
file = open(filename)
except IOError:
# module can't be opened, so skip it
return None
if info and 'b' in info[2]: # binary modules have to be imported
try: module = imp.load_module('__temp__', file, filename, info[1:])
except: return None
result = module.__doc__.splitlines()[0] if module.__doc__ else None
del sys.modules['__temp__']
else: # text modules can be directly examined
result = source_synopsis(file)
file.close()
cache[filename] = (mtime, result)
return result
class ErrorDuringImport(Exception):
"""Errors that occurred while trying to import something to document it."""
def __init__(self, filename, exc_info):
exc, value, tb = exc_info
self.filename = filename
self.exc = exc
self.value = value
self.tb = tb
def __str__(self):
exc = self.exc
if type(exc) is types.ClassType:
exc = exc.__name__
return 'problem in %s - %s: %s' % (self.filename, exc, self.value)
def importfile(path):
"""Import a Python source file or compiled file given its path."""
magic = imp.get_magic()
file = open(path, 'r')
if file.read(len(magic)) == magic:
kind = imp.PY_COMPILED
else:
kind = imp.PY_SOURCE
file.close()
filename = os.path.basename(path)
name, ext = os.path.splitext(filename)
file = open(path, 'r')
try:
module = imp.load_module(name, file, path, (ext, 'r', kind))
except:
raise ErrorDuringImport(path, sys.exc_info())
file.close()
return module
def safeimport(path, forceload=0, cache={}):
"""Import a module; handle errors; return None if the module isn't found.
If the module *is* found but an exception occurs, it's wrapped in an
ErrorDuringImport exception and reraised. Unlike __import__, if a
package path is specified, the module at the end of the path is returned,
not the package at the beginning. If the optional 'forceload' argument
is 1, we reload the module from disk (unless it's a dynamic extension)."""
try:
# If forceload is 1 and the module has been previously loaded from
# disk, we always have to reload the module. Checking the file's
# mtime isn't good enough (e.g. the module could contain a class
# that inherits from another module that has changed).
if forceload and path in sys.modules:
if path not in sys.builtin_module_names:
# Avoid simply calling reload() because it leaves names in
# the currently loaded module lying around if they're not
# defined in the new source file. Instead, remove the
# module from sys.modules and re-import. Also remove any
# submodules because they won't appear in the newly loaded
# module's namespace if they're already in sys.modules.
subs = [m for m in sys.modules if m.startswith(path + '.')]
for key in [path] + subs:
# Prevent garbage collection.
cache[key] = sys.modules[key]
del sys.modules[key]
module = __import__(path)
except:
# Did the error occur before or after the module was found?
(exc, value, tb) = info = sys.exc_info()
if path in sys.modules:
# An error occurred while executing the imported module.
raise ErrorDuringImport(sys.modules[path].__file__, info)
elif exc is SyntaxError:
# A SyntaxError occurred before we could execute the module.
raise ErrorDuringImport(value.filename, info)
elif exc is ImportError and extract_tb(tb)[-1][2]=='safeimport':
# The import error occurred directly in this function,
# which means there is no such module in the path.
return None
else:
# Some other error occurred during the importing process.
raise ErrorDuringImport(path, sys.exc_info())
for part in split(path, '.')[1:]:
try: module = getattr(module, part)
except AttributeError: return None
return module
# ---------------------------------------------------- formatter base class
class Doc:
def document(self, object, name=None, *args):
"""Generate documentation for an object."""
args = (object, name) + args
# 'try' clause is to attempt to handle the possibility that inspect
# identifies something in a way that pydoc itself has issues handling;
# think 'super' and how it is a descriptor (which raises the exception
# by lacking a __name__ attribute) and an instance.
if inspect.isgetsetdescriptor(object): return self.docdata(*args)
if inspect.ismemberdescriptor(object): return self.docdata(*args)
try:
if inspect.ismodule(object): return self.docmodule(*args)
if inspect.isclass(object): return self.docclass(*args)
if inspect.isroutine(object): return self.docroutine(*args)
except AttributeError:
pass
if isinstance(object, property): return self.docproperty(*args)
return self.docother(*args)
def fail(self, object, name=None, *args):
"""Raise an exception for unimplemented types."""
message = "don't know how to document object%s of type %s" % (
name and ' ' + repr(name), type(object).__name__)
raise TypeError, message
docmodule = docclass = docroutine = docother = docproperty = docdata = fail
def getdocloc(self, object,
basedir=os.path.join(sys.exec_prefix, "lib",
"python"+sys.version[0:3])):
"""Return the location of module docs or None"""
try:
file = inspect.getabsfile(object)
except TypeError:
file = '(built-in)'
docloc = os.environ.get("PYTHONDOCS",
"https://docs.python.org/library")
basedir = os.path.normcase(basedir)
if (isinstance(object, type(os)) and
(object.__name__ in ('errno', 'exceptions', 'gc', 'imp',
'marshal', 'posix', 'signal', 'sys',
'thread', 'zipimport') or
(file.startswith(basedir) and
not file.startswith(os.path.join(basedir, 'site-packages')))) and
object.__name__ not in ('xml.etree', 'test.pydoc_mod')):
if docloc.startswith(("http://", "https://")):
docloc = "%s/%s" % (docloc.rstrip("/"), object.__name__.lower())
else:
docloc = os.path.join(docloc, object.__name__.lower() + ".html")
else:
docloc = None
return docloc
# -------------------------------------------- HTML documentation generator
class HTMLRepr(Repr):
"""Class for safely making an HTML representation of a Python object."""
def __init__(self):
Repr.__init__(self)
self.maxlist = self.maxtuple = 20
self.maxdict = 10
self.maxstring = self.maxother = 100
def escape(self, text):
return replace(text, '&', '&amp;', '<', '&lt;', '>', '&gt;')
def repr(self, object):
return Repr.repr(self, object)
def repr1(self, x, level):
if hasattr(type(x), '__name__'):
methodname = 'repr_' + join(split(type(x).__name__), '_')
if hasattr(self, methodname):
return getattr(self, methodname)(x, level)
return self.escape(cram(stripid(repr(x)), self.maxother))
def repr_string(self, x, level):
test = cram(x, self.maxstring)
testrepr = repr(test)
if '\\' in test and '\\' not in replace(testrepr, r'\\', ''):
# Backslashes are only literal in the string and are never
# needed to make any special characters, so show a raw string.
return 'r' + testrepr[0] + self.escape(test) + testrepr[0]
return re.sub(r'((\\[\\abfnrtv\'"]|\\[0-9]..|\\x..|\\u....)+)',
r'<font color="#c040c0">\1</font>',
self.escape(testrepr))
repr_str = repr_string
def repr_instance(self, x, level):
try:
return self.escape(cram(stripid(repr(x)), self.maxstring))
except:
return self.escape('<%s instance>' % x.__class__.__name__)
repr_unicode = repr_string
class HTMLDoc(Doc):
"""Formatter class for HTML documentation."""
# ------------------------------------------- HTML formatting utilities
_repr_instance = HTMLRepr()
repr = _repr_instance.repr
escape = _repr_instance.escape
def page(self, title, contents):
"""Format an HTML page."""
return _encode('''
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Python: %s</title>
<meta charset="utf-8">
</head><body bgcolor="#f0f0f8">
%s
</body></html>''' % (title, contents), 'ascii')
def heading(self, title, fgcol, bgcol, extras=''):
"""Format a page heading."""
return '''
<table width="100%%" cellspacing=0 cellpadding=2 border=0 summary="heading">
<tr bgcolor="%s">
<td valign=bottom>&nbsp;<br>
<font color="%s" face="helvetica, arial">&nbsp;<br>%s</font></td
><td align=right valign=bottom
><font color="%s" face="helvetica, arial">%s</font></td></tr></table>
''' % (bgcol, fgcol, title, fgcol, extras or '&nbsp;')
def section(self, title, fgcol, bgcol, contents, width=6,
prelude='', marginalia=None, gap='&nbsp;'):
"""Format a section with a heading."""
if marginalia is None:
marginalia = '<tt>' + '&nbsp;' * width + '</tt>'
result = '''<p>
<table width="100%%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="%s">
<td colspan=3 valign=bottom>&nbsp;<br>
<font color="%s" face="helvetica, arial">%s</font></td></tr>
''' % (bgcol, fgcol, title)
if prelude:
result = result + '''
<tr bgcolor="%s"><td rowspan=2>%s</td>
<td colspan=2>%s</td></tr>
<tr><td>%s</td>''' % (bgcol, marginalia, prelude, gap)
else:
result = result + '''
<tr><td bgcolor="%s">%s</td><td>%s</td>''' % (bgcol, marginalia, gap)
return result + '\n<td width="100%%">%s</td></tr></table>' % contents
def bigsection(self, title, *args):
"""Format a section with a big heading."""
title = '<big><strong>%s</strong></big>' % title
return self.section(title, *args)
def preformat(self, text):
"""Format literal preformatted text."""
text = self.escape(expandtabs(text))
return replace(text, '\n\n', '\n \n', '\n\n', '\n \n',
' ', '&nbsp;', '\n', '<br>\n')
def multicolumn(self, list, format, cols=4):
"""Format a list of items into a multi-column list."""
result = ''
rows = (len(list)+cols-1)//cols
for col in range(cols):
result = result + '<td width="%d%%" valign=top>' % (100//cols)
for i in range(rows*col, rows*col+rows):
if i < len(list):
result = result + format(list[i]) + '<br>\n'
result = result + '</td>'
return '<table width="100%%" summary="list"><tr>%s</tr></table>' % result
def grey(self, text): return '<font color="#909090">%s</font>' % text
def namelink(self, name, *dicts):
"""Make a link for an identifier, given name-to-URL mappings."""
for dict in dicts:
if name in dict:
return '<a href="%s">%s</a>' % (dict[name], name)
return name
def classlink(self, object, modname):
"""Make a link for a class."""
name, module = object.__name__, sys.modules.get(object.__module__)
if hasattr(module, name) and getattr(module, name) is object:
return '<a href="%s.html#%s">%s</a>' % (
module.__name__, name, classname(object, modname))
return classname(object, modname)
def modulelink(self, object):
"""Make a link for a module."""
return '<a href="%s.html">%s</a>' % (object.__name__, object.__name__)
def modpkglink(self, data):
"""Make a link for a module or package to display in an index."""
name, path, ispackage, shadowed = data
if shadowed:
return self.grey(name)
if path:
url = '%s.%s.html' % (path, name)
else:
url = '%s.html' % name
if ispackage:
text = '<strong>%s</strong>&nbsp;(package)' % name
else:
text = name
return '<a href="%s">%s</a>' % (url, text)
def markup(self, text, escape=None, funcs={}, classes={}, methods={}):
"""Mark up some plain text, given a context of symbols to look for.
Each context dictionary maps object names to anchor names."""
escape = escape or self.escape
results = []
here = 0
pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
r'RFC[- ]?(\d+)|'
r'PEP[- ]?(\d+)|'
r'(self\.)?(\w+))')
while True:
match = pattern.search(text, here)
if not match: break
start, end = match.span()
results.append(escape(text[here:start]))
all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
url = escape(all).replace('"', '&quot;')
results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
url = 'http://www.python.org/dev/peps/pep-%04d/' % int(pep)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif selfdot:
# Create a link for methods like 'self.method(...)'
# and use <strong> for attributes like 'self.attr'
if text[end:end+1] == '(':
results.append('self.' + self.namelink(name, methods))
else:
results.append('self.<strong>%s</strong>' % name)
elif text[end:end+1] == '(':
results.append(self.namelink(name, methods, funcs, classes))
else:
results.append(self.namelink(name, classes))
here = end
results.append(escape(text[here:]))
return join(results, '')
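# Illustrative comments only (using the default empty symbol tables):
#
#     h = HTMLDoc()
#     h.markup('See PEP 8 for style.')
#     # -> 'See <a href="http://www.python.org/dev/peps/pep-0008/">PEP 8</a> for style.'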
# ---------------------------------------------- type-specific routines
def formattree(self, tree, modname, parent=None):
"""Produce HTML for a class tree as given by inspect.getclasstree()."""
result = ''
for entry in tree:
if type(entry) is type(()):
c, bases = entry
result = result + '<dt><font face="helvetica, arial">'
result = result + self.classlink(c, modname)
if bases and bases != (parent,):
parents = []
for base in bases:
parents.append(self.classlink(base, modname))
result = result + '(' + join(parents, ', ') + ')'
result = result + '\n</font></dt>'
elif type(entry) is type([]):
result = result + '<dd>\n%s</dd>\n' % self.formattree(
entry, modname, c)
return '<dl>\n%s</dl>\n' % result
def docmodule(self, object, name=None, mod=None, *ignored):
"""Produce HTML documentation for a module object."""
name = object.__name__ # ignore the passed-in name
try:
all = object.__all__
except AttributeError:
all = None
parts = split(name, '.')
links = []
for i in range(len(parts)-1):
links.append(
'<a href="%s.html"><font color="#ffffff">%s</font></a>' %
(join(parts[:i+1], '.'), parts[i]))
linkedname = join(links + parts[-1:], '.')
head = '<big><big><strong>%s</strong></big></big>' % linkedname
try:
path = inspect.getabsfile(object)
url = path
if sys.platform == 'win32':
import nturl2path
url = nturl2path.pathname2url(path)
filelink = '<a href="file:%s">%s</a>' % (url, path)
except TypeError:
filelink = '(built-in)'
info = []
if hasattr(object, '__version__'):
version = _binstr(object.__version__)
if version[:11] == '$' + 'Revision: ' and version[-1:] == '$':
version = strip(version[11:-1])
info.append('version %s' % self.escape(version))
if hasattr(object, '__date__'):
info.append(self.escape(_binstr(object.__date__)))
if info:
head = head + ' (%s)' % join(info, ', ')
docloc = self.getdocloc(object)
if docloc is not None:
docloc = '<br><a href="%(docloc)s">Module Docs</a>' % locals()
else:
docloc = ''
result = self.heading(
head, '#ffffff', '#7799ee',
'<a href=".">index</a><br>' + filelink + docloc)
modules = inspect.getmembers(object, inspect.ismodule)
classes, cdict = [], {}
for key, value in inspect.getmembers(object, inspect.isclass):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
(inspect.getmodule(value) or object) is object):
if visiblename(key, all, object):
classes.append((key, value))
cdict[key] = cdict[value] = '#' + key
for key, value in classes:
for base in value.__bases__:
key, modname = base.__name__, base.__module__
module = sys.modules.get(modname)
if modname != name and module and hasattr(module, key):
if getattr(module, key) is base:
if not key in cdict:
cdict[key] = cdict[base] = modname + '.html#' + key
funcs, fdict = [], {}
for key, value in inspect.getmembers(object, inspect.isroutine):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
inspect.isbuiltin(value) or inspect.getmodule(value) is object):
if visiblename(key, all, object):
funcs.append((key, value))
fdict[key] = '#-' + key
if inspect.isfunction(value): fdict[value] = fdict[key]
data = []
for key, value in inspect.getmembers(object, isdata):
if visiblename(key, all, object):
data.append((key, value))
doc = self.markup(getdoc(object), self.preformat, fdict, cdict)
doc = doc and '<tt>%s</tt>' % doc
result = result + '<p>%s</p>\n' % doc
if hasattr(object, '__path__'):
modpkgs = []
for importer, modname, ispkg in pkgutil.iter_modules(object.__path__):
modpkgs.append((modname, name, ispkg, 0))
modpkgs.sort()
contents = self.multicolumn(modpkgs, self.modpkglink)
result = result + self.bigsection(
'Package Contents', '#ffffff', '#aa55cc', contents)
elif modules:
contents = self.multicolumn(
modules, lambda key_value, s=self: s.modulelink(key_value[1]))
result = result + self.bigsection(
'Modules', '#ffffff', '#aa55cc', contents)
if classes:
classlist = map(lambda key_value: key_value[1], classes)
contents = [
self.formattree(inspect.getclasstree(classlist, 1), name)]
for key, value in classes:
contents.append(self.document(value, key, name, fdict, cdict))
result = result + self.bigsection(
'Classes', '#ffffff', '#ee77aa', join(contents))
if funcs:
contents = []
for key, value in funcs:
contents.append(self.document(value, key, name, fdict, cdict))
result = result + self.bigsection(
'Functions', '#ffffff', '#eeaa77', join(contents))
if data:
contents = []
for key, value in data:
contents.append(self.document(value, key))
result = result + self.bigsection(
'Data', '#ffffff', '#55aa55', join(contents, '<br>\n'))
if hasattr(object, '__author__'):
contents = self.markup(_binstr(object.__author__), self.preformat)
result = result + self.bigsection(
'Author', '#ffffff', '#7799ee', contents)
if hasattr(object, '__credits__'):
contents = self.markup(_binstr(object.__credits__), self.preformat)
result = result + self.bigsection(
'Credits', '#ffffff', '#7799ee', contents)
return result
def docclass(self, object, name=None, mod=None, funcs={}, classes={},
*ignored):
"""Produce HTML documentation for a class object."""
realname = object.__name__
name = name or realname
bases = object.__bases__
contents = []
push = contents.append
# Cute little class to pump out a horizontal rule between sections.
class HorizontalRule:
def __init__(self):
self.needone = 0
def maybe(self):
if self.needone:
push('<hr>\n')
self.needone = 1
hr = HorizontalRule()
# List the mro, if non-trivial.
mro = deque(inspect.getmro(object))
if len(mro) > 2:
hr.maybe()
push('<dl><dt>Method resolution order:</dt>\n')
for base in mro:
push('<dd>%s</dd>\n' % self.classlink(base,
object.__module__))
push('</dl>\n')
def spill(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
push(self._docdescriptor(name, value, mod))
else:
push(self.document(value, name, mod,
funcs, classes, mdict, object))
push('\n')
return attrs
def spilldescriptors(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
push(self._docdescriptor(name, value, mod))
return attrs
def spilldata(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
base = self.docother(getattr(object, name), name, mod)
if (hasattr(value, '__call__') or
inspect.isdatadescriptor(value)):
doc = getattr(value, "__doc__", None)
else:
doc = None
if doc is None:
push('<dl><dt>%s</dl>\n' % base)
else:
doc = self.markup(getdoc(value), self.preformat,
funcs, classes, mdict)
doc = '<dd><tt>%s</tt>' % doc
push('<dl><dt>%s%s</dl>\n' % (base, doc))
push('\n')
return attrs
attrs = filter(lambda data: visiblename(data[0], obj=object),
classify_class_attrs(object))
mdict = {}
for key, kind, homecls, value in attrs:
mdict[key] = anchor = '#' + name + '-' + key
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
pass
try:
# The value may not be hashable (e.g., a data attr with
# a dict or list value).
mdict[value] = anchor
except TypeError:
pass
while attrs:
if mro:
thisclass = mro.popleft()
else:
thisclass = attrs[0][2]
attrs, inherited = _split_list(attrs, lambda t: t[2] is thisclass)
if thisclass is __builtin__.object:
attrs = inherited
continue
elif thisclass is object:
tag = 'defined here'
else:
tag = 'inherited from %s' % self.classlink(thisclass,
object.__module__)
tag += ':<br>\n'
# Sort attrs by name.
try:
attrs.sort(key=lambda t: t[0])
except TypeError:
attrs.sort(lambda t1, t2: cmp(t1[0], t2[0])) # 2.3 compat
# Pump out the attrs, segregated by kind.
attrs = spill('Methods %s' % tag, attrs,
lambda t: t[1] == 'method')
attrs = spill('Class methods %s' % tag, attrs,
lambda t: t[1] == 'class method')
attrs = spill('Static methods %s' % tag, attrs,
lambda t: t[1] == 'static method')
attrs = spilldescriptors('Data descriptors %s' % tag, attrs,
lambda t: t[1] == 'data descriptor')
attrs = spilldata('Data and other attributes %s' % tag, attrs,
lambda t: t[1] == 'data')
assert attrs == []
attrs = inherited
contents = ''.join(contents)
if name == realname:
title = '<a name="%s">class <strong>%s</strong></a>' % (
name, realname)
else:
title = '<strong>%s</strong> = <a name="%s">class %s</a>' % (
name, name, realname)
if bases:
parents = []
for base in bases:
parents.append(self.classlink(base, object.__module__))
title = title + '(%s)' % join(parents, ', ')
doc = self.markup(getdoc(object), self.preformat, funcs, classes, mdict)
doc = doc and '<tt>%s<br>&nbsp;</tt>' % doc
return self.section(title, '#000000', '#ffc8d8', contents, 3, doc)
def formatvalue(self, object):
"""Format an argument default value as text."""
return self.grey('=' + self.repr(object))
def docroutine(self, object, name=None, mod=None,
funcs={}, classes={}, methods={}, cl=None):
"""Produce HTML documentation for a function or method object."""
realname = object.__name__
name = name or realname
anchor = (cl and cl.__name__ or '') + '-' + name
note = ''
skipdocs = 0
if inspect.ismethod(object):
imclass = object.im_class
if cl:
if imclass is not cl:
note = ' from ' + self.classlink(imclass, mod)
else:
if object.im_self is not None:
note = ' method of %s instance' % self.classlink(
object.im_self.__class__, mod)
else:
note = ' unbound %s method' % self.classlink(imclass,mod)
object = object.im_func
if name == realname:
title = '<a name="%s"><strong>%s</strong></a>' % (anchor, realname)
else:
if (cl and realname in cl.__dict__ and
cl.__dict__[realname] is object):
reallink = '<a href="#%s">%s</a>' % (
cl.__name__ + '-' + realname, realname)
skipdocs = 1
else:
reallink = realname
title = '<a name="%s"><strong>%s</strong></a> = %s' % (
anchor, name, reallink)
if inspect.isfunction(object):
args, varargs, varkw, defaults = inspect.getargspec(object)
argspec = inspect.formatargspec(
args, varargs, varkw, defaults, formatvalue=self.formatvalue)
if realname == '<lambda>':
title = '<strong>%s</strong> <em>lambda</em> ' % name
argspec = argspec[1:-1] # remove parentheses
else:
argspec = '(...)'
decl = title + argspec + (note and self.grey(
'<font face="helvetica, arial">%s</font>' % note))
if skipdocs:
return '<dl><dt>%s</dt></dl>\n' % decl
else:
doc = self.markup(
getdoc(object), self.preformat, funcs, classes, methods)
doc = doc and '<dd><tt>%s</tt></dd>' % doc
return '<dl><dt>%s</dt>%s</dl>\n' % (decl, doc)
def _docdescriptor(self, name, value, mod):
results = []
push = results.append
if name:
push('<dl><dt><strong>%s</strong></dt>\n' % name)
if value.__doc__ is not None:
doc = self.markup(getdoc(value), self.preformat)
push('<dd><tt>%s</tt></dd>\n' % doc)
push('</dl>\n')
return ''.join(results)
def docproperty(self, object, name=None, mod=None, cl=None):
"""Produce html documentation for a property."""
return self._docdescriptor(name, object, mod)
def docother(self, object, name=None, mod=None, *ignored):
"""Produce HTML documentation for a data object."""
lhs = name and '<strong>%s</strong> = ' % name or ''
return lhs + self.repr(object)
def docdata(self, object, name=None, mod=None, cl=None):
"""Produce html documentation for a data descriptor."""
return self._docdescriptor(name, object, mod)
def index(self, dir, shadowed=None):
"""Generate an HTML index for a directory of modules."""
modpkgs = []
if shadowed is None: shadowed = {}
for importer, name, ispkg in pkgutil.iter_modules([dir]):
modpkgs.append((name, '', ispkg, name in shadowed))
shadowed[name] = 1
modpkgs.sort()
contents = self.multicolumn(modpkgs, self.modpkglink)
return self.bigsection(dir, '#ffffff', '#ee77aa', contents)
# -------------------------------------------- text documentation generator
class TextRepr(Repr):
"""Class for safely making a text representation of a Python object."""
def __init__(self):
Repr.__init__(self)
self.maxlist = self.maxtuple = 20
self.maxdict = 10
self.maxstring = self.maxother = 100
def repr1(self, x, level):
if hasattr(type(x), '__name__'):
methodname = 'repr_' + join(split(type(x).__name__), '_')
if hasattr(self, methodname):
return getattr(self, methodname)(x, level)
return cram(stripid(repr(x)), self.maxother)
def repr_string(self, x, level):
test = cram(x, self.maxstring)
testrepr = repr(test)
if '\\' in test and '\\' not in replace(testrepr, r'\\', ''):
# Backslashes are only literal in the string and are never
# needed to make any special characters, so show a raw string.
return 'r' + testrepr[0] + test + testrepr[0]
return testrepr
repr_str = repr_string
def repr_instance(self, x, level):
try:
return cram(stripid(repr(x)), self.maxstring)
except:
return '<%s instance>' % x.__class__.__name__
class TextDoc(Doc):
"""Formatter class for text documentation."""
# ------------------------------------------- text formatting utilities
_repr_instance = TextRepr()
repr = _repr_instance.repr
def bold(self, text):
"""Format a string in bold by overstriking."""
return join(map(lambda ch: ch + '\b' + ch, text), '')
def indent(self, text, prefix=' '):
"""Indent text by prepending a given prefix to each line."""
if not text: return ''
lines = split(text, '\n')
lines = map(lambda line, prefix=prefix: prefix + line, lines)
if lines: lines[-1] = rstrip(lines[-1])
return join(lines, '\n')
def section(self, title, contents):
"""Format a section with a given heading."""
return self.bold(title) + '\n' + rstrip(self.indent(contents)) + '\n\n'
# ---------------------------------------------- type-specific routines
def formattree(self, tree, modname, parent=None, prefix=''):
"""Render in text a class tree as returned by inspect.getclasstree()."""
result = ''
for entry in tree:
if type(entry) is type(()):
c, bases = entry
result = result + prefix + classname(c, modname)
if bases and bases != (parent,):
parents = map(lambda c, m=modname: classname(c, m), bases)
result = result + '(%s)' % join(parents, ', ')
result = result + '\n'
elif type(entry) is type([]):
result = result + self.formattree(
entry, modname, c, prefix + ' ')
return result
def docmodule(self, object, name=None, mod=None):
"""Produce text documentation for a given module object."""
name = object.__name__ # ignore the passed-in name
synop, desc = splitdoc(getdoc(object))
result = self.section('NAME', name + (synop and ' - ' + synop))
try:
all = object.__all__
except AttributeError:
all = None
try:
file = inspect.getabsfile(object)
except TypeError:
file = '(built-in)'
result = result + self.section('FILE', file)
docloc = self.getdocloc(object)
if docloc is not None:
result = result + self.section('MODULE DOCS', docloc)
if desc:
result = result + self.section('DESCRIPTION', desc)
classes = []
for key, value in inspect.getmembers(object, inspect.isclass):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None
or (inspect.getmodule(value) or object) is object):
if visiblename(key, all, object):
classes.append((key, value))
funcs = []
for key, value in inspect.getmembers(object, inspect.isroutine):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
inspect.isbuiltin(value) or inspect.getmodule(value) is object):
if visiblename(key, all, object):
funcs.append((key, value))
data = []
for key, value in inspect.getmembers(object, isdata):
if visiblename(key, all, object):
data.append((key, value))
modpkgs = []
modpkgs_names = set()
if hasattr(object, '__path__'):
for importer, modname, ispkg in pkgutil.iter_modules(object.__path__):
modpkgs_names.add(modname)
if ispkg:
modpkgs.append(modname + ' (package)')
else:
modpkgs.append(modname)
modpkgs.sort()
result = result + self.section(
'PACKAGE CONTENTS', join(modpkgs, '\n'))
# Detect submodules as sometimes created by C extensions
submodules = []
for key, value in inspect.getmembers(object, inspect.ismodule):
if value.__name__.startswith(name + '.') and key not in modpkgs_names:
submodules.append(key)
if submodules:
submodules.sort()
result = result + self.section(
'SUBMODULES', join(submodules, '\n'))
if classes:
classlist = map(lambda key_value: key_value[1], classes)
contents = [self.formattree(
inspect.getclasstree(classlist, 1), name)]
for key, value in classes:
contents.append(self.document(value, key, name))
result = result + self.section('CLASSES', join(contents, '\n'))
if funcs:
contents = []
for key, value in funcs:
contents.append(self.document(value, key, name))
result = result + self.section('FUNCTIONS', join(contents, '\n'))
if data:
contents = []
for key, value in data:
contents.append(self.docother(value, key, name, maxlen=70))
result = result + self.section('DATA', join(contents, '\n'))
if hasattr(object, '__version__'):
version = _binstr(object.__version__)
if version[:11] == '$' + 'Revision: ' and version[-1:] == '$':
version = strip(version[11:-1])
result = result + self.section('VERSION', version)
if hasattr(object, '__date__'):
result = result + self.section('DATE', _binstr(object.__date__))
if hasattr(object, '__author__'):
result = result + self.section('AUTHOR', _binstr(object.__author__))
if hasattr(object, '__credits__'):
result = result + self.section('CREDITS', _binstr(object.__credits__))
return result
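# Editor note (derived from the code above): docmodule() emits its sections in
# a fixed order -- NAME, FILE, MODULE DOCS, DESCRIPTION, PACKAGE CONTENTS,
# SUBMODULES, CLASSES, FUNCTIONS, DATA, VERSION, DATE, AUTHOR, CREDITS --
# skipping any that are empty. For example:
#   import json
#   print TextDoc().docmodule(json)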
def docclass(self, object, name=None, mod=None, *ignored):
"""Produce text documentation for a given class object."""
realname = object.__name__
name = name or realname
bases = object.__bases__
def makename(c, m=object.__module__):
return classname(c, m)
if name == realname:
title = 'class ' + self.bold(realname)
else:
title = self.bold(name) + ' = class ' + realname
if bases:
parents = map(makename, bases)
title = title + '(%s)' % join(parents, ', ')
doc = getdoc(object)
contents = doc and [doc + '\n'] or []
push = contents.append
# List the mro, if non-trivial.
mro = deque(inspect.getmro(object))
if len(mro) > 2:
push("Method resolution order:")
for base in mro:
push(' ' + makename(base))
push('')
# Cute little class to pump out a horizontal rule between sections.
class HorizontalRule:
def __init__(self):
self.needone = 0
def maybe(self):
if self.needone:
push('-' * 70)
self.needone = 1
hr = HorizontalRule()
def spill(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
push(self._docdescriptor(name, value, mod))
else:
push(self.document(value,
name, mod, object))
return attrs
def spilldescriptors(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
push(self._docdescriptor(name, value, mod))
return attrs
def spilldata(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
if (hasattr(value, '__call__') or
inspect.isdatadescriptor(value)):
doc = getdoc(value)
else:
doc = None
push(self.docother(getattr(object, name),
name, mod, maxlen=70, doc=doc) + '\n')
return attrs
attrs = filter(lambda data: visiblename(data[0], obj=object),
classify_class_attrs(object))
while attrs:
if mro:
thisclass = mro.popleft()
else:
thisclass = attrs[0][2]
attrs, inherited = _split_list(attrs, lambda t: t[2] is thisclass)
if thisclass is __builtin__.object:
attrs = inherited
continue
elif thisclass is object:
tag = "defined here"
else:
tag = "inherited from %s" % classname(thisclass,
object.__module__)
# Sort attrs by name.
attrs.sort()
# Pump out the attrs, segregated by kind.
attrs = spill("Methods %s:\n" % tag, attrs,
lambda t: t[1] == 'method')
attrs = spill("Class methods %s:\n" % tag, attrs,
lambda t: t[1] == 'class method')
attrs = spill("Static methods %s:\n" % tag, attrs,
lambda t: t[1] == 'static method')
attrs = spilldescriptors("Data descriptors %s:\n" % tag, attrs,
lambda t: t[1] == 'data descriptor')
attrs = spilldata("Data and other attributes %s:\n" % tag, attrs,
lambda t: t[1] == 'data')
assert attrs == []
attrs = inherited
contents = '\n'.join(contents)
if not contents:
return title + '\n'
return title + '\n' + self.indent(rstrip(contents), ' | ') + '\n'
def formatvalue(self, object):
"""Format an argument default value as text."""
return '=' + self.repr(object)
def docroutine(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a function or method object."""
realname = object.__name__
name = name or realname
note = ''
skipdocs = 0
if inspect.ismethod(object):
imclass = object.im_class
if cl:
if imclass is not cl:
note = ' from ' + classname(imclass, mod)
else:
if object.im_self is not None:
note = ' method of %s instance' % classname(
object.im_self.__class__, mod)
else:
note = ' unbound %s method' % classname(imclass,mod)
object = object.im_func
if name == realname:
title = self.bold(realname)
else:
if (cl and realname in cl.__dict__ and
cl.__dict__[realname] is object):
skipdocs = 1
title = self.bold(name) + ' = ' + realname
if inspect.isfunction(object):
args, varargs, varkw, defaults = inspect.getargspec(object)
argspec = inspect.formatargspec(
args, varargs, varkw, defaults, formatvalue=self.formatvalue)
if realname == '<lambda>':
title = self.bold(name) + ' lambda '
argspec = argspec[1:-1] # remove parentheses
else:
argspec = '(...)'
decl = title + argspec + note
if skipdocs:
return decl + '\n'
else:
doc = getdoc(object) or ''
return decl + '\n' + (doc and rstrip(self.indent(doc)) + '\n')
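# Editor sketch (illustrative): for a plain function the declaration line
# carries the argspec, with default values rendered by formatvalue(); e.g.
#   def f(a, b=1, *args, **kw):
#       "Docstring."
#   print TextDoc().docroutine(f)
# yields roughly (the name bolded via overstriking):
#   f(a, b=1, *args, **kw)
#       Docstring.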
def _docdescriptor(self, name, value, mod):
results = []
push = results.append
if name:
push(self.bold(name))
push('\n')
doc = getdoc(value) or ''
if doc:
push(self.indent(doc))
push('\n')
return ''.join(results)
def docproperty(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a property."""
return self._docdescriptor(name, object, mod)
def docdata(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a data descriptor."""
return self._docdescriptor(name, object, mod)
def docother(self, object, name=None, mod=None, parent=None, maxlen=None, doc=None):
"""Produce text documentation for a data object."""
repr = self.repr(object)
if maxlen:
line = (name and name + ' = ' or '') + repr
chop = maxlen - len(line)
if chop < 0: repr = repr[:chop] + '...'
line = (name and self.bold(name) + ' = ' or '') + repr
if doc is not None:
line += '\n' + self.indent(str(doc))
return line
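# Editor sketch (illustrative): docother() is the fallback for plain data;
# with maxlen set, an over-long repr is chopped to fit, e.g.
#   print TextDoc().docother(range(100), name='nums', maxlen=70)
# prints something like "nums = [0, 1, 2, ...]" cut to roughly 70 columns.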
# --------------------------------------------------------- user interfaces
def pager(text):
"""The first time this is called, determine what kind of pager to use."""
global pager
pager = getpager()
pager(text)
def getpager():
"""Decide what method to use for paging through text."""
if type(sys.stdout) is not types.FileType:
return plainpager
if not hasattr(sys.stdin, "isatty"):
return plainpager
if not sys.stdin.isatty() or not sys.stdout.isatty():
return plainpager
if 'PAGER' in os.environ:
if sys.platform == 'win32': # pipes completely broken in Windows
return lambda text: tempfilepager(plain(text), os.environ['PAGER'])
elif os.environ.get('TERM') in ('dumb', 'emacs'):
return lambda text: pipepager(plain(text), os.environ['PAGER'])
else:
return lambda text: pipepager(text, os.environ['PAGER'])
if os.environ.get('TERM') in ('dumb', 'emacs'):
return plainpager
if sys.platform == 'win32' or sys.platform.startswith('os2'):
return lambda text: tempfilepager(plain(text), 'more <')
if hasattr(os, 'system') and os.system('(less) 2>/dev/null') == 0:
return lambda text: pipepager(text, 'less')
import tempfile
(fd, filename) = tempfile.mkstemp()
os.close(fd)
try:
if hasattr(os, 'system') and os.system('more "%s"' % filename) == 0:
return lambda text: pipepager(text, 'more')
else:
return ttypager
finally:
os.unlink(filename)
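# Editor summary (derived from the checks above): the pager is chosen in this
# order -- plain output for non-tty streams; $PAGER if set (via a temp file on
# win32, with overstrike bolding stripped for dumb terminals); 'more <' on
# win32/os2; 'less' when available; 'more'; and finally the built-in ttypager.
# E.g. setting PAGER=cat effectively disables paging:
#   PAGER=cat python -c "import pydoc; pydoc.pager('hello')"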
def plain(text):
"""Remove boldface formatting from text."""
return re.sub('.\b', '', text)
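# Editor sketch: TextDoc.bold() marks text by overstriking each character
# (ch + '\b' + ch); plain() deletes any character followed by a backspace,
# undoing it:
#   plain(TextDoc().bold('Hi'))  ==  'Hi'
#   plain('H\bHello')            ==  'Hello'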
def pipepager(text, cmd):
"""Page through text by feeding it to another program."""
pipe = os.popen(cmd, 'w')
try:
pipe.write(_encode(text))
pipe.close()
except IOError:
pass # Ignore broken pipes caused by quitting the pager program.
def tempfilepager(text, cmd):
"""Page through text by invoking a program on a temporary file."""
import tempfile
filename = tempfile.mktemp()
file = open(filename, 'w')
file.write(_encode(text))
file.close()
try:
os.system(cmd + ' "' + filename + '"')
finally:
os.unlink(filename)
def ttypager(text):
"""Page through text on a text terminal."""
lines = plain(_encode(plain(text), getattr(sys.stdout, 'encoding', _encoding))).split('\n')
try:
import tty
fd = sys.stdin.fileno()
old = tty.tcgetattr(fd)
tty.setcbreak(fd)
getchar = lambda: sys.stdin.read(1)
except (ImportError, AttributeError):
tty = None
getchar = lambda: sys.stdin.readline()[:-1][:1]
try:
try:
h = int(os.environ.get('LINES', 0))
except ValueError:
h = 0
if h <= 1:
h = 25
r = inc = h - 1
sys.stdout.write(join(lines[:inc], '\n') + '\n')
while lines[r:]:
sys.stdout.write('-- more --')
sys.stdout.flush()
c = getchar()
if c in ('q', 'Q'):
sys.stdout.write('\r \r')
break
elif c in ('\r', '\n'):
sys.stdout.write('\r \r' + lines[r] + '\n')
r = r + 1
continue
if c in ('b', 'B', '\x1b'):
r = r - inc - inc
if r < 0: r = 0
sys.stdout.write('\n' + join(lines[r:r+inc], '\n') + '\n')
r = r + inc
finally:
if tty:
tty.tcsetattr(fd, tty.TCSAFLUSH, old)
def plainpager(text):
"""Simply print unformatted text. This is the ultimate fallback."""
sys.stdout.write(_encode(plain(text), getattr(sys.stdout, 'encoding', _encoding)))
def describe(thing):
"""Produce a short description of the given thing."""
if inspect.ismodule(thing):
if thing.__name__ in sys.builtin_module_names:
return 'built-in module ' + thing.__name__
if hasattr(thing, '__path__'):
return 'package ' + thing.__name__
else:
return 'module ' + thing.__name__
if inspect.isbuiltin(thing):
return 'built-in function ' + thing.__name__
if inspect.isgetsetdescriptor(thing):
return 'getset descriptor %s.%s.%s' % (
thing.__objclass__.__module__, thing.__objclass__.__name__,
thing.__name__)
if inspect.ismemberdescriptor(thing):
return 'member descriptor %s.%s.%s' % (
thing.__objclass__.__module__, thing.__objclass__.__name__,
thing.__name__)
if inspect.isclass(thing):
return 'class ' + thing.__name__
if inspect.isfunction(thing):
return 'function ' + thing.__name__
if inspect.ismethod(thing):
return 'method ' + thing.__name__
if type(thing) is types.InstanceType:
return 'instance of ' + thing.__class__.__name__
return type(thing).__name__
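# Editor sketch -- sample results (Python 2):
#   describe(sys)    -> 'built-in module sys'
#   describe(len)    -> 'built-in function len'
#   describe(dict)   -> 'class dict'
#   describe(plain)  -> 'function plain'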
def locate(path, forceload=0):
"""Locate an object by name or dotted path, importing as necessary."""
parts = [part for part in split(path, '.') if part]
module, n = None, 0
while n < len(parts):
nextmodule = safeimport(join(parts[:n+1], '.'), forceload)
if nextmodule: module, n = nextmodule, n + 1
else: break
if module:
object = module
else:
object = __builtin__
for part in parts[n:]:
try:
object = getattr(object, part)
except AttributeError:
return None
return object
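# Editor sketch: locate() imports the longest importable dotted prefix, then
# walks the remaining parts as attributes:
#   locate('os.path.join')   # imports os.path, then getattr(..., 'join')
#   locate('no.such.name')   # -> None (nothing importable, no builtin match)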
# --------------------------------------- interactive interpreter interface
text = TextDoc()
html = HTMLDoc()
class _OldStyleClass: pass
_OLD_INSTANCE_TYPE = type(_OldStyleClass())
def resolve(thing, forceload=0):
"""Given an object or a path to an object, get the object and its name."""
if isinstance(thing, str):
object = locate(thing, forceload)
if object is None:
raise ImportError, 'no Python documentation found for %r' % thing
return object, thing
else:
name = getattr(thing, '__name__', None)
return thing, name if isinstance(name, str) else None
def render_doc(thing, title='Python Library Documentation: %s', forceload=0):
"""Render text documentation, given an object or a path to an object."""
object, name = resolve(thing, forceload)
desc = describe(object)
module = inspect.getmodule(object)
if name and '.' in name:
desc += ' in ' + name[:name.rfind('.')]
elif module and module is not object:
desc += ' in module ' + module.__name__
if type(object) is _OLD_INSTANCE_TYPE:
# If the passed object is an instance of an old-style class,
# document its available methods instead of its value.
object = object.__class__
elif not (inspect.ismodule(object) or
inspect.isclass(object) or
inspect.isroutine(object) or
inspect.isgetsetdescriptor(object) or
inspect.ismemberdescriptor(object) or
isinstance(object, property)):
# If the passed object is a piece of data or an instance,
# document its available methods instead of its value.
object = type(object)
desc += ' object'
return title % desc + '\n\n' + text.document(object, name)
def doc(thing, title='Python Library Documentation: %s', forceload=0):
"""Display text documentation, given an object or a path to an object."""
try:
pager(render_doc(thing, title, forceload))
except (ImportError, ErrorDuringImport), value:
print value
def writedoc(thing, forceload=0):
"""Write HTML documentation to a file in the current directory."""
try:
object, name = resolve(thing, forceload)
page = html.page(describe(object), html.document(object, name))
file = open(name + '.html', 'w')
file.write(page)
file.close()
print 'wrote', name + '.html'
except (ImportError, ErrorDuringImport), value:
print value
def writedocs(dir, pkgpath='', done=None):
"""Write out HTML documentation for all modules in a directory tree."""
if done is None: done = {}
for importer, modname, ispkg in pkgutil.walk_packages([dir], pkgpath):
writedoc(modname)
return
class Helper:
# These dictionaries map a topic name to either an alias, or a tuple
# (label, seealso-items). The "label" is the label of the corresponding
# section in the .rst file under Doc/ and an index into the dictionary
# in pydoc_data/topics.py.
#
# CAUTION: if you change one of these dictionaries, be sure to adapt the
# list of needed labels in Doc/tools/pyspecific.py and
# regenerate the pydoc_data/topics.py file by running
# make pydoc-topics
# in Doc/ and copying the output file into the Lib/ directory.
keywords = {
'and': 'BOOLEAN',
'as': 'with',
'assert': ('assert', ''),
'break': ('break', 'while for'),
'class': ('class', 'CLASSES SPECIALMETHODS'),
'continue': ('continue', 'while for'),
'def': ('function', ''),
'del': ('del', 'BASICMETHODS'),
'elif': 'if',
'else': ('else', 'while for'),
'except': 'try',
'exec': ('exec', ''),
'finally': 'try',
'for': ('for', 'break continue while'),
'from': 'import',
'global': ('global', 'NAMESPACES'),
'if': ('if', 'TRUTHVALUE'),
'import': ('import', 'MODULES'),
'in': ('in', 'SEQUENCEMETHODS2'),
'is': 'COMPARISON',
'lambda': ('lambda', 'FUNCTIONS'),
'not': 'BOOLEAN',
'or': 'BOOLEAN',
'pass': ('pass', ''),
'print': ('print', ''),
'raise': ('raise', 'EXCEPTIONS'),
'return': ('return', 'FUNCTIONS'),
'try': ('try', 'EXCEPTIONS'),
'while': ('while', 'break continue if TRUTHVALUE'),
'with': ('with', 'CONTEXTMANAGERS EXCEPTIONS yield'),
'yield': ('yield', ''),
}
# Either add symbols to this dictionary or to the symbols dictionary
# directly, whichever is easier. They are merged later.
_strprefixes = tuple(p + q for p in ('b', 'r', 'u') for q in ("'", '"'))
_symbols_inverse = {
'STRINGS' : ("'", "'''", '"""', '"') + _strprefixes,
'OPERATORS' : ('+', '-', '*', '**', '/', '//', '%', '<<', '>>', '&',
'|', '^', '~', '<', '>', '<=', '>=', '==', '!=', '<>'),
'COMPARISON' : ('<', '>', '<=', '>=', '==', '!=', '<>'),
'UNARY' : ('-', '~'),
'AUGMENTEDASSIGNMENT' : ('+=', '-=', '*=', '/=', '%=', '&=', '|=',
'^=', '<<=', '>>=', '**=', '//='),
'BITWISE' : ('<<', '>>', '&', '|', '^', '~'),
'COMPLEX' : ('j', 'J')
}
symbols = {
'%': 'OPERATORS FORMATTING',
'**': 'POWER',
',': 'TUPLES LISTS FUNCTIONS',
'.': 'ATTRIBUTES FLOAT MODULES OBJECTS',
'...': 'ELLIPSIS',
':': 'SLICINGS DICTIONARYLITERALS',
'@': 'def class',
'\\': 'STRINGS',
'_': 'PRIVATENAMES',
'__': 'PRIVATENAMES SPECIALMETHODS',
'`': 'BACKQUOTES',
'(': 'TUPLES FUNCTIONS CALLS',
')': 'TUPLES FUNCTIONS CALLS',
'[': 'LISTS SUBSCRIPTS SLICINGS',
']': 'LISTS SUBSCRIPTS SLICINGS'
}
for topic, symbols_ in _symbols_inverse.iteritems():
for symbol in symbols_:
topics = symbols.get(symbol, topic)
if topic not in topics:
topics = topics + ' ' + topic
symbols[symbol] = topics
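# Editor note (derived from the merge above): a symbol ends up mapped to all
# of its topics; e.g. '**' starts as 'POWER' and picks up OPERATORS from
# _symbols_inverse, so symbols['**'] == 'POWER OPERATORS'.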
topics = {
'TYPES': ('types', 'STRINGS UNICODE NUMBERS SEQUENCES MAPPINGS '
'FUNCTIONS CLASSES MODULES FILES inspect'),
'STRINGS': ('strings', 'str UNICODE SEQUENCES STRINGMETHODS FORMATTING '
'TYPES'),
'STRINGMETHODS': ('string-methods', 'STRINGS FORMATTING'),
'FORMATTING': ('formatstrings', 'OPERATORS'),
'UNICODE': ('strings', 'encodings unicode SEQUENCES STRINGMETHODS '
'FORMATTING TYPES'),
'NUMBERS': ('numbers', 'INTEGER FLOAT COMPLEX TYPES'),
'INTEGER': ('integers', 'int range'),
'FLOAT': ('floating', 'float math'),
'COMPLEX': ('imaginary', 'complex cmath'),
'SEQUENCES': ('typesseq', 'STRINGMETHODS FORMATTING xrange LISTS'),
'MAPPINGS': 'DICTIONARIES',
'FUNCTIONS': ('typesfunctions', 'def TYPES'),
'METHODS': ('typesmethods', 'class def CLASSES TYPES'),
'CODEOBJECTS': ('bltin-code-objects', 'compile FUNCTIONS TYPES'),
'TYPEOBJECTS': ('bltin-type-objects', 'types TYPES'),
'FRAMEOBJECTS': 'TYPES',
'TRACEBACKS': 'TYPES',
'NONE': ('bltin-null-object', ''),
'ELLIPSIS': ('bltin-ellipsis-object', 'SLICINGS'),
'FILES': ('bltin-file-objects', ''),
'SPECIALATTRIBUTES': ('specialattrs', ''),
'CLASSES': ('types', 'class SPECIALMETHODS PRIVATENAMES'),
'MODULES': ('typesmodules', 'import'),
'PACKAGES': 'import',
'EXPRESSIONS': ('operator-summary', 'lambda or and not in is BOOLEAN '
'COMPARISON BITWISE SHIFTING BINARY FORMATTING POWER '
'UNARY ATTRIBUTES SUBSCRIPTS SLICINGS CALLS TUPLES '
'LISTS DICTIONARIES BACKQUOTES'),
'OPERATORS': 'EXPRESSIONS',
'PRECEDENCE': 'EXPRESSIONS',
'OBJECTS': ('objects', 'TYPES'),
'SPECIALMETHODS': ('specialnames', 'BASICMETHODS ATTRIBUTEMETHODS '
'CALLABLEMETHODS SEQUENCEMETHODS1 MAPPINGMETHODS '
'SEQUENCEMETHODS2 NUMBERMETHODS CLASSES'),
'BASICMETHODS': ('customization', 'cmp hash repr str SPECIALMETHODS'),
'ATTRIBUTEMETHODS': ('attribute-access', 'ATTRIBUTES SPECIALMETHODS'),
'CALLABLEMETHODS': ('callable-types', 'CALLS SPECIALMETHODS'),
'SEQUENCEMETHODS1': ('sequence-types', 'SEQUENCES SEQUENCEMETHODS2 '
'SPECIALMETHODS'),
'SEQUENCEMETHODS2': ('sequence-methods', 'SEQUENCES SEQUENCEMETHODS1 '
'SPECIALMETHODS'),
'MAPPINGMETHODS': ('sequence-types', 'MAPPINGS SPECIALMETHODS'),
'NUMBERMETHODS': ('numeric-types', 'NUMBERS AUGMENTEDASSIGNMENT '
'SPECIALMETHODS'),
'EXECUTION': ('execmodel', 'NAMESPACES DYNAMICFEATURES EXCEPTIONS'),
'NAMESPACES': ('naming', 'global ASSIGNMENT DELETION DYNAMICFEATURES'),
'DYNAMICFEATURES': ('dynamic-features', ''),
'SCOPING': 'NAMESPACES',
'FRAMES': 'NAMESPACES',
'EXCEPTIONS': ('exceptions', 'try except finally raise'),
'COERCIONS': ('coercion-rules','CONVERSIONS'),
'CONVERSIONS': ('conversions', 'COERCIONS'),
'IDENTIFIERS': ('identifiers', 'keywords SPECIALIDENTIFIERS'),
'SPECIALIDENTIFIERS': ('id-classes', ''),
'PRIVATENAMES': ('atom-identifiers', ''),
'LITERALS': ('atom-literals', 'STRINGS BACKQUOTES NUMBERS '
'TUPLELITERALS LISTLITERALS DICTIONARYLITERALS'),
'TUPLES': 'SEQUENCES',
'TUPLELITERALS': ('exprlists', 'TUPLES LITERALS'),
'LISTS': ('typesseq-mutable', 'LISTLITERALS'),
'LISTLITERALS': ('lists', 'LISTS LITERALS'),
'DICTIONARIES': ('typesmapping', 'DICTIONARYLITERALS'),
'DICTIONARYLITERALS': ('dict', 'DICTIONARIES LITERALS'),
'BACKQUOTES': ('string-conversions', 'repr str STRINGS LITERALS'),
'ATTRIBUTES': ('attribute-references', 'getattr hasattr setattr '
'ATTRIBUTEMETHODS'),
'SUBSCRIPTS': ('subscriptions', 'SEQUENCEMETHODS1'),
'SLICINGS': ('slicings', 'SEQUENCEMETHODS2'),
'CALLS': ('calls', 'EXPRESSIONS'),
'POWER': ('power', 'EXPRESSIONS'),
'UNARY': ('unary', 'EXPRESSIONS'),
'BINARY': ('binary', 'EXPRESSIONS'),
'SHIFTING': ('shifting', 'EXPRESSIONS'),
'BITWISE': ('bitwise', 'EXPRESSIONS'),
'COMPARISON': ('comparisons', 'EXPRESSIONS BASICMETHODS'),
'BOOLEAN': ('booleans', 'EXPRESSIONS TRUTHVALUE'),
'ASSERTION': 'assert',
'ASSIGNMENT': ('assignment', 'AUGMENTEDASSIGNMENT'),
'AUGMENTEDASSIGNMENT': ('augassign', 'NUMBERMETHODS'),
'DELETION': 'del',
'PRINTING': 'print',
'RETURNING': 'return',
'IMPORTING': 'import',
'CONDITIONAL': 'if',
'LOOPING': ('compound', 'for while break continue'),
'TRUTHVALUE': ('truth', 'if while and or not BASICMETHODS'),
'DEBUGGING': ('debugger', 'pdb'),
'CONTEXTMANAGERS': ('context-managers', 'with'),
}
def __init__(self, input=None, output=None):
self._input = input
self._output = output
input = property(lambda self: self._input or sys.stdin)
output = property(lambda self: self._output or sys.stdout)
def __repr__(self):
if inspect.stack()[1][3] == '?':
self()
return ''
return '<pydoc.Helper instance>'
_GoInteractive = object()
def __call__(self, request=_GoInteractive):
if request is not self._GoInteractive:
self.help(request)
else:
self.intro()
self.interact()
self.output.write('''
You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)". Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
''')
def interact(self):
self.output.write('\n')
while True:
try:
request = self.getline('help> ')
if not request: break
except (KeyboardInterrupt, EOFError):
break
request = strip(request)
# Make sure significant trailing quotation marks of literals don't
# get deleted while cleaning input
if (len(request) > 2 and request[0] == request[-1] in ("'", '"')
and request[0] not in request[1:-1]):
request = request[1:-1]
if lower(request) in ('q', 'quit'): break
self.help(request)
def getline(self, prompt):
"""Read one line, using raw_input when available."""
if self.input is sys.stdin:
return raw_input(prompt)
else:
self.output.write(prompt)
self.output.flush()
return self.input.readline()
def help(self, request):
if type(request) is type(''):
request = request.strip()
if request == 'help': self.intro()
elif request == 'keywords': self.listkeywords()
elif request == 'symbols': self.listsymbols()
elif request == 'topics': self.listtopics()
elif request == 'modules': self.listmodules()
elif request[:8] == 'modules ':
self.listmodules(split(request)[1])
elif request in self.symbols: self.showsymbol(request)
elif request in self.keywords: self.showtopic(request)
elif request in self.topics: self.showtopic(request)
elif request: doc(request, 'Help on %s:')
elif isinstance(request, Helper): self()
else: doc(request, 'Help on %s:')
self.output.write('\n')
def intro(self):
self.output.write('''
Welcome to Python %s! This is the online help utility.
If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/%s/tutorial/.
Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules. To quit this help utility and
return to the interpreter, just type "quit".
To get a list of available modules, keywords, or topics, type "modules",
"keywords", or "topics". Each module also comes with a one-line summary
of what it does; to list the modules whose summaries contain a given word
such as "spam", type "modules spam".
''' % tuple([sys.version[:3]]*2))
def list(self, items, columns=4, width=80):
items = items[:]
items.sort()
colw = width / columns
rows = (len(items) + columns - 1) / columns
for row in range(rows):
for col in range(columns):
i = col * rows + row
if i < len(items):
self.output.write(items[i])
if col < columns - 1:
self.output.write(' ' + ' ' * (colw-1 - len(items[i])))
self.output.write('\n')
def listkeywords(self):
self.output.write('''
Here is a list of the Python keywords. Enter any keyword to get more help.
''')
self.list(self.keywords.keys())
def listsymbols(self):
self.output.write('''
Here is a list of the punctuation symbols which Python assigns special meaning
to. Enter any symbol to get more help.
''')
self.list(self.symbols.keys())
def listtopics(self):
self.output.write('''
Here is a list of available topics. Enter any topic name to get more help.
''')
self.list(self.topics.keys())
def showtopic(self, topic, more_xrefs=''):
try:
import pydoc_data.topics
except ImportError:
self.output.write('''
Sorry, topic and keyword documentation is not available because the
module "pydoc_data.topics" could not be found.
''')
return
target = self.topics.get(topic, self.keywords.get(topic))
if not target:
self.output.write('no documentation found for %s\n' % repr(topic))
return
if type(target) is type(''):
return self.showtopic(target, more_xrefs)
label, xrefs = target
try:
doc = pydoc_data.topics.topics[label]
except KeyError:
self.output.write('no documentation found for %s\n' % repr(topic))
return
pager(strip(doc) + '\n')
if more_xrefs:
xrefs = (xrefs or '') + ' ' + more_xrefs
if xrefs:
import StringIO, formatter
buffer = StringIO.StringIO()
formatter.DumbWriter(buffer).send_flowing_data(
'Related help topics: ' + join(split(xrefs), ', ') + '\n')
self.output.write('\n%s\n' % buffer.getvalue())
def showsymbol(self, symbol):
target = self.symbols[symbol]
topic, _, xrefs = target.partition(' ')
self.showtopic(topic, xrefs)
def listmodules(self, key=''):
if key:
self.output.write('''
Here is a list of matching modules. Enter any module name to get more help.
''')
apropos(key)
else:
self.output.write('''
Please wait a moment while I gather a list of all available modules...
''')
modules = {}
def callback(path, modname, desc, modules=modules):
if modname and modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
if find(modname, '.') < 0:
modules[modname] = 1
def onerror(modname):
callback(None, modname, None)
ModuleScanner().run(callback, onerror=onerror)
self.list(modules.keys())
self.output.write('''
Enter any module name to get more help. Or, type "modules spam" to search
for modules whose descriptions contain the word "spam".
''')
help = Helper()
class Scanner:
"""A generic tree iterator."""
def __init__(self, roots, children, descendp):
self.roots = roots[:]
self.state = []
self.children = children
self.descendp = descendp
def next(self):
if not self.state:
if not self.roots:
return None
root = self.roots.pop(0)
self.state = [(root, self.children(root))]
node, children = self.state[-1]
if not children:
self.state.pop()
return self.next()
child = children.pop(0)
if self.descendp(child):
self.state.append((child, self.children(child)))
return child
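# Editor sketch (illustrative): Scanner yields nodes lazily, depth-first;
# the roots themselves are consumed but never returned. For example:
#   s = Scanner([1], lambda n: [2*n, 2*n+1] if n < 4 else [],
#               lambda n: n < 4)
# successive s.next() calls return 2, 4, 5, 3, 6, 7, and then None.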
class ModuleScanner:
"""An interruptible scanner that searches module synopses."""
def run(self, callback, key=None, completer=None, onerror=None):
if key: key = lower(key)
self.quit = False
seen = {}
for modname in sys.builtin_module_names:
if modname != '__main__':
seen[modname] = 1
if key is None:
callback(None, modname, '')
else:
desc = split(__import__(modname).__doc__ or '', '\n')[0]
if find(lower(modname + ' - ' + desc), key) >= 0:
callback(None, modname, desc)
for importer, modname, ispkg in pkgutil.walk_packages(onerror=onerror):
if self.quit:
break
if key is None:
callback(None, modname, '')
else:
loader = importer.find_module(modname)
if hasattr(loader,'get_source'):
import StringIO
desc = source_synopsis(
StringIO.StringIO(loader.get_source(modname))
) or ''
if hasattr(loader,'get_filename'):
path = loader.get_filename(modname)
else:
path = None
else:
module = loader.load_module(modname)
desc = module.__doc__.splitlines()[0] if module.__doc__ else ''
path = getattr(module,'__file__',None)
if find(lower(modname + ' - ' + desc), key) >= 0:
callback(path, modname, desc)
if completer:
completer()
def apropos(key):
"""Print all the one-line module summaries that contain a substring."""
def callback(path, modname, desc):
if modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
print modname, desc and '- ' + desc
def onerror(modname):
pass
with warnings.catch_warnings():
warnings.filterwarnings('ignore') # ignore problems during import
ModuleScanner().run(callback, key, onerror=onerror)
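# Editor sketch: apropos('zip') prints one line per matching module, e.g.
#   gzip - Functions that read and write gzipped files.
#   zipfile - Read and write ZIP files.
# (the exact listing depends on the modules installed).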
# --------------------------------------------------- web browser interface
def serve(port, callback=None, completer=None):
import BaseHTTPServer, mimetools, select
# Patch up mimetools.Message so it doesn't break if rfc822 is reloaded.
class Message(mimetools.Message):
def __init__(self, fp, seekable=1):
Message = self.__class__
Message.__bases__[0].__bases__[0].__init__(self, fp, seekable)
self.encodingheader = self.getheader('content-transfer-encoding')
self.typeheader = self.getheader('content-type')
self.parsetype()
self.parseplist()
class DocHandler(BaseHTTPServer.BaseHTTPRequestHandler):
def send_document(self, title, contents):
try:
self.send_response(200)
self.send_header('Content-Type', 'text/html')
self.end_headers()
self.wfile.write(html.page(title, contents))
except IOError: pass
def do_GET(self):
path = self.path
if path[-5:] == '.html': path = path[:-5]
if path[:1] == '/': path = path[1:]
if path and path != '.':
try:
obj = locate(path, forceload=1)
except ErrorDuringImport, value:
self.send_document(path, html.escape(str(value)))
return
if obj:
self.send_document(describe(obj), html.document(obj, path))
else:
self.send_document(path,
'no Python documentation found for %s' % repr(path))
else:
heading = html.heading(
'<big><big><strong>Python: Index of Modules</strong></big></big>',
'#ffffff', '#7799ee')
def bltinlink(name):
return '<a href="%s.html">%s</a>' % (name, name)
names = filter(lambda x: x != '__main__',
sys.builtin_module_names)
contents = html.multicolumn(names, bltinlink)
indices = ['<p>' + html.bigsection(
'Built-in Modules', '#ffffff', '#ee77aa', contents)]
seen = {}
for dir in sys.path:
indices.append(html.index(dir, seen))
contents = heading + join(indices) + '''<p align=right>
<font color="#909090" face="helvetica, arial"><strong>
pydoc</strong> by Ka-Ping Yee <[email protected]></font>'''
self.send_document('Index of Modules', contents)
def log_message(self, *args): pass
class DocServer(BaseHTTPServer.HTTPServer):
def __init__(self, port, callback):
host = 'localhost'
self.address = (host, port)
self.callback = callback
self.base.__init__(self, self.address, self.handler)
def serve_until_quit(self):
import select
self.quit = False
while not self.quit:
rd, wr, ex = select.select([self.socket.fileno()], [], [], 1)
if rd: self.handle_request()
def server_activate(self):
self.base.server_activate(self)
self.url = 'http://%s:%d/' % (self.address[0], self.server_port)
if self.callback: self.callback(self)
DocServer.base = BaseHTTPServer.HTTPServer
DocServer.handler = DocHandler
DocHandler.MessageClass = Message
try:
try:
DocServer(port, callback).serve_until_quit()
except (KeyboardInterrupt, select.error):
pass
finally:
if completer: completer()
# ----------------------------------------------------- graphical interface
def gui():
"""Graphical interface (starts web server and pops up a control window)."""
class GUI:
def __init__(self, window, port=7464):
self.window = window
self.server = None
self.scanner = None
import Tkinter
self.server_frm = Tkinter.Frame(window)
self.title_lbl = Tkinter.Label(self.server_frm,
text='Starting server...\n ')
self.open_btn = Tkinter.Button(self.server_frm,
text='open browser', command=self.open, state='disabled')
self.quit_btn = Tkinter.Button(self.server_frm,
text='quit serving', command=self.quit, state='disabled')
self.search_frm = Tkinter.Frame(window)
self.search_lbl = Tkinter.Label(self.search_frm, text='Search for')
self.search_ent = Tkinter.Entry(self.search_frm)
self.search_ent.bind('<Return>', self.search)
self.stop_btn = Tkinter.Button(self.search_frm,
text='stop', pady=0, command=self.stop, state='disabled')
if sys.platform == 'win32':
# Trying to hide and show this button crashes under Windows.
self.stop_btn.pack(side='right')
self.window.title('pydoc')
self.window.protocol('WM_DELETE_WINDOW', self.quit)
self.title_lbl.pack(side='top', fill='x')
self.open_btn.pack(side='left', fill='x', expand=1)
self.quit_btn.pack(side='right', fill='x', expand=1)
self.server_frm.pack(side='top', fill='x')
self.search_lbl.pack(side='left')
self.search_ent.pack(side='right', fill='x', expand=1)
self.search_frm.pack(side='top', fill='x')
self.search_ent.focus_set()
font = ('helvetica', sys.platform == 'win32' and 8 or 10)
self.result_lst = Tkinter.Listbox(window, font=font, height=6)
self.result_lst.bind('<Button-1>', self.select)
self.result_lst.bind('<Double-Button-1>', self.goto)
self.result_scr = Tkinter.Scrollbar(window,
orient='vertical', command=self.result_lst.yview)
self.result_lst.config(yscrollcommand=self.result_scr.set)
self.result_frm = Tkinter.Frame(window)
self.goto_btn = Tkinter.Button(self.result_frm,
text='go to selected', command=self.goto)
self.hide_btn = Tkinter.Button(self.result_frm,
text='hide results', command=self.hide)
self.goto_btn.pack(side='left', fill='x', expand=1)
self.hide_btn.pack(side='right', fill='x', expand=1)
self.window.update()
self.minwidth = self.window.winfo_width()
self.minheight = self.window.winfo_height()
self.bigminheight = (self.server_frm.winfo_reqheight() +
self.search_frm.winfo_reqheight() +
self.result_lst.winfo_reqheight() +
self.result_frm.winfo_reqheight())
self.bigwidth, self.bigheight = self.minwidth, self.bigminheight
self.expanded = 0
self.window.wm_geometry('%dx%d' % (self.minwidth, self.minheight))
self.window.wm_minsize(self.minwidth, self.minheight)
self.window.tk.willdispatch()
import threading
threading.Thread(
target=serve, args=(port, self.ready, self.quit)).start()
def ready(self, server):
self.server = server
self.title_lbl.config(
text='Python documentation server at\n' + server.url)
self.open_btn.config(state='normal')
self.quit_btn.config(state='normal')
def open(self, event=None, url=None):
url = url or self.server.url
try:
import webbrowser
webbrowser.open(url)
except ImportError: # pre-webbrowser.py compatibility
if sys.platform == 'win32':
os.system('start "%s"' % url)
else:
rc = os.system('netscape -remote "openURL(%s)" &' % url)
if rc: os.system('netscape "%s" &' % url)
def quit(self, event=None):
if self.server:
self.server.quit = 1
self.window.quit()
def search(self, event=None):
key = self.search_ent.get()
self.stop_btn.pack(side='right')
self.stop_btn.config(state='normal')
self.search_lbl.config(text='Searching for "%s"...' % key)
self.search_ent.forget()
self.search_lbl.pack(side='left')
self.result_lst.delete(0, 'end')
self.goto_btn.config(state='disabled')
self.expand()
import threading
if self.scanner:
self.scanner.quit = 1
self.scanner = ModuleScanner()
def onerror(modname):
pass
threading.Thread(target=self.scanner.run,
args=(self.update, key, self.done),
kwargs=dict(onerror=onerror)).start()
def update(self, path, modname, desc):
if modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
self.result_lst.insert('end',
modname + ' - ' + (desc or '(no description)'))
def stop(self, event=None):
if self.scanner:
self.scanner.quit = 1
self.scanner = None
def done(self):
self.scanner = None
self.search_lbl.config(text='Search for')
self.search_lbl.pack(side='left')
self.search_ent.pack(side='right', fill='x', expand=1)
if sys.platform != 'win32': self.stop_btn.forget()
self.stop_btn.config(state='disabled')
def select(self, event=None):
self.goto_btn.config(state='normal')
def goto(self, event=None):
selection = self.result_lst.curselection()
if selection:
modname = split(self.result_lst.get(selection[0]))[0]
self.open(url=self.server.url + modname + '.html')
def collapse(self):
if not self.expanded: return
self.result_frm.forget()
self.result_scr.forget()
self.result_lst.forget()
self.bigwidth = self.window.winfo_width()
self.bigheight = self.window.winfo_height()
self.window.wm_geometry('%dx%d' % (self.minwidth, self.minheight))
self.window.wm_minsize(self.minwidth, self.minheight)
self.expanded = 0
def expand(self):
if self.expanded: return
self.result_frm.pack(side='bottom', fill='x')
self.result_scr.pack(side='right', fill='y')
self.result_lst.pack(side='top', fill='both', expand=1)
self.window.wm_geometry('%dx%d' % (self.bigwidth, self.bigheight))
self.window.wm_minsize(self.minwidth, self.bigminheight)
self.expanded = 1
def hide(self, event=None):
self.stop()
self.collapse()
import Tkinter
try:
root = Tkinter.Tk()
# Tk will crash if pythonw.exe has an XP .manifest
# file and the root is not destroyed explicitly.
# If the problem is ever fixed in Tk, the explicit
# destroy can go.
try:
gui = GUI(root)
root.mainloop()
finally:
root.destroy()
except KeyboardInterrupt:
pass
# -------------------------------------------------- command-line interface
def ispath(x):
return isinstance(x, str) and find(x, os.sep) >= 0
def cli():
"""Command-line interface (looks at sys.argv to decide what to do)."""
import getopt
class BadUsage: pass
# Scripts don't get the current directory in their path by default
# unless they are run with the '-m' switch
if '' not in sys.path:
scriptdir = os.path.dirname(sys.argv[0])
if scriptdir in sys.path:
sys.path.remove(scriptdir)
sys.path.insert(0, '.')
try:
opts, args = getopt.getopt(sys.argv[1:], 'gk:p:w')
writing = 0
for opt, val in opts:
if opt == '-g':
gui()
return
if opt == '-k':
apropos(val)
return
if opt == '-p':
try:
port = int(val)
except ValueError:
raise BadUsage
def ready(server):
print 'pydoc server ready at %s' % server.url
def stopped():
print 'pydoc server stopped'
serve(port, ready, stopped)
return
if opt == '-w':
writing = 1
if not args: raise BadUsage
for arg in args:
if ispath(arg) and not os.path.exists(arg):
print 'file %r does not exist' % arg
break
try:
if ispath(arg) and os.path.isfile(arg):
arg = importfile(arg)
if writing:
if ispath(arg) and os.path.isdir(arg):
writedocs(arg)
else:
writedoc(arg)
else:
help.help(arg)
except ErrorDuringImport, value:
print value
except (getopt.error, BadUsage):
cmd = os.path.basename(sys.argv[0])
print """pydoc - the Python documentation tool
%s <name> ...
Show text documentation on something. <name> may be the name of a
Python keyword, topic, function, module, or package, or a dotted
reference to a class or function within a module or module in a
package. If <name> contains a '%s', it is used as the path to a
Python source file to document. If name is 'keywords', 'topics',
or 'modules', a listing of these things is displayed.
%s -k <keyword>
Search for a keyword in the synopsis lines of all available modules.
%s -p <port>
Start an HTTP server on the given port on the local machine. Port
number 0 can be used to get an arbitrary unused port.
%s -g
Pop up a graphical interface for finding and serving documentation.
%s -w <name> ...
Write out the HTML documentation for a module to a file in the current
directory. If <name> contains a '%s', it is treated as a filename; if
it names a directory, documentation is written for all the contents.
""" % (cmd, os.sep, cmd, cmd, cmd, cmd, os.sep)
if __name__ == '__main__': cli()
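# Editor sketch -- typical command-line invocations of this module:
#   python pydoc.py os.path     # paged text help on a module
#   python pydoc.py -k cgi      # keyword search in module synopses
#   python pydoc.py -p 7464     # serve HTML docs on http://localhost:7464/
#   python pydoc.py -w json     # write json.html to the current directory
#
# What follows is a second file, pydoc_data/topics.py: the table consulted by
# Helper.showtopic() above.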
# -*- coding: utf-8 -*-
# Autogenerated by Sphinx on Sat Aug 26 11:16:28 2017
topics = {'assert': '\n'
'The "assert" statement\n'
'**********************\n'
'\n'
'Assert statements are a convenient way to insert debugging '
'assertions\n'
'into a program:\n'
'\n'
' assert_stmt ::= "assert" expression ["," expression]\n'
'\n'
'The simple form, "assert expression", is equivalent to\n'
'\n'
' if __debug__:\n'
' if not expression: raise AssertionError\n'
'\n'
'The extended form, "assert expression1, expression2", is '
'equivalent to\n'
'\n'
' if __debug__:\n'
' if not expression1: raise AssertionError(expression2)\n'
'\n'
'These equivalences assume that "__debug__" and "AssertionError" '
'refer\n'
'to the built-in variables with those names. In the current\n'
'implementation, the built-in variable "__debug__" is "True" under\n'
'normal circumstances, "False" when optimization is requested '
'(command\n'
'line option -O). The current code generator emits no code for an\n'
'assert statement when optimization is requested at compile time. '
'Note\n'
'that it is unnecessary to include the source code for the '
'expression\n'
'that failed in the error message; it will be displayed as part of '
'the\n'
'stack trace.\n'
'\n'
'Assignments to "__debug__" are illegal. The value for the '
'built-in\n'
'variable is determined when the interpreter starts.\n',
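# Editor sketch (comment added for illustration; not part of the generated
# data): the expansion documented above, concretely --
#   assert x > 0, 'x must be positive'
# behaves like:
#   if __debug__:
#       if not x > 0: raise AssertionError('x must be positive')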
'assignment': '\n'
'Assignment statements\n'
'*********************\n'
'\n'
'Assignment statements are used to (re)bind names to values and '
'to\n'
'modify attributes or items of mutable objects:\n'
'\n'
' assignment_stmt ::= (target_list "=")+ (expression_list | '
'yield_expression)\n'
' target_list ::= target ("," target)* [","]\n'
' target ::= identifier\n'
' | "(" target_list ")"\n'
' | "[" [target_list] "]"\n'
' | attributeref\n'
' | subscription\n'
' | slicing\n'
'\n'
'(See section Primaries for the syntax definitions for the last '
'three\n'
'symbols.)\n'
'\n'
'An assignment statement evaluates the expression list '
'(remember that\n'
'this can be a single expression or a comma-separated list, the '
'latter\n'
'yielding a tuple) and assigns the single resulting object to '
'each of\n'
'the target lists, from left to right.\n'
'\n'
'Assignment is defined recursively depending on the form of the '
'target\n'
'(list). When a target is part of a mutable object (an '
'attribute\n'
'reference, subscription or slicing), the mutable object must\n'
'ultimately perform the assignment and decide about its '
'validity, and\n'
'may raise an exception if the assignment is unacceptable. The '
'rules\n'
'observed by various types and the exceptions raised are given '
'with the\n'
'definition of the object types (see section The standard type\n'
'hierarchy).\n'
'\n'
'Assignment of an object to a target list is recursively '
'defined as\n'
'follows.\n'
'\n'
'* If the target list is a single target: The object is '
'assigned to\n'
' that target.\n'
'\n'
'* If the target list is a comma-separated list of targets: '
'The\n'
' object must be an iterable with the same number of items as '
'there\n'
' are targets in the target list, and the items are assigned, '
'from\n'
' left to right, to the corresponding targets.\n'
'\n'
'Assignment of an object to a single target is recursively '
'defined as\n'
'follows.\n'
'\n'
'* If the target is an identifier (name):\n'
'\n'
' * If the name does not occur in a "global" statement in the\n'
' current code block: the name is bound to the object in the '
'current\n'
' local namespace.\n'
'\n'
' * Otherwise: the name is bound to the object in the current '
'global\n'
' namespace.\n'
'\n'
' The name is rebound if it was already bound. This may cause '
'the\n'
' reference count for the object previously bound to the name '
'to reach\n'
' zero, causing the object to be deallocated and its '
'destructor (if it\n'
' has one) to be called.\n'
'\n'
'* If the target is a target list enclosed in parentheses or '
'in\n'
' square brackets: The object must be an iterable with the '
'same number\n'
' of items as there are targets in the target list, and its '
'items are\n'
' assigned, from left to right, to the corresponding targets.\n'
'\n'
'* If the target is an attribute reference: The primary '
'expression in\n'
' the reference is evaluated. It should yield an object with\n'
' assignable attributes; if this is not the case, "TypeError" '
'is\n'
' raised. That object is then asked to assign the assigned '
'object to\n'
' the given attribute; if it cannot perform the assignment, it '
'raises\n'
' an exception (usually but not necessarily '
'"AttributeError").\n'
'\n'
' Note: If the object is a class instance and the attribute '
'reference\n'
' occurs on both sides of the assignment operator, the RHS '
'expression,\n'
' "a.x" can access either an instance attribute or (if no '
'instance\n'
' attribute exists) a class attribute. The LHS target "a.x" '
'is always\n'
' set as an instance attribute, creating it if necessary. '
'Thus, the\n'
' two occurrences of "a.x" do not necessarily refer to the '
'same\n'
' attribute: if the RHS expression refers to a class '
'attribute, the\n'
' LHS creates a new instance attribute as the target of the\n'
' assignment:\n'
'\n'
' class Cls:\n'
' x = 3 # class variable\n'
' inst = Cls()\n'
' inst.x = inst.x + 1 # writes inst.x as 4 leaving Cls.x '
'as 3\n'
'\n'
' This description does not necessarily apply to descriptor\n'
' attributes, such as properties created with "property()".\n'
'\n'
'* If the target is a subscription: The primary expression in '
'the\n'
' reference is evaluated. It should yield either a mutable '
'sequence\n'
' object (such as a list) or a mapping object (such as a '
'dictionary).\n'
' Next, the subscript expression is evaluated.\n'
'\n'
' If the primary is a mutable sequence object (such as a '
'list), the\n'
' subscript must yield a plain integer. If it is negative, '
'the\n'
" sequence's length is added to it. The resulting value must "
'be a\n'
" nonnegative integer less than the sequence's length, and "
'the\n'
' sequence is asked to assign the assigned object to its item '
'with\n'
' that index. If the index is out of range, "IndexError" is '
'raised\n'
' (assignment to a subscripted sequence cannot add new items '
'to a\n'
' list).\n'
'\n'
' If the primary is a mapping object (such as a dictionary), '
'the\n'
" subscript must have a type compatible with the mapping's key "
'type,\n'
' and the mapping is then asked to create a key/datum pair '
'which maps\n'
' the subscript to the assigned object. This can either '
'replace an\n'
' existing key/value pair with the same key value, or insert a '
'new\n'
' key/value pair (if no key with the same value existed).\n'
'\n'
'* If the target is a slicing: The primary expression in the\n'
' reference is evaluated. It should yield a mutable sequence '
'object\n'
' (such as a list). The assigned object should be a sequence '
'object\n'
' of the same type. Next, the lower and upper bound '
'expressions are\n'
' evaluated, insofar they are present; defaults are zero and '
'the\n'
" sequence's length. The bounds should evaluate to (small) "
'integers.\n'
" If either bound is negative, the sequence's length is added "
'to it.\n'
' The resulting bounds are clipped to lie between zero and '
'the\n'
" sequence's length, inclusive. Finally, the sequence object "
'is asked\n'
' to replace the slice with the items of the assigned '
'sequence. The\n'
' length of the slice may be different from the length of the '
'assigned\n'
' sequence, thus changing the length of the target sequence, '
'if the\n'
' object allows it.\n'
'\n'
'**CPython implementation detail:** In the current '
'implementation, the\n'
'syntax for targets is taken to be the same as for expressions, '
'and\n'
'invalid syntax is rejected during the code generation phase, '
'causing\n'
'less detailed error messages.\n'
'\n'
'WARNING: Although the definition of assignment implies that '
'overlaps\n'
"between the left-hand side and the right-hand side are 'safe' "
'(for\n'
'example "a, b = b, a" swaps two variables), overlaps *within* '
'the\n'
'collection of assigned-to variables are not safe! For '
'instance, the\n'
'following program prints "[0, 2]":\n'
'\n'
' x = [0, 1]\n'
' i = 0\n'
' i, x[i] = 1, 2\n'
' print x\n'
'\n'
'\n'
'Augmented assignment statements\n'
'===============================\n'
'\n'
'Augmented assignment is the combination, in a single '
'statement, of a\n'
'binary operation and an assignment statement:\n'
'\n'
' augmented_assignment_stmt ::= augtarget augop '
'(expression_list | yield_expression)\n'
' augtarget ::= identifier | attributeref | '
'subscription | slicing\n'
' augop ::= "+=" | "-=" | "*=" | "/=" | '
'"//=" | "%=" | "**="\n'
' | ">>=" | "<<=" | "&=" | "^=" | "|="\n'
'\n'
'(See section Primaries for the syntax definitions for the last '
'three\n'
'symbols.)\n'
'\n'
'An augmented assignment evaluates the target (which, unlike '
'normal\n'
'assignment statements, cannot be an unpacking) and the '
'expression\n'
'list, performs the binary operation specific to the type of '
'assignment\n'
'on the two operands, and assigns the result to the original '
'target.\n'
'The target is only evaluated once.\n'
'\n'
'An augmented assignment expression like "x += 1" can be '
'rewritten as\n'
'"x = x + 1" to achieve a similar, but not exactly equal '
'effect. In the\n'
'augmented version, "x" is only evaluated once. Also, when '
'possible,\n'
'the actual operation is performed *in-place*, meaning that '
'rather than\n'
'creating a new object and assigning that to the target, the '
'old object\n'
'is modified instead.\n'
'\n'
'With the exception of assigning to tuples and multiple targets '
'in a\n'
'single statement, the assignment done by augmented assignment\n'
'statements is handled the same way as normal assignments. '
'Similarly,\n'
'with the exception of the possible *in-place* behavior, the '
'binary\n'
'operation performed by augmented assignment is the same as the '
'normal\n'
'binary operations.\n'
'\n'
'For targets which are attribute references, the same caveat '
'about\n'
'class and instance attributes applies as for regular '
'assignments.\n',
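# Editor sketch: the in-place behaviour of augmented assignment described
# above is visible with mutable objects --
#   a = b = [1]
#   a += [2]       # in-place: b is now [1, 2] as well
#   a = a + [3]    # rebinds a to a new list; b stays [1, 2]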
'atom-identifiers': '\n'
'Identifiers (Names)\n'
'*******************\n'
'\n'
'An identifier occurring as an atom is a name. See '
'section Identifiers\n'
'and keywords for lexical definition and section Naming '
'and binding for\n'
'documentation of naming and binding.\n'
'\n'
'When the name is bound to an object, evaluation of the '
'atom yields\n'
'that object. When a name is not bound, an attempt to '
'evaluate it\n'
'raises a "NameError" exception.\n'
'\n'
'**Private name mangling:** When an identifier that '
'textually occurs in\n'
'a class definition begins with two or more underscore '
'characters and\n'
'does not end in two or more underscores, it is '
'considered a *private\n'
'name* of that class. Private names are transformed to a '
'longer form\n'
'before code is generated for them. The transformation '
'inserts the\n'
'class name, with leading underscores removed and a '
'single underscore\n'
'inserted, in front of the name. For example, the '
'identifier "__spam"\n'
'occurring in a class named "Ham" will be transformed to '
'"_Ham__spam".\n'
'This transformation is independent of the syntactical '
'context in which\n'
'the identifier is used. If the transformed name is '
'extremely long\n'
'(longer than 255 characters), implementation defined '
'truncation may\n'
'happen. If the class name consists only of underscores, '
'no\n'
'transformation is done.\n',
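# Editor sketch: private name mangling as described above --
#   class Ham:
#       __spam = 1
#   print '_Ham__spam' in Ham.__dict__   # True; '__spam' is not present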
'atom-literals': '\n'
'Literals\n'
'********\n'
'\n'
'Python supports string literals and various numeric '
'literals:\n'
'\n'
' literal ::= stringliteral | integer | longinteger\n'
' | floatnumber | imagnumber\n'
'\n'
'Evaluation of a literal yields an object of the given type '
'(string,\n'
'integer, long integer, floating point number, complex '
'number) with the\n'
'given value. The value may be approximated in the case of '
'floating\n'
'point and imaginary (complex) literals. See section '
'Literals for\n'
'details.\n'
'\n'
'All literals correspond to immutable data types, and hence '
'the\n'
"object's identity is less important than its value. "
'Multiple\n'
'evaluations of literals with the same value (either the '
'same\n'
'occurrence in the program text or a different occurrence) '
'may obtain\n'
'the same object or a different object with the same '
'value.\n',
'attribute-access': '\n'
'Customizing attribute access\n'
'****************************\n'
'\n'
'The following methods can be defined to customize the '
'meaning of\n'
'attribute access (use of, assignment to, or deletion of '
'"x.name") for\n'
'class instances.\n'
'\n'
'object.__getattr__(self, name)\n'
'\n'
' Called when an attribute lookup has not found the '
'attribute in the\n'
' usual places (i.e. it is not an instance attribute '
'nor is it found\n'
' in the class tree for "self"). "name" is the '
'attribute name. This\n'
' method should return the (computed) attribute value '
'or raise an\n'
' "AttributeError" exception.\n'
'\n'
' Note that if the attribute is found through the '
'normal mechanism,\n'
' "__getattr__()" is not called. (This is an '
'intentional asymmetry\n'
' between "__getattr__()" and "__setattr__()".) This is '
'done both for\n'
' efficiency reasons and because otherwise '
'"__getattr__()" would have\n'
' no way to access other attributes of the instance. '
'Note that at\n'
' least for instance variables, you can fake total '
'control by not\n'
' inserting any values in the instance attribute '
'dictionary (but\n'
' instead inserting them in another object). See the\n'
' "__getattribute__()" method below for a way to '
'actually get total\n'
' control in new-style classes.\n'
'\n'
'object.__setattr__(self, name, value)\n'
'\n'
' Called when an attribute assignment is attempted. '
'This is called\n'
' instead of the normal mechanism (i.e. store the value '
'in the\n'
' instance dictionary). *name* is the attribute name, '
'*value* is the\n'
' value to be assigned to it.\n'
'\n'
' If "__setattr__()" wants to assign to an instance '
'attribute, it\n'
' should not simply execute "self.name = value" --- '
'this would cause\n'
' a recursive call to itself. Instead, it should '
'insert the value in\n'
' the dictionary of instance attributes, e.g., '
'"self.__dict__[name] =\n'
' value". For new-style classes, rather than accessing '
'the instance\n'
' dictionary, it should call the base class method with '
'the same\n'
' name, for example, "object.__setattr__(self, name, '
'value)".\n'
'\n'
'object.__delattr__(self, name)\n'
'\n'
' Like "__setattr__()" but for attribute deletion '
'instead of\n'
' assignment. This should only be implemented if "del '
'obj.name" is\n'
' meaningful for the object.\n'
'\n'
'\n'
'More attribute access for new-style classes\n'
'===========================================\n'
'\n'
'The following methods only apply to new-style classes.\n'
'\n'
'object.__getattribute__(self, name)\n'
'\n'
' Called unconditionally to implement attribute '
'accesses for\n'
' instances of the class. If the class also defines '
'"__getattr__()",\n'
' the latter will not be called unless '
'"__getattribute__()" either\n'
' calls it explicitly or raises an "AttributeError". '
'This method\n'
' should return the (computed) attribute value or raise '
'an\n'
' "AttributeError" exception. In order to avoid '
'infinite recursion in\n'
' this method, its implementation should always call '
'the base class\n'
' method with the same name to access any attributes it '
'needs, for\n'
' example, "object.__getattribute__(self, name)".\n'
'\n'
' Note: This method may still be bypassed when looking '
'up special\n'
' methods as the result of implicit invocation via '
'language syntax\n'
' or built-in functions. See Special method lookup '
'for new-style\n'
' classes.\n'
'\n'
'\n'
'Implementing Descriptors\n'
'========================\n'
'\n'
'The following methods only apply when an instance of the '
'class\n'
'containing the method (a so-called *descriptor* class) '
'appears in an\n'
'*owner* class (the descriptor must be in either the '
"owner's class\n"
'dictionary or in the class dictionary for one of its '
'parents). In the\n'
'examples below, "the attribute" refers to the attribute '
'whose name is\n'
"the key of the property in the owner class' "
'"__dict__".\n'
'\n'
'object.__get__(self, instance, owner)\n'
'\n'
' Called to get the attribute of the owner class (class '
'attribute\n'
' access) or of an instance of that class (instance '
'attribute\n'
' access). *owner* is always the owner class, while '
'*instance* is the\n'
' instance that the attribute was accessed through, or '
'"None" when\n'
' the attribute is accessed through the *owner*. This '
'method should\n'
' return the (computed) attribute value or raise an '
'"AttributeError"\n'
' exception.\n'
'\n'
'object.__set__(self, instance, value)\n'
'\n'
' Called to set the attribute on an instance *instance* '
'of the owner\n'
' class to a new value, *value*.\n'
'\n'
'object.__delete__(self, instance)\n'
'\n'
' Called to delete the attribute on an instance '
'*instance* of the\n'
' owner class.\n'
'\n'
'\n'
'Invoking Descriptors\n'
'====================\n'
'\n'
'In general, a descriptor is an object attribute with '
'"binding\n'
'behavior", one whose attribute access has been '
'overridden by methods\n'
'in the descriptor protocol: "__get__()", "__set__()", '
'and\n'
'"__delete__()". If any of those methods are defined for '
'an object, it\n'
'is said to be a descriptor.\n'
'\n'
'The default behavior for attribute access is to get, '
'set, or delete\n'
"the attribute from an object's dictionary. For instance, "
'"a.x" has a\n'
'lookup chain starting with "a.__dict__[\'x\']", then\n'
'"type(a).__dict__[\'x\']", and continuing through the '
'base classes of\n'
'"type(a)" excluding metaclasses.\n'
'\n'
'However, if the looked-up value is an object defining '
'one of the\n'
'descriptor methods, then Python may override the default '
'behavior and\n'
'invoke the descriptor method instead. Where this occurs '
'in the\n'
'precedence chain depends on which descriptor methods '
'were defined and\n'
'how they were called. Note that descriptors are only '
'invoked for new\n'
'style objects or classes (ones that subclass "object()" '
'or "type()").\n'
'\n'
'The starting point for descriptor invocation is a '
'binding, "a.x". How\n'
'the arguments are assembled depends on "a":\n'
'\n'
'Direct Call\n'
' The simplest and least common call is when user code '
'directly\n'
' invokes a descriptor method: "x.__get__(a)".\n'
'\n'
'Instance Binding\n'
' If binding to a new-style object instance, "a.x" is '
'transformed\n'
' into the call: "type(a).__dict__[\'x\'].__get__(a, '
'type(a))".\n'
'\n'
'Class Binding\n'
' If binding to a new-style class, "A.x" is transformed '
'into the\n'
' call: "A.__dict__[\'x\'].__get__(None, A)".\n'
'\n'
'Super Binding\n'
' If "a" is an instance of "super", then the binding '
'"super(B,\n'
' obj).m()" searches "obj.__class__.__mro__" for the '
'base class "A"\n'
' immediately preceding "B" and then invokes the '
'descriptor with the\n'
' call: "A.__dict__[\'m\'].__get__(obj, '
'obj.__class__)".\n'
'\n'
'For instance bindings, the precedence of descriptor '
'invocation depends\n'
'on which descriptor methods are defined. A '
'descriptor can define\n'
'any combination of "__get__()", "__set__()" and '
'"__delete__()". If it\n'
'does not define "__get__()", then accessing the '
'attribute will return\n'
'the descriptor object itself unless there is a value in '
"the object's\n"
'instance dictionary. If the descriptor defines '
'"__set__()" and/or\n'
'"__delete__()", it is a data descriptor; if it defines '
'neither, it is\n'
'a non-data descriptor. Normally, data descriptors '
'define both\n'
'"__get__()" and "__set__()", while non-data descriptors '
'have just the\n'
'"__get__()" method. Data descriptors with "__set__()" '
'and "__get__()"\n'
'defined always override a redefinition in an instance '
'dictionary. In\n'
'contrast, non-data descriptors can be overridden by '
'instances.\n'
'\n'
'Python methods (including "staticmethod()" and '
'"classmethod()") are\n'
'implemented as non-data descriptors. Accordingly, '
'instances can\n'
'redefine and override methods. This allows individual '
'instances to\n'
'acquire behaviors that differ from other instances of '
'the same class.\n'
'\n'
'The "property()" function is implemented as a data '
'descriptor.\n'
'Accordingly, instances cannot override the behavior of a '
'property.\n'
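'\n'
'The difference can be observed directly; in the following minimal\n'
'sketch the names "NonData" and "C" are arbitrary:\n'
'\n'
'   >>> class NonData(object):\n'
'   ...     def __get__(self, instance, owner):\n'
'   ...         return "from descriptor"\n'
'   ...\n'
'   >>> class C(object):\n'
'   ...     attr = NonData()\n'
'   ...\n'
'   >>> c = C()\n'
'   >>> c.attr\n'
"   'from descriptor'\n"
'   >>> c.__dict__["attr"] = "from instance"\n'
'   >>> c.attr\n'
"   'from instance'\n"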
'\n'
'\n'
'__slots__\n'
'=========\n'
'\n'
'By default, instances of both old and new-style classes '
'have a\n'
'dictionary for attribute storage. This wastes space for '
'objects\n'
'having very few instance variables. The space '
'consumption can become\n'
'acute when creating large numbers of instances.\n'
'\n'
'The default can be overridden by defining *__slots__* in '
'a new-style\n'
'class definition. The *__slots__* declaration takes a '
'sequence of\n'
'instance variables and reserves just enough space in '
'each instance to\n'
'hold a value for each variable. Space is saved because '
'*__dict__* is\n'
'not created for each instance.\n'
'\n'
'__slots__\n'
'\n'
' This class variable can be assigned a string, '
'iterable, or sequence\n'
' of strings with variable names used by instances. If '
'defined in a\n'
' new-style class, *__slots__* reserves space for the '
'declared\n'
' variables and prevents the automatic creation of '
'*__dict__* and\n'
' *__weakref__* for each instance.\n'
'\n'
' New in version 2.2.\n'
'\n'
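'A minimal sketch (the class name "Point" is arbitrary):\n'
'\n'
'   >>> class Point(object):\n'
"   ...     __slots__ = ('x', 'y')\n"
'   ...\n'
'   >>> p = Point()\n'
'   >>> p.x = 1\n'
'   >>> p.z = 2\n'
'   Traceback (most recent call last):\n'
'     ...\n'
"   AttributeError: 'Point' object has no attribute 'z'\n"
'\n'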
'Notes on using *__slots__*\n'
'\n'
'* When inheriting from a class without *__slots__*, the '
'*__dict__*\n'
' attribute of that class will always be accessible, so '
'a *__slots__*\n'
' definition in the subclass is meaningless.\n'
'\n'
'* Without a *__dict__* variable, instances cannot be '
'assigned new\n'
' variables not listed in the *__slots__* definition. '
'Attempts to\n'
' assign to an unlisted variable name raise '
'"AttributeError". If\n'
' dynamic assignment of new variables is desired, then '
'add\n'
' "\'__dict__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
' Changed in version 2.3: Previously, adding '
'"\'__dict__\'" to the\n'
' *__slots__* declaration would not enable the '
'assignment of new\n'
' attributes not specifically listed in the sequence of '
'instance\n'
' variable names.\n'
'\n'
'* Without a *__weakref__* variable for each instance, '
'classes\n'
' defining *__slots__* do not support weak references to '
'their\n'
' instances. If weak reference support is needed, then '
'add\n'
' "\'__weakref__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
' Changed in version 2.3: Previously, adding '
'"\'__weakref__\'" to the\n'
' *__slots__* declaration would not enable support for '
'weak\n'
' references.\n'
'\n'
'* *__slots__* are implemented at the class level by '
'creating\n'
' descriptors (Implementing Descriptors) for each '
'variable name. As a\n'
' result, class attributes cannot be used to set default '
'values for\n'
' instance variables defined by *__slots__*; otherwise, '
'the class\n'
' attribute would overwrite the descriptor assignment.\n'
'\n'
'* The action of a *__slots__* declaration is limited to '
'the class\n'
' where it is defined. As a result, subclasses will '
'have a *__dict__*\n'
' unless they also define *__slots__* (which must only '
'contain names\n'
' of any *additional* slots).\n'
'\n'
'* If a class defines a slot also defined in a base '
'class, the\n'
' instance variable defined by the base class slot is '
'inaccessible\n'
' (except by retrieving its descriptor directly from the '
'base class).\n'
' This renders the meaning of the program undefined. In '
'the future, a\n'
' check may be added to prevent this.\n'
'\n'
'* Nonempty *__slots__* does not work for classes derived '
'from\n'
' "variable-length" built-in types such as "long", "str" '
'and "tuple".\n'
'\n'
'* Any non-string iterable may be assigned to '
'*__slots__*. Mappings\n'
' may also be used; however, in the future, special '
'meaning may be\n'
' assigned to the values corresponding to each key.\n'
'\n'
'* *__class__* assignment works only if both classes have '
'the same\n'
' *__slots__*.\n'
'\n'
' Changed in version 2.6: Previously, *__class__* '
'assignment raised an\n'
' error if either new or old class had *__slots__*.\n',
'attribute-references': '\n'
'Attribute references\n'
'********************\n'
'\n'
'An attribute reference is a primary followed by a '
'period and a name:\n'
'\n'
' attributeref ::= primary "." identifier\n'
'\n'
'The primary must evaluate to an object of a type '
'that supports\n'
'attribute references, e.g., a module, list, or an '
'instance. This\n'
'object is then asked to produce the attribute whose '
'name is the\n'
'identifier. If this attribute is not available, the '
'exception\n'
'"AttributeError" is raised. Otherwise, the type and '
'value of the\n'
'object produced is determined by the object. '
'Multiple evaluations of\n'
'the same attribute reference may yield different '
'objects.\n',
'augassign': '\n'
'Augmented assignment statements\n'
'*******************************\n'
'\n'
'Augmented assignment is the combination, in a single statement, '
'of a\n'
'binary operation and an assignment statement:\n'
'\n'
' augmented_assignment_stmt ::= augtarget augop '
'(expression_list | yield_expression)\n'
' augtarget ::= identifier | attributeref | '
'subscription | slicing\n'
' augop ::= "+=" | "-=" | "*=" | "/=" | '
'"//=" | "%=" | "**="\n'
' | ">>=" | "<<=" | "&=" | "^=" | "|="\n'
'\n'
'(See section Primaries for the syntax definitions for the last '
'three\n'
'symbols.)\n'
'\n'
'An augmented assignment evaluates the target (which, unlike '
'normal\n'
'assignment statements, cannot be an unpacking) and the '
'expression\n'
'list, performs the binary operation specific to the type of '
'assignment\n'
'on the two operands, and assigns the result to the original '
'target.\n'
'The target is only evaluated once.\n'
'\n'
'An augmented assignment expression like "x += 1" can be '
'rewritten as\n'
'"x = x + 1" to achieve a similar, but not exactly equal effect. '
'In the\n'
'augmented version, "x" is only evaluated once. Also, when '
'possible,\n'
'the actual operation is performed *in-place*, meaning that '
'rather than\n'
'creating a new object and assigning that to the target, the old '
'object\n'
'is modified instead.\n'
'\n'
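'For example, for a mutable object such as a list, "+=" modifies the\n'
'object in place, which is visible through other references to it,\n'
'whereas the spelled-out form rebinds the target to a new object:\n'
'\n'
'   >>> a = [1, 2]\n'
'   >>> b = a\n'
'   >>> a += [3]\n'
'   >>> b\n'
'   [1, 2, 3]\n'
'   >>> a = a + [4]\n'
'   >>> b\n'
'   [1, 2, 3]\n'
'\n'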
'With the exception of assigning to tuples and multiple targets '
'in a\n'
'single statement, the assignment done by augmented assignment\n'
'statements is handled the same way as normal assignments. '
'Similarly,\n'
'with the exception of the possible *in-place* behavior, the '
'binary\n'
'operation performed by augmented assignment is the same as the '
'normal\n'
'binary operations.\n'
'\n'
'For targets which are attribute references, the same caveat '
'about\n'
'class and instance attributes applies as for regular '
'assignments.\n',
'binary': '\n'
'Binary arithmetic operations\n'
'****************************\n'
'\n'
'The binary arithmetic operations have the conventional priority\n'
'levels. Note that some of these operations also apply to certain '
'non-\n'
'numeric types. Apart from the power operator, there are only two\n'
'levels, one for multiplicative operators and one for additive\n'
'operators:\n'
'\n'
' m_expr ::= u_expr | m_expr "*" u_expr | m_expr "//" u_expr | '
'm_expr "/" u_expr\n'
' | m_expr "%" u_expr\n'
' a_expr ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr\n'
'\n'
'The "*" (multiplication) operator yields the product of its '
'arguments.\n'
'The arguments must either both be numbers, or one argument must be '
'an\n'
'integer (plain or long) and the other must be a sequence. In the\n'
'former case, the numbers are converted to a common type and then\n'
'multiplied together. In the latter case, sequence repetition is\n'
'performed; a negative repetition factor yields an empty sequence.\n'
'\n'
'The "/" (division) and "//" (floor division) operators yield the\n'
'quotient of their arguments. The numeric arguments are first\n'
'converted to a common type. Plain or long integer division yields '
'an\n'
'integer of the same type; the result is that of mathematical '
'division\n'
"with the 'floor' function applied to the result. Division by zero\n"
'raises the "ZeroDivisionError" exception.\n'
'\n'
'The "%" (modulo) operator yields the remainder from the division '
'of\n'
'the first argument by the second. The numeric arguments are '
'first\n'
'converted to a common type. A zero right argument raises the\n'
'"ZeroDivisionError" exception. The arguments may be floating '
'point\n'
'numbers, e.g., "3.14%0.7" equals "0.34" (since "3.14" equals '
'"4*0.7 +\n'
'0.34".) The modulo operator always yields a result with the same '
'sign\n'
'as its second operand (or zero); the absolute value of the result '
'is\n'
'strictly smaller than the absolute value of the second operand '
'[2].\n'
'\n'
'The integer division and modulo operators are connected by the\n'
'following identity: "x == (x/y)*y + (x%y)". Integer division and\n'
'modulo are also connected with the built-in function "divmod()":\n'
'"divmod(x, y) == (x/y, x%y)". These identities don\'t hold for\n'
'floating point numbers; there similar identities hold '
'approximately\n'
'where "x/y" is replaced by "floor(x/y)" or "floor(x/y) - 1" [3].\n'
'\n'
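'For example, with a negative left operand (integer division in\n'
'Python 2 rounds towards negative infinity):\n'
'\n'
'   >>> -7 / 2\n'
'   -4\n'
'   >>> -7 % 2\n'
'   1\n'
'   >>> (-7 / 2) * 2 + (-7 % 2)\n'
'   -7\n'
'   >>> divmod(-7, 2)\n'
'   (-4, 1)\n'
'\n'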
'In addition to performing the modulo operation on numbers, the '
'"%"\n'
'operator is also overloaded by string and unicode objects to '
'perform\n'
'string formatting (also known as interpolation). The syntax for '
'string\n'
'formatting is described in the Python Library Reference, section\n'
'String Formatting Operations.\n'
'\n'
'Deprecated since version 2.3: The floor division operator, the '
'modulo\n'
'operator, and the "divmod()" function are no longer defined for\n'
'complex numbers. Instead, convert to a floating point number '
'using\n'
'the "abs()" function if appropriate.\n'
'\n'
'The "+" (addition) operator yields the sum of its arguments. The\n'
'arguments must either both be numbers or both sequences of the '
'same\n'
'type. In the former case, the numbers are converted to a common '
'type\n'
'and then added together. In the latter case, the sequences are\n'
'concatenated.\n'
'\n'
'The "-" (subtraction) operator yields the difference of its '
'arguments.\n'
'The numeric arguments are first converted to a common type.\n',
'bitwise': '\n'
'Binary bitwise operations\n'
'*************************\n'
'\n'
'Each of the three bitwise operations has a different priority '
'level:\n'
'\n'
' and_expr ::= shift_expr | and_expr "&" shift_expr\n'
' xor_expr ::= and_expr | xor_expr "^" and_expr\n'
' or_expr ::= xor_expr | or_expr "|" xor_expr\n'
'\n'
'The "&" operator yields the bitwise AND of its arguments, which '
'must\n'
'be plain or long integers. The arguments are converted to a '
'common\n'
'type.\n'
'\n'
'The "^" operator yields the bitwise XOR (exclusive OR) of its\n'
'arguments, which must be plain or long integers. The arguments '
'are\n'
'converted to a common type.\n'
'\n'
'The "|" operator yields the bitwise (inclusive) OR of its '
'arguments,\n'
'which must be plain or long integers. The arguments are '
'converted to\n'
'a common type.\n',
'bltin-code-objects': '\n'
'Code Objects\n'
'************\n'
'\n'
'Code objects are used by the implementation to '
'represent "pseudo-\n'
'compiled" executable Python code such as a function '
'body. They differ\n'
"from function objects because they don't contain a "
'reference to their\n'
'global execution environment. Code objects are '
'returned by the built-\n'
'in "compile()" function and can be extracted from '
'function objects\n'
'through their "func_code" attribute. See also the '
'"code" module.\n'
'\n'
'A code object can be executed or evaluated by passing '
'it (instead of a\n'
'source string) to the "exec" statement or the built-in '
'"eval()"\n'
'function.\n'
'\n'
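'A minimal sketch of both ways of obtaining a code object:\n'
'\n'
'   >>> code = compile("x * 2", "<string>", "eval")\n'
'   >>> eval(code, {"x": 21})\n'
'   42\n'
'   >>> def f():\n'
'   ...     pass\n'
'   ...\n'
'   >>> type(f.func_code)\n'
"   <type 'code'>\n"
'\n'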
'See The standard type hierarchy for more '
'information.\n',
'bltin-ellipsis-object': '\n'
'The Ellipsis Object\n'
'*******************\n'
'\n'
'This object is used by extended slice notation (see '
'Slicings). It\n'
'supports no special operations. There is exactly '
'one ellipsis object,\n'
'named "Ellipsis" (a built-in name).\n'
'\n'
'It is written as "Ellipsis". When in a subscript, '
'it can also be\n'
'written as "...", for example "seq[...]".\n',
'bltin-file-objects': '\n'
'File Objects\n'
'************\n'
'\n'
'File objects are implemented using C\'s "stdio" '
'package and can be\n'
'created with the built-in "open()" function. File '
'objects are also\n'
'returned by some other built-in functions and methods, '
'such as\n'
'"os.popen()" and "os.fdopen()" and the "makefile()" '
'method of socket\n'
'objects. Temporary files can be created using the '
'"tempfile" module,\n'
'and high-level file operations such as copying, '
'moving, and deleting\n'
'files and directories can be achieved with the '
'"shutil" module.\n'
'\n'
'When a file operation fails for an I/O-related reason, '
'the exception\n'
'"IOError" is raised. This includes situations where '
'the operation is\n'
'not defined for some reason, like "seek()" on a tty '
'device or writing\n'
'a file opened for reading.\n'
'\n'
'Files have the following methods:\n'
'\n'
'file.close()\n'
'\n'
' Close the file. A closed file cannot be read or '
'written any more.\n'
' Any operation which requires that the file be open '
'will raise a\n'
' "ValueError" after the file has been closed. '
'Calling "close()"\n'
' more than once is allowed.\n'
'\n'
' As of Python 2.5, you can avoid having to call this '
'method\n'
' explicitly if you use the "with" statement. For '
'example, the\n'
' following code will automatically close *f* when '
'the "with" block\n'
' is exited:\n'
'\n'
' from __future__ import with_statement # This '
"isn't required in Python 2.6\n"
'\n'
' with open("hello.txt") as f:\n'
' for line in f:\n'
' print line,\n'
'\n'
' In older versions of Python, you would have needed '
'to do this to\n'
' get the same effect:\n'
'\n'
' f = open("hello.txt")\n'
' try:\n'
' for line in f:\n'
' print line,\n'
' finally:\n'
' f.close()\n'
'\n'
' Note: Not all "file-like" types in Python support '
'use as a\n'
' context manager for the "with" statement. If '
'your code is\n'
' intended to work with any file-like object, you '
'can use the\n'
' function "contextlib.closing()" instead of using '
'the object\n'
' directly.\n'
'\n'
'file.flush()\n'
'\n'
' Flush the internal buffer, like "stdio"\'s '
'"fflush()". This may be\n'
' a no-op on some file-like objects.\n'
'\n'
' Note: "flush()" does not necessarily write the '
"file's data to\n"
' disk. Use "flush()" followed by "os.fsync()" to '
'ensure this\n'
' behavior.\n'
'\n'
'file.fileno()\n'
'\n'
' Return the integer "file descriptor" that is used '
'by the underlying\n'
' implementation to request I/O operations from the '
'operating system.\n'
' This can be useful for other, lower level '
'interfaces that use file\n'
' descriptors, such as the "fcntl" module or '
'"os.read()" and friends.\n'
'\n'
' Note: File-like objects which do not have a real '
'file descriptor\n'
' should *not* provide this method!\n'
'\n'
'file.isatty()\n'
'\n'
' Return "True" if the file is connected to a '
'tty(-like) device, else\n'
' "False".\n'
'\n'
' Note: If a file-like object is not associated with '
'a real file,\n'
' this method should *not* be implemented.\n'
'\n'
'file.next()\n'
'\n'
' A file object is its own iterator, for example '
'"iter(f)" returns\n'
' *f* (unless *f* is closed). When a file is used as '
'an iterator,\n'
' typically in a "for" loop (for example, "for line '
'in f: print\n'
' line.strip()"), the "next()" method is called '
'repeatedly. This\n'
' method returns the next input line, or raises '
'"StopIteration" when\n'
' EOF is hit when the file is open for reading '
'(behavior is undefined\n'
' when the file is open for writing). In order to '
'make a "for" loop\n'
' the most efficient way of looping over the lines of '
'a file (a very\n'
' common operation), the "next()" method uses a '
'hidden read-ahead\n'
' buffer. As a consequence of using a read-ahead '
'buffer, combining\n'
' "next()" with other file methods (like '
'"readline()") does not work\n'
' right. However, using "seek()" to reposition the '
'file to an\n'
' absolute position will flush the read-ahead '
'buffer.\n'
'\n'
' New in version 2.3.\n'
'\n'
'file.read([size])\n'
'\n'
' Read at most *size* bytes from the file (less if '
'the read hits EOF\n'
' before obtaining *size* bytes). If the *size* '
'argument is negative\n'
' or omitted, read all data until EOF is reached. '
'The bytes are\n'
' returned as a string object. An empty string is '
'returned when EOF\n'
' is encountered immediately. (For certain files, '
'like ttys, it\n'
' makes sense to continue reading after an EOF is '
'hit.) Note that\n'
' this method may call the underlying C function '
'"fread()" more than\n'
' once in an effort to acquire as close to *size* '
'bytes as possible.\n'
' Also note that when in non-blocking mode, less data '
'than was\n'
' requested may be returned, even if no *size* '
'parameter was given.\n'
'\n'
' Note: This function is simply a wrapper for the '
'underlying\n'
' "fread()" C function, and will behave the same in '
'corner cases,\n'
' such as whether the EOF value is cached.\n'
'\n'
'file.readline([size])\n'
'\n'
' Read one entire line from the file. A trailing '
'newline character\n'
' is kept in the string (but may be absent when a '
'file ends with an\n'
' incomplete line). [6] If the *size* argument is '
'present and non-\n'
' negative, it is a maximum byte count (including the '
'trailing\n'
' newline) and an incomplete line may be returned. '
'When *size* is not\n'
' 0, an empty string is returned *only* when EOF is '
'encountered\n'
' immediately.\n'
'\n'
' Note: Unlike "stdio"\'s "fgets()", the returned '
'string contains\n'
' null characters ("\'\\0\'") if they occurred in '
'the input.\n'
'\n'
'file.readlines([sizehint])\n'
'\n'
' Read until EOF using "readline()" and return a list '
'containing the\n'
' lines thus read. If the optional *sizehint* '
'argument is present,\n'
' instead of reading up to EOF, whole lines totalling '
'approximately\n'
' *sizehint* bytes (possibly after rounding up to an '
'internal buffer\n'
' size) are read. Objects implementing a file-like '
'interface may\n'
' choose to ignore *sizehint* if it cannot be '
'implemented, or cannot\n'
' be implemented efficiently.\n'
'\n'
'file.xreadlines()\n'
'\n'
' This method returns the same thing as "iter(f)".\n'
'\n'
' New in version 2.1.\n'
'\n'
' Deprecated since version 2.3: Use "for line in '
'file" instead.\n'
'\n'
'file.seek(offset[, whence])\n'
'\n'
' Set the file\'s current position, like "stdio"\'s '
'"fseek()". The\n'
' *whence* argument is optional and defaults to '
'"os.SEEK_SET" or "0"\n'
' (absolute file positioning); other values are '
'"os.SEEK_CUR" or "1"\n'
' (seek relative to the current position) and '
'"os.SEEK_END" or "2"\n'
" (seek relative to the file's end). There is no "
'return value.\n'
'\n'
' For example, "f.seek(2, os.SEEK_CUR)" advances the '
'position by two\n'
' and "f.seek(-3, os.SEEK_END)" sets the position to '
'the third-to-last\n'
' byte.\n'
'\n'
' Note that if the file is opened for appending (mode '
'"\'a\'" or\n'
' "\'a+\'"), any "seek()" operations will be undone '
'at the next write.\n'
' If the file is only opened for writing in append '
'mode (mode "\'a\'"),\n'
' this method is essentially a no-op, but it remains '
'useful for files\n'
' opened in append mode with reading enabled (mode '
'"\'a+\'"). If the\n'
' file is opened in text mode (without "\'b\'"), only '
'offsets returned\n'
' by "tell()" are legal. Use of other offsets causes '
'undefined\n'
' behavior.\n'
'\n'
' Note that not all file objects are seekable.\n'
'\n'
' Changed in version 2.6: Passing float values as '
'offset has been\n'
' deprecated.\n'
'\n'
'file.tell()\n'
'\n'
" Return the file's current position, like "
'"stdio"\'s "ftell()".\n'
'\n'
' Note: On Windows, "tell()" can return illegal '
'values (after an\n'
' "fgets()") when reading files with Unix-style '
'line-endings. Use\n'
' binary mode ("\'rb\'") to circumvent this '
'problem.\n'
'\n'
'file.truncate([size])\n'
'\n'
" Truncate the file's size. If the optional *size* "
'argument is\n'
' present, the file is truncated to (at most) that '
'size. The size\n'
' defaults to the current position. The current file '
'position is not\n'
' changed. Note that if a specified size exceeds the '
"file's current\n"
' size, the result is platform-dependent: '
'possibilities include that\n'
' the file may remain unchanged, increase to the '
'specified size as if\n'
' zero-filled, or increase to the specified size with '
'undefined new\n'
' content. Availability: Windows, many Unix '
'variants.\n'
'\n'
'file.write(str)\n'
'\n'
' Write a string to the file. There is no return '
'value. Due to\n'
' buffering, the string may not actually show up in '
'the file until\n'
' the "flush()" or "close()" method is called.\n'
'\n'
'file.writelines(sequence)\n'
'\n'
' Write a sequence of strings to the file. The '
'sequence can be any\n'
' iterable object producing strings, typically a list '
'of strings.\n'
' There is no return value. (The name is intended to '
'match\n'
' "readlines()"; "writelines()" does not add line '
'separators.)\n'
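'\n'
'   For example, a minimal sketch using a temporary file:\n'
'\n'
'       import tempfile\n'
'\n'
'       f = tempfile.TemporaryFile()\n'
'       f.writelines(["one\\n", "two\\n"])\n'
'       f.seek(0)\n'
'       print f.read()\n'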
'\n'
'Files support the iterator protocol. Each iteration '
'returns the same\n'
'result as "readline()", and iteration ends when the '
'"readline()"\n'
'method returns an empty string.\n'
'\n'
'File objects also offer a number of other interesting '
'attributes.\n'
'These are not required for file-like objects, but '
'should be\n'
'implemented if they make sense for the particular '
'object.\n'
'\n'
'file.closed\n'
'\n'
' bool indicating the current state of the file '
'object. This is a\n'
' read-only attribute; the "close()" method changes '
'the value. It may\n'
' not be available on all file-like objects.\n'
'\n'
'file.encoding\n'
'\n'
' The encoding that this file uses. When Unicode '
'strings are written\n'
' to a file, they will be converted to byte strings '
'using this\n'
' encoding. In addition, when the file is connected '
'to a terminal,\n'
' the attribute gives the encoding that the terminal '
'is likely to use\n'
' (that information might be incorrect if the user '
'has misconfigured\n'
' the terminal). The attribute is read-only and may '
'not be present\n'
' on all file-like objects. It may also be "None", in '
'which case the\n'
' file uses the system default encoding for '
'converting Unicode\n'
' strings.\n'
'\n'
' New in version 2.3.\n'
'\n'
'file.errors\n'
'\n'
' The Unicode error handler used along with the '
'encoding.\n'
'\n'
' New in version 2.6.\n'
'\n'
'file.mode\n'
'\n'
' The I/O mode for the file. If the file was created '
'using the\n'
' "open()" built-in function, this will be the value '
'of the *mode*\n'
' parameter. This is a read-only attribute and may '
'not be present on\n'
' all file-like objects.\n'
'\n'
'file.name\n'
'\n'
' If the file object was created using "open()", the '
'name of the\n'
' file. Otherwise, some string that indicates the '
'source of the file\n'
' object, of the form "<...>". This is a read-only '
'attribute and may\n'
' not be present on all file-like objects.\n'
'\n'
'file.newlines\n'
'\n'
' If Python was built with *universal newlines* '
'enabled (the default)\n'
' this read-only attribute exists, and for files '
'opened in universal\n'
' newline read mode it keeps track of the types of '
'newlines\n'
' encountered while reading the file. The values it '
'can take are\n'
' "\'\\r\'", "\'\\n\'", "\'\\r\\n\'", "None" '
'(unknown, no newlines read yet) or\n'
' a tuple containing all the newline types seen, to '
'indicate that\n'
' multiple newline conventions were encountered. For '
'files not opened\n'
' in universal newlines read mode the value of this '
'attribute will be\n'
' "None".\n'
'\n'
'file.softspace\n'
'\n'
' Boolean that indicates whether a space character '
'needs to be\n'
' printed before another value when using the "print" '
'statement.\n'
' Classes that are trying to simulate a file object '
'should also have\n'
' a writable "softspace" attribute, which should be '
'initialized to\n'
' zero. This will be automatic for most classes '
'implemented in\n'
' Python (care may be needed for objects that '
'override attribute\n'
' access); types implemented in C will have to '
'provide a writable\n'
' "softspace" attribute.\n'
'\n'
' Note: This attribute is not used to control the '
'"print"\n'
' statement, but to allow the implementation of '
'"print" to keep\n'
' track of its internal state.\n',
'bltin-null-object': '\n'
'The Null Object\n'
'***************\n'
'\n'
"This object is returned by functions that don't "
'explicitly return a\n'
'value. It supports no special operations. There is '
'exactly one null\n'
'object, named "None" (a built-in name).\n'
'\n'
'It is written as "None".\n',
'bltin-type-objects': '\n'
'Type Objects\n'
'************\n'
'\n'
'Type objects represent the various object types. An '
"object's type is\n"
'accessed by the built-in function "type()". There are '
'no special\n'
'operations on types. The standard module "types" '
'defines names for\n'
'all standard built-in types.\n'
'\n'
'Types are written like this: "<type \'int\'>".\n',
'booleans': '\n'
'Boolean operations\n'
'******************\n'
'\n'
' or_test ::= and_test | or_test "or" and_test\n'
' and_test ::= not_test | and_test "and" not_test\n'
' not_test ::= comparison | "not" not_test\n'
'\n'
'In the context of Boolean operations, and also when expressions '
'are\n'
'used by control flow statements, the following values are '
'interpreted\n'
'as false: "False", "None", numeric zero of all types, and empty\n'
'strings and containers (including strings, tuples, lists,\n'
'dictionaries, sets and frozensets). All other values are '
'interpreted\n'
'as true. (See the "__nonzero__()" special method for a way to '
'change\n'
'this.)\n'
'\n'
'The operator "not" yields "True" if its argument is false, '
'"False"\n'
'otherwise.\n'
'\n'
'The expression "x and y" first evaluates *x*; if *x* is false, '
'its\n'
'value is returned; otherwise, *y* is evaluated and the resulting '
'value\n'
'is returned.\n'
'\n'
'The expression "x or y" first evaluates *x*; if *x* is true, its '
'value\n'
'is returned; otherwise, *y* is evaluated and the resulting value '
'is\n'
'returned.\n'
'\n'
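'For example, evaluation short-circuits, so the right operand is not\n'
'evaluated when the result is already determined:\n'
'\n'
'   >>> 0 and 1 / 0\n'
'   0\n'
'   >>> "" or "default"\n'
"   'default'\n"
'   >>> 1 or 1 / 0\n'
'   1\n'
'\n'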
'(Note that neither "and" nor "or" restrict the value and type '
'they\n'
'return to "False" and "True", but rather return the last '
'evaluated\n'
'argument. This is sometimes useful, e.g., if "s" is a string '
'that\n'
'should be replaced by a default value if it is empty, the '
'expression\n'
'"s or \'foo\'" yields the desired value. Because "not" has to '
'invent a\n'
'value anyway, it does not bother to return a value of the same '
'type as\n'
'its argument, so e.g., "not \'foo\'" yields "False", not '
'"\'\'".)\n',
'break': '\n'
'The "break" statement\n'
'*********************\n'
'\n'
' break_stmt ::= "break"\n'
'\n'
'"break" may only occur syntactically nested in a "for" or "while"\n'
'loop, but not nested in a function or class definition within that\n'
'loop.\n'
'\n'
'It terminates the nearest enclosing loop, skipping the optional '
'"else"\n'
'clause if the loop has one.\n'
'\n'
'If a "for" loop is terminated by "break", the loop control target\n'
'keeps its current value.\n'
'\n'
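'For example:\n'
'\n'
'   >>> for i in range(10):\n'
'   ...     if i == 3:\n'
'   ...         break\n'
'   ...\n'
'   >>> i\n'
'   3\n'
'\n'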
'When "break" passes control out of a "try" statement with a '
'"finally"\n'
'clause, that "finally" clause is executed before really leaving '
'the\n'
'loop.\n',
'callable-types': '\n'
'Emulating callable objects\n'
'**************************\n'
'\n'
'object.__call__(self[, args...])\n'
'\n'
' Called when the instance is "called" as a function; if '
'this method\n'
' is defined, "x(arg1, arg2, ...)" is a shorthand for\n'
' "x.__call__(arg1, arg2, ...)".\n',
'calls': '\n'
'Calls\n'
'*****\n'
'\n'
'A call calls a callable object (e.g., a *function*) with a '
'possibly\n'
'empty series of *arguments*:\n'
'\n'
' call ::= primary "(" [argument_list [","]\n'
' | expression genexpr_for] ")"\n'
' argument_list ::= positional_arguments ["," '
'keyword_arguments]\n'
' ["," "*" expression] ["," '
'keyword_arguments]\n'
' ["," "**" expression]\n'
' | keyword_arguments ["," "*" expression]\n'
' ["," "**" expression]\n'
' | "*" expression ["," keyword_arguments] ["," '
'"**" expression]\n'
' | "**" expression\n'
' positional_arguments ::= expression ("," expression)*\n'
' keyword_arguments ::= keyword_item ("," keyword_item)*\n'
' keyword_item ::= identifier "=" expression\n'
'\n'
'A trailing comma may be present after the positional and keyword\n'
'arguments but does not affect the semantics.\n'
'\n'
'The primary must evaluate to a callable object (user-defined\n'
'functions, built-in functions, methods of built-in objects, class\n'
'objects, methods of class instances, and certain class instances\n'
'themselves are callable; extensions may define additional callable\n'
'object types). All argument expressions are evaluated before the '
'call\n'
'is attempted. Please refer to section Function definitions for '
'the\n'
'syntax of formal *parameter* lists.\n'
'\n'
'If keyword arguments are present, they are first converted to\n'
'positional arguments, as follows. First, a list of unfilled slots '
'is\n'
'created for the formal parameters. If there are N positional\n'
'arguments, they are placed in the first N slots. Next, for each\n'
'keyword argument, the identifier is used to determine the\n'
'corresponding slot (if the identifier is the same as the first '
'formal\n'
'parameter name, the first slot is used, and so on). If the slot '
'is\n'
'already filled, a "TypeError" exception is raised. Otherwise, the\n'
'value of the argument is placed in the slot, filling it (even if '
'the\n'
'expression is "None", it fills the slot). When all arguments have\n'
'been processed, the slots that are still unfilled are filled with '
'the\n'
'corresponding default value from the function definition. '
'(Default\n'
'values are calculated, once, when the function is defined; thus, a\n'
'mutable object such as a list or dictionary used as default value '
'will\n'
"be shared by all calls that don't specify an argument value for "
'the\n'
'corresponding slot; this should usually be avoided.) If there are '
'any\n'
'unfilled slots for which no default value is specified, a '
'"TypeError"\n'
'exception is raised. Otherwise, the list of filled slots is used '
'as\n'
'the argument list for the call.\n'
'\n'
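'For example, the sharing of a mutable default value between calls\n'
'can be observed directly (the function name "f" is arbitrary):\n'
'\n'
'   >>> def f(x, acc=[]):\n'
'   ...     acc.append(x)\n'
'   ...     return acc\n'
'   ...\n'
'   >>> f(1)\n'
'   [1]\n'
'   >>> f(2)\n'
'   [1, 2]\n'
'\n'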
'**CPython implementation detail:** An implementation may provide\n'
'built-in functions whose positional parameters do not have names, '
'even\n'
"if they are 'named' for the purpose of documentation, and which\n"
'therefore cannot be supplied by keyword. In CPython, this is the '
'case\n'
'for functions implemented in C that use "PyArg_ParseTuple()" to '
'parse\n'
'their arguments.\n'
'\n'
'If there are more positional arguments than there are formal '
'parameter\n'
'slots, a "TypeError" exception is raised, unless a formal '
'parameter\n'
'using the syntax "*identifier" is present; in this case, that '
'formal\n'
'parameter receives a tuple containing the excess positional '
'arguments\n'
'(or an empty tuple if there were no excess positional arguments).\n'
'\n'
'If any keyword argument does not correspond to a formal parameter\n'
'name, a "TypeError" exception is raised, unless a formal parameter\n'
'using the syntax "**identifier" is present; in this case, that '
'formal\n'
'parameter receives a dictionary containing the excess keyword\n'
'arguments (using the keywords as keys and the argument values as\n'
'corresponding values), or a (new) empty dictionary if there were '
'no\n'
'excess keyword arguments.\n'
'\n'
'If the syntax "*expression" appears in the function call, '
'"expression"\n'
'must evaluate to an iterable. Elements from this iterable are '
'treated\n'
'as if they were additional positional arguments; if there are\n'
'positional arguments *x1*, ..., *xN*, and "expression" evaluates to '
'a\n'
'sequence *y1*, ..., *yM*, this is equivalent to a call with M+N\n'
'positional arguments *x1*, ..., *xN*, *y1*, ..., *yM*.\n'
'\n'
'A consequence of this is that although the "*expression" syntax '
'may\n'
'appear *after* some keyword arguments, it is processed *before* '
'the\n'
'keyword arguments (and the "**expression" argument, if any -- see\n'
'below). So:\n'
'\n'
' >>> def f(a, b):\n'
' ... print a, b\n'
' ...\n'
' >>> f(b=1, *(2,))\n'
' 2 1\n'
' >>> f(a=1, *(2,))\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" TypeError: f() got multiple values for keyword argument 'a'\n"
' >>> f(1, *(2,))\n'
' 1 2\n'
'\n'
'It is unusual for both keyword arguments and the "*expression" '
'syntax\n'
'to be used in the same call, so in practice this confusion does '
'not\n'
'arise.\n'
'\n'
'If the syntax "**expression" appears in the function call,\n'
'"expression" must evaluate to a mapping, the contents of which are\n'
'treated as additional keyword arguments. In the case of a keyword\n'
'appearing in both "expression" and as an explicit keyword argument, '
'a\n'
'"TypeError" exception is raised.\n'
'\n'
'Formal parameters using the syntax "*identifier" or "**identifier"\n'
'cannot be used as positional argument slots or as keyword argument\n'
'names. Formal parameters using the syntax "(sublist)" cannot be '
'used\n'
'as keyword argument names; the outermost sublist corresponds to a\n'
'single unnamed argument slot, and the argument value is assigned '
'to\n'
'the sublist using the usual tuple assignment rules after all other\n'
'parameter processing is done.\n'
'\n'
'A call always returns some value, possibly "None", unless it raises '
'an\n'
'exception. How this value is computed depends on the type of the\n'
'callable object.\n'
'\n'
'If it is---\n'
'\n'
'a user-defined function:\n'
' The code block for the function is executed, passing it the\n'
' argument list. The first thing the code block will do is bind '
'the\n'
' formal parameters to the arguments; this is described in '
'section\n'
' Function definitions. When the code block executes a "return"\n'
' statement, this specifies the return value of the function '
'call.\n'
'\n'
'a built-in function or method:\n'
' The result is up to the interpreter; see Built-in Functions for '
'the\n'
' descriptions of built-in functions and methods.\n'
'\n'
'a class object:\n'
' A new instance of that class is returned.\n'
'\n'
'a class instance method:\n'
' The corresponding user-defined function is called, with an '
'argument\n'
' list that is one longer than the argument list of the call: the\n'
' instance becomes the first argument.\n'
'\n'
'a class instance:\n'
' The class must define a "__call__()" method; the effect is then '
'the\n'
' same as if that method was called.\n',
'class': '\n'
'Class definitions\n'
'*****************\n'
'\n'
'A class definition defines a class object (see section The '
'standard\n'
'type hierarchy):\n'
'\n'
' classdef ::= "class" classname [inheritance] ":" suite\n'
' inheritance ::= "(" [expression_list] ")"\n'
' classname ::= identifier\n'
'\n'
'A class definition is an executable statement. It first evaluates '
'the\n'
'inheritance list, if present. Each item in the inheritance list\n'
'should evaluate to a class object or class type which allows\n'
"subclassing. The class's suite is then executed in a new "
'execution\n'
'frame (see section Naming and binding), using a newly created '
'local\n'
'namespace and the original global namespace. (Usually, the suite\n'
"contains only function definitions.) When the class's suite "
'finishes\n'
'execution, its execution frame is discarded but its local namespace '
'is\n'
'saved. [4] A class object is then created using the inheritance '
'list\n'
'for the base classes and the saved local namespace for the '
'attribute\n'
'dictionary. The class name is bound to this class object in the\n'
'original local namespace.\n'
'\n'
"**Programmer's note:** Variables defined in the class definition "
'are\n'
'class variables; they are shared by all instances. To create '
'instance\n'
'variables, they can be set in a method with "self.name = value". '
'Both\n'
'class and instance variables are accessible through the notation\n'
'""self.name"", and an instance variable hides a class variable '
'with\n'
'the same name when accessed in this way. Class variables can be '
'used\n'
'as defaults for instance variables, but using mutable values there '
'can\n'
'lead to unexpected results. For *new-style class*es, descriptors '
'can\n'
'be used to create instance variables with different implementation\n'
'details.\n'
'\n'
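'For example (the class name "C" is arbitrary), a mutable class\n'
'variable is shared by all instances until an instance variable\n'
'hides it:\n'
'\n'
'   >>> class C(object):\n'
'   ...     items = []\n'
'   ...\n'
'   >>> a, b = C(), C()\n'
'   >>> a.items.append(1)\n'
'   >>> b.items            # the class variable was mutated\n'
'   [1]\n'
'   >>> a.items = [2]      # instance variable now hides C.items\n'
'   >>> b.items\n'
'   [1]\n'
'\n'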
'Class definitions, like function definitions, may be wrapped by one '
'or\n'
'more *decorator* expressions. The evaluation rules for the '
'decorator\n'
'expressions are the same as for functions. The result must be a '
'class\n'
'object, which is then bound to the class name.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] The exception is propagated to the invocation stack unless\n'
' there is a "finally" clause which happens to raise another\n'
' exception. That new exception causes the old one to be lost.\n'
'\n'
'[2] Currently, control "flows off the end" except in the case of\n'
' an exception or the execution of a "return", "continue", or\n'
' "break" statement.\n'
'\n'
'[3] A string literal appearing as the first statement in the\n'
' function body is transformed into the function\'s "__doc__"\n'
" attribute and therefore the function's *docstring*.\n"
'\n'
'[4] A string literal appearing as the first statement in the class\n'
' body is transformed into the namespace\'s "__doc__" item and\n'
" therefore the class's *docstring*.\n",
'comparisons': '\n'
'Comparisons\n'
'***********\n'
'\n'
'Unlike C, all comparison operations in Python have the same '
'priority,\n'
'which is lower than that of any arithmetic, shifting or '
'bitwise\n'
'operation. Also unlike C, expressions like "a < b < c" have '
'the\n'
'interpretation that is conventional in mathematics:\n'
'\n'
' comparison ::= or_expr ( comp_operator or_expr )*\n'
' comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "<>" | '
'"!="\n'
' | "is" ["not"] | ["not"] "in"\n'
'\n'
'Comparisons yield boolean values: "True" or "False".\n'
'\n'
'Comparisons can be chained arbitrarily, e.g., "x < y <= z" '
'is\n'
'equivalent to "x < y and y <= z", except that "y" is '
'evaluated only\n'
'once (but in both cases "z" is not evaluated at all when "x < '
'y" is\n'
'found to be false).\n'
'\n'
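'For example:\n'
'\n'
'   >>> x = 2\n'
'   >>> 1 < x <= 3\n'
'   True\n'
'   >>> 1 < x > 0\n'
'   True\n'
'\n'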
'Formally, if *a*, *b*, *c*, ..., *y*, *z* are expressions and '
'*op1*,\n'
'*op2*, ..., *opN* are comparison operators, then "a op1 b op2 '
'c ... y\n'
'opN z" is equivalent to "a op1 b and b op2 c and ... y opN '
'z", except\n'
'that each expression is evaluated at most once.\n'
'\n'
'Note that "a op1 b op2 c" doesn\'t imply any kind of '
'comparison between\n'
'*a* and *c*, so that, e.g., "x < y > z" is perfectly legal '
'(though\n'
'perhaps not pretty).\n'
'\n'
'The forms "<>" and "!=" are equivalent; for consistency with '
'C, "!="\n'
'is preferred; where "!=" is mentioned below "<>" is also '
'accepted.\n'
'The "<>" spelling is considered obsolescent.\n'
'\n'
'\n'
'Value comparisons\n'
'=================\n'
'\n'
'The operators "<", ">", "==", ">=", "<=", and "!=" compare '
'the values\n'
'of two objects. The objects do not need to have the same '
'type.\n'
'\n'
'Chapter Objects, values and types states that objects have a '
'value (in\n'
'addition to type and identity). The value of an object is a '
'rather\n'
'abstract notion in Python: For example, there is no canonical '
'access\n'
"method for an object's value. Also, there is no requirement "
'that the\n'
'value of an object should be constructed in a particular way, '
'e.g.\n'
'comprised of all its data attributes. Comparison operators '
'implement a\n'
'particular notion of what the value of an object is. One can '
'think of\n'
'them as defining the value of an object indirectly, by means '
'of their\n'
'comparison implementation.\n'
'\n'
'Types can customize their comparison behavior by implementing '
'a\n'
'"__cmp__()" method or *rich comparison methods* like '
'"__lt__()",\n'
'described in Basic customization.\n'
'\n'
'The default behavior for equality comparison ("==" and "!=") '
'is based\n'
'on the identity of the objects. Hence, equality comparison '
'of\n'
'instances with the same identity results in equality, and '
'equality\n'
'comparison of instances with different identities results in\n'
'inequality. A motivation for this default behavior is the '
'desire that\n'
'all objects should be reflexive (i.e. "x is y" implies "x == '
'y").\n'
'\n'
'The default order comparison ("<", ">", "<=", and ">=") gives '
'a\n'
'consistent but arbitrary order.\n'
'\n'
'(This unusual definition of comparison was used to simplify '
'the\n'
'definition of operations like sorting and the "in" and "not '
'in"\n'
'operators. In the future, the comparison rules for objects '
'of\n'
'different types are likely to change.)\n'
'\n'
'The behavior of the default equality comparison, that '
'instances with\n'
'different identities are always unequal, may be in contrast '
'to what\n'
'types will need that have a sensible definition of object '
'value and\n'
'value-based equality. Such types will need to customize '
'their\n'
'comparison behavior, and in fact, a number of built-in types '
'have done\n'
'that.\n'
'\n'
'The following list describes the comparison behavior of the '
'most\n'
'important built-in types.\n'
'\n'
'* Numbers of built-in numeric types (Numeric Types --- int, '
'float,\n'
' long, complex) and of the standard library types\n'
' "fractions.Fraction" and "decimal.Decimal" can be compared '
'within\n'
' and across their types, with the restriction that complex '
'numbers do\n'
' not support order comparison. Within the limits of the '
'types\n'
' involved, they compare mathematically (algorithmically) '
'correctly\n'
' without loss of precision.\n'
'\n'
'* Strings (instances of "str" or "unicode") compare\n'
' lexicographically using the numeric equivalents (the result '
'of the\n'
' built-in function "ord()") of their characters. [4] When '
'comparing\n'
' an 8-bit string and a Unicode string, the 8-bit string is '
'converted\n'
' to Unicode. If the conversion fails, the strings are '
'considered\n'
' unequal.\n'
'\n'
'* Instances of "tuple" or "list" can be compared only within '
'each of\n'
' their types. Equality comparison across these types '
'results in\n'
' inequality, and ordering comparison across these types '
'gives an\n'
' arbitrary order.\n'
'\n'
' These sequences compare lexicographically using comparison '
'of\n'
' corresponding elements, whereby reflexivity of the elements '
'is\n'
' enforced.\n'
'\n'
' In enforcing reflexivity of elements, the comparison of '
'collections\n'
' assumes that for a collection element "x", "x == x" is '
'always true.\n'
' Based on that assumption, element identity is compared '
'first, and\n'
' element comparison is performed only for distinct '
'elements. This\n'
' approach yields the same result as a strict element '
'comparison\n'
' would, if the compared elements are reflexive. For '
'non-reflexive\n'
' elements, the result differs from that of strict element\n'
' comparison.\n'
'\n'
' Lexicographical comparison between built-in collections '
'works as\n'
' follows:\n'
'\n'
' * For two collections to compare equal, they must be of the '
'same\n'
' type, have the same length, and each pair of '
'corresponding\n'
' elements must compare equal (for example, "[1,2] == '
'(1,2)" is\n'
' false because the type is not the same).\n'
'\n'
' * Collections are ordered the same as their first unequal '
'elements\n'
' (for example, "cmp([1,2,x], [1,2,y])" returns the same '
'as\n'
' "cmp(x,y)"). If a corresponding element does not exist, '
'the\n'
' shorter collection is ordered first (for example, "[1,2] '
'<\n'
' [1,2,3]" is true).\n'
'\n'
'* Mappings (instances of "dict") compare equal if and only if '
'they\n'
' have equal *(key, value)* pairs. Equality comparison of the '
'keys and\n'
' values enforces reflexivity.\n'
'\n'
' Outcomes other than equality are resolved consistently, but '
'are not\n'
' otherwise defined. [5]\n'
'\n'
'* Most other objects of built-in types compare unequal unless '
'they\n'
' are the same object; the choice whether one object is '
'considered\n'
' smaller or larger than another one is made arbitrarily but\n'
' consistently within one execution of a program.\n'
'\n'
'User-defined classes that customize their comparison behavior '
'should\n'
'follow some consistency rules, if possible:\n'
'\n'
'* Equality comparison should be reflexive. In other words, '
'identical\n'
' objects should compare equal:\n'
'\n'
' "x is y" implies "x == y"\n'
'\n'
'* Comparison should be symmetric. In other words, the '
'following\n'
' expressions should have the same result:\n'
'\n'
' "x == y" and "y == x"\n'
'\n'
' "x != y" and "y != x"\n'
'\n'
' "x < y" and "y > x"\n'
'\n'
' "x <= y" and "y >= x"\n'
'\n'
'* Comparison should be transitive. The following '
'(non-exhaustive)\n'
' examples illustrate that:\n'
'\n'
' "x > y and y > z" implies "x > z"\n'
'\n'
' "x < y and y <= z" implies "x < z"\n'
'\n'
'* Inverse comparison should result in the boolean negation. '
'In other\n'
' words, the following expressions should have the same '
'result:\n'
'\n'
' "x == y" and "not x != y"\n'
'\n'
' "x < y" and "not x >= y" (for total ordering)\n'
'\n'
' "x > y" and "not x <= y" (for total ordering)\n'
'\n'
' The last two expressions apply to totally ordered '
'collections (e.g.\n'
' to sequences, but not to sets or mappings). See also the\n'
' "total_ordering()" decorator.\n'
'\n'
'* The "hash()" result should be consistent with equality. '
'Objects\n'
' that are equal should either have the same hash value, or '
'be marked\n'
' as unhashable.\n'
'\n'
'Python does not enforce these consistency rules.\n'
'\n'
'\n'
'Membership test operations\n'
'==========================\n'
'\n'
'The operators "in" and "not in" test for membership. "x in '
's"\n'
'evaluates to "True" if *x* is a member of *s*, and "False" '
'otherwise.\n'
'"x not in s" returns the negation of "x in s". All built-in '
'sequences\n'
'and set types support this as well as dictionaries, for which '
'"in" tests\n'
'whether the dictionary has a given key. For container types '
'such as\n'
'list, tuple, set, frozenset, dict, or collections.deque, the\n'
'expression "x in y" is equivalent to "any(x is e or x == e '
'for e in\n'
'y)".\n'
'\n'
'For the string and bytes types, "x in y" is "True" if and '
'only if *x*\n'
'is a substring of *y*. An equivalent test is "y.find(x) != '
'-1".\n'
'Empty strings are always considered to be a substring of any '
'other\n'
'string, so """ in "abc"" will return "True".\n'
'\n'
'For user-defined classes which define the "__contains__()" '
'method, "x\n'
'in y" returns "True" if "y.__contains__(x)" returns a true '
'value, and\n'
'"False" otherwise.\n'
'\n'
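'A minimal sketch (the class name "EvenNumbers" is arbitrary):\n'
'\n'
'   >>> class EvenNumbers(object):\n'
'   ...     def __contains__(self, x):\n'
'   ...         return x % 2 == 0\n'
'   ...\n'
'   >>> 4 in EvenNumbers()\n'
'   True\n'
'   >>> 3 in EvenNumbers()\n'
'   False\n'
'\n'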
'For user-defined classes which do not define "__contains__()" '
'but do\n'
'define "__iter__()", "x in y" is "True" if some value "z" '
'with "x ==\n'
'z" is produced while iterating over "y". If an exception is '
'raised\n'
'during the iteration, it is as if "in" raised that '
'exception.\n'
'\n'
'Lastly, the old-style iteration protocol is tried: if a class '
'defines\n'
'"__getitem__()", "x in y" is "True" if and only if there is a '
'non-\n'
'negative integer index *i* such that "x == y[i]", and all '
'lower\n'
'integer indices do not raise "IndexError" exception. (If any '
'other\n'
'exception is raised, it is as if "in" raised that '
'exception).\n'
'\n'
'The operator "not in" is defined to have the inverse true '
'value of\n'
'"in".\n'
'\n'
'\n'
'Identity comparisons\n'
'====================\n'
'\n'
'The operators "is" and "is not" test for object identity: "x '
'is y" is\n'
'true if and only if *x* and *y* are the same object. "x is '
'not y"\n'
'yields the inverse truth value. [6]\n',
'compound': '\n'
'Compound statements\n'
'*******************\n'
'\n'
'Compound statements contain (groups of) other statements; they '
'affect\n'
'or control the execution of those other statements in some way. '
'In\n'
'general, compound statements span multiple lines, although in '
'simple\n'
'incarnations a whole compound statement may be contained in one '
'line.\n'
'\n'
'The "if", "while" and "for" statements implement traditional '
'control\n'
'flow constructs. "try" specifies exception handlers and/or '
'cleanup\n'
'code for a group of statements. Function and class definitions '
'are\n'
'also syntactically compound statements.\n'
'\n'
"Compound statements consist of one or more 'clauses.' A clause\n"
"consists of a header and a 'suite.' The clause headers of a\n"
'particular compound statement are all at the same indentation '
'level.\n'
'Each clause header begins with a uniquely identifying keyword '
'and ends\n'
'with a colon. A suite is a group of statements controlled by a\n'
'clause. A suite can be one or more semicolon-separated simple\n'
'statements on the same line as the header, following the '
"header's\n"
'colon, or it can be one or more indented statements on '
'subsequent\n'
'lines. Only the latter form of suite can contain nested '
'compound\n'
"statements; the following is illegal, mostly because it wouldn't "
'be\n'
'clear to which "if" clause a following "else" clause would '
'belong:\n'
'\n'
' if test1: if test2: print x\n'
'\n'
'Also note that the semicolon binds tighter than the colon in '
'this\n'
'context, so that in the following example, either all or none of '
'the\n'
'"print" statements are executed:\n'
'\n'
' if x < y < z: print x; print y; print z\n'
'\n'
'Summarizing:\n'
'\n'
' compound_stmt ::= if_stmt\n'
' | while_stmt\n'
' | for_stmt\n'
' | try_stmt\n'
' | with_stmt\n'
' | funcdef\n'
' | classdef\n'
' | decorated\n'
' suite ::= stmt_list NEWLINE | NEWLINE INDENT '
'statement+ DEDENT\n'
' statement ::= stmt_list NEWLINE | compound_stmt\n'
' stmt_list ::= simple_stmt (";" simple_stmt)* [";"]\n'
'\n'
'Note that statements always end in a "NEWLINE" possibly followed '
'by a\n'
'"DEDENT". Also note that optional continuation clauses always '
'begin\n'
'with a keyword that cannot start a statement, thus there are no\n'
'ambiguities (the \'dangling "else"\' problem is solved in Python '
'by\n'
'requiring nested "if" statements to be indented).\n'
'\n'
'The formatting of the grammar rules in the following sections '
'places\n'
'each clause on a separate line for clarity.\n'
'\n'
'\n'
'The "if" statement\n'
'==================\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
'It selects exactly one of the suites by evaluating the '
'expressions one\n'
'by one until one is found to be true (see section Boolean '
'operations\n'
'for the definition of true and false); then that suite is '
'executed\n'
'(and no other part of the "if" statement is executed or '
'evaluated).\n'
'If all expressions are false, the suite of the "else" clause, '
'if\n'
'present, is executed.\n'
'\n'
'\n'
'The "while" statement\n'
'=====================\n'
'\n'
'The "while" statement is used for repeated execution as long as '
'an\n'
'expression is true:\n'
'\n'
' while_stmt ::= "while" expression ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'This repeatedly tests the expression and, if it is true, '
'executes the\n'
'first suite; if the expression is false (which may be the first '
'time\n'
'it is tested) the suite of the "else" clause, if present, is '
'executed\n'
'and the loop terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and goes '
'back\n'
'to testing the expression.\n'
'\n'
'\n'
'The "for" statement\n'
'===================\n'
'\n'
'The "for" statement is used to iterate over the elements of a '
'sequence\n'
'(such as a string, tuple or list) or other iterable object:\n'
'\n'
' for_stmt ::= "for" target_list "in" expression_list ":" '
'suite\n'
' ["else" ":" suite]\n'
'\n'
'The expression list is evaluated once; it should yield an '
'iterable\n'
'object. An iterator is created for the result of the\n'
'"expression_list". The suite is then executed once for each '
'item\n'
'provided by the iterator, in the order of ascending indices. '
'Each\n'
'item in turn is assigned to the target list using the standard '
'rules\n'
'for assignments, and then the suite is executed. When the items '
'are\n'
'exhausted (which is immediately when the sequence is empty), the '
'suite\n'
'in the "else" clause, if present, is executed, and the loop\n'
'terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and '
'continues\n'
'with the next item, or with the "else" clause if there was no '
'next\n'
'item.\n'
'\n'
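'For example, the "else" clause runs only when the loop is not\n'
'terminated by "break":\n'
'\n'
'   >>> for n in [2, 3, 5]:\n'
'   ...     if n % 4 == 0:\n'
'   ...         break\n'
'   ... else:\n'
'   ...     print "no multiple of 4 found"\n'
'   ...\n'
'   no multiple of 4 found\n'
'\n'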
'The suite may assign to the variable(s) in the target list; this '
'does\n'
'not affect the next item assigned to it.\n'
'\n'
'The target list is not deleted when the loop is finished, but if '
'the\n'
'sequence is empty, it will not have been assigned to at all by '
'the\n'
'loop. Hint: the built-in function "range()" returns a sequence '
'of\n'
'integers suitable to emulate the effect of Pascal\'s "for i := a '
'to b\n'
'do"; e.g., "range(3)" returns the list "[0, 1, 2]".\n'
'\n'
'Note: There is a subtlety when the sequence is being modified by '
'the\n'
' loop (this can only occur for mutable sequences, i.e. lists). '
'An\n'
' internal counter is used to keep track of which item is used '
'next,\n'
' and this is incremented on each iteration. When this counter '
'has\n'
' reached the length of the sequence the loop terminates. This '
'means\n'
' that if the suite deletes the current (or a previous) item '
'from the\n'
' sequence, the next item will be skipped (since it gets the '
'index of\n'
' the current item which has already been treated). Likewise, '
'if the\n'
' suite inserts an item in the sequence before the current item, '
'the\n'
' current item will be treated again the next time through the '
'loop.\n'
' This can lead to nasty bugs that can be avoided by making a\n'
' temporary copy using a slice of the whole sequence, e.g.,\n'
'\n'
' for x in a[:]:\n'
' if x < 0: a.remove(x)\n'
'\n'
'\n'
'The "try" statement\n'
'===================\n'
'\n'
'The "try" statement specifies exception handlers and/or cleanup '
'code\n'
'for a group of statements:\n'
'\n'
' try_stmt ::= try1_stmt | try2_stmt\n'
' try1_stmt ::= "try" ":" suite\n'
' ("except" [expression [("as" | ",") '
'identifier]] ":" suite)+\n'
' ["else" ":" suite]\n'
' ["finally" ":" suite]\n'
' try2_stmt ::= "try" ":" suite\n'
' "finally" ":" suite\n'
'\n'
'Changed in version 2.5: In previous versions of Python,\n'
'"try"..."except"..."finally" did not work. "try"..."except" had '
'to be\n'
'nested in "try"..."finally".\n'
'\n'
'The "except" clause(s) specify one or more exception handlers. '
'When no\n'
'exception occurs in the "try" clause, no exception handler is\n'
'executed. When an exception occurs in the "try" suite, a search '
'for an\n'
'exception handler is started. This search inspects the except '
'clauses\n'
'in turn until one is found that matches the exception. An '
'expression-\n'
'less except clause, if present, must be last; it matches any\n'
'exception. For an except clause with an expression, that '
'expression\n'
'is evaluated, and the clause matches the exception if the '
'resulting\n'
'object is "compatible" with the exception. An object is '
'compatible\n'
'with an exception if it is the class or a base class of the '
'exception\n'
'object, or a tuple containing an item compatible with the '
'exception.\n'
'\n'
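'For example, because "KeyError" is a subclass of "LookupError", a\n'
'handler naming the base class also catches it (an illustrative\n'
'snippet):\n'
'\n'
'   >>> try:\n'
'   ...     {}["missing"]\n'
'   ... except LookupError:\n'
'   ...     print "lookup failed"\n'
'   ...\n'
'   lookup failed\n'
'\n'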
'If no except clause matches the exception, the search for an '
'exception\n'
'handler continues in the surrounding code and on the invocation '
'stack.\n'
'[1]\n'
'\n'
'If the evaluation of an expression in the header of an except '
'clause\n'
'raises an exception, the original search for a handler is '
'canceled and\n'
'a search starts for the new exception in the surrounding code '
'and on\n'
'the call stack (it is treated as if the entire "try" statement '
'raised\n'
'the exception).\n'
'\n'
'When a matching except clause is found, the exception is '
'assigned to\n'
'the target specified in that except clause, if present, and the '
'except\n'
"clause's suite is executed. All except clauses must have an\n"
'executable block. When the end of this block is reached, '
'execution\n'
'continues normally after the entire try statement. (This means '
'that\n'
'if two nested handlers exist for the same exception, and the '
'exception\n'
'occurs in the try clause of the inner handler, the outer handler '
'will\n'
'not handle the exception.)\n'
'\n'
"Before an except clause's suite is executed, details about the\n"
'exception are assigned to three variables in the "sys" module:\n'
'"sys.exc_type" receives the object identifying the exception;\n'
'"sys.exc_value" receives the exception\'s parameter;\n'
'"sys.exc_traceback" receives a traceback object (see section '
'The\n'
'standard type hierarchy) identifying the point in the program '
'where\n'
'the exception occurred. These details are also available through '
'the\n'
'"sys.exc_info()" function, which returns a tuple "(exc_type,\n'
'exc_value, exc_traceback)". Use of the corresponding variables '
'is\n'
'deprecated in favor of this function, since their use is unsafe '
'in a\n'
'threaded program. As of Python 1.5, the variables are restored '
'to\n'
'their previous values (before the call) when returning from a '
'function\n'
'that handled an exception.\n'
'\n'
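'For example, an except suite can capture these details through the\n'
'function rather than the deprecated variables (a minimal sketch):\n'
'\n'
'   import sys\n'
'\n'
'   try:\n'
'       1 / 0\n'
'   except ZeroDivisionError:\n'
'       exc_type, exc_value, exc_tb = sys.exc_info()\n'
'       print exc_type.__name__, "-", exc_value\n'
'\n'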
'The optional "else" clause is executed if and when control flows '
'off\n'
'the end of the "try" clause. [2] Exceptions in the "else" clause '
'are\n'
'not handled by the preceding "except" clauses.\n'
'\n'
'If "finally" is present, it specifies a \'cleanup\' handler. '
'The "try"\n'
'clause is executed, including any "except" and "else" clauses. '
'If an\n'
'exception occurs in any of the clauses and is not handled, the\n'
'exception is temporarily saved. The "finally" clause is '
'executed. If\n'
'there is a saved exception, it is re-raised at the end of the\n'
'"finally" clause. If the "finally" clause raises another '
'exception or\n'
'executes a "return" or "break" statement, the saved exception '
'is\n'
'discarded:\n'
'\n'
' >>> def f():\n'
' ... try:\n'
' ... 1/0\n'
' ... finally:\n'
' ... return 42\n'
' ...\n'
' >>> f()\n'
' 42\n'
'\n'
'The exception information is not available to the program '
'during\n'
'execution of the "finally" clause.\n'
'\n'
'When a "return", "break" or "continue" statement is executed in '
'the\n'
'"try" suite of a "try"..."finally" statement, the "finally" '
'clause is\n'
'also executed \'on the way out.\' A "continue" statement is '
'illegal in\n'
'the "finally" clause. (The reason is a problem with the current\n'
'implementation --- this restriction may be lifted in the '
'future).\n'
'\n'
'The return value of a function is determined by the last '
'"return"\n'
'statement executed. Since the "finally" clause always executes, '
'a\n'
'"return" statement executed in the "finally" clause will always '
'be the\n'
'last one executed:\n'
'\n'
' >>> def foo():\n'
' ... try:\n'
" ... return 'try'\n"
' ... finally:\n'
" ... return 'finally'\n"
' ...\n'
' >>> foo()\n'
" 'finally'\n"
'\n'
'Additional information on exceptions can be found in section\n'
'Exceptions, and information on using the "raise" statement to '
'generate\n'
'exceptions may be found in section The raise statement.\n'
'\n'
'\n'
'The "with" statement\n'
'====================\n'
'\n'
'New in version 2.5.\n'
'\n'
'The "with" statement is used to wrap the execution of a block '
'with\n'
'methods defined by a context manager (see section With '
'Statement\n'
'Context Managers). This allows common '
'"try"..."except"..."finally"\n'
'usage patterns to be encapsulated for convenient reuse.\n'
'\n'
' with_stmt ::= "with" with_item ("," with_item)* ":" suite\n'
' with_item ::= expression ["as" target]\n'
'\n'
'The execution of the "with" statement with one "item" proceeds '
'as\n'
'follows:\n'
'\n'
'1. The context expression (the expression given in the '
'"with_item")\n'
' is evaluated to obtain a context manager.\n'
'\n'
'2. The context manager\'s "__exit__()" is loaded for later use.\n'
'\n'
'3. The context manager\'s "__enter__()" method is invoked.\n'
'\n'
'4. If a target was included in the "with" statement, the return\n'
' value from "__enter__()" is assigned to it.\n'
'\n'
' Note: The "with" statement guarantees that if the '
'"__enter__()"\n'
' method returns without an error, then "__exit__()" will '
'always be\n'
' called. Thus, if an error occurs during the assignment to '
'the\n'
' target list, it will be treated the same as an error '
'occurring\n'
' within the suite would be. See step 6 below.\n'
'\n'
'5. The suite is executed.\n'
'\n'
'6. The context manager\'s "__exit__()" method is invoked. If an\n'
' exception caused the suite to be exited, its type, value, '
'and\n'
' traceback are passed as arguments to "__exit__()". Otherwise, '
'three\n'
' "None" arguments are supplied.\n'
'\n'
' If the suite was exited due to an exception, and the return '
'value\n'
' from the "__exit__()" method was false, the exception is '
'reraised.\n'
' If the return value was true, the exception is suppressed, '
'and\n'
' execution continues with the statement following the "with"\n'
' statement.\n'
'\n'
' If the suite was exited for any reason other than an '
'exception, the\n'
' return value from "__exit__()" is ignored, and execution '
'proceeds\n'
' at the normal location for the kind of exit that was taken.\n'
'\n'
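'For example, a context manager whose "__exit__()" returns a true\n'
'value suppresses an exception raised in the suite (an illustrative\n'
'sketch; "Suppress" is not a standard class):\n'
'\n'
'   class Suppress(object):\n'
'       def __enter__(self):\n'
'           return self\n'
'       def __exit__(self, exc_type, exc_value, traceback):\n'
'           # A true return value suppresses the exception.\n'
'           return exc_type is ValueError\n'
'\n'
'   with Suppress():\n'
'       raise ValueError("ignored")\n'
'   print "execution continues here"\n'
'\n'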
'With more than one item, the context managers are processed as '
'if\n'
'multiple "with" statements were nested:\n'
'\n'
' with A() as a, B() as b:\n'
' suite\n'
'\n'
'is equivalent to\n'
'\n'
' with A() as a:\n'
' with B() as b:\n'
' suite\n'
'\n'
'Note: In Python 2.5, the "with" statement is only allowed when '
'the\n'
' "with_statement" feature has been enabled. It is always '
'enabled in\n'
' Python 2.6.\n'
'\n'
'Changed in version 2.7: Support for multiple context '
'expressions.\n'
'\n'
'See also:\n'
'\n'
' **PEP 343** - The "with" statement\n'
' The specification, background, and examples for the Python '
'"with"\n'
' statement.\n'
'\n'
'\n'
'Function definitions\n'
'====================\n'
'\n'
'A function definition defines a user-defined function object '
'(see\n'
'section The standard type hierarchy):\n'
'\n'
' decorated ::= decorators (classdef | funcdef)\n'
' decorators ::= decorator+\n'
' decorator ::= "@" dotted_name ["(" [argument_list [","]] '
'")"] NEWLINE\n'
' funcdef ::= "def" funcname "(" [parameter_list] ")" '
'":" suite\n'
' dotted_name ::= identifier ("." identifier)*\n'
' parameter_list ::= (defparameter ",")*\n'
' ( "*" identifier ["," "**" identifier]\n'
' | "**" identifier\n'
' | defparameter [","] )\n'
' defparameter ::= parameter ["=" expression]\n'
' sublist ::= parameter ("," parameter)* [","]\n'
' parameter ::= identifier | "(" sublist ")"\n'
' funcname ::= identifier\n'
'\n'
'A function definition is an executable statement. Its execution '
'binds\n'
'the function name in the current local namespace to a function '
'object\n'
'(a wrapper around the executable code for the function). This\n'
'function object contains a reference to the current global '
'namespace\n'
'as the global namespace to be used when the function is called.\n'
'\n'
'The function definition does not execute the function body; this '
'gets\n'
'executed only when the function is called. [3]\n'
'\n'
'A function definition may be wrapped by one or more *decorator*\n'
'expressions. Decorator expressions are evaluated when the '
'function is\n'
'defined, in the scope that contains the function definition. '
'The\n'
'result must be a callable, which is invoked with the function '
'object\n'
'as the only argument. The returned value is bound to the '
'function name\n'
'instead of the function object. Multiple decorators are applied '
'in\n'
'nested fashion. For example, the following code:\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' def func(): pass\n'
'\n'
'is equivalent to:\n'
'\n'
' def func(): pass\n'
' func = f1(arg)(f2(func))\n'
'\n'
'When one or more top-level *parameters* have the form '
'*parameter* "="\n'
'*expression*, the function is said to have "default parameter '
'values."\n'
'For a parameter with a default value, the corresponding '
'*argument* may\n'
"be omitted from a call, in which case the parameter's default "
'value is\n'
'substituted. If a parameter has a default value, all following\n'
'parameters must also have a default value --- this is a '
'syntactic\n'
'restriction that is not expressed by the grammar.\n'
'\n'
'**Default parameter values are evaluated when the function '
'definition\n'
'is executed.** This means that the expression is evaluated '
'once, when\n'
'the function is defined, and that the same "pre-computed" value '
'is\n'
'used for each call. This is especially important to understand '
'when a\n'
'default parameter is a mutable object, such as a list or a '
'dictionary:\n'
'if the function modifies the object (e.g. by appending an item '
'to a\n'
'list), the default value is in effect modified. This is '
'generally not\n'
'what was intended. A way around this is to use "None" as the\n'
'default, and explicitly test for it in the body of the function, '
'e.g.:\n'
'\n'
' def whats_on_the_telly(penguin=None):\n'
' if penguin is None:\n'
' penguin = []\n'
' penguin.append("property of the zoo")\n'
' return penguin\n'
'\n'
'Function call semantics are described in more detail in section '
'Calls.\n'
'A function call always assigns values to all parameters '
'mentioned in\n'
'the parameter list, either from positional arguments, from '
'keyword\n'
'arguments, or from default values. If the form ""*identifier"" '
'is\n'
'present, it is initialized to a tuple receiving any excess '
'positional\n'
'parameters, defaulting to the empty tuple. If the form\n'
'""**identifier"" is present, it is initialized to a new '
'dictionary\n'
'receiving any excess keyword arguments, defaulting to a new '
'empty\n'
'dictionary.\n'
'\n'
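'For example (an illustrative snippet):\n'
'\n'
'   def report(first, *rest, **options):\n'
'       print first, rest, options\n'
'\n'
'   report(1, 2, 3, mode="fast")\n'
'   # prints: 1 (2, 3) {\'mode\': \'fast\'}\n'
'\n'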
'It is also possible to create anonymous functions (functions not '
'bound\n'
'to a name), for immediate use in expressions. This uses lambda\n'
'expressions, described in section Lambdas. Note that the '
'lambda\n'
'expression is merely a shorthand for a simplified function '
'definition;\n'
'a function defined in a ""def"" statement can be passed around '
'or\n'
'assigned to another name just like a function defined by a '
'lambda\n'
'expression. The ""def"" form is actually more powerful since '
'it\n'
'allows the execution of multiple statements.\n'
'\n'
"**Programmer's note:** Functions are first-class objects. A "
'""def""\n'
'form executed inside a function definition defines a local '
'function\n'
'that can be returned or passed around. Free variables used in '
'the\n'
'nested function can access the local variables of the function\n'
'containing the def. See section Naming and binding for '
'details.\n'
'\n'
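'For example, a nested function can read names bound in the\n'
'enclosing function (an illustrative sketch):\n'
'\n'
'   def make_adder(n):\n'
'       def add(x):\n'
'           return x + n    # "n" is a free variable here\n'
'       return add\n'
'\n'
'   add_three = make_adder(3)\n'
'   print add_three(4)      # prints 7\n'
'\n'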
'\n'
'Class definitions\n'
'=================\n'
'\n'
'A class definition defines a class object (see section The '
'standard\n'
'type hierarchy):\n'
'\n'
' classdef ::= "class" classname [inheritance] ":" suite\n'
' inheritance ::= "(" [expression_list] ")"\n'
' classname ::= identifier\n'
'\n'
'A class definition is an executable statement. It first '
'evaluates the\n'
'inheritance list, if present. Each item in the inheritance '
'list\n'
'should evaluate to a class object or class type which allows\n'
"subclassing. The class's suite is then executed in a new "
'execution\n'
'frame (see section Naming and binding), using a newly created '
'local\n'
'namespace and the original global namespace. (Usually, the '
'suite\n'
"contains only function definitions.) When the class's suite "
'finishes\n'
'execution, its execution frame is discarded but its local '
'namespace is\n'
'saved. [4] A class object is then created using the inheritance '
'list\n'
'for the base classes and the saved local namespace for the '
'attribute\n'
'dictionary. The class name is bound to this class object in '
'the\n'
'original local namespace.\n'
'\n'
"**Programmer's note:** Variables defined in the class definition "
'are\n'
'class variables; they are shared by all instances. To create '
'instance\n'
'variables, they can be set in a method with "self.name = '
'value". Both\n'
'class and instance variables are accessible through the '
'notation\n'
'""self.name"", and an instance variable hides a class variable '
'with\n'
'the same name when accessed in this way. Class variables can be '
'used\n'
'as defaults for instance variables, but using mutable values '
'there can\n'
'lead to unexpected results. For *new-style class*es, '
'descriptors can\n'
'be used to create instance variables with different '
'implementation\n'
'details.\n'
'\n'
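'For example, a mutable class variable is shared by all instances\n'
'(an illustrative snippet):\n'
'\n'
'   class Dog(object):\n'
'       tricks = []                 # class variable, shared\n'
'\n'
'       def __init__(self, name):\n'
'           self.name = name        # instance variable\n'
'\n'
'   fido = Dog("Fido")\n'
'   rex = Dog("Rex")\n'
'   fido.tricks.append("roll over")\n'
'   print rex.tricks    # prints [\'roll over\']: the list is shared\n'
'\n'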
'Class definitions, like function definitions, may be wrapped by '
'one or\n'
'more *decorator* expressions. The evaluation rules for the '
'decorator\n'
'expressions are the same as for functions. The result must be a '
'class\n'
'object, which is then bound to the class name.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] The exception is propagated to the invocation stack unless\n'
' there is a "finally" clause which happens to raise another\n'
' exception. That new exception causes the old one to be '
'lost.\n'
'\n'
'[2] Currently, control "flows off the end" except in the case '
'of\n'
' an exception or the execution of a "return", "continue", or\n'
' "break" statement.\n'
'\n'
'[3] A string literal appearing as the first statement in the\n'
' function body is transformed into the function\'s "__doc__"\n'
" attribute and therefore the function's *docstring*.\n"
'\n'
'[4] A string literal appearing as the first statement in the '
'class\n'
' body is transformed into the namespace\'s "__doc__" item '
'and\n'
" therefore the class's *docstring*.\n",
'context-managers': '\n'
'With Statement Context Managers\n'
'*******************************\n'
'\n'
'New in version 2.5.\n'
'\n'
'A *context manager* is an object that defines the '
'runtime context to\n'
'be established when executing a "with" statement. The '
'context manager\n'
'handles the entry into, and the exit from, the desired '
'runtime context\n'
'for the execution of the block of code. Context '
'managers are normally\n'
'invoked using the "with" statement (described in section '
'The with\n'
'statement), but can also be used by directly invoking '
'their methods.\n'
'\n'
'Typical uses of context managers include saving and '
'restoring various\n'
'kinds of global state, locking and unlocking resources, '
'closing opened\n'
'files, etc.\n'
'\n'
'For more information on context managers, see Context '
'Manager Types.\n'
'\n'
'object.__enter__(self)\n'
'\n'
' Enter the runtime context related to this object. The '
'"with"\n'
" statement will bind this method's return value to the "
'target(s)\n'
' specified in the "as" clause of the statement, if '
'any.\n'
'\n'
'object.__exit__(self, exc_type, exc_value, traceback)\n'
'\n'
' Exit the runtime context related to this object. The '
'parameters\n'
' describe the exception that caused the context to be '
'exited. If the\n'
' context was exited without an exception, all three '
'arguments will\n'
' be "None".\n'
'\n'
' If an exception is supplied, and the method wishes to '
'suppress the\n'
' exception (i.e., prevent it from being propagated), '
'it should\n'
' return a true value. Otherwise, the exception will be '
'processed\n'
' normally upon exit from this method.\n'
'\n'
' Note that "__exit__()" methods should not reraise the '
'passed-in\n'
" exception; this is the caller's responsibility.\n"
'\n'
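'For example, built-in file objects are context managers, and their\n'
'methods can also be invoked directly (an illustrative snippet;\n'
'"data.txt" names a hypothetical file):\n'
'\n'
'   f = open("data.txt").__enter__()\n'
'   try:\n'
'       data = f.read()\n'
'   finally:\n'
'       f.__exit__(None, None, None)    # closes the file\n'
'\n'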
'See also:\n'
'\n'
' **PEP 343** - The "with" statement\n'
' The specification, background, and examples for the '
'Python "with"\n'
' statement.\n',
'continue': '\n'
'The "continue" statement\n'
'************************\n'
'\n'
' continue_stmt ::= "continue"\n'
'\n'
'"continue" may only occur syntactically nested in a "for" or '
'"while"\n'
'loop, but not nested in a function or class definition or '
'"finally"\n'
'clause within that loop. It continues with the next cycle of '
'the\n'
'nearest enclosing loop.\n'
'\n'
'When "continue" passes control out of a "try" statement with a\n'
'"finally" clause, that "finally" clause is executed before '
'really\n'
'starting the next loop cycle.\n',
'conversions': '\n'
'Arithmetic conversions\n'
'**********************\n'
'\n'
'When a description of an arithmetic operator below uses the '
'phrase\n'
'"the numeric arguments are converted to a common type," the '
'arguments\n'
'are coerced using the coercion rules listed at Coercion '
'rules. If\n'
'both arguments are standard numeric types, the following '
'coercions are\n'
'applied:\n'
'\n'
'* If either argument is a complex number, the other is '
'converted to\n'
' complex;\n'
'\n'
'* otherwise, if either argument is a floating point number, '
'the\n'
' other is converted to floating point;\n'
'\n'
'* otherwise, if either argument is a long integer, the other '
'is\n'
' converted to long integer;\n'
'\n'
'* otherwise, both must be plain integers and no conversion '
'is\n'
' necessary.\n'
'\n'
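'For example (an illustrative snippet):\n'
'\n'
'   print 1 + 2.0         # the int is converted to float: 3.0\n'
'   print 10L + 5         # the int is converted to long: 15\n'
'   print (1+2j) + 3.0    # the float is converted to complex: (4+2j)\n'
'\n'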
'Some additional rules apply for certain operators (e.g., a '
'string left\n'
"argument to the '%' operator). Extensions can define their "
'own\n'
'coercions.\n',
'customization': '\n'
'Basic customization\n'
'*******************\n'
'\n'
'object.__new__(cls[, ...])\n'
'\n'
' Called to create a new instance of class *cls*. '
'"__new__()" is a\n'
' static method (special-cased so you need not declare it '
'as such)\n'
' that takes the class of which an instance was requested '
'as its\n'
' first argument. The remaining arguments are those '
'passed to the\n'
' object constructor expression (the call to the class). '
'The return\n'
' value of "__new__()" should be the new object instance '
'(usually an\n'
' instance of *cls*).\n'
'\n'
' Typical implementations create a new instance of the '
'class by\n'
' invoking the superclass\'s "__new__()" method using\n'
' "super(currentclass, cls).__new__(cls[, ...])" with '
'appropriate\n'
' arguments and then modifying the newly-created instance '
'as\n'
' necessary before returning it.\n'
'\n'
' If "__new__()" returns an instance of *cls*, then the '
'new\n'
' instance\'s "__init__()" method will be invoked like\n'
' "__init__(self[, ...])", where *self* is the new '
'instance and the\n'
' remaining arguments are the same as were passed to '
'"__new__()".\n'
'\n'
' If "__new__()" does not return an instance of *cls*, '
'then the new\n'
' instance\'s "__init__()" method will not be invoked.\n'
'\n'
' "__new__()" is intended mainly to allow subclasses of '
'immutable\n'
' types (like int, str, or tuple) to customize instance '
'creation. It\n'
' is also commonly overridden in custom metaclasses in '
'order to\n'
' customize class creation.\n'
'\n'
'object.__init__(self[, ...])\n'
'\n'
' Called after the instance has been created (by '
'"__new__()"), but\n'
' before it is returned to the caller. The arguments are '
'those\n'
' passed to the class constructor expression. If a base '
'class has an\n'
' "__init__()" method, the derived class\'s "__init__()" '
'method, if\n'
' any, must explicitly call it to ensure proper '
'initialization of the\n'
' base class part of the instance; for example:\n'
' "BaseClass.__init__(self, [args...])".\n'
'\n'
' Because "__new__()" and "__init__()" work together in '
'constructing\n'
' objects ("__new__()" to create it, and "__init__()" to '
'customise\n'
' it), no non-"None" value may be returned by '
'"__init__()"; doing so\n'
' will cause a "TypeError" to be raised at runtime.\n'
'\n'
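'   For example, a subclass of the immutable "str" type customizes\n'
'   creation in "__new__()", since the value can no longer be changed\n'
'   in "__init__()" (an illustrative sketch):\n'
'\n'
'      class UpperStr(str):\n'
'          def __new__(cls, value):\n'
'              return super(UpperStr, cls).__new__(cls, value.upper())\n'
'\n'
'      print UpperStr("abc")    # prints ABC\n'
'\n'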
'object.__del__(self)\n'
'\n'
' Called when the instance is about to be destroyed. This '
'is also\n'
' called a destructor. If a base class has a "__del__()" '
'method, the\n'
' derived class\'s "__del__()" method, if any, must '
'explicitly call it\n'
' to ensure proper deletion of the base class part of the '
'instance.\n'
' Note that it is possible (though not recommended!) for '
'the\n'
' "__del__()" method to postpone destruction of the '
'instance by\n'
' creating a new reference to it. It may then be called '
'at a later\n'
' time when this new reference is deleted. It is not '
'guaranteed that\n'
' "__del__()" methods are called for objects that still '
'exist when\n'
' the interpreter exits.\n'
'\n'
' Note: "del x" doesn\'t directly call "x.__del__()" --- '
'the former\n'
' decrements the reference count for "x" by one, and the '
'latter is\n'
' only called when "x"\'s reference count reaches zero. '
'Some common\n'
' situations that may prevent the reference count of an '
'object from\n'
' going to zero include: circular references between '
'objects (e.g.,\n'
' a doubly-linked list or a tree data structure with '
'parent and\n'
' child pointers); a reference to the object on the '
'stack frame of\n'
' a function that caught an exception (the traceback '
'stored in\n'
' "sys.exc_traceback" keeps the stack frame alive); or a '
'reference\n'
' to the object on the stack frame that raised an '
'unhandled\n'
' exception in interactive mode (the traceback stored '
'in\n'
' "sys.last_traceback" keeps the stack frame alive). '
'The first\n'
' situation can only be remedied by explicitly breaking '
'the cycles;\n'
' the latter two situations can be resolved by storing '
'"None" in\n'
' "sys.exc_traceback" or "sys.last_traceback". Circular '
'references\n'
'     which are garbage are detected when the optional cycle '
'detector is\n'
" enabled (it's on by default), but can only be cleaned "
'up if there\n'
' are no Python-level "__del__()" methods involved. '
'Refer to the\n'
' documentation for the "gc" module for more information '
'about how\n'
' "__del__()" methods are handled by the cycle '
'detector,\n'
' particularly the description of the "garbage" value.\n'
'\n'
' Warning: Due to the precarious circumstances under '
'which\n'
' "__del__()" methods are invoked, exceptions that occur '
'during\n'
' their execution are ignored, and a warning is printed '
'to\n'
' "sys.stderr" instead. Also, when "__del__()" is '
'invoked in\n'
' response to a module being deleted (e.g., when '
'execution of the\n'
' program is done), other globals referenced by the '
'"__del__()"\n'
' method may already have been deleted or in the process '
'of being\n'
' torn down (e.g. the import machinery shutting down). '
'For this\n'
' reason, "__del__()" methods should do the absolute '
'minimum needed\n'
' to maintain external invariants. Starting with '
'version 1.5,\n'
' Python guarantees that globals whose name begins with '
'a single\n'
' underscore are deleted from their module before other '
'globals are\n'
' deleted; if no other references to such globals exist, '
'this may\n'
' help in assuring that imported modules are still '
'available at the\n'
' time when the "__del__()" method is called.\n'
'\n'
' See also the "-R" command-line option.\n'
'\n'
'object.__repr__(self)\n'
'\n'
' Called by the "repr()" built-in function and by string '
'conversions\n'
' (reverse quotes) to compute the "official" string '
'representation of\n'
' an object. If at all possible, this should look like a '
'valid\n'
' Python expression that could be used to recreate an '
'object with the\n'
' same value (given an appropriate environment). If this '
'is not\n'
' possible, a string of the form "<...some useful '
'description...>"\n'
' should be returned. The return value must be a string '
'object. If a\n'
' class defines "__repr__()" but not "__str__()", then '
'"__repr__()"\n'
' is also used when an "informal" string representation of '
'instances\n'
' of that class is required.\n'
'\n'
' This is typically used for debugging, so it is important '
'that the\n'
' representation is information-rich and unambiguous.\n'
'\n'
'object.__str__(self)\n'
'\n'
' Called by the "str()" built-in function and by the '
'"print"\n'
' statement to compute the "informal" string '
'representation of an\n'
' object. This differs from "__repr__()" in that it does '
'not have to\n'
' be a valid Python expression: a more convenient or '
'concise\n'
' representation may be used instead. The return value '
'must be a\n'
' string object.\n'
'\n'
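'   For example (an illustrative snippet):\n'
'\n'
'      class Point(object):\n'
'          def __init__(self, x, y):\n'
'              self.x, self.y = x, y\n'
'          def __repr__(self):\n'
'              return "Point(%r, %r)" % (self.x, self.y)\n'
'          def __str__(self):\n'
'              return "(%s, %s)" % (self.x, self.y)\n'
'\n'
'      p = Point(1, 2)\n'
'      print repr(p)    # prints Point(1, 2)\n'
'      print p          # prints (1, 2)\n'
'\n'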
'object.__lt__(self, other)\n'
'object.__le__(self, other)\n'
'object.__eq__(self, other)\n'
'object.__ne__(self, other)\n'
'object.__gt__(self, other)\n'
'object.__ge__(self, other)\n'
'\n'
' New in version 2.1.\n'
'\n'
' These are the so-called "rich comparison" methods, and '
'are called\n'
' for comparison operators in preference to "__cmp__()" '
'below. The\n'
' correspondence between operator symbols and method names '
'is as\n'
' follows: "x<y" calls "x.__lt__(y)", "x<=y" calls '
'"x.__le__(y)",\n'
' "x==y" calls "x.__eq__(y)", "x!=y" and "x<>y" call '
'"x.__ne__(y)",\n'
' "x>y" calls "x.__gt__(y)", and "x>=y" calls '
'"x.__ge__(y)".\n'
'\n'
' A rich comparison method may return the singleton '
'"NotImplemented"\n'
' if it does not implement the operation for a given pair '
'of\n'
' arguments. By convention, "False" and "True" are '
'returned for a\n'
' successful comparison. However, these methods can return '
'any value,\n'
' so if the comparison operator is used in a Boolean '
'context (e.g.,\n'
' in the condition of an "if" statement), Python will call '
'"bool()"\n'
' on the value to determine if the result is true or '
'false.\n'
'\n'
' There are no implied relationships among the comparison '
'operators.\n'
' The truth of "x==y" does not imply that "x!=y" is '
'false.\n'
' Accordingly, when defining "__eq__()", one should also '
'define\n'
' "__ne__()" so that the operators will behave as '
'expected. See the\n'
' paragraph on "__hash__()" for some important notes on '
'creating\n'
' *hashable* objects which support custom comparison '
'operations and\n'
' are usable as dictionary keys.\n'
'\n'
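'   For example, defining "__ne__()" in terms of "__eq__()" keeps the\n'
'   two operators consistent (an illustrative sketch):\n'
'\n'
'      class Version(object):\n'
'          def __init__(self, number):\n'
'              self.number = number\n'
'          def __eq__(self, other):\n'
'              if not isinstance(other, Version):\n'
'                  return NotImplemented\n'
'              return self.number == other.number\n'
'          def __ne__(self, other):\n'
'              result = self.__eq__(other)\n'
'              if result is NotImplemented:\n'
'                  return result\n'
'              return not result\n'
'\n'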
' There are no swapped-argument versions of these methods '
'(to be used\n'
' when the left argument does not support the operation '
'but the right\n'
' argument does); rather, "__lt__()" and "__gt__()" are '
"each other's\n"
' reflection, "__le__()" and "__ge__()" are each other\'s '
'reflection,\n'
' and "__eq__()" and "__ne__()" are their own reflection.\n'
'\n'
' Arguments to rich comparison methods are never coerced.\n'
'\n'
' To automatically generate ordering operations from a '
'single root\n'
' operation, see "functools.total_ordering()".\n'
'\n'
'object.__cmp__(self, other)\n'
'\n'
' Called by comparison operations if rich comparison (see '
'above) is\n'
' not defined. Should return a negative integer if "self '
'< other",\n'
' zero if "self == other", a positive integer if "self > '
'other". If\n'
' no "__cmp__()", "__eq__()" or "__ne__()" operation is '
'defined,\n'
' class instances are compared by object identity '
'("address"). See\n'
' also the description of "__hash__()" for some important '
'notes on\n'
' creating *hashable* objects which support custom '
'comparison\n'
' operations and are usable as dictionary keys. (Note: '
'the\n'
' restriction that exceptions are not propagated by '
'"__cmp__()" has\n'
' been removed since Python 1.5.)\n'
'\n'
'object.__rcmp__(self, other)\n'
'\n'
' Changed in version 2.1: No longer supported.\n'
'\n'
'object.__hash__(self)\n'
'\n'
' Called by built-in function "hash()" and for operations '
'on members\n'
' of hashed collections including "set", "frozenset", and '
'"dict".\n'
' "__hash__()" should return an integer. The only '
'required property\n'
' is that objects which compare equal have the same hash '
'value; it is\n'
' advised to mix together the hash values of the '
'components of the\n'
' object that also play a part in comparison of objects by '
'packing\n'
' them into a tuple and hashing the tuple. Example:\n'
'\n'
' def __hash__(self):\n'
' return hash((self.name, self.nick, self.color))\n'
'\n'
' If a class does not define a "__cmp__()" or "__eq__()" '
'method it\n'
' should not define a "__hash__()" operation either; if it '
'defines\n'
' "__cmp__()" or "__eq__()" but not "__hash__()", its '
'instances will\n'
' not be usable in hashed collections. If a class defines '
'mutable\n'
' objects and implements a "__cmp__()" or "__eq__()" '
'method, it\n'
' should not implement "__hash__()", since hashable '
'collection\n'
" implementations require that an object's hash value is "
'immutable\n'
" (if the object's hash value changes, it will be in the "
'wrong hash\n'
' bucket).\n'
'\n'
' User-defined classes have "__cmp__()" and "__hash__()" '
'methods by\n'
' default; with them, all objects compare unequal (except '
'with\n'
' themselves) and "x.__hash__()" returns a result derived '
'from\n'
' "id(x)".\n'
'\n'
' Classes which inherit a "__hash__()" method from a '
'parent class but\n'
' change the meaning of "__cmp__()" or "__eq__()" such '
'that the hash\n'
' value returned is no longer appropriate (e.g. by '
'switching to a\n'
' value-based concept of equality instead of the default '
'identity\n'
' based equality) can explicitly flag themselves as being '
'unhashable\n'
' by setting "__hash__ = None" in the class definition. '
'Doing so\n'
' means that not only will instances of the class raise '
'an\n'
' appropriate "TypeError" when a program attempts to '
'retrieve their\n'
' hash value, but they will also be correctly identified '
'as\n'
' unhashable when checking "isinstance(obj, '
'collections.Hashable)"\n'
' (unlike classes which define their own "__hash__()" to '
'explicitly\n'
' raise "TypeError").\n'
'\n'
' Changed in version 2.5: "__hash__()" may now also return '
'a long\n'
' integer object; the 32-bit integer is then derived from '
'the hash of\n'
' that object.\n'
'\n'
' Changed in version 2.6: "__hash__" may now be set to '
'"None" to\n'
' explicitly flag instances of a class as unhashable.\n'
'\n'
'object.__nonzero__(self)\n'
'\n'
' Called to implement truth value testing and the built-in '
'operation\n'
' "bool()"; should return "False" or "True", or their '
'integer\n'
' equivalents "0" or "1". When this method is not '
'defined,\n'
' "__len__()" is called, if it is defined, and the object '
'is\n'
' considered true if its result is nonzero. If a class '
'defines\n'
' neither "__len__()" nor "__nonzero__()", all its '
'instances are\n'
' considered true.\n'
'\n'
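'   For example (an illustrative snippet):\n'
'\n'
'      class Box(object):\n'
'          def __init__(self, items):\n'
'              self.items = items\n'
'          def __nonzero__(self):\n'
'              return len(self.items) > 0\n'
'\n'
'      print bool(Box([]))     # False\n'
'      print bool(Box([1]))    # True\n'
'\n'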
'object.__unicode__(self)\n'
'\n'
' Called to implement "unicode()" built-in; should return '
'a Unicode\n'
' object. When this method is not defined, string '
'conversion is\n'
' attempted, and the result of string conversion is '
'converted to\n'
' Unicode using the system default encoding.\n',
'debugger': '\n'
'"pdb" --- The Python Debugger\n'
'*****************************\n'
'\n'
'**Source code:** Lib/pdb.py\n'
'\n'
'======================================================================\n'
'\n'
'The module "pdb" defines an interactive source code debugger '
'for\n'
'Python programs. It supports setting (conditional) breakpoints '
'and\n'
'single stepping at the source line level, inspection of stack '
'frames,\n'
'source code listing, and evaluation of arbitrary Python code in '
'the\n'
'context of any stack frame. It also supports post-mortem '
'debugging\n'
'and can be called under program control.\n'
'\n'
'The debugger is extensible --- it is actually defined as the '
'class\n'
'"Pdb". This is currently undocumented but easily understood by '
'reading\n'
'the source. The extension interface uses the modules "bdb" and '
'"cmd".\n'
'\n'
'The debugger\'s prompt is "(Pdb)". Typical usage to run a '
'program under\n'
'control of the debugger is:\n'
'\n'
' >>> import pdb\n'
' >>> import mymodule\n'
" >>> pdb.run('mymodule.test()')\n"
' > <string>(0)?()\n'
' (Pdb) continue\n'
' > <string>(1)?()\n'
' (Pdb) continue\n'
" NameError: 'spam'\n"
' > <string>(1)?()\n'
' (Pdb)\n'
'\n'
'"pdb.py" can also be invoked as a script to debug other '
'scripts. For\n'
'example:\n'
'\n'
' python -m pdb myscript.py\n'
'\n'
'When invoked as a script, pdb will automatically enter '
'post-mortem\n'
'debugging if the program being debugged exits abnormally. After '
'post-\n'
'mortem debugging (or after normal exit of the program), pdb '
'will\n'
"restart the program. Automatic restarting preserves pdb's state "
'(such\n'
'as breakpoints) and in most cases is more useful than quitting '
'the\n'
"debugger upon program's exit.\n"
'\n'
'New in version 2.4: Restarting post-mortem behavior added.\n'
'\n'
'The typical usage to break into the debugger from a running '
'program is\n'
'to insert\n'
'\n'
' import pdb; pdb.set_trace()\n'
'\n'
'at the location you want to break into the debugger. You can '
'then\n'
'step through the code following this statement, and continue '
'running\n'
'without the debugger using the "c" command.\n'
'\n'
'The typical usage to inspect a crashed program is:\n'
'\n'
' >>> import pdb\n'
' >>> import mymodule\n'
' >>> mymodule.test()\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
' File "./mymodule.py", line 4, in test\n'
' test2()\n'
' File "./mymodule.py", line 3, in test2\n'
' print spam\n'
' NameError: spam\n'
' >>> pdb.pm()\n'
' > ./mymodule.py(3)test2()\n'
' -> print spam\n'
' (Pdb)\n'
'\n'
'The module defines the following functions; each enters the '
'debugger\n'
'in a slightly different way:\n'
'\n'
'pdb.run(statement[, globals[, locals]])\n'
'\n'
' Execute the *statement* (given as a string) under debugger '
'control.\n'
' The debugger prompt appears before any code is executed; you '
'can\n'
' set breakpoints and type "continue", or you can step through '
'the\n'
' statement using "step" or "next" (all these commands are '
'explained\n'
' below). The optional *globals* and *locals* arguments '
'specify the\n'
' environment in which the code is executed; by default the\n'
' dictionary of the module "__main__" is used. (See the '
'explanation\n'
' of the "exec" statement or the "eval()" built-in function.)\n'
'\n'
'pdb.runeval(expression[, globals[, locals]])\n'
'\n'
' Evaluate the *expression* (given as a string) under debugger\n'
' control. When "runeval()" returns, it returns the value of '
'the\n'
' expression. Otherwise this function is similar to "run()".\n'
'\n'
'pdb.runcall(function[, argument, ...])\n'
'\n'
' Call the *function* (a function or method object, not a '
'string)\n'
' with the given arguments. When "runcall()" returns, it '
'returns\n'
' whatever the function call returned. The debugger prompt '
'appears\n'
' as soon as the function is entered.\n'
'\n'
'pdb.set_trace()\n'
'\n'
' Enter the debugger at the calling stack frame. This is '
'useful to\n'
' hard-code a breakpoint at a given point in a program, even if '
'the\n'
' code is not otherwise being debugged (e.g. when an assertion\n'
' fails).\n'
'\n'
'pdb.post_mortem([traceback])\n'
'\n'
' Enter post-mortem debugging of the given *traceback* object. '
'If no\n'
' *traceback* is given, it uses the one of the exception that '
'is\n'
' currently being handled (an exception must be being handled '
'if the\n'
' default is to be used).\n'
'\n'
'pdb.pm()\n'
'\n'
' Enter post-mortem debugging of the traceback found in\n'
' "sys.last_traceback".\n'
'\n'
'The "run*" functions and "set_trace()" are aliases for '
'instantiating\n'
'the "Pdb" class and calling the method of the same name. If you '
'want\n'
'to access further features, you have to do this yourself:\n'
'\n'
"class pdb.Pdb(completekey='tab', stdin=None, stdout=None, "
'skip=None)\n'
'\n'
' "Pdb" is the debugger class.\n'
'\n'
' The *completekey*, *stdin* and *stdout* arguments are passed '
'to the\n'
' underlying "cmd.Cmd" class; see the description there.\n'
'\n'
' The *skip* argument, if given, must be an iterable of '
'glob-style\n'
' module name patterns. The debugger will not step into frames '
'that\n'
' originate in a module that matches one of these patterns. '
'[1]\n'
'\n'
' Example call to enable tracing with *skip*:\n'
'\n'
" import pdb; pdb.Pdb(skip=['django.*']).set_trace()\n"
'\n'
' New in version 2.7: The *skip* argument.\n'
'\n'
' run(statement[, globals[, locals]])\n'
' runeval(expression[, globals[, locals]])\n'
' runcall(function[, argument, ...])\n'
' set_trace()\n'
'\n'
' See the documentation for the functions explained above.\n',
'del': '\n'
'The "del" statement\n'
'*******************\n'
'\n'
' del_stmt ::= "del" target_list\n'
'\n'
'Deletion is recursively defined very similarly to the way assignment '
'is\n'
'defined. Rather than spelling it out in full detail, here are some\n'
'hints:\n'
'\n'
'Deletion of a target list recursively deletes each target, from left\n'
'to right.\n'
'\n'
'Deletion of a name removes the binding of that name from the local '
'or\n'
'global namespace, depending on whether the name occurs in a "global"\n'
'statement in the same code block. If the name is unbound, a\n'
'"NameError" exception will be raised.\n'
'\n'
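'For example (an illustrative snippet):\n'
'\n'
'   x = 1\n'
'   del x\n'
'   print x    # raises NameError, since "x" is now unbound\n'
'\n'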
'It is illegal to delete a name from the local namespace if it occurs\n'
'as a free variable in a nested block.\n'
'\n'
'Deletion of attribute references, subscriptions and slicings is '
'passed\n'
'to the primary object involved; deletion of a slicing is in general\n'
'equivalent to assignment of an empty slice of the right type (but '
'even\n'
'this is determined by the sliced object).\n',
'dict': '\n'
'Dictionary displays\n'
'*******************\n'
'\n'
'A dictionary display is a possibly empty series of key/datum pairs\n'
'enclosed in curly braces:\n'
'\n'
' dict_display ::= "{" [key_datum_list | dict_comprehension] '
'"}"\n'
' key_datum_list ::= key_datum ("," key_datum)* [","]\n'
' key_datum ::= expression ":" expression\n'
' dict_comprehension ::= expression ":" expression comp_for\n'
'\n'
'A dictionary display yields a new dictionary object.\n'
'\n'
'If a comma-separated sequence of key/datum pairs is given, they are\n'
'evaluated from left to right to define the entries of the '
'dictionary:\n'
'each key object is used as a key into the dictionary to store the\n'
'corresponding datum. This means that you can specify the same key\n'
"multiple times in the key/datum list, and the final dictionary's "
'value\n'
'for that key will be the last one given.\n'
'\n'
'A dict comprehension, in contrast to list and set comprehensions,\n'
'needs two expressions separated with a colon followed by the usual\n'
'"for" and "if" clauses. When the comprehension is run, the '
'resulting\n'
'key and value elements are inserted in the new dictionary in the '
'order\n'
'they are produced.\n'
'\n'
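'For example (an illustrative snippet):\n'
'\n'
'   print {"a": 1, "a": 2}            # {\'a\': 2}; the last datum wins\n'
'   print {x: x*x for x in range(4)}  # {0: 0, 1: 1, 2: 4, 3: 9}\n'
'\n'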
'Restrictions on the types of the key values are listed earlier in\n'
'section The standard type hierarchy. (To summarize, the key type\n'
'should be *hashable*, which excludes all mutable objects.) Clashes\n'
'between duplicate keys are not detected; the last datum (textually\n'
'rightmost in the display) stored for a given key value prevails.\n',
'dynamic-features': '\n'
'Interaction with dynamic features\n'
'*********************************\n'
'\n'
'There are several cases where Python statements are '
'illegal when used\n'
'in conjunction with nested scopes that contain free '
'variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is '
'illegal to\n'
'delete the name. An error will be reported at compile '
'time.\n'
'\n'
'If the wild card form of import --- "import *" --- is '
'used in a\n'
'function and the function contains or is a nested block '
'with free\n'
'variables, the compiler will raise a "SyntaxError".\n'
'\n'
'If "exec" is used in a function and the function '
'contains or is a\n'
'nested block with free variables, the compiler will '
'raise a\n'
'"SyntaxError" unless the exec explicitly specifies the '
'local namespace\n'
'for the "exec". (In other words, "exec obj" would be '
'illegal, but\n'
'"exec obj in ns" would be legal.)\n'
'\n'
'The "eval()", "execfile()", and "input()" functions and '
'the "exec"\n'
'statement do not have access to the full environment for '
'resolving\n'
'names. Names may be resolved in the local and global '
'namespaces of\n'
'the caller. Free variables are not resolved in the '
'nearest enclosing\n'
'namespace, but in the global namespace. [1] The "exec" '
'statement and\n'
'the "eval()" and "execfile()" functions have optional '
'arguments to\n'
'override the global and local namespace. If only one '
'namespace is\n'
'specified, it is used for both.\n',
'else': '\n'
'The "if" statement\n'
'******************\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
'It selects exactly one of the suites by evaluating the expressions '
'one\n'
'by one until one is found to be true (see section Boolean '
'operations\n'
'for the definition of true and false); then that suite is executed\n'
'(and no other part of the "if" statement is executed or evaluated).\n'
'If all expressions are false, the suite of the "else" clause, if\n'
'present, is executed.\n',
'exceptions': '\n'
'Exceptions\n'
'**********\n'
'\n'
'Exceptions are a means of breaking out of the normal flow of '
'control\n'
'of a code block in order to handle errors or other '
'exceptional\n'
'conditions. An exception is *raised* at the point where the '
'error is\n'
'detected; it may be *handled* by the surrounding code block or '
'by any\n'
'code block that directly or indirectly invoked the code block '
'where\n'
'the error occurred.\n'
'\n'
'The Python interpreter raises an exception when it detects a '
'run-time\n'
'error (such as division by zero). A Python program can also\n'
'explicitly raise an exception with the "raise" statement. '
'Exception\n'
'handlers are specified with the "try" ... "except" statement. '
'The\n'
'"finally" clause of such a statement can be used to specify '
'cleanup\n'
'code which does not handle the exception, but is executed '
'whether an\n'
'exception occurred or not in the preceding code.\n'
'\n'
'Python uses the "termination" model of error handling: an '
'exception\n'
'handler can find out what happened and continue execution at '
'an outer\n'
'level, but it cannot repair the cause of the error and retry '
'the\n'
'failing operation (except by re-entering the offending piece '
'of code\n'
'from the top).\n'
'\n'
'When an exception is not handled at all, the interpreter '
'terminates\n'
'execution of the program, or returns to its interactive main '
'loop. In\n'
'either case, it prints a stack backtrace, except when the '
'exception is\n'
'"SystemExit".\n'
'\n'
'Exceptions are identified by class instances. The "except" '
'clause is\n'
'selected depending on the class of the instance: it must '
'reference the\n'
'class of the instance or a base class thereof. The instance '
'can be\n'
'received by the handler and can carry additional information '
'about the\n'
'exceptional condition.\n'
'\n'
'Exceptions can also be identified by strings, in which case '
'the\n'
'"except" clause is selected by object identity. An arbitrary '
'value\n'
'can be raised along with the identifying string which can be '
'passed to\n'
'the handler.\n'
'\n'
'Note: Messages to exceptions are not part of the Python API. '
'Their\n'
' contents may change from one version of Python to the next '
'without\n'
' warning and should not be relied on by code which will run '
'under\n'
' multiple versions of the interpreter.\n'
'\n'
'See also the description of the "try" statement in section The '
'try\n'
'statement and "raise" statement in section The raise '
'statement.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] This limitation occurs because the code that is executed '
'by\n'
' these operations is not available at the time the module '
'is\n'
' compiled.\n',
'exec': '\n'
'The "exec" statement\n'
'********************\n'
'\n'
' exec_stmt ::= "exec" or_expr ["in" expression ["," expression]]\n'
'\n'
'This statement supports dynamic execution of Python code. The '
'first\n'
'expression should evaluate to either a Unicode string, a *Latin-1*\n'
'encoded string, an open file object, a code object, or a tuple. If '
'it\n'
'is a string, the string is parsed as a suite of Python statements\n'
'which is then executed (unless a syntax error occurs). [1] If it is '
'an\n'
'open file, the file is parsed until EOF and executed. If it is a '
'code\n'
'object, it is simply executed. For the interpretation of a tuple, '
'see\n'
"below. In all cases, the code that's executed is expected to be "
'valid\n'
'as file input (see section File input). Be aware that the "return"\n'
'and "yield" statements may not be used outside of function '
'definitions\n'
'even within the context of code passed to the "exec" statement.\n'
'\n'
'In all cases, if the optional parts are omitted, the code is '
'executed\n'
'in the current scope. If only the first expression after "in" is\n'
'specified, it should be a dictionary, which will be used for both '
'the\n'
'global and the local variables. If two expressions are given, they\n'
'are used for the global and local variables, respectively. If\n'
'provided, *locals* can be any mapping object. Remember that at '
'module\n'
'level, globals and locals are the same dictionary. If two separate\n'
'objects are given as *globals* and *locals*, the code will be '
'executed\n'
'as if it were embedded in a class definition.\n'
'\n'
'The first expression may also be a tuple of length 2 or 3. In this\n'
'case, the optional parts must be omitted. The form "exec(expr,\n'
'globals)" is equivalent to "exec expr in globals", while the form\n'
'"exec(expr, globals, locals)" is equivalent to "exec expr in '
'globals,\n'
'locals". The tuple form of "exec" provides compatibility with '
'Python\n'
'3, where "exec" is a function rather than a statement.\n'
'\n'
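'For example, executing code in an explicit namespace (an\n'
'illustrative snippet):\n'
'\n'
'   ns = {}\n'
'   exec "x = 6 * 7" in ns\n'
'   print ns["x"]    # prints 42\n'
'\n'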
'Changed in version 2.4: Formerly, *locals* was required to be a\n'
'dictionary.\n'
'\n'
'As a side effect, an implementation may insert additional keys into\n'
'the dictionaries given besides those corresponding to variable '
'names\n'
'set by the executed code. For example, the current implementation '
'may\n'
'add a reference to the dictionary of the built-in module '
'"__builtin__"\n'
'under the key "__builtins__" (!).\n'
'\n'
"**Programmer's hints:** dynamic evaluation of expressions is "
'supported\n'
'by the built-in function "eval()". The built-in functions '
'"globals()"\n'
'and "locals()" return the current global and local dictionary,\n'
'respectively, which may be useful to pass around for use by "exec".\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] Note that the parser only accepts the Unix-style end of line\n'
' convention. If you are reading the code from a file, make sure '
'to\n'
' use *universal newlines* mode to convert Windows or Mac-style\n'
' newlines.\n',
'execmodel': '\n'
'Execution model\n'
'***************\n'
'\n'
'\n'
'Naming and binding\n'
'==================\n'
'\n'
'*Names* refer to objects. Names are introduced by name '
'binding\n'
'operations. Each occurrence of a name in the program text '
'refers to\n'
'the *binding* of that name established in the innermost '
'function block\n'
'containing the use.\n'
'\n'
'A *block* is a piece of Python program text that is executed as '
'a\n'
'unit. The following are blocks: a module, a function body, and '
'a class\n'
'definition. Each command typed interactively is a block. A '
'script\n'
'file (a file given as standard input to the interpreter or '
'specified\n'
'as the first argument on the interpreter command line) is a code '
'block.\n'
'A script command (a command specified on the interpreter '
'command line\n'
"with the '**-c**' option) is a code block. The file read by "
'the\n'
'built-in function "execfile()" is a code block. The string '
'argument\n'
'passed to the built-in function "eval()" and to the "exec" '
'statement\n'
'is a code block. The expression read and evaluated by the '
'built-in\n'
'function "input()" is a code block.\n'
'\n'
'A code block is executed in an *execution frame*. A frame '
'contains\n'
'some administrative information (used for debugging) and '
'determines\n'
"where and how execution continues after the code block's "
'execution has\n'
'completed.\n'
'\n'
'A *scope* defines the visibility of a name within a block. If '
'a local\n'
'variable is defined in a block, its scope includes that block. '
'If the\n'
'definition occurs in a function block, the scope extends to any '
'blocks\n'
'contained within the defining one, unless a contained block '
'introduces\n'
'a different binding for the name. The scope of names defined '
'in a\n'
'class block is limited to the class block; it does not extend '
'to the\n'
'code blocks of methods -- this includes generator expressions '
'since\n'
'they are implemented using a function scope. This means that '
'the\n'
'following will fail:\n'
'\n'
' class A:\n'
' a = 42\n'
' b = list(a + i for i in range(10))\n'
'\n'
'When a name is used in a code block, it is resolved using the '
'nearest\n'
'enclosing scope. The set of all such scopes visible to a code '
'block\n'
"is called the block's *environment*.\n"
'\n'
'If a name is bound in a block, it is a local variable of that '
'block.\n'
'If a name is bound at the module level, it is a global '
'variable. (The\n'
'variables of the module code block are local and global.) If '
'a\n'
'variable is used in a code block but not defined there, it is a '
'*free\n'
'variable*.\n'
'\n'
'When a name is not found at all, a "NameError" exception is '
'raised.\n'
'If the name refers to a local variable that has not been bound, '
'a\n'
'"UnboundLocalError" exception is raised. "UnboundLocalError" '
'is a\n'
'subclass of "NameError".\n'
'\n'
'The following constructs bind names: formal parameters to '
'functions,\n'
'"import" statements, class and function definitions (these bind '
'the\n'
'class or function name in the defining block), and targets that '
'are\n'
'identifiers if occurring in an assignment, "for" loop header, '
'in the\n'
'second position of an "except" clause header or after "as" in a '
'"with"\n'
'statement. The "import" statement of the form "from ... import '
'*"\n'
'binds all names defined in the imported module, except those '
'beginning\n'
'with an underscore. This form may only be used at the module '
'level.\n'
'\n'
'A target occurring in a "del" statement is also considered '
'bound for\n'
'this purpose (though the actual semantics are to unbind the '
'name). It\n'
'is illegal to unbind a name that is referenced by an enclosing '
'scope;\n'
'the compiler will report a "SyntaxError".\n'
'\n'
'Each assignment or import statement occurs within a block '
'defined by a\n'
'class or function definition or at the module level (the '
'top-level\n'
'code block).\n'
'\n'
'If a name binding operation occurs anywhere within a code '
'block, all\n'
'uses of the name within the block are treated as references to '
'the\n'
'current block. This can lead to errors when a name is used '
'within a\n'
'block before it is bound. This rule is subtle. Python lacks\n'
'declarations and allows name binding operations to occur '
'anywhere\n'
'within a code block. The local variables of a code block can '
'be\n'
'determined by scanning the entire text of the block for name '
'binding\n'
'operations.\n'
'\n'
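'For example, the assignment below makes "n" local throughout the\n'
'function, so the earlier reference fails (an illustrative snippet):\n'
'\n'
'   def f():\n'
'       print n    # raises UnboundLocalError\n'
'       n = 1\n'
'\n'
'   f()\n'
'\n'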
'If the global statement occurs within a block, all uses of the '
'name\n'
'specified in the statement refer to the binding of that name in '
'the\n'
'top-level namespace. Names are resolved in the top-level '
'namespace by\n'
'searching the global namespace, i.e. the namespace of the '
'module\n'
'containing the code block, and the builtins namespace, the '
'namespace\n'
'of the module "__builtin__". The global namespace is searched '
'first.\n'
'If the name is not found there, the builtins namespace is '
'searched.\n'
'The global statement must precede all uses of the name.\n'
'\n'
'The builtins namespace associated with the execution of a code '
'block\n'
'is actually found by looking up the name "__builtins__" in its '
'global\n'
'namespace; this should be a dictionary or a module (in the '
'latter case\n'
"the module's dictionary is used). By default, when in the "
'"__main__"\n'
'module, "__builtins__" is the built-in module "__builtin__" '
'(note: no\n'
'\'s\'); when in any other module, "__builtins__" is an alias '
'for the\n'
'dictionary of the "__builtin__" module itself. "__builtins__" '
'can be\n'
'set to a user-created dictionary to create a weak form of '
'restricted\n'
'execution.\n'
'\n'
'**CPython implementation detail:** Users should not touch\n'
'"__builtins__"; it is strictly an implementation detail. '
'Users\n'
'wanting to override values in the builtins namespace should '
'"import"\n'
'the "__builtin__" (no \'s\') module and modify its attributes\n'
'appropriately.\n'
'\n'
'The namespace for a module is automatically created the first '
'time a\n'
'module is imported. The main module for a script is always '
'called\n'
'"__main__".\n'
'\n'
'The "global" statement has the same scope as a name binding '
'operation\n'
'in the same block. If the nearest enclosing scope for a free '
'variable\n'
'contains a global statement, the free variable is treated as a '
'global.\n'
'\n'
'A class definition is an executable statement that may use and '
'define\n'
'names. These references follow the normal rules for name '
'resolution.\n'
'The namespace of the class definition becomes the attribute '
'dictionary\n'
'of the class. Names defined at the class scope are not visible '
'in\n'
'methods.\n'
'\n'
'\n'
'Interaction with dynamic features\n'
'---------------------------------\n'
'\n'
'There are several cases where Python statements are illegal '
'when used\n'
'in conjunction with nested scopes that contain free variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is '
'illegal to\n'
'delete the name. An error will be reported at compile time.\n'
'\n'
'If the wild card form of import --- "import *" --- is used in '
'a\n'
'function and the function contains or is a nested block with '
'free\n'
'variables, the compiler will raise a "SyntaxError".\n'
'\n'
'If "exec" is used in a function and the function contains or is '
'a\n'
'nested block with free variables, the compiler will raise a\n'
'"SyntaxError" unless the exec explicitly specifies the local '
'namespace\n'
'for the "exec". (In other words, "exec obj" would be illegal, '
'but\n'
'"exec obj in ns" would be legal.)\n'
'\n'
'The "eval()", "execfile()", and "input()" functions and the '
'"exec"\n'
'statement do not have access to the full environment for '
'resolving\n'
'names. Names may be resolved in the local and global '
'namespaces of\n'
'the caller. Free variables are not resolved in the nearest '
'enclosing\n'
'namespace, but in the global namespace. [1] The "exec" '
'statement and\n'
'the "eval()" and "execfile()" functions have optional arguments '
'to\n'
'override the global and local namespace. If only one namespace '
'is\n'
'specified, it is used for both.\n'
'\n'
'\n'
'Exceptions\n'
'==========\n'
'\n'
'Exceptions are a means of breaking out of the normal flow of '
'control\n'
'of a code block in order to handle errors or other exceptional\n'
'conditions. An exception is *raised* at the point where the '
'error is\n'
'detected; it may be *handled* by the surrounding code block or '
'by any\n'
'code block that directly or indirectly invoked the code block '
'where\n'
'the error occurred.\n'
'\n'
'The Python interpreter raises an exception when it detects a '
'run-time\n'
'error (such as division by zero). A Python program can also\n'
'explicitly raise an exception with the "raise" statement. '
'Exception\n'
'handlers are specified with the "try" ... "except" statement. '
'The\n'
'"finally" clause of such a statement can be used to specify '
'cleanup\n'
'code which does not handle the exception, but is executed '
'whether an\n'
'exception occurred or not in the preceding code.\n'
'\n'
'Python uses the "termination" model of error handling: an '
'exception\n'
'handler can find out what happened and continue execution at an '
'outer\n'
'level, but it cannot repair the cause of the error and retry '
'the\n'
'failing operation (except by re-entering the offending piece of '
'code\n'
'from the top).\n'
'\n'
'When an exception is not handled at all, the interpreter '
'terminates\n'
'execution of the program, or returns to its interactive main '
'loop. In\n'
'either case, it prints a stack backtrace, except when the '
'exception is\n'
'"SystemExit".\n'
'\n'
'Exceptions are identified by class instances. The "except" '
'clause is\n'
'selected depending on the class of the instance: it must '
'reference the\n'
'class of the instance or a base class thereof. The instance '
'can be\n'
'received by the handler and can carry additional information '
'about the\n'
'exceptional condition.\n'
'\n'
'Exceptions can also be identified by strings, in which case '
'the\n'
'"except" clause is selected by object identity. An arbitrary '
'value\n'
'can be raised along with the identifying string which can be '
'passed to\n'
'the handler.\n'
'\n'
'Note: Messages to exceptions are not part of the Python API. '
'Their\n'
' contents may change from one version of Python to the next '
'without\n'
' warning and should not be relied on by code which will run '
'under\n'
' multiple versions of the interpreter.\n'
'\n'
'See also the description of the "try" statement in section The '
'try\n'
'statement and "raise" statement in section The raise '
'statement.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] This limitation occurs because the code that is executed '
'by\n'
' these operations is not available at the time the module '
'is\n'
' compiled.\n',
'exprlists': '\n'
'Expression lists\n'
'****************\n'
'\n'
' expression_list ::= expression ( "," expression )* [","]\n'
'\n'
'An expression list containing at least one comma yields a '
'tuple. The\n'
'length of the tuple is the number of expressions in the list. '
'The\n'
'expressions are evaluated from left to right.\n'
'\n'
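'For example:\n'
'\n'
'   >>> t = 1, 2, 3\n'
'   >>> t\n'
'   (1, 2, 3)\n'
'   >>> singleton = 4,\n'
'   >>> singleton\n'
'   (4,)\n'
'\n'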
'The trailing comma is required only to create a single tuple '
'(a.k.a. a\n'
'*singleton*); it is optional in all other cases. A single '
'expression\n'
"without a trailing comma doesn't create a tuple, but rather "
'yields the\n'
'value of that expression. (To create an empty tuple, use an '
'empty pair\n'
'of parentheses: "()".)\n',
'floating': '\n'
'Floating point literals\n'
'***********************\n'
'\n'
'Floating point literals are described by the following lexical\n'
'definitions:\n'
'\n'
' floatnumber ::= pointfloat | exponentfloat\n'
' pointfloat ::= [intpart] fraction | intpart "."\n'
' exponentfloat ::= (intpart | pointfloat) exponent\n'
' intpart ::= digit+\n'
' fraction ::= "." digit+\n'
' exponent ::= ("e" | "E") ["+" | "-"] digit+\n'
'\n'
'Note that the integer and exponent parts of floating point '
'numbers can\n'
'look like octal integers, but are interpreted using radix 10. '
'For\n'
'example, "077e010" is legal, and denotes the same number as '
'"77e10".\n'
'The allowed range of floating point literals is implementation-\n'
'dependent. Some examples of floating point literals:\n'
'\n'
' 3.14 10. .001 1e100 3.14e-10 0e0\n'
'\n'
'Note that numeric literals do not include a sign; a phrase like '
'"-1"\n'
'is actually an expression composed of the unary operator "-" and '
'the\n'
'literal "1".\n',
'for': '\n'
'The "for" statement\n'
'*******************\n'
'\n'
'The "for" statement is used to iterate over the elements of a '
'sequence\n'
'(such as a string, tuple or list) or other iterable object:\n'
'\n'
' for_stmt ::= "for" target_list "in" expression_list ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'The expression list is evaluated once; it should yield an iterable\n'
'object. An iterator is created for the result of the\n'
'"expression_list". The suite is then executed once for each item\n'
'provided by the iterator, in the order of ascending indices. Each\n'
'item in turn is assigned to the target list using the standard rules\n'
'for assignments, and then the suite is executed. When the items are\n'
'exhausted (which is immediately when the sequence is empty), the '
'suite\n'
'in the "else" clause, if present, is executed, and the loop\n'
'terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and '
'continues\n'
'with the next item, or with the "else" clause if there was no next\n'
'item.\n'
'\n'
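'For example, a small illustration of "break" and the "else" clause:\n'
'\n'
'   for n in range(2, 10):\n'
'       for x in range(2, n):\n'
'           if n % x == 0:\n'
'               break\n'
'       else:\n'
'           print n, "is prime"\n'
'\n'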
'The suite may assign to the variable(s) in the target list; this '
'does\n'
'not affect the next item assigned to it.\n'
'\n'
'The target list is not deleted when the loop is finished, but if the\n'
'sequence is empty, it will not have been assigned to at all by the\n'
'loop. Hint: the built-in function "range()" returns a sequence of\n'
'integers suitable to emulate the effect of Pascal\'s "for i := a to '
'b\n'
'do"; e.g., "range(3)" returns the list "[0, 1, 2]".\n'
'\n'
'Note: There is a subtlety when the sequence is being modified by the\n'
' loop (this can only occur for mutable sequences, i.e. lists). An\n'
' internal counter is used to keep track of which item is used next,\n'
' and this is incremented on each iteration. When this counter has\n'
' reached the length of the sequence the loop terminates. This '
'means\n'
' that if the suite deletes the current (or a previous) item from '
'the\n'
' sequence, the next item will be skipped (since it gets the index '
'of\n'
' the current item which has already been treated). Likewise, if '
'the\n'
' suite inserts an item in the sequence before the current item, the\n'
' current item will be treated again the next time through the loop.\n'
' This can lead to nasty bugs that can be avoided by making a\n'
' temporary copy using a slice of the whole sequence, e.g.,\n'
'\n'
' for x in a[:]:\n'
' if x < 0: a.remove(x)\n',
'formatstrings': '\n'
'Format String Syntax\n'
'********************\n'
'\n'
'The "str.format()" method and the "Formatter" class share '
'the same\n'
'syntax for format strings (although in the case of '
'"Formatter",\n'
'subclasses can define their own format string syntax).\n'
'\n'
'Format strings contain "replacement fields" surrounded by '
'curly braces\n'
'"{}". Anything that is not contained in braces is '
'considered literal\n'
'text, which is copied unchanged to the output. If you need '
'to include\n'
'a brace character in the literal text, it can be escaped by '
'doubling:\n'
'"{{" and "}}".\n'
'\n'
'The grammar for a replacement field is as follows:\n'
'\n'
' replacement_field ::= "{" [field_name] ["!" '
'conversion] [":" format_spec] "}"\n'
' field_name ::= arg_name ("." attribute_name | '
'"[" element_index "]")*\n'
' arg_name ::= [identifier | integer]\n'
' attribute_name ::= identifier\n'
' element_index ::= integer | index_string\n'
' index_string ::= <any source character except '
'"]"> +\n'
' conversion ::= "r" | "s"\n'
' format_spec ::= <described in the next '
'section>\n'
'\n'
'In less formal terms, the replacement field can start with '
'a\n'
'*field_name* that specifies the object whose value is to be '
'formatted\n'
'and inserted into the output instead of the replacement '
'field. The\n'
'*field_name* is optionally followed by a *conversion* '
'field, which is\n'
'preceded by an exclamation point "\'!\'", and a '
'*format_spec*, which is\n'
'preceded by a colon "\':\'". These specify a non-default '
'format for the\n'
'replacement value.\n'
'\n'
'See also the Format Specification Mini-Language section.\n'
'\n'
'The *field_name* itself begins with an *arg_name* that is '
'either a\n'
"number or a keyword. If it's a number, it refers to a "
'positional\n'
"argument, and if it's a keyword, it refers to a named "
'keyword\n'
'argument. If the numerical arg_names in a format string '
'are 0, 1, 2,\n'
'... in sequence, they can all be omitted (not just some) '
'and the\n'
'numbers 0, 1, 2, ... will be automatically inserted in that '
'order.\n'
'Because *arg_name* is not quote-delimited, it is not '
'possible to\n'
'specify arbitrary dictionary keys (e.g., the strings '
'"\'10\'" or\n'
'"\':-]\'") within a format string. The *arg_name* can be '
'followed by any\n'
'number of index or attribute expressions. An expression of '
'the form\n'
'"\'.name\'" selects the named attribute using "getattr()", '
'while an\n'
'expression of the form "\'[index]\'" does an index lookup '
'using\n'
'"__getitem__()".\n'
'\n'
'Changed in version 2.7: The positional argument specifiers '
'can be\n'
'omitted, so "\'{} {}\'" is equivalent to "\'{0} {1}\'".\n'
'\n'
'Some simple format string examples:\n'
'\n'
' "First, thou shalt count to {0}" # References first '
'positional argument\n'
' "Bring me a {}" # Implicitly '
'references the first positional argument\n'
' "From {} to {}" # Same as "From {0} to '
'{1}"\n'
' "My quest is {name}" # References keyword '
"argument 'name'\n"
' "Weight in tons {0.weight}" # \'weight\' attribute '
'of first positional arg\n'
' "Units destroyed: {players[0]}" # First element of '
"keyword argument 'players'.\n"
'\n'
'The *conversion* field causes a type coercion before '
'formatting.\n'
'Normally, the job of formatting a value is done by the '
'"__format__()"\n'
'method of the value itself. However, in some cases it is '
'desirable to\n'
'force a type to be formatted as a string, overriding its '
'own\n'
'definition of formatting. By converting the value to a '
'string before\n'
'calling "__format__()", the normal formatting logic is '
'bypassed.\n'
'\n'
'Two conversion flags are currently supported: "\'!s\'" '
'which calls\n'
'"str()" on the value, and "\'!r\'" which calls "repr()".\n'
'\n'
'Some examples:\n'
'\n'
' "Harold\'s a clever {0!s}" # Calls str() on the '
'argument first\n'
' "Bring out the holy {name!r}" # Calls repr() on the '
'argument first\n'
'\n'
'The *format_spec* field contains a specification of how the '
'value\n'
'should be presented, including such details as field width, '
'alignment,\n'
'padding, decimal precision and so on. Each value type can '
'define its\n'
'own "formatting mini-language" or interpretation of the '
'*format_spec*.\n'
'\n'
'Most built-in types support a common formatting '
'mini-language, which\n'
'is described in the next section.\n'
'\n'
'A *format_spec* field can also include nested replacement '
'fields\n'
'within it. These nested replacement fields may contain a '
'field name,\n'
'conversion flag and format specification, but deeper '
'nesting is not\n'
'allowed. The replacement fields within the format_spec '
'are\n'
'substituted before the *format_spec* string is interpreted. '
'This\n'
'allows the formatting of a value to be dynamically '
'specified.\n'
'\n'
'See the Format examples section for some examples.\n'
'\n'
'\n'
'Format Specification Mini-Language\n'
'==================================\n'
'\n'
'"Format specifications" are used within replacement fields '
'contained\n'
'within a format string to define how individual values are '
'presented\n'
'(see Format String Syntax). They can also be passed '
'directly to the\n'
'built-in "format()" function. Each formattable type may '
'define how\n'
'the format specification is to be interpreted.\n'
'\n'
'Most built-in types implement the following options for '
'format\n'
'specifications, although some of the formatting options are '
'only\n'
'supported by the numeric types.\n'
'\n'
'A general convention is that an empty format string ("""") '
'produces\n'
'the same result as if you had called "str()" on the value. '
'A non-empty\n'
'format string typically modifies the result.\n'
'\n'
'The general form of a *standard format specifier* is:\n'
'\n'
' format_spec ::= '
'[[fill]align][sign][#][0][width][,][.precision][type]\n'
' fill ::= <any character>\n'
' align ::= "<" | ">" | "=" | "^"\n'
' sign ::= "+" | "-" | " "\n'
' width ::= integer\n'
' precision ::= integer\n'
' type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" '
'| "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"\n'
'\n'
'If a valid *align* value is specified, it can be preceded '
'by a *fill*\n'
'character that can be any character and defaults to a space '
'if\n'
'omitted. It is not possible to use a literal curly brace '
'(""{"" or\n'
'""}"") as the *fill* character when using the '
'"str.format()" method.\n'
'However, it is possible to insert a curly brace with a '
'nested\n'
"replacement field. This limitation doesn't affect the "
'"format()"\n'
'function.\n'
'\n'
'The meaning of the various alignment options is as '
'follows:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Option | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'<\'" | Forces the field to be left-aligned '
'within the available |\n'
' | | space (this is the default for most '
'objects). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'>\'" | Forces the field to be right-aligned '
'within the available |\n'
' | | space (this is the default for '
'numbers). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'=\'" | Forces the padding to be placed after '
'the sign (if any) |\n'
' | | but before the digits. This is used for '
'printing fields |\n'
" | | in the form '+000000120'. This alignment "
'option is only |\n'
' | | valid for numeric types. It becomes the '
"default when '0' |\n"
' | | immediately precedes the field '
'width. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'^\'" | Forces the field to be centered within '
'the available |\n'
' | | '
'space. '
'|\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'Note that unless a minimum field width is defined, the '
'field width\n'
'will always be the same size as the data to fill it, so '
'that the\n'
'alignment option has no meaning in this case.\n'
'\n'
'The *sign* option is only valid for number types, and can '
'be one of\n'
'the following:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Option | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'+\'" | indicates that a sign should be used for '
'both positive as |\n'
' | | well as negative '
'numbers. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'-\'" | indicates that a sign should be used '
'only for negative |\n'
' | | numbers (this is the default '
'behavior). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | space | indicates that a leading space should be '
'used on positive |\n'
' | | numbers, and a minus sign on negative '
'numbers. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'The "\'#\'" option is only valid for integers, and only for '
'binary,\n'
'octal, or hexadecimal output. If present, it specifies '
'that the\n'
'output will be prefixed by "\'0b\'", "\'0o\'", or "\'0x\'", '
'respectively.\n'
'\n'
'The "\',\'" option signals the use of a comma for a '
'thousands separator.\n'
'For a locale aware separator, use the "\'n\'" integer '
'presentation type\n'
'instead.\n'
'\n'
'Changed in version 2.7: Added the "\',\'" option (see also '
'**PEP 378**).\n'
'\n'
'*width* is a decimal integer defining the minimum field '
'width. If not\n'
'specified, then the field width will be determined by the '
'content.\n'
'\n'
'When no explicit alignment is given, preceding the *width* '
'field by a\n'
'zero ("\'0\'") character enables sign-aware zero-padding '
'for numeric\n'
'types. This is equivalent to a *fill* character of "\'0\'" '
'with an\n'
'*alignment* type of "\'=\'".\n'
'\n'
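'For example:\n'
'\n'
"   >>> '{:08.3f}'.format(-3.14)\n"
"   '-003.140'\n"
'\n'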
'The *precision* is a decimal number indicating how many '
'digits should\n'
'be displayed after the decimal point for a floating point '
'value\n'
'formatted with "\'f\'" and "\'F\'", or before and after the '
'decimal point\n'
'for a floating point value formatted with "\'g\'" or '
'"\'G\'". For non-\n'
'number types the field indicates the maximum field size - '
'in other\n'
'words, how many characters will be used from the field '
'content. The\n'
'*precision* is not allowed for integer values.\n'
'\n'
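'For example:\n'
'\n'
"   >>> '{:.3f}'.format(3.14159)\n"
"   '3.142'\n"
"   >>> '{:.3}'.format('abcdef')    # maximum field size for strings\n"
"   'abc'\n"
'\n'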
'Finally, the *type* determines how the data should be '
'presented.\n'
'\n'
'The available string presentation types are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'s\'" | String format. This is the default type '
'for strings and |\n'
' | | may be '
'omitted. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | The same as '
'"\'s\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'The available integer presentation types are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'b\'" | Binary format. Outputs the number in '
'base 2. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'c\'" | Character. Converts the integer to the '
'corresponding |\n'
' | | unicode character before '
'printing. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'d\'" | Decimal Integer. Outputs the number in '
'base 10. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'o\'" | Octal format. Outputs the number in base '
'8. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'x\'" | Hex format. Outputs the number in base '
'16, using lower- |\n'
' | | case letters for the digits above '
'9. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'X\'" | Hex format. Outputs the number in base '
'16, using upper- |\n'
' | | case letters for the digits above '
'9. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'n\'" | Number. This is the same as "\'d\'", '
'except that it uses the |\n'
' | | current locale setting to insert the '
'appropriate number |\n'
' | | separator '
'characters. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | The same as '
'"\'d\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'In addition to the above presentation types, integers can '
'be formatted\n'
'with the floating point presentation types listed below '
'(except "\'n\'"\n'
'and "None"). When doing so, "float()" is used to convert '
'the integer\n'
'to a floating point number before formatting.\n'
'\n'
'The available presentation types for floating point and '
'decimal values\n'
'are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'e\'" | Exponent notation. Prints the number in '
'scientific |\n'
" | | notation using the letter 'e' to indicate "
'the exponent. |\n'
' | | The default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'E\'" | Exponent notation. Same as "\'e\'" '
'except it uses an upper |\n'
" | | case 'E' as the separator "
'character. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'f\'" | Fixed point. Displays the number as a '
'fixed-point number. |\n'
' | | The default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'F\'" | Fixed point. Same as '
'"\'f\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'g\'" | General format. For a given precision '
'"p >= 1", this |\n'
' | | rounds the number to "p" significant '
'digits and then |\n'
' | | formats the result in either fixed-point '
'format or in |\n'
' | | scientific notation, depending on its '
'magnitude. The |\n'
' | | precise rules are as follows: suppose that '
'the result |\n'
' | | formatted with presentation type "\'e\'" '
'and precision "p-1" |\n'
' | | would have exponent "exp". Then if "-4 <= '
'exp < p", the |\n'
' | | number is formatted with presentation type '
'"\'f\'" and |\n'
' | | precision "p-1-exp". Otherwise, the '
'number is formatted |\n'
' | | with presentation type "\'e\'" and '
'precision "p-1". In both |\n'
' | | cases insignificant trailing zeros are '
'removed from the |\n'
' | | significand, and the decimal point is also '
'removed if |\n'
' | | there are no remaining digits following '
'it. Positive and |\n'
' | | negative infinity, positive and negative '
'zero, and nans, |\n'
' | | are formatted as "inf", "-inf", "0", "-0" '
'and "nan" |\n'
' | | respectively, regardless of the '
'precision. A precision of |\n'
' | | "0" is treated as equivalent to a '
'precision of "1". The |\n'
' | | default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'G\'" | General format. Same as "\'g\'" except '
'switches to "\'E\'" if |\n'
' | | the number gets too large. The '
'representations of infinity |\n'
' | | and NaN are uppercased, '
'too. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'n\'" | Number. This is the same as "\'g\'", '
'except that it uses the |\n'
' | | current locale setting to insert the '
'appropriate number |\n'
' | | separator '
'characters. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'%\'" | Percentage. Multiplies the number by 100 '
'and displays in |\n'
' | | fixed ("\'f\'") format, followed by a '
'percent sign. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | The same as '
'"\'g\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'\n'
'Format examples\n'
'===============\n'
'\n'
'This section contains examples of the "str.format()" syntax '
'and\n'
'comparison with the old "%"-formatting.\n'
'\n'
'In most of the cases the syntax is similar to the old '
'"%"-formatting,\n'
'with the addition of the "{}" and with ":" used instead of '
'"%". For\n'
'example, "\'%03.2f\'" can be translated to "\'{:03.2f}\'".\n'
'\n'
'The new format syntax also supports new and different '
'options, shown\n'
'in the following examples.\n'
'\n'
'Accessing arguments by position:\n'
'\n'
" >>> '{0}, {1}, {2}'.format('a', 'b', 'c')\n"
" 'a, b, c'\n"
" >>> '{}, {}, {}'.format('a', 'b', 'c') # 2.7+ only\n"
" 'a, b, c'\n"
" >>> '{2}, {1}, {0}'.format('a', 'b', 'c')\n"
" 'c, b, a'\n"
" >>> '{2}, {1}, {0}'.format(*'abc') # unpacking "
'argument sequence\n'
" 'c, b, a'\n"
" >>> '{0}{1}{0}'.format('abra', 'cad') # arguments' "
'indices can be repeated\n'
" 'abracadabra'\n"
'\n'
'Accessing arguments by name:\n'
'\n'
" >>> 'Coordinates: {latitude}, "
"{longitude}'.format(latitude='37.24N', "
"longitude='-115.81W')\n"
" 'Coordinates: 37.24N, -115.81W'\n"
" >>> coord = {'latitude': '37.24N', 'longitude': "
"'-115.81W'}\n"
" >>> 'Coordinates: {latitude}, "
"{longitude}'.format(**coord)\n"
" 'Coordinates: 37.24N, -115.81W'\n"
'\n'
"Accessing arguments' attributes:\n"
'\n'
' >>> c = 3-5j\n'
" >>> ('The complex number {0} is formed from the real "
"part {0.real} '\n"
" ... 'and the imaginary part {0.imag}.').format(c)\n"
" 'The complex number (3-5j) is formed from the real part "
"3.0 and the imaginary part -5.0.'\n"
' >>> class Point(object):\n'
' ... def __init__(self, x, y):\n'
' ... self.x, self.y = x, y\n'
' ... def __str__(self):\n'
" ... return 'Point({self.x}, "
"{self.y})'.format(self=self)\n"
' ...\n'
' >>> str(Point(4, 2))\n'
" 'Point(4, 2)'\n"
'\n'
"Accessing arguments' items:\n"
'\n'
' >>> coord = (3, 5)\n'
" >>> 'X: {0[0]}; Y: {0[1]}'.format(coord)\n"
" 'X: 3; Y: 5'\n"
'\n'
'Replacing "%s" and "%r":\n'
'\n'
' >>> "repr() shows quotes: {!r}; str() doesn\'t: '
'{!s}".format(\'test1\', \'test2\')\n'
' "repr() shows quotes: \'test1\'; str() doesn\'t: test2"\n'
'\n'
'Aligning the text and specifying a width:\n'
'\n'
" >>> '{:<30}'.format('left aligned')\n"
" 'left aligned '\n"
" >>> '{:>30}'.format('right aligned')\n"
" ' right aligned'\n"
" >>> '{:^30}'.format('centered')\n"
" ' centered '\n"
" >>> '{:*^30}'.format('centered') # use '*' as a fill "
'char\n'
" '***********centered***********'\n"
'\n'
'Replacing "%+f", "%-f", and "% f" and specifying a sign:\n'
'\n'
" >>> '{:+f}; {:+f}'.format(3.14, -3.14) # show it "
'always\n'
" '+3.140000; -3.140000'\n"
" >>> '{: f}; {: f}'.format(3.14, -3.14) # show a space "
'for positive numbers\n'
" ' 3.140000; -3.140000'\n"
" >>> '{:-f}; {:-f}'.format(3.14, -3.14) # show only the "
"minus -- same as '{:f}; {:f}'\n"
" '3.140000; -3.140000'\n"
'\n'
'Replacing "%x" and "%o" and converting the value to '
'different bases:\n'
'\n'
' >>> # format also supports binary numbers\n'
' >>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: '
'{0:b}".format(42)\n'
" 'int: 42; hex: 2a; oct: 52; bin: 101010'\n"
' >>> # with 0x, 0o, or 0b as prefix:\n'
' >>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: '
'{0:#b}".format(42)\n'
" 'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'\n"
'\n'
'Using the comma as a thousands separator:\n'
'\n'
" >>> '{:,}'.format(1234567890)\n"
" '1,234,567,890'\n"
'\n'
'Expressing a percentage:\n'
'\n'
' >>> points = 19.5\n'
' >>> total = 22\n'
" >>> 'Correct answers: {:.2%}'.format(points/total)\n"
" 'Correct answers: 88.64%'\n"
'\n'
'Using type-specific formatting:\n'
'\n'
' >>> import datetime\n'
' >>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)\n'
" >>> '{:%Y-%m-%d %H:%M:%S}'.format(d)\n"
" '2010-07-04 12:15:58'\n"
'\n'
'Nesting arguments and more complex examples:\n'
'\n'
" >>> for align, text in zip('<^>', ['left', 'center', "
"'right']):\n"
" ... '{0:{fill}{align}16}'.format(text, fill=align, "
'align=align)\n'
' ...\n'
" 'left<<<<<<<<<<<<'\n"
" '^^^^^center^^^^^'\n"
" '>>>>>>>>>>>right'\n"
' >>>\n'
' >>> octets = [192, 168, 0, 1]\n'
" >>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets)\n"
" 'C0A80001'\n"
' >>> int(_, 16)\n'
' 3232235521\n'
' >>>\n'
' >>> width = 5\n'
' >>> for num in range(5,12):\n'
" ... for base in 'dXob':\n"
" ... print '{0:{width}{base}}'.format(num, "
'base=base, width=width),\n'
' ... print\n'
' ...\n'
' 5 5 5 101\n'
' 6 6 6 110\n'
' 7 7 7 111\n'
' 8 8 10 1000\n'
' 9 9 11 1001\n'
' 10 A 12 1010\n'
' 11 B 13 1011\n',
'function': '\n'
'Function definitions\n'
'********************\n'
'\n'
'A function definition defines a user-defined function object '
'(see\n'
'section The standard type hierarchy):\n'
'\n'
' decorated ::= decorators (classdef | funcdef)\n'
' decorators ::= decorator+\n'
' decorator ::= "@" dotted_name ["(" [argument_list [","]] '
'")"] NEWLINE\n'
' funcdef ::= "def" funcname "(" [parameter_list] ")" '
'":" suite\n'
' dotted_name ::= identifier ("." identifier)*\n'
' parameter_list ::= (defparameter ",")*\n'
' ( "*" identifier ["," "**" identifier]\n'
' | "**" identifier\n'
' | defparameter [","] )\n'
' defparameter ::= parameter ["=" expression]\n'
' sublist ::= parameter ("," parameter)* [","]\n'
' parameter ::= identifier | "(" sublist ")"\n'
' funcname ::= identifier\n'
'\n'
'A function definition is an executable statement. Its execution '
'binds\n'
'the function name in the current local namespace to a function '
'object\n'
'(a wrapper around the executable code for the function). This\n'
'function object contains a reference to the current global '
'namespace\n'
'as the global namespace to be used when the function is called.\n'
'\n'
'The function definition does not execute the function body; this '
'gets\n'
'executed only when the function is called. [3]\n'
'\n'
'A function definition may be wrapped by one or more *decorator*\n'
'expressions. Decorator expressions are evaluated when the '
'function is\n'
'defined, in the scope that contains the function definition. '
'The\n'
'result must be a callable, which is invoked with the function '
'object\n'
'as the only argument. The returned value is bound to the '
'function name\n'
'instead of the function object. Multiple decorators are applied '
'in\n'
'nested fashion. For example, the following code:\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' def func(): pass\n'
'\n'
'is equivalent to:\n'
'\n'
' def func(): pass\n'
' func = f1(arg)(f2(func))\n'
'\n'
'When one or more top-level *parameters* have the form '
'*parameter* "="\n'
'*expression*, the function is said to have "default parameter '
'values."\n'
'For a parameter with a default value, the corresponding '
'*argument* may\n'
"be omitted from a call, in which case the parameter's default "
'value is\n'
'substituted. If a parameter has a default value, all following\n'
'parameters must also have a default value --- this is a '
'syntactic\n'
'restriction that is not expressed by the grammar.\n'
'\n'
'**Default parameter values are evaluated when the function '
'definition\n'
'is executed.** This means that the expression is evaluated '
'once, when\n'
'the function is defined, and that the same "pre-computed" value '
'is\n'
'used for each call. This is especially important to understand '
'when a\n'
'default parameter is a mutable object, such as a list or a '
'dictionary:\n'
'if the function modifies the object (e.g. by appending an item '
'to a\n'
'list), the default value is in effect modified. This is '
'generally not\n'
'what was intended. A way around this is to use "None" as the\n'
'default, and explicitly test for it in the body of the function, '
'e.g.:\n'
'\n'
' def whats_on_the_telly(penguin=None):\n'
' if penguin is None:\n'
' penguin = []\n'
' penguin.append("property of the zoo")\n'
' return penguin\n'
'\n'
'Function call semantics are described in more detail in section '
'Calls.\n'
'A function call always assigns values to all parameters '
'mentioned in\n'
'the parameter list, either from positional arguments, from '
'keyword\n'
'arguments, or from default values. If the form ""*identifier"" '
'is\n'
'present, it is initialized to a tuple receiving any excess '
'positional\n'
'parameters, defaulting to the empty tuple. If the form\n'
'""**identifier"" is present, it is initialized to a new '
'dictionary\n'
'receiving any excess keyword arguments, defaulting to a new '
'empty\n'
'dictionary.\n'
'\n'
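'For example (a minimal sketch; the names are arbitrary):\n'
'\n'
'   def f(a, b=0, *args, **kwargs):\n'
'       print a, b, args, kwargs\n'
'\n'
"   f(1, 2, 3, 4, x=5)    # prints: 1 2 (3, 4) {'x': 5}\n"
'\n'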
'It is also possible to create anonymous functions (functions not '
'bound\n'
'to a name), for immediate use in expressions. This uses lambda\n'
'expressions, described in section Lambdas. Note that the '
'lambda\n'
'expression is merely a shorthand for a simplified function '
'definition;\n'
'a function defined in a ""def"" statement can be passed around '
'or\n'
'assigned to another name just like a function defined by a '
'lambda\n'
'expression. The ""def"" form is actually more powerful since '
'it\n'
'allows the execution of multiple statements.\n'
'\n'
"**Programmer's note:** Functions are first-class objects. A "
'""def""\n'
'form executed inside a function definition defines a local '
'function\n'
'that can be returned or passed around. Free variables used in '
'the\n'
'nested function can access the local variables of the function\n'
'containing the def. See section Naming and binding for '
'details.\n',
'global': '\n'
'The "global" statement\n'
'**********************\n'
'\n'
' global_stmt ::= "global" identifier ("," identifier)*\n'
'\n'
'The "global" statement is a declaration which holds for the '
'entire\n'
'current code block. It means that the listed identifiers are to '
'be\n'
'interpreted as globals. It would be impossible to assign to a '
'global\n'
'variable without "global", although free variables may refer to\n'
'globals without being declared global.\n'
'\n'
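'For example:\n'
'\n'
'   counter = 0\n'
'\n'
'   def increment():\n'
'       global counter\n'
'       counter = counter + 1    # rebinds the module-level name\n'
'\n'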
'Names listed in a "global" statement must not be used in the same '
'code\n'
'block textually preceding that "global" statement.\n'
'\n'
'Names listed in a "global" statement must not be defined as '
'formal\n'
'parameters or in a "for" loop control target, "class" definition,\n'
'function definition, or "import" statement.\n'
'\n'
'**CPython implementation detail:** The current implementation does '
'not\n'
'enforce the latter two restrictions, but programs should not '
'abuse\n'
'this freedom, as future implementations may enforce them or '
'silently\n'
'change the meaning of the program.\n'
'\n'
'**Programmer\'s note:** "global" is a directive to the parser. '
'It\n'
'applies only to code parsed at the same time as the "global"\n'
'statement. In particular, a "global" statement contained in an '
'"exec"\n'
'statement does not affect the code block *containing* the "exec"\n'
'statement, and code contained in an "exec" statement is unaffected '
'by\n'
'"global" statements in the code containing the "exec" statement. '
'The\n'
'same applies to the "eval()", "execfile()" and "compile()" '
'functions.\n',
'id-classes': '\n'
'Reserved classes of identifiers\n'
'*******************************\n'
'\n'
'Certain classes of identifiers (besides keywords) have '
'special\n'
'meanings. These classes are identified by the patterns of '
'leading and\n'
'trailing underscore characters:\n'
'\n'
'"_*"\n'
' Not imported by "from module import *". The special '
'identifier "_"\n'
' is used in the interactive interpreter to store the result '
'of the\n'
' last evaluation; it is stored in the "__builtin__" module. '
'When\n'
' not in interactive mode, "_" has no special meaning and is '
'not\n'
' defined. See section The import statement.\n'
'\n'
' Note: The name "_" is often used in conjunction with\n'
' internationalization; refer to the documentation for the\n'
' "gettext" module for more information on this '
'convention.\n'
'\n'
'"__*__"\n'
' System-defined names. These names are defined by the '
'interpreter\n'
' and its implementation (including the standard library). '
'Current\n'
' system names are discussed in the Special method names '
'section and\n'
' elsewhere. More will likely be defined in future versions '
'of\n'
' Python. *Any* use of "__*__" names, in any context, that '
'does not\n'
' follow explicitly documented use, is subject to breakage '
'without\n'
' warning.\n'
'\n'
'"__*"\n'
' Class-private names. Names in this category, when used '
'within the\n'
' context of a class definition, are re-written to use a '
'mangled form\n'
' to help avoid name clashes between "private" attributes of '
'base and\n'
' derived classes. See section Identifiers (Names).\n',
'identifiers': '\n'
'Identifiers and keywords\n'
'************************\n'
'\n'
'Identifiers (also referred to as *names*) are described by '
'the\n'
'following lexical definitions:\n'
'\n'
' identifier ::= (letter|"_") (letter | digit | "_")*\n'
' letter ::= lowercase | uppercase\n'
' lowercase ::= "a"..."z"\n'
' uppercase ::= "A"..."Z"\n'
' digit ::= "0"..."9"\n'
'\n'
'Identifiers are unlimited in length. Case is significant.\n'
'\n'
'\n'
'Keywords\n'
'========\n'
'\n'
'The following identifiers are used as reserved words, or '
'*keywords* of\n'
'the language, and cannot be used as ordinary identifiers. '
'They must\n'
'be spelled exactly as written here:\n'
'\n'
' and del from not while\n'
' as elif global or with\n'
' assert else if pass yield\n'
' break except import print\n'
' class exec in raise\n'
' continue finally is return\n'
' def for lambda try\n'
'\n'
'Changed in version 2.4: "None" became a constant and is now '
'recognized\n'
'by the compiler as a name for the built-in object "None". '
'Although it\n'
'is not a keyword, you cannot assign a different object to '
'it.\n'
'\n'
'Changed in version 2.5: Using "as" and "with" as identifiers '
'triggers\n'
'a warning. To use them as keywords, enable the '
'"with_statement"\n'
'future feature .\n'
'\n'
'Changed in version 2.6: "as" and "with" are full keywords.\n'
'\n'
'\n'
'Reserved classes of identifiers\n'
'===============================\n'
'\n'
'Certain classes of identifiers (besides keywords) have '
'special\n'
'meanings. These classes are identified by the patterns of '
'leading and\n'
'trailing underscore characters:\n'
'\n'
'"_*"\n'
' Not imported by "from module import *". The special '
'identifier "_"\n'
' is used in the interactive interpreter to store the result '
'of the\n'
' last evaluation; it is stored in the "__builtin__" '
'module. When\n'
' not in interactive mode, "_" has no special meaning and is '
'not\n'
' defined. See section The import statement.\n'
'\n'
' Note: The name "_" is often used in conjunction with\n'
' internationalization; refer to the documentation for '
'the\n'
' "gettext" module for more information on this '
'convention.\n'
'\n'
'"__*__"\n'
' System-defined names. These names are defined by the '
'interpreter\n'
' and its implementation (including the standard library). '
'Current\n'
' system names are discussed in the Special method names '
'section and\n'
' elsewhere. More will likely be defined in future versions '
'of\n'
' Python. *Any* use of "__*__" names, in any context, that '
'does not\n'
' follow explicitly documented use, is subject to breakage '
'without\n'
' warning.\n'
'\n'
'"__*"\n'
' Class-private names. Names in this category, when used '
'within the\n'
' context of a class definition, are re-written to use a '
'mangled form\n'
' to help avoid name clashes between "private" attributes of '
'base and\n'
' derived classes. See section Identifiers (Names).\n',
'if': '\n'
'The "if" statement\n'
'******************\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
'It selects exactly one of the suites by evaluating the expressions '
'one\n'
'by one until one is found to be true (see section Boolean operations\n'
'for the definition of true and false); then that suite is executed\n'
'(and no other part of the "if" statement is executed or evaluated).\n'
'If all expressions are false, the suite of the "else" clause, if\n'
'present, is executed.\n',
'imaginary': '\n'
'Imaginary literals\n'
'******************\n'
'\n'
'Imaginary literals are described by the following lexical '
'definitions:\n'
'\n'
' imagnumber ::= (floatnumber | intpart) ("j" | "J")\n'
'\n'
'An imaginary literal yields a complex number with a real part '
'of 0.0.\n'
'Complex numbers are represented as a pair of floating point '
'numbers\n'
'and have the same restrictions on their range. To create a '
'complex\n'
'number with a nonzero real part, add a floating point number to '
'it,\n'
'e.g., "(3+4j)". Some examples of imaginary literals:\n'
'\n'
' 3.14j 10.j 10j .001j 1e100j 3.14e-10j\n',
'import': '\n'
'The "import" statement\n'
'**********************\n'
'\n'
' import_stmt ::= "import" module ["as" name] ( "," module '
'["as" name] )*\n'
' | "from" relative_module "import" identifier '
'["as" name]\n'
' ( "," identifier ["as" name] )*\n'
' | "from" relative_module "import" "(" '
'identifier ["as" name]\n'
' ( "," identifier ["as" name] )* [","] ")"\n'
' | "from" module "import" "*"\n'
' module ::= (identifier ".")* identifier\n'
' relative_module ::= "."* module | "."+\n'
' name ::= identifier\n'
'\n'
'Import statements are executed in two steps: (1) find a module, '
'and\n'
'initialize it if necessary; (2) define a name or names in the '
'local\n'
'namespace (of the scope where the "import" statement occurs). The\n'
'statement comes in two forms differing on whether it uses the '
'"from"\n'
'keyword. The first form (without "from") repeats these steps for '
'each\n'
'identifier in the list. The form with "from" performs step (1) '
'once,\n'
'and then performs step (2) repeatedly.\n'
'\n'
'To understand how step (1) occurs, one must first understand how\n'
'Python handles hierarchical naming of modules. To help organize\n'
'modules and provide a hierarchy in naming, Python has a concept '
'of\n'
'packages. A package can contain other packages and modules while\n'
'modules cannot contain other modules or packages. From a file '
'system\n'
'perspective, packages are directories and modules are files.\n'
'\n'
'Once the name of the module is known (unless otherwise specified, '
'the\n'
'term "module" will refer to both packages and modules), searching '
'for\n'
'the module or package can begin. The first place checked is\n'
'"sys.modules", the cache of all modules that have been imported\n'
'previously. If the module is found there then it is used in step '
'(2)\n'
'of import.\n'
'\n'
'If the module is not found in the cache, then "sys.meta_path" is\n'
'searched (the specification for "sys.meta_path" can be found in '
'**PEP\n'
'302**). The object is a list of *finder* objects which are queried '
'in\n'
'order as to whether they know how to load the module by calling '
'their\n'
'"find_module()" method with the name of the module. If the module\n'
'happens to be contained within a package (as denoted by the '
'existence\n'
'of a dot in the name), then a second argument to "find_module()" '
'is\n'
'given as the value of the "__path__" attribute from the parent '
'package\n'
'(everything up to the last dot in the name of the module being\n'
'imported). If a finder can find the module it returns a *loader*\n'
'(discussed later) or returns "None".\n'
'\n'
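'For example, a minimal (and deliberately inert) meta path finder\n'
'conforming to the **PEP 302** interface:\n'
'\n'
'   import sys\n'
'\n'
'   class NullFinder(object):\n'
'       def find_module(self, fullname, path=None):\n'
'           return None      # decline, so later finders are tried\n'
'\n'
'   sys.meta_path.append(NullFinder())\n'
'\n'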
'If none of the finders on "sys.meta_path" are able to find the '
'module\n'
'then some implicitly defined finders are queried. Implementations '
'of\n'
'Python vary in what implicit meta path finders are defined. The '
'one\n'
'they all do define, though, is one that handles "sys.path_hooks",\n'
'"sys.path_importer_cache", and "sys.path".\n'
'\n'
'The implicit finder searches for the requested module in the '
'"paths"\n'
'specified in one of two places ("paths" do not have to be file '
'system\n'
'paths). If the module being imported is supposed to be contained\n'
'within a package then the second argument passed to '
'"find_module()",\n'
'"__path__" on the parent package, is used as the source of paths. '
'If\n'
'the module is not contained in a package then "sys.path" is used '
'as\n'
'the source of paths.\n'
'\n'
'Once the source of paths is chosen it is iterated over to find a\n'
'finder that can handle that path. The dict at\n'
'"sys.path_importer_cache" caches finders for paths and is checked '
'for\n'
'a finder. If the path does not have a finder cached then\n'
'"sys.path_hooks" is searched by calling each object in the list '
'with a\n'
'single argument of the path; each hook returns a finder or raises\n'
'"ImportError". If a finder is returned then it is cached in\n'
'"sys.path_importer_cache" and then used for that path entry. If '
'no\n'
'finder can be found but the path exists then a value of "None" is\n'
'stored in "sys.path_importer_cache" to signify that an implicit, '
'file-\n'
'based finder that handles modules stored as individual files '
'should be\n'
'used for that path. If the path does not exist then a finder '
'which\n'
'always returns "None" is placed in the cache for the path.\n'
'\n'
'If no finder can find the module then "ImportError" is raised.\n'
'Otherwise some finder returned a loader whose "load_module()" '
'method\n'
'is called with the name of the module to load (see **PEP 302** for '
'the\n'
'original definition of loaders). A loader has several '
'responsibilities\n'
'to perform on a module it loads. First, if the module already '
'exists\n'
'in "sys.modules" (a possibility if the loader is called outside of '
'the\n'
'import machinery) then it is to use that module for initialization '
'and\n'
'not a new module. But if the module does not exist in '
'"sys.modules"\n'
'then it is to be added to that dict before initialization begins. '
'If\n'
'an error occurs during loading of the module and it was added to\n'
'"sys.modules" it is to be removed from the dict. If an error '
'occurs\n'
'but the module was already in "sys.modules" it is left in the '
'dict.\n'
'\n'
'The loader must set several attributes on the module. "__name__" '
'is to\n'
'be set to the name of the module. "__file__" is to be the "path" '
'to\n'
'the file unless the module is built-in (and thus listed in\n'
'"sys.builtin_module_names") in which case the attribute is not '
'set. If\n'
'what is being imported is a package then "__path__" is to be set '
'to a\n'
'list of paths to be searched when looking for modules and '
'packages\n'
'contained within the package being imported. "__package__" is '
'optional\n'
'but should be set to the name of the package that contains the '
'module or\n'
'package (the empty string is used for a module not contained in a\n'
'package). "__loader__" is also optional but should be set to the\n'
'loader object that is loading the module.\n'
'\n'
'If an error occurs during loading then the loader raises '
'"ImportError"\n'
'if some other exception is not already being propagated. Otherwise '
'the\n'
'loader returns the module that was loaded and initialized.\n'
'\n'
'When step (1) finishes without raising an exception, step (2) can\n'
'begin.\n'
'\n'
'The first form of "import" statement binds the module name in the\n'
'local namespace to the module object, and then goes on to import '
'the\n'
'next identifier, if any. If the module name is followed by "as", '
'the\n'
'name following "as" is used as the local name for the module.\n'
'\n'
'The "from" form does not bind the module name: it goes through '
'the\n'
'list of identifiers, looks each one of them up in the module found '
'in\n'
'step (1), and binds the name in the local namespace to the object '
'thus\n'
'found. As with the first form of "import", an alternate local '
'name\n'
'can be supplied by specifying ""as" localname". If a name is not\n'
'found, "ImportError" is raised. If the list of identifiers is\n'
'replaced by a star ("\'*\'"), all public names defined in the '
'module are\n'
'bound in the local namespace of the "import" statement.\n'
'\n'
'The *public names* defined by a module are determined by checking '
'the\n'
'module\'s namespace for a variable named "__all__"; if defined, it '
'must\n'
'be a sequence of strings which are names defined or imported by '
'that\n'
'module. The names given in "__all__" are all considered public '
'and\n'
'are required to exist. If "__all__" is not defined, the set of '
'public\n'
"names includes all names found in the module's namespace which do "
'not\n'
'begin with an underscore character ("\'_\'"). "__all__" should '
'contain\n'
'the entire public API. It is intended to avoid accidentally '
'exporting\n'
'items that are not part of the API (such as library modules which '
'were\n'
'imported and used within the module).\n'
'\n'
'The "from" form with "*" may only occur in a module scope. If '
'the\n'
'wild card form of import --- "import *" --- is used in a function '
'and\n'
'the function contains or is a nested block with free variables, '
'the\n'
'compiler will raise a "SyntaxError".\n'
'\n'
'When specifying what module to import you do not have to specify '
'the\n'
'absolute name of the module. When a module or package is '
'contained\n'
'within another package it is possible to make a relative import '
'within\n'
'the same top package without having to mention the package name. '
'By\n'
'using leading dots in the specified module or package after "from" '
'you\n'
'can specify how high to traverse up the current package hierarchy\n'
'without specifying exact names. One leading dot means the current\n'
'package where the module making the import exists. Two dots means '
'up\n'
'one package level. Three dots is up two levels, etc. So if you '
'execute\n'
'"from . import mod" from a module in the "pkg" package then you '
'will\n'
'end up importing "pkg.mod". If you execute "from ..subpkg2 import '
'mod"\n'
'from within "pkg.subpkg1" you will import "pkg.subpkg2.mod". The\n'
'specification for relative imports is contained within **PEP '
'328**.\n'
'\n'
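'For example, assuming a hypothetical package "pkg" with a module\n'
'"mod" and sibling subpackages "subpkg1" and "subpkg2" (the latter\n'
'also containing a "mod"), code in a module inside "pkg.subpkg1"\n'
'could write:\n'
'\n'
'   from .. import mod                # imports pkg.mod\n'
'   from ..subpkg2 import mod as m2   # imports pkg.subpkg2.mod\n'
'\n'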
'"importlib.import_module()" is provided to support applications '
'that\n'
'determine which modules need to be loaded dynamically.\n'
'\n'
'\n'
'Future statements\n'
'=================\n'
'\n'
'A *future statement* is a directive to the compiler that a '
'particular\n'
'module should be compiled using syntax or semantics that will be\n'
'available in a specified future release of Python. The future\n'
'statement is intended to ease migration to future versions of '
'Python\n'
'that introduce incompatible changes to the language. It allows '
'use of\n'
'the new features on a per-module basis before the release in which '
'the\n'
'feature becomes standard.\n'
'\n'
' future_statement ::= "from" "__future__" "import" feature ["as" '
'name]\n'
' ("," feature ["as" name])*\n'
' | "from" "__future__" "import" "(" feature '
'["as" name]\n'
' ("," feature ["as" name])* [","] ")"\n'
' feature ::= identifier\n'
' name ::= identifier\n'
'\n'
'A future statement must appear near the top of the module. The '
'only\n'
'lines that can appear before a future statement are:\n'
'\n'
'* the module docstring (if any),\n'
'\n'
'* comments,\n'
'\n'
'* blank lines, and\n'
'\n'
'* other future statements.\n'
'\n'
'The features recognized by Python 2.6 are "unicode_literals",\n'
'"print_function", "absolute_import", "division", "generators",\n'
'"nested_scopes" and "with_statement". "generators", '
'"with_statement",\n'
'"nested_scopes" are redundant in Python version 2.6 and above '
'because\n'
'they are always enabled.\n'
'\n'
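'For example, under Python 2.7:\n'
'\n'
'   from __future__ import division\n'
'\n'
'   print 1 / 2     # 0.5 under true division, rather than 0\n'
'\n'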
'A future statement is recognized and treated specially at compile\n'
'time: Changes to the semantics of core constructs are often\n'
'implemented by generating different code. It may even be the '
'case\n'
'that a new feature introduces new incompatible syntax (such as a '
'new\n'
'reserved word), in which case the compiler may need to parse the\n'
'module differently. Such decisions cannot be pushed off until\n'
'runtime.\n'
'\n'
'For any given release, the compiler knows which feature names '
'have\n'
'been defined, and raises a compile-time error if a future '
'statement\n'
'contains a feature not known to it.\n'
'\n'
'The direct runtime semantics are the same as for any import '
'statement:\n'
'there is a standard module "__future__", described later, and it '
'will\n'
'be imported in the usual way at the time the future statement is\n'
'executed.\n'
'\n'
'The interesting runtime semantics depend on the specific feature\n'
'enabled by the future statement.\n'
'\n'
'Note that there is nothing special about the statement:\n'
'\n'
' import __future__ [as name]\n'
'\n'
"That is not a future statement; it's an ordinary import statement "
'with\n'
'no special semantics or syntax restrictions.\n'
'\n'
'Code compiled by an "exec" statement or calls to the built-in\n'
'functions "compile()" and "execfile()" that occur in a module "M"\n'
'containing a future statement will, by default, use the new '
'syntax or\n'
'semantics associated with the future statement. This can, '
'starting\n'
'with Python 2.2 be controlled by optional arguments to "compile()" '
'---\n'
'see the documentation of that function for details.\n'
'\n'
'A future statement typed at an interactive interpreter prompt '
'will\n'
'take effect for the rest of the interpreter session. If an\n'
'interpreter is started with the "-i" option, is passed a script '
'name\n'
'to execute, and the script includes a future statement, it will be '
'in\n'
'effect in the interactive session started after the script is\n'
'executed.\n'
'\n'
'See also:\n'
'\n'
' **PEP 236** - Back to the __future__\n'
' The original proposal for the __future__ mechanism.\n',
'in': '\n'
'Membership test operations\n'
'**************************\n'
'\n'
'The operators "in" and "not in" test for membership. "x in s"\n'
'evaluates to "True" if *x* is a member of *s*, and "False" otherwise.\n'
'"x not in s" returns the negation of "x in s". All built-in '
'sequences\n'
'and set types support this as well as dictionary, for which "in" '
'tests\n'
'whether the dictionary has a given key. For container types such as\n'
'list, tuple, set, frozenset, dict, or collections.deque, the\n'
'expression "x in y" is equivalent to "any(x is e or x == e for e in\n'
'y)".\n'
'\n'
'For the string and bytes types, "x in y" is "True" if and only if *x*\n'
'is a substring of *y*. An equivalent test is "y.find(x) != -1".\n'
'Empty strings are always considered to be a substring of any other\n'
'string, so """ in "abc"" will return "True".\n'
'\n'
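'For example:\n'
'\n'
'   >>> 3 in [1, 2, 3]\n'
'   True\n'
"   >>> 'ab' in 'abc'\n"
'   True\n'
"   >>> 'x' not in 'abc'\n"
'   True\n'
'\n'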
'For user-defined classes which define the "__contains__()" method, "x\n'
'in y" returns "True" if "y.__contains__(x)" returns a true value, and\n'
'"False" otherwise.\n'
'\n'
'For user-defined classes which do not define "__contains__()" but do\n'
'define "__iter__()", "x in y" is "True" if some value "z" with "x ==\n'
'z" is produced while iterating over "y". If an exception is raised\n'
'during the iteration, it is as if "in" raised that exception.\n'
'\n'
'Lastly, the old-style iteration protocol is tried: if a class defines\n'
'"__getitem__()", "x in y" is "True" if and only if there is a non-\n'
'negative integer index *i* such that "x == y[i]", and all lower\n'
'integer indices do not raise "IndexError" exception. (If any other\n'
'exception is raised, it is as if "in" raised that exception).\n'
'\n'
'The operator "not in" is defined to have the inverse true value of\n'
'"in".\n',
'integers': '\n'
'Integer and long integer literals\n'
'*********************************\n'
'\n'
'Integer and long integer literals are described by the '
'following\n'
'lexical definitions:\n'
'\n'
' longinteger ::= integer ("l" | "L")\n'
' integer ::= decimalinteger | octinteger | hexinteger | '
'bininteger\n'
' decimalinteger ::= nonzerodigit digit* | "0"\n'
' octinteger ::= "0" ("o" | "O") octdigit+ | "0" octdigit+\n'
' hexinteger ::= "0" ("x" | "X") hexdigit+\n'
' bininteger ::= "0" ("b" | "B") bindigit+\n'
' nonzerodigit ::= "1"..."9"\n'
' octdigit ::= "0"..."7"\n'
' bindigit ::= "0" | "1"\n'
' hexdigit ::= digit | "a"..."f" | "A"..."F"\n'
'\n'
'Although both lower case "\'l\'" and upper case "\'L\'" are '
'allowed as\n'
'suffix for long integers, it is strongly recommended to always '
'use\n'
'"\'L\'", since the letter "\'l\'" looks too much like the digit '
'"\'1\'".\n'
'\n'
'Plain integer literals that are above the largest representable '
'plain\n'
'integer (e.g., 2147483647 when using 32-bit arithmetic) are '
'accepted\n'
'as if they were long integers instead. [1] There is no limit '
'for long\n'
'integer literals apart from what can be stored in available '
'memory.\n'
'\n'
'Some examples of plain integer literals (first row) and long '
'integer\n'
'literals (second and third rows):\n'
'\n'
' 7 2147483647 0177\n'
' 3L 79228162514264337593543950336L 0377L 0x100000000L\n'
' 79228162514264337593543950336 0xdeadbeef\n',
'lambda': '\n'
'Lambdas\n'
'*******\n'
'\n'
' lambda_expr ::= "lambda" [parameter_list]: expression\n'
' old_lambda_expr ::= "lambda" [parameter_list]: old_expression\n'
'\n'
'Lambda expressions (sometimes called lambda forms) have the same\n'
'syntactic position as expressions. They are a shorthand to '
'create\n'
'anonymous functions; the expression "lambda arguments: '
'expression"\n'
'yields a function object. The unnamed object behaves like a '
'function\n'
'object defined with\n'
'\n'
' def name(arguments):\n'
' return expression\n'
'\n'
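'For example:\n'
'\n'
'   >>> add = lambda x, y=10: x + y\n'
'   >>> add(5)\n'
'   15\n'
'   >>> add(5, 1)\n'
'   6\n'
'\n'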
'See section Function definitions for the syntax of parameter '
'lists.\n'
'Note that functions created with lambda expressions cannot '
'contain\n'
'statements.\n',
'lists': '\n'
'List displays\n'
'*************\n'
'\n'
'A list display is a possibly empty series of expressions enclosed '
'in\n'
'square brackets:\n'
'\n'
' list_display ::= "[" [expression_list | '
'list_comprehension] "]"\n'
' list_comprehension ::= expression list_for\n'
' list_for ::= "for" target_list "in" '
'old_expression_list [list_iter]\n'
' old_expression_list ::= old_expression [("," old_expression)+ '
'[","]]\n'
' old_expression ::= or_test | old_lambda_expr\n'
' list_iter ::= list_for | list_if\n'
' list_if ::= "if" old_expression [list_iter]\n'
'\n'
'A list display yields a new list object. Its contents are '
'specified\n'
'by providing either a list of expressions or a list comprehension.\n'
'When a comma-separated list of expressions is supplied, its '
'elements\n'
'are evaluated from left to right and placed into the list object '
'in\n'
'that order. When a list comprehension is supplied, it consists of '
'a\n'
'single expression followed by at least one "for" clause and zero '
'or\n'
'more "for" or "if" clauses. In this case, the elements of the new\n'
'list are those that would be produced by considering each of the '
'"for"\n'
'or "if" clauses a block, nesting from left to right, and '
'evaluating\n'
'the expression to produce a list element each time the innermost '
'block\n'
'is reached [1].\n'
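'\n'
'For example, the two "for" clauses below nest from left to right, so\n'
'the rightmost clause varies fastest:\n'
'\n'
'   >>> [(x, y) for x in (1, 2) for y in (3, 4)]\n'
'   [(1, 3), (1, 4), (2, 3), (2, 4)]\n',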
'naming': '\n'
'Naming and binding\n'
'******************\n'
'\n'
'*Names* refer to objects. Names are introduced by name binding\n'
'operations. Each occurrence of a name in the program text refers '
'to\n'
'the *binding* of that name established in the innermost function '
'block\n'
'containing the use.\n'
'\n'
'A *block* is a piece of Python program text that is executed as a\n'
'unit. The following are blocks: a module, a function body, and a '
'class\n'
'definition. Each command typed interactively is a block. A '
'script\n'
'file (a file given as standard input to the interpreter or '
'specified\n'
'on the interpreter command line as the first argument) is a code '
'block.\n'
'A script command (a command specified on the interpreter command '
'line\n'
"with the '**-c**' option) is a code block. The file read by the\n"
'built-in function "execfile()" is a code block. The string '
'argument\n'
'passed to the built-in function "eval()" and to the "exec" '
'statement\n'
'is a code block. The expression read and evaluated by the '
'built-in\n'
'function "input()" is a code block.\n'
'\n'
'A code block is executed in an *execution frame*. A frame '
'contains\n'
'some administrative information (used for debugging) and '
'determines\n'
"where and how execution continues after the code block's execution "
'has\n'
'completed.\n'
'\n'
'A *scope* defines the visibility of a name within a block. If a '
'local\n'
'variable is defined in a block, its scope includes that block. If '
'the\n'
'definition occurs in a function block, the scope extends to any '
'blocks\n'
'contained within the defining one, unless a contained block '
'introduces\n'
'a different binding for the name. The scope of names defined in '
'a\n'
'class block is limited to the class block; it does not extend to '
'the\n'
'code blocks of methods -- this includes generator expressions '
'since\n'
'they are implemented using a function scope. This means that the\n'
'following will fail:\n'
'\n'
' class A:\n'
' a = 42\n'
' b = list(a + i for i in range(10))\n'
'\n'
'When a name is used in a code block, it is resolved using the '
'nearest\n'
'enclosing scope. The set of all such scopes visible to a code '
'block\n'
"is called the block's *environment*.\n"
'\n'
'If a name is bound in a block, it is a local variable of that '
'block.\n'
'If a name is bound at the module level, it is a global variable. '
'(The\n'
'variables of the module code block are local and global.) If a\n'
'variable is used in a code block but not defined there, it is a '
'*free\n'
'variable*.\n'
'\n'
'When a name is not found at all, a "NameError" exception is '
'raised.\n'
'If the name refers to a local variable that has not been bound, a\n'
'"UnboundLocalError" exception is raised. "UnboundLocalError" is '
'a\n'
'subclass of "NameError".\n'
'\n'
'The following constructs bind names: formal parameters to '
'functions,\n'
'"import" statements, class and function definitions (these bind '
'the\n'
'class or function name in the defining block), and targets that '
'are\n'
'identifiers if occurring in an assignment, "for" loop header, in '
'the\n'
'second position of an "except" clause header or after "as" in a '
'"with"\n'
'statement. The "import" statement of the form "from ... import '
'*"\n'
'binds all names defined in the imported module, except those '
'beginning\n'
'with an underscore. This form may only be used at the module '
'level.\n'
'\n'
'A target occurring in a "del" statement is also considered bound '
'for\n'
'this purpose (though the actual semantics are to unbind the '
'name). It\n'
'is illegal to unbind a name that is referenced by an enclosing '
'scope;\n'
'the compiler will report a "SyntaxError".\n'
'\n'
'Each assignment or import statement occurs within a block defined '
'by a\n'
'class or function definition or at the module level (the '
'top-level\n'
'code block).\n'
'\n'
'If a name binding operation occurs anywhere within a code block, '
'all\n'
'uses of the name within the block are treated as references to '
'the\n'
'current block. This can lead to errors when a name is used within '
'a\n'
'block before it is bound. This rule is subtle. Python lacks\n'
'declarations and allows name binding operations to occur anywhere\n'
'within a code block. The local variables of a code block can be\n'
'determined by scanning the entire text of the block for name '
'binding\n'
'operations.\n'
'\n'
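'For example, the following function raises "UnboundLocalError",\n'
'because the assignment to "x" makes "x" local throughout the whole\n'
'block, including the earlier "print":\n'
'\n'
'   >>> def f():\n'
'   ...     print x\n'
'   ...     x = 1\n'
'   >>> f()\n'
'   Traceback (most recent call last):\n'
'     ...\n'
"   UnboundLocalError: local variable 'x' referenced before assignment\n"
'\n'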
'If the global statement occurs within a block, all uses of the '
'name\n'
'specified in the statement refer to the binding of that name in '
'the\n'
'top-level namespace. Names are resolved in the top-level namespace '
'by\n'
'searching the global namespace, i.e. the namespace of the module\n'
'containing the code block, and the builtins namespace, the '
'namespace\n'
'of the module "__builtin__". The global namespace is searched '
'first.\n'
'If the name is not found there, the builtins namespace is '
'searched.\n'
'The global statement must precede all uses of the name.\n'
'\n'
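'For example:\n'
'\n'
'   >>> counter = 0\n'
'   >>> def increment():\n'
'   ...     global counter\n'
'   ...     counter = counter + 1\n'
'   >>> increment()\n'
'   >>> counter\n'
'   1\n'
'\n'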
'The builtins namespace associated with the execution of a code '
'block\n'
'is actually found by looking up the name "__builtins__" in its '
'global\n'
'namespace; this should be a dictionary or a module (in the latter '
'case\n'
"the module's dictionary is used). By default, when in the "
'"__main__"\n'
'module, "__builtins__" is the built-in module "__builtin__" (note: '
'no\n'
'\'s\'); when in any other module, "__builtins__" is an alias for '
'the\n'
'dictionary of the "__builtin__" module itself. "__builtins__" can '
'be\n'
'set to a user-created dictionary to create a weak form of '
'restricted\n'
'execution.\n'
'\n'
'**CPython implementation detail:** Users should not touch\n'
'"__builtins__"; it is strictly an implementation detail. Users\n'
'wanting to override values in the builtins namespace should '
'"import"\n'
'the "__builtin__" (no \'s\') module and modify its attributes\n'
'appropriately.\n'
'\n'
'The namespace for a module is automatically created the first time '
'a\n'
'module is imported. The main module for a script is always '
'called\n'
'"__main__".\n'
'\n'
'The "global" statement has the same scope as a name binding '
'operation\n'
'in the same block. If the nearest enclosing scope for a free '
'variable\n'
'contains a global statement, the free variable is treated as a '
'global.\n'
'\n'
'A class definition is an executable statement that may use and '
'define\n'
'names. These references follow the normal rules for name '
'resolution.\n'
'The namespace of the class definition becomes the attribute '
'dictionary\n'
'of the class. Names defined at the class scope are not visible '
'in\n'
'methods.\n'
'\n'
'\n'
'Interaction with dynamic features\n'
'=================================\n'
'\n'
'There are several cases where Python statements are illegal when '
'used\n'
'in conjunction with nested scopes that contain free variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is illegal '
'to\n'
'delete the name. An error will be reported at compile time.\n'
'\n'
'If the wild card form of import --- "import *" --- is used in a\n'
'function and the function contains or is a nested block with free\n'
'variables, the compiler will raise a "SyntaxError".\n'
'\n'
'If "exec" is used in a function and the function contains or is a\n'
'nested block with free variables, the compiler will raise a\n'
'"SyntaxError" unless the exec explicitly specifies the local '
'namespace\n'
'for the "exec". (In other words, "exec obj" would be illegal, '
'but\n'
'"exec obj in ns" would be legal.)\n'
'\n'
'The "eval()", "execfile()", and "input()" functions and the '
'"exec"\n'
'statement do not have access to the full environment for '
'resolving\n'
'names. Names may be resolved in the local and global namespaces '
'of\n'
'the caller. Free variables are not resolved in the nearest '
'enclosing\n'
'namespace, but in the global namespace. [1] The "exec" statement '
'and\n'
'the "eval()" and "execfile()" functions have optional arguments '
'to\n'
'override the global and local namespace. If only one namespace '
'is\n'
'specified, it is used for both.\n',
'numbers': '\n'
'Numeric literals\n'
'****************\n'
'\n'
'There are four types of numeric literals: plain integers, long\n'
'integers, floating point numbers, and imaginary numbers. There '
'are no\n'
'complex literals (complex numbers can be formed by adding a real\n'
'number and an imaginary number).\n'
'\n'
'Note that numeric literals do not include a sign; a phrase like '
'"-1"\n'
'is actually an expression composed of the unary operator \'"-"\' '
'and the\n'
'literal "1".\n',
'numeric-types': '\n'
'Emulating numeric types\n'
'***********************\n'
'\n'
'The following methods can be defined to emulate numeric '
'objects.\n'
'Methods corresponding to operations that are not supported '
'by the\n'
'particular kind of number implemented (e.g., bitwise '
'operations for\n'
'non-integral numbers) should be left undefined.\n'
'\n'
'object.__add__(self, other)\n'
'object.__sub__(self, other)\n'
'object.__mul__(self, other)\n'
'object.__floordiv__(self, other)\n'
'object.__mod__(self, other)\n'
'object.__divmod__(self, other)\n'
'object.__pow__(self, other[, modulo])\n'
'object.__lshift__(self, other)\n'
'object.__rshift__(self, other)\n'
'object.__and__(self, other)\n'
'object.__xor__(self, other)\n'
'object.__or__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "//", "%", "divmod()", '
'"pow()", "**",\n'
' "<<", ">>", "&", "^", "|"). For instance, to evaluate '
'the\n'
' expression "x + y", where *x* is an instance of a class '
'that has an\n'
' "__add__()" method, "x.__add__(y)" is called. The '
'"__divmod__()"\n'
' method should be equivalent to using '
'"__floordiv__()" and\n'
' "__mod__()"; it should not be related to "__truediv__()" '
'(described\n'
' below). Note that "__pow__()" should be defined to '
'accept an\n'
' optional third argument if the ternary version of the '
'built-in\n'
' "pow()" function is to be supported.\n'
'\n'
' If one of those methods does not support the operation '
'with the\n'
' supplied arguments, it should return "NotImplemented".\n'
'\n'
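' For example, a minimal sketch (the class name "Meters" is only\n'
' illustrative):\n'
'\n'
'    class Meters(object):\n'
'        def __init__(self, value):\n'
'            self.value = value\n'
'        def __add__(self, other):\n'
'            # Return NotImplemented for unsupported operand types\n'
'            # so Python can try the reflected operation instead.\n'
'            if isinstance(other, Meters):\n'
'                return Meters(self.value + other.value)\n'
'            return NotImplemented\n'
'\n'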
'object.__div__(self, other)\n'
'object.__truediv__(self, other)\n'
'\n'
' The division operator ("/") is implemented by these '
'methods. The\n'
' "__truediv__()" method is used when '
'"__future__.division" is in\n'
' effect, otherwise "__div__()" is used. If only one of '
'these two\n'
' methods is defined, the object will not support division '
'in the\n'
' alternate context; "TypeError" will be raised instead.\n'
'\n'
'object.__radd__(self, other)\n'
'object.__rsub__(self, other)\n'
'object.__rmul__(self, other)\n'
'object.__rdiv__(self, other)\n'
'object.__rtruediv__(self, other)\n'
'object.__rfloordiv__(self, other)\n'
'object.__rmod__(self, other)\n'
'object.__rdivmod__(self, other)\n'
'object.__rpow__(self, other)\n'
'object.__rlshift__(self, other)\n'
'object.__rrshift__(self, other)\n'
'object.__rand__(self, other)\n'
'object.__rxor__(self, other)\n'
'object.__ror__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "%", "divmod()", '
'"pow()", "**",\n'
' "<<", ">>", "&", "^", "|") with reflected (swapped) '
'operands.\n'
' These functions are only called if the left operand does '
'not\n'
' support the corresponding operation and the operands are '
'of\n'
' different types. [2] For instance, to evaluate the '
'expression "x -\n'
' y", where *y* is an instance of a class that has an '
'"__rsub__()"\n'
' method, "y.__rsub__(x)" is called if "x.__sub__(y)" '
'returns\n'
' *NotImplemented*.\n'
'\n'
' Note that ternary "pow()" will not try calling '
'"__rpow__()" (the\n'
' coercion rules would become too complicated).\n'
'\n'
" Note: If the right operand's type is a subclass of the "
'left\n'
" operand's type and that subclass provides the "
'reflected method\n'
' for the operation, this method will be called before '
'the left\n'
" operand's non-reflected method. This behavior allows "
'subclasses\n'
" to override their ancestors' operations.\n"
'\n'
'object.__iadd__(self, other)\n'
'object.__isub__(self, other)\n'
'object.__imul__(self, other)\n'
'object.__idiv__(self, other)\n'
'object.__itruediv__(self, other)\n'
'object.__ifloordiv__(self, other)\n'
'object.__imod__(self, other)\n'
'object.__ipow__(self, other[, modulo])\n'
'object.__ilshift__(self, other)\n'
'object.__irshift__(self, other)\n'
'object.__iand__(self, other)\n'
'object.__ixor__(self, other)\n'
'object.__ior__(self, other)\n'
'\n'
' These methods are called to implement the augmented '
'arithmetic\n'
' assignments ("+=", "-=", "*=", "/=", "//=", "%=", "**=", '
'"<<=",\n'
' ">>=", "&=", "^=", "|="). These methods should attempt '
'to do the\n'
' operation in-place (modifying *self*) and return the '
'result (which\n'
' could be, but does not have to be, *self*). If a '
'specific method\n'
' is not defined, the augmented assignment falls back to '
'the normal\n'
' methods. For instance, to execute the statement "x += '
'y", where\n'
' *x* is an instance of a class that has an "__iadd__()" '
'method,\n'
' "x.__iadd__(y)" is called. If *x* is an instance of a '
'class that\n'
' does not define a "__iadd__()" method, "x.__add__(y)" '
'and\n'
' "y.__radd__(x)" are considered, as with the evaluation '
'of "x + y".\n'
'\n'
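' For instance, lists implement "__iadd__()", so "x += y" mutates\n'
' the list in place, while "x = x + y" rebinds "x" to a new list:\n'
'\n'
'    >>> x = y = [1]\n'
'    >>> x += [2]      # in-place: y sees the change\n'
'    >>> y\n'
'    [1, 2]\n'
'\n'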
'object.__neg__(self)\n'
'object.__pos__(self)\n'
'object.__abs__(self)\n'
'object.__invert__(self)\n'
'\n'
' Called to implement the unary arithmetic operations '
'("-", "+",\n'
' "abs()" and "~").\n'
'\n'
'object.__complex__(self)\n'
'object.__int__(self)\n'
'object.__long__(self)\n'
'object.__float__(self)\n'
'\n'
' Called to implement the built-in functions "complex()", '
'"int()",\n'
' "long()", and "float()". Should return a value of the '
'appropriate\n'
' type.\n'
'\n'
'object.__oct__(self)\n'
'object.__hex__(self)\n'
'\n'
' Called to implement the built-in functions "oct()" and '
'"hex()".\n'
' Should return a string value.\n'
'\n'
'object.__index__(self)\n'
'\n'
' Called to implement "operator.index()". Also called '
'whenever\n'
' Python needs an integer object (such as in slicing). '
'Must return\n'
' an integer (int or long).\n'
'\n'
' New in version 2.5.\n'
'\n'
'object.__coerce__(self, other)\n'
'\n'
' Called to implement "mixed-mode" numeric arithmetic. '
'Should either\n'
' return a 2-tuple containing *self* and *other* converted '
'to a\n'
' common numeric type, or "None" if conversion is '
'impossible. When\n'
' the common type would be the type of "other", it is '
'sufficient to\n'
' return "None", since the interpreter will also ask the '
'other object\n'
' to attempt a coercion (but sometimes, if the '
'implementation of the\n'
' other type cannot be changed, it is useful to do the '
'conversion to\n'
' the other type here). A return value of '
'"NotImplemented" is\n'
' equivalent to returning "None".\n',
'objects': '\n'
'Objects, values and types\n'
'*************************\n'
'\n'
"*Objects* are Python's abstraction for data. All data in a "
'Python\n'
'program is represented by objects or by relations between '
'objects. (In\n'
'a sense, and in conformance to Von Neumann\'s model of a "stored\n'
'program computer," code is also represented by objects.)\n'
'\n'
"Every object has an identity, a type and a value. An object's\n"
'*identity* never changes once it has been created; you may think '
'of it\n'
'as the object\'s address in memory. The \'"is"\' operator '
'compares the\n'
'identity of two objects; the "id()" function returns an integer\n'
'representing its identity (currently implemented as its address). '
'An\n'
"object's *type* is also unchangeable. [1] An object's type "
'determines\n'
'the operations that the object supports (e.g., "does it have a\n'
'length?") and also defines the possible values for objects of '
'that\n'
'type. The "type()" function returns an object\'s type (which is '
'an\n'
'object itself). The *value* of some objects can change. '
'Objects\n'
'whose value can change are said to be *mutable*; objects whose '
'value\n'
'is unchangeable once they are created are called *immutable*. '
'(The\n'
'value of an immutable container object that contains a reference '
'to a\n'
"mutable object can change when the latter's value is changed; "
'however\n'
'the container is still considered immutable, because the '
'collection of\n'
'objects it contains cannot be changed. So, immutability is not\n'
'strictly the same as having an unchangeable value, it is more '
'subtle.)\n'
"An object's mutability is determined by its type; for instance,\n"
'numbers, strings and tuples are immutable, while dictionaries '
'and\n'
'lists are mutable.\n'
'\n'
'Objects are never explicitly destroyed; however, when they '
'become\n'
'unreachable they may be garbage-collected. An implementation is\n'
'allowed to postpone garbage collection or omit it altogether --- '
'it is\n'
'a matter of implementation quality how garbage collection is\n'
'implemented, as long as no objects are collected that are still\n'
'reachable.\n'
'\n'
'**CPython implementation detail:** CPython currently uses a '
'reference-\n'
'counting scheme with (optional) delayed detection of cyclically '
'linked\n'
'garbage, which collects most objects as soon as they become\n'
'unreachable, but is not guaranteed to collect garbage containing\n'
'circular references. See the documentation of the "gc" module '
'for\n'
'information on controlling the collection of cyclic garbage. '
'Other\n'
'implementations act differently and CPython may change. Do not '
'depend\n'
'on immediate finalization of objects when they become unreachable '
'(ex:\n'
'always close files).\n'
'\n'
"Note that the use of the implementation's tracing or debugging\n"
'facilities may keep objects alive that would normally be '
'collectable.\n'
'Also note that catching an exception with a \'"try"..."except"\'\n'
'statement may keep objects alive.\n'
'\n'
'Some objects contain references to "external" resources such as '
'open\n'
'files or windows. It is understood that these resources are '
'freed\n'
'when the object is garbage-collected, but since garbage '
'collection is\n'
'not guaranteed to happen, such objects also provide an explicit '
'way to\n'
'release the external resource, usually a "close()" method. '
'Programs\n'
'are strongly recommended to explicitly close such objects. The\n'
'\'"try"..."finally"\' statement provides a convenient way to do '
'this.\n'
'\n'
'Some objects contain references to other objects; these are '
'called\n'
'*containers*. Examples of containers are tuples, lists and\n'
"dictionaries. The references are part of a container's value. "
'In\n'
'most cases, when we talk about the value of a container, we imply '
'the\n'
'values, not the identities of the contained objects; however, '
'when we\n'
'talk about the mutability of a container, only the identities of '
'the\n'
'immediately contained objects are implied. So, if an immutable\n'
'container (like a tuple) contains a reference to a mutable '
'object, its\n'
'value changes if that mutable object is changed.\n'
'\n'
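'For example:\n'
'\n'
'   >>> t = (1, [2, 3])\n'
'   >>> t[1].append(4)   # mutates the contained list, not the tuple\n'
'   >>> t\n'
'   (1, [2, 3, 4])\n'
'\n'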
'Types affect almost all aspects of object behavior. Even the\n'
'importance of object identity is affected in some sense: for '
'immutable\n'
'types, operations that compute new values may actually return a\n'
'reference to any existing object with the same type and value, '
'while\n'
'for mutable objects this is not allowed. E.g., after "a = 1; b = '
'1",\n'
'"a" and "b" may or may not refer to the same object with the '
'value\n'
'one, depending on the implementation, but after "c = []; d = []", '
'"c"\n'
'and "d" are guaranteed to refer to two different, unique, newly\n'
'created empty lists. (Note that "c = d = []" assigns the same '
'object\n'
'to both "c" and "d".)\n',
'operator-summary': '\n'
'Operator precedence\n'
'*******************\n'
'\n'
'The following table summarizes the operator precedences '
'in Python,\n'
'from lowest precedence (least binding) to highest '
'precedence (most\n'
'binding). Operators in the same box have the same '
'precedence. Unless\n'
'the syntax is explicitly given, operators are binary. '
'Operators in\n'
'the same box group left to right (except for '
'comparisons, including\n'
'tests, which all have the same precedence and chain from '
'left to right\n'
'--- see section Comparisons --- and exponentiation, '
'which groups from\n'
'right to left).\n'
'\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| Operator | '
'Description |\n'
'+=================================================+=======================================+\n'
'| "lambda" | '
'Lambda expression |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "if" -- "else" | '
'Conditional expression |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "or" | '
'Boolean OR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "and" | '
'Boolean AND |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "not" "x" | '
'Boolean NOT |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "in", "not in", "is", "is not", "<", "<=", ">", | '
'Comparisons, including membership |\n'
'| ">=", "<>", "!=", "==" | '
'tests and identity tests |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "|" | '
'Bitwise OR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "^" | '
'Bitwise XOR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "&" | '
'Bitwise AND |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "<<", ">>" | '
'Shifts |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "+", "-" | '
'Addition and subtraction |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "*", "/", "//", "%" | '
'Multiplication, division, remainder |\n'
'| | '
'[7] |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "+x", "-x", "~x" | '
'Positive, negative, bitwise NOT |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "**" | '
'Exponentiation [8] |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "x[index]", "x[index:index]", | '
'Subscription, slicing, call, |\n'
'| "x(arguments...)", "x.attribute" | '
'attribute reference |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "(expressions...)", "[expressions...]", "{key: | '
'Binding or tuple display, list |\n'
'| value...}", "`expressions...`" | '
'display, dictionary display, string |\n'
'| | '
'conversion |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] In Python 2.3 and later releases, a list '
'comprehension "leaks"\n'
' the control variables of each "for" it contains into '
'the\n'
' containing scope. However, this behavior is '
'deprecated, and\n'
' relying on it will not work in Python 3.\n'
'\n'
'[2] While "abs(x%y) < abs(y)" is true mathematically, '
'for floats\n'
' it may not be true numerically due to roundoff. For '
'example, and\n'
' assuming a platform on which a Python float is an '
'IEEE 754 double-\n'
' precision number, in order that "-1e-100 % 1e100" '
'have the same\n'
' sign as "1e100", the computed result is "-1e-100 + '
'1e100", which\n'
' is numerically exactly equal to "1e100". The '
'function\n'
' "math.fmod()" returns a result whose sign matches '
'the sign of the\n'
' first argument instead, and so returns "-1e-100" in '
'this case.\n'
' Which approach is more appropriate depends on the '
'application.\n'
'\n'
'[3] If x is very close to an exact integer multiple of '
"y, it's\n"
' possible for "floor(x/y)" to be one larger than '
'"(x-x%y)/y" due to\n'
' rounding. In such cases, Python returns the latter '
'result, in\n'
' order to preserve that "divmod(x,y)[0] * y + x % y" '
'be very close\n'
' to "x".\n'
'\n'
'[4] The Unicode standard distinguishes between *code '
'points* (e.g.\n'
' U+0041) and *abstract characters* (e.g. "LATIN '
'CAPITAL LETTER A").\n'
' While most abstract characters in Unicode are only '
'represented\n'
' using one code point, there is a number of abstract '
'characters\n'
' that can in addition be represented using a sequence '
'of more than\n'
' one code point. For example, the abstract character '
'"LATIN\n'
' CAPITAL LETTER C WITH CEDILLA" can be represented as '
'a single\n'
' *precomposed character* at code position U+00C7, or '
'as a sequence\n'
' of a *base character* at code position U+0043 (LATIN '
'CAPITAL\n'
' LETTER C), followed by a *combining character* at '
'code position\n'
' U+0327 (COMBINING CEDILLA).\n'
'\n'
' The comparison operators on unicode strings compare '
'at the level\n'
' of Unicode code points. This may be '
'counter-intuitive to humans.\n'
' For example, "u"\\u00C7" == u"\\u0043\\u0327"" is '
'"False", even\n'
' though both strings represent the same abstract '
'character "LATIN\n'
' CAPITAL LETTER C WITH CEDILLA".\n'
'\n'
' To compare strings at the level of abstract '
'characters (that is,\n'
' in a way intuitive to humans), use '
'"unicodedata.normalize()".\n'
'\n'
'[5] Earlier versions of Python used lexicographic '
'comparison of\n'
' the sorted (key, value) lists, but this was very '
'expensive for the\n'
' common case of comparing for equality. An even '
'earlier version of\n'
' Python compared dictionaries by identity only, but '
'this caused\n'
' surprises because people expected to be able to test '
'a dictionary\n'
' for emptiness by comparing it to "{}".\n'
'\n'
'[6] Due to automatic garbage-collection, free lists, and '
'the\n'
' dynamic nature of descriptors, you may notice '
'seemingly unusual\n'
' behaviour in certain uses of the "is" operator, like '
'those\n'
' involving comparisons between instance methods, or '
'constants.\n'
' Check their documentation for more info.\n'
'\n'
'[7] The "%" operator is also used for string formatting; '
'the same\n'
' precedence applies.\n'
'\n'
'[8] The power operator "**" binds less tightly than an '
'arithmetic\n'
' or bitwise unary operator on its right, that is, '
'"2**-1" is "0.5".\n',
'pass': '\n'
'The "pass" statement\n'
'********************\n'
'\n'
' pass_stmt ::= "pass"\n'
'\n'
'"pass" is a null operation --- when it is executed, nothing '
'happens.\n'
'It is useful as a placeholder when a statement is required\n'
'syntactically, but no code needs to be executed, for example:\n'
'\n'
' def f(arg): pass # a function that does nothing (yet)\n'
'\n'
' class C: pass # a class with no methods (yet)\n',
'power': '\n'
'The power operator\n'
'******************\n'
'\n'
'The power operator binds more tightly than unary operators on its\n'
'left; it binds less tightly than unary operators on its right. '
'The\n'
'syntax is:\n'
'\n'
' power ::= primary ["**" u_expr]\n'
'\n'
'Thus, in an unparenthesized sequence of power and unary operators, '
'the\n'
'operators are evaluated from right to left (this does not '
'constrain\n'
'the evaluation order for the operands): "-1**2" results in "-1".\n'
'\n'
'The power operator has the same semantics as the built-in "pow()"\n'
'function, when called with two arguments: it yields its left '
'argument\n'
'raised to the power of its right argument. The numeric arguments '
'are\n'
'first converted to a common type. The result type is that of the\n'
'arguments after coercion.\n'
'\n'
'With mixed operand types, the coercion rules for binary arithmetic\n'
'operators apply. For int and long int operands, the result has the\n'
'same type as the operands (after coercion) unless the second '
'argument\n'
'is negative; in that case, all arguments are converted to float and '
'a\n'
'float result is delivered. For example, "10**2" returns "100", but\n'
'"10**-2" returns "0.01". (This last feature was added in Python '
'2.2.\n'
'In Python 2.1 and before, if both arguments were of integer types '
'and\n'
'the second argument was negative, an exception was raised).\n'
'\n'
'Raising "0.0" to a negative power results in a '
'"ZeroDivisionError".\n'
'Raising a negative number to a fractional power results in a\n'
'"ValueError".\n',
'print': '\n'
'The "print" statement\n'
'*********************\n'
'\n'
' print_stmt ::= "print" ([expression ("," expression)* [","]]\n'
' | ">>" expression [("," expression)+ [","]])\n'
'\n'
'"print" evaluates each expression in turn and writes the resulting\n'
'object to standard output (see below). If an object is not a '
'string,\n'
'it is first converted to a string using the rules for string\n'
'conversions. The (resulting or original) string is then written. '
'A\n'
'space is written before each object is (converted and) written, '
'unless\n'
'the output system believes it is positioned at the beginning of a\n'
'line. This is the case (1) when no characters have yet been '
'written\n'
'to standard output, (2) when the last character written to '
'standard\n'
'output is a whitespace character except "\' \'", or (3) when the '
'last\n'
'write operation on standard output was not a "print" statement. '
'(In\n'
'some cases it may be functional to write an empty string to '
'standard\n'
'output for this reason.)\n'
'\n'
'Note: Objects which act like file objects but which are not the\n'
' built-in file objects often do not properly emulate this aspect '
'of\n'
" the file object's behavior, so it is best not to rely on this.\n"
'\n'
'A "\'\\n\'" character is written at the end, unless the "print" '
'statement\n'
'ends with a comma. This is the only action if the statement '
'contains\n'
'just the keyword "print".\n'
'\n'
'Standard output is defined as the file object named "stdout" in '
'the\n'
'built-in module "sys". If no such object exists, or if it does '
'not\n'
'have a "write()" method, a "RuntimeError" exception is raised.\n'
'\n'
'"print" also has an extended form, defined by the second portion '
'of\n'
'the syntax described above. This form is sometimes referred to as\n'
'""print" chevron." In this form, the first expression after the '
'">>"\n'
'must evaluate to a "file-like" object, specifically an object that '
'has\n'
'a "write()" method as described above. With this extended form, '
'the\n'
'subsequent expressions are printed to this file object. If the '
'first\n'
'expression evaluates to "None", then "sys.stdout" is used as the '
'file\n'
'for output.\n'
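'\n'
'For example, to print to standard error instead of standard output:\n'
'\n'
'   import sys\n'
'   print >> sys.stderr, "warning: disk space low"\n',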
'raise': '\n'
'The "raise" statement\n'
'*********************\n'
'\n'
' raise_stmt ::= "raise" [expression ["," expression ["," '
'expression]]]\n'
'\n'
'If no expressions are present, "raise" re-raises the last '
'exception\n'
'that was active in the current scope. If no exception is active '
'in\n'
'the current scope, a "TypeError" exception is raised indicating '
'that\n'
'this is an error (if running under IDLE, a "Queue.Empty" exception '
'is\n'
'raised instead).\n'
'\n'
'Otherwise, "raise" evaluates the expressions to get three objects,\n'
'using "None" as the value of omitted expressions. The first two\n'
'objects are used to determine the *type* and *value* of the '
'exception.\n'
'\n'
'If the first object is an instance, the type of the exception is '
'the\n'
'class of the instance, the instance itself is the value, and the\n'
'second object must be "None".\n'
'\n'
'If the first object is a class, it becomes the type of the '
'exception.\n'
'The second object is used to determine the exception value: If it '
'is\n'
'an instance of the class, the instance becomes the exception value. '
'If\n'
'the second object is a tuple, it is used as the argument list for '
'the\n'
'class constructor; if it is "None", an empty argument list is '
'used,\n'
'and any other object is treated as a single argument to the\n'
'constructor. The instance so created by calling the constructor '
'is\n'
'used as the exception value.\n'
'\n'
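'For example, the following two statements raise the same exception:\n'
'\n'
'   raise ValueError, "bad value"\n'
'   raise ValueError("bad value")\n'
'\n'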
'If a third object is present and not "None", it must be a '
'traceback\n'
'object (see section The standard type hierarchy), and it is\n'
'substituted instead of the current location as the place where the\n'
'exception occurred. If the third object is present and not a\n'
'traceback object or "None", a "TypeError" exception is raised. '
'The\n'
'three-expression form of "raise" is useful to re-raise an '
'exception\n'
'transparently in an except clause, but "raise" with no expressions\n'
'should be preferred if the exception to be re-raised was the most\n'
'recently active exception in the current scope.\n'
'\n'
'Additional information on exceptions can be found in section\n'
'Exceptions, and information about handling exceptions is in '
'section\n'
'The try statement.\n',
'return': '\n'
'The "return" statement\n'
'**********************\n'
'\n'
' return_stmt ::= "return" [expression_list]\n'
'\n'
'"return" may only occur syntactically nested in a function '
'definition,\n'
'not within a nested class definition.\n'
'\n'
'If an expression list is present, it is evaluated, else "None" is\n'
'substituted.\n'
'\n'
'"return" leaves the current function call with the expression list '
'(or\n'
'"None") as return value.\n'
'\n'
'When "return" passes control out of a "try" statement with a '
'"finally"\n'
'clause, that "finally" clause is executed before really leaving '
'the\n'
'function.\n'
'\n'
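'For example:\n'
'\n'
'   >>> def f():\n'
'   ...     try:\n'
'   ...         return "from try"\n'
'   ...     finally:\n'
'   ...         print "finally runs"\n'
'   >>> f()\n'
'   finally runs\n'
"   'from try'\n"
'\n'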
'In a generator function, the "return" statement is not allowed to\n'
'include an "expression_list". In that context, a bare "return"\n'
'indicates that the generator is done and will cause '
'"StopIteration" to\n'
'be raised.\n',
'sequence-types': '\n'
'Emulating container types\n'
'*************************\n'
'\n'
'The following methods can be defined to implement '
'container objects.\n'
'Containers usually are sequences (such as lists or tuples) '
'or mappings\n'
'(like dictionaries), but can represent other containers as '
'well. The\n'
'first set of methods is used either to emulate a sequence '
'or to\n'
'emulate a mapping; the difference is that for a sequence, '
'the\n'
'allowable keys should be the integers *k* for which "0 <= '
'k < N" where\n'
'*N* is the length of the sequence, or slice objects, which '
'define a\n'
'range of items. (For backwards compatibility, the method\n'
'"__getslice__()" (see below) can also be defined to handle '
'simple, but\n'
'not extended slices.) It is also recommended that mappings '
'provide the\n'
'methods "keys()", "values()", "items()", "has_key()", '
'"get()",\n'
'"clear()", "setdefault()", "iterkeys()", "itervalues()",\n'
'"iteritems()", "pop()", "popitem()", "copy()", and '
'"update()" behaving\n'
"similar to those for Python's standard dictionary "
'objects. The\n'
'"UserDict" module provides a "DictMixin" class to help '
'create those\n'
'methods from a base set of "__getitem__()", '
'"__setitem__()",\n'
'"__delitem__()", and "keys()". Mutable sequences should '
'provide\n'
'methods "append()", "count()", "index()", "extend()", '
'"insert()",\n'
'"pop()", "remove()", "reverse()" and "sort()", like Python '
'standard\n'
'list objects. Finally, sequence types should implement '
'addition\n'
'(meaning concatenation) and multiplication (meaning '
'repetition) by\n'
'defining the methods "__add__()", "__radd__()", '
'"__iadd__()",\n'
'"__mul__()", "__rmul__()" and "__imul__()" described '
'below; they\n'
'should not define "__coerce__()" or other numerical '
'operators. It is\n'
'recommended that both mappings and sequences implement '
'the\n'
'"__contains__()" method to allow efficient use of the "in" '
'operator;\n'
'for mappings, "in" should be equivalent of "has_key()"; '
'for sequences,\n'
'it should search through the values. It is further '
'recommended that\n'
'both mappings and sequences implement the "__iter__()" '
'method to allow\n'
'efficient iteration through the container; for mappings, '
'"__iter__()"\n'
'should be the same as "iterkeys()"; for sequences, it '
'should iterate\n'
'through the values.\n'
'\n'
'object.__len__(self)\n'
'\n'
' Called to implement the built-in function "len()". '
'Should return\n'
' the length of the object, an integer ">=" 0. Also, an '
'object that\n'
' doesn\'t define a "__nonzero__()" method and whose '
'"__len__()"\n'
' method returns zero is considered to be false in a '
'Boolean context.\n'
'\n'
' **CPython implementation detail:** In CPython, the '
'length is\n'
' required to be at most "sys.maxsize". If the length is '
'larger than\n'
' "sys.maxsize" some features (such as "len()") may '
'raise\n'
' "OverflowError". To prevent raising "OverflowError" by '
'truth value\n'
' testing, an object must define a "__nonzero__()" '
'method.\n'
'\n'
'object.__getitem__(self, key)\n'
'\n'
' Called to implement evaluation of "self[key]". For '
'sequence types,\n'
' the accepted keys should be integers and slice '
'objects. Note that\n'
' the special interpretation of negative indexes (if the '
'class wishes\n'
' to emulate a sequence type) is up to the '
'"__getitem__()" method. If\n'
' *key* is of an inappropriate type, "TypeError" may be '
'raised; if of\n'
' a value outside the set of indexes for the sequence '
'(after any\n'
' special interpretation of negative values), '
'"IndexError" should be\n'
' raised. For mapping types, if *key* is missing (not in '
'the\n'
' container), "KeyError" should be raised.\n'
'\n'
' Note: "for" loops expect that an "IndexError" will be '
'raised for\n'
' illegal indexes to allow proper detection of the end '
'of the\n'
' sequence.\n'
'\n'
'object.__missing__(self, key)\n'
'\n'
' Called by "dict"."__getitem__()" to implement '
'"self[key]" for dict\n'
' subclasses when key is not in the dictionary.\n'
'\n'
'object.__setitem__(self, key, value)\n'
'\n'
' Called to implement assignment to "self[key]". Same '
'note as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support changes to the values for keys, or '
'if new keys\n'
' can be added, or for sequences if elements can be '
'replaced. The\n'
' same exceptions should be raised for improper *key* '
'values as for\n'
' the "__getitem__()" method.\n'
'\n'
'object.__delitem__(self, key)\n'
'\n'
' Called to implement deletion of "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support removal of keys, or for sequences '
'if elements\n'
' can be removed from the sequence. The same exceptions '
'should be\n'
' raised for improper *key* values as for the '
'"__getitem__()" method.\n'
'\n'
'object.__iter__(self)\n'
'\n'
' This method is called when an iterator is required for '
'a container.\n'
' This method should return a new iterator object that '
'can iterate\n'
' over all the objects in the container. For mappings, '
'it should\n'
' iterate over the keys of the container, and should also '
'be made\n'
' available as the method "iterkeys()".\n'
'\n'
' Iterator objects also need to implement this method; '
'they are\n'
' required to return themselves. For more information on '
'iterator\n'
' objects, see Iterator Types.\n'
'\n'
'object.__reversed__(self)\n'
'\n'
' Called (if present) by the "reversed()" built-in to '
'implement\n'
' reverse iteration. It should return a new iterator '
'object that\n'
' iterates over all the objects in the container in '
'reverse order.\n'
'\n'
' If the "__reversed__()" method is not provided, the '
'"reversed()"\n'
' built-in will fall back to using the sequence protocol '
'("__len__()"\n'
' and "__getitem__()"). Objects that support the '
'sequence protocol\n'
' should only provide "__reversed__()" if they can '
'provide an\n'
' implementation that is more efficient than the one '
'provided by\n'
' "reversed()".\n'
'\n'
' New in version 2.6.\n'
'\n'
'The membership test operators ("in" and "not in") are '
'normally\n'
'implemented as an iteration through a sequence. However, '
'container\n'
'objects can supply the following special method with a '
'more efficient\n'
'implementation, which also does not require the object be '
'a sequence.\n'
'\n'
'object.__contains__(self, item)\n'
'\n'
' Called to implement membership test operators. Should '
'return true\n'
' if *item* is in *self*, false otherwise. For mapping '
'objects, this\n'
' should consider the keys of the mapping rather than the '
'values or\n'
' the key-item pairs.\n'
'\n'
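' For example, a minimal mapping-like sketch (the class name\n'
' "Registry" is only illustrative):\n'
'\n'
'    class Registry(object):\n'
'        def __init__(self):\n'
'            self._data = {}\n'
'        def __contains__(self, key):\n'
'            # Consider the keys, as recommended for mappings.\n'
'            return key in self._data\n'
'\n'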
' For objects that don\'t define "__contains__()", the '
'membership test\n'
' first tries iteration via "__iter__()", then the old '
'sequence\n'
' iteration protocol via "__getitem__()", see this '
'section in the\n'
' language reference.\n',
'shifting': '\n'
'Shifting operations\n'
'*******************\n'
'\n'
'The shifting operations have lower priority than the arithmetic\n'
'operations:\n'
'\n'
' shift_expr ::= a_expr | shift_expr ( "<<" | ">>" ) a_expr\n'
'\n'
'These operators accept plain or long integers as arguments. '
'The\n'
'arguments are converted to a common type. They shift the first\n'
'argument to the left or right by the number of bits given by '
'the\n'
'second argument.\n'
'\n'
'A right shift by *n* bits is defined as division by "pow(2, '
'n)". A\n'
'left shift by *n* bits is defined as multiplication with "pow(2, '
'n)".\n'
'Negative shift counts raise a "ValueError" exception.\n'
'\n'
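'For example:\n'
'\n'
'   >>> 5 << 2      # 5 * pow(2, 2)\n'
'   20\n'
'   >>> -7 >> 1     # floor division by pow(2, 1)\n'
'   -4\n'
'\n'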
'Note: In the current implementation, the right-hand operand is\n'
' required to be at most "sys.maxsize". If the right-hand '
'operand is\n'
' larger than "sys.maxsize" an "OverflowError" exception is '
'raised.\n',
'slicings': '\n'
'Slicings\n'
'********\n'
'\n'
'A slicing selects a range of items in a sequence object (e.g., '
'a\n'
'string, tuple or list). Slicings may be used as expressions or '
'as\n'
'targets in assignment or "del" statements. The syntax for a '
'slicing:\n'
'\n'
' slicing ::= simple_slicing | extended_slicing\n'
' simple_slicing ::= primary "[" short_slice "]"\n'
' extended_slicing ::= primary "[" slice_list "]"\n'
' slice_list ::= slice_item ("," slice_item)* [","]\n'
' slice_item ::= expression | proper_slice | ellipsis\n'
' proper_slice ::= short_slice | long_slice\n'
' short_slice ::= [lower_bound] ":" [upper_bound]\n'
' long_slice ::= short_slice ":" [stride]\n'
' lower_bound ::= expression\n'
' upper_bound ::= expression\n'
' stride ::= expression\n'
' ellipsis ::= "..."\n'
'\n'
'There is ambiguity in the formal syntax here: anything that '
'looks like\n'
'an expression list also looks like a slice list, so any '
'subscription\n'
'can be interpreted as a slicing. Rather than further '
'complicating the\n'
'syntax, this is disambiguated by defining that in this case the\n'
'interpretation as a subscription takes priority over the\n'
'interpretation as a slicing (this is the case if the slice list\n'
'contains no proper slice nor ellipses). Similarly, when the '
'slice\n'
'list has exactly one short slice and no trailing comma, the\n'
'interpretation as a simple slicing takes priority over that as '
'an\n'
'extended slicing.\n'
'\n'
'The semantics for a simple slicing are as follows. The primary '
'must\n'
'evaluate to a sequence object. The lower and upper bound '
'expressions,\n'
'if present, must evaluate to plain integers; defaults are zero '
'and\n'
'"sys.maxint", respectively. If either bound is negative, the\n'
"sequence's length is added to it. The slicing now selects all "
'items\n'
'with index *k* such that "i <= k < j" where *i* and *j* are the\n'
'specified lower and upper bounds. This may be an empty '
'sequence. It\n'
'is not an error if *i* or *j* lie outside the range of valid '
'indexes\n'
"(such items don't exist so they aren't selected).\n"
'\n'
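'For example:\n'
'\n'
'   >>> s = "abcdef"\n'
'   >>> s[1:4]       # defaults not used\n'
"   'bcd'\n"
'   >>> s[-2:]       # negative bound: len(s) is added\n'
"   'ef'\n"
'   >>> s[2:100]     # out-of-range bound is not an error\n'
"   'cdef'\n"
'\n'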
'The semantics for an extended slicing are as follows. The '
'primary\n'
'must evaluate to a mapping object, and it is indexed with a key '
'that\n'
'is constructed from the slice list, as follows. If the slice '
'list\n'
'contains at least one comma, the key is a tuple containing the\n'
'conversion of the slice items; otherwise, the conversion of the '
'lone\n'
'slice item is the key. The conversion of a slice item that is '
'an\n'
'expression is that expression. The conversion of an ellipsis '
'slice\n'
'item is the built-in "Ellipsis" object. The conversion of a '
'proper\n'
'slice is a slice object (see section The standard type '
'hierarchy)\n'
'whose "start", "stop" and "step" attributes are the values of '
'the\n'
'expressions given as lower bound, upper bound and stride,\n'
'respectively, substituting "None" for missing expressions.\n',
'specialattrs': '\n'
'Special Attributes\n'
'******************\n'
'\n'
'The implementation adds a few special read-only attributes '
'to several\n'
'object types, where they are relevant. Some of these are '
'not reported\n'
'by the "dir()" built-in function.\n'
'\n'
'object.__dict__\n'
'\n'
' A dictionary or other mapping object used to store an '
"object's\n"
' (writable) attributes.\n'
'\n'
'object.__methods__\n'
'\n'
' Deprecated since version 2.2: Use the built-in function '
'"dir()" to\n'
" get a list of an object's attributes. This attribute is "
'no longer\n'
' available.\n'
'\n'
'object.__members__\n'
'\n'
' Deprecated since version 2.2: Use the built-in function '
'"dir()" to\n'
" get a list of an object's attributes. This attribute is "
'no longer\n'
' available.\n'
'\n'
'instance.__class__\n'
'\n'
' The class to which a class instance belongs.\n'
'\n'
'class.__bases__\n'
'\n'
' The tuple of base classes of a class object.\n'
'\n'
'definition.__name__\n'
'\n'
' The name of the class, type, function, method, '
'descriptor, or\n'
' generator instance.\n'
'\n'
'The following attributes are only supported by *new-style '
'class*es.\n'
'\n'
'class.__mro__\n'
'\n'
' This attribute is a tuple of classes that are considered '
'when\n'
' looking for base classes during method resolution.\n'
'\n'
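' For example:\n'
'\n'
' >>> bool.__mro__\n'
" (<type 'bool'>, <type 'int'>, <type 'object'>)\n"
'\n'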
'class.mro()\n'
'\n'
' This method can be overridden by a metaclass to customize '
'the\n'
' method resolution order for its instances. It is called '
'at class\n'
' instantiation, and its result is stored in "__mro__".\n'
'\n'
'class.__subclasses__()\n'
'\n'
' Each new-style class keeps a list of weak references to '
'its\n'
' immediate subclasses. This method returns a list of all '
'those\n'
' references still alive. Example:\n'
'\n'
' >>> int.__subclasses__()\n'
" [<type 'bool'>]\n"
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] Additional information on these special methods may be '
'found\n'
' in the Python Reference Manual (Basic customization).\n'
'\n'
'[2] As a consequence, the list "[1, 2]" is considered equal '
'to\n'
' "[1.0, 2.0]", and similarly for tuples.\n'
'\n'
"[3] They must have since the parser can't tell the type of "
'the\n'
' operands.\n'
'\n'
'[4] Cased characters are those with general category '
'property\n'
' being one of "Lu" (Letter, uppercase), "Ll" (Letter, '
'lowercase),\n'
' or "Lt" (Letter, titlecase).\n'
'\n'
'[5] To format only a tuple you should therefore provide a\n'
' singleton tuple whose only element is the tuple to be '
'formatted.\n'
'\n'
'[6] The advantage of leaving the newline on is that '
'returning an\n'
' empty string is then an unambiguous EOF indication. It '
'is also\n'
' possible (in cases where it might matter, for example, '
'if you want\n'
' to make an exact copy of a file while scanning its '
'lines) to tell\n'
' whether the last line of a file ended in a newline or '
'not (yes\n'
' this happens!).\n',
'specialnames': '\n'
'Special method names\n'
'********************\n'
'\n'
'A class can implement certain operations that are invoked by '
'special\n'
'syntax (such as arithmetic operations or subscripting and '
'slicing) by\n'
"defining methods with special names. This is Python's "
'approach to\n'
'*operator overloading*, allowing classes to define their own '
'behavior\n'
'with respect to language operators. For instance, if a '
'class defines\n'
'a method named "__getitem__()", and "x" is an instance of '
'this class,\n'
'then "x[i]" is roughly equivalent to "x.__getitem__(i)" for '
'old-style\n'
'classes and "type(x).__getitem__(x, i)" for new-style '
'classes. Except\n'
'where mentioned, attempts to execute an operation raise an '
'exception\n'
'when no appropriate method is defined (typically '
'"AttributeError" or\n'
'"TypeError").\n'
'\n'
'When implementing a class that emulates any built-in type, '
'it is\n'
'important that the emulation only be implemented to the '
'degree that it\n'
'makes sense for the object being modelled. For example, '
'some\n'
'sequences may work well with retrieval of individual '
'elements, but\n'
'extracting a slice may not make sense. (One example of this '
'is the\n'
'"NodeList" interface in the W3C\'s Document Object Model.)\n'
'\n'
'\n'
'Basic customization\n'
'===================\n'
'\n'
'object.__new__(cls[, ...])\n'
'\n'
' Called to create a new instance of class *cls*. '
'"__new__()" is a\n'
' static method (special-cased so you need not declare it '
'as such)\n'
' that takes the class of which an instance was requested '
'as its\n'
' first argument. The remaining arguments are those passed '
'to the\n'
' object constructor expression (the call to the class). '
'The return\n'
' value of "__new__()" should be the new object instance '
'(usually an\n'
' instance of *cls*).\n'
'\n'
' Typical implementations create a new instance of the '
'class by\n'
' invoking the superclass\'s "__new__()" method using\n'
' "super(currentclass, cls).__new__(cls[, ...])" with '
'appropriate\n'
' arguments and then modifying the newly-created instance '
'as\n'
' necessary before returning it.\n'
'\n'
' If "__new__()" returns an instance of *cls*, then the '
'new\n'
' instance\'s "__init__()" method will be invoked like\n'
' "__init__(self[, ...])", where *self* is the new instance '
'and the\n'
' remaining arguments are the same as were passed to '
'"__new__()".\n'
'\n'
' If "__new__()" does not return an instance of *cls*, then '
'the new\n'
' instance\'s "__init__()" method will not be invoked.\n'
'\n'
' "__new__()" is intended mainly to allow subclasses of '
'immutable\n'
' types (like int, str, or tuple) to customize instance '
'creation. It\n'
' is also commonly overridden in custom metaclasses in '
'order to\n'
' customize class creation.\n'
'\n'
'object.__init__(self[, ...])\n'
'\n'
' Called after the instance has been created (by '
'"__new__()"), but\n'
' before it is returned to the caller. The arguments are '
'those\n'
' passed to the class constructor expression. If a base '
'class has an\n'
' "__init__()" method, the derived class\'s "__init__()" '
'method, if\n'
' any, must explicitly call it to ensure proper '
'initialization of the\n'
' base class part of the instance; for example:\n'
' "BaseClass.__init__(self, [args...])".\n'
'\n'
' Because "__new__()" and "__init__()" work together in '
'constructing\n'
' objects ("__new__()" to create it, and "__init__()" to '
'customise\n'
' it), no non-"None" value may be returned by "__init__()"; '
'doing so\n'
' will cause a "TypeError" to be raised at runtime.\n'
'\n'
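' For example, a sketch of an immutable subclass customised in\n'
' "__new__()" (the class name "UpperStr" is only illustrative):\n'
'\n'
'    class UpperStr(str):\n'
'        def __new__(cls, value):\n'
'            # The str value must be chosen here, not in __init__().\n'
'            return super(UpperStr, cls).__new__(cls, value.upper())\n'
'\n'
'    >>> UpperStr("abc")\n'
"    'ABC'\n"
'\n'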
'object.__del__(self)\n'
'\n'
' Called when the instance is about to be destroyed. This '
'is also\n'
' called a destructor. If a base class has a "__del__()" '
'method, the\n'
' derived class\'s "__del__()" method, if any, must '
'explicitly call it\n'
' to ensure proper deletion of the base class part of the '
'instance.\n'
' Note that it is possible (though not recommended!) for '
'the\n'
' "__del__()" method to postpone destruction of the '
'instance by\n'
' creating a new reference to it. It may then be called at '
'a later\n'
' time when this new reference is deleted. It is not '
'guaranteed that\n'
' "__del__()" methods are called for objects that still '
'exist when\n'
' the interpreter exits.\n'
'\n'
' Note: "del x" doesn\'t directly call "x.__del__()" --- '
'the former\n'
' decrements the reference count for "x" by one, and the '
'latter is\n'
' only called when "x"\'s reference count reaches zero. '
'Some common\n'
' situations that may prevent the reference count of an '
'object from\n'
' going to zero include: circular references between '
'objects (e.g.,\n'
' a doubly-linked list or a tree data structure with '
'parent and\n'
' child pointers); a reference to the object on the stack '
'frame of\n'
' a function that caught an exception (the traceback '
'stored in\n'
' "sys.exc_traceback" keeps the stack frame alive); or a '
'reference\n'
' to the object on the stack frame that raised an '
'unhandled\n'
' exception in interactive mode (the traceback stored in\n'
' "sys.last_traceback" keeps the stack frame alive). The '
'first\n'
' situation can only be remedied by explicitly breaking '
'the cycles;\n'
' the latter two situations can be resolved by storing '
'"None" in\n'
' "sys.exc_traceback" or "sys.last_traceback". Circular '
'references\n'
' which are garbage are detected when the option cycle '
'detector is\n'
" enabled (it's on by default), but can only be cleaned "
'up if there\n'
' are no Python-level "__del__()" methods involved. Refer '
'to the\n'
' documentation for the "gc" module for more information '
'about how\n'
' "__del__()" methods are handled by the cycle detector,\n'
' particularly the description of the "garbage" value.\n'
'\n'
' Warning: Due to the precarious circumstances under which\n'
' "__del__()" methods are invoked, exceptions that occur '
'during\n'
' their execution are ignored, and a warning is printed '
'to\n'
' "sys.stderr" instead. Also, when "__del__()" is invoked '
'in\n'
' response to a module being deleted (e.g., when '
'execution of the\n'
' program is done), other globals referenced by the '
'"__del__()"\n'
' method may already have been deleted or in the process '
'of being\n'
' torn down (e.g. the import machinery shutting down). '
'For this\n'
' reason, "__del__()" methods should do the absolute '
'minimum needed\n'
' to maintain external invariants. Starting with version '
'1.5,\n'
' Python guarantees that globals whose name begins with a '
'single\n'
' underscore are deleted from their module before other '
'globals are\n'
' deleted; if no other references to such globals exist, '
'this may\n'
' help in assuring that imported modules are still '
'available at the\n'
' time when the "__del__()" method is called.\n'
'\n'
' See also the "-R" command-line option.\n'
'\n'
'object.__repr__(self)\n'
'\n'
' Called by the "repr()" built-in function and by string '
'conversions\n'
' (reverse quotes) to compute the "official" string '
'representation of\n'
' an object. If at all possible, this should look like a '
'valid\n'
' Python expression that could be used to recreate an '
'object with the\n'
' same value (given an appropriate environment). If this '
'is not\n'
' possible, a string of the form "<...some useful '
'description...>"\n'
' should be returned. The return value must be a string '
'object. If a\n'
' class defines "__repr__()" but not "__str__()", then '
'"__repr__()"\n'
' is also used when an "informal" string representation of '
'instances\n'
' of that class is required.\n'
'\n'
' This is typically used for debugging, so it is important '
'that the\n'
' representation is information-rich and unambiguous.\n'
'\n'
'object.__str__(self)\n'
'\n'
' Called by the "str()" built-in function and by the '
'"print"\n'
' statement to compute the "informal" string representation '
'of an\n'
' object. This differs from "__repr__()" in that it does '
'not have to\n'
' be a valid Python expression: a more convenient or '
'concise\n'
' representation may be used instead. The return value must '
'be a\n'
' string object.\n'
'\n'
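' For example, a sketch (the class name "Point" is only\n'
' illustrative):\n'
'\n'
'    class Point(object):\n'
'        def __init__(self, x, y):\n'
'            self.x, self.y = x, y\n'
'        def __repr__(self):\n'
'            # Unambiguous, expression-like representation.\n'
'            return "Point(%r, %r)" % (self.x, self.y)\n'
'        def __str__(self):\n'
'            # Convenient informal representation.\n'
'            return "(%s, %s)" % (self.x, self.y)\n'
'\n'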
'object.__lt__(self, other)\n'
'object.__le__(self, other)\n'
'object.__eq__(self, other)\n'
'object.__ne__(self, other)\n'
'object.__gt__(self, other)\n'
'object.__ge__(self, other)\n'
'\n'
' New in version 2.1.\n'
'\n'
' These are the so-called "rich comparison" methods, and '
'are called\n'
' for comparison operators in preference to "__cmp__()" '
'below. The\n'
' correspondence between operator symbols and method names '
'is as\n'
' follows: "x<y" calls "x.__lt__(y)", "x<=y" calls '
'"x.__le__(y)",\n'
' "x==y" calls "x.__eq__(y)", "x!=y" and "x<>y" call '
'"x.__ne__(y)",\n'
' "x>y" calls "x.__gt__(y)", and "x>=y" calls '
'"x.__ge__(y)".\n'
'\n'
' A rich comparison method may return the singleton '
'"NotImplemented"\n'
' if it does not implement the operation for a given pair '
'of\n'
' arguments. By convention, "False" and "True" are returned '
'for a\n'
' successful comparison. However, these methods can return '
'any value,\n'
' so if the comparison operator is used in a Boolean '
'context (e.g.,\n'
' in the condition of an "if" statement), Python will call '
'"bool()"\n'
' on the value to determine if the result is true or '
'false.\n'
'\n'
' There are no implied relationships among the comparison '
'operators.\n'
' The truth of "x==y" does not imply that "x!=y" is false.\n'
' Accordingly, when defining "__eq__()", one should also '
'define\n'
' "__ne__()" so that the operators will behave as '
'expected. See the\n'
' paragraph on "__hash__()" for some important notes on '
'creating\n'
' *hashable* objects which support custom comparison '
'operations and\n'
' are usable as dictionary keys.\n'
'\n'
' There are no swapped-argument versions of these methods '
'(to be used\n'
' when the left argument does not support the operation but '
'the right\n'
' argument does); rather, "__lt__()" and "__gt__()" are '
"each other's\n"
' reflection, "__le__()" and "__ge__()" are each other\'s '
'reflection,\n'
' and "__eq__()" and "__ne__()" are their own reflection.\n'
'\n'
' Arguments to rich comparison methods are never coerced.\n'
'\n'
' To automatically generate ordering operations from a '
'single root\n'
' operation, see "functools.total_ordering()".\n'
'\n'
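'   As a sketch (assuming Python 2.7, where\n'
'   "functools.total_ordering()" is available; the "Version" name\n'
'   is hypothetical), "__eq__()", "__ne__()" and "__lt__()" are\n'
'   enough to derive the remaining operators:\n'
'\n'
'      import functools\n'
'\n'
'      @functools.total_ordering\n'
'      class Version(object):\n'
'          def __init__(self, parts):\n'
'              self.parts = tuple(parts)\n'
'\n'
'          def __eq__(self, other):\n'
'              if not isinstance(other, Version):\n'
'                  return NotImplemented  # let the other operand try\n'
'              return self.parts == other.parts\n'
'\n'
'          def __ne__(self, other):\n'
'              result = self.__eq__(other)\n'
'              if result is NotImplemented:\n'
'                  return result\n'
'              return not result\n'
'\n'
'          def __lt__(self, other):\n'
'              if not isinstance(other, Version):\n'
'                  return NotImplemented\n'
'              return self.parts < other.parts\n'
'\n'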
'object.__cmp__(self, other)\n'
'\n'
' Called by comparison operations if rich comparison (see '
'above) is\n'
' not defined. Should return a negative integer if "self < '
'other",\n'
' zero if "self == other", a positive integer if "self > '
'other". If\n'
' no "__cmp__()", "__eq__()" or "__ne__()" operation is '
'defined,\n'
' class instances are compared by object identity '
'("address"). See\n'
' also the description of "__hash__()" for some important '
'notes on\n'
' creating *hashable* objects which support custom '
'comparison\n'
' operations and are usable as dictionary keys. (Note: the\n'
' restriction that exceptions are not propagated by '
'"__cmp__()" has\n'
' been removed since Python 1.5.)\n'
'\n'
'object.__rcmp__(self, other)\n'
'\n'
' Changed in version 2.1: No longer supported.\n'
'\n'
'object.__hash__(self)\n'
'\n'
' Called by built-in function "hash()" and for operations '
'on members\n'
' of hashed collections including "set", "frozenset", and '
'"dict".\n'
' "__hash__()" should return an integer. The only required '
'property\n'
' is that objects which compare equal have the same hash '
'value; it is\n'
' advised to mix together the hash values of the components '
'of the\n'
' object that also play a part in comparison of objects by '
'packing\n'
' them into a tuple and hashing the tuple. Example:\n'
'\n'
' def __hash__(self):\n'
' return hash((self.name, self.nick, self.color))\n'
'\n'
' If a class does not define a "__cmp__()" or "__eq__()" '
'method it\n'
' should not define a "__hash__()" operation either; if it '
'defines\n'
' "__cmp__()" or "__eq__()" but not "__hash__()", its '
'instances will\n'
' not be usable in hashed collections. If a class defines '
'mutable\n'
' objects and implements a "__cmp__()" or "__eq__()" '
'method, it\n'
' should not implement "__hash__()", since hashable '
'collection\n'
" implementations require that an object's hash value is "
'immutable\n'
" (if the object's hash value changes, it will be in the "
'wrong hash\n'
' bucket).\n'
'\n'
' User-defined classes have "__cmp__()" and "__hash__()" '
'methods by\n'
' default; with them, all objects compare unequal (except '
'with\n'
' themselves) and "x.__hash__()" returns a result derived '
'from\n'
' "id(x)".\n'
'\n'
' Classes which inherit a "__hash__()" method from a parent '
'class but\n'
' change the meaning of "__cmp__()" or "__eq__()" such that '
'the hash\n'
' value returned is no longer appropriate (e.g. by '
'switching to a\n'
' value-based concept of equality instead of the default '
'identity\n'
' based equality) can explicitly flag themselves as being '
'unhashable\n'
' by setting "__hash__ = None" in the class definition. '
'Doing so\n'
' means that not only will instances of the class raise an\n'
' appropriate "TypeError" when a program attempts to '
'retrieve their\n'
' hash value, but they will also be correctly identified '
'as\n'
' unhashable when checking "isinstance(obj, '
'collections.Hashable)"\n'
' (unlike classes which define their own "__hash__()" to '
'explicitly\n'
' raise "TypeError").\n'
'\n'
' Changed in version 2.5: "__hash__()" may now also return '
'a long\n'
' integer object; the 32-bit integer is then derived from '
'the hash of\n'
' that object.\n'
'\n'
' Changed in version 2.6: "__hash__" may now be set to '
'"None" to\n'
' explicitly flag instances of a class as unhashable.\n'
'\n'
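'   A minimal sketch of the conventions above (hypothetical names;\n'
'   not part of the original documentation):\n'
'\n'
'      class Color(object):\n'
'          def __init__(self, r, g, b):\n'
'              self.rgb = (r, g, b)\n'
'\n'
'          def __eq__(self, other):\n'
'              if not isinstance(other, Color):\n'
'                  return NotImplemented\n'
'              return self.rgb == other.rgb\n'
'\n'
'          def __ne__(self, other):\n'
'              return not self == other\n'
'\n'
'          def __hash__(self):\n'
'              # Hash only what takes part in comparison.\n'
'              return hash(self.rgb)\n'
'\n'
'      class MutableColor(Color):\n'
'          # Instances are meant to change, so flag them as\n'
'          # unhashable; hash(MutableColor(0, 0, 0)) raises\n'
'          # TypeError.\n'
'          __hash__ = None\n'
'\n'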
'object.__nonzero__(self)\n'
'\n'
' Called to implement truth value testing and the built-in '
'operation\n'
' "bool()"; should return "False" or "True", or their '
'integer\n'
' equivalents "0" or "1". When this method is not '
'defined,\n'
' "__len__()" is called, if it is defined, and the object '
'is\n'
' considered true if its result is nonzero. If a class '
'defines\n'
' neither "__len__()" nor "__nonzero__()", all its '
'instances are\n'
' considered true.\n'
'\n'
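'   For example, a sketch of a class that is false when empty (the\n'
'   "Bag" name is hypothetical):\n'
'\n'
'      class Bag(object):\n'
'          def __init__(self, items=()):\n'
'              self.items = list(items)\n'
'\n'
'          def __nonzero__(self):\n'
'              # Must return False/True or 0/1.\n'
'              return len(self.items) > 0\n'
'\n'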
'object.__unicode__(self)\n'
'\n'
' Called to implement "unicode()" built-in; should return a '
'Unicode\n'
' object. When this method is not defined, string '
'conversion is\n'
' attempted, and the result of string conversion is '
'converted to\n'
' Unicode using the system default encoding.\n'
'\n'
'\n'
'Customizing attribute access\n'
'============================\n'
'\n'
'The following methods can be defined to customize the '
'meaning of\n'
'attribute access (use of, assignment to, or deletion of '
'"x.name") for\n'
'class instances.\n'
'\n'
'object.__getattr__(self, name)\n'
'\n'
' Called when an attribute lookup has not found the '
'attribute in the\n'
' usual places (i.e. it is not an instance attribute nor is '
'it found\n'
' in the class tree for "self"). "name" is the attribute '
'name. This\n'
' method should return the (computed) attribute value or '
'raise an\n'
' "AttributeError" exception.\n'
'\n'
' Note that if the attribute is found through the normal '
'mechanism,\n'
' "__getattr__()" is not called. (This is an intentional '
'asymmetry\n'
' between "__getattr__()" and "__setattr__()".) This is '
'done both for\n'
' efficiency reasons and because otherwise "__getattr__()" '
'would have\n'
' no way to access other attributes of the instance. Note '
'that at\n'
' least for instance variables, you can fake total control '
'by not\n'
' inserting any values in the instance attribute dictionary '
'(but\n'
' instead inserting them in another object). See the\n'
' "__getattribute__()" method below for a way to actually '
'get total\n'
' control in new-style classes.\n'
'\n'
'object.__setattr__(self, name, value)\n'
'\n'
' Called when an attribute assignment is attempted. This '
'is called\n'
' instead of the normal mechanism (i.e. store the value in '
'the\n'
' instance dictionary). *name* is the attribute name, '
'*value* is the\n'
' value to be assigned to it.\n'
'\n'
' If "__setattr__()" wants to assign to an instance '
'attribute, it\n'
' should not simply execute "self.name = value" --- this '
'would cause\n'
' a recursive call to itself. Instead, it should insert '
'the value in\n'
' the dictionary of instance attributes, e.g., '
'"self.__dict__[name] =\n'
' value". For new-style classes, rather than accessing the '
'instance\n'
' dictionary, it should call the base class method with the '
'same\n'
' name, for example, "object.__setattr__(self, name, '
'value)".\n'
'\n'
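'   A minimal sketch of the new-style approach (the "Tracked" name\n'
'   is hypothetical; not from the original documentation):\n'
'\n'
'      class Tracked(object):\n'
'          def __setattr__(self, name, value):\n'
'              print "setting %s" % name\n'
'              # Delegate to the base class; assigning via\n'
'              # self.name = value here would recurse forever.\n'
'              object.__setattr__(self, name, value)\n'
'\n'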
'object.__delattr__(self, name)\n'
'\n'
' Like "__setattr__()" but for attribute deletion instead '
'of\n'
' assignment. This should only be implemented if "del '
'obj.name" is\n'
' meaningful for the object.\n'
'\n'
'\n'
'More attribute access for new-style classes\n'
'-------------------------------------------\n'
'\n'
'The following methods only apply to new-style classes.\n'
'\n'
'object.__getattribute__(self, name)\n'
'\n'
' Called unconditionally to implement attribute accesses '
'for\n'
' instances of the class. If the class also defines '
'"__getattr__()",\n'
' the latter will not be called unless "__getattribute__()" '
'either\n'
' calls it explicitly or raises an "AttributeError". This '
'method\n'
' should return the (computed) attribute value or raise an\n'
' "AttributeError" exception. In order to avoid infinite '
'recursion in\n'
' this method, its implementation should always call the '
'base class\n'
' method with the same name to access any attributes it '
'needs, for\n'
' example, "object.__getattribute__(self, name)".\n'
'\n'
' Note: This method may still be bypassed when looking up '
'special\n'
' methods as the result of implicit invocation via '
'language syntax\n'
' or built-in functions. See Special method lookup for '
'new-style\n'
' classes.\n'
'\n'
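'   A sketch of the pattern described above (the "Logged" name is\n'
'   hypothetical):\n'
'\n'
'      class Logged(object):\n'
'          def __getattribute__(self, name):\n'
'              print "getting %s" % name\n'
'              # Delegate to the base class to avoid infinite\n'
'              # recursion on self.<anything> lookups.\n'
'              return object.__getattribute__(self, name)\n'
'\n'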
'\n'
'Implementing Descriptors\n'
'------------------------\n'
'\n'
'The following methods only apply when an instance of the '
'class\n'
'containing the method (a so-called *descriptor* class) '
'appears in an\n'
"*owner* class (the descriptor must be in either the owner's "
'class\n'
'dictionary or in the class dictionary for one of its '
'parents). In the\n'
'examples below, "the attribute" refers to the attribute '
'whose name is\n'
'the key of the property in the owner class\' "__dict__".\n'
'\n'
'object.__get__(self, instance, owner)\n'
'\n'
' Called to get the attribute of the owner class (class '
'attribute\n'
' access) or of an instance of that class (instance '
'attribute\n'
' access). *owner* is always the owner class, while '
'*instance* is the\n'
' instance that the attribute was accessed through, or '
'"None" when\n'
' the attribute is accessed through the *owner*. This '
'method should\n'
' return the (computed) attribute value or raise an '
'"AttributeError"\n'
' exception.\n'
'\n'
'object.__set__(self, instance, value)\n'
'\n'
' Called to set the attribute on an instance *instance* of '
'the owner\n'
' class to a new value, *value*.\n'
'\n'
'object.__delete__(self, instance)\n'
'\n'
' Called to delete the attribute on an instance *instance* '
'of the\n'
' owner class.\n'
'\n'
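'A combined sketch (hypothetical names; not part of the original\n'
'documentation) of a data descriptor implementing "__get__()" and\n'
'"__set__()":\n'
'\n'
'   class NonNegative(object):\n'
'       """Data descriptor rejecting negative values."""\n'
'\n'
'       def __init__(self, name):\n'
'           self.name = name\n'
'\n'
'       def __get__(self, instance, owner):\n'
'           if instance is None:\n'
'               return self  # accessed through the owner class\n'
'           return instance.__dict__.get(self.name, 0)\n'
'\n'
'       def __set__(self, instance, value):\n'
'           if value < 0:\n'
'               raise ValueError("%s must be >= 0" % self.name)\n'
'           instance.__dict__[self.name] = value\n'
'\n'
'   class Account(object):\n'
"       balance = NonNegative('balance')\n"
'\n'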
'\n'
'Invoking Descriptors\n'
'--------------------\n'
'\n'
'In general, a descriptor is an object attribute with '
'"binding\n'
'behavior", one whose attribute access has been overridden by '
'methods\n'
'in the descriptor protocol: "__get__()", "__set__()", and\n'
'"__delete__()". If any of those methods are defined for an '
'object, it\n'
'is said to be a descriptor.\n'
'\n'
'The default behavior for attribute access is to get, set, or '
'delete\n'
"the attribute from an object's dictionary. For instance, "
'"a.x" has a\n'
'lookup chain starting with "a.__dict__[\'x\']", then\n'
'"type(a).__dict__[\'x\']", and continuing through the base '
'classes of\n'
'"type(a)" excluding metaclasses.\n'
'\n'
'However, if the looked-up value is an object defining one of '
'the\n'
'descriptor methods, then Python may override the default '
'behavior and\n'
'invoke the descriptor method instead. Where this occurs in '
'the\n'
'precedence chain depends on which descriptor methods were '
'defined and\n'
'how they were called. Note that descriptors are only '
'invoked for new\n'
'style objects or classes (ones that subclass "object()" or '
'"type()").\n'
'\n'
'The starting point for descriptor invocation is a binding, '
'"a.x". How\n'
'the arguments are assembled depends on "a":\n'
'\n'
'Direct Call\n'
' The simplest and least common call is when user code '
'directly\n'
' invokes a descriptor method: "x.__get__(a)".\n'
'\n'
'Instance Binding\n'
' If binding to a new-style object instance, "a.x" is '
'transformed\n'
' into the call: "type(a).__dict__[\'x\'].__get__(a, '
'type(a))".\n'
'\n'
'Class Binding\n'
' If binding to a new-style class, "A.x" is transformed '
'into the\n'
' call: "A.__dict__[\'x\'].__get__(None, A)".\n'
'\n'
'Super Binding\n'
' If "a" is an instance of "super", then the binding '
'"super(B,\n'
' obj).m()" searches "obj.__class__.__mro__" for the base '
'class "A"\n'
' immediately preceding "B" and then invokes the descriptor '
'with the\n'
' call: "A.__dict__[\'m\'].__get__(obj, obj.__class__)".\n'
'\n'
'For instance bindings, the precedence of descriptor '
'invocation depends\n'
'on which descriptor methods are defined. A descriptor '
'can define\n'
'any combination of "__get__()", "__set__()" and '
'"__delete__()". If it\n'
'does not define "__get__()", then accessing the attribute '
'will return\n'
'the descriptor object itself unless there is a value in the '
"object's\n"
'instance dictionary. If the descriptor defines "__set__()" '
'and/or\n'
'"__delete__()", it is a data descriptor; if it defines '
'neither, it is\n'
'a non-data descriptor. Normally, data descriptors define '
'both\n'
'"__get__()" and "__set__()", while non-data descriptors have '
'just the\n'
'"__get__()" method. Data descriptors with "__set__()" and '
'"__get__()"\n'
'defined always override a redefinition in an instance '
'dictionary. In\n'
'contrast, non-data descriptors can be overridden by '
'instances.\n'
'\n'
'Python methods (including "staticmethod()" and '
'"classmethod()") are\n'
'implemented as non-data descriptors. Accordingly, instances '
'can\n'
'redefine and override methods. This allows individual '
'instances to\n'
'acquire behaviors that differ from other instances of the '
'same class.\n'
'\n'
'The "property()" function is implemented as a data '
'descriptor.\n'
'Accordingly, instances cannot override the behavior of a '
'property.\n'
'\n'
'\n'
'__slots__\n'
'---------\n'
'\n'
'By default, instances of both old and new-style classes have '
'a\n'
'dictionary for attribute storage. This wastes space for '
'objects\n'
'having very few instance variables. The space consumption '
'can become\n'
'acute when creating large numbers of instances.\n'
'\n'
'The default can be overridden by defining *__slots__* in a '
'new-style\n'
'class definition. The *__slots__* declaration takes a '
'sequence of\n'
'instance variables and reserves just enough space in each '
'instance to\n'
'hold a value for each variable. Space is saved because '
'*__dict__* is\n'
'not created for each instance.\n'
'\n'
'__slots__\n'
'\n'
' This class variable can be assigned a string, iterable, '
'or sequence\n'
' of strings with variable names used by instances. If '
'defined in a\n'
' new-style class, *__slots__* reserves space for the '
'declared\n'
' variables and prevents the automatic creation of '
'*__dict__* and\n'
' *__weakref__* for each instance.\n'
'\n'
' New in version 2.2.\n'
'\n'
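'   A short interactive sketch (the "Point" name is hypothetical):\n'
'\n'
'      >>> class Point(object):\n'
"      ...     __slots__ = ('x', 'y')\n"
'      ...\n'
'      >>> p = Point()\n'
'      >>> p.x = 1          # declared slot: fine\n'
'      >>> p.z = 2          # not declared: no __dict__ to hold it\n'
'      Traceback (most recent call last):\n'
'        ...\n'
"      AttributeError: 'Point' object has no attribute 'z'\n"
'\n'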
'Notes on using *__slots__*\n'
'\n'
'* When inheriting from a class without *__slots__*, the '
'*__dict__*\n'
' attribute of that class will always be accessible, so a '
'*__slots__*\n'
' definition in the subclass is meaningless.\n'
'\n'
'* Without a *__dict__* variable, instances cannot be '
'assigned new\n'
' variables not listed in the *__slots__* definition. '
'Attempts to\n'
' assign to an unlisted variable name raise '
'"AttributeError". If\n'
' dynamic assignment of new variables is desired, then add\n'
' "\'__dict__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
' Changed in version 2.3: Previously, adding "\'__dict__\'" '
'to the\n'
' *__slots__* declaration would not enable the assignment of '
'new\n'
' attributes not specifically listed in the sequence of '
'instance\n'
' variable names.\n'
'\n'
'* Without a *__weakref__* variable for each instance, '
'classes\n'
' defining *__slots__* do not support weak references to '
'their\n'
' instances. If weak reference support is needed, then add\n'
' "\'__weakref__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
' Changed in version 2.3: Previously, adding '
'"\'__weakref__\'" to the\n'
' *__slots__* declaration would not enable support for weak\n'
' references.\n'
'\n'
'* *__slots__* are implemented at the class level by '
'creating\n'
' descriptors (Implementing Descriptors) for each variable '
'name. As a\n'
' result, class attributes cannot be used to set default '
'values for\n'
' instance variables defined by *__slots__*; otherwise, the '
'class\n'
' attribute would overwrite the descriptor assignment.\n'
'\n'
'* The action of a *__slots__* declaration is limited to the '
'class\n'
' where it is defined. As a result, subclasses will have a '
'*__dict__*\n'
' unless they also define *__slots__* (which must only '
'contain names\n'
' of any *additional* slots).\n'
'\n'
'* If a class defines a slot also defined in a base class, '
'the\n'
' instance variable defined by the base class slot is '
'inaccessible\n'
' (except by retrieving its descriptor directly from the '
'base class).\n'
' This renders the meaning of the program undefined. In the '
'future, a\n'
' check may be added to prevent this.\n'
'\n'
'* Nonempty *__slots__* does not work for classes derived '
'from\n'
' "variable-length" built-in types such as "long", "str" and '
'"tuple".\n'
'\n'
'* Any non-string iterable may be assigned to *__slots__*. '
'Mappings\n'
' may also be used; however, in the future, special meaning '
'may be\n'
' assigned to the values corresponding to each key.\n'
'\n'
'* *__class__* assignment works only if both classes have the '
'same\n'
' *__slots__*.\n'
'\n'
' Changed in version 2.6: Previously, *__class__* assignment '
'raised an\n'
' error if either new or old class had *__slots__*.\n'
'\n'
'\n'
'Customizing class creation\n'
'==========================\n'
'\n'
'By default, new-style classes are constructed using '
'"type()". A class\n'
'definition is read into a separate namespace and the value '
'of class\n'
'name is bound to the result of "type(name, bases, dict)".\n'
'\n'
'When the class definition is read, if *__metaclass__* is '
'defined then\n'
'the callable assigned to it will be called instead of '
'"type()". This\n'
'allows classes or functions to be written which monitor or '
'alter the\n'
'class creation process:\n'
'\n'
'* Modifying the class dictionary prior to the class being '
'created.\n'
'\n'
'* Returning an instance of another class -- essentially '
'performing\n'
' the role of a factory function.\n'
'\n'
"These steps will have to be performed in the metaclass's "
'"__new__()"\n'
'method -- "type.__new__()" can then be called from this '
'method to\n'
'create a class with different properties. This example adds '
'a new\n'
'element to the class dictionary before creating the class:\n'
'\n'
' class metacls(type):\n'
' def __new__(mcs, name, bases, dict):\n'
" dict['foo'] = 'metacls was here'\n"
' return type.__new__(mcs, name, bases, dict)\n'
'\n'
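'As a quick interactive check of the example above (a sketch, not\n'
'part of the original documentation; it relies on the "metacls"\n'
'class just defined):\n'
'\n'
'   >>> class C(object):\n'
'   ...     __metaclass__ = metacls\n'
'   ...\n'
'   >>> C.foo\n'
"   'metacls was here'\n"
'\n'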
'You can of course also override other class methods (or add '
'new\n'
'methods); for example defining a custom "__call__()" method '
'in the\n'
'metaclass allows custom behavior when the class is called, '
'e.g. not\n'
'always creating a new instance.\n'
'\n'
'__metaclass__\n'
'\n'
' This variable can be any callable accepting arguments for '
'"name",\n'
' "bases", and "dict". Upon class creation, the callable '
'is used\n'
' instead of the built-in "type()".\n'
'\n'
' New in version 2.2.\n'
'\n'
'The appropriate metaclass is determined by the following '
'precedence\n'
'rules:\n'
'\n'
'* If "dict[\'__metaclass__\']" exists, it is used.\n'
'\n'
'* Otherwise, if there is at least one base class, its '
'metaclass is\n'
' used (this looks for a *__class__* attribute first and if '
'not found,\n'
' uses its type).\n'
'\n'
'* Otherwise, if a global variable named __metaclass__ '
'exists, it is\n'
' used.\n'
'\n'
'* Otherwise, the old-style, classic metaclass '
'(types.ClassType) is\n'
' used.\n'
'\n'
'The potential uses for metaclasses are boundless. Some ideas '
'that have\n'
'been explored including logging, interface checking, '
'automatic\n'
'delegation, automatic property creation, proxies, '
'frameworks, and\n'
'automatic resource locking/synchronization.\n'
'\n'
'\n'
'Customizing instance and subclass checks\n'
'========================================\n'
'\n'
'New in version 2.6.\n'
'\n'
'The following methods are used to override the default '
'behavior of the\n'
'"isinstance()" and "issubclass()" built-in functions.\n'
'\n'
'In particular, the metaclass "abc.ABCMeta" implements these '
'methods in\n'
'order to allow the addition of Abstract Base Classes (ABCs) '
'as\n'
'"virtual base classes" to any class or type (including '
'built-in\n'
'types), including other ABCs.\n'
'\n'
'class.__instancecheck__(self, instance)\n'
'\n'
' Return true if *instance* should be considered a (direct '
'or\n'
' indirect) instance of *class*. If defined, called to '
'implement\n'
' "isinstance(instance, class)".\n'
'\n'
'class.__subclasscheck__(self, subclass)\n'
'\n'
' Return true if *subclass* should be considered a (direct '
'or\n'
' indirect) subclass of *class*. If defined, called to '
'implement\n'
' "issubclass(subclass, class)".\n'
'\n'
'Note that these methods are looked up on the type '
'(metaclass) of a\n'
'class. They cannot be defined as class methods in the '
'actual class.\n'
'This is consistent with the lookup of special methods that '
'are called\n'
'on instances, only in this case the instance is itself a '
'class.\n'
'\n'
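'For illustration, a minimal sketch defined on a metaclass (the\n'
'"MetaEven" and "Even" names are hypothetical):\n'
'\n'
'   >>> class MetaEven(type):\n'
'   ...     def __instancecheck__(self, instance):\n'
'   ...         return isinstance(instance, int) and instance % 2 == 0\n'
'   ...\n'
'   >>> class Even(object):\n'
'   ...     __metaclass__ = MetaEven\n'
'   ...\n'
'   >>> isinstance(4, Even)\n'
'   True\n'
'   >>> isinstance(3, Even)\n'
'   False\n'
'\n'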
'See also:\n'
'\n'
' **PEP 3119** - Introducing Abstract Base Classes\n'
' Includes the specification for customizing '
'"isinstance()" and\n'
' "issubclass()" behavior through "__instancecheck__()" '
'and\n'
' "__subclasscheck__()", with motivation for this '
'functionality in\n'
' the context of adding Abstract Base Classes (see the '
'"abc"\n'
' module) to the language.\n'
'\n'
'\n'
'Emulating callable objects\n'
'==========================\n'
'\n'
'object.__call__(self[, args...])\n'
'\n'
' Called when the instance is "called" as a function; if '
'this method\n'
' is defined, "x(arg1, arg2, ...)" is a shorthand for\n'
' "x.__call__(arg1, arg2, ...)".\n'
'\n'
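'   A minimal sketch (the "Adder" name is hypothetical):\n'
'\n'
'      class Adder(object):\n'
'          def __init__(self, n):\n'
'              self.n = n\n'
'\n'
'          def __call__(self, x):\n'
'              return self.n + x\n'
'\n'
'      add5 = Adder(5)\n'
'      print add5(10)  # 15, via add5.__call__(10)\n'
'\n'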
'\n'
'Emulating container types\n'
'=========================\n'
'\n'
'The following methods can be defined to implement container '
'objects.\n'
'Containers usually are sequences (such as lists or tuples) '
'or mappings\n'
'(like dictionaries), but can represent other containers as '
'well. The\n'
'first set of methods is used either to emulate a sequence or '
'to\n'
'emulate a mapping; the difference is that for a sequence, '
'the\n'
'allowable keys should be the integers *k* for which "0 <= k '
'< N" where\n'
'*N* is the length of the sequence, or slice objects, which '
'define a\n'
'range of items. (For backwards compatibility, the method\n'
'"__getslice__()" (see below) can also be defined to handle '
'simple, but\n'
'not extended slices.) It is also recommended that mappings '
'provide the\n'
'methods "keys()", "values()", "items()", "has_key()", '
'"get()",\n'
'"clear()", "setdefault()", "iterkeys()", "itervalues()",\n'
'"iteritems()", "pop()", "popitem()", "copy()", and '
'"update()" behaving\n'
"similar to those for Python's standard dictionary objects. "
'The\n'
'"UserDict" module provides a "DictMixin" class to help '
'create those\n'
'methods from a base set of "__getitem__()", '
'"__setitem__()",\n'
'"__delitem__()", and "keys()". Mutable sequences should '
'provide\n'
'methods "append()", "count()", "index()", "extend()", '
'"insert()",\n'
'"pop()", "remove()", "reverse()" and "sort()", like Python '
'standard\n'
'list objects. Finally, sequence types should implement '
'addition\n'
'(meaning concatenation) and multiplication (meaning '
'repetition) by\n'
'defining the methods "__add__()", "__radd__()", '
'"__iadd__()",\n'
'"__mul__()", "__rmul__()" and "__imul__()" described below; '
'they\n'
'should not define "__coerce__()" or other numerical '
'operators. It is\n'
'recommended that both mappings and sequences implement the\n'
'"__contains__()" method to allow efficient use of the "in" '
'operator;\n'
'for mappings, "in" should be equivalent to "has_key()"; for '
'sequences,\n'
'it should search through the values. It is further '
'recommended that\n'
'both mappings and sequences implement the "__iter__()" '
'method to allow\n'
'efficient iteration through the container; for mappings, '
'"__iter__()"\n'
'should be the same as "iterkeys()"; for sequences, it should '
'iterate\n'
'through the values.\n'
'\n'
'object.__len__(self)\n'
'\n'
' Called to implement the built-in function "len()". '
'Should return\n'
' the length of the object, an integer ">=" 0. Also, an '
'object that\n'
' doesn\'t define a "__nonzero__()" method and whose '
'"__len__()"\n'
' method returns zero is considered to be false in a '
'Boolean context.\n'
'\n'
' **CPython implementation detail:** In CPython, the length '
'is\n'
' required to be at most "sys.maxsize". If the length is '
'larger than\n'
' "sys.maxsize" some features (such as "len()") may raise\n'
' "OverflowError". To prevent raising "OverflowError" by '
'truth value\n'
' testing, an object must define a "__nonzero__()" method.\n'
'\n'
'object.__getitem__(self, key)\n'
'\n'
' Called to implement evaluation of "self[key]". For '
'sequence types,\n'
' the accepted keys should be integers and slice objects. '
'Note that\n'
' the special interpretation of negative indexes (if the '
'class wishes\n'
' to emulate a sequence type) is up to the "__getitem__()" '
'method. If\n'
' *key* is of an inappropriate type, "TypeError" may be '
'raised; if of\n'
' a value outside the set of indexes for the sequence '
'(after any\n'
' special interpretation of negative values), "IndexError" '
'should be\n'
' raised. For mapping types, if *key* is missing (not in '
'the\n'
' container), "KeyError" should be raised.\n'
'\n'
' Note: "for" loops expect that an "IndexError" will be '
'raised for\n'
' illegal indexes to allow proper detection of the end of '
'the\n'
' sequence.\n'
'\n'
'object.__missing__(self, key)\n'
'\n'
' Called by "dict.__getitem__()" to implement "self[key]" '
'for dict\n'
' subclasses when key is not in the dictionary.\n'
'\n'
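'   A sketch of a counting dict built on this hook (the "Counter"\n'
'   name is hypothetical and unrelated to "collections.Counter"):\n'
'\n'
'      class Counter(dict):\n'
'          def __missing__(self, key):\n'
'              return 0  # unknown keys count as zero\n'
'\n'
'      c = Counter()\n'
"      c['spam'] += 1  # reads 0 via __missing__, then stores 1\n"
'\n'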
'object.__setitem__(self, key, value)\n'
'\n'
' Called to implement assignment to "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support changes to the values for keys, or if '
'new keys\n'
' can be added, or for sequences if elements can be '
'replaced. The\n'
' same exceptions should be raised for improper *key* '
'values as for\n'
' the "__getitem__()" method.\n'
'\n'
'object.__delitem__(self, key)\n'
'\n'
' Called to implement deletion of "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support removal of keys, or for sequences if '
'elements\n'
' can be removed from the sequence. The same exceptions '
'should be\n'
' raised for improper *key* values as for the '
'"__getitem__()" method.\n'
'\n'
'object.__iter__(self)\n'
'\n'
' This method is called when an iterator is required for a '
'container.\n'
' This method should return a new iterator object that can '
'iterate\n'
' over all the objects in the container. For mappings, it '
'should\n'
' iterate over the keys of the container, and should also '
'be made\n'
' available as the method "iterkeys()".\n'
'\n'
' Iterator objects also need to implement this method; they '
'are\n'
' required to return themselves. For more information on '
'iterator\n'
' objects, see Iterator Types.\n'
'\n'
'object.__reversed__(self)\n'
'\n'
' Called (if present) by the "reversed()" built-in to '
'implement\n'
' reverse iteration. It should return a new iterator '
'object that\n'
' iterates over all the objects in the container in reverse '
'order.\n'
'\n'
' If the "__reversed__()" method is not provided, the '
'"reversed()"\n'
' built-in will fall back to using the sequence protocol '
'("__len__()"\n'
' and "__getitem__()"). Objects that support the sequence '
'protocol\n'
' should only provide "__reversed__()" if they can provide '
'an\n'
' implementation that is more efficient than the one '
'provided by\n'
' "reversed()".\n'
'\n'
' New in version 2.6.\n'
'\n'
'The membership test operators ("in" and "not in") are '
'normally\n'
'implemented as an iteration through a sequence. However, '
'container\n'
'objects can supply the following special method with a more '
'efficient\n'
'implementation, which also does not require the object be a '
'sequence.\n'
'\n'
'object.__contains__(self, item)\n'
'\n'
' Called to implement membership test operators. Should '
'return true\n'
' if *item* is in *self*, false otherwise. For mapping '
'objects, this\n'
' should consider the keys of the mapping rather than the '
'values or\n'
' the key-item pairs.\n'
'\n'
' For objects that don\'t define "__contains__()", the '
'membership test\n'
' first tries iteration via "__iter__()", then the old '
'sequence\n'
' iteration protocol via "__getitem__()", see this section '
'in the\n'
' language reference.\n'
'\n'
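'   A sketch of a container whose membership test avoids iteration\n'
'   (the "IntRange" name is hypothetical):\n'
'\n'
'      class IntRange(object):\n'
'          def __init__(self, low, high):\n'
'              self.low, self.high = low, high\n'
'\n'
'          def __iter__(self):\n'
'              return iter(xrange(self.low, self.high))\n'
'\n'
'          def __contains__(self, item):\n'
'              # O(1) instead of scanning the whole range.\n'
'              return self.low <= item < self.high\n'
'\n'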
'\n'
'Additional methods for emulation of sequence types\n'
'==================================================\n'
'\n'
'The following optional methods can be defined to further '
'emulate\n'
'sequence objects. Immutable sequences should define at '
'most\n'
'"__getslice__()"; mutable sequences might define all '
'three\n'
'methods.\n'
'\n'
'object.__getslice__(self, i, j)\n'
'\n'
' Deprecated since version 2.0: Support slice objects as '
'parameters\n'
' to the "__getitem__()" method. (However, built-in types '
'in CPython\n'
' currently still implement "__getslice__()". Therefore, '
'you have to\n'
' override it in derived classes when implementing '
'slicing.)\n'
'\n'
' Called to implement evaluation of "self[i:j]". The '
'returned object\n'
' should be of the same type as *self*. Note that missing '
'*i* or *j*\n'
' in the slice expression are replaced by zero or '
'"sys.maxsize",\n'
' respectively. If negative indexes are used in the slice, '
'the\n'
' length of the sequence is added to that index. If the '
'instance does\n'
' not implement the "__len__()" method, an "AttributeError" '
'is\n'
' raised. No guarantee is made that indexes adjusted this '
'way are not\n'
' still negative. Indexes which are greater than the '
'length of the\n'
' sequence are not modified. If no "__getslice__()" is '
'found, a slice\n'
' object is created and passed to "__getitem__()" '
'instead.\n'
'\n'
'object.__setslice__(self, i, j, sequence)\n'
'\n'
' Called to implement assignment to "self[i:j]". Same notes '
'for *i*\n'
' and *j* as for "__getslice__()".\n'
'\n'
' This method is deprecated. If no "__setslice__()" is '
'found, or for\n'
' extended slicing of the form "self[i:j:k]", a slice '
'object is\n'
' created, and passed to "__setitem__()", instead of '
'"__setslice__()"\n'
' being called.\n'
'\n'
'object.__delslice__(self, i, j)\n'
'\n'
' Called to implement deletion of "self[i:j]". Same notes '
'for *i* and\n'
' *j* as for "__getslice__()". This method is deprecated. '
'If no\n'
' "__delslice__()" is found, or for extended slicing of the '
'form\n'
' "self[i:j:k]", a slice object is created, and passed to\n'
' "__delitem__()", instead of "__delslice__()" being '
'called.\n'
'\n'
'Notice that these methods are only invoked when a single '
'slice with a\n'
'single colon is used, and the slice method is available. '
'For slice\n'
'operations involving extended slice notation, or in absence '
'of the\n'
'slice methods, "__getitem__()", "__setitem__()" or '
'"__delitem__()" is\n'
'called with a slice object as argument.\n'
'\n'
'The following example demonstrates how to make your program '
'or module\n'
'compatible with earlier versions of Python (assuming that '
'methods\n'
'"__getitem__()", "__setitem__()" and "__delitem__()" support '
'slice\n'
'objects as arguments):\n'
'\n'
' class MyClass:\n'
' ...\n'
' def __getitem__(self, index):\n'
' ...\n'
' def __setitem__(self, index, value):\n'
' ...\n'
' def __delitem__(self, index):\n'
' ...\n'
'\n'
' if sys.version_info < (2, 0):\n'
" # They won't be defined if version is at least "
'2.0 final\n'
'\n'
' def __getslice__(self, i, j):\n'
' return self[max(0, i):max(0, j):]\n'
' def __setslice__(self, i, j, seq):\n'
' self[max(0, i):max(0, j):] = seq\n'
' def __delslice__(self, i, j):\n'
' del self[max(0, i):max(0, j):]\n'
' ...\n'
'\n'
'Note the calls to "max()"; these are necessary because of '
'the handling\n'
'of negative indices before the "__*slice__()" methods are '
'called.\n'
'When negative indexes are used, the "__*item__()" methods '
'receive them\n'
'as provided, but the "__*slice__()" methods get a "cooked" '
'form of the\n'
'index values. For each negative index value, the length of '
'the\n'
'sequence is added to the index before calling the method '
'(which may\n'
'still result in a negative index); this is the customary '
'handling of\n'
'negative indexes by the built-in sequence types, and the '
'"__*item__()"\n'
'methods are expected to do this as well. However, since '
'they should\n'
'already be doing that, negative indexes cannot be passed in; '
'they must\n'
'be constrained to the bounds of the sequence before being '
'passed to\n'
'the "__*item__()" methods. Calling "max(0, i)" conveniently '
'returns\n'
'the proper value.\n'
'\n'
'\n'
'Emulating numeric types\n'
'=======================\n'
'\n'
'The following methods can be defined to emulate numeric '
'objects.\n'
'Methods corresponding to operations that are not supported '
'by the\n'
'particular kind of number implemented (e.g., bitwise '
'operations for\n'
'non-integral numbers) should be left undefined.\n'
'\n'
'object.__add__(self, other)\n'
'object.__sub__(self, other)\n'
'object.__mul__(self, other)\n'
'object.__floordiv__(self, other)\n'
'object.__mod__(self, other)\n'
'object.__divmod__(self, other)\n'
'object.__pow__(self, other[, modulo])\n'
'object.__lshift__(self, other)\n'
'object.__rshift__(self, other)\n'
'object.__and__(self, other)\n'
'object.__xor__(self, other)\n'
'object.__or__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "//", "%", "divmod()", '
'"pow()", "**",\n'
' "<<", ">>", "&", "^", "|"). For instance, to evaluate '
'the\n'
' expression "x + y", where *x* is an instance of a class '
'that has an\n'
' "__add__()" method, "x.__add__(y)" is called. The '
'"__divmod__()"\n'
' method should be equivalent to using "__floordiv__()" '
'and\n'
' "__mod__()"; it should not be related to "__truediv__()" '
'(described\n'
' below). Note that "__pow__()" should be defined to '
'accept an\n'
' optional third argument if the ternary version of the '
'built-in\n'
' "pow()" function is to be supported.\n'
'\n'
' If one of those methods does not support the operation '
'with the\n'
' supplied arguments, it should return "NotImplemented".\n'
'\n'
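'   A sketch of the "NotImplemented" convention (the "Seconds"\n'
'   name is hypothetical):\n'
'\n'
'      class Seconds(object):\n'
'          def __init__(self, n):\n'
'              self.n = n\n'
'\n'
'          def __add__(self, other):\n'
'              if isinstance(other, Seconds):\n'
'                  return Seconds(self.n + other.n)\n'
'              if isinstance(other, (int, long)):\n'
'                  return Seconds(self.n + other)\n'
'              return NotImplemented  # Python may try other.__radd__\n'
'\n'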
'object.__div__(self, other)\n'
'object.__truediv__(self, other)\n'
'\n'
' The division operator ("/") is implemented by these '
'methods. The\n'
' "__truediv__()" method is used when "__future__.division" '
'is in\n'
' effect, otherwise "__div__()" is used. If only one of '
'these two\n'
' methods is defined, the object will not support division '
'in the\n'
' alternate context; "TypeError" will be raised instead.\n'
'\n'
'object.__radd__(self, other)\n'
'object.__rsub__(self, other)\n'
'object.__rmul__(self, other)\n'
'object.__rdiv__(self, other)\n'
'object.__rtruediv__(self, other)\n'
'object.__rfloordiv__(self, other)\n'
'object.__rmod__(self, other)\n'
'object.__rdivmod__(self, other)\n'
'object.__rpow__(self, other)\n'
'object.__rlshift__(self, other)\n'
'object.__rrshift__(self, other)\n'
'object.__rand__(self, other)\n'
'object.__rxor__(self, other)\n'
'object.__ror__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "%", "divmod()", "pow()", '
'"**",\n'
' "<<", ">>", "&", "^", "|") with reflected (swapped) '
'operands.\n'
' These functions are only called if the left operand does '
'not\n'
' support the corresponding operation and the operands are '
'of\n'
' different types. [2] For instance, to evaluate the '
'expression "x -\n'
' y", where *y* is an instance of a class that has an '
'"__rsub__()"\n'
' method, "y.__rsub__(x)" is called if "x.__sub__(y)" '
'returns\n'
' *NotImplemented*.\n'
'\n'
' Note that ternary "pow()" will not try calling '
'"__rpow__()" (the\n'
' coercion rules would become too complicated).\n'
'\n'
" Note: If the right operand's type is a subclass of the "
'left\n'
" operand's type and that subclass provides the reflected "
'method\n'
' for the operation, this method will be called before '
'the left\n'
" operand's non-reflected method. This behavior allows "
'subclasses\n'
" to override their ancestors' operations.\n"
'\n'
'object.__iadd__(self, other)\n'
'object.__isub__(self, other)\n'
'object.__imul__(self, other)\n'
'object.__idiv__(self, other)\n'
'object.__itruediv__(self, other)\n'
'object.__ifloordiv__(self, other)\n'
'object.__imod__(self, other)\n'
'object.__ipow__(self, other[, modulo])\n'
'object.__ilshift__(self, other)\n'
'object.__irshift__(self, other)\n'
'object.__iand__(self, other)\n'
'object.__ixor__(self, other)\n'
'object.__ior__(self, other)\n'
'\n'
' These methods are called to implement the augmented '
'arithmetic\n'
' assignments ("+=", "-=", "*=", "/=", "//=", "%=", "**=", '
'"<<=",\n'
' ">>=", "&=", "^=", "|="). These methods should attempt '
'to do the\n'
' operation in-place (modifying *self*) and return the '
'result (which\n'
' could be, but does not have to be, *self*). If a '
'specific method\n'
' is not defined, the augmented assignment falls back to '
'the normal\n'
' methods. For instance, to execute the statement "x += '
'y", where\n'
' *x* is an instance of a class that has an "__iadd__()" '
'method,\n'
' "x.__iadd__(y)" is called. If *x* is an instance of a '
'class that\n'
' does not define a "__iadd__()" method, "x.__add__(y)" '
'and\n'
' "y.__radd__(x)" are considered, as with the evaluation of '
'"x + y".\n'
'\n'
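'   A sketch of a true in-place "+=" (the "Buffer" name is\n'
'   hypothetical):\n'
'\n'
'      class Buffer(object):\n'
'          def __init__(self):\n'
'              self.items = []\n'
'\n'
'          def __iadd__(self, other):\n'
'              self.items.extend(other)  # mutate in place\n'
'              return self  # the statement rebinds to this value\n'
'\n'
'      buf = Buffer()\n'
'      buf += [1, 2, 3]  # calls buf.__iadd__([1, 2, 3])\n'
'\n'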
'object.__neg__(self)\n'
'object.__pos__(self)\n'
'object.__abs__(self)\n'
'object.__invert__(self)\n'
'\n'
' Called to implement the unary arithmetic operations ("-", '
'"+",\n'
' "abs()" and "~").\n'
'\n'
'object.__complex__(self)\n'
'object.__int__(self)\n'
'object.__long__(self)\n'
'object.__float__(self)\n'
'\n'
' Called to implement the built-in functions "complex()", '
'"int()",\n'
' "long()", and "float()". Should return a value of the '
'appropriate\n'
' type.\n'
'\n'
'object.__oct__(self)\n'
'object.__hex__(self)\n'
'\n'
' Called to implement the built-in functions "oct()" and '
'"hex()".\n'
' Should return a string value.\n'
'\n'
'object.__index__(self)\n'
'\n'
' Called to implement "operator.index()". Also called '
'whenever\n'
' Python needs an integer object (such as in slicing). '
'Must return\n'
' an integer (int or long).\n'
'\n'
' New in version 2.5.\n'
'\n'
'object.__coerce__(self, other)\n'
'\n'
' Called to implement "mixed-mode" numeric arithmetic. '
'Should either\n'
' return a 2-tuple containing *self* and *other* converted '
'to a\n'
' common numeric type, or "None" if conversion is '
'impossible. When\n'
' the common type would be the type of "other", it is '
'sufficient to\n'
' return "None", since the interpreter will also ask the '
'other object\n'
' to attempt a coercion (but sometimes, if the '
'implementation of the\n'
' other type cannot be changed, it is useful to do the '
'conversion to\n'
' the other type here). A return value of "NotImplemented" '
'is\n'
' equivalent to returning "None".\n'
'\n'
'\n'
'Coercion rules\n'
'==============\n'
'\n'
'This section used to document the rules for coercion. As '
'the language\n'
'has evolved, the coercion rules have become hard to '
'document\n'
'precisely; documenting what one version of one particular\n'
'implementation does is undesirable. Instead, here are some '
'informal\n'
'guidelines regarding coercion. In Python 3, coercion will '
'not be\n'
'supported.\n'
'\n'
'* If the left operand of a % operator is a string or Unicode '
'object,\n'
' no coercion takes place and the string formatting '
'operation is\n'
' invoked instead.\n'
'\n'
'* It is no longer recommended to define a coercion '
'operation. Mixed-\n'
" mode operations on types that don't define coercion pass "
'the\n'
' original arguments to the operation.\n'
'\n'
'* New-style classes (those derived from "object") never '
'invoke the\n'
' "__coerce__()" method in response to a binary operator; '
'the only\n'
' time "__coerce__()" is invoked is when the built-in '
'function\n'
' "coerce()" is called.\n'
'\n'
'* For most intents and purposes, an operator that returns\n'
' "NotImplemented" is treated the same as one that is not '
'implemented\n'
' at all.\n'
'\n'
'* Below, "__op__()" and "__rop__()" are used to signify the '
'generic\n'
' method names corresponding to an operator; "__iop__()" is '
'used for\n'
' the corresponding in-place operator. For example, for the '
'operator\n'
' \'"+"\', "__add__()" and "__radd__()" are used for the '
'left and right\n'
' variant of the binary operator, and "__iadd__()" for the '
'in-place\n'
' variant.\n'
'\n'
'* For objects *x* and *y*, first "x.__op__(y)" is tried. If '
'this is\n'
' not implemented or returns "NotImplemented", '
'"y.__rop__(x)" is\n'
' tried. If this is also not implemented or returns '
'"NotImplemented",\n'
' a "TypeError" exception is raised. But see the following '
'exception:\n'
'\n'
'* Exception to the previous item: if the left operand is an '
'instance\n'
' of a built-in type or a new-style class, and the right '
'operand is an\n'
' instance of a proper subclass of that type or class and '
'overrides\n'
' the base\'s "__rop__()" method, the right operand\'s '
'"__rop__()"\n'
' method is tried *before* the left operand\'s "__op__()" '
'method.\n'
'\n'
' This is done so that a subclass can completely override '
'binary\n'
' operators. Otherwise, the left operand\'s "__op__()" '
'method would\n'
' always accept the right operand: when an instance of a '
'given class\n'
' is expected, an instance of a subclass of that class is '
'always\n'
' acceptable.\n'
'\n'
'* When either operand type defines a coercion, this coercion '
'is\n'
' called before that type\'s "__op__()" or "__rop__()" '
'method is\n'
' called, but no sooner. If the coercion returns an object '
'of a\n'
' different type for the operand whose coercion is invoked, '
'part of\n'
' the process is redone using the new object.\n'
'\n'
'* When an in-place operator (like \'"+="\') is used, if the '
'left\n'
' operand implements "__iop__()", it is invoked without any '
'coercion.\n'
' When the operation falls back to "__op__()" and/or '
'"__rop__()", the\n'
' normal coercion rules apply.\n'
'\n'
'* In "x + y", if *x* is a sequence that implements sequence\n'
' concatenation, sequence concatenation is invoked.\n'
'\n'
'* In "x * y", if one operand is a sequence that implements '
'sequence\n'
' repetition, and the other is an integer ("int" or "long"), '
'sequence\n'
' repetition is invoked.\n'
'\n'
'* Rich comparisons (implemented by methods "__eq__()" and so '
'on)\n'
' never use coercion. Three-way comparison (implemented by\n'
' "__cmp__()") does use coercion under the same conditions '
'as other\n'
' binary operations use it.\n'
'\n'
'* In the current implementation, the built-in numeric types '
'"int",\n'
' "long", "float", and "complex" do not use coercion. All '
'these types\n'
' implement a "__coerce__()" method, for use by the '
'built-in\n'
' "coerce()" function.\n'
'\n'
' Changed in version 2.7: The complex type no longer makes '
'implicit\n'
' calls to the "__coerce__()" method for mixed-type binary '
'arithmetic\n'
' operations.\n'
'\n'
'\n'
'With Statement Context Managers\n'
'===============================\n'
'\n'
'New in version 2.5.\n'
'\n'
'A *context manager* is an object that defines the runtime '
'context to\n'
'be established when executing a "with" statement. The '
'context manager\n'
'handles the entry into, and the exit from, the desired '
'runtime context\n'
'for the execution of the block of code. Context managers '
'are normally\n'
'invoked using the "with" statement (described in section The '
'with\n'
'statement), but can also be used by directly invoking their '
'methods.\n'
'\n'
'Typical uses of context managers include saving and '
'restoring various\n'
'kinds of global state, locking and unlocking resources, '
'closing opened\n'
'files, etc.\n'
'\n'
'For more information on context managers, see Context '
'Manager Types.\n'
'\n'
'object.__enter__(self)\n'
'\n'
' Enter the runtime context related to this object. The '
'"with"\n'
" statement will bind this method's return value to the "
'target(s)\n'
' specified in the "as" clause of the statement, if any.\n'
'\n'
'object.__exit__(self, exc_type, exc_value, traceback)\n'
'\n'
' Exit the runtime context related to this object. The '
'parameters\n'
' describe the exception that caused the context to be '
'exited. If the\n'
' context was exited without an exception, all three '
'arguments will\n'
' be "None".\n'
'\n'
' If an exception is supplied, and the method wishes to '
'suppress the\n'
' exception (i.e., prevent it from being propagated), it '
'should\n'
' return a true value. Otherwise, the exception will be '
'processed\n'
' normally upon exit from this method.\n'
'\n'
' Note that "__exit__()" methods should not reraise the '
'passed-in\n'
" exception; this is the caller's responsibility.\n"
'\n'
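'   A minimal sketch of both methods (the "Timer" name is\n'
'   hypothetical):\n'
'\n'
'      import time\n'
'\n'
'      class Timer(object):\n'
'          def __enter__(self):\n'
'              self.start = time.time()\n'
'              return self  # bound to the "as" target, if any\n'
'\n'
'          def __exit__(self, exc_type, exc_value, traceback):\n'
'              self.elapsed = time.time() - self.start\n'
'              return False  # never suppress exceptions\n'
'\n'
'      with Timer() as t:\n'
'          sum(xrange(1000000))\n'
'      print t.elapsed\n'
'\n'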
'See also:\n'
'\n'
' **PEP 343** - The "with" statement\n'
' The specification, background, and examples for the '
'Python "with"\n'
' statement.\n'
'\n'
'\n'
'Special method lookup for old-style classes\n'
'===========================================\n'
'\n'
'For old-style classes, special methods are always looked up '
'in exactly\n'
'the same way as any other method or attribute. This is the '
'case\n'
'regardless of whether the method is being looked up '
'explicitly as in\n'
'"x.__getitem__(i)" or implicitly as in "x[i]".\n'
'\n'
'This behaviour means that special methods may exhibit '
'different\n'
'behaviour for different instances of a single old-style '
'class if the\n'
'appropriate special attributes are set differently:\n'
'\n'
' >>> class C:\n'
' ... pass\n'
' ...\n'
' >>> c1 = C()\n'
' >>> c2 = C()\n'
' >>> c1.__len__ = lambda: 5\n'
' >>> c2.__len__ = lambda: 9\n'
' >>> len(c1)\n'
' 5\n'
' >>> len(c2)\n'
' 9\n'
'\n'
'\n'
'Special method lookup for new-style classes\n'
'===========================================\n'
'\n'
'For new-style classes, implicit invocations of special '
'methods are\n'
"only guaranteed to work correctly if defined on an object's "
'type, not\n'
"in the object's instance dictionary. That behaviour is the "
'reason why\n'
'the following code raises an exception (unlike the '
'equivalent example\n'
'with old-style classes):\n'
'\n'
' >>> class C(object):\n'
' ... pass\n'
' ...\n'
' >>> c = C()\n'
' >>> c.__len__ = lambda: 5\n'
' >>> len(c)\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" TypeError: object of type 'C' has no len()\n"
'\n'
'The rationale behind this behaviour lies with a number of '
'special\n'
'methods such as "__hash__()" and "__repr__()" that are '
'implemented by\n'
'all objects, including type objects. If the implicit lookup '
'of these\n'
'methods used the conventional lookup process, they would '
'fail when\n'
'invoked on the type object itself:\n'
'\n'
' >>> 1 .__hash__() == hash(1)\n'
' True\n'
' >>> int.__hash__() == hash(int)\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" TypeError: descriptor '__hash__' of 'int' object needs an "
'argument\n'
'\n'
'Incorrectly attempting to invoke an unbound method of a '
'class in this\n'
"way is sometimes referred to as 'metaclass confusion', and "
'is avoided\n'
'by bypassing the instance when looking up special methods:\n'
'\n'
' >>> type(1).__hash__(1) == hash(1)\n'
' True\n'
' >>> type(int).__hash__(int) == hash(int)\n'
' True\n'
'\n'
'In addition to bypassing any instance attributes in the '
'interest of\n'
'correctness, implicit special method lookup generally also '
'bypasses\n'
'the "__getattribute__()" method even of the object\'s '
'metaclass:\n'
'\n'
' >>> class Meta(type):\n'
' ... def __getattribute__(*args):\n'
' ... print "Metaclass getattribute invoked"\n'
' ... return type.__getattribute__(*args)\n'
' ...\n'
' >>> class C(object):\n'
' ... __metaclass__ = Meta\n'
' ... def __len__(self):\n'
' ... return 10\n'
' ... def __getattribute__(*args):\n'
' ... print "Class getattribute invoked"\n'
' ... return object.__getattribute__(*args)\n'
' ...\n'
' >>> c = C()\n'
' >>> c.__len__() # Explicit lookup via '
'instance\n'
' Class getattribute invoked\n'
' 10\n'
' >>> type(c).__len__(c) # Explicit lookup via '
'type\n'
' Metaclass getattribute invoked\n'
' 10\n'
' >>> len(c) # Implicit lookup\n'
' 10\n'
'\n'
'Bypassing the "__getattribute__()" machinery in this fashion '
'provides\n'
'significant scope for speed optimisations within the '
'interpreter, at\n'
'the cost of some flexibility in the handling of special '
'methods (the\n'
'special method *must* be set on the class object itself in '
'order to be\n'
'consistently invoked by the interpreter).\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
"[1] It *is* possible in some cases to change an object's "
'type,\n'
" under certain controlled conditions. It generally isn't "
'a good\n'
' idea though, since it can lead to some very strange '
'behaviour if\n'
' it is handled incorrectly.\n'
'\n'
'[2] For operands of the same type, it is assumed that if the '
'non-\n'
' reflected method (such as "__add__()") fails the '
'operation is not\n'
' supported, which is why the reflected method is not '
'called.\n',
'string-methods': '\n'
'String Methods\n'
'**************\n'
'\n'
'Below are listed the string methods which both 8-bit '
'strings and\n'
'Unicode objects support. Some of them are also available '
'on\n'
'"bytearray" objects.\n'
'\n'
"In addition, Python's strings support the sequence type "
'methods\n'
'described in the Sequence Types --- str, unicode, list, '
'tuple,\n'
'bytearray, buffer, xrange section. To output formatted '
'strings use\n'
'template strings or the "%" operator described in the '
'String\n'
'Formatting Operations section. Also, see the "re" module '
'for string\n'
'functions based on regular expressions.\n'
'\n'
'str.capitalize()\n'
'\n'
' Return a copy of the string with its first character '
'capitalized\n'
' and the rest lowercased.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.center(width[, fillchar])\n'
'\n'
' Return the string centered in a string of length *width*. Padding '
'is done\n'
' using the specified *fillchar* (default is a space).\n'
'\n'
' Changed in version 2.4: Support for the *fillchar* '
'argument.\n'
'\n'
'str.count(sub[, start[, end]])\n'
'\n'
' Return the number of non-overlapping occurrences of '
'substring *sub*\n'
' in the range [*start*, *end*]. Optional arguments '
'*start* and\n'
' *end* are interpreted as in slice notation.\n'
'\n'
'str.decode([encoding[, errors]])\n'
'\n'
' Decodes the string using the codec registered for '
'*encoding*.\n'
' *encoding* defaults to the default string encoding. '
'*errors* may\n'
' be given to set a different error handling scheme. The '
'default is\n'
' "\'strict\'", meaning that encoding errors raise '
'"UnicodeError".\n'
' Other possible values are "\'ignore\'", "\'replace\'" '
'and any other\n'
' name registered via "codecs.register_error()", see '
'section Codec\n'
' Base Classes.\n'
'\n'
' New in version 2.2.\n'
'\n'
' Changed in version 2.3: Support for other error '
'handling schemes\n'
' added.\n'
'\n'
' Changed in version 2.7: Support for keyword arguments '
'added.\n'
'\n'
'str.encode([encoding[, errors]])\n'
'\n'
' Return an encoded version of the string. Default '
'encoding is the\n'
' current default string encoding. *errors* may be given '
'to set a\n'
' different error handling scheme. The default for '
'*errors* is\n'
' "\'strict\'", meaning that encoding errors raise a '
'"UnicodeError".\n'
' Other possible values are "\'ignore\'", "\'replace\'",\n'
' "\'xmlcharrefreplace\'", "\'backslashreplace\'" and any '
'other name\n'
' registered via "codecs.register_error()", see section '
'Codec Base\n'
' Classes. For a list of possible encodings, see section '
'Standard\n'
' Encodings.\n'
'\n'
' New in version 2.0.\n'
'\n'
' Changed in version 2.3: Support for '
'"\'xmlcharrefreplace\'" and\n'
' "\'backslashreplace\'" and other error handling schemes '
'added.\n'
'\n'
' Changed in version 2.7: Support for keyword arguments '
'added.\n'
'\n'
'str.endswith(suffix[, start[, end]])\n'
'\n'
' Return "True" if the string ends with the specified '
'*suffix*,\n'
' otherwise return "False". *suffix* can also be a tuple '
'of suffixes\n'
' to look for. With optional *start*, test beginning at '
'that\n'
' position. With optional *end*, stop comparing at that '
'position.\n'
'\n'
' Changed in version 2.5: Accept tuples as *suffix*.\n'
'\n'
'str.expandtabs([tabsize])\n'
'\n'
' Return a copy of the string where all tab characters '
'are replaced\n'
' by one or more spaces, depending on the current column '
'and the\n'
' given tab size. Tab positions occur every *tabsize* '
'characters\n'
' (default is 8, giving tab positions at columns 0, 8, 16 '
'and so on).\n'
' To expand the string, the current column is set to zero '
'and the\n'
' string is examined character by character. If the '
'character is a\n'
' tab ("\\t"), one or more space characters are inserted '
'in the result\n'
' until the current column is equal to the next tab '
'position. (The\n'
' tab character itself is not copied.) If the character '
'is a newline\n'
' ("\\n") or return ("\\r"), it is copied and the current '
'column is\n'
' reset to zero. Any other character is copied unchanged '
'and the\n'
' current column is incremented by one regardless of how '
'the\n'
' character is represented when printed.\n'
'\n'
" >>> '01\\t012\\t0123\\t01234'.expandtabs()\n"
" '01 012 0123 01234'\n"
" >>> '01\\t012\\t0123\\t01234'.expandtabs(4)\n"
" '01 012 0123 01234'\n"
'\n'
'str.find(sub[, start[, end]])\n'
'\n'
' Return the lowest index in the string where substring '
'*sub* is\n'
' found within the slice "s[start:end]". Optional '
'arguments *start*\n'
' and *end* are interpreted as in slice notation. Return '
'"-1" if\n'
' *sub* is not found.\n'
'\n'
' Note: The "find()" method should be used only if you '
'need to know\n'
' the position of *sub*. To check if *sub* is a '
'substring or not,\n'
' use the "in" operator:\n'
'\n'
" >>> 'Py' in 'Python'\n"
' True\n'
'\n'
'str.format(*args, **kwargs)\n'
'\n'
' Perform a string formatting operation. The string on '
'which this\n'
' method is called can contain literal text or '
'replacement fields\n'
' delimited by braces "{}". Each replacement field '
'contains either\n'
' the numeric index of a positional argument, or the name '
'of a\n'
' keyword argument. Returns a copy of the string where '
'each\n'
' replacement field is replaced with the string value of '
'the\n'
' corresponding argument.\n'
'\n'
' >>> "The sum of 1 + 2 is {0}".format(1+2)\n'
" 'The sum of 1 + 2 is 3'\n"
'\n'
' See Format String Syntax for a description of the '
'various\n'
' formatting options that can be specified in format '
'strings.\n'
'\n'
' This method of string formatting is the new standard in '
'Python 3,\n'
' and should be preferred to the "%" formatting described '
'in String\n'
' Formatting Operations in new code.\n'
'\n'
' New in version 2.6.\n'
'\n'
'str.index(sub[, start[, end]])\n'
'\n'
' Like "find()", but raise "ValueError" when the '
'substring is not\n'
' found.\n'
'\n'
'str.isalnum()\n'
'\n'
' Return true if all characters in the string are '
'alphanumeric and\n'
' there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isalpha()\n'
'\n'
' Return true if all characters in the string are '
'alphabetic and\n'
' there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isdigit()\n'
'\n'
' Return true if all characters in the string are digits '
'and there is\n'
' at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.islower()\n'
'\n'
' Return true if all cased characters [4] in the string '
'are lowercase\n'
' and there is at least one cased character, false '
'otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isspace()\n'
'\n'
' Return true if there are only whitespace characters in '
'the string\n'
' and there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.istitle()\n'
'\n'
' Return true if the string is a titlecased string and '
'there is at\n'
' least one character, for example uppercase characters '
'may only\n'
' follow uncased characters and lowercase characters only '
'cased ones.\n'
' Return false otherwise.\n'
'\n'
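' For example:\n'
'\n'
" >>> 'Hello World'.istitle()\n"
' True\n'
" >>> 'HELLO world'.istitle()\n"
' False\n'
'\n'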
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isupper()\n'
'\n'
' Return true if all cased characters [4] in the string '
'are uppercase\n'
' and there is at least one cased character, false '
'otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.join(iterable)\n'
'\n'
' Return a string which is the concatenation of the '
'strings in\n'
' *iterable*. A "TypeError" will be raised if there are '
'any non-\n'
' string values in *iterable*, including "bytes" '
'objects. The\n'
' separator between elements is the string providing this '
'method.\n'
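'\n'
' For example:\n'
'\n'
" >>> ', '.join(['ham', 'spam', 'eggs'])\n"
" 'ham, spam, eggs'\n"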
'\n'
'str.ljust(width[, fillchar])\n'
'\n'
' Return the string left justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default '
'is a\n'
' space). The original string is returned if *width* is '
'less than or\n'
' equal to "len(s)".\n'
'\n'
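' For example:\n'
'\n'
" >>> 'py'.ljust(5, '-')\n"
" 'py---'\n"
'\n'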
' Changed in version 2.4: Support for the *fillchar* '
'argument.\n'
'\n'
'str.lower()\n'
'\n'
' Return a copy of the string with all the cased '
'characters [4]\n'
' converted to lowercase.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.lstrip([chars])\n'
'\n'
' Return a copy of the string with leading characters '
'removed. The\n'
' *chars* argument is a string specifying the set of '
'characters to be\n'
' removed. If omitted or "None", the *chars* argument '
'defaults to\n'
' removing whitespace. The *chars* argument is not a '
'prefix; rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.lstrip()\n"
" 'spacious '\n"
" >>> 'www.example.com'.lstrip('cmowz.')\n"
" 'example.com'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* '
'argument.\n'
'\n'
'str.partition(sep)\n'
'\n'
' Split the string at the first occurrence of *sep*, and '
'return a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the '
'separator is not\n'
' found, return a 3-tuple containing the string itself, '
'followed by\n'
' two empty strings.\n'
'\n'
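' For example:\n'
'\n'
" >>> 'key=value'.partition('=')\n"
" ('key', '=', 'value')\n"
" >>> 'no separator'.partition('=')\n"
" ('no separator', '', '')\n"
'\n'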
' New in version 2.5.\n'
'\n'
'str.replace(old, new[, count])\n'
'\n'
' Return a copy of the string with all occurrences of '
'substring *old*\n'
' replaced by *new*. If the optional argument *count* is '
'given, only\n'
' the first *count* occurrences are replaced.\n'
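'\n'
' For example:\n'
'\n'
" >>> 'one one one'.replace('one', 'two', 2)\n"
" 'two two one'\n"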
'\n'
'str.rfind(sub[, start[, end]])\n'
'\n'
' Return the highest index in the string where substring '
'*sub* is\n'
' found, such that *sub* is contained within '
'"s[start:end]".\n'
' Optional arguments *start* and *end* are interpreted as '
'in slice\n'
' notation. Return "-1" on failure.\n'
'\n'
'str.rindex(sub[, start[, end]])\n'
'\n'
' Like "rfind()" but raises "ValueError" when the '
'substring *sub* is\n'
' not found.\n'
'\n'
'str.rjust(width[, fillchar])\n'
'\n'
' Return the string right justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default '
'is a\n'
' space). The original string is returned if *width* is '
'less than or\n'
' equal to "len(s)".\n'
'\n'
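' For example:\n'
'\n'
" >>> 'py'.rjust(5, '-')\n"
" '---py'\n"
'\n'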
' Changed in version 2.4: Support for the *fillchar* '
'argument.\n'
'\n'
'str.rpartition(sep)\n'
'\n'
' Split the string at the last occurrence of *sep*, and '
'return a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the '
'separator is not\n'
' found, return a 3-tuple containing two empty strings, '
'followed by\n'
' the string itself.\n'
'\n'
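' For example:\n'
'\n'
" >>> 'a.b.c'.rpartition('.')\n"
" ('a.b', '.', 'c')\n"
'\n'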
' New in version 2.5.\n'
'\n'
'str.rsplit([sep[, maxsplit]])\n'
'\n'
' Return a list of the words in the string, using *sep* '
'as the\n'
' delimiter string. If *maxsplit* is given, at most '
'*maxsplit* splits\n'
' are done, the *rightmost* ones. If *sep* is not '
'specified or\n'
' "None", any whitespace string is a separator. Except '
'for splitting\n'
' from the right, "rsplit()" behaves like "split()" which '
'is\n'
' described in detail below.\n'
'\n'
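' For example:\n'
'\n'
" >>> 'a,b,c'.rsplit(',', 1)\n"
" ['a,b', 'c']\n"
'\n'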
' New in version 2.4.\n'
'\n'
'str.rstrip([chars])\n'
'\n'
' Return a copy of the string with trailing characters '
'removed. The\n'
' *chars* argument is a string specifying the set of '
'characters to be\n'
' removed. If omitted or "None", the *chars* argument '
'defaults to\n'
' removing whitespace. The *chars* argument is not a '
'suffix; rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.rstrip()\n"
" ' spacious'\n"
" >>> 'mississippi'.rstrip('ipz')\n"
" 'mississ'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* '
'argument.\n'
'\n'
'str.split([sep[, maxsplit]])\n'
'\n'
' Return a list of the words in the string, using *sep* '
'as the\n'
' delimiter string. If *maxsplit* is given, at most '
'*maxsplit*\n'
' splits are done (thus, the list will have at most '
'"maxsplit+1"\n'
' elements). If *maxsplit* is not specified or "-1", '
'then there is\n'
' no limit on the number of splits (all possible splits '
'are made).\n'
'\n'
' If *sep* is given, consecutive delimiters are not '
'grouped together\n'
' and are deemed to delimit empty strings (for example,\n'
' "\'1,,2\'.split(\',\')" returns "[\'1\', \'\', '
'\'2\']"). The *sep* argument\n'
' may consist of multiple characters (for example,\n'
' "\'1<>2<>3\'.split(\'<>\')" returns "[\'1\', \'2\', '
'\'3\']"). Splitting an\n'
' empty string with a specified separator returns '
'"[\'\']".\n'
'\n'
' If *sep* is not specified or is "None", a different '
'splitting\n'
' algorithm is applied: runs of consecutive whitespace '
'are regarded\n'
' as a single separator, and the result will contain no '
'empty strings\n'
' at the start or end if the string has leading or '
'trailing\n'
' whitespace. Consequently, splitting an empty string or '
'a string\n'
' consisting of just whitespace with a "None" separator '
'returns "[]".\n'
'\n'
' For example, "\' 1  2   3  \'.split()" returns "[\'1\', '
'\'2\', \'3\']", and\n'
' "\' 1  2   3  \'.split(None, 1)" returns "[\'1\', '
'\'2   3  \']".\n'
'\n'
'str.splitlines([keepends])\n'
'\n'
' Return a list of the lines in the string, breaking at '
'line\n'
' boundaries. This method uses the *universal newlines* '
'approach to\n'
' splitting lines. Line breaks are not included in the '
'resulting list\n'
' unless *keepends* is given and true.\n'
'\n'
' Python recognizes ""\\r"", ""\\n"", and ""\\r\\n"" as '
'line boundaries\n'
' for 8-bit strings.\n'
'\n'
' For example:\n'
'\n'
" >>> 'ab c\\n\\nde fg\\rkl\\r\\n'.splitlines()\n"
" ['ab c', '', 'de fg', 'kl']\n"
" >>> 'ab c\\n\\nde fg\\rkl\\r\\n'.splitlines(True)\n"
" ['ab c\\n', '\\n', 'de fg\\r', 'kl\\r\\n']\n"
'\n'
' Unlike "split()" when a delimiter string *sep* is '
'given, this\n'
' method returns an empty list for the empty string, and '
'a terminal\n'
' line break does not result in an extra line:\n'
'\n'
' >>> "".splitlines()\n'
' []\n'
' >>> "One line\\n".splitlines()\n'
" ['One line']\n"
'\n'
' For comparison, "split(\'\\n\')" gives:\n'
'\n'
" >>> ''.split('\\n')\n"
" ['']\n"
" >>> 'Two lines\\n'.split('\\n')\n"
" ['Two lines', '']\n"
'\n'
'unicode.splitlines([keepends])\n'
'\n'
' Return a list of the lines in the string, like '
'"str.splitlines()".\n'
' However, the Unicode method splits on the following '
'line\n'
' boundaries, which are a superset of the *universal '
'newlines*\n'
' recognized for 8-bit strings.\n'
'\n'
' +-------------------------+-------------------------------+\n'
' | Representation          | Description                   |\n'
' +=========================+===============================+\n'
' | "\\n"                    | Line Feed                     |\n'
' +-------------------------+-------------------------------+\n'
' | "\\r"                    | Carriage Return               |\n'
' +-------------------------+-------------------------------+\n'
' | "\\r\\n"                  | Carriage Return + Line Feed   |\n'
' +-------------------------+-------------------------------+\n'
' | "\\v" or "\\x0b"          | Line Tabulation               |\n'
' +-------------------------+-------------------------------+\n'
' | "\\f" or "\\x0c"          | Form Feed                     |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1c"                  | File Separator                |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1d"                  | Group Separator               |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1e"                  | Record Separator              |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x85"                  | Next Line (C1 Control Code)   |\n'
' +-------------------------+-------------------------------+\n'
' | "\\u2028"                | Line Separator                |\n'
' +-------------------------+-------------------------------+\n'
' | "\\u2029"                | Paragraph Separator           |\n'
' +-------------------------+-------------------------------+\n'
'\n'
' Changed in version 2.7: "\\v" and "\\f" added to list '
'of line\n'
' boundaries.\n'
'\n'
'str.startswith(prefix[, start[, end]])\n'
'\n'
' Return "True" if string starts with the *prefix*, '
'otherwise return\n'
' "False". *prefix* can also be a tuple of prefixes to '
'look for.\n'
' With optional *start*, test string beginning at that '
'position.\n'
' With optional *end*, stop comparing string at that '
'position.\n'
'\n'
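' For example:\n'
'\n'
" >>> 'IronPython'.startswith(('Iron', 'Jy'))\n"
' True\n'
'\n'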
' Changed in version 2.5: Accept tuples as *prefix*.\n'
'\n'
'str.strip([chars])\n'
'\n'
' Return a copy of the string with the leading and '
'trailing\n'
' characters removed. The *chars* argument is a string '
'specifying the\n'
' set of characters to be removed. If omitted or "None", '
'the *chars*\n'
' argument defaults to removing whitespace. The *chars* '
'argument is\n'
' not a prefix or suffix; rather, all combinations of its '
'values are\n'
' stripped:\n'
'\n'
" >>> ' spacious '.strip()\n"
" 'spacious'\n"
" >>> 'www.example.com'.strip('cmowz.')\n"
" 'example'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* '
'argument.\n'
'\n'
'str.swapcase()\n'
'\n'
' Return a copy of the string with uppercase characters '
'converted to\n'
' lowercase and vice versa.\n'
'\n'
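' For example:\n'
'\n'
" >>> 'Hello World'.swapcase()\n"
" 'hELLO wORLD'\n"
'\n'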
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.title()\n'
'\n'
' Return a titlecased version of the string where words '
'start with an\n'
' uppercase character and the remaining characters are '
'lowercase.\n'
'\n'
' The algorithm uses a simple language-independent '
'definition of a\n'
' word as groups of consecutive letters. The definition '
'works in\n'
' many contexts but it means that apostrophes in '
'contractions and\n'
' possessives form word boundaries, which may not be the '
'desired\n'
' result:\n'
'\n'
' >>> "they\'re bill\'s friends from the UK".title()\n'
' "They\'Re Bill\'S Friends From The Uk"\n'
'\n'
' A workaround for apostrophes can be constructed using '
'regular\n'
' expressions:\n'
'\n'
' >>> import re\n'
' >>> def titlecase(s):\n'
' ...     return re.sub(r"[A-Za-z]+(\'[A-Za-z]+)?",\n'
' ...                   lambda mo: mo.group(0)[0].upper() +\n'
' ...                              mo.group(0)[1:].lower(),\n'
' ...                   s)\n'
' ...\n'
' >>> titlecase("they\'re bill\'s friends.")\n'
' "They\'re Bill\'s Friends."\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.translate(table[, deletechars])\n'
'\n'
' Return a copy of the string where all characters '
'occurring in the\n'
' optional argument *deletechars* are removed, and the '
'remaining\n'
' characters have been mapped through the given '
'translation table,\n'
' which must be a string of length 256.\n'
'\n'
' You can use the "maketrans()" helper function in the '
'"string"\n'
' module to create a translation table. For string '
'objects, set the\n'
' *table* argument to "None" for translations that only '
'delete\n'
' characters:\n'
'\n'
" >>> 'read this short text'.translate(None, 'aeiou')\n"
" 'rd ths shrt txt'\n"
'\n'
' New in version 2.6: Support for a "None" *table* '
'argument.\n'
'\n'
' For Unicode objects, the "translate()" method does not '
'accept the\n'
' optional *deletechars* argument. Instead, it returns a '
'copy of the\n'
' *s* where all characters have been mapped through the '
'given\n'
' translation table which must be a mapping of Unicode '
'ordinals to\n'
' Unicode ordinals, Unicode strings or "None". Unmapped '
'characters\n'
' are left untouched. Characters mapped to "None" are '
'deleted. Note,\n'
' a more flexible approach is to create a custom '
'character mapping\n'
' codec using the "codecs" module (see "encodings.cp1251" '
'for an\n'
' example).\n'
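'\n'
' For example, mapping one ordinal to a replacement and\n'
' another to "None" (which deletes it):\n'
'\n'
" >>> u'abc'.translate({ord(u'a'): u'A', ord(u'b'): None})\n"
" u'Ac'\n"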
'\n'
'str.upper()\n'
'\n'
' Return a copy of the string with all the cased '
'characters [4]\n'
' converted to uppercase. Note that '
'"str.upper().isupper()" might be\n'
' "False" if "s" contains uncased characters or if the '
'Unicode\n'
' category of the resulting character(s) is not "Lu" '
'(Letter,\n'
' uppercase), but e.g. "Lt" (Letter, titlecase).\n'
'\n'
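' For example:\n'
'\n'
" >>> 'Hello'.upper()\n"
" 'HELLO'\n"
'\n'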
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.zfill(width)\n'
'\n'
' Return the numeric string left filled with zeros in a '
'string of\n'
' length *width*. A sign prefix is handled correctly. '
'The original\n'
' string is returned if *width* is less than or equal to '
'"len(s)".\n'
'\n'
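' For example:\n'
'\n'
" >>> '42'.zfill(5)\n"
" '00042'\n"
" >>> '-42'.zfill(5)\n"
" '-0042'\n"
'\n'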
' New in version 2.2.2.\n'
'\n'
'The following methods are present only on unicode '
'objects:\n'
'\n'
'unicode.isnumeric()\n'
'\n'
' Return "True" if there are only numeric characters in '
'S, "False"\n'
' otherwise. Numeric characters include digit characters, '
'and all\n'
' characters that have the Unicode numeric value '
'property, e.g.\n'
' U+2155, VULGAR FRACTION ONE FIFTH.\n'
'\n'
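' For example:\n'
'\n'
" >>> u'\\u2155'.isnumeric()\n"
' True\n'
" >>> u'one'.isnumeric()\n"
' False\n'
'\n'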
'unicode.isdecimal()\n'
'\n'
' Return "True" if there are only decimal characters in '
'S, "False"\n'
' otherwise. Decimal characters include digit characters, '
'and all\n'
' characters that can be used to form decimal-radix '
'numbers, e.g.\n'
' U+0660, ARABIC-INDIC DIGIT ZERO.\n',
'strings': '\n'
'String literals\n'
'***************\n'
'\n'
'String literals are described by the following lexical '
'definitions:\n'
'\n'
' stringliteral   ::=  [stringprefix](shortstring | longstring)\n'
' stringprefix    ::=  "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"\n'
'                      | "b" | "B" | "br" | "Br" | "bR" | "BR"\n'
' shortstring     ::=  "\'" shortstringitem* "\'" | \'"\' shortstringitem* \'"\'\n'
' longstring      ::=  "\'\'\'" longstringitem* "\'\'\'"\n'
'                      | \'"""\' longstringitem* \'"""\'\n'
' shortstringitem ::=  shortstringchar | escapeseq\n'
' longstringitem  ::=  longstringchar | escapeseq\n'
' shortstringchar ::=  <any source character except "\\" or newline or the quote>\n'
' longstringchar  ::=  <any source character except "\\">\n'
' escapeseq       ::=  "\\" <any ASCII character>\n'
'\n'
'One syntactic restriction not indicated by these productions is '
'that\n'
'whitespace is not allowed between the "stringprefix" and the rest '
'of\n'
'the string literal. The source character set is defined by the\n'
'encoding declaration; it is ASCII if no encoding declaration is '
'given\n'
'in the source file; see section Encoding declarations.\n'
'\n'
'In plain English: String literals can be enclosed in matching '
'single\n'
'quotes ("\'") or double quotes ("""). They can also be enclosed '
'in\n'
'matching groups of three single or double quotes (these are '
'generally\n'
'referred to as *triple-quoted strings*). The backslash ("\\")\n'
'character is used to escape characters that otherwise have a '
'special\n'
'meaning, such as newline, backslash itself, or the quote '
'character.\n'
'String literals may optionally be prefixed with a letter "\'r\'" '
'or\n'
'"\'R\'"; such strings are called *raw strings* and use different '
'rules\n'
'for interpreting backslash escape sequences. A prefix of "\'u\'" '
'or\n'
'"\'U\'" makes the string a Unicode string. Unicode strings use '
'the\n'
'Unicode character set as defined by the Unicode Consortium and '
'ISO\n'
'10646. Some additional escape sequences, described below, are\n'
'available in Unicode strings. A prefix of "\'b\'" or "\'B\'" is '
'ignored in\n'
'Python 2; it indicates that the literal should become a bytes '
'literal\n'
'in Python 3 (e.g. when code is automatically converted with '
'2to3). A\n'
'"\'u\'" or "\'b\'" prefix may be followed by an "\'r\'" prefix.\n'
'\n'
'In triple-quoted strings, unescaped newlines and quotes are '
'allowed\n'
'(and are retained), except that three unescaped quotes in a row\n'
'terminate the string. (A "quote" is the character used to open '
'the\n'
'string, i.e. either "\'" or """.)\n'
'\n'
'Unless an "\'r\'" or "\'R\'" prefix is present, escape sequences '
'in\n'
'strings are interpreted according to rules similar to those used '
'by\n'
'Standard C. The recognized escape sequences are:\n'
'\n'
'+-------------------+-----------------------------------+---------+\n'
'| Escape Sequence   | Meaning                           | Notes   |\n'
'+===================+===================================+=========+\n'
'| "\\newline"        | Ignored                           |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\\\"              | Backslash ("\\")                   |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\\'"              | Single quote ("\'")                |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\""              | Double quote (""")                |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\a"              | ASCII Bell (BEL)                  |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\b"              | ASCII Backspace (BS)              |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\f"              | ASCII Formfeed (FF)               |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\n"              | ASCII Linefeed (LF)               |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\N{name}"        | Character named *name* in the     |         |\n'
'|                   | Unicode database (Unicode only)   |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\r"              | ASCII Carriage Return (CR)        |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\t"              | ASCII Horizontal Tab (TAB)        |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\uxxxx"          | Character with 16-bit hex value   | (1)     |\n'
'|                   | *xxxx* (Unicode only)             |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\Uxxxxxxxx"      | Character with 32-bit hex value   | (2)     |\n'
'|                   | *xxxxxxxx* (Unicode only)         |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\v"              | ASCII Vertical Tab (VT)           |         |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\ooo"            | Character with octal value *ooo*  | (3,5)   |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\xhh"            | Character with hex value *hh*     | (4,5)   |\n'
'+-------------------+-----------------------------------+---------+\n'
'\n'
'Notes:\n'
'\n'
'1. Individual code units which form parts of a surrogate pair '
'can\n'
' be encoded using this escape sequence.\n'
'\n'
'2. Any Unicode character can be encoded this way, but characters\n'
' outside the Basic Multilingual Plane (BMP) will be encoded '
'using a\n'
' surrogate pair if Python is compiled to use 16-bit code units '
'(the\n'
' default).\n'
'\n'
'3. As in Standard C, up to three octal digits are accepted.\n'
'\n'
'4. Unlike in Standard C, exactly two hex digits are required.\n'
'\n'
'5. In a string literal, hexadecimal and octal escapes denote the\n'
' byte with the given value; it is not necessary that the byte\n'
' encodes a character in the source character set. In a Unicode\n'
' literal, these escapes denote a Unicode character with the '
'given\n'
' value.\n'
'\n'
'Unlike Standard C, all unrecognized escape sequences are left in '
'the\n'
'string unchanged, i.e., *the backslash is left in the string*. '
'(This\n'
'behavior is useful when debugging: if an escape sequence is '
'mistyped,\n'
'the resulting output is more easily recognized as broken.) It is '
'also\n'
'important to note that the escape sequences marked as "(Unicode '
'only)"\n'
'in the table above fall into the category of unrecognized escapes '
'for\n'
'non-Unicode string literals.\n'
'\n'
'When an "\'r\'" or "\'R\'" prefix is present, a character '
'following a\n'
'backslash is included in the string without change, and *all\n'
'backslashes are left in the string*. For example, the string '
'literal\n'
'"r"\\n"" consists of two characters: a backslash and a lowercase '
'"\'n\'".\n'
'String quotes can be escaped with a backslash, but the backslash\n'
'remains in the string; for example, "r"\\""" is a valid string '
'literal\n'
'consisting of two characters: a backslash and a double quote; '
'"r"\\""\n'
'is not a valid string literal (even a raw string cannot end in an '
'odd\n'
'number of backslashes). Specifically, *a raw string cannot end '
'in a\n'
'single backslash* (since the backslash would escape the '
'following\n'
'quote character). Note also that a single backslash followed by '
'a\n'
'newline is interpreted as those two characters as part of the '
'string,\n'
'*not* as a line continuation.\n'
'\n'
'When an "\'r\'" or "\'R\'" prefix is used in conjunction with a '
'"\'u\'" or\n'
'"\'U\'" prefix, then the "\\uXXXX" and "\\UXXXXXXXX" escape '
'sequences are\n'
'processed while *all other backslashes are left in the string*. '
'For\n'
'example, the string literal "ur"\\u0062\\n"" consists of three '
'Unicode\n'
"characters: 'LATIN SMALL LETTER B', 'REVERSE SOLIDUS', and "
"'LATIN\n"
"SMALL LETTER N'. Backslashes can be escaped with a preceding\n"
'backslash; however, both remain in the string. As a result, '
'"\\uXXXX"\n'
'escape sequences are only recognized when there are an odd number '
'of\n'
'backslashes.\n',
'subscriptions': '\n'
'Subscriptions\n'
'*************\n'
'\n'
'A subscription selects an item of a sequence (string, tuple '
'or list)\n'
'or mapping (dictionary) object:\n'
'\n'
' subscription ::= primary "[" expression_list "]"\n'
'\n'
'The primary must evaluate to an object of a sequence or '
'mapping type.\n'
'\n'
'If the primary is a mapping, the expression list must '
'evaluate to an\n'
'object whose value is one of the keys of the mapping, and '
'the\n'
'subscription selects the value in the mapping that '
'corresponds to that\n'
'key. (The expression list is a tuple except if it has '
'exactly one\n'
'item.)\n'
'\n'
'If the primary is a sequence, the expression (list) must '
'evaluate to a\n'
'plain integer. If this value is negative, the length of '
'the sequence\n'
'is added to it (so that, e.g., "x[-1]" selects the last '
'item of "x".)\n'
'The resulting value must be a nonnegative integer less than '
'the number\n'
'of items in the sequence, and the subscription selects the '
'item whose\n'
'index is that value (counting from zero).\n'
'\n'
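'For example:\n'
'\n'
" >>> {'a': 1, 'b': 2}['b']\n"
' 2\n'
" >>> 'abc'[-1]\n"
" 'c'\n"
'\n'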
"A string's items are characters. A character is not a "
'separate data\n'
'type but a string of exactly one character.\n',
'truth': '\n'
'Truth Value Testing\n'
'*******************\n'
'\n'
'Any object can be tested for truth value, for use in an "if" or\n'
'"while" condition or as operand of the Boolean operations below. '
'The\n'
'following values are considered false:\n'
'\n'
'* "None"\n'
'\n'
'* "False"\n'
'\n'
'* zero of any numeric type, for example, "0", "0L", "0.0", "0j".\n'
'\n'
'* any empty sequence, for example, "\'\'", "()", "[]".\n'
'\n'
'* any empty mapping, for example, "{}".\n'
'\n'
'* instances of user-defined classes, if the class defines a\n'
' "__nonzero__()" or "__len__()" method, when that method returns '
'the\n'
' integer zero or "bool" value "False". [1]\n'
'\n'
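'For example:\n'
'\n'
' >>> bool([]), bool([0]), bool(0), bool(None)\n'
' (False, True, False, False)\n'
'\n'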
'All other values are considered true --- so objects of many types '
'are\n'
'always true.\n'
'\n'
'Operations and built-in functions that have a Boolean result '
'always\n'
'return "0" or "False" for false and "1" or "True" for true, unless\n'
'otherwise stated. (Important exception: the Boolean operations '
'"or"\n'
'and "and" always return one of their operands.)\n',
'try': '\n'
'The "try" statement\n'
'*******************\n'
'\n'
'The "try" statement specifies exception handlers and/or cleanup code\n'
'for a group of statements:\n'
'\n'
' try_stmt  ::=  try1_stmt | try2_stmt\n'
' try1_stmt ::=  "try" ":" suite\n'
'                ("except" [expression [("as" | ",") identifier]] ":" suite)+\n'
'                ["else" ":" suite]\n'
'                ["finally" ":" suite]\n'
' try2_stmt ::=  "try" ":" suite\n'
'                "finally" ":" suite\n'
'\n'
'Changed in version 2.5: In previous versions of Python,\n'
'"try"..."except"..."finally" did not work. "try"..."except" had to '
'be\n'
'nested in "try"..."finally".\n'
'\n'
'The "except" clause(s) specify one or more exception handlers. When '
'no\n'
'exception occurs in the "try" clause, no exception handler is\n'
'executed. When an exception occurs in the "try" suite, a search for '
'an\n'
'exception handler is started. This search inspects the except '
'clauses\n'
'in turn until one is found that matches the exception. An '
'expression-\n'
'less except clause, if present, must be last; it matches any\n'
'exception. For an except clause with an expression, that expression\n'
'is evaluated, and the clause matches the exception if the resulting\n'
'object is "compatible" with the exception. An object is compatible\n'
'with an exception if it is the class or a base class of the '
'exception\n'
'object, or a tuple containing an item compatible with the exception.\n'
'\n'
'If no except clause matches the exception, the search for an '
'exception\n'
'handler continues in the surrounding code and on the invocation '
'stack.\n'
'[1]\n'
'\n'
'If the evaluation of an expression in the header of an except clause\n'
'raises an exception, the original search for a handler is canceled '
'and\n'
'a search starts for the new exception in the surrounding code and on\n'
'the call stack (it is treated as if the entire "try" statement '
'raised\n'
'the exception).\n'
'\n'
'When a matching except clause is found, the exception is assigned to\n'
'the target specified in that except clause, if present, and the '
'except\n'
"clause's suite is executed. All except clauses must have an\n"
'executable block. When the end of this block is reached, execution\n'
'continues normally after the entire try statement. (This means that\n'
'if two nested handlers exist for the same exception, and the '
'exception\n'
'occurs in the try clause of the inner handler, the outer handler '
'will\n'
'not handle the exception.)\n'
'\n'
"Before an except clause's suite is executed, details about the\n"
'exception are assigned to three variables in the "sys" module:\n'
'"sys.exc_type" receives the object identifying the exception;\n'
'"sys.exc_value" receives the exception\'s parameter;\n'
'"sys.exc_traceback" receives a traceback object (see section The\n'
'standard type hierarchy) identifying the point in the program where\n'
'the exception occurred. These details are also available through the\n'
'"sys.exc_info()" function, which returns a tuple "(exc_type,\n'
'exc_value, exc_traceback)". Use of the corresponding variables is\n'
'deprecated in favor of this function, since their use is unsafe in a\n'
'threaded program. As of Python 1.5, the variables are restored to\n'
'their previous values (before the call) when returning from a '
'function\n'
'that handled an exception.\n'
'\n'
'The optional "else" clause is executed if and when control flows off\n'
'the end of the "try" clause. [2] Exceptions in the "else" clause are\n'
'not handled by the preceding "except" clauses.\n'
'\n'
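'For example, the "else" clause runs only when the "try" suite\n'
'finishes without raising an exception:\n'
'\n'
' >>> try:\n'
" ...     x = int('7')\n"
' ... except ValueError:\n'
" ...     print 'bad literal'\n"
' ... else:\n'
" ...     print 'parsed', x\n"
' ...\n'
' parsed 7\n'
'\n'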
'If "finally" is present, it specifies a \'cleanup\' handler. The '
'"try"\n'
'clause is executed, including any "except" and "else" clauses. If '
'an\n'
'exception occurs in any of the clauses and is not handled, the\n'
'exception is temporarily saved. The "finally" clause is executed. '
'If\n'
'there is a saved exception, it is re-raised at the end of the\n'
'"finally" clause. If the "finally" clause raises another exception '
'or\n'
'executes a "return" or "break" statement, the saved exception is\n'
'discarded:\n'
'\n'
' >>> def f():\n'
' ...     try:\n'
' ...         1/0\n'
' ...     finally:\n'
' ...         return 42\n'
' ...\n'
' >>> f()\n'
' 42\n'
'\n'
'The exception information is not available to the program during\n'
'execution of the "finally" clause.\n'
'\n'
'When a "return", "break" or "continue" statement is executed in the\n'
'"try" suite of a "try"..."finally" statement, the "finally" clause '
'is\n'
'also executed \'on the way out.\' A "continue" statement is illegal '
'in\n'
'the "finally" clause. (The reason is a problem with the current\n'
'implementation --- this restriction may be lifted in the future).\n'
'\n'
'The return value of a function is determined by the last "return"\n'
'statement executed. Since the "finally" clause always executes, a\n'
'"return" statement executed in the "finally" clause will always be '
'the\n'
'last one executed:\n'
'\n'
' >>> def foo():\n'
' ...     try:\n'
" ...         return 'try'\n"
' ...     finally:\n'
" ...         return 'finally'\n"
' ...\n'
' >>> foo()\n'
" 'finally'\n"
'\n'
'Additional information on exceptions can be found in section\n'
'Exceptions, and information on using the "raise" statement to '
'generate\n'
'exceptions may be found in section The raise statement.\n',
'types': '\n'
'The standard type hierarchy\n'
'***************************\n'
'\n'
'Below is a list of the types that are built into Python. '
'Extension\n'
'modules (written in C, Java, or other languages, depending on the\n'
'implementation) can define additional types. Future versions of\n'
'Python may add types to the type hierarchy (e.g., rational '
'numbers,\n'
'efficiently stored arrays of integers, etc.).\n'
'\n'
'Some of the type descriptions below contain a paragraph listing\n'
"'special attributes.' These are attributes that provide access to "
'the\n'
'implementation and are not intended for general use. Their '
'definition\n'
'may change in the future.\n'
'\n'
'None\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the built-in name "None". '
'It\n'
' is used to signify the absence of a value in many situations, '
'e.g.,\n'
" it is returned from functions that don't explicitly return\n"
' anything. Its truth value is false.\n'
'\n'
'NotImplemented\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the built-in name\n'
' "NotImplemented". Numeric methods and rich comparison methods '
'may\n'
' return this value if they do not implement the operation for '
'the\n'
' operands provided. (The interpreter will then try the '
'reflected\n'
' operation, or some other fallback, depending on the operator.) '
'Its\n'
' truth value is true.\n'
'\n'
'Ellipsis\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the built-in name\n'
' "Ellipsis". It is used to indicate the presence of the "..." '
'syntax\n'
' in a slice. Its truth value is true.\n'
'\n'
'"numbers.Number"\n'
' These are created by numeric literals and returned as results '
'by\n'
' arithmetic operators and arithmetic built-in functions. '
'Numeric\n'
' objects are immutable; once created their value never changes.\n'
' Python numbers are of course strongly related to mathematical\n'
' numbers, but subject to the limitations of numerical '
'representation\n'
' in computers.\n'
'\n'
' Python distinguishes between integers, floating point numbers, '
'and\n'
' complex numbers:\n'
'\n'
' "numbers.Integral"\n'
' These represent elements from the mathematical set of '
'integers\n'
' (positive and negative).\n'
'\n'
' There are three types of integers:\n'
'\n'
' Plain integers\n'
' These represent numbers in the range -2147483648 through\n'
' 2147483647. (The range may be larger on machines with a\n'
' larger natural word size, but not smaller.) When the '
'result\n'
' of an operation would fall outside this range, the result '
'is\n'
' normally returned as a long integer (in some cases, the\n'
' exception "OverflowError" is raised instead). For the\n'
' purpose of shift and mask operations, integers are assumed '
'to\n'
" have a binary, 2's complement notation using 32 or more "
'bits,\n'
' and hiding no bits from the user (i.e., all 4294967296\n'
' different bit patterns correspond to different values).\n'
'\n'
' Long integers\n'
' These represent numbers in an unlimited range, subject to\n'
' available (virtual) memory only. For the purpose of '
'shift\n'
' and mask operations, a binary representation is assumed, '
'and\n'
" negative numbers are represented in a variant of 2's\n"
' complement which gives the illusion of an infinite string '
'of\n'
' sign bits extending to the left.\n'
'\n'
' Booleans\n'
' These represent the truth values False and True. The two\n'
' objects representing the values "False" and "True" are '
'the\n'
' only Boolean objects. The Boolean type is a subtype of '
'plain\n'
' integers, and Boolean values behave like the values 0 and '
'1,\n'
' respectively, in almost all contexts, the exception being\n'
' that when converted to a string, the strings ""False"" or\n'
' ""True"" are returned, respectively.\n'
'\n'
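' For example, since Booleans behave like the integers 0 and 1:\n'
'\n'
' >>> True + True\n'
' 2\n'
' >>> str(True)\n'
" 'True'\n"
'\n'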
' The rules for integer representation are intended to give '
'the\n'
' most meaningful interpretation of shift and mask operations\n'
' involving negative integers and the least surprises when\n'
' switching between the plain and long integer domains. Any\n'
' operation, if it yields a result in the plain integer '
'domain,\n'
' will yield the same result in the long integer domain or '
'when\n'
' using mixed operands. The switch between domains is '
'transparent\n'
' to the programmer.\n'
'\n'
' "numbers.Real" ("float")\n'
' These represent machine-level double precision floating '
'point\n'
' numbers. You are at the mercy of the underlying machine\n'
' architecture (and C or Java implementation) for the accepted\n'
' range and handling of overflow. Python does not support '
'single-\n'
' precision floating point numbers; the savings in processor '
'and\n'
' memory usage that are usually the reason for using these are\n'
' dwarfed by the overhead of using objects in Python, so there '
'is\n'
' no reason to complicate the language with two kinds of '
'floating\n'
' point numbers.\n'
'\n'
' "numbers.Complex"\n'
' These represent complex numbers as a pair of machine-level\n'
' double precision floating point numbers. The same caveats '
'apply\n'
' as for floating point numbers. The real and imaginary parts '
'of a\n'
' complex number "z" can be retrieved through the read-only\n'
' attributes "z.real" and "z.imag".\n'
'\n'
'Sequences\n'
' These represent finite ordered sets indexed by non-negative\n'
' numbers. The built-in function "len()" returns the number of '
'items\n'
' of a sequence. When the length of a sequence is *n*, the index '
'set\n'
' contains the numbers 0, 1, ..., *n*-1. Item *i* of sequence *a* '
'is\n'
' selected by "a[i]".\n'
'\n'
' Sequences also support slicing: "a[i:j]" selects all items with\n'
' index *k* such that *i* "<=" *k* "<" *j*. When used as an\n'
' expression, a slice is a sequence of the same type. This '
'implies\n'
' that the index set is renumbered so that it starts at 0.\n'
'\n'
' Some sequences also support "extended slicing" with a third '
'"step"\n'
' parameter: "a[i:j:k]" selects all items of *a* with index *x* '
'where\n'
' "x = i + n*k", *n* ">=" "0" and *i* "<=" *x* "<" *j*.\n'
'\n'
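' For example:\n'
'\n'
' >>> a = [0, 1, 2, 3, 4, 5]\n'
' >>> a[1:4]\n'
' [1, 2, 3]\n'
' >>> a[::2]\n'
' [0, 2, 4]\n'
'\n'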
' Sequences are distinguished according to their mutability:\n'
'\n'
' Immutable sequences\n'
' An object of an immutable sequence type cannot change once it '
'is\n'
' created. (If the object contains references to other '
'objects,\n'
' these other objects may be mutable and may be changed; '
'however,\n'
' the collection of objects directly referenced by an '
'immutable\n'
' object cannot change.)\n'
'\n'
' The following types are immutable sequences:\n'
'\n'
' Strings\n'
' The items of a string are characters. There is no '
'separate\n'
' character type; a character is represented by a string of '
'one\n'
' item. Characters represent (at least) 8-bit bytes. The\n'
' built-in functions "chr()" and "ord()" convert between\n'
' characters and nonnegative integers representing the byte\n'
' values. Bytes with the values 0--127 usually represent '
'the\n'
' corresponding ASCII values, but the interpretation of '
'values\n'
' is up to the program. The string data type is also used '
'to\n'
' represent arrays of bytes, e.g., to hold data read from a\n'
' file.\n'
'\n'
' (On systems whose native character set is not ASCII, '
'strings\n'
' may use EBCDIC in their internal representation, provided '
'the\n'
' functions "chr()" and "ord()" implement a mapping between\n'
' ASCII and EBCDIC, and string comparison preserves the '
'ASCII\n'
' order. Or perhaps someone can propose a better rule?)\n'
'\n'
' Unicode\n'
' The items of a Unicode object are Unicode code units. A\n'
' Unicode code unit is represented by a Unicode object of '
'one\n'
' item and can hold either a 16-bit or 32-bit value\n'
' representing a Unicode ordinal (the maximum value for the\n'
' ordinal is given in "sys.maxunicode", and depends on how\n'
' Python is configured at compile time). Surrogate pairs '
'may\n'
' be present in the Unicode object, and will be reported as '
'two\n'
' separate items. The built-in functions "unichr()" and\n'
' "ord()" convert between code units and nonnegative '
'integers\n'
' representing the Unicode ordinals as defined in the '
'Unicode\n'
' Standard 3.0. Conversion from and to other encodings are\n'
' possible through the Unicode method "encode()" and the '
'built-\n'
' in function "unicode()".\n'
'\n'
' Tuples\n'
' The items of a tuple are arbitrary Python objects. Tuples '
'of\n'
' two or more items are formed by comma-separated lists of\n'
" expressions. A tuple of one item (a 'singleton') can be\n"
' formed by affixing a comma to an expression (an expression '
'by\n'
' itself does not create a tuple, since parentheses must be\n'
' usable for grouping of expressions). An empty tuple can '
'be\n'
' formed by an empty pair of parentheses.\n'
'\n'
' Mutable sequences\n'
' Mutable sequences can be changed after they are created. '
'The\n'
' subscription and slicing notations can be used as the target '
'of\n'
' assignment and "del" (delete) statements.\n'
'\n'
' There are currently two intrinsic mutable sequence types:\n'
'\n'
' Lists\n'
' The items of a list are arbitrary Python objects. Lists '
'are\n'
' formed by placing a comma-separated list of expressions '
'in\n'
' square brackets. (Note that there are no special cases '
'needed\n'
' to form lists of length 0 or 1.)\n'
'\n'
' Byte Arrays\n'
' A bytearray object is a mutable array. They are created '
'by\n'
' the built-in "bytearray()" constructor. Aside from being\n'
' mutable (and hence unhashable), byte arrays otherwise '
'provide\n'
' the same interface and functionality as immutable bytes\n'
' objects.\n'
'\n'
' The extension module "array" provides an additional example '
'of a\n'
' mutable sequence type.\n'
'\n'
'Set types\n'
' These represent unordered, finite sets of unique, immutable\n'
' objects. As such, they cannot be indexed by any subscript. '
'However,\n'
' they can be iterated over, and the built-in function "len()"\n'
' returns the number of items in a set. Common uses for sets are '
'fast\n'
' membership testing, removing duplicates from a sequence, and\n'
' computing mathematical operations such as intersection, union,\n'
' difference, and symmetric difference.\n'
'\n'
' For set elements, the same immutability rules apply as for\n'
' dictionary keys. Note that numeric types obey the normal rules '
'for\n'
' numeric comparison: if two numbers compare equal (e.g., "1" and\n'
' "1.0"), only one of them can be contained in a set.\n'
'\n'
' There are currently two intrinsic set types:\n'
'\n'
' Sets\n'
' These represent a mutable set. They are created by the '
'built-in\n'
' "set()" constructor and can be modified afterwards by '
'several\n'
' methods, such as "add()".\n'
'\n'
' Frozen sets\n'
' These represent an immutable set. They are created by the\n'
' built-in "frozenset()" constructor. As a frozenset is '
'immutable\n'
' and *hashable*, it can be used again as an element of '
'another\n'
' set, or as a dictionary key.\n'
'\n'
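' For example:\n'
'\n'
" >>> d = {frozenset([1, 2]): 'pair'}\n"
' >>> d[frozenset([2, 1])]\n'
" 'pair'\n"
'\n'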
'Mappings\n'
' These represent finite sets of objects indexed by arbitrary '
'index\n'
' sets. The subscript notation "a[k]" selects the item indexed by '
'"k"\n'
' from the mapping "a"; this can be used in expressions and as '
'the\n'
' target of assignments or "del" statements. The built-in '
'function\n'
' "len()" returns the number of items in a mapping.\n'
'\n'
' There is currently a single intrinsic mapping type:\n'
'\n'
' Dictionaries\n'
' These represent finite sets of objects indexed by nearly\n'
' arbitrary values. The only types of values not acceptable '
'as\n'
' keys are values containing lists or dictionaries or other\n'
' mutable types that are compared by value rather than by '
'object\n'
' identity, the reason being that the efficient implementation '
'of\n'
" dictionaries requires a key's hash value to remain constant.\n"
' Numeric types used for keys obey the normal rules for '
'numeric\n'
' comparison: if two numbers compare equal (e.g., "1" and '
'"1.0")\n'
' then they can be used interchangeably to index the same\n'
' dictionary entry.\n'
'\n'
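' For example:\n'
'\n'
" >>> d = {1: 'one'}\n"
' >>> d[1.0]\n'
" 'one'\n"
'\n'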
' Dictionaries are mutable; they can be created by the "{...}"\n'
' notation (see section Dictionary displays).\n'
'\n'
' The extension modules "dbm", "gdbm", and "bsddb" provide\n'
' additional examples of mapping types.\n'
'\n'
'Callable types\n'
' These are the types to which the function call operation (see\n'
' section Calls) can be applied:\n'
'\n'
' User-defined functions\n'
' A user-defined function object is created by a function\n'
' definition (see section Function definitions). It should be\n'
' called with an argument list containing the same number of '
'items\n'
" as the function's formal parameter list.\n"
'\n'
' Special attributes:\n'
'\n'
' +-------------------------+---------------------------------+-------------+\n'
' | Attribute               | Meaning                         |             |\n'
' +=========================+=================================+=============+\n'
' | "__doc__" "func_doc"    | The function\'s documentation    | Writable    |\n'
' |                         | string, or "None" if            |             |\n'
' |                         | unavailable.                    |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__name__" "func_name"  | The function\'s name             | Writable    |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__module__"            | The name of the module the      | Writable    |\n'
' |                         | function was defined in, or     |             |\n'
' |                         | "None" if unavailable.          |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__defaults__"          | A tuple containing default      | Writable    |\n'
' | "func_defaults"         | argument values for those       |             |\n'
' |                         | arguments that have defaults,   |             |\n'
' |                         | or "None" if no arguments have  |             |\n'
' |                         | a default value.                |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__code__" "func_code"  | The code object representing    | Writable    |\n'
' |                         | the compiled function body.     |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__globals__"           | A reference to the dictionary   | Read-only   |\n'
' | "func_globals"          | that holds the function\'s       |             |\n'
' |                         | global variables --- the global |             |\n'
' |                         | namespace of the module in      |             |\n'
' |                         | which the function was defined. |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__dict__" "func_dict"  | The namespace supporting        | Writable    |\n'
' |                         | arbitrary function attributes.  |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
' | "__closure__"           | "None" or a tuple of cells that | Read-only   |\n'
' | "func_closure"          | contain bindings for the        |             |\n'
' |                         | function\'s free variables.      |             |\n'
' +-------------------------+---------------------------------+-------------+\n'
'\n'
' Most of the attributes labelled "Writable" check the type of '
'the\n'
' assigned value.\n'
'\n'
' Changed in version 2.4: "func_name" is now writable.\n'
'\n'
' Changed in version 2.6: The double-underscore attributes\n'
' "__closure__", "__code__", "__defaults__", and "__globals__"\n'
' were introduced as aliases for the corresponding "func_*"\n'
' attributes for forwards compatibility with Python 3.\n'
'\n'
' Function objects also support getting and setting arbitrary\n'
' attributes, which can be used, for example, to attach '
'metadata\n'
' to functions. Regular attribute dot-notation is used to get '
'and\n'
' set such attributes. *Note that the current implementation '
'only\n'
' supports function attributes on user-defined functions. '
'Function\n'
' attributes on built-in functions may be supported in the\n'
' future.*\n'
'\n'
" Additional information about a function's definition can be\n"
' retrieved from its code object; see the description of '
'internal\n'
' types below.\n'
'\n'
' User-defined methods\n'
' A user-defined method object combines a class, a class '
'instance\n'
' (or "None") and any callable object (normally a user-defined\n'
' function).\n'
'\n'
' Special read-only attributes: "im_self" is the class '
'instance\n'
' object, "im_func" is the function object; "im_class" is the\n'
' class of "im_self" for bound methods or the class that asked '
'for\n'
' the method for unbound methods; "__doc__" is the method\'s\n'
' documentation (same as "im_func.__doc__"); "__name__" is the\n'
' method name (same as "im_func.__name__"); "__module__" is '
'the\n'
' name of the module the method was defined in, or "None" if\n'
' unavailable.\n'
'\n'
' Changed in version 2.2: "im_self" used to refer to the class\n'
' that defined the method.\n'
'\n'
' Changed in version 2.6: For Python 3 forward-compatibility,\n'
' "im_func" is also available as "__func__", and "im_self" as\n'
' "__self__".\n'
'\n'
' Methods also support accessing (but not setting) the '
'arbitrary\n'
' function attributes on the underlying function object.\n'
'\n'
' User-defined method objects may be created when getting an\n'
' attribute of a class (perhaps via an instance of that class), '
'if\n'
' that attribute is a user-defined function object, an unbound\n'
' user-defined method object, or a class method object. When '
'the\n'
' attribute is a user-defined method object, a new method '
'object\n'
' is only created if the class from which it is being retrieved '
'is\n'
' the same as, or a derived class of, the class stored in the\n'
' original method object; otherwise, the original method object '
'is\n'
' used as it is.\n'
'\n'
' When a user-defined method object is created by retrieving a\n'
' user-defined function object from a class, its "im_self"\n'
' attribute is "None" and the method object is said to be '
'unbound.\n'
' When one is created by retrieving a user-defined function '
'object\n'
' from a class via one of its instances, its "im_self" '
'attribute\n'
' is the instance, and the method object is said to be bound. '
'In\n'
' either case, the new method\'s "im_class" attribute is the '
'class\n'
' from which the retrieval takes place, and its "im_func"\n'
' attribute is the original function object.\n'
'\n'
' When a user-defined method object is created by retrieving\n'
' another method object from a class or instance, the behaviour '
'is\n'
' the same as for a function object, except that the "im_func"\n'
' attribute of the new instance is not the original method '
'object\n'
' but its "im_func" attribute.\n'
'\n'
' When a user-defined method object is created by retrieving a\n'
' class method object from a class or instance, its "im_self"\n'
' attribute is the class itself, and its "im_func" attribute '
'is\n'
' the function object underlying the class method.\n'
'\n'
' When an unbound user-defined method object is called, the\n'
' underlying function ("im_func") is called, with the '
'restriction\n'
' that the first argument must be an instance of the proper '
'class\n'
' ("im_class") or of a derived class thereof.\n'
'\n'
' When a bound user-defined method object is called, the\n'
' underlying function ("im_func") is called, inserting the '
'class\n'
' instance ("im_self") in front of the argument list. For\n'
' instance, when "C" is a class which contains a definition for '
'a\n'
' function "f()", and "x" is an instance of "C", calling '
'"x.f(1)"\n'
' is equivalent to calling "C.f(x, 1)".\n'
'\n'
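' For example:\n'
'\n'
' >>> class C(object):\n'
' ...     def f(self, arg):\n'
' ...         return arg\n'
' ...\n'
' >>> x = C()\n'
' >>> x.f(1) == C.f(x, 1)\n'
' True\n'
'\n'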
' When a user-defined method object is derived from a class '
'method\n'
' object, the "class instance" stored in "im_self" will '
'actually\n'
' be the class itself, so that calling either "x.f(1)" or '
'"C.f(1)"\n'
' is equivalent to calling "f(C,1)" where "f" is the '
'underlying\n'
' function.\n'
'\n'
' Note that the transformation from function object to (unbound '
'or\n'
' bound) method object happens each time the attribute is\n'
' retrieved from the class or instance. In some cases, a '
'fruitful\n'
' optimization is to assign the attribute to a local variable '
'and\n'
' call that local variable. Also notice that this '
'transformation\n'
' only happens for user-defined functions; other callable '
'objects\n'
' (and all non-callable objects) are retrieved without\n'
' transformation. It is also important to note that '
'user-defined\n'
' functions which are attributes of a class instance are not\n'
' converted to bound methods; this *only* happens when the\n'
' function is an attribute of the class.\n'
'\n'
' Generator functions\n'
' A function or method which uses the "yield" statement (see\n'
' section The yield statement) is called a *generator '
'function*.\n'
' Such a function, when called, always returns an iterator '
'object\n'
' which can be used to execute the body of the function: '
'calling\n'
' the iterator\'s "next()" method will cause the function to\n'
' execute until it provides a value using the "yield" '
'statement.\n'
' When the function executes a "return" statement or falls off '
'the\n'
' end, a "StopIteration" exception is raised and the iterator '
'will\n'
' have reached the end of the set of values to be returned.\n'
'\n'
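' For example:\n'
'\n'
' >>> def gen():\n'
' ...     yield 1\n'
' ...     yield 2\n'
' ...\n'
' >>> it = gen()\n'
' >>> it.next()\n'
' 1\n'
' >>> it.next()\n'
' 2\n'
'\n'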
' Built-in functions\n'
' A built-in function object is a wrapper around a C function.\n'
' Examples of built-in functions are "len()" and "math.sin()"\n'
' ("math" is a standard built-in module). The number and type '
'of\n'
' the arguments are determined by the C function. Special '
'read-\n'
' only attributes: "__doc__" is the function\'s documentation\n'
' string, or "None" if unavailable; "__name__" is the '
"function's\n"
' name; "__self__" is set to "None" (but see the next item);\n'
' "__module__" is the name of the module the function was '
'defined\n'
' in or "None" if unavailable.\n'
'\n'
' Built-in methods\n'
' This is really a different disguise of a built-in function, '
'this\n'
' time containing an object passed to the C function as an\n'
' implicit extra argument. An example of a built-in method is\n'
' "alist.append()", assuming *alist* is a list object. In this\n'
' case, the special read-only attribute "__self__" is set to '
'the\n'
' object denoted by *alist*.\n'
'\n'
' Class Types\n'
' Class types, or "new-style classes," are callable. These\n'
' objects normally act as factories for new instances of\n'
' themselves, but variations are possible for class types that\n'
' override "__new__()". The arguments of the call are passed '
'to\n'
' "__new__()" and, in the typical case, to "__init__()" to\n'
' initialize the new instance.\n'
'\n'
' Classic Classes\n'
' Class objects are described below. When a class object is\n'
' called, a new class instance (also described below) is '
'created\n'
" and returned. This implies a call to the class's "
'"__init__()"\n'
' method if it has one. Any arguments are passed on to the\n'
' "__init__()" method. If there is no "__init__()" method, '
'the\n'
' class must be called without arguments.\n'
'\n'
' Class instances\n'
' Class instances are described below. Class instances are\n'
' callable only when the class has a "__call__()" method;\n'
' "x(arguments)" is a shorthand for "x.__call__(arguments)".\n'
'\n'
'Modules\n'
' Modules are imported by the "import" statement (see section The\n'
' import statement). A module object has a namespace implemented '
'by a\n'
' dictionary object (this is the dictionary referenced by the\n'
' func_globals attribute of functions defined in the module).\n'
' Attribute references are translated to lookups in this '
'dictionary,\n'
' e.g., "m.x" is equivalent to "m.__dict__["x"]". A module object\n'
' does not contain the code object used to initialize the module\n'
" (since it isn't needed once the initialization is done).\n"
'\n'
" Attribute assignment updates the module's namespace dictionary,\n"
' e.g., "m.x = 1" is equivalent to "m.__dict__["x"] = 1".\n'
'\n'
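' For example:\n'
'\n'
' >>> import math\n'
" >>> math.pi == math.__dict__['pi']\n"
' True\n'
'\n'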
' Special read-only attribute: "__dict__" is the module\'s '
'namespace\n'
' as a dictionary object.\n'
'\n'
' **CPython implementation detail:** Because of the way CPython\n'
' clears module dictionaries, the module dictionary will be '
'cleared\n'
' when the module falls out of scope even if the dictionary still '
'has\n'
' live references. To avoid this, copy the dictionary or keep '
'the\n'
' module around while using its dictionary directly.\n'
'\n'
' Predefined (writable) attributes: "__name__" is the module\'s '
'name;\n'
' "__doc__" is the module\'s documentation string, or "None" if\n'
' unavailable; "__file__" is the pathname of the file from which '
'the\n'
' module was loaded, if it was loaded from a file. The "__file__"\n'
' attribute is not present for C modules that are statically '
'linked\n'
' into the interpreter; for extension modules loaded dynamically '
'from\n'
' a shared library, it is the pathname of the shared library '
'file.\n'
'\n'
'Classes\n'
' Both class types (new-style classes) and class objects (old-\n'
' style/classic classes) are typically created by class '
'definitions\n'
' (see section Class definitions). A class has a namespace\n'
' implemented by a dictionary object. Class attribute references '
'are\n'
' translated to lookups in this dictionary, e.g., "C.x" is '
'translated\n'
' to "C.__dict__["x"]" (although for new-style classes in '
'particular\n'
' there are a number of hooks which allow for other means of '
'locating\n'
' attributes). When the attribute name is not found there, the\n'
' attribute search continues in the base classes. For old-style\n'
' classes, the search is depth-first, left-to-right in the order '
'of\n'
' occurrence in the base class list. New-style classes use the '
'more\n'
' complex C3 method resolution order which behaves correctly even '
'in\n'
" the presence of 'diamond' inheritance structures where there "
'are\n'
' multiple inheritance paths leading back to a common ancestor.\n'
' Additional details on the C3 MRO used by new-style classes can '
'be\n'
' found in the documentation accompanying the 2.3 release at\n'
' https://www.python.org/download/releases/2.3/mro/.\n'
'\n'
' When a class attribute reference (for class "C", say) would '
'yield a\n'
' user-defined function object or an unbound user-defined method\n'
' object whose associated class is either "C" or one of its base\n'
' classes, it is transformed into an unbound user-defined method\n'
' object whose "im_class" attribute is "C". When it would yield a\n'
' class method object, it is transformed into a bound '
'user-defined\n'
' method object whose "im_self" attribute is "C". When it would\n'
' yield a static method object, it is transformed into the object\n'
' wrapped by the static method object. See section Implementing\n'
' Descriptors for another way in which attributes retrieved from '
'a\n'
' class may differ from those actually contained in its '
'"__dict__"\n'
' (note that only new-style classes support descriptors).\n'
'\n'
" Class attribute assignments update the class's dictionary, "
'never\n'
' the dictionary of a base class.\n'
'\n'
' A class object can be called (see above) to yield a class '
'instance\n'
' (see below).\n'
'\n'
' Special attributes: "__name__" is the class name; "__module__" '
'is\n'
' the module name in which the class was defined; "__dict__" is '
'the\n'
' dictionary containing the class\'s namespace; "__bases__" is a '
'tuple\n'
' (possibly empty or a singleton) containing the base classes, in '
'the\n'
' order of their occurrence in the base class list; "__doc__" is '
'the\n'
' class\'s documentation string, or "None" if undefined.\n'
'\n'
'Class instances\n'
' A class instance is created by calling a class object (see '
'above).\n'
' A class instance has a namespace implemented as a dictionary '
'which\n'
' is the first place in which attribute references are searched.\n'
" When an attribute is not found there, and the instance's class "
'has\n'
' an attribute by that name, the search continues with the class\n'
' attributes. If a class attribute is found that is a '
'user-defined\n'
' function object or an unbound user-defined method object whose\n'
' associated class is the class (call it "C") of the instance for\n'
' which the attribute reference was initiated or one of its bases, '
'it\n'
' is transformed into a bound user-defined method object whose\n'
' "im_class" attribute is "C" and whose "im_self" attribute is '
'the\n'
' instance. Static method and class method objects are also\n'
' transformed, as if they had been retrieved from class "C"; see\n'
' above under "Classes". See section Implementing Descriptors for\n'
' another way in which attributes of a class retrieved via its\n'
' instances may differ from the objects actually stored in the\n'
' class\'s "__dict__". If no class attribute is found, and the\n'
' object\'s class has a "__getattr__()" method, that is called to\n'
' satisfy the lookup.\n'
'\n'
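'   For illustration, a minimal sketch of the "__getattr__()"\n'
'   fallback (the class and attribute names are arbitrary):\n'
'\n'
'      >>> class C:\n'
'      ...     def __getattr__(self, name):\n'
"      ...         return 'fallback'\n"
'      >>> C().missing\n'
"      'fallback'\n"
'\n'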
" Attribute assignments and deletions update the instance's\n"
" dictionary, never a class's dictionary. If the class has a\n"
' "__setattr__()" or "__delattr__()" method, this is called '
'instead\n'
' of updating the instance dictionary directly.\n'
'\n'
' Class instances can pretend to be numbers, sequences, or '
'mappings\n'
' if they have methods with certain special names. See section\n'
' Special method names.\n'
'\n'
' Special attributes: "__dict__" is the attribute dictionary;\n'
' "__class__" is the instance\'s class.\n'
'\n'
'Files\n'
' A file object represents an open file. File objects are created '
'by\n'
' the "open()" built-in function, and also by "os.popen()",\n'
' "os.fdopen()", and the "makefile()" method of socket objects '
'(and\n'
' perhaps by other functions or methods provided by extension\n'
' modules). The objects "sys.stdin", "sys.stdout" and '
'"sys.stderr"\n'
' are initialized to file objects corresponding to the '
"interpreter's\n"
' standard input, output and error streams. See File Objects for\n'
' complete documentation of file objects.\n'
'\n'
'Internal types\n'
' A few types used internally by the interpreter are exposed to '
'the\n'
' user. Their definitions may change with future versions of the\n'
' interpreter, but they are mentioned here for completeness.\n'
'\n'
' Code objects\n'
' Code objects represent *byte-compiled* executable Python '
'code,\n'
' or *bytecode*. The difference between a code object and a\n'
' function object is that the function object contains an '
'explicit\n'
" reference to the function's globals (the module in which it "
'was\n'
' defined), while a code object contains no context; also the\n'
' default argument values are stored in the function object, '
'not\n'
' in the code object (because they represent values calculated '
'at\n'
' run-time). Unlike function objects, code objects are '
'immutable\n'
' and contain no references (directly or indirectly) to '
'mutable\n'
' objects.\n'
'\n'
' Special read-only attributes: "co_name" gives the function '
'name;\n'
' "co_argcount" is the number of positional arguments '
'(including\n'
' arguments with default values); "co_nlocals" is the number '
'of\n'
' local variables used by the function (including arguments);\n'
' "co_varnames" is a tuple containing the names of the local\n'
' variables (starting with the argument names); "co_cellvars" '
'is a\n'
' tuple containing the names of local variables that are\n'
' referenced by nested functions; "co_freevars" is a tuple\n'
' containing the names of free variables; "co_code" is a '
'string\n'
' representing the sequence of bytecode instructions; '
'"co_consts"\n'
' is a tuple containing the literals used by the bytecode;\n'
' "co_names" is a tuple containing the names used by the '
'bytecode;\n'
' "co_filename" is the filename from which the code was '
'compiled;\n'
' "co_firstlineno" is the first line number of the function;\n'
' "co_lnotab" is a string encoding the mapping from bytecode\n'
' offsets to line numbers (for details see the source code of '
'the\n'
' interpreter); "co_stacksize" is the required stack size\n'
' (including local variables); "co_flags" is an integer '
'encoding a\n'
' number of flags for the interpreter.\n'
'\n'
' The following flag bits are defined for "co_flags": bit '
'"0x04"\n'
' is set if the function uses the "*arguments" syntax to accept '
'an\n'
' arbitrary number of positional arguments; bit "0x08" is set '
'if\n'
' the function uses the "**keywords" syntax to accept '
'arbitrary\n'
' keyword arguments; bit "0x20" is set if the function is a\n'
' generator.\n'
'\n'
' Future feature declarations ("from __future__ import '
'division")\n'
' also use bits in "co_flags" to indicate whether a code '
'object\n'
' was compiled with a particular feature enabled: bit "0x2000" '
'is\n'
' set if the function was compiled with future division '
'enabled;\n'
' bits "0x10" and "0x1000" were used in earlier versions of\n'
' Python.\n'
'\n'
' Other bits in "co_flags" are reserved for internal use.\n'
'\n'
' If a code object represents a function, the first item in\n'
' "co_consts" is the documentation string of the function, or\n'
' "None" if undefined.\n'
'\n'
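'      For illustration, a minimal sketch (the names "f" and "g"\n'
'      are arbitrary) inspecting a code object through a\n'
'      function\'s "func_code" attribute:\n'
'\n'
'         >>> def f(a, b=1):\n'
'         ...     return a + b\n'
'         >>> f.func_code.co_name, f.func_code.co_argcount\n'
"         ('f', 2)\n"
'         >>> def g():\n'
'         ...     yield 1\n'
'         >>> bool(g.func_code.co_flags & 0x20)  # generator bit\n'
'         True\n'
'\n'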
' Frame objects\n'
' Frame objects represent execution frames. They may occur in\n'
' traceback objects (see below).\n'
'\n'
' Special read-only attributes: "f_back" is to the previous '
'stack\n'
' frame (towards the caller), or "None" if this is the bottom\n'
' stack frame; "f_code" is the code object being executed in '
'this\n'
' frame; "f_locals" is the dictionary used to look up local\n'
' variables; "f_globals" is used for global variables;\n'
' "f_builtins" is used for built-in (intrinsic) names;\n'
' "f_restricted" is a flag indicating whether the function is\n'
' executing in restricted execution mode; "f_lasti" gives the\n'
' precise instruction (this is an index into the bytecode '
'string\n'
' of the code object).\n'
'\n'
' Special writable attributes: "f_trace", if not "None", is a\n'
' function called at the start of each source code line (this '
'is\n'
' used by the debugger); "f_exc_type", "f_exc_value",\n'
' "f_exc_traceback" represent the last exception raised in the\n'
' parent frame provided another exception was ever raised in '
'the\n'
' current frame (in all other cases they are "None"); '
'"f_lineno"\n'
' is the current line number of the frame --- writing to this '
'from\n'
' within a trace function jumps to the given line (only for '
'the\n'
' bottom-most frame). A debugger can implement a Jump command\n'
' (aka Set Next Statement) by writing to f_lineno.\n'
'\n'
' Traceback objects\n'
' Traceback objects represent a stack trace of an exception. '
'A\n'
' traceback object is created when an exception occurs. When '
'the\n'
' search for an exception handler unwinds the execution stack, '
'at\n'
' each unwound level a traceback object is inserted in front '
'of\n'
' the current traceback. When an exception handler is '
'entered,\n'
' the stack trace is made available to the program. (See '
'section\n'
' The try statement.) It is accessible as "sys.exc_traceback", '
'and\n'
' also as the third item of the tuple returned by\n'
' "sys.exc_info()". The latter is the preferred interface, '
'since\n'
' it works correctly when the program is using multiple '
'threads.\n'
' When the program contains no suitable handler, the stack '
'trace\n'
' is written (nicely formatted) to the standard error stream; '
'if\n'
' the interpreter is interactive, it is also made available to '
'the\n'
' user as "sys.last_traceback".\n'
'\n'
' Special read-only attributes: "tb_next" is the next level in '
'the\n'
' stack trace (towards the frame where the exception occurred), '
'or\n'
' "None" if there is no next level; "tb_frame" points to the\n'
' execution frame of the current level; "tb_lineno" gives the '
'line\n'
' number where the exception occurred; "tb_lasti" indicates '
'the\n'
' precise instruction. The line number and last instruction '
'in\n'
' the traceback may differ from the line number of its frame\n'
' object if the exception occurred in a "try" statement with '
'no\n'
' matching except clause or with a finally clause.\n'
'\n'
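'      For illustration, a minimal sketch that captures a\n'
'      traceback through the preferred "sys.exc_info()" interface\n'
'      (the exception is arbitrary):\n'
'\n'
'         >>> import sys\n'
'         >>> try:\n'
"         ...     raise ValueError('boom')\n"
'         ... except ValueError:\n'
'         ...     tb = sys.exc_info()[2]\n'
'         >>> tb.tb_next is None  # raised in the handling frame\n'
'         True\n'
'\n'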
' Slice objects\n'
' Slice objects are used to represent slices when *extended '
'slice\n'
' syntax* is used. This is a slice using two colons, or '
'multiple\n'
' slices or ellipses separated by commas, e.g., "a[i:j:step]",\n'
' "a[i:j, k:l]", or "a[..., i:j]". They are also created by '
'the\n'
' built-in "slice()" function.\n'
'\n'
' Special read-only attributes: "start" is the lower bound; '
'"stop"\n'
' is the upper bound; "step" is the step value; each is "None" '
'if\n'
' omitted. These attributes can have any type.\n'
'\n'
' Slice objects support one method:\n'
'\n'
' slice.indices(self, length)\n'
'\n'
' This method takes a single integer argument *length* and\n'
' computes information about the extended slice that the '
'slice\n'
' object would describe if applied to a sequence of '
'*length*\n'
' items. It returns a tuple of three integers; '
'respectively\n'
' these are the *start* and *stop* indices and the *step* '
'or\n'
' stride length of the slice. Missing or out-of-bounds '
'indices\n'
' are handled in a manner consistent with regular slices.\n'
'\n'
' New in version 2.3.\n'
'\n'
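'         For illustration (the slice and length are arbitrary):\n'
'\n'
'            >>> s = slice(None, None, 2)\n'
'            >>> s.indices(5)\n'
'            (0, 5, 2)\n'
'\n'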
' Static method objects\n'
' Static method objects provide a way of defeating the\n'
' transformation of function objects to method objects '
'described\n'
' above. A static method object is a wrapper around any other\n'
' object, usually a user-defined method object. When a static\n'
' method object is retrieved from a class or a class instance, '
'the\n'
' object actually returned is the wrapped object, which is not\n'
' subject to any further transformation. Static method objects '
'are\n'
' not themselves callable, although the objects they wrap '
'usually\n'
' are. Static method objects are created by the built-in\n'
' "staticmethod()" constructor.\n'
'\n'
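'      For illustration, a minimal sketch (the class and method\n'
'      names are arbitrary):\n'
'\n'
'         >>> class C(object):\n'
'         ...     @staticmethod\n'
'         ...     def f(x):\n'
'         ...         return x * 2\n'
'         >>> C.f(2)    # the plain wrapped function is returned\n'
'         4\n'
'         >>> C().f(2)  # likewise when retrieved via an instance\n'
'         4\n'
'\n'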
' Class method objects\n'
' A class method object, like a static method object, is a '
'wrapper\n'
' around another object that alters the way in which that '
'object\n'
' is retrieved from classes and class instances. The behaviour '
'of\n'
' class method objects upon such retrieval is described above,\n'
' under "User-defined methods". Class method objects are '
'created\n'
' by the built-in "classmethod()" constructor.\n',
'typesfunctions': '\n'
'Functions\n'
'*********\n'
'\n'
'Function objects are created by function definitions. The '
'only\n'
'operation on a function object is to call it: '
'"func(argument-list)".\n'
'\n'
'There are really two flavors of function objects: built-in '
'functions\n'
'and user-defined functions. Both support the same '
'operation (to call\n'
'the function), but the implementation is different, hence '
'the\n'
'different object types.\n'
'\n'
'See Function definitions for more information.\n',
'typesmapping': '\n'
'Mapping Types --- "dict"\n'
'************************\n'
'\n'
'A *mapping* object maps *hashable* values to arbitrary '
'objects.\n'
'Mappings are mutable objects. There is currently only one '
'standard\n'
'mapping type, the *dictionary*. (For other containers see '
'the built\n'
'in "list", "set", and "tuple" classes, and the "collections" '
'module.)\n'
'\n'
"A dictionary's keys are *almost* arbitrary values. Values "
'that are\n'
'not *hashable*, that is, values containing lists, '
'dictionaries or\n'
'other mutable types (that are compared by value rather than '
'by object\n'
'identity) may not be used as keys. Numeric types used for '
'keys obey\n'
'the normal rules for numeric comparison: if two numbers '
'compare equal\n'
'(such as "1" and "1.0") then they can be used '
'interchangeably to index\n'
'the same dictionary entry. (Note however, that since '
'computers store\n'
'floating-point numbers as approximations it is usually '
'unwise to use\n'
'them as dictionary keys.)\n'
'\n'
'Dictionaries can be created by placing a comma-separated '
'list of "key:\n'
'value" pairs within braces, for example: "{\'jack\': 4098, '
"'sjoerd':\n"
'4127}" or "{4098: \'jack\', 4127: \'sjoerd\'}", or by the '
'"dict"\n'
'constructor.\n'
'\n'
'class dict(**kwarg)\n'
'class dict(mapping, **kwarg)\n'
'class dict(iterable, **kwarg)\n'
'\n'
' Return a new dictionary initialized from an optional '
'positional\n'
' argument and a possibly empty set of keyword arguments.\n'
'\n'
' If no positional argument is given, an empty dictionary '
'is created.\n'
' If a positional argument is given and it is a mapping '
'object, a\n'
' dictionary is created with the same key-value pairs as '
'the mapping\n'
' object. Otherwise, the positional argument must be an '
'*iterable*\n'
' object. Each item in the iterable must itself be an '
'iterable with\n'
' exactly two objects. The first object of each item '
'becomes a key\n'
' in the new dictionary, and the second object the '
'corresponding\n'
' value. If a key occurs more than once, the last value '
'for that key\n'
' becomes the corresponding value in the new dictionary.\n'
'\n'
' If keyword arguments are given, the keyword arguments and '
'their\n'
' values are added to the dictionary created from the '
'positional\n'
' argument. If a key being added is already present, the '
'value from\n'
' the keyword argument replaces the value from the '
'positional\n'
' argument.\n'
'\n'
' To illustrate, the following examples all return a '
'dictionary equal\n'
' to "{"one": 1, "two": 2, "three": 3}":\n'
'\n'
' >>> a = dict(one=1, two=2, three=3)\n'
" >>> b = {'one': 1, 'two': 2, 'three': 3}\n"
" >>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n"
" >>> d = dict([('two', 2), ('one', 1), ('three', 3)])\n"
" >>> e = dict({'three': 3, 'one': 1, 'two': 2})\n"
' >>> a == b == c == d == e\n'
' True\n'
'\n'
' Providing keyword arguments as in the first example only '
'works for\n'
' keys that are valid Python identifiers. Otherwise, any '
'valid keys\n'
' can be used.\n'
'\n'
' New in version 2.2.\n'
'\n'
' Changed in version 2.3: Support for building a dictionary '
'from\n'
' keyword arguments added.\n'
'\n'
' These are the operations that dictionaries support (and '
'therefore,\n'
' custom mapping types should support too):\n'
'\n'
' len(d)\n'
'\n'
' Return the number of items in the dictionary *d*.\n'
'\n'
' d[key]\n'
'\n'
' Return the item of *d* with key *key*. Raises a '
'"KeyError" if\n'
' *key* is not in the map.\n'
'\n'
' If a subclass of dict defines a method "__missing__()" '
'and *key*\n'
' is not present, the "d[key]" operation calls that '
'method with\n'
' the key *key* as argument. The "d[key]" operation '
'then returns\n'
' or raises whatever is returned or raised by the\n'
' "__missing__(key)" call. No other operations or '
'methods invoke\n'
' "__missing__()". If "__missing__()" is not defined, '
'"KeyError"\n'
' is raised. "__missing__()" must be a method; it cannot '
'be an\n'
' instance variable:\n'
'\n'
' >>> class Counter(dict):\n'
' ... def __missing__(self, key):\n'
' ... return 0\n'
' >>> c = Counter()\n'
" >>> c['red']\n"
' 0\n'
" >>> c['red'] += 1\n"
" >>> c['red']\n"
' 1\n'
'\n'
' The example above shows part of the implementation of\n'
' "collections.Counter". A different "__missing__" '
'method is used\n'
' by "collections.defaultdict".\n'
'\n'
' New in version 2.5: Recognition of __missing__ methods '
'of dict\n'
' subclasses.\n'
'\n'
' d[key] = value\n'
'\n'
' Set "d[key]" to *value*.\n'
'\n'
' del d[key]\n'
'\n'
' Remove "d[key]" from *d*. Raises a "KeyError" if '
'*key* is not\n'
' in the map.\n'
'\n'
' key in d\n'
'\n'
' Return "True" if *d* has a key *key*, else "False".\n'
'\n'
' New in version 2.2.\n'
'\n'
' key not in d\n'
'\n'
' Equivalent to "not key in d".\n'
'\n'
' New in version 2.2.\n'
'\n'
' iter(d)\n'
'\n'
' Return an iterator over the keys of the dictionary. '
'This is a\n'
' shortcut for "iterkeys()".\n'
'\n'
' clear()\n'
'\n'
' Remove all items from the dictionary.\n'
'\n'
' copy()\n'
'\n'
' Return a shallow copy of the dictionary.\n'
'\n'
' fromkeys(seq[, value])\n'
'\n'
' Create a new dictionary with keys from *seq* and '
'values set to\n'
' *value*.\n'
'\n'
' "fromkeys()" is a class method that returns a new '
'dictionary.\n'
' *value* defaults to "None".\n'
'\n'
' New in version 2.3.\n'
'\n'
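'      For illustration (the keys are arbitrary; the check avoids\n'
'      relying on key order, which is arbitrary):\n'
'\n'
"         >>> d = dict.fromkeys(['a', 'b'], 0)\n"
"         >>> d['a'], d['b']\n"
'         (0, 0)\n'
'\n'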
' get(key[, default])\n'
'\n'
' Return the value for *key* if *key* is in the '
'dictionary, else\n'
' *default*. If *default* is not given, it defaults to '
'"None", so\n'
' that this method never raises a "KeyError".\n'
'\n'
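'      For illustration (the contents are arbitrary):\n'
'\n'
"         >>> d = {'a': 1}\n"
"         >>> d.get('a')\n"
'         1\n'
"         >>> d.get('b', 0)\n"
'         0\n'
'\n'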
' has_key(key)\n'
'\n'
' Test for the presence of *key* in the dictionary. '
'"has_key()"\n'
' is deprecated in favor of "key in d".\n'
'\n'
' items()\n'
'\n'
' Return a copy of the dictionary\'s list of "(key, '
'value)" pairs.\n'
'\n'
' **CPython implementation detail:** Keys and values are '
'listed in\n'
' an arbitrary order which is non-random, varies across '
'Python\n'
" implementations, and depends on the dictionary's "
'history of\n'
' insertions and deletions.\n'
'\n'
' If "items()", "keys()", "values()", "iteritems()", '
'"iterkeys()",\n'
' and "itervalues()" are called with no intervening '
'modifications\n'
' to the dictionary, the lists will directly '
'correspond. This\n'
' allows the creation of "(value, key)" pairs using '
'"zip()":\n'
' "pairs = zip(d.values(), d.keys())". The same '
'relationship\n'
' holds for the "iterkeys()" and "itervalues()" methods: '
'"pairs =\n'
' zip(d.itervalues(), d.iterkeys())" provides the same '
'value for\n'
' "pairs". Another way to create the same list is "pairs '
'= [(v, k)\n'
' for (k, v) in d.iteritems()]".\n'
'\n'
' iteritems()\n'
'\n'
' Return an iterator over the dictionary\'s "(key, '
'value)" pairs.\n'
' See the note for "dict.items()".\n'
'\n'
' Using "iteritems()" while adding or deleting entries '
'in the\n'
' dictionary may raise a "RuntimeError" or fail to '
'iterate over\n'
' all entries.\n'
'\n'
' New in version 2.2.\n'
'\n'
' iterkeys()\n'
'\n'
" Return an iterator over the dictionary's keys. See "
'the note for\n'
' "dict.items()".\n'
'\n'
' Using "iterkeys()" while adding or deleting entries in '
'the\n'
' dictionary may raise a "RuntimeError" or fail to '
'iterate over\n'
' all entries.\n'
'\n'
' New in version 2.2.\n'
'\n'
' itervalues()\n'
'\n'
" Return an iterator over the dictionary's values. See "
'the note\n'
' for "dict.items()".\n'
'\n'
' Using "itervalues()" while adding or deleting entries '
'in the\n'
' dictionary may raise a "RuntimeError" or fail to '
'iterate over\n'
' all entries.\n'
'\n'
' New in version 2.2.\n'
'\n'
' keys()\n'
'\n'
" Return a copy of the dictionary's list of keys. See "
'the note\n'
' for "dict.items()".\n'
'\n'
' pop(key[, default])\n'
'\n'
' If *key* is in the dictionary, remove it and return '
'its value,\n'
' else return *default*. If *default* is not given and '
'*key* is\n'
' not in the dictionary, a "KeyError" is raised.\n'
'\n'
' New in version 2.3.\n'
'\n'
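'      For illustration (the contents are arbitrary):\n'
'\n'
"         >>> d = {'a': 1}\n"
"         >>> d.pop('a')\n"
'         1\n'
"         >>> d.pop('a', 0)  # key is gone, so the default is used\n"
'         0\n'
'\n'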
' popitem()\n'
'\n'
' Remove and return an arbitrary "(key, value)" pair '
'from the\n'
' dictionary.\n'
'\n'
' "popitem()" is useful to destructively iterate over a\n'
' dictionary, as often used in set algorithms. If the '
'dictionary\n'
' is empty, calling "popitem()" raises a "KeyError".\n'
'\n'
' setdefault(key[, default])\n'
'\n'
' If *key* is in the dictionary, return its value. If '
'not, insert\n'
' *key* with a value of *default* and return *default*. '
'*default*\n'
' defaults to "None".\n'
'\n'
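'      A common sketch is accumulating values under a key (the\n'
'      names here are arbitrary):\n'
'\n'
'         >>> d = {}\n'
"         >>> d.setdefault('k', []).append(1)\n"
"         >>> d['k']\n"
'         [1]\n'
'\n'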
' update([other])\n'
'\n'
' Update the dictionary with the key/value pairs from '
'*other*,\n'
' overwriting existing keys. Return "None".\n'
'\n'
' "update()" accepts either another dictionary object or '
'an\n'
' iterable of key/value pairs (as tuples or other '
'iterables of\n'
' length two). If keyword arguments are specified, the '
'dictionary\n'
' is then updated with those key/value pairs: '
'"d.update(red=1,\n'
' blue=2)".\n'
'\n'
' Changed in version 2.4: Allowed the argument to be an '
'iterable\n'
' of key/value pairs and allowed keyword arguments.\n'
'\n'
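'      For illustration (the keys and values are arbitrary):\n'
'\n'
"         >>> d = {'red': 0}\n"
'         >>> d.update(red=1, blue=2)\n'
"         >>> d['red'], d['blue']\n"
'         (1, 2)\n'
'\n'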
' values()\n'
'\n'
" Return a copy of the dictionary's list of values. See "
'the note\n'
' for "dict.items()".\n'
'\n'
' viewitems()\n'
'\n'
' Return a new view of the dictionary\'s items ("(key, '
'value)"\n'
' pairs). See below for documentation of view objects.\n'
'\n'
' New in version 2.7.\n'
'\n'
' viewkeys()\n'
'\n'
" Return a new view of the dictionary's keys. See below "
'for\n'
' documentation of view objects.\n'
'\n'
' New in version 2.7.\n'
'\n'
' viewvalues()\n'
'\n'
" Return a new view of the dictionary's values. See "
'below for\n'
' documentation of view objects.\n'
'\n'
' New in version 2.7.\n'
'\n'
' Dictionaries compare equal if and only if they have the '
'same "(key,\n'
' value)" pairs.\n'
'\n'
'\n'
'Dictionary view objects\n'
'=======================\n'
'\n'
'The objects returned by "dict.viewkeys()", '
'"dict.viewvalues()" and\n'
'"dict.viewitems()" are *view objects*. They provide a '
'dynamic view on\n'
"the dictionary's entries, which means that when the "
'dictionary\n'
'changes, the view reflects these changes.\n'
'\n'
'Dictionary views can be iterated over to yield their '
'respective data,\n'
'and support membership tests:\n'
'\n'
'len(dictview)\n'
'\n'
' Return the number of entries in the dictionary.\n'
'\n'
'iter(dictview)\n'
'\n'
' Return an iterator over the keys, values or items '
'(represented as\n'
' tuples of "(key, value)") in the dictionary.\n'
'\n'
' Keys and values are iterated over in an arbitrary order '
'which is\n'
' non-random, varies across Python implementations, and '
'depends on\n'
" the dictionary's history of insertions and deletions. If "
'keys,\n'
' values and items views are iterated over with no '
'intervening\n'
' modifications to the dictionary, the order of items will '
'directly\n'
' correspond. This allows the creation of "(value, key)" '
'pairs using\n'
' "zip()": "pairs = zip(d.values(), d.keys())". Another '
'way to\n'
' create the same list is "pairs = [(v, k) for (k, v) in '
'd.items()]".\n'
'\n'
' Iterating views while adding or deleting entries in the '
'dictionary\n'
' may raise a "RuntimeError" or fail to iterate over all '
'entries.\n'
'\n'
'x in dictview\n'
'\n'
' Return "True" if *x* is in the underlying dictionary\'s '
'keys, values\n'
' or items (in the latter case, *x* should be a "(key, '
'value)"\n'
' tuple).\n'
'\n'
'Keys views are set-like since their entries are unique and '
'hashable.\n'
'If all values are hashable, so that (key, value) pairs are '
'unique and\n'
'hashable, then the items view is also set-like. (Values '
'views are not\n'
'treated as set-like since the entries are generally not '
'unique.) Then\n'
'these set operations are available ("other" refers either to '
'another\n'
'view or a set):\n'
'\n'
'dictview & other\n'
'\n'
' Return the intersection of the dictview and the other '
'object as a\n'
' new set.\n'
'\n'
'dictview | other\n'
'\n'
' Return the union of the dictview and the other object as '
'a new set.\n'
'\n'
'dictview - other\n'
'\n'
' Return the difference between the dictview and the other '
'object\n'
" (all elements in *dictview* that aren't in *other*) as a "
'new set.\n'
'\n'
'dictview ^ other\n'
'\n'
' Return the symmetric difference (all elements either in '
'*dictview*\n'
' or *other*, but not in both) of the dictview and the '
'other object\n'
' as a new set.\n'
'\n'
'An example of dictionary view usage:\n'
'\n'
" >>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, "
"'spam': 500}\n"
' >>> keys = dishes.viewkeys()\n'
' >>> values = dishes.viewvalues()\n'
'\n'
' >>> # iteration\n'
' >>> n = 0\n'
' >>> for val in values:\n'
' ... n += val\n'
' >>> print(n)\n'
' 504\n'
'\n'
' >>> # keys and values are iterated over in the same '
'order\n'
' >>> list(keys)\n'
" ['eggs', 'bacon', 'sausage', 'spam']\n"
' >>> list(values)\n'
' [2, 1, 1, 500]\n'
'\n'
' >>> # view objects are dynamic and reflect dict changes\n'
" >>> del dishes['eggs']\n"
" >>> del dishes['sausage']\n"
' >>> list(keys)\n'
" ['spam', 'bacon']\n"
'\n'
' >>> # set operations\n'
" >>> keys & {'eggs', 'bacon', 'salad'}\n"
" {'bacon'}\n",
'typesmethods': '\n'
'Methods\n'
'*******\n'
'\n'
'Methods are functions that are called using the attribute '
'notation.\n'
'There are two flavors: built-in methods (such as "append()" '
'on lists)\n'
'and class instance methods. Built-in methods are described '
'with the\n'
'types that support them.\n'
'\n'
'The implementation adds two special read-only attributes to '
'class\n'
'instance methods: "m.im_self" is the object on which the '
'method\n'
'operates, and "m.im_func" is the function implementing the '
'method.\n'
'Calling "m(arg-1, arg-2, ..., arg-n)" is completely '
'equivalent to\n'
'calling "m.im_func(m.im_self, arg-1, arg-2, ..., arg-n)".\n'
'\n'
'Class instance methods are either *bound* or *unbound*, '
'referring to\n'
'whether the method was accessed through an instance or a '
'class,\n'
'respectively. When a method is unbound, its "im_self" '
'attribute will\n'
'be "None" and if called, an explicit "self" object must be '
'passed as\n'
'the first argument. In this case, "self" must be an '
'instance of the\n'
"unbound method's class (or a subclass of that class), "
'otherwise a\n'
'"TypeError" is raised.\n'
'\n'
'Like function objects, method objects support getting '
'arbitrary\n'
'attributes. However, since method attributes are actually '
'stored on\n'
'the underlying function object ("meth.im_func"), setting '
'method\n'
'attributes on either bound or unbound methods is '
'disallowed.\n'
'Attempting to set an attribute on a method results in an\n'
'"AttributeError" being raised. In order to set a method '
'attribute,\n'
'you need to explicitly set it on the underlying function '
'object:\n'
'\n'
' >>> class C:\n'
' ... def method(self):\n'
' ... pass\n'
' ...\n'
' >>> c = C()\n'
" >>> c.method.whoami = 'my name is method' # can't set on "
'the method\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" AttributeError: 'instancemethod' object has no attribute "
"'whoami'\n"
" >>> c.method.im_func.whoami = 'my name is method'\n"
' >>> c.method.whoami\n'
" 'my name is method'\n"
'\n'
'See The standard type hierarchy for more information.\n',
'typesmodules': '\n'
'Modules\n'
'*******\n'
'\n'
'The only special operation on a module is attribute access: '
'"m.name",\n'
'where *m* is a module and *name* accesses a name defined in '
"*m*'s\n"
'symbol table. Module attributes can be assigned to. (Note '
'that the\n'
'"import" statement is not, strictly speaking, an operation '
'on a module\n'
'object; "import foo" does not require a module object named '
'*foo* to\n'
'exist, rather it requires an (external) *definition* for a '
'module\n'
'named *foo* somewhere.)\n'
'\n'
'A special attribute of every module is "__dict__". This is '
'the\n'
"dictionary containing the module's symbol table. Modifying "
'this\n'
"dictionary will actually change the module's symbol table, "
'but direct\n'
'assignment to the "__dict__" attribute is not possible (you '
'can write\n'
'"m.__dict__[\'a\'] = 1", which defines "m.a" to be "1", but '
"you can't\n"
'write "m.__dict__ = {}"). Modifying "__dict__" directly is '
'not\n'
'recommended.\n'
'\n'
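'For illustration, a minimal sketch using an empty module created\n'
'with "types.ModuleType" (the module name "m" is arbitrary):\n'
'\n'
'   >>> import types\n'
"   >>> m = types.ModuleType('m')\n"
"   >>> m.__dict__['a'] = 1\n"
'   >>> m.a\n'
'   1\n'
'\n'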
'Modules built into the interpreter are written like this: '
'"<module\n'
'\'sys\' (built-in)>". If loaded from a file, they are '
'written as\n'
'"<module \'os\' from '
'\'/usr/local/lib/pythonX.Y/os.pyc\'>".\n',
'typesseq': '\n'
'Sequence Types --- "str", "unicode", "list", "tuple", '
'"bytearray", "buffer", "xrange"\n'
'*************************************************************************************\n'
'\n'
'There are seven sequence types: strings, Unicode strings, '
'lists,\n'
'tuples, bytearrays, buffers, and xrange objects.\n'
'\n'
'For other containers see the built in "dict" and "set" classes, '
'and\n'
'the "collections" module.\n'
'\n'
'String literals are written in single or double quotes: '
'"\'xyzzy\'",\n'
'""frobozz"". See String literals for more about string '
'literals.\n'
'Unicode strings are much like strings, but are specified in the '
'syntax\n'
'using a preceding "\'u\'" character: "u\'abc\'", "u"def"". In '
'addition to\n'
'the functionality described here, there are also '
'string-specific\n'
'methods described in the String Methods section. Lists are '
'constructed\n'
'with square brackets, separating items with commas: "[a, b, '
'c]".\n'
'Tuples are constructed by the comma operator (not within square\n'
'brackets), with or without enclosing parentheses, but an empty '
'tuple\n'
'must have the enclosing parentheses, such as "a, b, c" or "()". '
'A\n'
'single item tuple must have a trailing comma, such as "(d,)".\n'
'\n'
'Bytearray objects are created with the built-in function\n'
'"bytearray()".\n'
'\n'
'Buffer objects are not directly supported by Python syntax, but '
'can be\n'
'created by calling the built-in function "buffer()". They '
"don't\n"
'support concatenation or repetition.\n'
'\n'
'Objects of type xrange are similar to buffers in that there is '
'no\n'
'specific syntax to create them, but they are created using the\n'
'"xrange()" function. They don\'t support slicing, concatenation '
'or\n'
'repetition, and using "in", "not in", "min()" or "max()" on them '
'is\n'
'inefficient.\n'
'\n'
'Most sequence types support the following operations. The "in" '
'and\n'
'"not in" operations have the same priorities as the comparison\n'
'operations. The "+" and "*" operations have the same priority '
'as the\n'
'corresponding numeric operations. [3] Additional methods are '
'provided\n'
'for Mutable Sequence Types.\n'
'\n'
'This table lists the sequence operations sorted in ascending '
'priority.\n'
'In the table, *s* and *t* are sequences of the same type; *n*, '
'*i* and\n'
'*j* are integers:\n'
'\n'
'+--------------------+----------------------------------+------------+\n'
'| Operation | Result | '
'Notes |\n'
'+====================+==================================+============+\n'
'| "x in s" | "True" if an item of *s* is | '
'(1) |\n'
'| | equal to *x*, else "False" '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "x not in s" | "False" if an item of *s* is | '
'(1) |\n'
'| | equal to *x*, else "True" '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s + t" | the concatenation of *s* and *t* | '
'(6) |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s * n, n * s" | equivalent to adding *s* to | '
'(2) |\n'
'| | itself *n* times '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s[i]" | *i*th item of *s*, origin 0 | '
'(3) |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s[i:j]" | slice of *s* from *i* to *j* | '
'(3)(4) |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s[i:j:k]" | slice of *s* from *i* to *j* | '
'(3)(5) |\n'
'| | with step *k* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "len(s)" | length of *s* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "min(s)" | smallest item of *s* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "max(s)" | largest item of *s* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s.index(x)" | index of the first occurrence of '
'| |\n'
'| | *x* in *s* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'| "s.count(x)" | total number of occurrences of '
'| |\n'
'| | *x* in *s* '
'| |\n'
'+--------------------+----------------------------------+------------+\n'
'\n'
'Sequence types also support comparisons. In particular, tuples '
'and\n'
'lists are compared lexicographically by comparing corresponding\n'
'elements. This means that to compare equal, every element must '
'compare\n'
'equal and the two sequences must be of the same type and have '
'the same\n'
'length. (For full details see Comparisons in the language '
'reference.)\n'
'\n'
'Notes:\n'
'\n'
'1. When *s* is a string or Unicode string object the "in" and '
'"not\n'
' in" operations act like a substring test. In Python '
'versions\n'
' before 2.3, *x* had to be a string of length 1. In Python 2.3 '
'and\n'
' beyond, *x* may be a string of any length.\n'
'\n'
'2. Values of *n* less than "0" are treated as "0" (which yields '
'an\n'
' empty sequence of the same type as *s*). Note that items in '
'the\n'
' sequence *s* are not copied; they are referenced multiple '
'times.\n'
' This often haunts new Python programmers; consider:\n'
'\n'
' >>> lists = [[]] * 3\n'
' >>> lists\n'
' [[], [], []]\n'
' >>> lists[0].append(3)\n'
' >>> lists\n'
' [[3], [3], [3]]\n'
'\n'
' What has happened is that "[[]]" is a one-element list '
'containing\n'
' an empty list, so all three elements of "[[]] * 3" are '
'references\n'
' to this single empty list. Modifying any of the elements of\n'
' "lists" modifies this single list. You can create a list of\n'
' different lists this way:\n'
'\n'
' >>> lists = [[] for i in range(3)]\n'
' >>> lists[0].append(3)\n'
' >>> lists[1].append(5)\n'
' >>> lists[2].append(7)\n'
' >>> lists\n'
' [[3], [5], [7]]\n'
'\n'
' Further explanation is available in the FAQ entry How do I '
'create a\n'
' multidimensional list?.\n'
'\n'
'3. If *i* or *j* is negative, the index is relative to the end '
'of\n'
' sequence *s*: "len(s) + i" or "len(s) + j" is substituted. '
'But\n'
' note that "-0" is still "0".\n'
'\n'
'4. The slice of *s* from *i* to *j* is defined as the sequence '
'of\n'
' items with index *k* such that "i <= k < j". If *i* or *j* '
'is\n'
' greater than "len(s)", use "len(s)". If *i* is omitted or '
'"None",\n'
' use "0". If *j* is omitted or "None", use "len(s)". If *i* '
'is\n'
' greater than or equal to *j*, the slice is empty.\n'
'\n'
'5. The slice of *s* from *i* to *j* with step *k* is defined as '
'the\n'
' sequence of items with index "x = i + n*k" such that "0 <= n '
'<\n'
' (j-i)/k". In other words, the indices are "i", "i+k", '
'"i+2*k",\n'
' "i+3*k" and so on, stopping when *j* is reached (but never\n'
' including *j*). When *k* is positive, *i* and *j* are '
'reduced to\n'
' "len(s)" if they are greater. When *k* is negative, *i* and '
'*j* are\n'
' reduced to "len(s) - 1" if they are greater. If *i* or *j* '
'are\n'
' omitted or "None", they become "end" values (which end '
'depends on\n'
' the sign of *k*). Note, *k* cannot be zero. If *k* is '
'"None", it\n'
' is treated like "1".\n'
'\n'
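'   For illustration (the string is arbitrary):\n'
'\n'
"      >>> s = 'abcdefg'\n"
'      >>> s[::2]\n'
"      'aceg'\n"
'      >>> s[::-1]\n'
"      'gfedcba'\n"
'\n'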
'6. **CPython implementation detail:** If *s* and *t* are both\n'
' strings, some Python implementations such as CPython can '
'usually\n'
' perform an in-place optimization for assignments of the form '
'"s = s\n'
' + t" or "s += t". When applicable, this optimization makes\n'
' quadratic run-time much less likely. This optimization is '
'both\n'
' version and implementation dependent. For performance '
'sensitive\n'
' code, it is preferable to use the "str.join()" method which '
'assures\n'
' consistent linear concatenation performance across versions '
'and\n'
' implementations.\n'
'\n'
' Changed in version 2.4: Formerly, string concatenation never\n'
' occurred in-place.\n'
'\n'
'\n'
'String Methods\n'
'==============\n'
'\n'
'Below are listed the string methods which both 8-bit strings '
'and\n'
'Unicode objects support. Some of them are also available on\n'
'"bytearray" objects.\n'
'\n'
"In addition, Python's strings support the sequence type methods\n"
'described in the Sequence Types --- str, unicode, list, tuple,\n'
'bytearray, buffer, xrange section. To output formatted strings '
'use\n'
'template strings or the "%" operator described in the String\n'
'Formatting Operations section. Also, see the "re" module for '
'string\n'
'functions based on regular expressions.\n'
'\n'
'str.capitalize()\n'
'\n'
' Return a copy of the string with its first character '
'capitalized\n'
' and the rest lowercased.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.center(width[, fillchar])\n'
'\n'
' Return centered in a string of length *width*. Padding is '
'done\n'
' using the specified *fillchar* (default is a space).\n'
'\n'
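'   For example (the fill character is arbitrary):\n'
'\n'
"      >>> 'py'.center(6, '-')\n"
"      '--py--'\n"
'\n'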
' Changed in version 2.4: Support for the *fillchar* argument.\n'
'\n'
'str.count(sub[, start[, end]])\n'
'\n'
' Return the number of non-overlapping occurrences of substring '
'*sub*\n'
' in the range [*start*, *end*]. Optional arguments *start* '
'and\n'
' *end* are interpreted as in slice notation.\n'
'\n'
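'   For example:\n'
'\n'
"      >>> 'banana'.count('an')\n"
'      2\n'
'\n'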
'str.decode([encoding[, errors]])\n'
'\n'
' Decodes the string using the codec registered for '
'*encoding*.\n'
' *encoding* defaults to the default string encoding. *errors* '
'may\n'
' be given to set a different error handling scheme. The '
'default is\n'
' "\'strict\'", meaning that encoding errors raise '
'"UnicodeError".\n'
' Other possible values are "\'ignore\'", "\'replace\'" and any '
'other\n'
' name registered via "codecs.register_error()", see section '
'Codec\n'
' Base Classes.\n'
'\n'
' New in version 2.2.\n'
'\n'
' Changed in version 2.3: Support for other error handling '
'schemes\n'
' added.\n'
'\n'
' Changed in version 2.7: Support for keyword arguments added.\n'
'\n'
'str.encode([encoding[, errors]])\n'
'\n'
' Return an encoded version of the string. Default encoding is '
'the\n'
' current default string encoding. *errors* may be given to '
'set a\n'
' different error handling scheme. The default for *errors* '
'is\n'
' "\'strict\'", meaning that encoding errors raise a '
'"UnicodeError".\n'
' Other possible values are "\'ignore\'", "\'replace\'",\n'
' "\'xmlcharrefreplace\'", "\'backslashreplace\'" and any other '
'name\n'
' registered via "codecs.register_error()", see section Codec '
'Base\n'
' Classes. For a list of possible encodings, see section '
'Standard\n'
' Encodings.\n'
'\n'
' New in version 2.0.\n'
'\n'
' Changed in version 2.3: Support for "\'xmlcharrefreplace\'" '
'and\n'
' "\'backslashreplace\'" and other error handling schemes '
'added.\n'
'\n'
' Changed in version 2.7: Support for keyword arguments added.\n'
'\n'
'str.endswith(suffix[, start[, end]])\n'
'\n'
' Return "True" if the string ends with the specified '
'*suffix*,\n'
' otherwise return "False". *suffix* can also be a tuple of '
'suffixes\n'
' to look for. With optional *start*, test beginning at that\n'
' position. With optional *end*, stop comparing at that '
'position.\n'
'\n'
' Changed in version 2.5: Accept tuples as *suffix*.\n'
'\n'
'str.expandtabs([tabsize])\n'
'\n'
' Return a copy of the string where all tab characters are '
'replaced\n'
' by one or more spaces, depending on the current column and '
'the\n'
' given tab size. Tab positions occur every *tabsize* '
'characters\n'
' (default is 8, giving tab positions at columns 0, 8, 16 and '
'so on).\n'
' To expand the string, the current column is set to zero and '
'the\n'
' string is examined character by character. If the character '
'is a\n'
' tab ("\\t"), one or more space characters are inserted in the '
'result\n'
' until the current column is equal to the next tab position. '
'(The\n'
' tab character itself is not copied.) If the character is a '
'newline\n'
' ("\\n") or return ("\\r"), it is copied and the current '
'column is\n'
' reset to zero. Any other character is copied unchanged and '
'the\n'
' current column is incremented by one regardless of how the\n'
' character is represented when printed.\n'
'\n'
" >>> '01\\t012\\t0123\\t01234'.expandtabs()\n"
" '01 012 0123 01234'\n"
" >>> '01\\t012\\t0123\\t01234'.expandtabs(4)\n"
" '01 012 0123 01234'\n"
'\n'
'str.find(sub[, start[, end]])\n'
'\n'
' Return the lowest index in the string where substring *sub* '
'is\n'
' found within the slice "s[start:end]". Optional arguments '
'*start*\n'
' and *end* are interpreted as in slice notation. Return "-1" '
'if\n'
' *sub* is not found.\n'
'\n'
' Note: The "find()" method should be used only if you need to '
'know\n'
' the position of *sub*. To check if *sub* is a substring or '
'not,\n'
' use the "in" operator:\n'
'\n'
" >>> 'Py' in 'Python'\n"
' True\n'
'\n'
'str.format(*args, **kwargs)\n'
'\n'
' Perform a string formatting operation. The string on which '
'this\n'
' method is called can contain literal text or replacement '
'fields\n'
' delimited by braces "{}". Each replacement field contains '
'either\n'
' the numeric index of a positional argument, or the name of a\n'
' keyword argument. Returns a copy of the string where each\n'
' replacement field is replaced with the string value of the\n'
' corresponding argument.\n'
'\n'
' >>> "The sum of 1 + 2 is {0}".format(1+2)\n'
" 'The sum of 1 + 2 is 3'\n"
'\n'
' See Format String Syntax for a description of the various\n'
' formatting options that can be specified in format strings.\n'
'\n'
' This method of string formatting is the new standard in '
'Python 3,\n'
' and should be preferred to the "%" formatting described in '
'String\n'
' Formatting Operations in new code.\n'
'\n'
' New in version 2.6.\n'
'\n'
'str.index(sub[, start[, end]])\n'
'\n'
' Like "find()", but raise "ValueError" when the substring is '
'not\n'
' found.\n'
'\n'
'str.isalnum()\n'
'\n'
' Return true if all characters in the string are alphanumeric '
'and\n'
' there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isalpha()\n'
'\n'
' Return true if all characters in the string are alphabetic '
'and\n'
' there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isdigit()\n'
'\n'
' Return true if all characters in the string are digits and '
'there is\n'
' at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.islower()\n'
'\n'
' Return true if all cased characters [4] in the string are '
'lowercase\n'
' and there is at least one cased character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isspace()\n'
'\n'
' Return true if there are only whitespace characters in the '
'string\n'
' and there is at least one character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.istitle()\n'
'\n'
' Return true if the string is a titlecased string and there is '
'at\n'
' least one character, for example uppercase characters may '
'only\n'
' follow uncased characters and lowercase characters only cased '
'ones.\n'
' Return false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.isupper()\n'
'\n'
' Return true if all cased characters [4] in the string are '
'uppercase\n'
' and there is at least one cased character, false otherwise.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.join(iterable)\n'
'\n'
' Return a string which is the concatenation of the strings in\n'
' *iterable*. A "TypeError" will be raised if there are any '
'non-\n'
' string values in *iterable*, including "bytes" objects. The\n'
' separator between elements is the string providing this '
'method.\n'
'\n'
'str.ljust(width[, fillchar])\n'
'\n'
' Return the string left justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default is a\n'
' space). The original string is returned if *width* is less '
'than or\n'
' equal to "len(s)".\n'
'\n'
' Changed in version 2.4: Support for the *fillchar* argument.\n'
'\n'
'str.lower()\n'
'\n'
' Return a copy of the string with all the cased characters '
'[4]\n'
' converted to lowercase.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.lstrip([chars])\n'
'\n'
' Return a copy of the string with leading characters removed. '
'The\n'
' *chars* argument is a string specifying the set of characters '
'to be\n'
' removed. If omitted or "None", the *chars* argument defaults '
'to\n'
' removing whitespace. The *chars* argument is not a prefix; '
'rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.lstrip()\n"
" 'spacious '\n"
" >>> 'www.example.com'.lstrip('cmowz.')\n"
" 'example.com'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* argument.\n'
'\n'
'str.partition(sep)\n'
'\n'
' Split the string at the first occurrence of *sep*, and return '
'a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the separator '
'is not\n'
' found, return a 3-tuple containing the string itself, '
'followed by\n'
' two empty strings.\n'
'\n'
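'   For example:\n'
'\n'
"      >>> 'key=value'.partition('=')\n"
"      ('key', '=', 'value')\n"
'\n'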
' New in version 2.5.\n'
'\n'
'str.replace(old, new[, count])\n'
'\n'
' Return a copy of the string with all occurrences of substring '
'*old*\n'
' replaced by *new*. If the optional argument *count* is '
'given, only\n'
' the first *count* occurrences are replaced.\n'
'\n'
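'   For example (only the first occurrence is replaced here):\n'
'\n'
"      >>> 'one one two'.replace('one', '1', 1)\n"
"      '1 one two'\n"
'\n'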
'str.rfind(sub[, start[, end]])\n'
'\n'
' Return the highest index in the string where substring *sub* '
'is\n'
' found, such that *sub* is contained within "s[start:end]".\n'
' Optional arguments *start* and *end* are interpreted as in '
'slice\n'
' notation. Return "-1" on failure.\n'
'\n'
'str.rindex(sub[, start[, end]])\n'
'\n'
' Like "rfind()" but raises "ValueError" when the substring '
'*sub* is\n'
' not found.\n'
'\n'
'str.rjust(width[, fillchar])\n'
'\n'
' Return the string right justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default is a\n'
' space). The original string is returned if *width* is less '
'than or\n'
' equal to "len(s)".\n'
'\n'
' Changed in version 2.4: Support for the *fillchar* argument.\n'
'\n'
'str.rpartition(sep)\n'
'\n'
' Split the string at the last occurrence of *sep*, and return '
'a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the separator '
'is not\n'
' found, return a 3-tuple containing two empty strings, '
'followed by\n'
' the string itself.\n'
'\n'
' New in version 2.5.\n'
'\n'
'str.rsplit([sep[, maxsplit]])\n'
'\n'
' Return a list of the words in the string, using *sep* as the\n'
' delimiter string. If *maxsplit* is given, at most *maxsplit* '
'splits\n'
' are done, the *rightmost* ones. If *sep* is not specified '
'or\n'
' "None", any whitespace string is a separator. Except for '
'splitting\n'
' from the right, "rsplit()" behaves like "split()" which is\n'
' described in detail below.\n'
'\n'
' New in version 2.4.\n'
'\n'
'str.rstrip([chars])\n'
'\n'
' Return a copy of the string with trailing characters '
'removed. The\n'
' *chars* argument is a string specifying the set of characters '
'to be\n'
' removed. If omitted or "None", the *chars* argument defaults '
'to\n'
' removing whitespace. The *chars* argument is not a suffix; '
'rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.rstrip()\n"
" ' spacious'\n"
" >>> 'mississippi'.rstrip('ipz')\n"
" 'mississ'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* argument.\n'
'\n'
'str.split([sep[, maxsplit]])\n'
'\n'
' Return a list of the words in the string, using *sep* as the\n'
' delimiter string. If *maxsplit* is given, at most '
'*maxsplit*\n'
' splits are done (thus, the list will have at most '
'"maxsplit+1"\n'
' elements). If *maxsplit* is not specified or "-1", then '
'there is\n'
' no limit on the number of splits (all possible splits are '
'made).\n'
'\n'
' If *sep* is given, consecutive delimiters are not grouped '
'together\n'
' and are deemed to delimit empty strings (for example,\n'
' "\'1,,2\'.split(\',\')" returns "[\'1\', \'\', \'2\']"). The '
'*sep* argument\n'
' may consist of multiple characters (for example,\n'
' "\'1<>2<>3\'.split(\'<>\')" returns "[\'1\', \'2\', \'3\']"). '
'Splitting an\n'
' empty string with a specified separator returns "[\'\']".\n'
'\n'
' If *sep* is not specified or is "None", a different '
'splitting\n'
' algorithm is applied: runs of consecutive whitespace are '
'regarded\n'
' as a single separator, and the result will contain no empty '
'strings\n'
' at the start or end if the string has leading or trailing\n'
' whitespace. Consequently, splitting an empty string or a '
'string\n'
' consisting of just whitespace with a "None" separator returns '
'"[]".\n'
'\n'
'   For example, "\' 1  2   3  \'.split()" returns "[\'1\',\n'
'   \'2\', \'3\']", and "\' 1  2   3  \'.split(None, 1)" returns\n'
'   "[\'1\', \'2 3  \']".\n'
'\n'
'str.splitlines([keepends])\n'
'\n'
' Return a list of the lines in the string, breaking at line\n'
' boundaries. This method uses the *universal newlines* '
'approach to\n'
' splitting lines. Line breaks are not included in the '
'resulting list\n'
' unless *keepends* is given and true.\n'
'\n'
' Python recognizes ""\\r"", ""\\n"", and ""\\r\\n"" as line '
'boundaries\n'
' for 8-bit strings.\n'
'\n'
' For example:\n'
'\n'
" >>> 'ab c\\n\\nde fg\\rkl\\r\\n'.splitlines()\n"
" ['ab c', '', 'de fg', 'kl']\n"
" >>> 'ab c\\n\\nde fg\\rkl\\r\\n'.splitlines(True)\n"
" ['ab c\\n', '\\n', 'de fg\\r', 'kl\\r\\n']\n"
'\n'
' Unlike "split()" when a delimiter string *sep* is given, '
'this\n'
' method returns an empty list for the empty string, and a '
'terminal\n'
' line break does not result in an extra line:\n'
'\n'
' >>> "".splitlines()\n'
' []\n'
' >>> "One line\\n".splitlines()\n'
" ['One line']\n"
'\n'
' For comparison, "split(\'\\n\')" gives:\n'
'\n'
" >>> ''.split('\\n')\n"
" ['']\n"
" >>> 'Two lines\\n'.split('\\n')\n"
" ['Two lines', '']\n"
'\n'
'unicode.splitlines([keepends])\n'
'\n'
' Return a list of the lines in the string, like '
'"str.splitlines()".\n'
' However, the Unicode method splits on the following line\n'
' boundaries, which are a superset of the *universal newlines*\n'
' recognized for 8-bit strings.\n'
'\n'
' +-------------------------+-------------------------------+\n'
' | Representation | Description |\n'
' +=========================+===============================+\n'
' | "\\n" | Line Feed |\n'
' +-------------------------+-------------------------------+\n'
' | "\\r" | Carriage Return |\n'
' +-------------------------+-------------------------------+\n'
' | "\\r\\n" | Carriage Return + Line Feed '
'|\n'
' +-------------------------+-------------------------------+\n'
' | "\\v" or "\\x0b" | Line Tabulation '
'|\n'
' +-------------------------+-------------------------------+\n'
' | "\\f" or "\\x0c" | Form Feed '
'|\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1c" | File Separator |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1d" | Group Separator |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x1e" | Record Separator |\n'
' +-------------------------+-------------------------------+\n'
' | "\\x85" | Next Line (C1 Control Code) |\n'
' +-------------------------+-------------------------------+\n'
' | "\\u2028" | Line Separator |\n'
' +-------------------------+-------------------------------+\n'
' | "\\u2029" | Paragraph Separator |\n'
' +-------------------------+-------------------------------+\n'
'\n'
' Changed in version 2.7: "\\v" and "\\f" added to list of '
'line\n'
' boundaries.\n'
'\n'
'str.startswith(prefix[, start[, end]])\n'
'\n'
' Return "True" if string starts with the *prefix*, otherwise '
'return\n'
' "False". *prefix* can also be a tuple of prefixes to look '
'for.\n'
' With optional *start*, test string beginning at that '
'position.\n'
' With optional *end*, stop comparing string at that position.\n'
'\n'
' Changed in version 2.5: Accept tuples as *prefix*.\n'
'\n'
'str.strip([chars])\n'
'\n'
' Return a copy of the string with the leading and trailing\n'
' characters removed. The *chars* argument is a string '
'specifying the\n'
' set of characters to be removed. If omitted or "None", the '
'*chars*\n'
' argument defaults to removing whitespace. The *chars* '
'argument is\n'
' not a prefix or suffix; rather, all combinations of its '
'values are\n'
' stripped:\n'
'\n'
" >>> ' spacious '.strip()\n"
" 'spacious'\n"
" >>> 'www.example.com'.strip('cmowz.')\n"
" 'example'\n"
'\n'
' Changed in version 2.2.2: Support for the *chars* argument.\n'
'\n'
'str.swapcase()\n'
'\n'
' Return a copy of the string with uppercase characters '
'converted to\n'
' lowercase and vice versa.\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.title()\n'
'\n'
' Return a titlecased version of the string where words start '
'with an\n'
' uppercase character and the remaining characters are '
'lowercase.\n'
'\n'
' The algorithm uses a simple language-independent definition '
'of a\n'
' word as groups of consecutive letters. The definition works '
'in\n'
' many contexts but it means that apostrophes in contractions '
'and\n'
' possessives form word boundaries, which may not be the '
'desired\n'
' result:\n'
'\n'
' >>> "they\'re bill\'s friends from the UK".title()\n'
' "They\'Re Bill\'S Friends From The Uk"\n'
'\n'
' A workaround for apostrophes can be constructed using '
'regular\n'
' expressions:\n'
'\n'
' >>> import re\n'
' >>> def titlecase(s):\n'
' ... return re.sub(r"[A-Za-z]+(\'[A-Za-z]+)?",\n'
' ... lambda mo: mo.group(0)[0].upper() +\n'
' ... mo.group(0)[1:].lower(),\n'
' ... s)\n'
' ...\n'
' >>> titlecase("they\'re bill\'s friends.")\n'
' "They\'re Bill\'s Friends."\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.translate(table[, deletechars])\n'
'\n'
' Return a copy of the string where all characters occurring in '
'the\n'
' optional argument *deletechars* are removed, and the '
'remaining\n'
' characters have been mapped through the given translation '
'table,\n'
' which must be a string of length 256.\n'
'\n'
' You can use the "maketrans()" helper function in the '
'"string"\n'
' module to create a translation table. For string objects, set '
'the\n'
' *table* argument to "None" for translations that only delete\n'
' characters:\n'
'\n'
" >>> 'read this short text'.translate(None, 'aeiou')\n"
" 'rd ths shrt txt'\n"
'\n'
' New in version 2.6: Support for a "None" *table* argument.\n'
'\n'
' For Unicode objects, the "translate()" method does not accept '
'the\n'
' optional *deletechars* argument. Instead, it returns a copy '
'of\n'
' *s* where all characters have been mapped through the given\n'
' translation table which must be a mapping of Unicode ordinals '
'to\n'
' Unicode ordinals, Unicode strings or "None". Unmapped '
'characters\n'
' are left untouched. Characters mapped to "None" are deleted. '
'Note,\n'
' a more flexible approach is to create a custom character '
'mapping\n'
' codec using the "codecs" module (see "encodings.cp1251" for '
'an\n'
' example).\n'
'\n'
'str.upper()\n'
'\n'
' Return a copy of the string with all the cased characters '
'[4]\n'
' converted to uppercase. Note that "str.upper().isupper()" '
'might be\n'
' "False" if "s" contains uncased characters or if the Unicode\n'
' category of the resulting character(s) is not "Lu" (Letter,\n'
' uppercase), but e.g. "Lt" (Letter, titlecase).\n'
'\n'
' For 8-bit strings, this method is locale-dependent.\n'
'\n'
'str.zfill(width)\n'
'\n'
' Return the numeric string left filled with zeros in a string '
'of\n'
' length *width*. A sign prefix is handled correctly. The '
'original\n'
' string is returned if *width* is less than or equal to '
'"len(s)".\n'
'\n'
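'   For example, the sign prefix stays in front of the padding:\n'
'\n'
"      >>> '-42'.zfill(6)\n"
"      '-00042'\n"
'\n'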
' New in version 2.2.2.\n'
'\n'
'The following methods are present only on unicode objects:\n'
'\n'
'unicode.isnumeric()\n'
'\n'
' Return "True" if there are only numeric characters in S, '
'"False"\n'
' otherwise. Numeric characters include digit characters, and '
'all\n'
' characters that have the Unicode numeric value property, '
'e.g.\n'
' U+2155, VULGAR FRACTION ONE FIFTH.\n'
'\n'
'unicode.isdecimal()\n'
'\n'
' Return "True" if there are only decimal characters in S, '
'"False"\n'
' otherwise. Decimal characters include digit characters, and '
'all\n'
' characters that can be used to form decimal-radix numbers, '
'e.g.\n'
' U+0660, ARABIC-INDIC DIGIT ZERO.\n'
'\n'
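'   For illustration, U+2155 (VULGAR FRACTION ONE FIFTH) is\n'
'   numeric but not decimal:\n'
'\n'
"      >>> u'\\u2155'.isnumeric()\n"
'      True\n'
"      >>> u'\\u2155'.isdecimal()\n"
'      False\n'
'\n'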
'\n'
'String Formatting Operations\n'
'============================\n'
'\n'
'String and Unicode objects have one unique built-in operation: '
'the "%"\n'
'operator (modulo). This is also known as the string '
'*formatting* or\n'
'*interpolation* operator. Given "format % values" (where '
'*format* is\n'
'a string or Unicode object), "%" conversion specifications in '
'*format*\n'
'are replaced with zero or more elements of *values*. The effect '
'is\n'
'similar to using "sprintf()" in the C language. If *format* '
'is a\n'
'Unicode object, or if any of the objects being converted using '
'the\n'
'"%s" conversion are Unicode objects, the result will also be a '
'Unicode\n'
'object.\n'
'\n'
'If *format* requires a single argument, *values* may be a single '
'non-\n'
'tuple object. [5] Otherwise, *values* must be a tuple with '
'exactly\n'
'the number of items specified by the format string, or a single\n'
'mapping object (for example, a dictionary).\n'
'\n'
'A conversion specifier contains two or more characters and has '
'the\n'
'following components, which must occur in this order:\n'
'\n'
'1. The "\'%\'" character, which marks the start of the '
'specifier.\n'
'\n'
'2. Mapping key (optional), consisting of a parenthesised '
'sequence\n'
' of characters (for example, "(somename)").\n'
'\n'
'3. Conversion flags (optional), which affect the result of some\n'
' conversion types.\n'
'\n'
'4. Minimum field width (optional). If specified as an "\'*\'"\n'
' (asterisk), the actual width is read from the next element of '
'the\n'
' tuple in *values*, and the object to convert comes after the\n'
' minimum field width and optional precision.\n'
'\n'
'5. Precision (optional), given as a "\'.\'" (dot) followed by '
'the\n'
' precision. If specified as "\'*\'" (an asterisk), the actual '
'precision\n'
' is read from the next element of the tuple in *values*, and '
'the\n'
' value to convert comes after the precision.\n'
'\n'
'6. Length modifier (optional).\n'
'\n'
'7. Conversion type.\n'
'\n'
'When the right argument is a dictionary (or other mapping type), '
'then\n'
'the formats in the string *must* include a parenthesised mapping '
'key\n'
'into that dictionary inserted immediately after the "\'%\'" '
'character.\n'
'The mapping key selects the value to be formatted from the '
'mapping.\n'
'For example:\n'
'\n'
">>> print '%(language)s has %(number)03d quote types.' % \\\n"
'... {"language": "Python", "number": 2}\n'
'Python has 002 quote types.\n'
'\n'
'In this case no "*" specifiers may occur in a format (since '
'they\n'
'require a sequential parameter list).\n'
'\n'
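# A minimal sketch of the "'*'" width and precision forms described in
# items 4 and 5 above (values chosen arbitrarily):
#
#     >>> '%*.*f' % (8, 3, 2.71828)
#     '   2.718'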
'The conversion flag characters are:\n'
'\n'
'+-----------+-----------------------------------------------------------------------+\n'
'| Flag | '
'Meaning '
'|\n'
'+===========+=======================================================================+\n'
'| "\'#\'" | The value conversion will use the "alternate '
'form" (where defined |\n'
'| | '
'below). '
'|\n'
'+-----------+-----------------------------------------------------------------------+\n'
'| "\'0\'" | The conversion will be zero padded for numeric '
'values. |\n'
'+-----------+-----------------------------------------------------------------------+\n'
'| "\'-\'" | The converted value is left adjusted (overrides '
'the "\'0\'" conversion |\n'
'| | if both are '
'given). |\n'
'+-----------+-----------------------------------------------------------------------+\n'
'| "\' \'" | (a space) A blank should be left before a '
'positive number (or empty |\n'
'| | string) produced by a signed '
'conversion. |\n'
'+-----------+-----------------------------------------------------------------------+\n'
'| "\'+\'" | A sign character ("\'+\'" or "\'-\'") will '
'precede the conversion |\n'
'| | (overrides a "space" '
'flag). |\n'
'+-----------+-----------------------------------------------------------------------+\n'
'\n'
'A length modifier ("h", "l", or "L") may be present, but is '
'ignored as\n'
'it is not necessary for Python -- so e.g. "%ld" is identical to '
'"%d".\n'
'\n'
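# An illustrative session exercising the flags above (assuming CPython 2.7
# semantics):
#
#     >>> '%#o' % 8, '%+d' % 42, '%05d' % 42, '%-5d|' % 3
#     ('010', '+42', '00042', '3    |')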
'The conversion types are:\n'
'\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| Conversion | '
'Meaning | Notes '
'|\n'
'+==============+=======================================================+=========+\n'
'| "\'d\'" | Signed integer '
'decimal. | |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'i\'" | Signed integer '
'decimal. | |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'o\'" | Signed octal '
'value. | (1) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'u\'" | Obsolete type -- it is identical to '
'"\'d\'". | (7) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'x\'" | Signed hexadecimal '
'(lowercase). | (2) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'X\'" | Signed hexadecimal '
'(uppercase). | (2) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'e\'" | Floating point exponential format '
'(lowercase). | (3) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'E\'" | Floating point exponential format '
'(uppercase). | (3) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'f\'" | Floating point decimal '
'format. | (3) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'F\'" | Floating point decimal '
'format. | (3) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'g\'" | Floating point format. Uses lowercase '
'exponential | (4) |\n'
'| | format if exponent is less than -4 or not less '
'than | |\n'
'| | precision, decimal format '
'otherwise. | |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'G\'" | Floating point format. Uses uppercase '
'exponential | (4) |\n'
'| | format if exponent is less than -4 or not less '
'than | |\n'
'| | precision, decimal format '
'otherwise. | |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'c\'" | Single character (accepts integer or single '
'character | |\n'
'| | '
'string). | '
'|\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'r\'" | String (converts any Python object using '
'repr()). | (5) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'s\'" | String (converts any Python object using '
'"str()"). | (6) |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'| "\'%\'" | No argument is converted, results in a '
'"\'%\'" | |\n'
'| | character in the '
'result. | |\n'
'+--------------+-------------------------------------------------------+---------+\n'
'\n'
'Notes:\n'
'\n'
'1. The alternate form causes a leading zero ("\'0\'") to be '
'inserted\n'
' between left-hand padding and the formatting of the number if '
'the\n'
' leading character of the result is not already a zero.\n'
'\n'
'2. The alternate form causes a leading "\'0x\'" or "\'0X\'" '
'(depending\n'
' on whether the "\'x\'" or "\'X\'" format was used) to be '
'inserted\n'
' before the first digit.\n'
'\n'
'3. The alternate form causes the result to always contain a '
'decimal\n'
' point, even if no digits follow it.\n'
'\n'
' The precision determines the number of digits after the '
'decimal\n'
' point and defaults to 6.\n'
'\n'
'4. The alternate form causes the result to always contain a '
'decimal\n'
' point, and trailing zeroes are not removed as they would '
'otherwise\n'
' be.\n'
'\n'
' The precision determines the number of significant digits '
'before\n'
' and after the decimal point and defaults to 6.\n'
'\n'
'5. The "%r" conversion was added in Python 2.0.\n'
'\n'
' The precision determines the maximal number of characters '
'used.\n'
'\n'
'6. If the object or format provided is a "unicode" string, the\n'
' resulting string will also be "unicode".\n'
'\n'
' The precision determines the maximal number of characters '
'used.\n'
'\n'
'7. See **PEP 237**.\n'
'\n'
'Since Python strings have an explicit length, "%s" conversions '
'do not\n'
'assume that "\'\\0\'" is the end of the string.\n'
'\n'
'Changed in version 2.7: "%f" conversions for numbers whose '
'absolute\n'
'value is over 1e50 are no longer replaced by "%g" conversions.\n'
'\n'
'Additional string operations are defined in standard modules '
'"string"\n'
'and "re".\n'
'\n'
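# A few of the conversion types above in an illustrative session:
#
#     >>> '%x' % 255, '%e' % 12345.0, '%g' % 12345.0, '%r' % 'hi'
#     ('ff', '1.234500e+04', '12345', "'hi'")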
'\n'
'XRange Type\n'
'===========\n'
'\n'
'The "xrange" type is an immutable sequence which is commonly '
'used for\n'
'looping. The advantage of the "xrange" type is that an '
'"xrange"\n'
'object will always take the same amount of memory, no matter the '
'size\n'
'of the range it represents. There are no consistent '
'performance\n'
'advantages.\n'
'\n'
'XRange objects have very little behavior: they only support '
'indexing,\n'
'iteration, and the "len()" function.\n'
'\n'
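# For example (illustrative; the object size stays constant regardless of
# the range it represents):
#
#     >>> r = xrange(10 ** 9)
#     >>> len(r), r[0], r[-1]
#     (1000000000, 0, 999999999)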
'\n'
'Mutable Sequence Types\n'
'======================\n'
'\n'
'List and "bytearray" objects support additional operations that '
'allow\n'
'in-place modification of the object. Other mutable sequence '
'types\n'
'(when added to the language) should also support these '
'operations.\n'
'Strings and tuples are immutable sequence types: such objects '
'cannot\n'
'be modified once created. The following operations are defined '
'on\n'
'mutable sequence types (where *x* is an arbitrary object):\n'
'\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| Operation | '
'Result | Notes |\n'
'+================================+==================================+=======================+\n'
'| "s[i] = x" | item *i* of *s* is replaced '
'by | |\n'
'| | '
'*x* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j] = t" | slice of *s* from *i* to *j* '
'is | |\n'
'| | replaced by the contents of '
'the | |\n'
'| | iterable '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j]" | same as "s[i:j] = '
'[]" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j:k] = t" | the elements of "s[i:j:k]" '
'are | (1) |\n'
'| | replaced by those of '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j:k]" | removes the elements '
'of | |\n'
'| | "s[i:j:k]" from the '
'list | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.append(x)" | same as "s[len(s):len(s)] = '
'[x]" | (2) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.extend(t)" or "s += t" | for the most part the same '
'as | (3) |\n'
'| | "s[len(s):len(s)] = '
't" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s *= n" | updates *s* with its '
'contents | (11) |\n'
'| | repeated *n* '
'times | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.count(x)" | return number of *i*\'s for '
'which | |\n'
'| | "s[i] == '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.index(x[, i[, j]])" | return smallest *k* such '
'that | (4) |\n'
'| | "s[k] == x" and "i <= k < '
'j" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.insert(i, x)" | same as "s[i:i] = '
'[x]" | (5) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.pop([i])" | same as "x = s[i]; del '
's[i]; | (6) |\n'
'| | return '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.remove(x)" | same as "del '
's[s.index(x)]" | (4) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.reverse()" | reverses the items of *s* '
'in | (7) |\n'
'| | '
'place | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.sort([cmp[, key[, | sort the items of *s* in '
'place | (7)(8)(9)(10) |\n'
'| reverse]]])" '
'| | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'\n'
'Notes:\n'
'\n'
'1. *t* must have the same length as the slice it is replacing.\n'
'\n'
'2. The C implementation of Python has historically accepted\n'
' multiple parameters and implicitly joined them into a tuple; '
'this\n'
' no longer works in Python 2.0. Use of this misfeature has '
'been\n'
' deprecated since Python 1.4.\n'
'\n'
'3. *t* can be any iterable object.\n'
'\n'
'4. Raises "ValueError" when *x* is not found in *s*. When a\n'
' negative index is passed as the second or third parameter to '
'the\n'
' "index()" method, the list length is added, as for slice '
'indices.\n'
' If it is still negative, it is truncated to zero, as for '
'slice\n'
' indices.\n'
'\n'
' Changed in version 2.3: Previously, "index()" didn\'t have '
'arguments\n'
' for specifying start and stop positions.\n'
'\n'
'5. When a negative index is passed as the first parameter to '
'the\n'
' "insert()" method, the list length is added, as for slice '
'indices.\n'
' If it is still negative, it is truncated to zero, as for '
'slice\n'
' indices.\n'
'\n'
' Changed in version 2.3: Previously, all negative indices '
'were\n'
' truncated to zero.\n'
'\n'
'6. The "pop()" method\'s optional argument *i* defaults to "-1", '
'so\n'
' that by default the last item is removed and returned.\n'
'\n'
'7. The "sort()" and "reverse()" methods modify the list in '
'place\n'
' for economy of space when sorting or reversing a large list. '
'To\n'
" remind you that they operate by side effect, they don't "
'return the\n'
' sorted or reversed list.\n'
'\n'
'8. The "sort()" method takes optional arguments for controlling '
'the\n'
' comparisons.\n'
'\n'
' *cmp* specifies a custom comparison function of two arguments '
'(list\n'
' items) which should return a negative, zero or positive '
'number\n'
' depending on whether the first argument is considered smaller '
'than,\n'
' equal to, or larger than the second argument: "cmp=lambda '
'x,y:\n'
' cmp(x.lower(), y.lower())". The default value is "None".\n'
'\n'
' *key* specifies a function of one argument that is used to '
'extract\n'
' a comparison key from each list element: "key=str.lower". '
'The\n'
' default value is "None".\n'
'\n'
' *reverse* is a boolean value. If set to "True", then the '
'list\n'
' elements are sorted as if each comparison were reversed.\n'
'\n'
' In general, the *key* and *reverse* conversion processes are '
'much\n'
' faster than specifying an equivalent *cmp* function. This '
'is\n'
' because *cmp* is called multiple times for each list element '
'while\n'
' *key* and *reverse* touch each element only once. Use\n'
' "functools.cmp_to_key()" to convert an old-style *cmp* '
'function to\n'
' a *key* function.\n'
'\n'
' Changed in version 2.3: Support for "None" as an equivalent '
'to\n'
' omitting *cmp* was added.\n'
'\n'
' Changed in version 2.4: Support for *key* and *reverse* was '
'added.\n'
'\n'
'9. Starting with Python 2.3, the "sort()" method is guaranteed '
'to\n'
' be stable. A sort is stable if it guarantees not to change '
'the\n'
' relative order of elements that compare equal --- this is '
'helpful\n'
' for sorting in multiple passes (for example, sort by '
'department,\n'
' then by salary grade).\n'
'\n'
'10. **CPython implementation detail:** While a list is being\n'
' sorted, the effect of attempting to mutate, or even inspect, '
'the\n'
' list is undefined. The C implementation of Python 2.3 and '
'newer\n'
' makes the list appear empty for the duration, and raises\n'
' "ValueError" if it can detect that the list has been '
'mutated\n'
' during a sort.\n'
'\n'
'11. The value *n* is an integer, or an object implementing\n'
' "__index__()". Zero and negative values of *n* clear the\n'
' sequence. Items in the sequence are not copied; they are\n'
' referenced multiple times, as explained for "s * n" under '
'Sequence\n'
' Types --- str, unicode, list, tuple, bytearray, buffer, '
'xrange.\n',
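# A short sketch of the *key* and *reverse* arguments from note 8 above
# (sample data invented for illustration):
#
#     >>> words = ['banana', 'Apple', 'cherry']
#     >>> sorted(words, key=str.lower)
#     ['Apple', 'banana', 'cherry']
#     >>> words.sort(key=len, reverse=True)
#     >>> words
#     ['banana', 'cherry', 'Apple']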
'typesseq-mutable': '\n'
'Mutable Sequence Types\n'
'**********************\n'
'\n'
'List and "bytearray" objects support additional '
'operations that allow\n'
'in-place modification of the object. Other mutable '
'sequence types\n'
'(when added to the language) should also support these '
'operations.\n'
'Strings and tuples are immutable sequence types: such '
'objects cannot\n'
'be modified once created. The following operations are '
'defined on\n'
'mutable sequence types (where *x* is an arbitrary '
'object):\n'
'\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| Operation | '
'Result | Notes '
'|\n'
'+================================+==================================+=======================+\n'
'| "s[i] = x" | item *i* of *s* is '
'replaced by | |\n'
'| | '
'*x* | '
'|\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j] = t" | slice of *s* from *i* '
'to *j* is | |\n'
'| | replaced by the '
'contents of the | |\n'
'| | iterable '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j]" | same as "s[i:j] = '
'[]" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j:k] = t" | the elements of '
'"s[i:j:k]" are | (1) |\n'
'| | replaced by those of '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j:k]" | removes the elements '
'of | |\n'
'| | "s[i:j:k]" from the '
'list | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.append(x)" | same as '
'"s[len(s):len(s)] = [x]" | (2) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.extend(t)" or "s += t" | for the most part the '
'same as | (3) |\n'
'| | "s[len(s):len(s)] = '
't" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s *= n" | updates *s* with its '
'contents | (11) |\n'
'| | repeated *n* '
'times | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.count(x)" | return number of '
"*i*'s for which | |\n"
'| | "s[i] == '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.index(x[, i[, j]])" | return smallest *k* '
'such that | (4) |\n'
'| | "s[k] == x" and "i <= '
'k < j" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.insert(i, x)" | same as "s[i:i] = '
'[x]" | (5) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.pop([i])" | same as "x = s[i]; '
'del s[i]; | (6) |\n'
'| | return '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.remove(x)" | same as "del '
's[s.index(x)]" | (4) |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.reverse()" | reverses the items of '
'*s* in | (7) |\n'
'| | '
'place | '
'|\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.sort([cmp[, key[, | sort the items of *s* '
'in place | (7)(8)(9)(10) |\n'
'| reverse]]])" '
'| '
'| |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'\n'
'Notes:\n'
'\n'
'1. *t* must have the same length as the slice it is '
'replacing.\n'
'\n'
'2. The C implementation of Python has historically '
'accepted\n'
' multiple parameters and implicitly joined them into a '
'tuple; this\n'
' no longer works in Python 2.0. Use of this '
'misfeature has been\n'
' deprecated since Python 1.4.\n'
'\n'
'3. *t* can be any iterable object.\n'
'\n'
'4. Raises "ValueError" when *x* is not found in *s*. '
'When a\n'
' negative index is passed as the second or third '
'parameter to the\n'
' "index()" method, the list length is added, as for '
'slice indices.\n'
' If it is still negative, it is truncated to zero, as '
'for slice\n'
' indices.\n'
'\n'
' Changed in version 2.3: Previously, "index()" didn\'t '
'have arguments\n'
' for specifying start and stop positions.\n'
'\n'
'5. When a negative index is passed as the first '
'parameter to the\n'
' "insert()" method, the list length is added, as for '
'slice indices.\n'
' If it is still negative, it is truncated to zero, as '
'for slice\n'
' indices.\n'
'\n'
' Changed in version 2.3: Previously, all negative '
'indices were\n'
' truncated to zero.\n'
'\n'
'6. The "pop()" method\'s optional argument *i* defaults '
'to "-1", so\n'
' that by default the last item is removed and '
'returned.\n'
'\n'
'7. The "sort()" and "reverse()" methods modify the list '
'in place\n'
' for economy of space when sorting or reversing a '
'large list. To\n'
' remind you that they operate by side effect, they '
"don't return the\n"
' sorted or reversed list.\n'
'\n'
'8. The "sort()" method takes optional arguments for '
'controlling the\n'
' comparisons.\n'
'\n'
' *cmp* specifies a custom comparison function of two '
'arguments (list\n'
' items) which should return a negative, zero or '
'positive number\n'
' depending on whether the first argument is considered '
'smaller than,\n'
' equal to, or larger than the second argument: '
'"cmp=lambda x,y:\n'
' cmp(x.lower(), y.lower())". The default value is '
'"None".\n'
'\n'
' *key* specifies a function of one argument that is '
'used to extract\n'
' a comparison key from each list element: '
'"key=str.lower". The\n'
' default value is "None".\n'
'\n'
' *reverse* is a boolean value. If set to "True", then '
'the list\n'
' elements are sorted as if each comparison were '
'reversed.\n'
'\n'
' In general, the *key* and *reverse* conversion '
'processes are much\n'
' faster than specifying an equivalent *cmp* function. '
'This is\n'
' because *cmp* is called multiple times for each list '
'element while\n'
' *key* and *reverse* touch each element only once. '
'Use\n'
' "functools.cmp_to_key()" to convert an old-style '
'*cmp* function to\n'
' a *key* function.\n'
'\n'
' Changed in version 2.3: Support for "None" as an '
'equivalent to\n'
' omitting *cmp* was added.\n'
'\n'
' Changed in version 2.4: Support for *key* and '
'*reverse* was added.\n'
'\n'
'9. Starting with Python 2.3, the "sort()" method is '
'guaranteed to\n'
' be stable. A sort is stable if it guarantees not to '
'change the\n'
' relative order of elements that compare equal --- '
'this is helpful\n'
' for sorting in multiple passes (for example, sort by '
'department,\n'
' then by salary grade).\n'
'\n'
'10. **CPython implementation detail:** While a list is '
'being\n'
' sorted, the effect of attempting to mutate, or even '
'inspect, the\n'
' list is undefined. The C implementation of Python '
'2.3 and newer\n'
' makes the list appear empty for the duration, and '
'raises\n'
' "ValueError" if it can detect that the list has been '
'mutated\n'
' during a sort.\n'
'\n'
'11. The value *n* is an integer, or an object '
'implementing\n'
' "__index__()". Zero and negative values of *n* '
'clear the\n'
' sequence. Items in the sequence are not copied; '
'they are\n'
' referenced multiple times, as explained for "s * n" '
'under Sequence\n'
' Types --- str, unicode, list, tuple, bytearray, '
'buffer, xrange.\n',
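# An illustrative session for extended-slice assignment and deletion (per
# note 1 above, the lengths must match for "s[i:j:k] = t"):
#
#     >>> s = [0, 1, 2, 3, 4, 5]
#     >>> s[::2] = ['a', 'b', 'c']
#     >>> s
#     ['a', 1, 'b', 3, 'c', 5]
#     >>> del s[1::2]
#     >>> s
#     ['a', 'b', 'c']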
'unary': '\n'
'Unary arithmetic and bitwise operations\n'
'***************************************\n'
'\n'
'All unary arithmetic and bitwise operations have the same '
'priority:\n'
'\n'
' u_expr ::= power | "-" u_expr | "+" u_expr | "~" u_expr\n'
'\n'
'The unary "-" (minus) operator yields the negation of its numeric\n'
'argument.\n'
'\n'
'The unary "+" (plus) operator yields its numeric argument '
'unchanged.\n'
'\n'
'The unary "~" (invert) operator yields the bitwise inversion of '
'its\n'
'plain or long integer argument. The bitwise inversion of "x" is\n'
'defined as "-(x+1)". It only applies to integral numbers.\n'
'\n'
'In all three cases, if the argument does not have the proper type, '
'a\n'
'"TypeError" exception is raised.\n',
'while': '\n'
'The "while" statement\n'
'*********************\n'
'\n'
'The "while" statement is used for repeated execution as long as an\n'
'expression is true:\n'
'\n'
' while_stmt ::= "while" expression ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'This repeatedly tests the expression and, if it is true, executes '
'the\n'
'first suite; if the expression is false (which may be the first '
'time\n'
'it is tested) the suite of the "else" clause, if present, is '
'executed\n'
'and the loop terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and goes '
'back\n'
'to testing the expression.\n',
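# A minimal sketch of "while"/"else" when no "break" occurs (invented
# example):
#
#     >>> n = 0
#     >>> while n < 3:
#     ...     n += 1
#     ... else:
#     ...     print 'no break; n =', n
#     ...
#     no break; n = 3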
'with': '\n'
'The "with" statement\n'
'********************\n'
'\n'
'New in version 2.5.\n'
'\n'
'The "with" statement is used to wrap the execution of a block with\n'
'methods defined by a context manager (see section With Statement\n'
'Context Managers). This allows common "try"..."except"..."finally"\n'
'usage patterns to be encapsulated for convenient reuse.\n'
'\n'
' with_stmt ::= "with" with_item ("," with_item)* ":" suite\n'
' with_item ::= expression ["as" target]\n'
'\n'
'The execution of the "with" statement with one "item" proceeds as\n'
'follows:\n'
'\n'
'1. The context expression (the expression given in the "with_item")\n'
' is evaluated to obtain a context manager.\n'
'\n'
'2. The context manager\'s "__exit__()" is loaded for later use.\n'
'\n'
'3. The context manager\'s "__enter__()" method is invoked.\n'
'\n'
'4. If a target was included in the "with" statement, the return\n'
' value from "__enter__()" is assigned to it.\n'
'\n'
' Note: The "with" statement guarantees that if the "__enter__()"\n'
' method returns without an error, then "__exit__()" will always '
'be\n'
' called. Thus, if an error occurs during the assignment to the\n'
' target list, it will be treated the same as an error occurring\n'
' within the suite would be. See step 6 below.\n'
'\n'
'5. The suite is executed.\n'
'\n'
'6. The context manager\'s "__exit__()" method is invoked. If an\n'
' exception caused the suite to be exited, its type, value, and\n'
' traceback are passed as arguments to "__exit__()". Otherwise, '
'three\n'
' "None" arguments are supplied.\n'
'\n'
' If the suite was exited due to an exception, and the return '
'value\n'
' from the "__exit__()" method was false, the exception is '
'reraised.\n'
' If the return value was true, the exception is suppressed, and\n'
' execution continues with the statement following the "with"\n'
' statement.\n'
'\n'
' If the suite was exited for any reason other than an exception, '
'the\n'
' return value from "__exit__()" is ignored, and execution '
'proceeds\n'
' at the normal location for the kind of exit that was taken.\n'
'\n'
'With more than one item, the context managers are processed as if\n'
'multiple "with" statements were nested:\n'
'\n'
' with A() as a, B() as b:\n'
' suite\n'
'\n'
'is equivalent to\n'
'\n'
' with A() as a:\n'
' with B() as b:\n'
' suite\n'
'\n'
'Note: In Python 2.5, the "with" statement is only allowed when the\n'
' "with_statement" feature has been enabled. It is always enabled '
'in\n'
' Python 2.6.\n'
'\n'
'Changed in version 2.7: Support for multiple context expressions.\n'
'\n'
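# A minimal sketch of the protocol above; the Tag class is a made-up
# context manager, not part of the documentation:
#
#     >>> class Tag(object):
#     ...     def __enter__(self):
#     ...         print 'enter'
#     ...         return 42
#     ...     def __exit__(self, exc_type, exc_value, tb):
#     ...         print 'exit', exc_type
#     ...         return False    # a false return re-raises any exception
#     ...
#     >>> with Tag() as x:
#     ...     print 'body', x
#     ...
#     enter
#     body 42
#     exit None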
'See also:\n'
'\n'
' **PEP 343** - The "with" statement\n'
' The specification, background, and examples for the Python '
'"with"\n'
' statement.\n',
'yield': '\n'
'The "yield" statement\n'
'*********************\n'
'\n'
' yield_stmt ::= yield_expression\n'
'\n'
'The "yield" statement is only used when defining a generator '
'function,\n'
'and is only used in the body of the generator function. Using a\n'
'"yield" statement in a function definition is sufficient to cause '
'that\n'
'definition to create a generator function instead of a normal\n'
'function.\n'
'\n'
'When a generator function is called, it returns an iterator known '
'as a\n'
'generator iterator, or more commonly, a generator. The body of '
'the\n'
"generator function is executed by calling the generator's "
'"next()"\n'
'method repeatedly until it raises an exception.\n'
'\n'
'When a "yield" statement is executed, the state of the generator '
'is\n'
'frozen and the value of "expression_list" is returned to '
'"next()"\'s\n'
'caller. By "frozen" we mean that all local state is retained,\n'
'including the current bindings of local variables, the instruction\n'
'pointer, and the internal evaluation stack: enough information is\n'
'saved so that the next time "next()" is invoked, the function can\n'
'proceed exactly as if the "yield" statement were just another '
'external\n'
'call.\n'
'\n'
'As of Python version 2.5, the "yield" statement is now allowed in '
'the\n'
'"try" clause of a "try" ... "finally" construct. If the generator '
'is\n'
'not resumed before it is finalized (by reaching a zero reference '
'count\n'
"or by being garbage collected), the generator-iterator's "
'"close()"\n'
'method will be called, allowing any pending "finally" clauses to\n'
'execute.\n'
'\n'
'For full details of "yield" semantics, refer to the Yield '
'expressions\n'
'section.\n'
'\n'
'Note: In Python 2.2, the "yield" statement was only allowed when '
'the\n'
' "generators" feature has been enabled. This "__future__" import\n'
' statement was used to enable the feature:\n'
'\n'
' from __future__ import generators\n'
'\n'
'See also:\n'
'\n'
' **PEP 255** - Simple Generators\n'
' The proposal for adding generators and the "yield" statement '
'to\n'
' Python.\n'
'\n'
' **PEP 342** - Coroutines via Enhanced Generators\n'
' The proposal that, among other generator enhancements, '
'proposed\n'
' allowing "yield" to appear inside a "try" ... "finally" '
'block.\n'}
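# A minimal generator sketch for the "yield" description above (counter is
# an invented example):
#
#     >>> def counter(n):
#     ...     while n > 0:
#     ...         yield n
#     ...         n -= 1
#     ...
#     >>> g = counter(3)
#     >>> g.next(), g.next(), g.next()
#     (3, 2, 1)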
"""Routine to "compile" a .py file to a .pyc (or .pyo) file.
This module has intimate knowledge of the format of .pyc files.
"""
import __builtin__
import imp
import marshal
import os
import sys
import traceback
MAGIC = imp.get_magic()
__all__ = ["compile", "main", "PyCompileError"]
class PyCompileError(Exception):
"""Exception raised when an error occurs while attempting to
compile the file.
To raise this exception, use
raise PyCompileError(exc_type,exc_value,file[,msg])
where
exc_type: exception type to be used in error message
type name can be accessed as class variable
'exc_type_name'
exc_value: exception value to be used in error message
can be accessed as class variable 'exc_value'
file: name of file being compiled to be used in error message
can be accessed as class variable 'file'
msg: string message to be written as error message
If no value is given, a default exception message will be given,
consistent with 'standard' py_compile output.
message (or default) can be accessed as class variable 'msg'
"""
def __init__(self, exc_type, exc_value, file, msg=''):
exc_type_name = exc_type.__name__
if exc_type is SyntaxError:
tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
errmsg = tbtext.replace('File "<string>"', 'File "%s"' % file)
else:
errmsg = "Sorry: %s: %s" % (exc_type_name,exc_value)
Exception.__init__(self,msg or errmsg,exc_type_name,exc_value,file)
self.exc_type_name = exc_type_name
self.exc_value = exc_value
self.file = file
self.msg = msg or errmsg
def __str__(self):
return self.msg
def wr_long(f, x):
"""Internal; write a 32-bit int to a file in little-endian order."""
f.write(chr( x & 0xff))
f.write(chr((x >> 8) & 0xff))
f.write(chr((x >> 16) & 0xff))
f.write(chr((x >> 24) & 0xff))
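# For example, wr_long(f, 0x12345678) writes the bytes '\x78\x56\x34\x12',
# least significant byte first, matching the little-endian fields of a
# .pyc header.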
def compile(file, cfile=None, dfile=None, doraise=False):
"""Does nothing on IronPython.
IronPython does not currently support compiling to .pyc
or any other format.
"""
return # not supported on IronPython, so return unconditionally
def main(args=None):
"""Compile several source files.
The files named in 'args' (or on the command line, if 'args' is
not specified) are compiled and the resulting bytecode is cached
in the normal manner. This function does not search a directory
structure to locate source files; it only compiles files named
explicitly. If '-' is the only parameter in args, the list of
files is taken from standard input.
"""
if args is None:
args = sys.argv[1:]
rv = 0
if args == ['-']:
while True:
filename = sys.stdin.readline()
if not filename:
break
filename = filename.rstrip('\n')
try:
compile(filename, doraise=True)
except PyCompileError as error:
rv = 1
sys.stderr.write("%s\n" % error.msg)
except IOError as error:
rv = 1
sys.stderr.write("%s\n" % error)
else:
for filename in args:
try:
compile(filename, doraise=True)
except PyCompileError as error:
# return value to indicate at least one failure
rv = 1
sys.stderr.write("%s\n" % error.msg)
return rv
if __name__ == "__main__":
sys.exit(main())
"""A multi-producer, multi-consumer queue."""
from time import time as _time
try:
import threading as _threading
except ImportError:
import dummy_threading as _threading
from collections import deque
import heapq
__all__ = ['Empty', 'Full', 'Queue', 'PriorityQueue', 'LifoQueue']
class Empty(Exception):
"Exception raised by Queue.get(block=0)/get_nowait()."
pass
class Full(Exception):
"Exception raised by Queue.put(block=0)/put_nowait()."
pass
class Queue:
"""Create a queue object with a given maximum size.
If maxsize is <= 0, the queue size is infinite.
"""
def __init__(self, maxsize=0):
self.maxsize = maxsize
self._init(maxsize)
# mutex must be held whenever the queue is mutating. All methods
# that acquire mutex must release it before returning. mutex
# is shared between the three conditions, so acquiring and
# releasing the conditions also acquires and releases mutex.
self.mutex = _threading.Lock()
# Notify not_empty whenever an item is added to the queue; a
# thread waiting to get is notified then.
self.not_empty = _threading.Condition(self.mutex)
# Notify not_full whenever an item is removed from the queue;
# a thread waiting to put is notified then.
self.not_full = _threading.Condition(self.mutex)
# Notify all_tasks_done whenever the number of unfinished tasks
# drops to zero; thread waiting to join() is notified to resume
self.all_tasks_done = _threading.Condition(self.mutex)
self.unfinished_tasks = 0
def task_done(self):
"""Indicate that a formerly enqueued task is complete.
Used by Queue consumer threads. For each get() used to fetch a task,
a subsequent call to task_done() tells the queue that the processing
on the task is complete.
If a join() is currently blocking, it will resume when all items
have been processed (meaning that a task_done() call was received
for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items
placed in the queue.
"""
self.all_tasks_done.acquire()
try:
unfinished = self.unfinished_tasks - 1
if unfinished <= 0:
if unfinished < 0:
raise ValueError('task_done() called too many times')
self.all_tasks_done.notify_all()
self.unfinished_tasks = unfinished
finally:
self.all_tasks_done.release()
def join(self):
"""Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
"""
self.all_tasks_done.acquire()
try:
while self.unfinished_tasks:
self.all_tasks_done.wait()
finally:
self.all_tasks_done.release()
def qsize(self):
"""Return the approximate size of the queue (not reliable!)."""
self.mutex.acquire()
n = self._qsize()
self.mutex.release()
return n
def empty(self):
"""Return True if the queue is empty, False otherwise (not reliable!)."""
self.mutex.acquire()
n = not self._qsize()
self.mutex.release()
return n
def full(self):
"""Return True if the queue is full, False otherwise (not reliable!)."""
self.mutex.acquire()
n = 0 < self.maxsize == self._qsize()
self.mutex.release()
return n
def put(self, item, block=True, timeout=None):
"""Put an item into the queue.
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until a free slot is available. If 'timeout' is
a non-negative number, it blocks at most 'timeout' seconds and raises
the Full exception if no free slot was available within that time.
Otherwise ('block' is false), put an item on the queue if a free slot
is immediately available, else raise the Full exception ('timeout'
is ignored in that case).
"""
self.not_full.acquire()
try:
if self.maxsize > 0:
if not block:
if self._qsize() == self.maxsize:
raise Full
elif timeout is None:
while self._qsize() == self.maxsize:
self.not_full.wait()
elif timeout < 0:
raise ValueError("'timeout' must be a non-negative number")
else:
endtime = _time() + timeout
while self._qsize() == self.maxsize:
remaining = endtime - _time()
if remaining <= 0.0:
raise Full
self.not_full.wait(remaining)
self._put(item)
self.unfinished_tasks += 1
self.not_empty.notify()
finally:
self.not_full.release()
def put_nowait(self, item):
"""Put an item into the queue without blocking.
Only enqueue the item if a free slot is immediately available.
Otherwise raise the Full exception.
"""
return self.put(item, False)
def get(self, block=True, timeout=None):
"""Remove and return an item from the queue.
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until an item is available. If 'timeout' is
a non-negative number, it blocks at most 'timeout' seconds and raises
the Empty exception if no item was available within that time.
Otherwise ('block' is false), return an item if one is immediately
available, else raise the Empty exception ('timeout' is ignored
in that case).
"""
self.not_empty.acquire()
try:
if not block:
if not self._qsize():
raise Empty
elif timeout is None:
while not self._qsize():
self.not_empty.wait()
elif timeout < 0:
raise ValueError("'timeout' must be a non-negative number")
else:
endtime = _time() + timeout
while not self._qsize():
remaining = endtime - _time()
if remaining <= 0.0:
raise Empty
self.not_empty.wait(remaining)
item = self._get()
self.not_full.notify()
return item
finally:
self.not_empty.release()
def get_nowait(self):
"""Remove and return an item from the queue without blocking.
Only get an item if one is immediately available. Otherwise
raise the Empty exception.
"""
return self.get(False)
# Override these methods to implement other queue organizations
# (e.g. stack or priority queue).
# These will only be called with appropriate locks held
# Initialize the queue representation
def _init(self, maxsize):
self.queue = deque()
def _qsize(self, len=len):
return len(self.queue)
# Put a new item in the queue
def _put(self, item):
self.queue.append(item)
# Get an item from the queue
def _get(self):
return self.queue.popleft()
class PriorityQueue(Queue):
'''Variant of Queue that retrieves open entries in priority order (lowest first).
Entries are typically tuples of the form: (priority number, data).
'''
def _init(self, maxsize):
self.queue = []
def _qsize(self, len=len):
return len(self.queue)
def _put(self, item, heappush=heapq.heappush):
heappush(self.queue, item)
def _get(self, heappop=heapq.heappop):
return heappop(self.queue)
class LifoQueue(Queue):
'''Variant of Queue that retrieves most recently added entries first.'''
def _init(self, maxsize):
self.queue = []
def _qsize(self, len=len):
return len(self.queue)
def _put(self, item):
self.queue.append(item)
def _get(self):
return self.queue.pop()
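# A minimal usage sketch (illustrative; single-threaded for brevity):
#
#     >>> q = PriorityQueue()
#     >>> for entry in [(3, 'c'), (1, 'a'), (2, 'b')]:
#     ...     q.put(entry)
#     ...
#     >>> q.get(), q.get(), q.get()
#     ((1, 'a'), (2, 'b'), (3, 'c'))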
#! /usr/bin/env python
"""Conversions to/from quoted-printable transport encoding as per RFC 1521."""
# (Dec 1991 version).
__all__ = ["encode", "decode", "encodestring", "decodestring"]
ESCAPE = '='
MAXLINESIZE = 76
HEX = '0123456789ABCDEF'
EMPTYSTRING = ''
try:
from binascii import a2b_qp, b2a_qp
except ImportError:
a2b_qp = None
b2a_qp = None
def needsquoting(c, quotetabs, header):
"""Decide whether a particular character needs to be quoted.
The 'quotetabs' flag indicates whether embedded tabs and spaces should be
quoted. Note that line-ending tabs and spaces are always encoded, as per
RFC 1521.
"""
if c in ' \t':
return quotetabs
# if header, we have to escape _ because _ is used to escape space
if c == '_':
return header
return c == ESCAPE or not (' ' <= c <= '~')
def quote(c):
"""Quote a single character."""
i = ord(c)
return ESCAPE + HEX[i//16] + HEX[i%16]
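# For example, quote('=') returns '=3D' and quote('\xff') returns '=FF'.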
def encode(input, output, quotetabs, header = 0):
"""Read 'input', apply quoted-printable encoding, and write to 'output'.
'input' and 'output' are files with readline() and write() methods.
The 'quotetabs' flag indicates whether embedded tabs and spaces should be
quoted. Note that line-ending tabs and spaces are always encoded, as per
RFC 1521.
The 'header' flag indicates whether we are encoding spaces as _ as per
RFC 1522.
"""
if b2a_qp is not None:
data = input.read()
odata = b2a_qp(data, quotetabs = quotetabs, header = header)
output.write(odata)
return
def write(s, output=output, lineEnd='\n'):
# RFC 1521 requires that the line ending in a space or tab must have
# that trailing character encoded.
if s and s[-1:] in ' \t':
output.write(s[:-1] + quote(s[-1]) + lineEnd)
elif s == '.':
output.write(quote(s) + lineEnd)
else:
output.write(s + lineEnd)
prevline = None
while 1:
line = input.readline()
if not line:
break
outline = []
# Strip off any readline induced trailing newline
stripped = ''
if line[-1:] == '\n':
line = line[:-1]
stripped = '\n'
# Calculate the un-length-limited encoded line
for c in line:
if needsquoting(c, quotetabs, header):
c = quote(c)
if header and c == ' ':
outline.append('_')
else:
outline.append(c)
# First, write out the previous line
if prevline is not None:
write(prevline)
# Now see if we need any soft line breaks because of RFC-imposed
# length limitations. Then do the thisline->prevline dance.
thisline = EMPTYSTRING.join(outline)
while len(thisline) > MAXLINESIZE:
# Don't forget to include the soft line break `=' sign in the
# length calculation!
write(thisline[:MAXLINESIZE-1], lineEnd='=\n')
thisline = thisline[MAXLINESIZE-1:]
# Write out the current line
prevline = thisline
# Write out the last line, without a trailing newline
if prevline is not None:
write(prevline, lineEnd=stripped)
def encodestring(s, quotetabs = 0, header = 0):
if b2a_qp is not None:
return b2a_qp(s, quotetabs = quotetabs, header = header)
from cStringIO import StringIO
infp = StringIO(s)
outfp = StringIO()
encode(infp, outfp, quotetabs, header)
return outfp.getvalue()
def decode(input, output, header = 0):
"""Read 'input', apply quoted-printable decoding, and write to 'output'.
'input' and 'output' are files with readline() and write() methods.
If 'header' is true, decode underscore as space (per RFC 1522)."""
if a2b_qp is not None:
data = input.read()
odata = a2b_qp(data, header = header)
output.write(odata)
return
new = ''
while 1:
line = input.readline()
if not line: break
i, n = 0, len(line)
if n > 0 and line[n-1] == '\n':
partial = 0; n = n-1
# Strip trailing whitespace
while n > 0 and line[n-1] in " \t\r":
n = n-1
else:
partial = 1
while i < n:
c = line[i]
if c == '_' and header:
new = new + ' '; i = i+1
elif c != ESCAPE:
new = new + c; i = i+1
elif i+1 == n and not partial:
partial = 1; break
elif i+1 < n and line[i+1] == ESCAPE:
new = new + ESCAPE; i = i+2
elif i+2 < n and ishex(line[i+1]) and ishex(line[i+2]):
new = new + chr(unhex(line[i+1:i+3])); i = i+3
else: # Bad escape sequence -- leave it in
new = new + c; i = i+1
if not partial:
output.write(new + '\n')
new = ''
if new:
output.write(new)
def decodestring(s, header = 0):
if a2b_qp is not None:
return a2b_qp(s, header = header)
from cStringIO import StringIO
infp = StringIO(s)
outfp = StringIO()
decode(infp, outfp, header = header)
return outfp.getvalue()
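# A round-trip sketch (illustrative):
#
#     >>> encodestring('= sign\n')
#     '=3D sign\n'
#     >>> decodestring('=3D sign\n')
#     '= sign\n'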
# Other helper functions
def ishex(c):
"""Return true if the character 'c' is a hexadecimal digit."""
return '0' <= c <= '9' or 'a' <= c <= 'f' or 'A' <= c <= 'F'
def unhex(s):
"""Get the integer value of a hexadecimal number."""
bits = 0
for c in s:
if '0' <= c <= '9':
i = ord('0')
elif 'a' <= c <= 'f':
i = ord('a')-10
elif 'A' <= c <= 'F':
i = ord('A')-10
else:
break
bits = bits*16 + (ord(c) - i)
return bits
def main():
import sys
import getopt
try:
opts, args = getopt.getopt(sys.argv[1:], 'td')
except getopt.error, msg:
sys.stdout = sys.stderr
print msg
print "usage: quopri [-t | -d] [file] ..."
print "-t: quote tabs"
print "-d: decode; default encode"
sys.exit(2)
deco = 0
tabs = 0
for o, a in opts:
if o == '-t': tabs = 1
if o == '-d': deco = 1
if tabs and deco:
sys.stdout = sys.stderr
print "-t and -d are mutually exclusive"
sys.exit(2)
if not args: args = ['-']
sts = 0
for file in args:
if file == '-':
fp = sys.stdin
else:
try:
fp = open(file)
except IOError, msg:
sys.stderr.write("%s: can't open (%s)\n" % (file, msg))
sts = 1
continue
if deco:
decode(fp, sys.stdout)
else:
encode(fp, sys.stdout, tabs)
if fp is not sys.stdin:
fp.close()
if sts:
sys.exit(sts)
if __name__ == '__main__':
main()
"""Random variable generators.
integers
--------
uniform within range
sequences
---------
pick random element
pick random sample
generate random permutation
distributions on the real line:
------------------------------
uniform
triangular
normal (Gaussian)
lognormal
negative exponential
gamma
beta
pareto
Weibull
distributions on the circle (angles 0 to 2pi)
---------------------------------------------
circular uniform
von Mises
General notes on the underlying Mersenne Twister core generator:
* The period is 2**19937-1.
* It is one of the most extensively tested generators in existence.
* Without a direct way to compute N steps forward, the semantics of
jumpahead(n) are weakened to simply jump to another distant state and rely
on the large period to avoid overlapping sequences.
* The random() method is implemented in C, executes in a single Python step,
and is, therefore, threadsafe.
"""
from __future__ import division
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from binascii import hexlify as _hexlify
import hashlib as _hashlib
__all__ = ["Random","seed","random","uniform","randint","choice","sample",
"randrange","shuffle","normalvariate","lognormvariate",
"expovariate","vonmisesvariate","gammavariate","triangular",
"gauss","betavariate","paretovariate","weibullvariate",
"getstate","setstate","jumpahead", "WichmannHill", "getrandbits",
"SystemRandom"]
NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)
TWOPI = 2.0*_pi
LOG4 = _log(4.0)
SG_MAGICCONST = 1.0 + _log(4.5)
BPF = 53 # Number of bits in a float
RECIP_BPF = 2**-BPF
# Translated by Guido van Rossum from C source provided by
# Adrian Baddeley. Adapted by Raymond Hettinger for use with
# the Mersenne Twister and os.urandom() core generators.
import _random
class Random(_random.Random):
"""Random number generator base class used by bound module functions.
Used to instantiate instances of Random to get generators that don't
share state. Especially useful for multi-threaded programs, creating
a different instance of Random for each thread, and using the jumpahead()
method to ensure that the generated sequences seen by each thread don't
overlap.
Class Random can also be subclassed if you want to use a different basic
generator of your own devising: in that case, override the following
methods: random(), seed(), getstate(), setstate() and jumpahead().
Optionally, implement a getrandbits() method so that randrange() can cover
arbitrarily large ranges.
"""
VERSION = 3 # used by getstate/setstate
def __init__(self, x=None):
"""Initialize an instance.
Optional argument x controls seeding, as for Random.seed().
"""
self.seed(x)
self.gauss_next = None
def seed(self, a=None):
"""Initialize internal state of the random number generator.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
If a is not None and is not an int or long, hash(a) is used instead.
Hash values for some types are nondeterministic when the
PYTHONHASHSEED environment variable is enabled.
"""
if a is None:
try:
# Seed with enough bytes to span the 19937 bit
# state space for the Mersenne Twister
a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
import time
a = long(time.time() * 256) # use fractional seconds
super(Random, self).seed(a)
self.gauss_next = None
def getstate(self):
"""Return internal state; can be passed to setstate() later."""
return self.VERSION, super(Random, self).getstate(), self.gauss_next
def setstate(self, state):
"""Restore internal state from object returned by getstate()."""
version = state[0]
if version == 3:
version, internalstate, self.gauss_next = state
super(Random, self).setstate(internalstate)
elif version == 2:
version, internalstate, self.gauss_next = state
# In version 2, the state was saved as signed ints, which causes
# inconsistencies between 32/64-bit systems. The state is
# really unsigned 32-bit ints, so we convert negative ints from
# version 2 to positive longs for version 3.
try:
internalstate = tuple( long(x) % (2**32) for x in internalstate )
except ValueError, e:
raise TypeError, e
super(Random, self).setstate(internalstate)
else:
raise ValueError("state with version %s passed to "
"Random.setstate() of version %s" %
(version, self.VERSION))
def jumpahead(self, n):
"""Change the internal state to one that is likely far away
from the current state. This method will not be in Py3.x,
so it is better to simply reseed.
"""
# The super.jumpahead() method uses shuffling to change state,
# so it needs a large and "interesting" n to work with. Here,
# we use hashing to create a large n for the shuffle.
s = repr(n) + repr(self.getstate())
n = int(_hashlib.new('sha512', s).hexdigest(), 16)
super(Random, self).jumpahead(n)
## ---- Methods below this point do not need to be overridden when
## ---- subclassing for the purpose of using a different core generator.
## -------------------- pickle support -------------------
def __getstate__(self): # for pickle
return self.getstate()
def __setstate__(self, state): # for pickle
self.setstate(state)
def __reduce__(self):
return self.__class__, (), self.getstate()
## -------------------- integer methods -------------------
def randrange(self, start, stop=None, step=1, _int=int, _maxwidth=1L<<BPF):
"""Choose a random item from range(start, stop[, step]).
This fixes the problem with randint() which includes the
endpoint; in Python this is usually not what you want.
"""
# This code is a bit messy to make it fast for the
# common case while still doing adequate error checking.
istart = _int(start)
if istart != start:
raise ValueError, "non-integer arg 1 for randrange()"
if stop is None:
if istart > 0:
if istart >= _maxwidth:
return self._randbelow(istart)
return _int(self.random() * istart)
raise ValueError, "empty range for randrange()"
# stop argument supplied.
istop = _int(stop)
if istop != stop:
raise ValueError, "non-integer stop for randrange()"
width = istop - istart
if step == 1 and width > 0:
# Note that
# int(istart + self.random()*width)
# instead would be incorrect. For example, consider istart
# = -2 and istop = 0. Then the guts would be in
# -2.0 to 0.0 exclusive on both ends (ignoring that random()
# might return 0.0), and because int() truncates toward 0, the
# final result would be -1 or 0 (instead of -2 or -1).
# istart + int(self.random()*width)
# would also be incorrect, for a subtler reason: the RHS
# can return a long, and then randrange() would also return
# a long, but we're supposed to return an int (for backward
# compatibility).
if width >= _maxwidth:
return _int(istart + self._randbelow(width))
return _int(istart + _int(self.random()*width))
if step == 1:
raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
# Non-unit step argument supplied.
istep = _int(step)
if istep != step:
raise ValueError, "non-integer step for randrange()"
if istep > 0:
n = (width + istep - 1) // istep
elif istep < 0:
n = (width + istep + 1) // istep
else:
raise ValueError, "zero step for randrange()"
if n <= 0:
raise ValueError, "empty range for randrange()"
if n >= _maxwidth:
return istart + istep*self._randbelow(n)
return istart + istep*_int(self.random() * n)
def randint(self, a, b):
"""Return random integer in range [a, b], including both end points.
"""
return self.randrange(a, b+1)
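# For instance, randrange(0, 10, 2) draws from {0, 2, 4, 6, 8}, while
# randint(0, 10) may return 10 itself, since both endpoints are included.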
def _randbelow(self, n, _log=_log, _int=int, _maxwidth=1L<<BPF,
_Method=_MethodType, _BuiltinMethod=_BuiltinMethodType):
"""Return a random int in the range [0,n)
Handles the case where n has more bits than returned
by a single call to the underlying generator.
"""
try:
getrandbits = self.getrandbits
except AttributeError:
pass
else:
# Only call self.getrandbits if the original random() builtin method
# has not been overridden or if a new getrandbits() was supplied.
# This assures that the two methods correspond.
if type(self.random) is _BuiltinMethod or type(getrandbits) is _Method:
k = _int(1.00001 + _log(n-1, 2.0)) # 2**k > n-1 > 2**(k-2)
r = getrandbits(k)
while r >= n:
r = getrandbits(k)
return r
if n >= _maxwidth:
_warn("Underlying random() generator does not supply \n"
"enough bits to choose from a population range this large")
return _int(self.random() * n)
## -------------------- sequence methods -------------------
def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
def shuffle(self, x, random=None):
"""x, random=random.random -> shuffle list x in place; return None.
Optional arg random is a 0-argument function returning a random
float in [0.0, 1.0); by default, the standard random.random.
"""
if random is None:
random = self.random
_int = int
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = _int(random() * (i+1))
x[i], x[j] = x[j], x[i]
def sample(self, population, k):
"""Chooses k unique random elements from a population sequence.
Returns a new list containing elements from the population while
leaving the original population unchanged. The resulting list is
in selection order so that all sub-slices will also be valid random
samples. This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).
Members of the population need not be hashable or unique. If the
population contains repeats, then each occurrence is a possible
selection in the sample.
To choose a sample in a range of integers, use xrange as an argument.
This is especially fast and space efficient for sampling from a
large population: sample(xrange(10000000), 60)
"""
# Sampling without replacement entails tracking either potential
# selections (the pool) in a list or previous selections in a set.
# When the number of selections is small compared to the
# population, then tracking selections is efficient, requiring
# only a small set and an occasional reselection. For
# a larger number of selections, the pool tracking method is
# preferred since the list takes less space than the
# set and it doesn't suffer from frequent reselections.
n = len(population)
if not 0 <= k <= n:
raise ValueError("sample larger than population")
random = self.random
_int = int
result = [None] * k
setsize = 21 # size of a small set minus size of an empty list
if k > 5:
setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
if n <= setsize or hasattr(population, "keys"):
# An n-length list is smaller than a k-length set, or this is a
# mapping type so the other algorithm wouldn't work.
pool = list(population)
for i in xrange(k): # invariant: non-selected at [0,n-i)
j = _int(random() * (n-i))
result[i] = pool[j]
pool[j] = pool[n-i-1] # move non-selected item into vacancy
else:
try:
selected = set()
selected_add = selected.add
for i in xrange(k):
j = _int(random() * n)
while j in selected:
j = _int(random() * n)
selected_add(j)
result[i] = population[j]
except (TypeError, KeyError): # handle (at least) sets
if isinstance(population, list):
raise
return self.sample(tuple(population), k)
return result
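# For instance, sample(xrange(10000000), 3) touches only three elements;
# the selection-set branch above never copies the population into a pool.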
## -------------------- real-valued distributions -------------------
## -------------------- uniform distribution -------------------
def uniform(self, a, b):
"Get a random number in the range [a, b) or [a, b] depending on rounding."
return a + (b-a) * self.random()
## -------------------- triangular --------------------
def triangular(self, low=0.0, high=1.0, mode=None):
"""Triangular distribution.
Continuous distribution bounded by given lower and upper limits,
and having a given mode value in-between.
http://en.wikipedia.org/wiki/Triangular_distribution
"""
u = self.random()
try:
c = 0.5 if mode is None else (mode - low) / (high - low)
except ZeroDivisionError:
return low
if u > c:
u = 1.0 - u
c = 1.0 - c
low, high = high, low
return low + (high - low) * (u * c) ** 0.5
## -------------------- normal distribution --------------------
def normalvariate(self, mu, sigma):
"""Normal distribution.
mu is the mean, and sigma is the standard deviation.
"""
# mu = mean, sigma = standard deviation
# Uses Kinderman and Monahan method. Reference: Kinderman,
# A.J. and Monahan, J.F., "Computer generation of random
# variables using the ratio of uniform deviates", ACM Trans
# Math Software, 3, (1977), pp257-260.
random = self.random
while 1:
u1 = random()
u2 = 1.0 - random()
z = NV_MAGICCONST*(u1-0.5)/u2
zz = z*z/4.0
if zz <= -_log(u2):
break
return mu + z*sigma
## -------------------- lognormal distribution --------------------
def lognormvariate(self, mu, sigma):
"""Log normal distribution.
If you take the natural logarithm of this distribution, you'll get a
normal distribution with mean mu and standard deviation sigma.
mu can have any value, and sigma must be greater than zero.
"""
return _exp(self.normalvariate(mu, sigma))
## -------------------- exponential distribution --------------------
def expovariate(self, lambd):
"""Exponential distribution.
lambd is 1.0 divided by the desired mean. It should be
nonzero. (The parameter would be called "lambda", but that is
a reserved word in Python.) Returned values range from 0 to
positive infinity if lambd is positive, and from negative
infinity to 0 if lambd is negative.
"""
# lambd: rate lambd = 1/mean
# ('lambda' is a Python reserved word)
# we use 1-random() instead of random() to preclude the
# possibility of taking the log of zero.
return -_log(1.0 - self.random())/lambd
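        # Worked example (comment only): for a desired mean of 10.0, call
        # expovariate(1.0 / 10.0), since lambd is the reciprocal of the mean.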
## -------------------- von Mises distribution --------------------
def vonmisesvariate(self, mu, kappa):
"""Circular data distribution.
mu is the mean angle, expressed in radians between 0 and 2*pi, and
kappa is the concentration parameter, which must be greater than or
equal to zero. If kappa is equal to zero, this distribution reduces
to a uniform random angle over the range 0 to 2*pi.
"""
# mu: mean angle (in radians between 0 and 2*pi)
# kappa: concentration parameter kappa (>= 0)
# if kappa = 0 generate uniform random angle
# Based upon an algorithm published in: Fisher, N.I.,
# "Statistical Analysis of Circular Data", Cambridge
# University Press, 1993.
# Thanks to Magnus Kessler for a correction to the
# implementation of step 4.
random = self.random
if kappa <= 1e-6:
return TWOPI * random()
s = 0.5 / kappa
r = s + _sqrt(1.0 + s * s)
while 1:
u1 = random()
z = _cos(_pi * u1)
d = z / (r + z)
u2 = random()
if u2 < 1.0 - d * d or u2 <= (1.0 - d) * _exp(d):
break
q = 1.0 / r
f = (q + z) / (1.0 + q * z)
u3 = random()
if u3 > 0.5:
theta = (mu + _acos(f)) % TWOPI
else:
theta = (mu - _acos(f)) % TWOPI
return theta
## -------------------- gamma distribution --------------------
def gammavariate(self, alpha, beta):
"""Gamma distribution. Not the gamma function!
Conditions on the parameters are alpha > 0 and beta > 0.
The probability distribution function is:
x ** (alpha - 1) * math.exp(-x / beta)
pdf(x) = --------------------------------------
math.gamma(alpha) * beta ** alpha
"""
# alpha > 0, beta > 0, mean is alpha*beta, variance is alpha*beta**2
# Warning: a few older sources define the gamma distribution in terms
# of alpha > -1.0
if alpha <= 0.0 or beta <= 0.0:
raise ValueError, 'gammavariate: alpha and beta must be > 0.0'
random = self.random
if alpha > 1.0:
# Uses R.C.H. Cheng, "The generation of Gamma
# variables with non-integral shape parameters",
# Applied Statistics, (1977), 26, No. 1, p71-74
ainv = _sqrt(2.0 * alpha - 1.0)
bbb = alpha - LOG4
ccc = alpha + ainv
while 1:
u1 = random()
if not 1e-7 < u1 < .9999999:
continue
u2 = 1.0 - random()
v = _log(u1/(1.0-u1))/ainv
x = alpha*_exp(v)
z = u1*u1*u2
r = bbb+ccc*v-x
if r + SG_MAGICCONST - 4.5*z >= 0.0 or r >= _log(z):
return x * beta
elif alpha == 1.0:
# expovariate(1)
u = random()
while u <= 1e-7:
u = random()
return -_log(u) * beta
else: # alpha is between 0 and 1 (exclusive)
# Uses ALGORITHM GS of Statistical Computing - Kennedy & Gentle
while 1:
u = random()
b = (_e + alpha)/_e
p = b*u
if p <= 1.0:
x = p ** (1.0/alpha)
else:
x = -_log((b-p)/alpha)
u1 = random()
if p > 1.0:
if u1 <= x ** (alpha - 1.0):
break
elif u1 <= _exp(-x):
break
return x * beta
## -------------------- Gauss (faster alternative) --------------------
def gauss(self, mu, sigma):
"""Gaussian distribution.
mu is the mean, and sigma is the standard deviation. This is
slightly faster than the normalvariate() function.
Not thread-safe without a lock around calls.
"""
# When x and y are two variables from [0, 1), uniformly
# distributed, then
#
# cos(2*pi*x)*sqrt(-2*log(1-y))
# sin(2*pi*x)*sqrt(-2*log(1-y))
#
# are two *independent* variables with normal distribution
# (mu = 0, sigma = 1).
# (Lambert Meertens)
# (corrected version; bug discovered by Mike Miller, fixed by LM)
# Multithreading note: When two threads call this function
# simultaneously, it is possible that they will receive the
# same return value. The window is very small though. To
# avoid this, you have to use a lock around all calls. (I
# didn't want to slow this down in the serial case by using a
# lock here.)
random = self.random
z = self.gauss_next
self.gauss_next = None
if z is None:
x2pi = random() * TWOPI
g2rad = _sqrt(-2.0 * _log(1.0 - random()))
z = _cos(x2pi) * g2rad
self.gauss_next = _sin(x2pi) * g2rad
return mu + z*sigma
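        # Note (comment only): each Box-Muller step above produces a cos/sin
        # pair of independent normals; gauss() returns the cosine deviate now
        # and banks the sine deviate in self.gauss_next for the next call,
        # which is exactly the window the thread-safety warning refers to.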
## -------------------- beta --------------------
## See
## http://mail.python.org/pipermail/python-bugs-list/2001-January/003752.html
## for Ivan Frohne's insightful analysis of why the original implementation:
##
## def betavariate(self, alpha, beta):
## # Discrete Event Simulation in C, pp 87-88.
##
## y = self.expovariate(alpha)
## z = self.expovariate(1.0/beta)
## return z/(y+z)
##
## was dead wrong, and how it probably got that way.
def betavariate(self, alpha, beta):
"""Beta distribution.
Conditions on the parameters are alpha > 0 and beta > 0.
Returned values range between 0 and 1.
"""
# This version due to Janne Sinkkonen, and matches all the std
# texts (e.g., Knuth Vol 2 Ed 3 pg 134 "the beta distribution").
y = self.gammavariate(alpha, 1.)
if y == 0:
return 0.0
else:
return y / (y + self.gammavariate(beta, 1.))
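        # Note (comment only): this uses the standard identity that for
        # independent Y ~ Gamma(alpha, 1) and Z ~ Gamma(beta, 1),
        # Y / (Y + Z) ~ Beta(alpha, beta).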
## -------------------- Pareto --------------------
def paretovariate(self, alpha):
"""Pareto distribution. alpha is the shape parameter."""
# Jain, pg. 495
u = 1.0 - self.random()
return 1.0 / pow(u, 1.0/alpha)
## -------------------- Weibull --------------------
def weibullvariate(self, alpha, beta):
"""Weibull distribution.
alpha is the scale parameter and beta is the shape parameter.
"""
# Jain, pg. 499; bug fix courtesy Bill Arms
u = 1.0 - self.random()
return alpha * pow(-_log(u), 1.0/beta)
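# Hedged sketch (not part of the original module): paretovariate() and
# weibullvariate() above are both inverse-CDF transforms of one uniform
# deviate.  _demo_inverse_cdf_weibull is a hypothetical helper that
# rederives weibullvariate() from the Weibull CDF
# F(x) = 1 - exp(-(x/alpha)**beta); pass any 0-argument uniform generator
# as rand, e.g. random.random.
def _demo_inverse_cdf_weibull(alpha, beta, rand):
    import math
    u = 1.0 - rand()        # u in (0, 1]: keeps the log argument positive
    # Solve 1 - exp(-(x/alpha)**beta) = 1 - u for x:
    return alpha * math.pow(-math.log(u), 1.0 / beta)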
## -------------------- Wichmann-Hill -------------------
class WichmannHill(Random):
VERSION = 1 # used by getstate/setstate
def seed(self, a=None):
"""Initialize internal state from hashable object.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
        If a is neither None nor an int or long, hash(a) is used instead.
        If a is an int or long, a is used directly. Distinct values between
0 and 27814431486575L inclusive are guaranteed to yield distinct
internal states (this guarantee is specific to the default
Wichmann-Hill generator).
"""
if a is None:
try:
a = long(_hexlify(_urandom(16)), 16)
except NotImplementedError:
import time
a = long(time.time() * 256) # use fractional seconds
if not isinstance(a, (int, long)):
a = hash(a)
a, x = divmod(a, 30268)
a, y = divmod(a, 30306)
a, z = divmod(a, 30322)
self._seed = int(x)+1, int(y)+1, int(z)+1
self.gauss_next = None
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
        # Wichmann-Hill random number generator.
#
# Wichmann, B. A. & Hill, I. D. (1982)
# Algorithm AS 183:
# An efficient and portable pseudo-random number generator
# Applied Statistics 31 (1982) 188-190
#
# see also:
# Correction to Algorithm AS 183
# Applied Statistics 33 (1984) 123
#
# McLeod, A. I. (1985)
# A remark on Algorithm AS 183
# Applied Statistics 34 (1985),198-200
# This part is thread-unsafe:
# BEGIN CRITICAL SECTION
x, y, z = self._seed
x = (171 * x) % 30269
y = (172 * y) % 30307
z = (170 * z) % 30323
self._seed = x, y, z
# END CRITICAL SECTION
# Note: on a platform using IEEE-754 double arithmetic, this can
# never return 0.0 (asserted by Tim; proof too long for a comment).
return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0
def getstate(self):
"""Return internal state; can be passed to setstate() later."""
return self.VERSION, self._seed, self.gauss_next
def setstate(self, state):
"""Restore internal state from object returned by getstate()."""
version = state[0]
if version == 1:
version, self._seed, self.gauss_next = state
else:
raise ValueError("state with version %s passed to "
"Random.setstate() of version %s" %
(version, self.VERSION))
def jumpahead(self, n):
"""Act as if n calls to random() were made, but quickly.
n is an int, greater than or equal to 0.
Example use: If you have 2 threads and know that each will
consume no more than a million random numbers, create two Random
objects r1 and r2, then do
r2.setstate(r1.getstate())
r2.jumpahead(1000000)
Then r1 and r2 will use guaranteed-disjoint segments of the full
period.
"""
if not n >= 0:
raise ValueError("n must be >= 0")
x, y, z = self._seed
x = int(x * pow(171, n, 30269)) % 30269
y = int(y * pow(172, n, 30307)) % 30307
z = int(z * pow(170, n, 30323)) % 30323
self._seed = x, y, z
def __whseed(self, x=0, y=0, z=0):
"""Set the Wichmann-Hill seed from (x, y, z).
These must be integers in the range [0, 256).
"""
if not type(x) == type(y) == type(z) == int:
raise TypeError('seeds must be integers')
if not (0 <= x < 256 and 0 <= y < 256 and 0 <= z < 256):
raise ValueError('seeds must be in range(0, 256)')
if 0 == x == y == z:
# Initialize from current time
import time
t = long(time.time() * 256)
t = int((t&0xffffff) ^ (t>>24))
t, x = divmod(t, 256)
t, y = divmod(t, 256)
t, z = divmod(t, 256)
# Zero is a poor seed, so substitute 1
self._seed = (x or 1, y or 1, z or 1)
self.gauss_next = None
def whseed(self, a=None):
"""Seed from hashable object's hash code.
None or no argument seeds from current time. It is not guaranteed
that objects with distinct hash codes lead to distinct internal
states.
This is obsolete, provided for compatibility with the seed routine
used prior to Python 2.1. Use the .seed() method instead.
"""
if a is None:
self.__whseed()
return
a = hash(a)
a, x = divmod(a, 256)
a, y = divmod(a, 256)
a, z = divmod(a, 256)
x = (x + a) % 256 or 1
y = (y + a) % 256 or 1
z = (z + a) % 256 or 1
self.__whseed(x, y, z)
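# Hedged usage sketch (illustrative, not part of the module) of the
# jumpahead() pattern documented above: clone a generator's state, then
# skip ahead so the two streams are guaranteed disjoint.
def _demo_disjoint_streams():
    r1 = WichmannHill(42)
    r2 = WichmannHill()
    r2.setstate(r1.getstate())   # start r2 exactly where r1 is
    r2.jumpahead(1000000)        # then move r2 a million draws ahead
    return r1.random(), r2.random()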
## --------------- Operating System Random Source ------------------
class SystemRandom(Random):
"""Alternate random number generator using sources provided
by the operating system (such as /dev/urandom on Unix or
CryptGenRandom on Windows).
Not available on all systems (see os.urandom() for details).
"""
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
return (long(_hexlify(_urandom(7)), 16) >> 3) * RECIP_BPF
def getrandbits(self, k):
"""getrandbits(k) -> x. Generates a long int with k random bits."""
if k <= 0:
raise ValueError('number of bits must be greater than zero')
if k != int(k):
raise TypeError('number of bits should be an integer')
bytes = (k + 7) // 8 # bits / 8 and rounded up
x = long(_hexlify(_urandom(bytes)), 16)
return x >> (bytes * 8 - k) # trim excess bits
def _stub(self, *args, **kwds):
"Stub method. Not used for a system random number generator."
return None
seed = jumpahead = _stub
def _notimplemented(self, *args, **kwds):
"Method should not be called for a system random number generator."
raise NotImplementedError('System entropy source does not have state.')
getstate = setstate = _notimplemented
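# Minimal standalone sketch (assumes os.urandom is available and k >= 1) of
# the byte-to-bit trimming used by SystemRandom.getrandbits() above.
def _demo_trim_to_k_bits(k):
    import os, binascii
    nbytes = (k + 7) // 8                        # round k bits up to bytes
    x = long(binascii.hexlify(os.urandom(nbytes)), 16)
    return x >> (nbytes * 8 - k)                 # shift off the surplus bits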
## -------------------- test program --------------------
def _test_generator(n, func, args):
import time
print n, 'times', func.__name__
total = 0.0
sqsum = 0.0
smallest = 1e10
largest = -1e10
t0 = time.time()
for i in range(n):
x = func(*args)
total += x
sqsum = sqsum + x*x
smallest = min(x, smallest)
largest = max(x, largest)
t1 = time.time()
print round(t1-t0, 3), 'sec,',
avg = total/n
stddev = _sqrt(sqsum/n - avg*avg)
print 'avg %g, stddev %g, min %g, max %g' % \
(avg, stddev, smallest, largest)
def _test(N=2000):
_test_generator(N, random, ())
_test_generator(N, normalvariate, (0.0, 1.0))
_test_generator(N, lognormvariate, (0.0, 1.0))
_test_generator(N, vonmisesvariate, (0.0, 1.0))
_test_generator(N, gammavariate, (0.01, 1.0))
_test_generator(N, gammavariate, (0.1, 1.0))
_test_generator(N, gammavariate, (0.1, 2.0))
_test_generator(N, gammavariate, (0.5, 1.0))
_test_generator(N, gammavariate, (0.9, 1.0))
_test_generator(N, gammavariate, (1.0, 1.0))
_test_generator(N, gammavariate, (2.0, 1.0))
_test_generator(N, gammavariate, (20.0, 1.0))
_test_generator(N, gammavariate, (200.0, 1.0))
_test_generator(N, gauss, (0.0, 1.0))
_test_generator(N, betavariate, (3.0, 3.0))
_test_generator(N, triangular, (0.0, 1.0, 1.0/3.0))
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions share state across all uses
# (both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
choice = _inst.choice
randrange = _inst.randrange
sample = _inst.sample
shuffle = _inst.shuffle
normalvariate = _inst.normalvariate
lognormvariate = _inst.lognormvariate
expovariate = _inst.expovariate
vonmisesvariate = _inst.vonmisesvariate
gammavariate = _inst.gammavariate
gauss = _inst.gauss
betavariate = _inst.betavariate
paretovariate = _inst.paretovariate
weibullvariate = _inst.weibullvariate
getstate = _inst.getstate
setstate = _inst.setstate
jumpahead = _inst.jumpahead
getrandbits = _inst.getrandbits
if __name__ == '__main__':
_test()
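# Hedged example (not part of the module): the functions exported above all
# share _inst's state, so re-seeding replays an entire stream.  Written with
# an explicit import so the sketch is self-contained.
def _demo_reproducible_stream():
    import random as _r
    _r.seed(1234)
    first = [_r.random() for _ in range(3)]
    _r.seed(1234)
    second = [_r.random() for _ in range(3)]
    assert first == second       # identical seed, identical stream
    return first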
"""Redo the builtin repr() (representation) but with limits on most sizes."""
__all__ = ["Repr","repr"]
import __builtin__
from itertools import islice
class Repr:
def __init__(self):
self.maxlevel = 6
self.maxtuple = 6
self.maxlist = 6
self.maxarray = 5
self.maxdict = 4
self.maxset = 6
self.maxfrozenset = 6
self.maxdeque = 6
self.maxstring = 30
self.maxlong = 40
self.maxother = 20
def repr(self, x):
return self.repr1(x, self.maxlevel)
def repr1(self, x, level):
typename = type(x).__name__
if ' ' in typename:
parts = typename.split()
typename = '_'.join(parts)
if hasattr(self, 'repr_' + typename):
return getattr(self, 'repr_' + typename)(x, level)
else:
s = __builtin__.repr(x)
if len(s) > self.maxother:
i = max(0, (self.maxother-3)//2)
j = max(0, self.maxother-3-i)
s = s[:i] + '...' + s[len(s)-j:]
return s
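        # Worked example of the truncation arithmetic above (comment only):
        # with maxother = 20, i = (20-3)//2 = 8 and j = 20-3-8 = 9, so the
        # result keeps 8 leading characters, '...', and 9 trailing
        # characters: 20 in total.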
def _repr_iterable(self, x, level, left, right, maxiter, trail=''):
n = len(x)
if level <= 0 and n:
s = '...'
else:
newlevel = level - 1
repr1 = self.repr1
pieces = [repr1(elem, newlevel) for elem in islice(x, maxiter)]
if n > maxiter: pieces.append('...')
s = ', '.join(pieces)
if n == 1 and trail: right = trail + right
return '%s%s%s' % (left, s, right)
def repr_tuple(self, x, level):
return self._repr_iterable(x, level, '(', ')', self.maxtuple, ',')
def repr_list(self, x, level):
return self._repr_iterable(x, level, '[', ']', self.maxlist)
def repr_array(self, x, level):
header = "array('%s', [" % x.typecode
return self._repr_iterable(x, level, header, '])', self.maxarray)
def repr_set(self, x, level):
x = _possibly_sorted(x)
return self._repr_iterable(x, level, 'set([', '])', self.maxset)
def repr_frozenset(self, x, level):
x = _possibly_sorted(x)
return self._repr_iterable(x, level, 'frozenset([', '])',
self.maxfrozenset)
def repr_deque(self, x, level):
return self._repr_iterable(x, level, 'deque([', '])', self.maxdeque)
def repr_dict(self, x, level):
n = len(x)
if n == 0: return '{}'
if level <= 0: return '{...}'
newlevel = level - 1
repr1 = self.repr1
pieces = []
for key in islice(_possibly_sorted(x), self.maxdict):
keyrepr = repr1(key, newlevel)
valrepr = repr1(x[key], newlevel)
pieces.append('%s: %s' % (keyrepr, valrepr))
if n > self.maxdict: pieces.append('...')
s = ', '.join(pieces)
return '{%s}' % (s,)
def repr_str(self, x, level):
s = __builtin__.repr(x[:self.maxstring])
if len(s) > self.maxstring:
i = max(0, (self.maxstring-3)//2)
j = max(0, self.maxstring-3-i)
s = __builtin__.repr(x[:i] + x[len(x)-j:])
s = s[:i] + '...' + s[len(s)-j:]
return s
def repr_long(self, x, level):
s = __builtin__.repr(x) # XXX Hope this isn't too slow...
if len(s) > self.maxlong:
i = max(0, (self.maxlong-3)//2)
j = max(0, self.maxlong-3-i)
s = s[:i] + '...' + s[len(s)-j:]
return s
def repr_instance(self, x, level):
try:
s = __builtin__.repr(x)
# Bugs in x.__repr__() can cause arbitrary
# exceptions -- then make up something
except Exception:
return '<%s instance at %x>' % (x.__class__.__name__, id(x))
if len(s) > self.maxstring:
i = max(0, (self.maxstring-3)//2)
j = max(0, self.maxstring-3-i)
s = s[:i] + '...' + s[len(s)-j:]
return s
def _possibly_sorted(x):
# Since not all sequences of items can be sorted and comparison
# functions may raise arbitrary exceptions, return an unsorted
# sequence in that case.
try:
return sorted(x)
except Exception:
return list(x)
aRepr = Repr()
repr = aRepr.repr
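# Illustrative sketch of the customization pattern the class above is built
# for: make an instance (or a subclass) and tighten the max* limits set in
# __init__.  `short` is just a local name.
def _demo_short_repr():
    short = Repr()
    short.maxlist = 3            # elide list elements after the third
    short.maxstring = 12         # clamp string reprs to 12 characters
    return short.repr(range(10)), short.repr('a' * 50)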
"""Restricted execution facilities.
The class RExec exports methods r_exec(), r_eval(), r_execfile(), and
r_import(), which correspond roughly to the built-in operations
exec, eval(), execfile() and import, but executing the code in an
environment that only exposes those built-in operations that are
deemed safe. To this end, a modest collection of 'fake' modules is
created which mimics the standard modules by the same names. It is a
policy decision which built-in modules and operations are made
available; this module provides a reasonable default, but derived
classes can change the policies e.g. by overriding or extending class
variables like ok_builtin_modules or methods like make_sys().
XXX To do:
- r_open should allow writing tmp dir
- r_exec etc. with explicit globals/locals? (Use rexec("exec ... in ...")?)
"""
from warnings import warnpy3k
warnpy3k("the rexec module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import sys
import __builtin__
import os
import ihooks
import imp
__all__ = ["RExec"]
class FileBase:
ok_file_methods = ('fileno', 'flush', 'isatty', 'read', 'readline',
'readlines', 'seek', 'tell', 'write', 'writelines', 'xreadlines',
'__iter__')
class FileWrapper(FileBase):
# XXX This is just like a Bastion -- should use that!
def __init__(self, f):
for m in self.ok_file_methods:
if not hasattr(self, m) and hasattr(f, m):
setattr(self, m, getattr(f, m))
def close(self):
self.flush()
TEMPLATE = """
def %s(self, *args):
return getattr(self.mod, self.name).%s(*args)
"""
class FileDelegate(FileBase):
def __init__(self, mod, name):
self.mod = mod
self.name = name
for m in FileBase.ok_file_methods + ('close',):
exec TEMPLATE % (m, m)
class RHooks(ihooks.Hooks):
def __init__(self, *args):
# Hacks to support both old and new interfaces:
# old interface was RHooks(rexec[, verbose])
# new interface is RHooks([verbose])
verbose = 0
rexec = None
if args and type(args[-1]) == type(0):
verbose = args[-1]
args = args[:-1]
if args and hasattr(args[0], '__class__'):
rexec = args[0]
args = args[1:]
if args:
raise TypeError, "too many arguments"
ihooks.Hooks.__init__(self, verbose)
self.rexec = rexec
def set_rexec(self, rexec):
# Called by RExec instance to complete initialization
self.rexec = rexec
def get_suffixes(self):
return self.rexec.get_suffixes()
def is_builtin(self, name):
return self.rexec.is_builtin(name)
def init_builtin(self, name):
m = __import__(name)
return self.rexec.copy_except(m, ())
def init_frozen(self, name): raise SystemError, "don't use this"
def load_source(self, *args): raise SystemError, "don't use this"
def load_compiled(self, *args): raise SystemError, "don't use this"
def load_package(self, *args): raise SystemError, "don't use this"
def load_dynamic(self, name, filename, file):
return self.rexec.load_dynamic(name, filename, file)
def add_module(self, name):
return self.rexec.add_module(name)
def modules_dict(self):
return self.rexec.modules
def default_path(self):
return self.rexec.modules['sys'].path
# XXX Backwards compatibility
RModuleLoader = ihooks.FancyModuleLoader
RModuleImporter = ihooks.ModuleImporter
class RExec(ihooks._Verbose):
"""Basic restricted execution framework.
Code executed in this restricted environment will only have access to
modules and functions that are deemed safe; you can subclass RExec to
add or remove capabilities as desired.
The RExec class can prevent code from performing unsafe operations like
reading or writing disk files, or using TCP/IP sockets. However, it does
not protect against code using extremely large amounts of memory or
processor time.
"""
ok_path = tuple(sys.path) # That's a policy decision
ok_builtin_modules = ('audioop', 'array', 'binascii',
'cmath', 'errno', 'imageop',
'marshal', 'math', 'md5', 'operator',
'parser', 'select',
'sha', '_sre', 'strop', 'struct', 'time',
'_weakref')
ok_posix_names = ('error', 'fstat', 'listdir', 'lstat', 'readlink',
'stat', 'times', 'uname', 'getpid', 'getppid',
'getcwd', 'getuid', 'getgid', 'geteuid', 'getegid')
ok_sys_names = ('byteorder', 'copyright', 'exit', 'getdefaultencoding',
'getrefcount', 'hexversion', 'maxint', 'maxunicode',
'platform', 'ps1', 'ps2', 'version', 'version_info')
nok_builtin_names = ('open', 'file', 'reload', '__import__')
ok_file_types = (imp.C_EXTENSION, imp.PY_SOURCE)
def __init__(self, hooks = None, verbose = 0):
"""Returns an instance of the RExec class.
The hooks parameter is an instance of the RHooks class or a subclass
of it. If it is omitted or None, the default RHooks class is
instantiated.
Whenever the RExec module searches for a module (even a built-in one)
or reads a module's code, it doesn't actually go out to the file
system itself. Rather, it calls methods of an RHooks instance that
was passed to or created by its constructor. (Actually, the RExec
object doesn't make these calls --- they are made by a module loader
object that's part of the RExec object. This allows another level of
flexibility, which can be useful when changing the mechanics of
import within the restricted environment.)
By providing an alternate RHooks object, we can control the file
system accesses made to import a module, without changing the
actual algorithm that controls the order in which those accesses are
made. For instance, we could substitute an RHooks object that
passes all filesystem requests to a file server elsewhere, via some
RPC mechanism such as ILU. Grail's applet loader uses this to support
importing applets from a URL for a directory.
If the verbose parameter is true, additional debugging output may be
sent to standard output.
"""
raise RuntimeError, "This code is not secure in Python 2.2 and later"
ihooks._Verbose.__init__(self, verbose)
# XXX There's a circular reference here:
self.hooks = hooks or RHooks(verbose)
self.hooks.set_rexec(self)
self.modules = {}
self.ok_dynamic_modules = self.ok_builtin_modules
list = []
for mname in self.ok_builtin_modules:
if mname in sys.builtin_module_names:
list.append(mname)
self.ok_builtin_modules = tuple(list)
self.set_trusted_path()
self.make_builtin()
self.make_initial_modules()
# make_sys must be last because it adds the already created
# modules to its builtin_module_names
self.make_sys()
self.loader = RModuleLoader(self.hooks, verbose)
self.importer = RModuleImporter(self.loader, verbose)
def set_trusted_path(self):
# Set the path from which dynamic modules may be loaded.
# Those dynamic modules must also occur in ok_builtin_modules
self.trusted_path = filter(os.path.isabs, sys.path)
def load_dynamic(self, name, filename, file):
if name not in self.ok_dynamic_modules:
raise ImportError, "untrusted dynamic module: %s" % name
if name in sys.modules:
src = sys.modules[name]
else:
src = imp.load_dynamic(name, filename, file)
dst = self.copy_except(src, [])
return dst
def make_initial_modules(self):
self.make_main()
self.make_osname()
# Helpers for RHooks
def get_suffixes(self):
return [item # (suff, mode, type)
for item in imp.get_suffixes()
if item[2] in self.ok_file_types]
def is_builtin(self, mname):
return mname in self.ok_builtin_modules
# The make_* methods create specific built-in modules
def make_builtin(self):
m = self.copy_except(__builtin__, self.nok_builtin_names)
m.__import__ = self.r_import
m.reload = self.r_reload
m.open = m.file = self.r_open
def make_main(self):
self.add_module('__main__')
def make_osname(self):
osname = os.name
src = __import__(osname)
dst = self.copy_only(src, self.ok_posix_names)
dst.environ = e = {}
for key, value in os.environ.items():
e[key] = value
def make_sys(self):
m = self.copy_only(sys, self.ok_sys_names)
m.modules = self.modules
m.argv = ['RESTRICTED']
m.path = map(None, self.ok_path)
m.exc_info = self.r_exc_info
m = self.modules['sys']
l = self.modules.keys() + list(self.ok_builtin_modules)
l.sort()
m.builtin_module_names = tuple(l)
# The copy_* methods copy existing modules with some changes
def copy_except(self, src, exceptions):
dst = self.copy_none(src)
for name in dir(src):
setattr(dst, name, getattr(src, name))
for name in exceptions:
try:
delattr(dst, name)
except AttributeError:
pass
return dst
def copy_only(self, src, names):
dst = self.copy_none(src)
for name in names:
try:
value = getattr(src, name)
except AttributeError:
continue
setattr(dst, name, value)
return dst
def copy_none(self, src):
m = self.add_module(src.__name__)
m.__doc__ = src.__doc__
return m
# Add a module -- return an existing module or create one
def add_module(self, mname):
m = self.modules.get(mname)
if m is None:
self.modules[mname] = m = self.hooks.new_module(mname)
m.__builtins__ = self.modules['__builtin__']
return m
# The r* methods are public interfaces
def r_exec(self, code):
"""Execute code within a restricted environment.
The code parameter must either be a string containing one or more
lines of Python code, or a compiled code object, which will be
executed in the restricted environment's __main__ module.
"""
m = self.add_module('__main__')
exec code in m.__dict__
def r_eval(self, code):
"""Evaluate code within a restricted environment.
The code parameter must either be a string containing a Python
expression, or a compiled code object, which will be evaluated in
the restricted environment's __main__ module. The value of the
expression or code object will be returned.
"""
m = self.add_module('__main__')
return eval(code, m.__dict__)
def r_execfile(self, file):
"""Execute the Python code in the file in the restricted
environment's __main__ module.
"""
m = self.add_module('__main__')
execfile(file, m.__dict__)
def r_import(self, mname, globals={}, locals={}, fromlist=[]):
"""Import a module, raising an ImportError exception if the module
is considered unsafe.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
"""
return self.importer.import_module(mname, globals, locals, fromlist)
def r_reload(self, m):
"""Reload the module object, re-parsing and re-initializing it.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
"""
return self.importer.reload(m)
def r_unload(self, m):
"""Unload the module.
Removes it from the restricted environment's sys.modules dictionary.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
"""
return self.importer.unload(m)
# The s_* methods are similar but also swap std{in,out,err}
def make_delegate_files(self):
s = self.modules['sys']
self.delegate_stdin = FileDelegate(s, 'stdin')
self.delegate_stdout = FileDelegate(s, 'stdout')
self.delegate_stderr = FileDelegate(s, 'stderr')
self.restricted_stdin = FileWrapper(sys.stdin)
self.restricted_stdout = FileWrapper(sys.stdout)
self.restricted_stderr = FileWrapper(sys.stderr)
def set_files(self):
if not hasattr(self, 'save_stdin'):
self.save_files()
if not hasattr(self, 'delegate_stdin'):
self.make_delegate_files()
s = self.modules['sys']
s.stdin = self.restricted_stdin
s.stdout = self.restricted_stdout
s.stderr = self.restricted_stderr
sys.stdin = self.delegate_stdin
sys.stdout = self.delegate_stdout
sys.stderr = self.delegate_stderr
def reset_files(self):
self.restore_files()
s = self.modules['sys']
self.restricted_stdin = s.stdin
self.restricted_stdout = s.stdout
self.restricted_stderr = s.stderr
def save_files(self):
self.save_stdin = sys.stdin
self.save_stdout = sys.stdout
self.save_stderr = sys.stderr
def restore_files(self):
sys.stdin = self.save_stdin
sys.stdout = self.save_stdout
sys.stderr = self.save_stderr
def s_apply(self, func, args=(), kw={}):
self.save_files()
try:
self.set_files()
r = func(*args, **kw)
finally:
self.restore_files()
return r
def s_exec(self, *args):
"""Execute code within a restricted environment.
Similar to the r_exec() method, but the code will be granted access
to restricted versions of the standard I/O streams sys.stdin,
sys.stderr, and sys.stdout.
The code parameter must either be a string containing one or more
lines of Python code, or a compiled code object, which will be
executed in the restricted environment's __main__ module.
"""
return self.s_apply(self.r_exec, args)
def s_eval(self, *args):
"""Evaluate code within a restricted environment.
Similar to the r_eval() method, but the code will be granted access
to restricted versions of the standard I/O streams sys.stdin,
sys.stderr, and sys.stdout.
The code parameter must either be a string containing a Python
expression, or a compiled code object, which will be evaluated in
the restricted environment's __main__ module. The value of the
expression or code object will be returned.
"""
return self.s_apply(self.r_eval, args)
def s_execfile(self, *args):
"""Execute the Python code in the file in the restricted
environment's __main__ module.
Similar to the r_execfile() method, but the code will be granted
access to restricted versions of the standard I/O streams sys.stdin,
sys.stderr, and sys.stdout.
"""
return self.s_apply(self.r_execfile, args)
def s_import(self, *args):
"""Import a module, raising an ImportError exception if the module
is considered unsafe.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
Similar to the r_import() method, but has access to restricted
versions of the standard I/O streams sys.stdin, sys.stderr, and
sys.stdout.
"""
return self.s_apply(self.r_import, args)
def s_reload(self, *args):
"""Reload the module object, re-parsing and re-initializing it.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
Similar to the r_reload() method, but has access to restricted
versions of the standard I/O streams sys.stdin, sys.stderr, and
sys.stdout.
"""
return self.s_apply(self.r_reload, args)
def s_unload(self, *args):
"""Unload the module.
Removes it from the restricted environment's sys.modules dictionary.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
Similar to the r_unload() method, but has access to restricted
versions of the standard I/O streams sys.stdin, sys.stderr, and
sys.stdout.
"""
return self.s_apply(self.r_unload, args)
# Restricted open(...)
def r_open(self, file, mode='r', buf=-1):
"""Method called when open() is called in the restricted environment.
The arguments are identical to those of the open() function, and a
file object (or a class instance compatible with file objects)
        should be returned. RExec's default behaviour is to allow opening
        any file for reading, but to forbid any attempt to write a file.
This method is implicitly called by code executing in the
restricted environment. Overriding this method in a subclass is
used to change the policies enforced by a restricted environment.
"""
mode = str(mode)
if mode not in ('r', 'rb'):
raise IOError, "can't open files for writing in restricted mode"
return open(file, mode, buf)
# Restricted version of sys.exc_info()
def r_exc_info(self):
ty, va, tr = sys.exc_info()
tr = None
return ty, va, tr
def test():
import getopt, traceback
opts, args = getopt.getopt(sys.argv[1:], 'vt:')
verbose = 0
trusted = []
for o, a in opts:
if o == '-v':
verbose = verbose+1
if o == '-t':
trusted.append(a)
r = RExec(verbose=verbose)
if trusted:
r.ok_builtin_modules = r.ok_builtin_modules + tuple(trusted)
if args:
r.modules['sys'].argv = args
r.modules['sys'].path.insert(0, os.path.dirname(args[0]))
else:
r.modules['sys'].path.insert(0, "")
fp = sys.stdin
if args and args[0] != '-':
try:
fp = open(args[0])
except IOError, msg:
print "%s: can't open file %r" % (sys.argv[0], args[0])
return 1
if fp.isatty():
try:
import readline
except ImportError:
pass
import code
class RestrictedConsole(code.InteractiveConsole):
def runcode(self, co):
self.locals['__builtins__'] = r.modules['__builtin__']
r.s_apply(code.InteractiveConsole.runcode, (self, co))
try:
RestrictedConsole(r.modules['__main__'].__dict__).interact()
except SystemExit, n:
return n
else:
text = fp.read()
fp.close()
c = compile(text, fp.name, 'exec')
try:
r.s_exec(c)
except SystemExit, n:
return n
except:
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(test())
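# Historical usage sketch of the r_exec()/r_eval() API described above.
# RExec.__init__ deliberately raises RuntimeError on Python 2.2 and later
# because the model is insecure, so construction is guarded here.
def _demo_rexec():
    try:
        r = RExec()
    except RuntimeError:
        return None              # expected on any modern interpreter
    r.r_exec('x = 6 * 7')        # runs inside the restricted __main__
    return r.r_eval('x')         # -> 42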
"""RFC 2822 message manipulation.
Note: This is only a very rough sketch of a full RFC-822 parser; in particular
the tokenizing of addresses does not adhere to all the quoting rules.
Note: RFC 2822 is a long awaited update to RFC 822. This module should
conform to RFC 2822, and is thus mis-named (it's not worth renaming it). Some
effort at RFC 2822 updates have been made, but a thorough audit has not been
performed. Consider any RFC 2822 non-conformance to be a bug.
RFC 2822: http://www.faqs.org/rfcs/rfc2822.html
RFC 822 : http://www.faqs.org/rfcs/rfc822.html (obsolete)
Directions for use:
To create a Message object: first open a file, e.g.:
fp = open(file, 'r')
You can use any other legal way of getting an open file object, e.g. use
sys.stdin or call os.popen(). Then pass the open file object to the Message()
constructor:
m = Message(fp)
This class can work with any input object that supports a readline method. If
the input object has seek and tell capability, the rewindbody method will
work; also illegal lines will be pushed back onto the input stream. If the
input object lacks seek but has an `unread' method that can push back a line
of input, Message will use that to push back illegal lines. Thus this class
can be used to parse messages coming from a buffered stream.
The optional `seekable' argument is provided as a workaround for certain stdio
libraries in which tell() discards buffered data before discovering that the
lseek() system call doesn't work. For maximum portability, you should set the
seekable argument to zero to prevent that initial tell() when passing in
an unseekable object such as a file object created from a socket object. If
it is 1 on entry -- which it is by default -- the tell() method of the open
file object is called once; if this raises an exception, seekable is reset to
0. For other nonzero values of seekable, this test is not made.
To get the text of a particular header there are several methods:
str = m.getheader(name)
str = m.getrawheader(name)
where name is the name of the header, e.g. 'Subject'. The difference is that
getheader() strips the leading and trailing whitespace, while getrawheader()
doesn't. Both functions retain embedded whitespace (including newlines)
exactly as they are specified in the header, and leave the case of the text
unchanged.
For addresses and address lists there are functions
realname, mailaddress = m.getaddr(name)
list = m.getaddrlist(name)
where the latter returns a list of (realname, mailaddr) tuples.
There is also a method
time = m.getdate(name)
which parses a Date-like field and returns a time-compatible tuple,
i.e. a tuple such as returned by time.localtime() or accepted by
time.mktime().
See the class definition for lower level access methods.
There are also some utility functions here.
"""
# Cleanup and extensions by Eric S. Raymond <[email protected]>
import time
from warnings import warnpy3k
warnpy3k("in 3.x, rfc822 has been removed in favor of the email package",
stacklevel=2)
__all__ = ["Message","AddressList","parsedate","parsedate_tz","mktime_tz"]
_blanklines = ('\r\n', '\n') # Optimization for islast()
class Message:
"""Represents a single RFC 2822-compliant message."""
def __init__(self, fp, seekable = 1):
"""Initialize the class instance and read the headers."""
if seekable == 1:
# Exercise tell() to make sure it works
# (and then assume seek() works, too)
try:
fp.tell()
except (AttributeError, IOError):
seekable = 0
self.fp = fp
self.seekable = seekable
self.startofheaders = None
self.startofbody = None
#
if self.seekable:
try:
self.startofheaders = self.fp.tell()
except IOError:
self.seekable = 0
#
self.readheaders()
#
if self.seekable:
try:
self.startofbody = self.fp.tell()
except IOError:
self.seekable = 0
def rewindbody(self):
"""Rewind the file to the start of the body (if seekable)."""
if not self.seekable:
raise IOError, "unseekable file"
self.fp.seek(self.startofbody)
def readheaders(self):
"""Read header lines.
Read header lines up to the entirely blank line that terminates them.
The (normally blank) line that ends the headers is skipped, but not
        included in the returned list. If a non-header line ends the headers
(which is an error), an attempt is made to backspace over it; it is
never included in the returned list.
The variable self.status is set to the empty string if all went well,
otherwise it is an error message. The variable self.headers is a
completely uninterpreted list of lines contained in the header (so
printing them will reproduce the header exactly as it appears in the
file).
"""
self.dict = {}
self.unixfrom = ''
self.headers = lst = []
self.status = ''
headerseen = ""
firstline = 1
startofline = unread = tell = None
if hasattr(self.fp, 'unread'):
unread = self.fp.unread
elif self.seekable:
tell = self.fp.tell
while 1:
if tell:
try:
startofline = tell()
except IOError:
startofline = tell = None
self.seekable = 0
line = self.fp.readline()
if not line:
self.status = 'EOF in headers'
break
# Skip unix From name time lines
if firstline and line.startswith('From '):
self.unixfrom = self.unixfrom + line
continue
firstline = 0
if headerseen and line[0] in ' \t':
# It's a continuation line.
lst.append(line)
x = (self.dict[headerseen] + "\n " + line.strip())
self.dict[headerseen] = x.strip()
continue
elif self.iscomment(line):
# It's a comment. Ignore it.
continue
elif self.islast(line):
# Note! No pushback here! The delimiter line gets eaten.
break
headerseen = self.isheader(line)
if headerseen:
# It's a legal header line, save it.
lst.append(line)
self.dict[headerseen] = line[len(headerseen)+1:].strip()
continue
elif headerseen is not None:
# An empty header name. These aren't allowed in HTTP, but it's
# probably a benign mistake. Don't add the header, just keep
# going.
continue
else:
# It's not a header line; throw it back and stop here.
if not self.dict:
self.status = 'No headers'
else:
self.status = 'Non-header line where header expected'
# Try to undo the read.
if unread:
unread(line)
elif tell:
self.fp.seek(startofline)
else:
self.status = self.status + '; bad seek'
break
def isheader(self, line):
"""Determine whether a given line is a legal header.
This method should return the header name, suitably canonicalized.
You may override this method in order to use Message parsing on tagged
data in RFC 2822-like formats with special header formats.
"""
i = line.find(':')
if i > -1:
return line[:i].lower()
return None
def islast(self, line):
"""Determine whether a line is a legal end of RFC 2822 headers.
You may override this method if your application wants to bend the
rules, e.g. to strip trailing whitespace, or to recognize MH template
separators ('--------'). For convenience (e.g. for code reading from
sockets) a line consisting of \\r\\n also matches.
"""
return line in _blanklines
def iscomment(self, line):
"""Determine whether a line should be skipped entirely.
You may override this method in order to use Message parsing on tagged
data in RFC 2822-like formats that support embedded comments or
free-text data.
"""
return False
def getallmatchingheaders(self, name):
"""Find all header lines matching a given header name.
Look through the list of headers and find all lines matching a given
header name (and their continuation lines). A list of the lines is
returned, without interpretation. If the header does not occur, an
empty list is returned. If the header occurs multiple times, all
occurrences are returned. Case is not important in the header name.
"""
name = name.lower() + ':'
n = len(name)
lst = []
hit = 0
for line in self.headers:
if line[:n].lower() == name:
hit = 1
elif not line[:1].isspace():
hit = 0
if hit:
lst.append(line)
return lst
def getfirstmatchingheader(self, name):
"""Get the first header line matching name.
This is similar to getallmatchingheaders, but it returns only the
first matching header (and its continuation lines).
"""
name = name.lower() + ':'
n = len(name)
lst = []
hit = 0
for line in self.headers:
if hit:
if not line[:1].isspace():
break
elif line[:n].lower() == name:
hit = 1
if hit:
lst.append(line)
return lst
def getrawheader(self, name):
"""A higher-level interface to getfirstmatchingheader().
Return a string containing the literal text of the header but with the
keyword stripped. All leading, trailing and embedded whitespace is
kept in the string, however. Return None if the header does not
occur.
"""
lst = self.getfirstmatchingheader(name)
if not lst:
return None
lst[0] = lst[0][len(name) + 1:]
return ''.join(lst)
def getheader(self, name, default=None):
"""Get the header value for a name.
This is the normal interface: it returns a stripped version of the
header value for a given header name, or None if it doesn't exist.
This uses the dictionary version which finds the *last* such header.
"""
return self.dict.get(name.lower(), default)
get = getheader
def getheaders(self, name):
"""Get all values for a header.
This returns a list of values for headers given more than once; each
value in the result list is stripped in the same way as the result of
getheader(). If the header is not given, return an empty list.
"""
result = []
current = ''
have_header = 0
for s in self.getallmatchingheaders(name):
if s[0].isspace():
if current:
current = "%s\n %s" % (current, s.strip())
else:
current = s.strip()
else:
if have_header:
result.append(current)
current = s[s.find(":") + 1:].strip()
have_header = 1
if have_header:
result.append(current)
return result
def getaddr(self, name):
"""Get a single address from a header, as a tuple.
An example return value:
('Guido van Rossum', '[email protected]')
"""
# New, by Ben Escoto
alist = self.getaddrlist(name)
if alist:
return alist[0]
else:
return (None, None)
def getaddrlist(self, name):
"""Get a list of addresses from a header.
Retrieves a list of addresses from a header, where each address is a
tuple as returned by getaddr(). Scans all named headers, so it works
properly with multiple To: or Cc: headers for example.
"""
raw = []
for h in self.getallmatchingheaders(name):
if h[0] in ' \t':
raw.append(h)
else:
if raw:
raw.append(', ')
i = h.find(':')
if i > 0:
addr = h[i+1:]
raw.append(addr)
alladdrs = ''.join(raw)
a = AddressList(alladdrs)
return a.addresslist
def getdate(self, name):
"""Retrieve a date field from a header.
Retrieves a date field from the named header, returning a tuple
compatible with time.mktime().
"""
try:
data = self[name]
except KeyError:
return None
return parsedate(data)
def getdate_tz(self, name):
"""Retrieve a date field from a header as a 10-tuple.
The first 9 elements make up a tuple compatible with time.mktime(),
and the 10th is the offset of the poster's time zone from GMT/UTC.
"""
try:
data = self[name]
except KeyError:
return None
return parsedate_tz(data)
# Access as a dictionary (only finds *last* header of each type):
def __len__(self):
"""Get the number of headers in a message."""
return len(self.dict)
def __getitem__(self, name):
"""Get a specific header, as from a dictionary."""
return self.dict[name.lower()]
def __setitem__(self, name, value):
"""Set the value of a header.
Note: This is not a perfect inversion of __getitem__, because any
changed headers get stuck at the end of the raw-headers list rather
than where the altered header was.
"""
del self[name] # Won't fail if it doesn't exist
self.dict[name.lower()] = value
text = name + ": " + value
for line in text.split("\n"):
self.headers.append(line + "\n")
def __delitem__(self, name):
"""Delete all occurrences of a specific header, if it is present."""
name = name.lower()
if not name in self.dict:
return
del self.dict[name]
name = name + ':'
n = len(name)
lst = []
hit = 0
for i in range(len(self.headers)):
line = self.headers[i]
if line[:n].lower() == name:
hit = 1
elif not line[:1].isspace():
hit = 0
if hit:
lst.append(i)
for i in reversed(lst):
del self.headers[i]
def setdefault(self, name, default=""):
lowername = name.lower()
if lowername in self.dict:
return self.dict[lowername]
else:
text = name + ": " + default
for line in text.split("\n"):
self.headers.append(line + "\n")
self.dict[lowername] = default
return default
def has_key(self, name):
"""Determine whether a message contains the named header."""
return name.lower() in self.dict
def __contains__(self, name):
"""Determine whether a message contains the named header."""
return name.lower() in self.dict
def __iter__(self):
return iter(self.dict)
def keys(self):
"""Get all of a message's header field names."""
return self.dict.keys()
def values(self):
"""Get all of a message's header field values."""
return self.dict.values()
def items(self):
"""Get all of a message's headers.
Returns a list of name, value tuples.
"""
return self.dict.items()
def __str__(self):
return ''.join(self.headers)
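# Hedged sketch of the dictionary-style access defined above: headers are
# looked up, replaced and deleted by case-insensitive name.
def _demo_header_mapping():
    from cStringIO import StringIO
    m = Message(StringIO('Subject: old\n\n'))
    m['Subject'] = 'new'         # __setitem__ appends a fresh header line
    del m['SUBJECT']             # __delitem__ drops every occurrence
    return 'subject' in m        # -> False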
# Utility functions
# -----------------
# XXX Should fix unquote() and quote() to be really conformant.
# XXX The inverses of the parse functions may also be useful.
def unquote(s):
"""Remove quotes from a string."""
if len(s) > 1:
if s.startswith('"') and s.endswith('"'):
return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
if s.startswith('<') and s.endswith('>'):
return s[1:-1]
return s
def quote(s):
"""Add quotes around a string."""
return s.replace('\\', '\\\\').replace('"', '\\"')
def parseaddr(address):
"""Parse an address into a (realname, mailaddr) tuple."""
a = AddressList(address)
lst = a.addresslist
if not lst:
return (None, None)
return lst[0]
class AddrlistClass:
"""Address parser class by Ben Escoto.
To understand what this class does, it helps to have a copy of
RFC 2822 in front of you.
http://www.faqs.org/rfcs/rfc2822.html
Note: this class interface is deprecated and may be removed in the future.
Use rfc822.AddressList instead.
"""
def __init__(self, field):
"""Initialize a new instance.
`field' is an unparsed address header field, containing one or more
addresses.
"""
self.specials = '()<>@,:;.\"[]'
self.pos = 0
self.LWS = ' \t'
self.CR = '\r\n'
self.atomends = self.specials + self.LWS + self.CR
# Note that RFC 2822 now specifies `.' as obs-phrase, meaning that it
# is obsolete syntax. RFC 2822 requires that we recognize obsolete
# syntax, so allow dots in phrases.
self.phraseends = self.atomends.replace('.', '')
self.field = field
self.commentlist = []
def gotonext(self):
"""Parse up to the start of the next address."""
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS + '\n\r':
self.pos = self.pos + 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
else: break
def getaddrlist(self):
"""Parse all addresses.
Returns a list containing all of the addresses.
"""
result = []
ad = self.getaddress()
while ad:
result += ad
ad = self.getaddress()
return result
def getaddress(self):
"""Parse the next address."""
self.commentlist = []
self.gotonext()
oldpos = self.pos
oldcl = self.commentlist
plist = self.getphraselist()
self.gotonext()
returnlist = []
if self.pos >= len(self.field):
# Bad email address technically, no domain.
if plist:
returnlist = [(' '.join(self.commentlist), plist[0])]
elif self.field[self.pos] in '.@':
# email address is just an addrspec
# this isn't very efficient since we start over
self.pos = oldpos
self.commentlist = oldcl
addrspec = self.getaddrspec()
returnlist = [(' '.join(self.commentlist), addrspec)]
elif self.field[self.pos] == ':':
# address is a group
returnlist = []
fieldlen = len(self.field)
self.pos += 1
while self.pos < len(self.field):
self.gotonext()
if self.pos < fieldlen and self.field[self.pos] == ';':
self.pos += 1
break
returnlist = returnlist + self.getaddress()
elif self.field[self.pos] == '<':
# Address is a phrase then a route addr
routeaddr = self.getrouteaddr()
if self.commentlist:
returnlist = [(' '.join(plist) + ' (' + \
' '.join(self.commentlist) + ')', routeaddr)]
else: returnlist = [(' '.join(plist), routeaddr)]
else:
if plist:
returnlist = [(' '.join(self.commentlist), plist[0])]
elif self.field[self.pos] in self.specials:
self.pos += 1
self.gotonext()
if self.pos < len(self.field) and self.field[self.pos] == ',':
self.pos += 1
return returnlist
def getrouteaddr(self):
"""Parse a route address (Return-path value).
This method just skips all the route stuff and returns the addrspec.
"""
if self.field[self.pos] != '<':
return
expectroute = 0
self.pos += 1
self.gotonext()
adlist = ""
while self.pos < len(self.field):
if expectroute:
self.getdomain()
expectroute = 0
elif self.field[self.pos] == '>':
self.pos += 1
break
elif self.field[self.pos] == '@':
self.pos += 1
expectroute = 1
elif self.field[self.pos] == ':':
self.pos += 1
else:
adlist = self.getaddrspec()
self.pos += 1
break
self.gotonext()
return adlist
def getaddrspec(self):
"""Parse an RFC 2822 addr-spec."""
aslist = []
self.gotonext()
while self.pos < len(self.field):
if self.field[self.pos] == '.':
aslist.append('.')
self.pos += 1
elif self.field[self.pos] == '"':
aslist.append('"%s"' % self.getquote())
elif self.field[self.pos] in self.atomends:
break
else: aslist.append(self.getatom())
self.gotonext()
if self.pos >= len(self.field) or self.field[self.pos] != '@':
return ''.join(aslist)
aslist.append('@')
self.pos += 1
self.gotonext()
return ''.join(aslist) + self.getdomain()
def getdomain(self):
"""Get the complete domain name from an address."""
sdlist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS:
self.pos += 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] == '[':
sdlist.append(self.getdomainliteral())
elif self.field[self.pos] == '.':
self.pos += 1
sdlist.append('.')
elif self.field[self.pos] in self.atomends:
break
else: sdlist.append(self.getatom())
return ''.join(sdlist)
def getdelimited(self, beginchar, endchars, allowcomments = 1):
"""Parse a header fragment delimited by special characters.
`beginchar' is the start character for the fragment. If self is not
looking at an instance of `beginchar' then getdelimited returns the
empty string.
`endchars' is a sequence of allowable end-delimiting characters.
Parsing stops when one of these is encountered.
If `allowcomments' is non-zero, embedded RFC 2822 comments are allowed
within the parsed fragment.
"""
if self.field[self.pos] != beginchar:
return ''
slist = ['']
quote = 0
self.pos += 1
while self.pos < len(self.field):
if quote == 1:
slist.append(self.field[self.pos])
quote = 0
elif self.field[self.pos] in endchars:
self.pos += 1
break
elif allowcomments and self.field[self.pos] == '(':
slist.append(self.getcomment())
continue # have already advanced pos from getcomment
elif self.field[self.pos] == '\\':
quote = 1
else:
slist.append(self.field[self.pos])
self.pos += 1
return ''.join(slist)
def getquote(self):
"""Get a quote-delimited fragment from self's field."""
return self.getdelimited('"', '"\r', 0)
def getcomment(self):
"""Get a parenthesis-delimited fragment from self's field."""
return self.getdelimited('(', ')\r', 1)
def getdomainliteral(self):
"""Parse an RFC 2822 domain-literal."""
return '[%s]' % self.getdelimited('[', ']\r', 0)
def getatom(self, atomends=None):
"""Parse an RFC 2822 atom.
Optional atomends specifies a different set of end token delimiters
(the default is to use self.atomends). This is used e.g. in
getphraselist() since phrase endings must not include the `.' (which
is legal in phrases)."""
atomlist = ['']
if atomends is None:
atomends = self.atomends
while self.pos < len(self.field):
if self.field[self.pos] in atomends:
break
else: atomlist.append(self.field[self.pos])
self.pos += 1
return ''.join(atomlist)
def getphraselist(self):
"""Parse a sequence of RFC 2822 phrases.
A phrase is a sequence of words, which are in turn either RFC 2822
atoms or quoted-strings. Phrases are canonicalized by squeezing all
runs of continuous whitespace into one space.
"""
plist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS:
self.pos += 1
elif self.field[self.pos] == '"':
plist.append(self.getquote())
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] in self.phraseends:
break
else:
plist.append(self.getatom(self.phraseends))
return plist
class AddressList(AddrlistClass):
"""An AddressList encapsulates a list of parsed RFC 2822 addresses."""
def __init__(self, field):
AddrlistClass.__init__(self, field)
if field:
self.addresslist = self.getaddrlist()
else:
self.addresslist = []
def __len__(self):
return len(self.addresslist)
def __str__(self):
return ", ".join(map(dump_address_pair, self.addresslist))
def __add__(self, other):
# Set union
newaddr = AddressList(None)
newaddr.addresslist = self.addresslist[:]
for x in other.addresslist:
if not x in self.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __iadd__(self, other):
# Set union, in-place
for x in other.addresslist:
if not x in self.addresslist:
self.addresslist.append(x)
return self
def __sub__(self, other):
# Set difference
newaddr = AddressList(None)
for x in self.addresslist:
if not x in other.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __isub__(self, other):
# Set difference, in-place
for x in other.addresslist:
if x in self.addresslist:
self.addresslist.remove(x)
return self
def __getitem__(self, index):
# Make indexing, slices, and 'in' work
return self.addresslist[index]
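# Hedged sketch of the set-style operators defined above.
def _demo_addresslist_algebra():
    a = AddressList('[email protected], [email protected]')
    b = AddressList('[email protected], [email protected]')
    union = a + b                # alice, bob, carol (duplicates folded)
    diff = a - b                 # just alice
    return len(union), str(diff)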
def dump_address_pair(pair):
"""Dump a (name, address) pair in a canonicalized form."""
if pair[0]:
return '"' + pair[0] + '" <' + pair[1] + '>'
else:
return pair[1]
# Parse a date field
_monthnames = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul',
'aug', 'sep', 'oct', 'nov', 'dec',
'january', 'february', 'march', 'april', 'may', 'june', 'july',
'august', 'september', 'october', 'november', 'december']
_daynames = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
# The timezone table does not include the military time zones defined
# in RFC822, other than Z. According to RFC1123, the description in
# RFC822 gets the signs wrong, so we can't rely on any such time
# zones. RFC1123 recommends that numeric timezone indicators be used
# instead of timezone names.
_timezones = {'UT':0, 'UTC':0, 'GMT':0, 'Z':0,
'AST': -400, 'ADT': -300, # Atlantic (used in Canada)
'EST': -500, 'EDT': -400, # Eastern
'CST': -600, 'CDT': -500, # Central
'MST': -700, 'MDT': -600, # Mountain
'PST': -800, 'PDT': -700 # Pacific
}
def parsedate_tz(data):
"""Convert a date string to a time tuple.
Accounts for military timezones.
"""
if not data:
return None
data = data.split()
if data[0][-1] in (',', '.') or data[0].lower() in _daynames:
# There's a dayname here. Skip it
del data[0]
else:
# no space after the "weekday,"?
i = data[0].rfind(',')
if i >= 0:
data[0] = data[0][i+1:]
if len(data) == 3: # RFC 850 date, deprecated
stuff = data[0].split('-')
if len(stuff) == 3:
data = stuff + data[1:]
if len(data) == 4:
s = data[3]
i = s.find('+')
if i > 0:
data[3:] = [s[:i], s[i+1:]]
else:
data.append('') # Dummy tz
if len(data) < 5:
return None
data = data[:5]
[dd, mm, yy, tm, tz] = data
mm = mm.lower()
if not mm in _monthnames:
dd, mm = mm, dd.lower()
if not mm in _monthnames:
return None
mm = _monthnames.index(mm)+1
if mm > 12: mm = mm - 12
if dd[-1] == ',':
dd = dd[:-1]
i = yy.find(':')
if i > 0:
yy, tm = tm, yy
if yy[-1] == ',':
yy = yy[:-1]
if not yy[0].isdigit():
yy, tz = tz, yy
if tm[-1] == ',':
tm = tm[:-1]
tm = tm.split(':')
if len(tm) == 2:
[thh, tmm] = tm
tss = '0'
elif len(tm) == 3:
[thh, tmm, tss] = tm
else:
return None
try:
yy = int(yy)
dd = int(dd)
thh = int(thh)
tmm = int(tmm)
tss = int(tss)
except ValueError:
return None
tzoffset = None
tz = tz.upper()
if tz in _timezones:
tzoffset = _timezones[tz]
else:
try:
tzoffset = int(tz)
except ValueError:
pass
# Convert a timezone offset into seconds; -0500 -> -18000
if tzoffset:
if tzoffset < 0:
tzsign = -1
tzoffset = -tzoffset
else:
tzsign = 1
tzoffset = tzsign * ( (tzoffset//100)*3600 + (tzoffset % 100)*60)
return (yy, mm, dd, thh, tmm, tss, 0, 1, 0, tzoffset)
def parsedate(data):
"""Convert a time string to a time tuple."""
t = parsedate_tz(data)
if t is None:
return t
return t[:9]
def mktime_tz(data):
"""Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp."""
if data[9] is None:
# No zone info, so localtime is better assumption than GMT
return time.mktime(data[:8] + (-1,))
else:
t = time.mktime(data[:8] + (0,))
return t - data[9] - time.timezone
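# ---- Editor's sketch (not in the original source) ----
# Converting a Date: header to a UTC timestamp chains parsedate_tz(),
# which keeps the numeric offset in the tenth field, with mktime_tz(),
# which subtracts it again:
#
#     import rfc822
#     parsed = rfc822.parsedate_tz('Sun, 06 Nov 1994 08:49:37 -0500')
#     # parsed == (1994, 11, 6, 8, 49, 37, 0, 1, 0, -18000)
#     stamp = rfc822.mktime_tz(parsed)   # seconds since the epoch, in UTC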
def formatdate(timeval=None):
"""Returns time format preferred for Internet standards.
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
According to RFC 1123, day and month names must always be in
English. If not for that, this code could use strftime(). It
can't because strftime() honors the locale and could generate
non-English names.
"""
if timeval is None:
timeval = time.time()
timeval = time.gmtime(timeval)
return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")[timeval[6]],
timeval[2],
("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec")[timeval[1]-1],
timeval[0], timeval[3], timeval[4], timeval[5])
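# ---- Editor's sketch (not in the original source) ----
# formatdate() is the inverse direction, rendering a POSIX timestamp in
# the RFC 822/1123 form quoted in the docstring above:
#
#     import rfc822
#     print rfc822.formatdate(784111777.0)
#     # -> 'Sun, 06 Nov 1994 08:49:37 GMT'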
# When used as script, run a small test program.
# The first command line argument must be a filename containing one
# message in RFC-822 format.
if __name__ == '__main__':
import sys, os
file = os.path.join(os.environ['HOME'], 'Mail/inbox/1')
if sys.argv[1:]: file = sys.argv[1]
f = open(file, 'r')
m = Message(f)
print 'From:', m.getaddr('from')
print 'To:', m.getaddrlist('to')
print 'Subject:', m.getheader('subject')
print 'Date:', m.getheader('date')
date = m.getdate_tz('date')
tz = date[-1]
date = time.localtime(mktime_tz(date))
if date:
print 'ParsedDate:', time.asctime(date),
hhmmss = tz
hhmm, ss = divmod(hhmmss, 60)
hh, mm = divmod(hhmm, 60)
print "%+03d%02d" % (hh, mm),
if ss: print ".%02d" % ss,
print
else:
print 'ParsedDate:', None
m.rewindbody()
n = 0
while f.readline():
n += 1
print 'Lines:', n
print '-'*70
print 'len =', len(m)
if 'Date' in m: print 'Date =', m['Date']
if 'X-Nonsense' in m: pass
print 'keys =', m.keys()
print 'values =', m.values()
print 'items =', m.items()
"""Word completion for GNU readline.
The completer completes keywords, built-ins and globals in a selectable
namespace (which defaults to __main__); when completing NAME.NAME..., it
evaluates (!) the expression up to the last dot and completes its attributes.
It's very cool to do "import sys", type "sys.", hit the completion key (twice),
and see the list of names defined by the sys module!
Tip: to use the tab key as the completion key, call
readline.parse_and_bind("tab: complete")
Notes:
- Exceptions raised by the completer function are *ignored* (and generally cause
the completion to fail). This is a feature -- since readline sets the tty
device in raw (or cbreak) mode, printing a traceback wouldn't work well
without some complicated hoopla to save, reset and restore the tty state.
- The evaluation of the NAME.NAME... form may cause arbitrary application
defined code to be executed if an object with a __getattr__ hook is found.
Since it is the responsibility of the application (or the user) to enable this
feature, I consider this an acceptable risk. More complicated expressions
(e.g. function calls or indexing operations) are *not* evaluated.
- GNU readline is also used by the built-in functions input() and
raw_input(), and thus these also benefit/suffer from the completer
features. Clearly an interactive application can benefit by
specifying its own completer function and using raw_input() for all
its input.
- When the original stdin is not a tty device, GNU readline is never
used, and this module (and the readline module) are silently inactive.
"""
import __builtin__
import __main__
__all__ = ["Completer"]
class Completer:
def __init__(self, namespace = None):
"""Create a new completer for the command line.
Completer([namespace]) -> completer instance.
If unspecified, the default namespace where completions are performed
is __main__ (technically, __main__.__dict__). Namespaces should be
given as dictionaries.
Completer instances should be used as the completion mechanism of
readline via the set_completer() call:
readline.set_completer(Completer(my_namespace).complete)
"""
if namespace and not isinstance(namespace, dict):
raise TypeError,'namespace must be a dictionary'
# Don't bind to namespace quite yet, but flag whether the user wants a
# specific namespace or to use __main__.__dict__. This will allow us
# to bind to __main__.__dict__ at completion time, not now.
if namespace is None:
self.use_main_ns = 1
else:
self.use_main_ns = 0
self.namespace = namespace
def complete(self, text, state):
"""Return the next possible completion for 'text'.
This is called successively with state == 0, 1, 2, ... until it
returns None. The completion should begin with 'text'.
"""
if self.use_main_ns:
self.namespace = __main__.__dict__
if state == 0:
if "." in text:
self.matches = self.attr_matches(text)
else:
self.matches = self.global_matches(text)
try:
return self.matches[state]
except IndexError:
return None
def _callable_postfix(self, val, word):
if hasattr(val, '__call__'):
word = word + "("
return word
def global_matches(self, text):
"""Compute matches when text is a simple name.
Return a list of all keywords, built-in functions and names currently
defined in self.namespace that match.
"""
import keyword
matches = []
seen = {"__builtins__"}
n = len(text)
for word in keyword.kwlist:
if word[:n] == text:
seen.add(word)
matches.append(word)
for nspace in [self.namespace, __builtin__.__dict__]:
for word, val in nspace.items():
if word[:n] == text and word not in seen:
seen.add(word)
matches.append(self._callable_postfix(val, word))
return matches
def attr_matches(self, text):
"""Compute matches when text contains a dot.
Assuming the text is of the form NAME.NAME....[NAME], and is
evaluable in self.namespace, it will be evaluated and its attributes
(as revealed by dir()) are used as possible completions. (For class
instances, class members are also considered.)
WARNING: this can still invoke arbitrary C code, if an object
with a __getattr__ hook is evaluated.
"""
import re
m = re.match(r"(\w+(\.\w+)*)\.(\w*)", text)
if not m:
return []
expr, attr = m.group(1, 3)
try:
thisobject = eval(expr, self.namespace)
except Exception:
return []
# get the content of the object, except __builtins__
words = set(dir(thisobject))
words.discard("__builtins__")
if hasattr(thisobject, '__class__'):
words.add('__class__')
words.update(get_class_members(thisobject.__class__))
matches = []
n = len(attr)
for word in words:
if word[:n] == attr:
try:
val = getattr(thisobject, word)
except Exception:
continue # Exclude properties that are not set
word = self._callable_postfix(val, "%s.%s" % (expr, word))
matches.append(word)
matches.sort()
return matches
def get_class_members(klass):
ret = dir(klass)
if hasattr(klass,'__bases__'):
for base in klass.__bases__:
ret = ret + get_class_members(base)
return ret
try:
import readline
except ImportError:
pass
else:
readline.set_completer(Completer().complete)
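# ---- Editor's usage sketch (not part of the original module) ----
# The completer can also be exercised directly, without readline, by
# calling complete() with increasing states until it returns None. The
# namespace below is made up for illustration:
#
#     c = Completer({'spam': 1})
#     print c.complete('sp', 0)    # 'spam'
#     print c.complete('sp', 1)    # None -- no further matches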
""" robotparser.py
Copyright (C) 2000 Bastian Kleineidam
You can choose between two licenses when using this package:
1) GNU GPLv2
2) PSF license for Python 2.2
The robots.txt Exclusion Protocol is implemented as specified in
http://www.robotstxt.org/norobots-rfc.txt
"""
import urlparse
import urllib
__all__ = ["RobotFileParser"]
class RobotFileParser:
""" This class provides a set of methods to read, parse and answer
questions about a single robots.txt file.
"""
def __init__(self, url=''):
self.entries = []
self.default_entry = None
self.disallow_all = False
self.allow_all = False
self.set_url(url)
self.last_checked = 0
def mtime(self):
"""Returns the time the robots.txt file was last fetched.
This is useful for long-running web spiders that need to
check for new robots.txt files periodically.
"""
return self.last_checked
def modified(self):
"""Sets the time the robots.txt file was last fetched to the
current time.
"""
import time
self.last_checked = time.time()
def set_url(self, url):
"""Sets the URL referring to a robots.txt file."""
self.url = url
self.host, self.path = urlparse.urlparse(url)[1:3]
def read(self):
"""Reads the robots.txt URL and feeds it to the parser."""
opener = URLopener()
f = opener.open(self.url)
lines = [line.strip() for line in f]
f.close()
self.errcode = opener.errcode
if self.errcode in (401, 403):
self.disallow_all = True
elif self.errcode >= 400 and self.errcode < 500:
self.allow_all = True
elif self.errcode == 200 and lines:
self.parse(lines)
def _add_entry(self, entry):
if "*" in entry.useragents:
# the default entry is considered last
if self.default_entry is None:
# the first default entry wins
self.default_entry = entry
else:
self.entries.append(entry)
def parse(self, lines):
"""parse the input lines from a robots.txt file.
We allow that a user-agent: line is not preceded by
one or more blank lines."""
# states:
# 0: start state
# 1: saw user-agent line
# 2: saw an allow or disallow line
state = 0
linenumber = 0
entry = Entry()
self.modified()
for line in lines:
linenumber += 1
if not line:
if state == 1:
entry = Entry()
state = 0
elif state == 2:
self._add_entry(entry)
entry = Entry()
state = 0
# remove optional comment and strip line
i = line.find('#')
if i >= 0:
line = line[:i]
line = line.strip()
if not line:
continue
line = line.split(':', 1)
if len(line) == 2:
line[0] = line[0].strip().lower()
line[1] = urllib.unquote(line[1].strip())
if line[0] == "user-agent":
if state == 2:
self._add_entry(entry)
entry = Entry()
entry.useragents.append(line[1])
state = 1
elif line[0] == "disallow":
if state != 0:
entry.rulelines.append(RuleLine(line[1], False))
state = 2
elif line[0] == "allow":
if state != 0:
entry.rulelines.append(RuleLine(line[1], True))
state = 2
if state == 2:
self._add_entry(entry)
def can_fetch(self, useragent, url):
"""using the parsed robots.txt decide if useragent can fetch url"""
if self.disallow_all:
return False
if self.allow_all:
return True
# Until the robots.txt file has been read or found not
# to exist, we must assume that no url is allowable.
# This prevents false positives when a user erroneously
# calls can_fetch() before calling read().
if not self.last_checked:
return False
# search for given user agent matches
# the first match counts
parsed_url = urlparse.urlparse(urllib.unquote(url))
url = urlparse.urlunparse(('', '', parsed_url.path,
parsed_url.params, parsed_url.query, parsed_url.fragment))
url = urllib.quote(url)
if not url:
url = "/"
for entry in self.entries:
if entry.applies_to(useragent):
return entry.allowance(url)
# try the default entry last
if self.default_entry:
return self.default_entry.allowance(url)
# agent not found ==> access granted
return True
def __str__(self):
entries = self.entries
if self.default_entry is not None:
entries = entries + [self.default_entry]
return '\n'.join(map(str, entries)) + '\n'
class RuleLine:
"""A rule line is a single "Allow:" (allowance==True) or "Disallow:"
(allowance==False) followed by a path."""
def __init__(self, path, allowance):
if path == '' and not allowance:
# an empty value means allow all
allowance = True
path = urlparse.urlunparse(urlparse.urlparse(path))
self.path = urllib.quote(path)
self.allowance = allowance
def applies_to(self, filename):
return self.path == "*" or filename.startswith(self.path)
def __str__(self):
return (self.allowance and "Allow" or "Disallow") + ": " + self.path
class Entry:
"""An entry has one or more user-agents and zero or more rulelines"""
def __init__(self):
self.useragents = []
self.rulelines = []
def __str__(self):
ret = []
for agent in self.useragents:
ret.extend(["User-agent: ", agent, "\n"])
for line in self.rulelines:
ret.extend([str(line), "\n"])
return ''.join(ret)
def applies_to(self, useragent):
"""check if this entry applies to the specified agent"""
# split the name token and make it lower case
useragent = useragent.split("/")[0].lower()
for agent in self.useragents:
if agent == '*':
# we have the catch-all agent
return True
agent = agent.lower()
if agent in useragent:
return True
return False
def allowance(self, filename):
"""Preconditions:
- our agent applies to this entry
- filename is URL decoded"""
for line in self.rulelines:
if line.applies_to(filename):
return line.allowance
return True
class URLopener(urllib.FancyURLopener):
def __init__(self, *args):
urllib.FancyURLopener.__init__(self, *args)
self.errcode = 200
def prompt_user_passwd(self, host, realm):
## If robots.txt file is accessible only with a password,
## we act as if the file wasn't there.
return None, None
def http_error_default(self, url, fp, errcode, errmsg, headers):
self.errcode = errcode
return urllib.FancyURLopener.http_error_default(self, url, fp, errcode,
errmsg, headers)
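# ---- Editor's usage sketch (not in the original source) ----
# Rules can be fed to the parser directly with parse(), bypassing the
# network fetch in read(). The agent name and URLs are invented:
#
#     rp = RobotFileParser()
#     rp.parse(['User-agent: *', 'Disallow: /private/'])
#     print rp.can_fetch('MyBot', 'http://example.com/private/x')   # False
#     print rp.can_fetch('MyBot', 'http://example.com/index.html')  # True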
"""runpy.py - locating and running Python code using the module namespace
Provides support for locating and running Python scripts using the Python
module namespace instead of the native filesystem.
This allows Python code to play nicely with non-filesystem based PEP 302
importers when locating support scripts as well as when importing modules.
"""
# Written by Nick Coghlan <ncoghlan at gmail.com>
# to implement PEP 338 (Executing Modules as Scripts)
import sys
import imp
from pkgutil import read_code
try:
from imp import get_loader
except ImportError:
from pkgutil import get_loader
__all__ = [
"run_module", "run_path",
]
class _TempModule(object):
"""Temporarily replace a module in sys.modules with an empty namespace"""
def __init__(self, mod_name):
self.mod_name = mod_name
self.module = imp.new_module(mod_name)
self._saved_module = []
def __enter__(self):
mod_name = self.mod_name
try:
self._saved_module.append(sys.modules[mod_name])
except KeyError:
pass
sys.modules[mod_name] = self.module
return self
def __exit__(self, *args):
if self._saved_module:
sys.modules[self.mod_name] = self._saved_module[0]
else:
del sys.modules[self.mod_name]
self._saved_module = []
class _ModifiedArgv0(object):
def __init__(self, value):
self.value = value
self._saved_value = self._sentinel = object()
def __enter__(self):
if self._saved_value is not self._sentinel:
raise RuntimeError("Already preserving saved value")
self._saved_value = sys.argv[0]
sys.argv[0] = self.value
def __exit__(self, *args):
self.value = self._sentinel
sys.argv[0] = self._saved_value
def _run_code(code, run_globals, init_globals=None,
mod_name=None, mod_fname=None,
mod_loader=None, pkg_name=None):
"""Helper to run code in nominated namespace"""
if init_globals is not None:
run_globals.update(init_globals)
run_globals.update(__name__ = mod_name,
__file__ = mod_fname,
__loader__ = mod_loader,
__package__ = pkg_name)
exec code in run_globals
return run_globals
def _run_module_code(code, init_globals=None,
mod_name=None, mod_fname=None,
mod_loader=None, pkg_name=None):
"""Helper to run code in new namespace with sys modified"""
with _TempModule(mod_name) as temp_module, _ModifiedArgv0(mod_fname):
mod_globals = temp_module.module.__dict__
_run_code(code, mod_globals, init_globals,
mod_name, mod_fname, mod_loader, pkg_name)
# Copy the globals of the temporary module, as they
# may be cleared when the temporary module goes away
return mod_globals.copy()
# This helper is needed due to a missing component in the PEP 302
# loader protocol (specifically, "get_filename" is non-standard)
# Since we can't introduce new features in maintenance releases,
# support was added to zipimporter under the name '_get_filename'
def _get_filename(loader, mod_name):
for attr in ("get_filename", "_get_filename"):
meth = getattr(loader, attr, None)
if meth is not None:
return meth(mod_name)
return None
# Helper to get the loader, code and filename for a module
def _get_module_details(mod_name, error=ImportError):
try:
loader = get_loader(mod_name)
if loader is None:
raise error("No module named %s" % mod_name)
ispkg = loader.is_package(mod_name)
except ImportError as e:
raise error(format(e))
if ispkg:
if mod_name == "__main__" or mod_name.endswith(".__main__"):
raise error("Cannot use package as __main__ module")
__import__(mod_name) # Do not catch exceptions initializing package
try:
pkg_main_name = mod_name + ".__main__"
return _get_module_details(pkg_main_name)
except ImportError, e:
raise error(("%s; %r is a package and cannot " +
"be directly executed") %(e, mod_name))
try:
code = loader.get_code(mod_name)
except ImportError as e:
raise error(format(e))
if code is None:
raise error("No code object available for %s" % mod_name)
filename = _get_filename(loader, mod_name)
return mod_name, loader, code, filename
def _get_main_module_details(error=ImportError):
# Helper that gives a nicer error message when attempting to
# execute a zipfile or directory by invoking __main__.py
main_name = "__main__"
try:
return _get_module_details(main_name)
except ImportError as exc:
if main_name in str(exc):
raise error("can't find %r module in %r" %
(main_name, sys.path[0]))
raise
class _Error(Exception):
"""Error that _run_module_as_main() should report without a traceback"""
# This function is the actual implementation of the -m switch and direct
# execution of zipfiles and directories and is deliberately kept private.
# This avoids a repeat of the situation where run_module() no longer met the
# needs of mainmodule.c, but couldn't be changed because it was public
def _run_module_as_main(mod_name, alter_argv=True):
"""Runs the designated module in the __main__ namespace
Note that the executed module will have full access to the
__main__ namespace. If this is not desirable, the run_module()
function should be used to run the module code in a fresh namespace.
At the very least, these variables in __main__ will be overwritten:
__name__
__file__
__loader__
__package__
"""
try:
if alter_argv or mod_name != "__main__": # i.e. -m switch
mod_name, loader, code, fname = _get_module_details(
mod_name, _Error)
else: # i.e. directory or zipfile execution
mod_name, loader, code, fname = _get_main_module_details(_Error)
except _Error as exc:
msg = "%s: %s" % (sys.executable, exc)
sys.exit(msg)
pkg_name = mod_name.rpartition('.')[0]
main_globals = sys.modules["__main__"].__dict__
if alter_argv:
sys.argv[0] = fname
return _run_code(code, main_globals, None,
"__main__", fname, loader, pkg_name)
def run_module(mod_name, init_globals=None,
run_name=None, alter_sys=False):
"""Execute a module's code without importing it
Returns the resulting top level namespace dictionary
"""
mod_name, loader, code, fname = _get_module_details(mod_name)
if run_name is None:
run_name = mod_name
pkg_name = mod_name.rpartition('.')[0]
if alter_sys:
return _run_module_code(code, init_globals, run_name,
fname, loader, pkg_name)
else:
# Leave the sys module alone
return _run_code(code, {}, init_globals, run_name,
fname, loader, pkg_name)
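# ---- Editor's sketch (not part of the original module) ----
# run_module() executes any importable module the way "python -m" would
# and returns the resulting namespace as a plain dict; base64 is used
# here purely as a harmless stdlib example:
#
#     globs = run_module('base64', run_name='not_main')
#     print 'b64encode' in globs    # True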
# XXX (ncoghlan): Perhaps expose the C API function
# as imp.get_importer instead of reimplementing it in Python?
def _get_importer(path_name):
"""Python version of PyImport_GetImporter C API function"""
cache = sys.path_importer_cache
try:
importer = cache[path_name]
except KeyError:
# Not yet cached. Flag as using the
# standard machinery until we finish
# checking the hooks
cache[path_name] = None
for hook in sys.path_hooks:
try:
importer = hook(path_name)
break
except ImportError:
pass
else:
# The following check looks a bit odd. The trick is that
# NullImporter raises ImportError if the supplied path is a
# *valid* directory entry (and hence able to be handled
# by the standard import machinery)
try:
importer = imp.NullImporter(path_name)
except ImportError:
return None
cache[path_name] = importer
return importer
def _get_code_from_file(fname):
# Check for a compiled file first
with open(fname, "rb") as f:
code = read_code(f)
if code is None:
# That didn't work, so try it as normal source code
with open(fname, "rU") as f:
code = compile(f.read(), fname, 'exec')
return code
def run_path(path_name, init_globals=None, run_name=None):
"""Execute code located at the specified filesystem location
Returns the resulting top level namespace dictionary
The file path may refer directly to a Python script (i.e.
one that could be directly executed with execfile) or else
it may refer to a zipfile or directory containing a top
level __main__.py script.
"""
if run_name is None:
run_name = "<run_path>"
importer = _get_importer(path_name)
if isinstance(importer, imp.NullImporter):
# Not a valid sys.path entry, so run the code directly
# execfile() doesn't help as we want to allow compiled files
code = _get_code_from_file(path_name)
return _run_module_code(code, init_globals, run_name, path_name)
else:
# Importer is defined for path, so add it to
# the start of sys.path
sys.path.insert(0, path_name)
try:
# Here's where things are a little different from the run_module
# case. There, we only had to replace the module in sys while the
# code was running and doing so was somewhat optional. Here, we
# have no choice and we have to remove it even while we read the
# code. If we don't do this, a __loader__ attribute in the
# existing __main__ module may prevent location of the new module.
main_name = "__main__"
saved_main = sys.modules[main_name]
del sys.modules[main_name]
try:
mod_name, loader, code, fname = _get_main_module_details()
finally:
sys.modules[main_name] = saved_main
pkg_name = ""
with _TempModule(run_name) as temp_module, \
_ModifiedArgv0(path_name):
mod_globals = temp_module.module.__dict__
return _run_code(code, mod_globals, init_globals,
run_name, fname, loader, pkg_name).copy()
finally:
try:
sys.path.remove(path_name)
except ValueError:
pass
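# ---- Editor's sketch (not in the original source) ----
# run_path() with a plain script; the temporary file name below is made up:
#
#     with open('/tmp/demo_script.py', 'w') as f:
#         f.write('result = 6 * 7\n')
#     ns = run_path('/tmp/demo_script.py')
#     print ns['result']    # 42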
if __name__ == "__main__":
# Run the module specified as the next command line argument
if len(sys.argv) < 2:
print >> sys.stderr, "No module specified for execution"
else:
del sys.argv[0] # Make the requested module sys.argv[0]
_run_module_as_main(sys.argv[0])
"""A generally useful event scheduler class.
Each instance of this class manages its own queue.
No multi-threading is implied; you are supposed to hack that
yourself, or use a single instance per application.
Each instance is parametrized with two functions, one that is
supposed to return the current time, one that is supposed to
implement a delay. You can implement real-time scheduling by
substituting time and sleep from built-in module time, or you can
implement simulated time by writing your own functions. This can
also be used to integrate scheduling with STDWIN events; the delay
function is allowed to modify the queue. Time can be expressed as
integers or floating point numbers, as long as it is consistent.
Events are specified by tuples (time, priority, action, argument).
As in UNIX, lower priority numbers mean higher priority; in this
way the queue can be maintained as a priority queue. Execution of the
event means calling the action function, passing it the argument
sequence in "argument" (remember that in Python, multiple function
arguments are packed in a sequence).
The action function may be an instance method so it
has another way to reference private data (besides global variables).
"""
# XXX The timefunc and delayfunc should have been defined as methods
# XXX so you can define new kinds of schedulers using subclassing
# XXX instead of having to define a module or class just to hold
# XXX the global state of your particular time and delay functions.
import heapq
from collections import namedtuple
__all__ = ["scheduler"]
Event = namedtuple('Event', 'time, priority, action, argument')
class scheduler:
def __init__(self, timefunc, delayfunc):
"""Initialize a new instance, passing the time and delay
functions"""
self._queue = []
self.timefunc = timefunc
self.delayfunc = delayfunc
def enterabs(self, time, priority, action, argument):
"""Enter a new event in the queue at an absolute time.
Returns an ID for the event which can be used to remove it,
if necessary.
"""
event = Event(time, priority, action, argument)
heapq.heappush(self._queue, event)
return event # The ID
def enter(self, delay, priority, action, argument):
"""A variant that specifies the time as a relative time.
This is actually the more commonly used interface.
"""
time = self.timefunc() + delay
return self.enterabs(time, priority, action, argument)
def cancel(self, event):
"""Remove an event from the queue.
This must be presented the ID as returned by enter().
If the event is not in the queue, this raises ValueError.
"""
self._queue.remove(event)
heapq.heapify(self._queue)
def empty(self):
"""Check whether the queue is empty."""
return not self._queue
def run(self):
"""Execute events until the queue is empty.
When there is a positive delay until the first event, the
delay function is called and the event is left in the queue;
otherwise, the event is removed from the queue and executed
(its action function is called, passing it the argument). If
the delay function returns prematurely, it is simply
restarted.
It is legal for both the delay function and the action
function to modify the queue or to raise an exception;
exceptions are not caught but the scheduler's state remains
well-defined so run() may be called again.
A questionable hack is added to allow other threads to run:
just after an event is executed, a delay of 0 is executed, to
avoid monopolizing the CPU when other threads are also
runnable.
"""
# localize variable access to minimize overhead
# and to improve thread safety
q = self._queue
delayfunc = self.delayfunc
timefunc = self.timefunc
pop = heapq.heappop
while q:
time, priority, action, argument = checked_event = q[0]
now = timefunc()
if now < time:
delayfunc(time - now)
else:
event = pop(q)
# Verify that the event was not removed or altered
# by another thread after we last looked at q[0].
if event is checked_event:
action(*argument)
delayfunc(0) # Let other threads run
else:
heapq.heappush(q, event)
@property
def queue(self):
"""An ordered list of upcoming events.
Events are named tuples with fields for:
time, priority, action, arguments
"""
# Use heapq to sort the queue rather than using 'sorted(self._queue)'.
# With heapq, two events scheduled at the same time will show in
# the actual order they would be retrieved.
events = self._queue[:]
return map(heapq.heappop, [events]*len(events))
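# ---- Editor's usage sketch (not part of the original module) ----
# Real-time scheduling with the time module, as the docstring suggests:
#
#     import sched, time
#     def say(what):
#         print what
#     s = sched.scheduler(time.time, time.sleep)
#     s.enter(2, 1, say, ('second',))   # delay, priority, action, argument
#     s.enter(1, 1, say, ('first',))
#     s.run()                           # prints 'first', then 'second'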
"""Classes to represent arbitrary sets (including sets of sets).
This module implements sets using dictionaries whose values are
ignored. The usual operations (union, intersection, deletion, etc.)
are provided as both methods and operators.
Important: sets are not sequences! While they support 'x in s',
'len(s)', and 'for x in s', none of those operations are unique for
sequences; for example, mappings support all three as well. The
characteristic operation for sequences is subscripting with small
integers: s[i], for i in range(len(s)). Sets don't support
subscripting at all. Also, sequences allow multiple occurrences and
their elements have a definite order; sets on the other hand don't
record multiple occurrences and don't remember the order of element
insertion (which is why they don't support s[i]).
The following classes are provided:
BaseSet -- All the operations common to both mutable and immutable
sets. This is an abstract class, not meant to be directly
instantiated.
Set -- Mutable sets, subclass of BaseSet; not hashable.
ImmutableSet -- Immutable sets, subclass of BaseSet; hashable.
An iterable argument is mandatory to create an ImmutableSet.
_TemporarilyImmutableSet -- A wrapper around a Set, hashable,
giving the same hash value as the immutable set equivalent
would have. Do not use this class directly.
Only hashable objects can be added to a Set. In particular, you cannot
really add a Set as an element to another Set; if you try, what is
actually added is an ImmutableSet built from it (it compares equal to
the one you tried adding).
When you ask if `x in y' where x is a Set and y is a Set or
ImmutableSet, x is wrapped into a _TemporarilyImmutableSet z, and
what's tested is actually `z in y'.
"""
# Code history:
#
# - Greg V. Wilson wrote the first version, using a different approach
# to the mutable/immutable problem, and inheriting from dict.
#
# - Alex Martelli modified Greg's version to implement the current
# Set/ImmutableSet approach, and make the data an attribute.
#
# - Guido van Rossum rewrote much of the code, made some API changes,
# and cleaned up the docstrings.
#
# - Raymond Hettinger added a number of speedups and other
# improvements.
from itertools import ifilter, ifilterfalse
__all__ = ['BaseSet', 'Set', 'ImmutableSet']
import warnings
warnings.warn("the sets module is deprecated", DeprecationWarning,
stacklevel=2)
class BaseSet(object):
"""Common base class for mutable and immutable sets."""
__slots__ = ['_data']
# Constructor
def __init__(self):
"""This is an abstract class."""
# Don't call this from a concrete subclass!
if self.__class__ is BaseSet:
raise TypeError, ("BaseSet is an abstract class. "
"Use Set or ImmutableSet.")
# Standard protocols: __len__, __repr__, __str__, __iter__
def __len__(self):
"""Return the number of elements of a set."""
return len(self._data)
def __repr__(self):
"""Return string representation of a set.
This looks like 'Set([<list of elements>])'.
"""
return self._repr()
# __str__ is the same as __repr__
__str__ = __repr__
def _repr(self, sorted=False):
elements = self._data.keys()
if sorted:
elements.sort()
return '%s(%r)' % (self.__class__.__name__, elements)
def __iter__(self):
"""Return an iterator over the elements or a set.
This is the keys iterator for the underlying dict.
"""
return self._data.iterkeys()
# Three-way comparison is not supported. However, because __eq__ is
# tried before __cmp__, if Set x == Set y, x.__eq__(y) returns True and
# then cmp(x, y) returns 0 (Python doesn't actually call __cmp__ in this
# case).
def __cmp__(self, other):
raise TypeError, "can't compare sets using cmp()"
# Equality comparisons using the underlying dicts. Mixed-type comparisons
# are allowed here, where Set == z for non-Set z always returns False,
# and Set != z always True. This allows expressions like "x in y" to
# give the expected result when y is a sequence of mixed types, not
# raising a pointless TypeError just because y contains a Set, or x is
# a Set and y contains a non-set ("in" invokes only __eq__).
# Subtle: it would be nicer if __eq__ and __ne__ could return
# NotImplemented instead of True or False. Then the other comparand
# would get a chance to determine the result, and if the other comparand
# also returned NotImplemented then it would fall back to object address
# comparison (which would always return False for __eq__ and always
# True for __ne__). However, that doesn't work, because this type
# *also* implements __cmp__: if, e.g., __eq__ returns NotImplemented,
# Python tries __cmp__ next, and the __cmp__ here then raises TypeError.
def __eq__(self, other):
if isinstance(other, BaseSet):
return self._data == other._data
else:
return False
def __ne__(self, other):
if isinstance(other, BaseSet):
return self._data != other._data
else:
return True
# Copying operations
def copy(self):
"""Return a shallow copy of a set."""
result = self.__class__()
result._data.update(self._data)
return result
__copy__ = copy # For the copy module
def __deepcopy__(self, memo):
"""Return a deep copy of a set; used by copy module."""
# This pre-creates the result and inserts it in the memo
# early, in case the deep copy recurses into another reference
# to this same set. A set can't be an element of itself, but
# it can certainly contain an object that has a reference to
# itself.
from copy import deepcopy
result = self.__class__()
memo[id(self)] = result
data = result._data
value = True
for elt in self:
data[deepcopy(elt, memo)] = value
return result
# Standard set operations: union, intersection, both differences.
# Each has an operator version (e.g. __or__, invoked with |) and a
# method version (e.g. union).
# Subtle: Each pair requires distinct code so that the outcome is
# correct when the type of other isn't suitable. For example, if
# we did "union = __or__" instead, then Set().union(3) would return
# NotImplemented instead of raising TypeError (albeit that *why* it
# raises TypeError as-is is also a bit subtle).
def __or__(self, other):
"""Return the union of two sets as a new set.
(I.e. all elements that are in either set.)
"""
if not isinstance(other, BaseSet):
return NotImplemented
return self.union(other)
def union(self, other):
"""Return the union of two sets as a new set.
(I.e. all elements that are in either set.)
"""
result = self.__class__(self)
result._update(other)
return result
def __and__(self, other):
"""Return the intersection of two sets as a new set.
(I.e. all elements that are in both sets.)
"""
if not isinstance(other, BaseSet):
return NotImplemented
return self.intersection(other)
def intersection(self, other):
"""Return the intersection of two sets as a new set.
(I.e. all elements that are in both sets.)
"""
if not isinstance(other, BaseSet):
other = Set(other)
if len(self) <= len(other):
little, big = self, other
else:
little, big = other, self
common = ifilter(big._data.__contains__, little)
return self.__class__(common)
def __xor__(self, other):
"""Return the symmetric difference of two sets as a new set.
(I.e. all elements that are in exactly one of the sets.)
"""
if not isinstance(other, BaseSet):
return NotImplemented
return self.symmetric_difference(other)
def symmetric_difference(self, other):
"""Return the symmetric difference of two sets as a new set.
(I.e. all elements that are in exactly one of the sets.)
"""
result = self.__class__()
data = result._data
value = True
selfdata = self._data
try:
otherdata = other._data
except AttributeError:
otherdata = Set(other)._data
for elt in ifilterfalse(otherdata.__contains__, selfdata):
data[elt] = value
for elt in ifilterfalse(selfdata.__contains__, otherdata):
data[elt] = value
return result
def __sub__(self, other):
"""Return the difference of two sets as a new Set.
(I.e. all elements that are in this set and not in the other.)
"""
if not isinstance(other, BaseSet):
return NotImplemented
return self.difference(other)
def difference(self, other):
"""Return the difference of two sets as a new Set.
(I.e. all elements that are in this set and not in the other.)
"""
result = self.__class__()
data = result._data
try:
otherdata = other._data
except AttributeError:
otherdata = Set(other)._data
value = True
for elt in ifilterfalse(otherdata.__contains__, self):
data[elt] = value
return result
# Membership test
def __contains__(self, element):
"""Report whether an element is a member of a set.
(Called in response to the expression `element in self'.)
"""
try:
return element in self._data
except TypeError:
transform = getattr(element, "__as_temporarily_immutable__", None)
if transform is None:
raise # re-raise the TypeError exception we caught
return transform() in self._data
# Subset and superset test
def issubset(self, other):
"""Report whether another set contains this set."""
self._binary_sanity_check(other)
if len(self) > len(other): # Fast check for obvious cases
return False
for elt in ifilterfalse(other._data.__contains__, self):
return False
return True
def issuperset(self, other):
"""Report whether this set contains another set."""
self._binary_sanity_check(other)
if len(self) < len(other): # Fast check for obvious cases
return False
for elt in ifilterfalse(self._data.__contains__, other):
return False
return True
# Inequality comparisons using the is-subset relation.
__le__ = issubset
__ge__ = issuperset
def __lt__(self, other):
self._binary_sanity_check(other)
return len(self) < len(other) and self.issubset(other)
def __gt__(self, other):
self._binary_sanity_check(other)
return len(self) > len(other) and self.issuperset(other)
# We inherit object.__hash__, so we must deny this explicitly
__hash__ = None
# Assorted helpers
def _binary_sanity_check(self, other):
# Check that the other argument to a binary operation is also
# a set, raising a TypeError otherwise.
if not isinstance(other, BaseSet):
raise TypeError, "Binary operation only permitted between sets"
def _compute_hash(self):
# Calculate hash code for a set by xor'ing the hash codes of
# the elements. This ensures that the hash code does not depend
# on the order in which elements are added to the set. This is
# not called __hash__ because a BaseSet should not be hashable;
# only an ImmutableSet is hashable.
result = 0
for elt in self:
result ^= hash(elt)
return result
def _update(self, iterable):
# The main loop for update() and the subclass __init__() methods.
data = self._data
# Use the fast update() method when a dictionary is available.
if isinstance(iterable, BaseSet):
data.update(iterable._data)
return
value = True
if type(iterable) in (list, tuple, xrange):
# Optimized: we know that __iter__() and next() can't
# raise TypeError, so we can move 'try:' out of the loop.
it = iter(iterable)
while True:
try:
for element in it:
data[element] = value
return
except TypeError:
transform = getattr(element, "__as_immutable__", None)
if transform is None:
raise # re-raise the TypeError exception we caught
data[transform()] = value
else:
# Safe: only catch TypeError where intended
for element in iterable:
try:
data[element] = value
except TypeError:
transform = getattr(element, "__as_immutable__", None)
if transform is None:
raise # re-raise the TypeError exception we caught
data[transform()] = value
class ImmutableSet(BaseSet):
"""Immutable set class."""
__slots__ = ['_hashcode']
# BaseSet + hashing
def __init__(self, iterable=None):
"""Construct an immutable set from an optional iterable."""
self._hashcode = None
self._data = {}
if iterable is not None:
self._update(iterable)
def __hash__(self):
if self._hashcode is None:
self._hashcode = self._compute_hash()
return self._hashcode
def __getstate__(self):
return self._data, self._hashcode
def __setstate__(self, state):
self._data, self._hashcode = state
class Set(BaseSet):
""" Mutable set class."""
__slots__ = []
# BaseSet + operations requiring mutability; no hashing
def __init__(self, iterable=None):
"""Construct a set from an optional iterable."""
self._data = {}
if iterable is not None:
self._update(iterable)
def __getstate__(self):
# Return picklable state: a 1-tuple containing the data dict.
return self._data,
def __setstate__(self, data):
self._data, = data
# In-place union, intersection, differences.
# Subtle: The xyz_update() functions deliberately return None,
# as do all mutating operations on built-in container types.
# The __xyz__ spellings have to return self, though.
def __ior__(self, other):
"""Update a set with the union of itself and another."""
self._binary_sanity_check(other)
self._data.update(other._data)
return self
def union_update(self, other):
"""Update a set with the union of itself and another."""
self._update(other)
def __iand__(self, other):
"""Update a set with the intersection of itself and another."""
self._binary_sanity_check(other)
self._data = (self & other)._data
return self
def intersection_update(self, other):
"""Update a set with the intersection of itself and another."""
if isinstance(other, BaseSet):
self &= other
else:
self._data = (self.intersection(other))._data
def __ixor__(self, other):
"""Update a set with the symmetric difference of itself and another."""
self._binary_sanity_check(other)
self.symmetric_difference_update(other)
return self
def symmetric_difference_update(self, other):
"""Update a set with the symmetric difference of itself and another."""
data = self._data
value = True
if not isinstance(other, BaseSet):
other = Set(other)
if self is other:
self.clear()
for elt in other:
if elt in data:
del data[elt]
else:
data[elt] = value
def __isub__(self, other):
"""Remove all elements of another set from this set."""
self._binary_sanity_check(other)
self.difference_update(other)
return self
def difference_update(self, other):
"""Remove all elements of another set from this set."""
data = self._data
if not isinstance(other, BaseSet):
other = Set(other)
if self is other:
self.clear()
for elt in ifilter(data.__contains__, other):
del data[elt]
# Python dict-like mass mutations: update, clear
def update(self, iterable):
"""Add all values from an iterable (such as a list or file)."""
self._update(iterable)
def clear(self):
"""Remove all elements from this set."""
self._data.clear()
# Single-element mutations: add, remove, discard
def add(self, element):
"""Add an element to a set.
This has no effect if the element is already present.
"""
try:
self._data[element] = True
except TypeError:
transform = getattr(element, "__as_immutable__", None)
if transform is None:
raise # re-raise the TypeError exception we caught
self._data[transform()] = True
def remove(self, element):
"""Remove an element from a set; it must be a member.
If the element is not a member, raise a KeyError.
"""
try:
del self._data[element]
except TypeError:
transform = getattr(element, "__as_temporarily_immutable__", None)
if transform is None:
raise # re-raise the TypeError exception we caught
del self._data[transform()]
def discard(self, element):
"""Remove an element from a set if it is a member.
If the element is not a member, do nothing.
"""
try:
self.remove(element)
except KeyError:
pass
def pop(self):
"""Remove and return an arbitrary set element."""
return self._data.popitem()[0]
def __as_immutable__(self):
# Return a copy of self as an immutable set
return ImmutableSet(self)
def __as_temporarily_immutable__(self):
# Return self wrapped in a temporarily immutable set
return _TemporarilyImmutableSet(self)
class _TemporarilyImmutableSet(BaseSet):
# Wrap a mutable set as if it was temporarily immutable.
# This only supplies hashing and equality comparisons.
def __init__(self, set):
self._set = set
self._data = set._data # Needed by ImmutableSet.__eq__()
def __hash__(self):
return self._set._compute_hash()
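# ---- Editor's usage sketch (not in the original source) ----
# Basic use of the (deprecated) classes above; the built-in set and
# frozenset types supersede them:
#
#     engineers = Set(['jane', 'joe'])
#     managers = Set(['joe', 'mira'])
#     print engineers & managers                # Set(['joe'])
#     print (engineers | managers) - managers   # Set(['jane'])
#     frozen = ImmutableSet(engineers)          # hashable, dict-key safe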
"""A parser for SGML, using the derived class as a static DTD."""
# XXX This only supports those SGML features used by HTML.
# XXX There should be a way to distinguish between PCDATA (parsed
# character data -- the normal case), RCDATA (replaceable character
# data -- only char and entity references and end tags are special)
# and CDATA (character data -- only end tags are special). RCDATA is
# not supported at all.
from warnings import warnpy3k
warnpy3k("the sgmllib module has been removed in Python 3.0",
stacklevel=2)
del warnpy3k
import markupbase
import re
__all__ = ["SGMLParser", "SGMLParseError"]
# Regular expressions used for parsing
interesting = re.compile('[&<]')
incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
'<([a-zA-Z][^<>]*|'
'/([a-zA-Z][^<>]*)?|'
'![^<>]*)?')
entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')
charref = re.compile('&#([0-9]+)[^0-9]')
starttagopen = re.compile('<[>a-zA-Z]')
shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
piclose = re.compile('>')
endbracket = re.compile('[<>]')
tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
attrfind = re.compile(
r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
r'(\'[^\']*\'|"[^"]*"|[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*))?')
class SGMLParseError(RuntimeError):
"""Exception raised for all parse errors."""
pass
# SGML parser base class -- find tags and call handler functions.
# Usage: p = SGMLParser(); p.feed(data); ...; p.close().
# The dtd is defined by deriving a class which defines methods
# with special names to handle tags: start_foo and end_foo to handle
# <foo> and </foo>, respectively, or do_foo to handle <foo> by itself.
# (Tags are converted to lower case for this purpose.) The data
# between tags is passed to the parser by calling self.handle_data()
# with some data as argument (the data may be split up in arbitrary
# chunks). Entity references are passed by calling
# self.handle_entityref() with the entity reference as argument.
class SGMLParser(markupbase.ParserBase):
# Definition of entities -- derived classes may override
entity_or_charref = re.compile('&(?:'
'([a-zA-Z][-.a-zA-Z0-9]*)|#([0-9]+)'
')(;?)')
def __init__(self, verbose=0):
"""Initialize and reset this instance."""
self.verbose = verbose
self.reset()
def reset(self):
"""Reset this instance. Loses all unprocessed data."""
self.__starttag_text = None
self.rawdata = ''
self.stack = []
self.lasttag = '???'
self.nomoretags = 0
self.literal = 0
markupbase.ParserBase.reset(self)
def setnomoretags(self):
"""Enter literal mode (CDATA) till EOF.
Intended for derived classes only.
"""
self.nomoretags = self.literal = 1
def setliteral(self, *args):
"""Enter literal mode (CDATA).
Intended for derived classes only.
"""
self.literal = 1
def feed(self, data):
"""Feed some data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n'). (This just saves the text,
all the processing is done by goahead().)
"""
self.rawdata = self.rawdata + data
self.goahead(0)
def close(self):
"""Handle the remaining data."""
self.goahead(1)
def error(self, message):
raise SGMLParseError(message)
# Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
def goahead(self, end):
rawdata = self.rawdata
i = 0
n = len(rawdata)
while i < n:
if self.nomoretags:
self.handle_data(rawdata[i:n])
i = n
break
match = interesting.search(rawdata, i)
if match: j = match.start()
else: j = n
if i < j:
self.handle_data(rawdata[i:j])
i = j
if i == n: break
if rawdata[i] == '<':
if starttagopen.match(rawdata, i):
if self.literal:
self.handle_data(rawdata[i])
i = i+1
continue
k = self.parse_starttag(i)
if k < 0: break
i = k
continue
if rawdata.startswith("</", i):
k = self.parse_endtag(i)
if k < 0: break
i = k
self.literal = 0
continue
if self.literal:
if n > (i + 1):
self.handle_data("<")
i = i+1
else:
# incomplete
break
continue
if rawdata.startswith("<!--", i):
# Strictly speaking, a comment is --.*--
# within a declaration tag <!...>.
# This should be removed,
# and comments handled only in parse_declaration.
k = self.parse_comment(i)
if k < 0: break
i = k
continue
if rawdata.startswith("<?", i):
k = self.parse_pi(i)
if k < 0: break
i = i+k
continue
if rawdata.startswith("<!", i):
# This is some sort of declaration; in "HTML as
# deployed," this should only be the document type
# declaration ("<!DOCTYPE html...>").
k = self.parse_declaration(i)
if k < 0: break
i = k
continue
elif rawdata[i] == '&':
if self.literal:
self.handle_data(rawdata[i])
i = i+1
continue
match = charref.match(rawdata, i)
if match:
name = match.group(1)
self.handle_charref(name)
i = match.end(0)
if rawdata[i-1] != ';': i = i-1
continue
match = entityref.match(rawdata, i)
if match:
name = match.group(1)
self.handle_entityref(name)
i = match.end(0)
if rawdata[i-1] != ';': i = i-1
continue
else:
self.error('neither < nor & ??')
# We get here only if incomplete matches but
# nothing else
match = incomplete.match(rawdata, i)
if not match:
self.handle_data(rawdata[i])
i = i+1
continue
j = match.end(0)
if j == n:
break # Really incomplete
self.handle_data(rawdata[i:j])
i = j
# end while
if end and i < n:
self.handle_data(rawdata[i:n])
i = n
self.rawdata = rawdata[i:]
# XXX if end: check for empty stack
# Extensions for the DOCTYPE scanner:
_decl_otherchars = '='
# Internal -- parse processing instr, return length or -1 if not terminated
def parse_pi(self, i):
rawdata = self.rawdata
if rawdata[i:i+2] != '<?':
self.error('unexpected call to parse_pi()')
match = piclose.search(rawdata, i+2)
if not match:
return -1
j = match.start(0)
self.handle_pi(rawdata[i+2: j])
j = match.end(0)
return j-i
def get_starttag_text(self):
return self.__starttag_text
# Internal -- handle starttag, return length or -1 if not terminated
def parse_starttag(self, i):
self.__starttag_text = None
start_pos = i
rawdata = self.rawdata
if shorttagopen.match(rawdata, i):
# SGML shorthand: <tag/data/ == <tag>data</tag>
# XXX Can data contain &... (entity or char refs)?
# XXX Can data contain < or > (tag characters)?
# XXX Can there be whitespace before the first /?
match = shorttag.match(rawdata, i)
if not match:
return -1
tag, data = match.group(1, 2)
self.__starttag_text = '<%s/' % tag
tag = tag.lower()
k = match.end(0)
self.finish_shorttag(tag, data)
self.__starttag_text = rawdata[start_pos:match.end(1) + 1]
return k
# XXX The following should skip matching quotes (' or ")
# As a shortcut way to exit, this isn't so bad, but shouldn't
# be used to locate the actual end of the start tag since the
# < or > characters may be embedded in an attribute value.
match = endbracket.search(rawdata, i+1)
if not match:
return -1
j = match.start(0)
# Now parse the data between i+1 and j into a tag and attrs
attrs = []
if rawdata[i:i+2] == '<>':
# SGML shorthand: <> == <last open tag seen>
k = j
tag = self.lasttag
else:
match = tagfind.match(rawdata, i+1)
if not match:
self.error('unexpected call to parse_starttag')
k = match.end(0)
tag = rawdata[i+1:k].lower()
self.lasttag = tag
while k < j:
match = attrfind.match(rawdata, k)
if not match: break
attrname, rest, attrvalue = match.group(1, 2, 3)
if not rest:
attrvalue = attrname
else:
if (attrvalue[:1] == "'" == attrvalue[-1:] or
attrvalue[:1] == '"' == attrvalue[-1:]):
# strip quotes
attrvalue = attrvalue[1:-1]
attrvalue = self.entity_or_charref.sub(
self._convert_ref, attrvalue)
attrs.append((attrname.lower(), attrvalue))
k = match.end(0)
if rawdata[j] == '>':
j = j+1
self.__starttag_text = rawdata[start_pos:j]
self.finish_starttag(tag, attrs)
return j
# Internal -- convert entity or character reference
def _convert_ref(self, match):
if match.group(2):
return self.convert_charref(match.group(2)) or \
'&#%s%s' % match.groups()[1:]
elif match.group(3):
return self.convert_entityref(match.group(1)) or \
'&%s;' % match.group(1)
else:
return '&%s' % match.group(1)
# Internal -- parse endtag
def parse_endtag(self, i):
rawdata = self.rawdata
match = endbracket.search(rawdata, i+1)
if not match:
return -1
j = match.start(0)
tag = rawdata[i+2:j].strip().lower()
if rawdata[j] == '>':
j = j+1
self.finish_endtag(tag)
return j
# Internal -- finish parsing of <tag/data/ (same as <tag>data</tag>)
def finish_shorttag(self, tag, data):
self.finish_starttag(tag, [])
self.handle_data(data)
self.finish_endtag(tag)
# Internal -- finish processing of start tag
# Return -1 for unknown tag, 0 for open-only tag, 1 for balanced tag
def finish_starttag(self, tag, attrs):
try:
method = getattr(self, 'start_' + tag)
except AttributeError:
try:
method = getattr(self, 'do_' + tag)
except AttributeError:
self.unknown_starttag(tag, attrs)
return -1
else:
self.handle_starttag(tag, method, attrs)
return 0
else:
self.stack.append(tag)
self.handle_starttag(tag, method, attrs)
return 1
# Internal -- finish processing of end tag
def finish_endtag(self, tag):
if not tag:
found = len(self.stack) - 1
if found < 0:
self.unknown_endtag(tag)
return
else:
if tag not in self.stack:
try:
method = getattr(self, 'end_' + tag)
except AttributeError:
self.unknown_endtag(tag)
else:
self.report_unbalanced(tag)
return
found = len(self.stack)
for i in range(found):
if self.stack[i] == tag: found = i
while len(self.stack) > found:
tag = self.stack[-1]
try:
method = getattr(self, 'end_' + tag)
except AttributeError:
method = None
if method:
self.handle_endtag(tag, method)
else:
self.unknown_endtag(tag)
del self.stack[-1]
# Overridable -- handle start tag
def handle_starttag(self, tag, method, attrs):
method(attrs)
# Overridable -- handle end tag
def handle_endtag(self, tag, method):
method()
# Example -- report an unbalanced </...> tag.
def report_unbalanced(self, tag):
if self.verbose:
print '*** Unbalanced </' + tag + '>'
print '*** Stack:', self.stack
def convert_charref(self, name):
"""Convert character reference, may be overridden."""
try:
n = int(name)
except ValueError:
return
if not 0 <= n <= 255: # chr() below covers the full 8-bit range
return
return self.convert_codepoint(n)
def convert_codepoint(self, codepoint):
return chr(codepoint)
def handle_charref(self, name):
"""Handle character reference, no need to override."""
replacement = self.convert_charref(name)
if replacement is None:
self.unknown_charref(name)
else:
self.handle_data(replacement)
# Definition of entities -- derived classes may override
entitydefs = \
{'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': '\''}
def convert_entityref(self, name):
"""Convert entity references.
As an alternative to overriding this method; one can tailor the
results by setting up the self.entitydefs mapping appropriately.
"""
table = self.entitydefs
if name in table:
return table[name]
else:
return
def handle_entityref(self, name):
"""Handle entity references, no need to override."""
replacement = self.convert_entityref(name)
if replacement is None:
self.unknown_entityref(name)
else:
self.handle_data(replacement)
# Example -- handle data, should be overridden
def handle_data(self, data):
pass
# Example -- handle comment, could be overridden
def handle_comment(self, data):
pass
# Example -- handle declaration, could be overridden
def handle_decl(self, decl):
pass
# Example -- handle processing instruction, could be overridden
def handle_pi(self, data):
pass
# To be overridden -- handlers for unknown objects
def unknown_starttag(self, tag, attrs): pass
def unknown_endtag(self, tag): pass
def unknown_charref(self, ref): pass
def unknown_entityref(self, ref): pass
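# ---- Editor's usage sketch (not part of the original module) ----
# The DTD-as-methods convention described above, used here to collect
# href attributes from <a> tags; the class name is illustrative:
#
#     class LinkCollector(SGMLParser):
#         def reset(self):
#             SGMLParser.reset(self)
#             self.links = []
#         def start_a(self, attrs):
#             for name, value in attrs:
#                 if name == 'href':
#                     self.links.append(value)
#
#     p = LinkCollector()
#     p.feed('<a href="http://example.com/">x</a>')
#     p.close()
#     print p.links    # ['http://example.com/']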
class TestSGMLParser(SGMLParser):
def __init__(self, verbose=0):
self.testdata = ""
SGMLParser.__init__(self, verbose)
def handle_data(self, data):
self.testdata = self.testdata + data
if len(repr(self.testdata)) >= 70:
self.flush()
def flush(self):
data = self.testdata
if data:
self.testdata = ""
print 'data:', repr(data)
def handle_comment(self, data):
self.flush()
r = repr(data)
if len(r) > 68:
r = r[:32] + '...' + r[-32:]
print 'comment:', r
def unknown_starttag(self, tag, attrs):
self.flush()
if not attrs:
print 'start tag: <' + tag + '>'
else:
print 'start tag: <' + tag,
for name, value in attrs:
print name + '=' + '"' + value + '"',
print '>'
def unknown_endtag(self, tag):
self.flush()
print 'end tag: </' + tag + '>'
def unknown_entityref(self, ref):
self.flush()
print '*** unknown entity ref: &' + ref + ';'
def unknown_charref(self, ref):
self.flush()
print '*** unknown char ref: &#' + ref + ';'
def unknown_decl(self, data):
self.flush()
print '*** unknown decl: [' + data + ']'
def close(self):
SGMLParser.close(self)
self.flush()
def test(args = None):
import sys
if args is None:
args = sys.argv[1:]
if args and args[0] == '-s':
args = args[1:]
klass = SGMLParser
else:
klass = TestSGMLParser
if args:
file = args[0]
else:
file = 'test.html'
if file == '-':
f = sys.stdin
else:
try:
f = open(file, 'r')
except IOError, msg:
print file, ":", msg
sys.exit(1)
data = f.read()
if f is not sys.stdin:
f.close()
x = klass()
for c in data:
x.feed(c)
x.close()
if __name__ == '__main__':
test()
# $Id$
#
# Copyright (C) 2005 Gregory P. Smith ([email protected])
# Licensed to PSF under a Contributor Agreement.
import warnings
warnings.warn("the sha module is deprecated; use the hashlib module instead",
DeprecationWarning, 2)
from hashlib import sha1 as sha
new = sha
blocksize = 1 # legacy value (wrong in any useful sense)
digest_size = 20
digestsize = 20
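# --- Illustrative sketch (editor's addition) ---
# The shim above just re-exports hashlib.sha1, so the legacy spelling and
# the modern one produce identical digests (Python 2 / IronPython 2.7).
import hashlib
legacy = new('abc')                      # "new" is the alias defined above
modern = hashlib.sha1('abc')
assert legacy.hexdigest() == modern.hexdigest()
print legacy.hexdigest()                 # 40 hex chars == digest_size * 2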
"""Manage shelves of pickled objects.
A "shelf" is a persistent, dictionary-like object. The difference
from dbm databases is that the values (not the keys!) in a shelf can
be essentially arbitrary Python objects -- anything that the "pickle"
module can handle. This includes most class instances, recursive data
types, and objects containing lots of shared sub-objects. The keys
are ordinary strings.
To summarize the interface (key is a string, data is an arbitrary
object):
import shelve
d = shelve.open(filename) # open, with (g)dbm filename -- no suffix
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of the data at key (raise
# KeyError if no such key) -- NOTE that this
# access returns a *copy* of the entry!
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = d.has_key(key) # true if the key exists; same as "key in d"
list = d.keys() # a list of all existing keys (slow!)
d.close() # close it
Depending on the implementation, closing a persistent dictionary may
or may not be necessary to flush changes to disk.
Normally, d[key] returns a COPY of the entry. This needs care when
mutable entries are mutated: for example, if d[key] is a list,
d[key].append(anitem)
does NOT modify the entry d[key] itself, as stored in the persistent
mapping -- it only modifies the copy, which is then immediately
discarded, so that the append has NO effect whatsoever. To append an
item to d[key] in a way that will affect the persistent mapping, use:
data = d[key]
data.append(anitem)
d[key] = data
To avoid the problem with mutable entries, you may pass the keyword
argument writeback=True in the call to shelve.open. When you use:
d = shelve.open(filename, writeback=True)
then d keeps a cache of all entries you access, and writes them all back
to the persistent mapping when you call d.close(). This ensures that
such usage as d[key].append(anitem) works as intended.
However, using keyword argument writeback=True may consume vast amounts
of memory for the cache, and it may make d.close() very slow, if you
access many of d's entries after opening it in this way: d has no way to
check which of the entries you access are mutable and/or which ones you
actually mutate, so it must cache, and write back at close, all of the
entries that you access. You can call d.sync() to write back all the
entries in the cache, and empty the cache (d.sync() also synchronizes
the persistent dictionary on disk, if feasible).
"""
# Try using cPickle and cStringIO if available.
try:
from cPickle import Pickler, Unpickler
except ImportError:
from pickle import Pickler, Unpickler
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
import UserDict
__all__ = ["Shelf","BsdDbShelf","DbfilenameShelf","open"]
class _ClosedDict(UserDict.DictMixin):
'Marker for a closed dict. Access attempts raise a ValueError.'
def closed(self, *args):
raise ValueError('invalid operation on closed shelf')
__getitem__ = __setitem__ = __delitem__ = keys = closed
def __repr__(self):
return '<Closed Dictionary>'
class Shelf(UserDict.DictMixin):
"""Base class for shelf implementations.
This is initialized with a dictionary-like object.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, dict, protocol=None, writeback=False):
self.dict = dict
if protocol is None:
protocol = 0
self._protocol = protocol
self.writeback = writeback
self.cache = {}
def keys(self):
return self.dict.keys()
def __len__(self):
return len(self.dict)
def has_key(self, key):
return key in self.dict
def __contains__(self, key):
return key in self.dict
def get(self, key, default=None):
if key in self.dict:
return self[key]
return default
def __getitem__(self, key):
try:
value = self.cache[key]
except KeyError:
f = StringIO(self.dict[key])
value = Unpickler(f).load()
if self.writeback:
self.cache[key] = value
return value
def __setitem__(self, key, value):
if self.writeback:
self.cache[key] = value
f = StringIO()
p = Pickler(f, self._protocol)
p.dump(value)
self.dict[key] = f.getvalue()
def __delitem__(self, key):
del self.dict[key]
try:
del self.cache[key]
except KeyError:
pass
def close(self):
if self.dict is None:
return
try:
self.sync()
try:
self.dict.close()
except AttributeError:
pass
finally:
# Catch errors that may happen when close is called from __del__
# because CPython is in interpreter shutdown.
try:
self.dict = _ClosedDict()
except:
self.dict = None
def __del__(self):
if not hasattr(self, 'writeback'):
# __init__ didn't succeed, so don't bother closing
return
self.close()
def sync(self):
if self.writeback and self.cache:
self.writeback = False
for key, entry in self.cache.iteritems():
self[key] = entry
self.writeback = True
self.cache = {}
if hasattr(self.dict, 'sync'):
self.dict.sync()
class BsdDbShelf(Shelf):
"""Shelf implementation using the "BSD" db interface.
This adds methods first(), next(), previous(), last() and
set_location() that have no counterpart in [g]dbm databases.
The actual database must be opened using one of the "bsddb"
modules "open" routines (i.e. bsddb.hashopen, bsddb.btopen or
bsddb.rnopen) and passed to the constructor.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, dict, protocol=None, writeback=False):
Shelf.__init__(self, dict, protocol, writeback)
def set_location(self, key):
(key, value) = self.dict.set_location(key)
f = StringIO(value)
return (key, Unpickler(f).load())
def next(self):
(key, value) = self.dict.next()
f = StringIO(value)
return (key, Unpickler(f).load())
def previous(self):
(key, value) = self.dict.previous()
f = StringIO(value)
return (key, Unpickler(f).load())
def first(self):
(key, value) = self.dict.first()
f = StringIO(value)
return (key, Unpickler(f).load())
def last(self):
(key, value) = self.dict.last()
f = StringIO(value)
return (key, Unpickler(f).load())
class DbfilenameShelf(Shelf):
"""Shelf implementation using the "anydbm" generic dbm interface.
This is initialized with the filename for the dbm database.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, filename, flag='c', protocol=None, writeback=False):
import anydbm
Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback)
def open(filename, flag='c', protocol=None, writeback=False):
"""Open a persistent dictionary for reading and writing.
The filename parameter is the base filename for the underlying
database. As a side-effect, an extension may be added to the
filename and more than one file may be created. The optional flag
parameter has the same interpretation as the flag parameter of
anydbm.open(). The optional protocol parameter specifies the
version of the pickle protocol (0, 1, or 2).
See the module's __doc__ string for an overview of the interface.
"""
return DbfilenameShelf(filename, flag, protocol, writeback)
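# --- Illustrative sketch (editor's addition) ---
# The mutable-entry pitfall described in the module docstring;
# '/tmp/demo_shelf' is a hypothetical path and a working anydbm backend
# is assumed.
d = open('/tmp/demo_shelf')              # shelve's own open(), defined above
d['langs'] = ['python']
d['langs'].append('ironpython')          # mutates a discarded copy -- lost
print d['langs']                         # ['python']
data = d['langs']                        # read-modify-write does persist:
data.append('ironpython')
d['langs'] = data
print d['langs']                         # ['python', 'ironpython']
d.close()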
# -*- coding: iso-8859-1 -*-
"""A lexical analyzer class for simple shell-like syntaxes."""
# Module and documentation by Eric S. Raymond, 21 Dec 1998
# Input stacking and error message cleanup added by ESR, March 2000
# push_source() and pop_source() made explicit by ESR, January 2001.
# Posix compliance, split(), string arguments, and
# iterator interface by Gustavo Niemeyer, April 2003.
import os.path
import sys
from collections import deque
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
__all__ = ["shlex", "split"]
class shlex:
"A lexical analyzer class for simple shell-like syntaxes."
def __init__(self, instream=None, infile=None, posix=False):
if isinstance(instream, basestring):
instream = StringIO(instream)
if instream is not None:
self.instream = instream
self.infile = infile
else:
self.instream = sys.stdin
self.infile = None
self.posix = posix
if posix:
self.eof = None
else:
self.eof = ''
self.commenters = '#'
self.wordchars = ('abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_')
if self.posix:
self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
self.whitespace = ' \t\r\n'
self.whitespace_split = False
self.quotes = '\'"'
self.escape = '\\'
self.escapedquotes = '"'
self.state = ' '
self.pushback = deque()
self.lineno = 1
self.debug = 0
self.token = ''
self.filestack = deque()
self.source = None
if self.debug:
print 'shlex: reading from %s, line %d' \
% (self.instream, self.lineno)
def push_token(self, tok):
"Push a token onto the stack popped by the get_token method"
if self.debug >= 1:
print "shlex: pushing token " + repr(tok)
self.pushback.appendleft(tok)
def push_source(self, newstream, newfile=None):
"Push an input source onto the lexer's input source stack."
if isinstance(newstream, basestring):
newstream = StringIO(newstream)
self.filestack.appendleft((self.infile, self.instream, self.lineno))
self.infile = newfile
self.instream = newstream
self.lineno = 1
if self.debug:
if newfile is not None:
print 'shlex: pushing to file %s' % (self.infile,)
else:
print 'shlex: pushing to stream %s' % (self.instream,)
def pop_source(self):
"Pop the input source stack."
self.instream.close()
(self.infile, self.instream, self.lineno) = self.filestack.popleft()
if self.debug:
print 'shlex: popping to %s, line %d' \
% (self.instream, self.lineno)
self.state = ' '
def get_token(self):
"Get a token from the input stream (or from stack if it's nonempty)"
if self.pushback:
tok = self.pushback.popleft()
if self.debug >= 1:
print "shlex: popping token " + repr(tok)
return tok
# No pushback. Get a token.
raw = self.read_token()
# Handle inclusions
if self.source is not None:
while raw == self.source:
spec = self.sourcehook(self.read_token())
if spec:
(newfile, newstream) = spec
self.push_source(newstream, newfile)
raw = self.get_token()
# Maybe we got EOF instead?
while raw == self.eof:
if not self.filestack:
return self.eof
else:
self.pop_source()
raw = self.get_token()
# Neither inclusion nor EOF
if self.debug >= 1:
if raw != self.eof:
print "shlex: token=" + repr(raw)
else:
print "shlex: token=EOF"
return raw
def read_token(self):
quoted = False
escapedstate = ' '
while True:
nextchar = self.instream.read(1)
if nextchar == '\n':
self.lineno = self.lineno + 1
if self.debug >= 3:
print "shlex: in state", repr(self.state), \
"I see character:", repr(nextchar)
if self.state is None:
self.token = '' # past end of file
break
elif self.state == ' ':
if not nextchar:
self.state = None # end of file
break
elif nextchar in self.whitespace:
if self.debug >= 2:
print "shlex: I see whitespace in whitespace state"
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif nextchar in self.commenters:
self.instream.readline()
self.lineno = self.lineno + 1
elif self.posix and nextchar in self.escape:
escapedstate = 'a'
self.state = nextchar
elif nextchar in self.wordchars:
self.token = nextchar
self.state = 'a'
elif nextchar in self.quotes:
if not self.posix:
self.token = nextchar
self.state = nextchar
elif self.whitespace_split:
self.token = nextchar
self.state = 'a'
else:
self.token = nextchar
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif self.state in self.quotes:
quoted = True
if not nextchar: # end of file
if self.debug >= 2:
print "shlex: I see EOF in quotes state"
# XXX what error should be raised here?
raise ValueError, "No closing quotation"
if nextchar == self.state:
if not self.posix:
self.token = self.token + nextchar
self.state = ' '
break
else:
self.state = 'a'
elif self.posix and nextchar in self.escape and \
self.state in self.escapedquotes:
escapedstate = self.state
self.state = nextchar
else:
self.token = self.token + nextchar
elif self.state in self.escape:
if not nextchar: # end of file
if self.debug >= 2:
print "shlex: I see EOF in escape state"
# XXX what error should be raised here?
raise ValueError, "No escaped character"
# In posix shells, only the quote itself or the escape
# character may be escaped within quotes.
if escapedstate in self.quotes and \
nextchar != self.state and nextchar != escapedstate:
self.token = self.token + self.state
self.token = self.token + nextchar
self.state = escapedstate
elif self.state == 'a':
if not nextchar:
self.state = None # end of file
break
elif nextchar in self.whitespace:
if self.debug >= 2:
print "shlex: I see whitespace in word state"
self.state = ' '
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif nextchar in self.commenters:
self.instream.readline()
self.lineno = self.lineno + 1
if self.posix:
self.state = ' '
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif self.posix and nextchar in self.quotes:
self.state = nextchar
elif self.posix and nextchar in self.escape:
escapedstate = 'a'
self.state = nextchar
elif nextchar in self.wordchars or nextchar in self.quotes \
or self.whitespace_split:
self.token = self.token + nextchar
else:
self.pushback.appendleft(nextchar)
if self.debug >= 2:
print "shlex: I see punctuation in word state"
self.state = ' '
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
result = self.token
self.token = ''
if self.posix and not quoted and result == '':
result = None
if self.debug > 1:
if result:
print "shlex: raw token=" + repr(result)
else:
print "shlex: raw token=EOF"
return result
def sourcehook(self, newfile):
"Hook called on a filename to be sourced."
if newfile[0] == '"':
newfile = newfile[1:-1]
# This implements cpp-like semantics for relative-path inclusion.
if isinstance(self.infile, basestring) and not os.path.isabs(newfile):
newfile = os.path.join(os.path.dirname(self.infile), newfile)
return (newfile, open(newfile, "r"))
def error_leader(self, infile=None, lineno=None):
"Emit a C-compiler-like, Emacs-friendly error-message leader."
if infile is None:
infile = self.infile
if lineno is None:
lineno = self.lineno
return "\"%s\", line %d: " % (infile, lineno)
def __iter__(self):
return self
def next(self):
token = self.get_token()
if token == self.eof:
raise StopIteration
return token
def split(s, comments=False, posix=True):
lex = shlex(s, posix=posix)
lex.whitespace_split = True
if not comments:
lex.commenters = ''
return list(lex)
if __name__ == '__main__':
if len(sys.argv) == 1:
lexer = shlex()
else:
file = sys.argv[1]
lexer = shlex(open(file), file)
while 1:
tt = lexer.get_token()
if tt:
print "Token: " + repr(tt)
else:
break
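# --- Illustrative sketch (editor's addition) ---
# split() in POSIX mode honors quoting; with the default comments=False
# the '#' character is ordinary, while comments=True makes it start a comment.
print split('cp "My Documents/a.txt" dest')
# ['cp', 'My Documents/a.txt', 'dest']
print split('echo one # two', comments=True)
# ['echo', 'one']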
"""Utility functions for copying and archiving files and directory trees.
XXX The functions here don't copy the resource fork or other metadata on Mac.
"""
import os
import sys
import stat
from os.path import abspath
import fnmatch
import collections
import errno
try:
import zlib
del zlib
_ZLIB_SUPPORTED = True
except ImportError:
_ZLIB_SUPPORTED = False
try:
import bz2
del bz2
_BZ2_SUPPORTED = True
except ImportError:
_BZ2_SUPPORTED = False
try:
from pwd import getpwnam
except ImportError:
getpwnam = None
try:
from grp import getgrnam
except ImportError:
getgrnam = None
__all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2",
"copytree", "move", "rmtree", "Error", "SpecialFileError",
"ExecError", "make_archive", "get_archive_formats",
"register_archive_format", "unregister_archive_format",
"ignore_patterns"]
class Error(EnvironmentError):
pass
class SpecialFileError(EnvironmentError):
"""Raised when trying to do a kind of operation (e.g. copying) which is
not supported on a special file (e.g. a named pipe)"""
class ExecError(EnvironmentError):
"""Raised when a command could not be executed"""
try:
WindowsError
except NameError:
WindowsError = None
def copyfileobj(fsrc, fdst, length=16*1024):
"""copy data from file-like object fsrc to file-like object fdst"""
while 1:
buf = fsrc.read(length)
if not buf:
break
fdst.write(buf)
def _samefile(src, dst):
# Macintosh, Unix.
if hasattr(os.path, 'samefile'):
try:
return os.path.samefile(src, dst)
except OSError:
return False
# All other platforms: check for same pathname.
return (os.path.normcase(os.path.abspath(src)) ==
os.path.normcase(os.path.abspath(dst)))
def copyfile(src, dst):
"""Copy data from src to dst"""
if _samefile(src, dst):
raise Error("`%s` and `%s` are the same file" % (src, dst))
for fn in [src, dst]:
try:
st = os.stat(fn)
except OSError:
# File most likely does not exist
pass
else:
# XXX What about other special files? (sockets, devices...)
if stat.S_ISFIFO(st.st_mode):
raise SpecialFileError("`%s` is a named pipe" % fn)
with open(src, 'rb') as fsrc:
with open(dst, 'wb') as fdst:
copyfileobj(fsrc, fdst)
def copymode(src, dst):
"""Copy mode bits from src to dst"""
if hasattr(os, 'chmod'):
st = os.stat(src)
mode = stat.S_IMODE(st.st_mode)
os.chmod(dst, mode)
def copystat(src, dst):
"""Copy file metadata
Copy the permission bits, last access time, last modification time, and
flags from `src` to `dst`. On Linux, copystat() also copies the "extended
attributes" where possible. The file contents, owner, and group are
unaffected. `src` and `dst` are path names given as strings.
"""
st = os.stat(src)
mode = stat.S_IMODE(st.st_mode)
if hasattr(os, 'utime'):
os.utime(dst, (st.st_atime, st.st_mtime))
if hasattr(os, 'chmod'):
os.chmod(dst, mode)
if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
try:
os.chflags(dst, st.st_flags)
except OSError, why:
for err in 'EOPNOTSUPP', 'ENOTSUP':
if hasattr(errno, err) and why.errno == getattr(errno, err):
break
else:
raise
def copy(src, dst):
"""Copy data and mode bits ("cp src dst").
The destination may be a directory.
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst)
copymode(src, dst)
def copy2(src, dst):
"""Copy data and metadata. Return the file's destination.
Metadata is copied with copystat(). Please see the copystat function
for more information.
The destination may be a directory.
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst)
copystat(src, dst)
def ignore_patterns(*patterns):
"""Function that can be used as copytree() ignore parameter.
Patterns is a sequence of glob-style patterns
that are used to exclude files"""
def _ignore_patterns(path, names):
ignored_names = []
for pattern in patterns:
ignored_names.extend(fnmatch.filter(names, pattern))
return set(ignored_names)
return _ignore_patterns
def copytree(src, dst, symlinks=False, ignore=None):
"""Recursively copy a directory tree using copy2().
The destination directory must not already exist.
If exception(s) occur, an Error is raised with a list of reasons.
If the optional symlinks flag is true, symbolic links in the
source tree result in symbolic links in the destination tree; if
it is false, the contents of the files pointed to by symbolic
links are copied.
The optional ignore argument is a callable. If given, it
is called with the `src` parameter, which is the directory
being visited by copytree(), and `names` which is the list of
`src` contents, as returned by os.listdir():
callable(src, names) -> ignored_names
Since copytree() is called recursively, the callable will be
called once for each directory that is copied. It returns a
list of names relative to the `src` directory that should
not be copied.
XXX Consider this example code rather than the ultimate tool.
"""
names = os.listdir(src)
if ignore is not None:
ignored_names = ignore(src, names)
else:
ignored_names = set()
os.makedirs(dst)
errors = []
for name in names:
if name in ignored_names:
continue
srcname = os.path.join(src, name)
dstname = os.path.join(dst, name)
try:
if symlinks and os.path.islink(srcname):
linkto = os.readlink(srcname)
os.symlink(linkto, dstname)
elif os.path.isdir(srcname):
copytree(srcname, dstname, symlinks, ignore)
else:
# Will raise a SpecialFileError for unsupported file types
copy2(srcname, dstname)
# catch the Error from the recursive copytree so that we can
# continue with other files
except Error, err:
errors.extend(err.args[0])
except EnvironmentError, why:
errors.append((srcname, dstname, str(why)))
try:
copystat(src, dst)
except OSError, why:
if WindowsError is not None and isinstance(why, WindowsError):
# Copying file access times may fail on Windows
pass
else:
errors.append((src, dst, str(why)))
if errors:
raise Error, errors
def rmtree(path, ignore_errors=False, onerror=None):
"""Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror
is set, it is called to handle the error with arguments (func,
path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
path is the argument to that function that caused it to fail; and
exc_info is a tuple returned by sys.exc_info(). If ignore_errors
is false and onerror is None, an exception is raised.
"""
if ignore_errors:
def onerror(*args):
pass
elif onerror is None:
def onerror(*args):
raise
try:
if os.path.islink(path):
# symlinks to directories are forbidden, see bug #1669
raise OSError("Cannot call rmtree on a symbolic link")
except OSError:
onerror(os.path.islink, path, sys.exc_info())
# can't continue even if onerror hook returns
return
names = []
try:
names = os.listdir(path)
except os.error, err:
onerror(os.listdir, path, sys.exc_info())
for name in names:
fullname = os.path.join(path, name)
try:
mode = os.lstat(fullname).st_mode
except os.error:
mode = 0
if stat.S_ISDIR(mode):
rmtree(fullname, ignore_errors, onerror)
else:
try:
os.remove(fullname)
except os.error, err:
onerror(os.remove, fullname, sys.exc_info())
try:
os.rmdir(path)
except os.error:
onerror(os.rmdir, path, sys.exc_info())
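# --- Illustrative sketch (editor's addition) ---
# copytree() with an ignore_patterns() filter, then rmtree() cleanup.
# The temporary paths are created on the fly; nothing here is part of
# the module above.
import tempfile
_src = tempfile.mkdtemp()
open(os.path.join(_src, 'keep.py'), 'w').close()
open(os.path.join(_src, 'skip.pyc'), 'w').close()
_dst = _src + '.copy'
copytree(_src, _dst, ignore=ignore_patterns('*.pyc', 'tmp*'))
print os.listdir(_dst)                   # ['keep.py']
rmtree(_src)
rmtree(_dst)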
def _basename(path):
# A basename() variant which first strips the trailing slash, if present.
# Thus we always get the last component of the path, even for directories.
sep = os.path.sep + (os.path.altsep or '')
return os.path.basename(path.rstrip(sep))
def move(src, dst):
"""Recursively move a file or directory to another location. This is
similar to the Unix "mv" command.
If the destination is a directory or a symlink to a directory, the source
is moved inside the directory. The destination path must not already
exist.
If the destination already exists but is not a directory, it may be
overwritten depending on os.rename() semantics.
If the destination is on our current filesystem, then rename() is used.
Otherwise, src is copied to the destination and then removed.
A lot more could be done here... A look at mv.c shows a lot of
the issues this implementation glosses over.
"""
real_dst = dst
if os.path.isdir(dst):
if _samefile(src, dst):
# We might be on a case insensitive filesystem,
# perform the rename anyway.
os.rename(src, dst)
return
real_dst = os.path.join(dst, _basename(src))
if os.path.exists(real_dst):
raise Error, "Destination path '%s' already exists" % real_dst
try:
os.rename(src, real_dst)
except OSError:
if os.path.isdir(src):
if _destinsrc(src, dst):
raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
copytree(src, real_dst, symlinks=True)
rmtree(src)
else:
copy2(src, real_dst)
os.unlink(src)
def _destinsrc(src, dst):
src = abspath(src)
dst = abspath(dst)
if not src.endswith(os.path.sep):
src += os.path.sep
if not dst.endswith(os.path.sep):
dst += os.path.sep
return dst.startswith(src)
def _get_gid(name):
"""Returns a gid, given a group name."""
if getgrnam is None or name is None:
return None
try:
result = getgrnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _get_uid(name):
"""Returns an uid, given a user name."""
if getpwnam is None or name is None:
return None
try:
result = getpwnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _make_tarball(base_name, base_dir, compress="gzip", verbose=0, dry_run=0,
owner=None, group=None, logger=None):
"""Create a (possibly compressed) tar file from all the files under
'base_dir'.
'compress' must be "gzip" (the default), "bzip2", or None.
'owner' and 'group' can be used to define an owner and a group for the
archive that is being built. If not provided, the current owner and group
will be used.
The output tar file will be named 'base_name' + ".tar", possibly plus
the appropriate compression extension (".gz", or ".bz2").
Returns the output filename.
"""
if compress is None:
tar_compression = ''
elif _ZLIB_SUPPORTED and compress == 'gzip':
tar_compression = 'gz'
elif _BZ2_SUPPORTED and compress == 'bzip2':
tar_compression = 'bz2'
else:
raise ValueError("bad value for 'compress', or compression format not "
"supported : {0}".format(compress))
compress_ext = '.' + tar_compression if compress else ''
archive_name = base_name + '.tar' + compress_ext
archive_dir = os.path.dirname(archive_name)
if archive_dir and not os.path.exists(archive_dir):
if logger is not None:
logger.info("creating %s", archive_dir)
if not dry_run:
os.makedirs(archive_dir)
# creating the tarball
import tarfile # late import so Python build itself doesn't break
if logger is not None:
logger.info('Creating tar archive')
uid = _get_uid(owner)
gid = _get_gid(group)
def _set_uid_gid(tarinfo):
if gid is not None:
tarinfo.gid = gid
tarinfo.gname = group
if uid is not None:
tarinfo.uid = uid
tarinfo.uname = owner
return tarinfo
if not dry_run:
tar = tarfile.open(archive_name, 'w|%s' % tar_compression)
try:
tar.add(base_dir, filter=_set_uid_gid)
finally:
tar.close()
return archive_name
def _call_external_zip(base_dir, zip_filename, verbose, dry_run, logger):
# XXX see if we want to keep an external call here
if verbose:
zipoptions = "-r"
else:
zipoptions = "-rq"
cmd = ["zip", zipoptions, zip_filename, base_dir]
if logger is not None:
logger.info(' '.join(cmd))
if dry_run:
return
import subprocess
try:
subprocess.check_call(cmd)
except subprocess.CalledProcessError:
# XXX really should distinguish between "couldn't find
# external 'zip' command" and "zip failed".
raise ExecError, \
("unable to create zip file '%s': "
"could neither import the 'zipfile' module nor "
"find a standalone zip utility") % zip_filename
def _make_zipfile(base_name, base_dir, verbose=0, dry_run=0, logger=None):
"""Create a zip file from all the files under 'base_dir'.
The output zip file will be named 'base_name' + ".zip". Uses either the
"zipfile" Python module (if available) or the InfoZIP "zip" utility
(if installed and found on the default search path). If neither tool is
available, raises ExecError. Returns the name of the output zip
file.
"""
zip_filename = base_name + ".zip"
archive_dir = os.path.dirname(base_name)
if archive_dir and not os.path.exists(archive_dir):
if logger is not None:
logger.info("creating %s", archive_dir)
if not dry_run:
os.makedirs(archive_dir)
# If zipfile module is not available, try spawning an external 'zip'
# command.
try:
import zlib
import zipfile
except ImportError:
zipfile = None
if zipfile is None:
_call_external_zip(base_dir, zip_filename, verbose, dry_run, logger)
else:
if logger is not None:
logger.info("creating '%s' and adding '%s' to it",
zip_filename, base_dir)
if not dry_run:
with zipfile.ZipFile(zip_filename, "w",
compression=zipfile.ZIP_DEFLATED) as zf:
path = os.path.normpath(base_dir)
if path != os.curdir:
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
for dirpath, dirnames, filenames in os.walk(base_dir):
for name in sorted(dirnames):
path = os.path.normpath(os.path.join(dirpath, name))
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
for name in filenames:
path = os.path.normpath(os.path.join(dirpath, name))
if os.path.isfile(path):
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
return zip_filename
_ARCHIVE_FORMATS = {
'tar': (_make_tarball, [('compress', None)], "uncompressed tar file"),
'zip': (_make_zipfile, [], "ZIP file")
}
if _ZLIB_SUPPORTED:
_ARCHIVE_FORMATS['gztar'] = (_make_tarball, [('compress', 'gzip')],
"gzip'ed tar-file")
if _BZ2_SUPPORTED:
_ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bzip2')],
"bzip2'ed tar-file")
def get_archive_formats():
"""Returns a list of supported formats for archiving and unarchiving.
Each element of the returned sequence is a tuple (name, description)
"""
formats = [(name, registry[2]) for name, registry in
_ARCHIVE_FORMATS.items()]
formats.sort()
return formats
def register_archive_format(name, function, extra_args=None, description=''):
"""Registers an archive format.
name is the name of the format. function is the callable that will be
used to create archives. If provided, extra_args is a sequence of
(name, value) tuples that will be passed as arguments to the callable.
description can be provided to describe the format, and will be returned
by the get_archive_formats() function.
"""
if extra_args is None:
extra_args = []
if not isinstance(function, collections.Callable):
raise TypeError('The %s object is not callable' % function)
if not isinstance(extra_args, (tuple, list)):
raise TypeError('extra_args needs to be a sequence')
for element in extra_args:
if not isinstance(element, (tuple, list)) or len(element) != 2:
raise TypeError('extra_args elements are: (arg_name, value)')
_ARCHIVE_FORMATS[name] = (function, extra_args, description)
def unregister_archive_format(name):
del _ARCHIVE_FORMATS[name]
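# --- Illustrative sketch (editor's addition) ---
# Registering a hypothetical "listing" format whose maker callable just
# records the archived filenames in a text file; every name below is
# illustrative, not part of shutil.
def _make_listing(base_name, base_dir, dry_run=0, logger=None, **kwargs):
    out = base_name + '.txt'
    if not dry_run:
        with open(out, 'w') as f:
            for dirpath, dirnames, filenames in os.walk(base_dir):
                for fname in sorted(filenames):
                    f.write(os.path.join(dirpath, fname) + '\n')
    return out
register_archive_format('listing', _make_listing, [], "plain file listing")
print dict(get_archive_formats())['listing']   # plain file listing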
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
dry_run=0, owner=None, group=None, logger=None):
"""Create an archive file (eg. zip or tar).
'base_name' is the name of the file to create, minus any format-specific
extension; 'format' is the archive format: one of "zip", "tar", "gztar",
or "bztar". Or any other registered format.
'root_dir' is a directory that will be the root directory of the
archive; ie. we typically chdir into 'root_dir' before creating the
archive. 'base_dir' is the directory where we start archiving from;
ie. 'base_dir' will be the common prefix of all files and
directories in the archive. 'root_dir' and 'base_dir' both default
to the current directory. Returns the name of the archive file.
'owner' and 'group' are used when creating a tar archive. By default,
uses the current owner and group.
"""
save_cwd = os.getcwd()
if root_dir is not None:
if logger is not None:
logger.debug("changing into '%s'", root_dir)
base_name = os.path.abspath(base_name)
if not dry_run:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
kwargs = {'dry_run': dry_run, 'logger': logger}
try:
format_info = _ARCHIVE_FORMATS[format]
except KeyError:
raise ValueError, "unknown archive format '%s'" % format
func = format_info[0]
for arg, val in format_info[1]:
kwargs[arg] = val
if format != 'zip':
kwargs['owner'] = owner
kwargs['group'] = group
try:
filename = func(base_name, base_dir, **kwargs)
finally:
if root_dir is not None:
if logger is not None:
logger.debug("changing back to '%s'", save_cwd)
os.chdir(save_cwd)
return filename
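# --- Illustrative sketch (editor's addition) ---
# make_archive() end to end: archive a throwaway 'project' directory as a
# gzip'ed tar (requires zlib; the 'project' name and temp root are made up).
import tempfile
_root = tempfile.mkdtemp()
os.mkdir(os.path.join(_root, 'project'))
open(os.path.join(_root, 'project', 'a.txt'), 'w').close()
archive = make_archive(os.path.join(_root, 'project'), 'gztar',
                       root_dir=_root, base_dir='project')
print archive                            # .../project.tar.gz
rmtree(_root)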
"""Simple HTTP Server.
This module builds on BaseHTTPServer by implementing the standard GET
and HEAD requests in a fairly straightforward manner.
"""
__version__ = "0.6"
__all__ = ["SimpleHTTPRequestHandler"]
import os
import posixpath
import BaseHTTPServer
import urllib
import urlparse
import cgi
import sys
import shutil
import mimetypes
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
"""Simple HTTP request handler with GET and HEAD commands.
This serves files from the current directory and any of its
subdirectories. The MIME type for files is determined by
calling the .guess_type() method.
The GET and HEAD requests are identical except that the HEAD
request omits the actual contents of the file.
"""
server_version = "SimpleHTTP/" + __version__
def do_GET(self):
"""Serve a GET request."""
f = self.send_head()
if f:
try:
self.copyfile(f, self.wfile)
finally:
f.close()
def do_HEAD(self):
"""Serve a HEAD request."""
f = self.send_head()
if f:
f.close()
def send_head(self):
"""Common code for GET and HEAD commands.
This sends the response code and MIME headers.
Return value is either a file object (which has to be copied
to the outputfile by the caller unless the command was HEAD,
and must be closed by the caller under all circumstances), or
None, in which case the caller has nothing further to do.
"""
path = self.translate_path(self.path)
f = None
if os.path.isdir(path):
parts = urlparse.urlsplit(self.path)
if not parts.path.endswith('/'):
# redirect browser - doing basically what apache does
self.send_response(301)
new_parts = (parts[0], parts[1], parts[2] + '/',
parts[3], parts[4])
new_url = urlparse.urlunsplit(new_parts)
self.send_header("Location", new_url)
self.end_headers()
return None
for index in "index.html", "index.htm":
index = os.path.join(path, index)
if os.path.exists(index):
path = index
break
else:
return self.list_directory(path)
ctype = self.guess_type(path)
try:
# Always read in binary mode. Opening files in text mode may cause
# newline translations, making the actual size of the content
# transmitted *less* than the content-length!
f = open(path, 'rb')
except IOError:
self.send_error(404, "File not found")
return None
try:
self.send_response(200)
self.send_header("Content-type", ctype)
fs = os.fstat(f.fileno())
self.send_header("Content-Length", str(fs[6]))
self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
self.end_headers()
return f
except:
f.close()
raise
def list_directory(self, path):
"""Helper to produce a directory listing (absent index.html).
Return value is either a file object, or None (indicating an
error). In either case, the headers are sent, making the
interface the same as for send_head().
"""
try:
list = os.listdir(path)
except os.error:
self.send_error(404, "No permission to list directory")
return None
list.sort(key=lambda a: a.lower())
f = StringIO()
displaypath = cgi.escape(urllib.unquote(self.path))
f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
f.write("<html>\n<title>Directory listing for %s</title>\n" % displaypath)
f.write("<body>\n<h2>Directory listing for %s</h2>\n" % displaypath)
f.write("<hr>\n<ul>\n")
for name in list:
fullname = os.path.join(path, name)
displayname = linkname = name
# Append / for directories or @ for symbolic links
if os.path.isdir(fullname):
displayname = name + "/"
linkname = name + "/"
if os.path.islink(fullname):
displayname = name + "@"
# Note: a link to a directory displays with @ and links with /
f.write('<li><a href="%s">%s</a>\n'
% (urllib.quote(linkname), cgi.escape(displayname)))
f.write("</ul>\n<hr>\n</body>\n</html>\n")
length = f.tell()
f.seek(0)
self.send_response(200)
encoding = sys.getfilesystemencoding()
self.send_header("Content-type", "text/html; charset=%s" % encoding)
self.send_header("Content-Length", str(length))
self.end_headers()
return f
def translate_path(self, path):
"""Translate a /-separated PATH to the local filename syntax.
Components that mean special things to the local file system
(e.g. drive or directory names) are ignored. (XXX They should
probably be diagnosed.)
"""
# abandon query parameters
path = path.split('?',1)[0]
path = path.split('#',1)[0]
# Don't forget explicit trailing slash when normalizing. Issue17324
trailing_slash = path.rstrip().endswith('/')
path = posixpath.normpath(urllib.unquote(path))
words = path.split('/')
words = filter(None, words)
path = os.getcwd()
for word in words:
if os.path.dirname(word) or word in (os.curdir, os.pardir):
# Ignore components that are not a simple file/directory name
continue
path = os.path.join(path, word)
if trailing_slash:
path += '/'
return path
def copyfile(self, source, outputfile):
"""Copy all data between two file objects.
The SOURCE argument is a file object open for reading
(or anything with a read() method) and the DESTINATION
argument is a file object open for writing (or
anything with a write() method).
The only reason for overriding this would be to change
the block size or perhaps to replace newlines by CRLF
-- note however that the default server uses this
to copy binary data as well.
"""
shutil.copyfileobj(source, outputfile)
def guess_type(self, path):
"""Guess the type of a file.
Argument is a PATH (a filename).
Return value is a string of the form type/subtype,
usable for a MIME Content-type header.
The default implementation looks the file's extension
up in the table self.extensions_map, using application/octet-stream
as a default; however it would be permissible (if
slow) to look inside the data to make a better guess.
"""
base, ext = posixpath.splitext(path)
if ext in self.extensions_map:
return self.extensions_map[ext]
ext = ext.lower()
if ext in self.extensions_map:
return self.extensions_map[ext]
else:
return self.extensions_map['']
if not mimetypes.inited:
mimetypes.init() # try to read system mime.types
extensions_map = mimetypes.types_map.copy()
extensions_map.update({
'': 'application/octet-stream', # Default
'.py': 'text/plain',
'.c': 'text/plain',
'.h': 'text/plain',
})
def test(HandlerClass = SimpleHTTPRequestHandler,
ServerClass = BaseHTTPServer.HTTPServer):
BaseHTTPServer.test(HandlerClass, ServerClass)
if __name__ == '__main__':
test()
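# --- Illustrative sketch (editor's addition) ---
# Serving the current directory programmatically instead of via test();
# port 8000 is an arbitrary choice, and serve_forever() blocks.
httpd = BaseHTTPServer.HTTPServer(('', 8000), SimpleHTTPRequestHandler)
print 'Serving HTTP on port 8000 (Ctrl-C to stop)'
httpd.serve_forever()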
r"""Simple XML-RPC Server.
This module can be used to create simple XML-RPC servers
by creating a server and either installing functions, a
class instance, or by extending the SimpleXMLRPCServer
class.
It can also be used to handle XML-RPC requests in a CGI
environment using CGIXMLRPCRequestHandler.
A list of possible usage patterns follows:
1. Install functions:
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(pow)
server.register_function(lambda x,y: x+y, 'add')
server.serve_forever()
2. Install an instance:
class MyFuncs:
def __init__(self):
# make all of the string functions available through
# string.func_name
import string
self.string = string
def _listMethods(self):
# implement this method so that system.listMethods
# knows to advertise the strings methods
return list_public_methods(self) + \
['string.' + method for method in list_public_methods(self.string)]
def pow(self, x, y): return pow(x, y)
def add(self, x, y) : return x + y
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()
server.register_instance(MyFuncs())
server.serve_forever()
3. Install an instance with custom dispatch method:
class Math:
def _listMethods(self):
# this method must be present for system.listMethods
# to work
return ['add', 'pow']
def _methodHelp(self, method):
# this method must be present for system.methodHelp
# to work
if method == 'add':
return "add(2,3) => 5"
elif method == 'pow':
return "pow(x, y[, z]) => number"
else:
# By convention, return empty
# string if no help is available
return ""
def _dispatch(self, method, params):
if method == 'pow':
return pow(*params)
elif method == 'add':
return params[0] + params[1]
else:
raise Exception('bad method')
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()
server.register_instance(Math())
server.serve_forever()
4. Subclass SimpleXMLRPCServer:
class MathServer(SimpleXMLRPCServer):
def _dispatch(self, method, params):
try:
# We are forcing the 'export_' prefix on methods that are
# callable through XML-RPC to prevent potential security
# problems
func = getattr(self, 'export_' + method)
except AttributeError:
raise Exception('method "%s" is not supported' % method)
else:
return func(*params)
def export_add(self, x, y):
return x + y
server = MathServer(("localhost", 8000))
server.serve_forever()
5. CGI script:
server = CGIXMLRPCRequestHandler()
server.register_function(pow)
server.handle_request()
"""
# Written by Brian Quinlan ([email protected]).
# Based on code written by Fredrik Lundh.
import xmlrpclib
from xmlrpclib import Fault
import SocketServer
import BaseHTTPServer
import sys
import os
import traceback
import re
try:
import fcntl
except ImportError:
fcntl = None
def resolve_dotted_attribute(obj, attr, allow_dotted_names=True):
"""resolve_dotted_attribute(a, 'b.c.d') => a.b.c.d
Resolves a dotted attribute name to an object. Raises
an AttributeError if any attribute in the chain starts with a '_'.
If the optional allow_dotted_names argument is false, dots are not
supported and this function operates similarly to getattr(obj, attr).
"""
if allow_dotted_names:
attrs = attr.split('.')
else:
attrs = [attr]
for i in attrs:
if i.startswith('_'):
raise AttributeError(
'attempt to access private attribute "%s"' % i
)
else:
obj = getattr(obj,i)
return obj
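# --- Illustrative sketch (editor's addition) ---
# Dotted resolution as used by register_instance(..., allow_dotted_names=True);
# the _Outer/_Inner classes are made up for the demo.
class _Inner(object):
    def hello(self):
        return 'hi'
class _Outer(object):
    def __init__(self):
        self.inner = _Inner()
print resolve_dotted_attribute(_Outer(), 'inner.hello')()   # hi
try:
    resolve_dotted_attribute(_Outer(), 'inner._secret')
except AttributeError, e:
    print e                  # attempt to access private attribute "_secret"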
def list_public_methods(obj):
"""Returns a list of attribute strings, found in the specified
object, which represent callable attributes"""
return [member for member in dir(obj)
if not member.startswith('_') and
hasattr(getattr(obj, member), '__call__')]
def remove_duplicates(lst):
"""remove_duplicates([2,2,2,1,3,3]) => [3,1,2]
Returns a copy of a list without duplicates. Every list
item must be hashable and the order of the items in the
resulting list is not defined.
"""
u = {}
for x in lst:
u[x] = 1
return u.keys()
class SimpleXMLRPCDispatcher:
"""Mix-in class that dispatches XML-RPC requests.
This class is used to register XML-RPC method handlers
and then to dispatch them. This class doesn't need to be
instantiated directly when used by SimpleXMLRPCServer but it
can be instantiated when used by the MultiPathXMLRPCServer.
"""
def __init__(self, allow_none=False, encoding=None):
self.funcs = {}
self.instance = None
self.allow_none = allow_none
self.encoding = encoding
def register_instance(self, instance, allow_dotted_names=False):
"""Registers an instance to respond to XML-RPC requests.
Only one instance can be installed at a time.
If the registered instance has a _dispatch method then that
method will be called with the name of the XML-RPC method and
its parameters as a tuple
e.g. instance._dispatch('add',(2,3))
If the registered instance does not have a _dispatch method
then the instance will be searched to find a matching method
and, if found, will be called. Methods beginning with an '_'
are considered private and will not be called by
SimpleXMLRPCServer.
If a registered function matches an XML-RPC request, then it
will be called instead of the registered instance.
If the optional allow_dotted_names argument is true and the
instance does not have a _dispatch method, method names
containing dots are supported and resolved, as long as none of
the name segments start with an '_'.
*** SECURITY WARNING: ***
Enabling the allow_dotted_names option allows intruders
to access your module's global variables and may allow
intruders to execute arbitrary code on your machine. Only
use this option on a secure, closed network.
"""
self.instance = instance
self.allow_dotted_names = allow_dotted_names
def register_function(self, function, name = None):
"""Registers a function to respond to XML-RPC requests.
The optional name argument can be used to set a Unicode name
for the function.
"""
if name is None:
name = function.__name__
self.funcs[name] = function
def register_introspection_functions(self):
"""Registers the XML-RPC introspection methods in the system
namespace.
see http://xmlrpc.usefulinc.com/doc/reserved.html
"""
self.funcs.update({'system.listMethods' : self.system_listMethods,
'system.methodSignature' : self.system_methodSignature,
'system.methodHelp' : self.system_methodHelp})
def register_multicall_functions(self):
"""Registers the XML-RPC multicall method in the system
namespace.
see http://www.xmlrpc.com/discuss/msgReader$1208"""
self.funcs.update({'system.multicall' : self.system_multicall})
def _marshaled_dispatch(self, data, dispatch_method = None, path = None):
"""Dispatches an XML-RPC method from marshalled (XML) data.
XML-RPC methods are dispatched from the marshalled (XML) data
using the _dispatch method and the result is returned as
marshalled data. For backwards compatibility, a dispatch
function can be provided as an argument (see comment in
SimpleXMLRPCRequestHandler.do_POST) but overriding the
existing method through subclassing is the preferred means
of changing method dispatch behavior.
"""
try:
params, method = xmlrpclib.loads(data)
# generate response
if dispatch_method is not None:
response = dispatch_method(method, params)
else:
response = self._dispatch(method, params)
# wrap response in a singleton tuple
response = (response,)
response = xmlrpclib.dumps(response, methodresponse=1,
allow_none=self.allow_none, encoding=self.encoding)
except Fault, fault:
response = xmlrpclib.dumps(fault, allow_none=self.allow_none,
encoding=self.encoding)
except:
# report exception back to server
exc_type, exc_value, exc_tb = sys.exc_info()
response = xmlrpclib.dumps(
xmlrpclib.Fault(1, "%s:%s" % (exc_type, exc_value)),
encoding=self.encoding, allow_none=self.allow_none,
)
return response
def system_listMethods(self):
"""system.listMethods() => ['add', 'subtract', 'multiple']
Returns a list of the methods supported by the server."""
methods = self.funcs.keys()
if self.instance is not None:
# Instance can implement _listMethods to return a list of
# methods
if hasattr(self.instance, '_listMethods'):
methods = remove_duplicates(
methods + self.instance._listMethods()
)
# if the instance has a _dispatch method then we
# don't have enough information to provide a list
# of methods
elif not hasattr(self.instance, '_dispatch'):
methods = remove_duplicates(
methods + list_public_methods(self.instance)
)
methods.sort()
return methods
def system_methodSignature(self, method_name):
"""system.methodSignature('add') => [double, int, int]
Returns a list describing the signature of the method. In the
above example, the add method takes two integers as arguments
and returns a double result.
This server does NOT support system.methodSignature."""
# See http://xmlrpc.usefulinc.com/doc/sysmethodsig.html
return 'signatures not supported'
def system_methodHelp(self, method_name):
"""system.methodHelp('add') => "Adds two integers together"
Returns a string containing documentation for the specified method."""
method = None
if method_name in self.funcs:
method = self.funcs[method_name]
elif self.instance is not None:
# Instance can implement _methodHelp to return help for a method
if hasattr(self.instance, '_methodHelp'):
return self.instance._methodHelp(method_name)
# if the instance has a _dispatch method then we
# don't have enough information to provide help
elif not hasattr(self.instance, '_dispatch'):
try:
method = resolve_dotted_attribute(
self.instance,
method_name,
self.allow_dotted_names
)
except AttributeError:
pass
# Note that we aren't checking that the method actually
# be a callable object of some kind
if method is None:
return ""
else:
import pydoc
return pydoc.getdoc(method)
def system_multicall(self, call_list):
"""system.multicall([{'methodName': 'add', 'params': [2, 2]}, ...]) => \
[[4], ...]
Allows the caller to package multiple XML-RPC calls into a single
request.
See http://www.xmlrpc.com/discuss/msgReader$1208
"""
results = []
for call in call_list:
method_name = call['methodName']
params = call['params']
try:
# XXX A marshalling error in any response will fail the entire
# multicall. If someone cares they should fix this.
results.append([self._dispatch(method_name, params)])
except Fault, fault:
results.append(
{'faultCode' : fault.faultCode,
'faultString' : fault.faultString}
)
except:
exc_type, exc_value, exc_tb = sys.exc_info()
results.append(
{'faultCode' : 1,
'faultString' : "%s:%s" % (exc_type, exc_value)}
)
return results
def _dispatch(self, method, params):
"""Dispatches the XML-RPC method.
XML-RPC calls are forwarded to a registered function that
matches the called XML-RPC method name. If no such function
exists then the call is forwarded to the registered instance,
if available.
If the registered instance has a _dispatch method then that
method will be called with the name of the XML-RPC method and
its parameters as a tuple
e.g. instance._dispatch('add',(2,3))
If the registered instance does not have a _dispatch method
then the instance will be searched to find a matching method
and, if found, will be called.
Methods beginning with an '_' are considered private and will
not be called.
"""
func = None
try:
# check to see if a matching function has been registered
func = self.funcs[method]
except KeyError:
if self.instance is not None:
# check for a _dispatch method
if hasattr(self.instance, '_dispatch'):
return self.instance._dispatch(method, params)
else:
# call instance method directly
try:
func = resolve_dotted_attribute(
self.instance,
method,
self.allow_dotted_names
)
except AttributeError:
pass
if func is not None:
return func(*params)
else:
raise Exception('method "%s" is not supported' % method)
class SimpleXMLRPCRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
"""Simple XML-RPC request handler class.
Handles all HTTP POST requests and attempts to decode them as
XML-RPC requests.
"""
# Class attribute listing the accessible path components;
# paths not on this list will result in a 404 error.
rpc_paths = ('/', '/RPC2')
# If not None, encode responses larger than this, if possible
encode_threshold = 1400  # a common MTU
# Override from StreamRequestHandler: full buffering of output
# and no Nagle.
wbufsize = -1
disable_nagle_algorithm = True
# a re to match a gzip Accept-Encoding
aepattern = re.compile(r"""
\s* ([^\s;]+) \s* #content-coding
(;\s* q \s*=\s* ([0-9\.]+))? #q
""", re.VERBOSE | re.IGNORECASE)
def accept_encodings(self):
r = {}
ae = self.headers.get("Accept-Encoding", "")
for e in ae.split(","):
match = self.aepattern.match(e)
if match:
v = match.group(3)
v = float(v) if v else 1.0
r[match.group(1)] = v
return r
def is_rpc_path_valid(self):
if self.rpc_paths:
return self.path in self.rpc_paths
else:
# If .rpc_paths is empty, just assume all paths are legal
return True
def do_POST(self):
"""Handles the HTTP POST request.
Attempts to interpret all HTTP POST requests as XML-RPC calls,
which are forwarded to the server's _dispatch method for handling.
"""
# Check that the path is legal
if not self.is_rpc_path_valid():
self.report_404()
return
try:
# Get arguments by reading body of request.
# We read this in chunks to avoid straining
# socket.read(); around the 10 or 15Mb mark, some platforms
# begin to have problems (bug #792570).
max_chunk_size = 10*1024*1024
size_remaining = int(self.headers["content-length"])
L = []
while size_remaining:
chunk_size = min(size_remaining, max_chunk_size)
chunk = self.rfile.read(chunk_size)
if not chunk:
break
L.append(chunk)
size_remaining -= len(L[-1])
data = ''.join(L)
data = self.decode_request_content(data)
if data is None:
return #response has been sent
# In previous versions of SimpleXMLRPCServer, _dispatch
# could be overridden in this class, instead of in
# SimpleXMLRPCDispatcher. To maintain backwards compatibility,
# check to see if a subclass implements _dispatch and dispatch
# using that method if present.
response = self.server._marshaled_dispatch(
data, getattr(self, '_dispatch', None), self.path
)
except Exception, e: # This should only happen if the module is buggy
# internal error, report as HTTP server error
self.send_response(500)
# Send information about the exception if requested
if hasattr(self.server, '_send_traceback_header') and \
self.server._send_traceback_header:
self.send_header("X-exception", str(e))
self.send_header("X-traceback", traceback.format_exc())
self.send_header("Content-length", "0")
self.end_headers()
else:
# got a valid XML RPC response
self.send_response(200)
self.send_header("Content-type", "text/xml")
if self.encode_threshold is not None:
if len(response) > self.encode_threshold:
q = self.accept_encodings().get("gzip", 0)
if q:
try:
response = xmlrpclib.gzip_encode(response)
self.send_header("Content-Encoding", "gzip")
except NotImplementedError:
pass
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
def decode_request_content(self, data):
#support gzip encoding of request
encoding = self.headers.get("content-encoding", "identity").lower()
if encoding == "identity":
return data
if encoding == "gzip":
try:
return xmlrpclib.gzip_decode(data)
except NotImplementedError:
self.send_response(501, "encoding %r not supported" % encoding)
except ValueError:
self.send_response(400, "error decoding gzip content")
else:
self.send_response(501, "encoding %r not supported" % encoding)
self.send_header("Content-length", "0")
self.end_headers()
def report_404 (self):
# Report a 404 error
self.send_response(404)
response = 'No such page'
self.send_header("Content-type", "text/plain")
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
def log_request(self, code='-', size='-'):
"""Selectively log an accepted request."""
if self.server.logRequests:
BaseHTTPServer.BaseHTTPRequestHandler.log_request(self, code, size)
class SimpleXMLRPCServer(SocketServer.TCPServer,
SimpleXMLRPCDispatcher):
"""Simple XML-RPC server.
Simple XML-RPC server that allows functions and a single instance
to be installed to handle requests. The default implementation
attempts to dispatch XML-RPC calls to the functions or instance
installed in the server. Override the _dispatch method inherited
from SimpleXMLRPCDispatcher to change this behavior.
"""
allow_reuse_address = True
# Warning: this is for debugging purposes only! Never set this to True in
# production code, as it will send out sensitive information (exception
# and stack trace details) when exceptions are raised inside
# SimpleXMLRPCRequestHandler.do_POST
_send_traceback_header = False
def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
logRequests=True, allow_none=False, encoding=None, bind_and_activate=True):
self.logRequests = logRequests
SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding)
SocketServer.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)
# [Bug #1222790] If possible, set close-on-exec flag; if a
# method spawns a subprocess, the subprocess shouldn't have
# the listening socket open.
if fcntl is not None and hasattr(fcntl, 'FD_CLOEXEC'):
flags = fcntl.fcntl(self.fileno(), fcntl.F_GETFD)
flags |= fcntl.FD_CLOEXEC
fcntl.fcntl(self.fileno(), fcntl.F_SETFD, flags)
class MultiPathXMLRPCServer(SimpleXMLRPCServer):
"""Multipath XML-RPC Server
This specialization of SimpleXMLRPCServer allows the user to create
multiple Dispatcher instances and assign them to different
HTTP request paths. This makes it possible to run two or more
'virtual XML-RPC servers' at the same port.
Make sure that the requestHandler accepts the paths in question.
"""
def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
logRequests=True, allow_none=False, encoding=None, bind_and_activate=True):
SimpleXMLRPCServer.__init__(self, addr, requestHandler, logRequests, allow_none,
encoding, bind_and_activate)
self.dispatchers = {}
self.allow_none = allow_none
self.encoding = encoding
def add_dispatcher(self, path, dispatcher):
self.dispatchers[path] = dispatcher
return dispatcher
def get_dispatcher(self, path):
return self.dispatchers[path]
def _marshaled_dispatch(self, data, dispatch_method = None, path = None):
try:
response = self.dispatchers[path]._marshaled_dispatch(
data, dispatch_method, path)
except:
# report low level exception back to server
# (each dispatcher should have handled their own
# exceptions)
exc_type, exc_value = sys.exc_info()[:2]
response = xmlrpclib.dumps(
xmlrpclib.Fault(1, "%s:%s" % (exc_type, exc_value)),
encoding=self.encoding, allow_none=self.allow_none)
return response
class CGIXMLRPCRequestHandler(SimpleXMLRPCDispatcher):
"""Simple handler for XML-RPC data passed through CGI."""
def __init__(self, allow_none=False, encoding=None):
SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding)
def handle_xmlrpc(self, request_text):
"""Handle a single XML-RPC request"""
response = self._marshaled_dispatch(request_text)
print 'Content-Type: text/xml'
print 'Content-Length: %d' % len(response)
print
sys.stdout.write(response)
def handle_get(self):
"""Handle a single HTTP GET request.
Default implementation indicates an error because
XML-RPC uses the POST method.
"""
code = 400
message, explain = \
BaseHTTPServer.BaseHTTPRequestHandler.responses[code]
response = BaseHTTPServer.DEFAULT_ERROR_MESSAGE % \
{
'code' : code,
'message' : message,
'explain' : explain
}
print 'Status: %d %s' % (code, message)
print 'Content-Type: %s' % BaseHTTPServer.DEFAULT_ERROR_CONTENT_TYPE
print 'Content-Length: %d' % len(response)
print
sys.stdout.write(response)
def handle_request(self, request_text = None):
"""Handle a single XML-RPC request passed through a CGI post method.
If no XML data is given then it is read from stdin. The resulting
XML-RPC response is printed to stdout along with the correct HTTP
headers.
"""
if request_text is None and \
os.environ.get('REQUEST_METHOD', None) == 'GET':
self.handle_get()
else:
# POST data is normally available through stdin
try:
length = int(os.environ.get('CONTENT_LENGTH', None))
except (TypeError, ValueError):
length = -1
if request_text is None:
request_text = sys.stdin.read(length)
self.handle_xmlrpc(request_text)
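# A minimal CGI deployment sketch for the handler above (the script name
# and the registered function are illustrative; the web server must be
# configured to execute the file as a CGI script):
#
#     #!/usr/bin/env python
#     from SimpleXMLRPCServer import CGIXMLRPCRequestHandler
#
#     handler = CGIXMLRPCRequestHandler()
#     handler.register_function(pow)
#     handler.register_introspection_functions()
#     handler.handle_request()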
if __name__ == '__main__':
print 'Running XML-RPC server on port 8000'
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(pow)
server.register_function(lambda x,y: x+y, 'add')
server.register_multicall_functions()
server.serve_forever()
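# A client-side sketch for the demo server above (run from a second
# interpreter while the server is listening on port 8000):
#
#     import xmlrpclib
#     proxy = xmlrpclib.ServerProxy('http://localhost:8000/')
#     print proxy.pow(2, 10)            # 1024
#     print proxy.add(3, 4)             # 7
#     multi = xmlrpclib.MultiCall(proxy)
#     multi.pow(2, 3)
#     multi.add(1, 2)
#     print list(multi())               # [8, 3]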
This directory exists so that 3rd party packages can be installed
here. Read the source for site.py for more details.
"""Append module search paths for third-party packages to sys.path.
****************************************************************
* This module is automatically imported during initialization. *
****************************************************************
In earlier versions of Python (up to 1.5a3), scripts or modules that
needed to use site-specific modules would place ``import site''
somewhere near the top of their code. Because of the automatic
import, this is no longer necessary (but code that does it still
works).
This will append site-specific paths to the module search path. On
Unix (including Mac OS X), it starts with sys.prefix and
sys.exec_prefix (if different) and appends
lib/python<version>/site-packages as well as lib/site-python.
On other platforms (such as Windows), it tries each of the
prefixes directly, as well as with lib/site-packages appended. The
resulting directories, if they exist, are appended to sys.path, and
also inspected for path configuration files.
A path configuration file is a file whose name has the form
<package>.pth; its contents are additional directories (one per line)
to be added to sys.path. Non-existing directories (or
non-directories) are never added to sys.path; no directory is added to
sys.path more than once. Blank lines and lines beginning with
'#' are skipped. Lines starting with 'import' are executed.
For example, suppose sys.prefix and sys.exec_prefix are set to
/usr/local and there is a directory /usr/local/lib/python2.5/site-packages
with three subdirectories, foo, bar and spam, and two path
configuration files, foo.pth and bar.pth. Assume foo.pth contains the
following:
# foo package configuration
foo
bar
bletch
and bar.pth contains:
# bar package configuration
bar
Then the following directories are added to sys.path, in this order:
/usr/local/lib/python2.5/site-packages/bar
/usr/local/lib/python2.5/site-packages/foo
Note that bletch is omitted because it doesn't exist; bar precedes foo
because bar.pth comes alphabetically before foo.pth; and spam is
omitted because it is not mentioned in either path configuration file.
After these path manipulations, an attempt is made to import a module
named sitecustomize, which can perform arbitrary additional
site-specific customizations. If this import fails with an
ImportError exception, it is silently ignored.
"""
import sys
import os
import __builtin__
import traceback
# Prefixes for site-packages; add additional prefixes like /usr/local here
PREFIXES = [sys.prefix, sys.exec_prefix]
# Enable per user site-packages directory
# set it to False to disable the feature or True to force the feature
ENABLE_USER_SITE = None
# for distutils.commands.install
# These values are initialized by the getuserbase() and getusersitepackages()
# functions, through the main() function when Python starts.
USER_SITE = None
USER_BASE = None
def makepath(*paths):
dir = os.path.join(*paths)
try:
dir = os.path.abspath(dir)
except OSError:
pass
return dir, os.path.normcase(dir)
def abs__file__():
"""Set all module' __file__ attribute to an absolute path"""
for m in sys.modules.values():
if hasattr(m, '__loader__'):
continue # don't mess with a PEP 302-supplied __file__
try:
m.__file__ = os.path.abspath(m.__file__)
except (AttributeError, OSError):
pass
def removeduppaths():
""" Remove duplicate entries from sys.path along with making them
absolute"""
# This ensures that the initial path provided by the interpreter contains
# only absolute pathnames, even if we're running from the build directory.
L = []
known_paths = set()
for dir in sys.path:
# Filter out duplicate paths (on case-insensitive file systems also
# if they only differ in case); turn relative paths into absolute
# paths.
dir, dircase = makepath(dir)
if not dircase in known_paths:
L.append(dir)
known_paths.add(dircase)
sys.path[:] = L
return known_paths
def _init_pathinfo():
"""Return a set containing all existing directory entries from sys.path"""
d = set()
for dir in sys.path:
try:
if os.path.isdir(dir):
dir, dircase = makepath(dir)
d.add(dircase)
except TypeError:
continue
return d
def addpackage(sitedir, name, known_paths):
"""Process a .pth file within the site-packages directory:
For each line in the file, either combine it with sitedir to a path
and add that to known_paths, or execute it if it starts with 'import '.
"""
if known_paths is None:
_init_pathinfo()
reset = 1
else:
reset = 0
fullname = os.path.join(sitedir, name)
try:
f = open(fullname, "rU")
except IOError:
return
with f:
for n, line in enumerate(f):
if line.startswith("#"):
continue
try:
if line.startswith(("import ", "import\t")):
exec line
continue
line = line.rstrip()
dir, dircase = makepath(sitedir, line)
if not dircase in known_paths and os.path.exists(dir):
sys.path.append(dir)
known_paths.add(dircase)
except Exception as err:
print >>sys.stderr, "Error processing line {:d} of {}:\n".format(
n+1, fullname)
for record in traceback.format_exception(*sys.exc_info()):
for line in record.splitlines():
print >>sys.stderr, ' '+line
print >>sys.stderr, "\nRemainder of file ignored"
break
if reset:
known_paths = None
return known_paths
def addsitedir(sitedir, known_paths=None):
"""Add 'sitedir' argument to sys.path if missing and handle .pth files in
'sitedir'"""
if known_paths is None:
known_paths = _init_pathinfo()
reset = 1
else:
reset = 0
sitedir, sitedircase = makepath(sitedir)
if not sitedircase in known_paths:
sys.path.append(sitedir) # Add path component
try:
names = os.listdir(sitedir)
except os.error:
return
dotpth = os.extsep + "pth"
names = [name for name in names if name.endswith(dotpth)]
for name in sorted(names):
addpackage(sitedir, name, known_paths)
if reset:
known_paths = None
return known_paths
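# A usage sketch for addsitedir (the directory and module names are
# hypothetical; any *.pth files found inside the directory are processed
# by addpackage above):
#
#     import site
#     site.addsitedir('/opt/myapp/lib')
#     import mymodule          # now importable if it lives in that dir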
def check_enableusersite():
"""Check if user site directory is safe for inclusion
The function checks the command line flag (and the corresponding
environment variable) and that the process uid/gid equal the
effective uid/gid.
None: Disabled for security reasons
False: Disabled by user (command line option)
True: Safe and enabled
"""
if sys.flags.no_user_site:
return False
if hasattr(os, "getuid") and hasattr(os, "geteuid"):
# check process uid == effective uid
if os.geteuid() != os.getuid():
return None
if hasattr(os, "getgid") and hasattr(os, "getegid"):
# check process gid == effective gid
if os.getegid() != os.getgid():
return None
return True
def getuserbase():
"""Returns the `user base` directory path.
The `user base` directory can be used to store data. If the global
variable ``USER_BASE`` is not initialized yet, this function will also set
it.
"""
global USER_BASE
if USER_BASE is not None:
return USER_BASE
from sysconfig import get_config_var
USER_BASE = get_config_var('userbase')
return USER_BASE
def getusersitepackages():
"""Returns the user-specific site-packages directory path.
If the global variable ``USER_SITE`` is not initialized yet, this
function will also set it.
"""
global USER_SITE
user_base = getuserbase() # this will also set USER_BASE
if USER_SITE is not None:
return USER_SITE
from sysconfig import get_path
import os
if sys.platform == 'darwin':
from sysconfig import get_config_var
if get_config_var('PYTHONFRAMEWORK'):
USER_SITE = get_path('purelib', 'osx_framework_user')
return USER_SITE
USER_SITE = get_path('purelib', '%s_user' % os.name)
return USER_SITE
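# A quick sketch for inspecting the per-user locations (values differ
# by platform and user):
#
#     import site
#     print site.getuserbase()          # e.g. ~/.local on Unix
#     print site.getusersitepackages()  # e.g. ~/.local/lib/python2.7/site-packages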
def addusersitepackages(known_paths):
"""Add a per user site-package to sys.path
Each user has their own python directory with site-packages in the
home directory.
"""
# get the per user site-package path
# this call will also make sure USER_BASE and USER_SITE are set
user_site = getusersitepackages()
if ENABLE_USER_SITE and os.path.isdir(user_site):
addsitedir(user_site, known_paths)
return known_paths
def getsitepackages():
"""Returns a list containing all global site-packages directories
(and possibly site-python).
For each directory present in the global ``PREFIXES``, this function
will find its `site-packages` subdirectory depending on the system
environment, and will return a list of full paths.
"""
sitepackages = []
seen = set()
for prefix in PREFIXES:
if not prefix or prefix in seen:
continue
seen.add(prefix)
if sys.platform in ('os2emx', 'riscos', 'cli'):
sitepackages.append(os.path.join(prefix, "Lib", "site-packages"))
elif os.sep == '/':
sitepackages.append(os.path.join(prefix, "lib",
"python" + sys.version[:3],
"site-packages"))
sitepackages.append(os.path.join(prefix, "lib", "site-python"))
else:
sitepackages.append(prefix)
sitepackages.append(os.path.join(prefix, "lib", "site-packages"))
return sitepackages
def addsitepackages(known_paths):
"""Add site-packages (and possibly site-python) to sys.path"""
for sitedir in getsitepackages():
if os.path.isdir(sitedir):
addsitedir(sitedir, known_paths)
return known_paths
def setBEGINLIBPATH():
"""The OS/2 EMX port has optional extension modules that do double duty
as DLLs (and must use the .DLL file extension) for other extensions.
The library search path needs to be amended so these will be found
during module import. Use BEGINLIBPATH so that these are at the start
of the library search path.
"""
dllpath = os.path.join(sys.prefix, "Lib", "lib-dynload")
libpath = os.environ['BEGINLIBPATH'].split(';')
if libpath[-1]:
libpath.append(dllpath)
else:
libpath[-1] = dllpath
os.environ['BEGINLIBPATH'] = ';'.join(libpath)
def setquit():
"""Define new builtins 'quit' and 'exit'.
These are objects which make the interpreter exit when called.
The repr of each object contains a hint at how it works.
"""
if os.sep == ':':
eof = 'Cmd-Q'
elif os.sep == '\\':
eof = 'Ctrl-Z plus Return'
else:
eof = 'Ctrl-D (i.e. EOF)'
class Quitter(object):
def __init__(self, name):
self.name = name
def __repr__(self):
return 'Use %s() or %s to exit' % (self.name, eof)
def __call__(self, code=None):
# Shells like IDLE catch the SystemExit, but listen when their
# stdin wrapper is closed.
try:
sys.stdin.close()
except:
pass
raise SystemExit(code)
__builtin__.quit = Quitter('quit')
__builtin__.exit = Quitter('exit')
class _Printer(object):
"""interactive prompt objects for printing the license text, a list of
contributors and the copyright notice."""
MAXLINES = 23
def __init__(self, name, data, files=(), dirs=()):
self.__name = name
self.__data = data
self.__files = files
self.__dirs = dirs
self.__lines = None
def __setup(self):
if self.__lines:
return
data = None
for dir in self.__dirs:
for filename in self.__files:
filename = os.path.join(dir, filename)
try:
fp = file(filename, "rU")
data = fp.read()
fp.close()
break
except IOError:
pass
if data:
break
if not data:
data = self.__data
self.__lines = data.split('\n')
self.__linecnt = len(self.__lines)
def __repr__(self):
self.__setup()
if len(self.__lines) <= self.MAXLINES:
return "\n".join(self.__lines)
else:
return "Type %s() to see the full %s text" % ((self.__name,)*2)
def __call__(self):
self.__setup()
prompt = 'Hit Return for more, or q (and Return) to quit: '
lineno = 0
while 1:
try:
for i in range(lineno, lineno + self.MAXLINES):
print self.__lines[i]
except IndexError:
break
else:
lineno += self.MAXLINES
key = None
while key is None:
key = raw_input(prompt)
if key not in ('', 'q'):
key = None
if key == 'q':
break
def setcopyright():
"""Set 'copyright' and 'credits' in __builtin__"""
__builtin__.copyright = _Printer("copyright", sys.copyright)
if sys.platform[:4] == 'java':
__builtin__.credits = _Printer(
"credits",
"Jython is maintained by the Jython developers (www.jython.org).")
elif sys.platform == 'cli':
__builtin__.credits = _Printer(
"credits",
"IronPython is maintained by the IronPython developers (www.ironpython.net).")
else:
__builtin__.credits = _Printer("credits", """\
Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information.""")
here = os.path.dirname(os.__file__)
__builtin__.license = _Printer(
"license", "See https://www.python.org/psf/license/",
["LICENSE.txt", "LICENSE"],
[os.path.join(here, os.pardir), here, os.curdir])
class _Helper(object):
"""Define the builtin 'help'.
This is a wrapper around pydoc.help (with a twist).
"""
def __repr__(self):
return "Type help() for interactive help, " \
"or help(object) for help about object."
def __call__(self, *args, **kwds):
import pydoc
return pydoc.help(*args, **kwds)
def sethelper():
__builtin__.help = _Helper()
def aliasmbcs():
"""On Windows, some default encodings are not provided by Python,
while they are always available as "mbcs" in each locale. Make
them usable by aliasing to "mbcs" in such a case."""
if sys.platform == 'win32':
import locale, codecs
enc = locale.getdefaultlocale()[1]
if enc.startswith('cp'): # "cp***" ?
try:
codecs.lookup(enc)
except LookupError:
import encodings
encodings._cache[enc] = encodings._unknown
encodings.aliases.aliases[enc] = 'mbcs'
def setencoding():
"""Set the string encoding used by the Unicode implementation. The
default is 'ascii', but if you're willing to experiment, you can
change this."""
encoding = "ascii" # Default value set by _PyUnicode_Init()
if 0:
# Enable to support locale aware default string encodings.
import locale
loc = locale.getdefaultlocale()
if loc[1]:
encoding = loc[1]
if 0:
# Enable to switch off string to Unicode coercion and implicit
# Unicode to string conversion.
encoding = "undefined"
if encoding != "ascii":
# On Non-Unicode builds this will raise an AttributeError...
sys.setdefaultencoding(encoding) # Needs Python Unicode build !
def execsitecustomize():
"""Run custom site specific code, if available."""
try:
import sitecustomize
except ImportError:
pass
except Exception:
if sys.flags.verbose:
sys.excepthook(*sys.exc_info())
else:
print >>sys.stderr, \
"'import sitecustomize' failed; use -v for traceback"
def execusercustomize():
"""Run custom user specific code, if available."""
try:
import usercustomize
except ImportError:
pass
except Exception:
if sys.flags.verbose:
sys.excepthook(*sys.exc_info())
else:
print>>sys.stderr, \
"'import usercustomize' failed; use -v for traceback"
def main():
global ENABLE_USER_SITE
abs__file__()
known_paths = removeduppaths()
if ENABLE_USER_SITE is None:
ENABLE_USER_SITE = check_enableusersite()
known_paths = addusersitepackages(known_paths)
known_paths = addsitepackages(known_paths)
if sys.platform == 'os2emx':
setBEGINLIBPATH()
setquit()
setcopyright()
sethelper()
aliasmbcs()
setencoding()
execsitecustomize()
if ENABLE_USER_SITE:
execusercustomize()
# Remove sys.setdefaultencoding() so that users cannot change the
# encoding after initialization. The test for presence is needed when
# this module is run as a script, because this code is executed twice.
if hasattr(sys, "setdefaultencoding"):
del sys.setdefaultencoding
main()
def _script():
help = """\
%s [--user-base] [--user-site]
Without arguments print some useful information
With arguments print the value of USER_BASE and/or USER_SITE separated
by '%s'.
Exit codes with --user-base or --user-site:
0 - user site directory is enabled
1 - user site directory is disabled by user
2 - user site directory is disabled by super user
or for security reasons
>2 - unknown error
"""
args = sys.argv[1:]
if not args:
print "sys.path = ["
for dir in sys.path:
print " %r," % (dir,)
print "]"
print "USER_BASE: %r (%s)" % (USER_BASE,
"exists" if os.path.isdir(USER_BASE) else "doesn't exist")
print "USER_SITE: %r (%s)" % (USER_SITE,
"exists" if os.path.isdir(USER_SITE) else "doesn't exist")
print "ENABLE_USER_SITE: %r" % ENABLE_USER_SITE
sys.exit(0)
buffer = []
if '--user-base' in args:
buffer.append(USER_BASE)
if '--user-site' in args:
buffer.append(USER_SITE)
if buffer:
print os.pathsep.join(buffer)
if ENABLE_USER_SITE:
sys.exit(0)
elif ENABLE_USER_SITE is False:
sys.exit(1)
elif ENABLE_USER_SITE is None:
sys.exit(2)
else:
sys.exit(3)
else:
import textwrap
print textwrap.dedent(help % (sys.argv[0], os.pathsep))
sys.exit(10)
if __name__ == '__main__':
_script()
#! /usr/bin/env python
"""An RFC 2821 smtp proxy.
Usage: %(program)s [options] [localhost:localport [remotehost:remoteport]]
Options:
--nosetuid
-n
This program generally tries to setuid `nobody', unless this flag is
set. The setuid call will fail if this program is not run as root (in
which case, use this flag).
--version
-V
Print the version number and exit.
--class classname
-c classname
Use `classname' as the concrete SMTP proxy class. Uses `PureProxy' by
default.
--debug
-d
Turn on debugging prints.
--help
-h
Print this message and exit.
Version: %(__version__)s
If localhost is not given then `localhost' is used, and if localport is not
given then 8025 is used. If remotehost is not given then `localhost' is used,
and if remoteport is not given, then 25 is used.
"""
# Overview:
#
# This file implements the minimal SMTP protocol as defined in RFC 821. It
# has a hierarchy of classes which implement the backend functionality for the
# smtpd. A number of classes are provided:
#
# SMTPServer - the base class for the backend. Raises NotImplementedError
# if you try to use it.
#
# DebuggingServer - simply prints each message it receives on stdout.
#
# PureProxy - Proxies all messages to a real smtpd which does final
# delivery. One known problem with this class is that it doesn't handle
# SMTP errors from the backend server at all. This should be fixed
# (contributions are welcome!).
#
# MailmanProxy - An experimental hack to work with GNU Mailman
# <www.list.org>. Using this server as your real incoming smtpd, your
# mailhost will automatically recognize and accept mail destined to Mailman
# lists when those lists are created. Every message not destined for a list
# gets forwarded to a real backend smtpd, as with PureProxy. Again, errors
# are not handled correctly yet.
#
# Please note that this script requires Python 2.0
#
# Author: Barry Warsaw <[email protected]>
#
# TODO:
#
# - support mailbox delivery
# - alias files
# - ESMTP
# - handle error codes from the backend smtpd
import sys
import os
import errno
import getopt
import time
import socket
import asyncore
import asynchat
__all__ = ["SMTPServer","DebuggingServer","PureProxy","MailmanProxy"]
program = sys.argv[0]
__version__ = 'Python SMTP proxy version 0.2'
class Devnull:
def write(self, msg): pass
def flush(self): pass
DEBUGSTREAM = Devnull()
NEWLINE = '\n'
EMPTYSTRING = ''
COMMASPACE = ', '
def usage(code, msg=''):
print >> sys.stderr, __doc__ % globals()
if msg:
print >> sys.stderr, msg
sys.exit(code)
class SMTPChannel(asynchat.async_chat):
COMMAND = 0
DATA = 1
def __init__(self, server, conn, addr):
asynchat.async_chat.__init__(self, conn)
self.__server = server
self.__conn = conn
self.__addr = addr
self.__line = []
self.__state = self.COMMAND
self.__greeting = 0
self.__mailfrom = None
self.__rcpttos = []
self.__data = ''
self.__fqdn = socket.getfqdn()
try:
self.__peer = conn.getpeername()
except socket.error, err:
# a race condition may occur if the other end is closing
# before we can get the peername
self.close()
if err[0] != errno.ENOTCONN:
raise
return
print >> DEBUGSTREAM, 'Peer:', repr(self.__peer)
self.push('220 %s %s' % (self.__fqdn, __version__))
self.set_terminator('\r\n')
# Overrides base class for convenience
def push(self, msg):
asynchat.async_chat.push(self, msg + '\r\n')
# Implementation of base class abstract method
def collect_incoming_data(self, data):
self.__line.append(data)
# Implementation of base class abstract method
def found_terminator(self):
line = EMPTYSTRING.join(self.__line)
print >> DEBUGSTREAM, 'Data:', repr(line)
self.__line = []
if self.__state == self.COMMAND:
if not line:
self.push('500 Error: bad syntax')
return
method = None
i = line.find(' ')
if i < 0:
command = line.upper()
arg = None
else:
command = line[:i].upper()
arg = line[i+1:].strip()
method = getattr(self, 'smtp_' + command, None)
if not method:
self.push('502 Error: command "%s" not implemented' % command)
return
method(arg)
return
else:
if self.__state != self.DATA:
self.push('451 Internal confusion')
return
# Remove extraneous carriage returns and de-transparency according
# to RFC 821, Section 4.5.2.
data = []
for text in line.split('\r\n'):
if text and text[0] == '.':
data.append(text[1:])
else:
data.append(text)
self.__data = NEWLINE.join(data)
status = self.__server.process_message(self.__peer,
self.__mailfrom,
self.__rcpttos,
self.__data)
self.__rcpttos = []
self.__mailfrom = None
self.__state = self.COMMAND
self.set_terminator('\r\n')
if not status:
self.push('250 Ok')
else:
self.push(status)
# SMTP and ESMTP commands
def smtp_HELO(self, arg):
if not arg:
self.push('501 Syntax: HELO hostname')
return
if self.__greeting:
self.push('503 Duplicate HELO/EHLO')
else:
self.__greeting = arg
self.push('250 %s' % self.__fqdn)
def smtp_NOOP(self, arg):
if arg:
self.push('501 Syntax: NOOP')
else:
self.push('250 Ok')
def smtp_QUIT(self, arg):
# arg is ignored
self.push('221 Bye')
self.close_when_done()
# factored
def __getaddr(self, keyword, arg):
address = None
keylen = len(keyword)
if arg[:keylen].upper() == keyword:
address = arg[keylen:].strip()
if not address:
pass
elif address[0] == '<' and address[-1] == '>' and address != '<>':
# Addresses can be in the form <[email protected]> but watch out
# for null address, e.g. <>
address = address[1:-1]
return address
def smtp_MAIL(self, arg):
print >> DEBUGSTREAM, '===> MAIL', arg
address = self.__getaddr('FROM:', arg) if arg else None
if not address:
self.push('501 Syntax: MAIL FROM:<address>')
return
if self.__mailfrom:
self.push('503 Error: nested MAIL command')
return
self.__mailfrom = address
print >> DEBUGSTREAM, 'sender:', self.__mailfrom
self.push('250 Ok')
def smtp_RCPT(self, arg):
print >> DEBUGSTREAM, '===> RCPT', arg
if not self.__mailfrom:
self.push('503 Error: need MAIL command')
return
address = self.__getaddr('TO:', arg) if arg else None
if not address:
self.push('501 Syntax: RCPT TO: <address>')
return
self.__rcpttos.append(address)
print >> DEBUGSTREAM, 'recips:', self.__rcpttos
self.push('250 Ok')
def smtp_RSET(self, arg):
if arg:
self.push('501 Syntax: RSET')
return
# Resets the sender, recipients, and data, but not the greeting
self.__mailfrom = None
self.__rcpttos = []
self.__data = ''
self.__state = self.COMMAND
self.push('250 Ok')
def smtp_DATA(self, arg):
if not self.__rcpttos:
self.push('503 Error: need RCPT command')
return
if arg:
self.push('501 Syntax: DATA')
return
self.__state = self.DATA
self.set_terminator('\r\n.\r\n')
self.push('354 End data with <CR><LF>.<CR><LF>')
class SMTPServer(asyncore.dispatcher):
def __init__(self, localaddr, remoteaddr):
self._localaddr = localaddr
self._remoteaddr = remoteaddr
asyncore.dispatcher.__init__(self)
try:
self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
# try to re-use a server port if possible
self.set_reuse_addr()
self.bind(localaddr)
self.listen(5)
except:
# cleanup asyncore.socket_map before raising
self.close()
raise
else:
print >> DEBUGSTREAM, \
'%s started at %s\n\tLocal addr: %s\n\tRemote addr:%s' % (
self.__class__.__name__, time.ctime(time.time()),
localaddr, remoteaddr)
def handle_accept(self):
pair = self.accept()
if pair is not None:
conn, addr = pair
print >> DEBUGSTREAM, 'Incoming connection from %s' % repr(addr)
channel = SMTPChannel(self, conn, addr)
# API for "doing something useful with the message"
def process_message(self, peer, mailfrom, rcpttos, data):
"""Override this abstract method to handle messages from the client.
peer is a tuple containing (ipaddr, port) of the client that made the
socket connection to our smtp port.
mailfrom is the raw address the client claims the message is coming
from.
rcpttos is a list of raw addresses the client wishes to deliver the
message to.
data is a string containing the entire full text of the message,
headers (if supplied) and all. It has been `de-transparencied'
according to RFC 821, Section 4.5.2. In other words, a line
containing a `.' followed by other text has had the leading dot
removed.
This function should return None, for a normal `250 Ok' response;
otherwise it returns the desired response string in RFC 821 format.
"""
raise NotImplementedError
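# A minimal override sketch, from a separate script (the class name and
# behaviour are illustrative; compare DebuggingServer just below):
#
#     import asyncore, smtpd
#
#     class SinkServer(smtpd.SMTPServer):
#         def process_message(self, peer, mailfrom, rcpttos, data):
#             # accept and discard everything; None means '250 Ok'
#             return None
#
#     server = SinkServer(('localhost', 8025), None)
#     asyncore.loop()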
class DebuggingServer(SMTPServer):
# Do something with the gathered message
def process_message(self, peer, mailfrom, rcpttos, data):
inheaders = 1
lines = data.split('\n')
print '---------- MESSAGE FOLLOWS ----------'
for line in lines:
# headers first
if inheaders and not line:
print 'X-Peer:', peer[0]
inheaders = 0
print line
print '------------ END MESSAGE ------------'
class PureProxy(SMTPServer):
def process_message(self, peer, mailfrom, rcpttos, data):
lines = data.split('\n')
# Look for the last header
i = 0
for line in lines:
if not line:
break
i += 1
lines.insert(i, 'X-Peer: %s' % peer[0])
data = NEWLINE.join(lines)
refused = self._deliver(mailfrom, rcpttos, data)
# TBD: what to do with refused addresses?
print >> DEBUGSTREAM, 'we got some refusals:', refused
def _deliver(self, mailfrom, rcpttos, data):
import smtplib
refused = {}
try:
s = smtplib.SMTP()
s.connect(self._remoteaddr[0], self._remoteaddr[1])
try:
refused = s.sendmail(mailfrom, rcpttos, data)
finally:
s.quit()
except smtplib.SMTPRecipientsRefused, e:
print >> DEBUGSTREAM, 'got SMTPRecipientsRefused'
refused = e.recipients
except (socket.error, smtplib.SMTPException), e:
print >> DEBUGSTREAM, 'got', e.__class__
# All recipients were refused. If the exception had an associated
# error code, use it. Otherwise, fake it with a non-triggering
# exception code.
errcode = getattr(e, 'smtp_code', -1)
errmsg = getattr(e, 'smtp_error', 'ignore')
for r in rcpttos:
refused[r] = (errcode, errmsg)
return refused
class MailmanProxy(PureProxy):
def process_message(self, peer, mailfrom, rcpttos, data):
from cStringIO import StringIO
from Mailman import Utils
from Mailman import Message
from Mailman import MailList
# If the message is to a Mailman mailing list, then we'll invoke the
# Mailman script directly, without going through the real smtpd.
# Otherwise we'll forward it to the local proxy for disposition.
listnames = []
for rcpt in rcpttos:
local = rcpt.lower().split('@')[0]
# We allow the following variations on the theme
# listname
# listname-admin
# listname-owner
# listname-request
# listname-join
# listname-leave
parts = local.split('-')
if len(parts) > 2:
continue
listname = parts[0]
if len(parts) == 2:
command = parts[1]
else:
command = ''
if not Utils.list_exists(listname) or command not in (
'', 'admin', 'owner', 'request', 'join', 'leave'):
continue
listnames.append((rcpt, listname, command))
# Remove all list recipients from rcpttos and forward what we're not
# going to take care of ourselves. Linear removal should be fine
# since we don't expect a large number of recipients.
for rcpt, listname, command in listnames:
rcpttos.remove(rcpt)
# If there's any non-list destined recipients left,
print >> DEBUGSTREAM, 'forwarding recips:', ' '.join(rcpttos)
if rcpttos:
refused = self._deliver(mailfrom, rcpttos, data)
# TBD: what to do with refused addresses?
print >> DEBUGSTREAM, 'we got refusals:', refused
# Now deliver directly to the list commands
mlists = {}
s = StringIO(data)
msg = Message.Message(s)
# These headers are required for the proper execution of Mailman. All
# MTAs in existence seem to add these if the original message doesn't
# have them.
if not msg.getheader('from'):
msg['From'] = mailfrom
if not msg.getheader('date'):
msg['Date'] = time.ctime(time.time())
for rcpt, listname, command in listnames:
print >> DEBUGSTREAM, 'sending message to', rcpt
mlist = mlists.get(listname)
if not mlist:
mlist = MailList.MailList(listname, lock=0)
mlists[listname] = mlist
# dispatch on the type of command
if command == '':
# post
msg.Enqueue(mlist, tolist=1)
elif command == 'admin':
msg.Enqueue(mlist, toadmin=1)
elif command == 'owner':
msg.Enqueue(mlist, toowner=1)
elif command == 'request':
msg.Enqueue(mlist, torequest=1)
elif command in ('join', 'leave'):
# TBD: this is a hack!
if command == 'join':
msg['Subject'] = 'subscribe'
else:
msg['Subject'] = 'unsubscribe'
msg.Enqueue(mlist, torequest=1)
class Options:
setuid = 1
classname = 'PureProxy'
def parseargs():
global DEBUGSTREAM
try:
opts, args = getopt.getopt(
sys.argv[1:], 'nVhc:d',
['class=', 'nosetuid', 'version', 'help', 'debug'])
except getopt.error, e:
usage(1, e)
options = Options()
for opt, arg in opts:
if opt in ('-h', '--help'):
usage(0)
elif opt in ('-V', '--version'):
print >> sys.stderr, __version__
sys.exit(0)
elif opt in ('-n', '--nosetuid'):
options.setuid = 0
elif opt in ('-c', '--class'):
options.classname = arg
elif opt in ('-d', '--debug'):
DEBUGSTREAM = sys.stderr
# parse the rest of the arguments
if len(args) < 1:
localspec = 'localhost:8025'
remotespec = 'localhost:25'
elif len(args) < 2:
localspec = args[0]
remotespec = 'localhost:25'
elif len(args) < 3:
localspec = args[0]
remotespec = args[1]
else:
usage(1, 'Invalid arguments: %s' % COMMASPACE.join(args))
# split into host/port pairs
i = localspec.find(':')
if i < 0:
usage(1, 'Bad local spec: %s' % localspec)
options.localhost = localspec[:i]
try:
options.localport = int(localspec[i+1:])
except ValueError:
usage(1, 'Bad local port: %s' % localspec)
i = remotespec.find(':')
if i < 0:
usage(1, 'Bad remote spec: %s' % remotespec)
options.remotehost = remotespec[:i]
try:
options.remoteport = int(remotespec[i+1:])
except ValueError:
usage(1, 'Bad remote port: %s' % remotespec)
return options
if __name__ == '__main__':
options = parseargs()
# Become nobody
classname = options.classname
if "." in classname:
lastdot = classname.rfind(".")
mod = __import__(classname[:lastdot], globals(), locals(), [""])
classname = classname[lastdot+1:]
else:
import __main__ as mod
class_ = getattr(mod, classname)
proxy = class_((options.localhost, options.localport),
(options.remotehost, options.remoteport))
if options.setuid:
try:
import pwd
except ImportError:
print >> sys.stderr, \
'Cannot import module "pwd"; try running with -n option.'
sys.exit(1)
nobody = pwd.getpwnam('nobody')[2]
try:
os.setuid(nobody)
except OSError, e:
if e.errno != errno.EPERM: raise
print >> sys.stderr, \
'Cannot setuid "nobody"; try running with -n option.'
sys.exit(1)
try:
asyncore.loop()
except KeyboardInterrupt:
pass
#! /usr/bin/env python
'''SMTP/ESMTP client class.
This should follow RFC 821 (SMTP), RFC 1869 (ESMTP), RFC 2554 (SMTP
Authentication) and RFC 2487 (Secure SMTP over TLS).
Notes:
Please remember, when doing ESMTP, that the names of the SMTP service
extensions are NOT the same thing as the option keywords for the RCPT
and MAIL commands!
Example:
>>> import smtplib
>>> s=smtplib.SMTP("localhost")
>>> print s.help()
This is Sendmail version 8.8.4
Topics:
HELO EHLO MAIL RCPT DATA
RSET NOOP QUIT HELP VRFY
EXPN VERB ETRN DSN
For more info use "HELP <topic>".
To report bugs in the implementation send email to
[email protected].
For local information send email to Postmaster at your site.
End of HELP info
>>> s.putcmd("vrfy","someone@here")
>>> s.getreply()
(250, "Somebody OverHere <[email protected]>")
>>> s.quit()
'''
# Author: The Dragon De Monsyne <[email protected]>
# ESMTP support, test code and doc fixes added by
# Eric S. Raymond <[email protected]>
# Better RFC 821 compliance (MAIL and RCPT, and CRLF in data)
# by Carey Evans <[email protected]>, for picky mail servers.
# RFC 2554 (authentication) support by Gerhard Haering <[email protected]>.
#
# This was modified from the Python 1.5 library HTTP lib.
import socket
import re
import email.utils
import base64
import hmac
from email.base64mime import encode as encode_base64
from sys import stderr
__all__ = ["SMTPException", "SMTPServerDisconnected", "SMTPResponseException",
"SMTPSenderRefused", "SMTPRecipientsRefused", "SMTPDataError",
"SMTPConnectError", "SMTPHeloError", "SMTPAuthenticationError",
"quoteaddr", "quotedata", "SMTP"]
SMTP_PORT = 25
SMTP_SSL_PORT = 465
CRLF = "\r\n"
_MAXLINE = 8192 # more than 8 times larger than RFC 821, 4.5.3
OLDSTYLE_AUTH = re.compile(r"auth=(.*)", re.I)
# Exception classes used by this module.
class SMTPException(Exception):
"""Base class for all exceptions raised by this module."""
class SMTPServerDisconnected(SMTPException):
"""Not connected to any SMTP server.
This exception is raised when the server unexpectedly disconnects,
or when an attempt is made to use the SMTP instance before
connecting it to a server.
"""
class SMTPResponseException(SMTPException):
"""Base class for all exceptions that include an SMTP error code.
These exceptions are generated in some instances when the SMTP
server returns an error code. The error code is stored in the
`smtp_code' attribute of the error, and the `smtp_error' attribute
is set to the error message.
"""
def __init__(self, code, msg):
self.smtp_code = code
self.smtp_error = msg
self.args = (code, msg)
class SMTPSenderRefused(SMTPResponseException):
"""Sender address refused.
In addition to the attributes set on all SMTPResponseException
exceptions, this sets `sender' to the string that the SMTP server refused.
"""
def __init__(self, code, msg, sender):
self.smtp_code = code
self.smtp_error = msg
self.sender = sender
self.args = (code, msg, sender)
class SMTPRecipientsRefused(SMTPException):
"""All recipient addresses refused.
The errors for each recipient are accessible through the attribute
'recipients', which is a dictionary of exactly the same sort as
SMTP.sendmail() returns.
"""
def __init__(self, recipients):
self.recipients = recipients
self.args = (recipients,)
class SMTPDataError(SMTPResponseException):
"""The SMTP server didn't accept the data."""
class SMTPConnectError(SMTPResponseException):
"""Error during connection establishment."""
class SMTPHeloError(SMTPResponseException):
"""The server refused our HELO reply."""
class SMTPAuthenticationError(SMTPResponseException):
"""Authentication error.
Most probably the server didn't accept the username/password
combination provided.
"""
def quoteaddr(addr):
"""Quote a subset of the email addresses defined by RFC 821.
Should be able to handle anything rfc822.parseaddr can handle.
"""
m = (None, None)
try:
m = email.utils.parseaddr(addr)[1]
except AttributeError:
pass
if m == (None, None): # Indicates parse failure or AttributeError
# something weird here.. punt -ddm
return "<%s>" % addr
elif m is None:
# the sender wants an empty return address
return "<>"
else:
return "<%s>" % m
def _addr_only(addrstring):
displayname, addr = email.utils.parseaddr(addrstring)
if (displayname, addr) == ('', ''):
# parseaddr couldn't parse it, so use it as is.
return addrstring
return addr
def quotedata(data):
"""Quote data for email.
Double leading '.', and change Unix newline '\\n', or Mac '\\r' into
Internet CRLF end-of-line.
"""
return re.sub(r'(?m)^\.', '..',
re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data))
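# Quick behaviour check for the two helpers above (the address is a
# placeholder):
#
#     import smtplib
#     print smtplib.quoteaddr('Ann <[email protected]>')   # <[email protected]>
#     print repr(smtplib.quotedata('.dot\nline'))   # '..dot\r\nline'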
try:
import ssl
except ImportError:
_have_ssl = False
else:
class SSLFakeFile:
"""A fake file like object that really wraps a SSLObject.
It only supports what is needed in smtplib.
"""
def __init__(self, sslobj):
self.sslobj = sslobj
def readline(self, size=-1):
if size < 0:
size = None
str = ""
chr = None
while chr != "\n":
if size is not None and len(str) >= size:
break
chr = self.sslobj.read(1)
if not chr:
break
str += chr
return str
def close(self):
pass
_have_ssl = True
class SMTP:
"""This class manages a connection to an SMTP or ESMTP server.
SMTP Objects:
SMTP objects have the following attributes:
helo_resp
This is the message given by the server in response to the
most recent HELO command.
ehlo_resp
This is the message given by the server in response to the
most recent EHLO command. This is usually multiline.
does_esmtp
This is a True value _after you do an EHLO command_, if the
server supports ESMTP.
esmtp_features
This is a dictionary, which, if the server supports ESMTP,
will _after you do an EHLO command_, contain the names of the
SMTP service extensions this server supports, and their
parameters (if any).
Note, all extension names are mapped to lower case in the
dictionary.
See each method's docstrings for details. In general, there is a
method of the same name to perform each SMTP command. There is also a
method called 'sendmail' that will do an entire mail transaction.
"""
debuglevel = 0
file = None
helo_resp = None
ehlo_msg = "ehlo"
ehlo_resp = None
does_esmtp = 0
default_port = SMTP_PORT
def __init__(self, host='', port=0, local_hostname=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
"""Initialize a new instance.
If specified, `host' is the name of the remote host to which to
connect. If specified, `port' specifies the port to which to connect.
By default, smtplib.SMTP_PORT is used. If a host is specified the
connect method is called, and if it returns anything other than a
success code an SMTPConnectError is raised. If specified,
`local_hostname` is used as the FQDN of the local host for the
HELO/EHLO command. Otherwise, the local hostname is found using
socket.getfqdn().
"""
self.timeout = timeout
self.esmtp_features = {}
if host:
(code, msg) = self.connect(host, port)
if code != 220:
self.close()
raise SMTPConnectError(code, msg)
if local_hostname is not None:
self.local_hostname = local_hostname
else:
# RFC 2821 says we should use the fqdn in the EHLO/HELO verb, and
# if that can't be calculated, that we should use a domain literal
# instead (essentially an encoded IP address like [A.B.C.D]).
fqdn = socket.getfqdn()
if '.' in fqdn:
self.local_hostname = fqdn
else:
# We can't find an fqdn hostname, so use a domain literal
addr = '127.0.0.1'
try:
addr = socket.gethostbyname(socket.gethostname())
except socket.gaierror:
pass
self.local_hostname = '[%s]' % addr
def set_debuglevel(self, debuglevel):
"""Set the debug output level.
A non-false value results in debug messages for connection and for all
messages sent to and received from the server.
"""
self.debuglevel = debuglevel
def _get_socket(self, host, port, timeout):
# This makes it simpler for SMTP_SSL to use the SMTP connect code
# and just alter the socket connection bit.
if self.debuglevel > 0:
print>>stderr, 'connect:', (host, port)
return socket.create_connection((host, port), timeout)
def connect(self, host='localhost', port=0):
"""Connect to a host on a given port.
If the hostname ends with a colon (`:') followed by a number, and
there is no port specified, that suffix will be stripped off and the
number interpreted as the port number to use.
Note: This method is automatically invoked by __init__, if a host is
specified during instantiation.
"""
if not port and (host.find(':') == host.rfind(':')):
i = host.rfind(':')
if i >= 0:
host, port = host[:i], host[i + 1:]
try:
port = int(port)
except ValueError:
raise socket.error, "nonnumeric port"
if not port:
port = self.default_port
if self.debuglevel > 0:
print>>stderr, 'connect:', (host, port)
self.sock = self._get_socket(host, port, self.timeout)
(code, msg) = self.getreply()
if self.debuglevel > 0:
print>>stderr, "connect:", msg
return (code, msg)
def send(self, str):
"""Send `str' to the server."""
if self.debuglevel > 0:
print>>stderr, 'send:', repr(str)
if hasattr(self, 'sock') and self.sock:
try:
self.sock.sendall(str)
except socket.error:
self.close()
raise SMTPServerDisconnected('Server not connected')
else:
raise SMTPServerDisconnected('please run connect() first')
def putcmd(self, cmd, args=""):
"""Send a command to the server."""
if args == "":
str = '%s%s' % (cmd, CRLF)
else:
str = '%s %s%s' % (cmd, args, CRLF)
self.send(str)
def getreply(self):
"""Get a reply from the server.
Returns a tuple consisting of:
- server response code (e.g. '250', or such, if all goes well)
Note: returns -1 if it can't read the response code.
- server response string corresponding to response code (multiline
responses are converted to a single, multiline string).
Raises SMTPServerDisconnected if end-of-file is reached.
"""
resp = []
if self.file is None:
self.file = self.sock.makefile('rb')
while 1:
try:
line = self.file.readline(_MAXLINE + 1)
except socket.error as e:
self.close()
raise SMTPServerDisconnected("Connection unexpectedly closed: "
+ str(e))
if line == '':
self.close()
raise SMTPServerDisconnected("Connection unexpectedly closed")
if self.debuglevel > 0:
print>>stderr, 'reply:', repr(line)
if len(line) > _MAXLINE:
raise SMTPResponseException(500, "Line too long.")
resp.append(line[4:].strip())
code = line[:3]
# Check that the error code is syntactically correct.
# Don't attempt to read a continuation line if it is broken.
try:
errcode = int(code)
except ValueError:
errcode = -1
break
# Check if multiline response.
if line[3:4] != "-":
break
errmsg = "\n".join(resp)
if self.debuglevel > 0:
print>>stderr, 'reply: retcode (%s); Msg: %s' % (errcode, errmsg)
return errcode, errmsg
def docmd(self, cmd, args=""):
"""Send a command, and return its response code."""
self.putcmd(cmd, args)
return self.getreply()
# std smtp commands
def helo(self, name=''):
"""SMTP 'helo' command.
Hostname to send for this command defaults to the FQDN of the local
host.
"""
self.putcmd("helo", name or self.local_hostname)
(code, msg) = self.getreply()
self.helo_resp = msg
return (code, msg)
def ehlo(self, name=''):
""" SMTP 'ehlo' command.
Hostname to send for this command defaults to the FQDN of the local
host.
"""
self.esmtp_features = {}
self.putcmd(self.ehlo_msg, name or self.local_hostname)
(code, msg) = self.getreply()
# According to RFC1869 some (badly written)
# MTA's will disconnect on an ehlo. Toss an exception if
# that happens -ddm
if code == -1 and len(msg) == 0:
self.close()
raise SMTPServerDisconnected("Server not connected")
self.ehlo_resp = msg
if code != 250:
return (code, msg)
self.does_esmtp = 1
#parse the ehlo response -ddm
resp = self.ehlo_resp.split('\n')
del resp[0]
for each in resp:
# To be able to communicate with as many SMTP servers as possible,
# we have to take the old-style auth advertisement into account,
# because:
# 1) Else our SMTP feature parser gets confused.
# 2) There are some servers that only advertise the auth methods we
# support using the old style.
auth_match = OLDSTYLE_AUTH.match(each)
if auth_match:
# This doesn't remove duplicates, but that's no problem
self.esmtp_features["auth"] = self.esmtp_features.get("auth", "") \
+ " " + auth_match.groups(0)[0]
continue
# RFC 1869 requires a space between ehlo keyword and parameters.
# It's actually stricter, in that only spaces are allowed between
# parameters, but we're not going to check for that here. Note
# that the space isn't present if there are no parameters.
m = re.match(r'(?P<feature>[A-Za-z0-9][A-Za-z0-9\-]*) ?', each)
if m:
feature = m.group("feature").lower()
params = m.string[m.end("feature"):].strip()
if feature == "auth":
self.esmtp_features[feature] = self.esmtp_features.get(feature, "") \
+ " " + params
else:
self.esmtp_features[feature] = params
return (code, msg)
def has_extn(self, opt):
"""Does the server support a given SMTP service extension?"""
return opt.lower() in self.esmtp_features
def help(self, args=''):
"""SMTP 'help' command.
Returns help text from server."""
self.putcmd("help", args)
return self.getreply()[1]
def rset(self):
"""SMTP 'rset' command -- resets session."""
return self.docmd("rset")
def noop(self):
"""SMTP 'noop' command -- doesn't do anything :>"""
return self.docmd("noop")
def mail(self, sender, options=[]):
"""SMTP 'mail' command -- begins mail xfer session."""
optionlist = ''
if options and self.does_esmtp:
optionlist = ' ' + ' '.join(options)
self.putcmd("mail", "FROM:%s%s" % (quoteaddr(sender), optionlist))
return self.getreply()
def rcpt(self, recip, options=[]):
"""SMTP 'rcpt' command -- indicates 1 recipient for this mail."""
optionlist = ''
if options and self.does_esmtp:
optionlist = ' ' + ' '.join(options)
self.putcmd("rcpt", "TO:%s%s" % (quoteaddr(recip), optionlist))
return self.getreply()
def data(self, msg):
"""SMTP 'DATA' command -- sends message data to server.
Automatically quotes lines beginning with a period per rfc821.
Raises SMTPDataError if there is an unexpected reply to the
DATA command; the return value from this method is the final
response code received when all data is sent.
"""
self.putcmd("data")
(code, repl) = self.getreply()
if self.debuglevel > 0:
print>>stderr, "data:", (code, repl)
if code != 354:
raise SMTPDataError(code, repl)
else:
q = quotedata(msg)
if q[-2:] != CRLF:
q = q + CRLF
q = q + "." + CRLF
self.send(q)
(code, msg) = self.getreply()
if self.debuglevel > 0:
print>>stderr, "data:", (code, msg)
return (code, msg)
def verify(self, address):
"""SMTP 'verify' command -- checks for address validity."""
self.putcmd("vrfy", _addr_only(address))
return self.getreply()
# a.k.a.
vrfy = verify
def expn(self, address):
"""SMTP 'expn' command -- expands a mailing list."""
self.putcmd("expn", _addr_only(address))
return self.getreply()
# some useful methods
def ehlo_or_helo_if_needed(self):
"""Call self.ehlo() and/or self.helo() if needed.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
"""
if self.helo_resp is None and self.ehlo_resp is None:
if not (200 <= self.ehlo()[0] <= 299):
(code, resp) = self.helo()
if not (200 <= code <= 299):
raise SMTPHeloError(code, resp)
def login(self, user, password):
"""Log in on an SMTP server that requires authentication.
The arguments are:
- user: The user name to authenticate with.
- password: The password for the authentication.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
This method will return normally if the authentication was successful.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
SMTPAuthenticationError The server didn't accept the username/
password combination.
SMTPException No suitable authentication method was
found.
"""
def encode_cram_md5(challenge, user, password):
challenge = base64.decodestring(challenge)
response = user + " " + hmac.HMAC(password, challenge).hexdigest()
return encode_base64(response, eol="")
def encode_plain(user, password):
return encode_base64("\0%s\0%s" % (user, password), eol="")
AUTH_PLAIN = "PLAIN"
AUTH_CRAM_MD5 = "CRAM-MD5"
AUTH_LOGIN = "LOGIN"
self.ehlo_or_helo_if_needed()
if not self.has_extn("auth"):
raise SMTPException("SMTP AUTH extension not supported by server.")
# Authentication methods the server supports:
authlist = self.esmtp_features["auth"].split()
# List of authentication methods we support: from preferred to
# less preferred methods. Except for the purpose of testing the weaker
# ones, we prefer stronger methods like CRAM-MD5:
preferred_auths = [AUTH_CRAM_MD5, AUTH_PLAIN, AUTH_LOGIN]
# Determine the authentication method we'll use
authmethod = None
for method in preferred_auths:
if method in authlist:
authmethod = method
break
if authmethod == AUTH_CRAM_MD5:
(code, resp) = self.docmd("AUTH", AUTH_CRAM_MD5)
if code == 503:
# 503 == 'Error: already authenticated'
return (code, resp)
(code, resp) = self.docmd(encode_cram_md5(resp, user, password))
elif authmethod == AUTH_PLAIN:
(code, resp) = self.docmd("AUTH",
AUTH_PLAIN + " " + encode_plain(user, password))
elif authmethod == AUTH_LOGIN:
(code, resp) = self.docmd("AUTH",
"%s %s" % (AUTH_LOGIN, encode_base64(user, eol="")))
if code != 334:
raise SMTPAuthenticationError(code, resp)
(code, resp) = self.docmd(encode_base64(password, eol=""))
elif authmethod is None:
raise SMTPException("No suitable authentication method found.")
if code not in (235, 503):
# 235 == 'Authentication successful'
# 503 == 'Error: already authenticated'
raise SMTPAuthenticationError(code, resp)
return (code, resp)
def starttls(self, keyfile=None, certfile=None):
"""Puts the connection to the SMTP server into TLS mode.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
If the server supports TLS, this will encrypt the rest of the SMTP
session. If you provide the keyfile and certfile parameters,
the identity of the SMTP server and client can be checked. This,
however, depends on whether the socket module really checks the
certificates.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
"""
self.ehlo_or_helo_if_needed()
if not self.has_extn("starttls"):
raise SMTPException("STARTTLS extension not supported by server.")
(resp, reply) = self.docmd("STARTTLS")
if resp == 220:
if not _have_ssl:
raise RuntimeError("No SSL support included in this Python")
self.sock = ssl.wrap_socket(self.sock, keyfile, certfile)
self.file = SSLFakeFile(self.sock)
# RFC 3207:
# The client MUST discard any knowledge obtained from
# the server, such as the list of SMTP service extensions,
# which was not obtained from the TLS negotiation itself.
self.helo_resp = None
self.ehlo_resp = None
self.esmtp_features = {}
self.does_esmtp = 0
else:
# RFC 3207:
# 501 Syntax error (no parameters allowed)
# 454 TLS not available due to temporary reason
raise SMTPResponseException(resp, reply)
return (resp, reply)
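# A typical submission sketch combining connect, STARTTLS, AUTH and
# sendmail (host, port, credentials and addresses are placeholders):
#
#     import smtplib
#     s = smtplib.SMTP('mail.example.com', 587)
#     s.starttls()
#     s.login('user', 'secret')
#     s.sendmail('[email protected]', ['[email protected]'],
#                'From: [email protected]\r\nSubject: hi\r\n\r\nhello')
#     s.quit()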
def sendmail(self, from_addr, to_addrs, msg, mail_options=[],
rcpt_options=[]):
"""This command performs an entire mail transaction.
The arguments are:
- from_addr : The address sending this mail.
- to_addrs : A list of addresses to send this mail to. A bare
string will be treated as a list with 1 address.
- msg : The message to send.
- mail_options : List of ESMTP options (such as 8bitmime) for the
mail command.
- rcpt_options : List of ESMTP options (such as DSN commands) for
all the rcpt commands.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first. If the server does ESMTP, message size
and each of the specified options will be passed to it. If EHLO
fails, HELO will be tried and ESMTP options suppressed.
This method will return normally if the mail is accepted for at least
one recipient. It returns a dictionary, with one entry for each
recipient that was refused. Each entry contains a tuple of the SMTP
error code and the accompanying error message sent by the server.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
SMTPRecipientsRefused The server rejected ALL recipients
(no mail was sent).
SMTPSenderRefused The server didn't accept the from_addr.
SMTPDataError The server replied with an unexpected
error code (other than a refusal of
a recipient).
Note: the connection will be open even after an exception is raised.
Example:
>>> import smtplib
>>> s=smtplib.SMTP("localhost")
>>> tolist=["[email protected]","[email protected]","[email protected]","[email protected]"]
>>> msg = '''\\
... From: [email protected]
... Subject: testin'...
...
... This is a test '''
>>> s.sendmail("[email protected]",tolist,msg)
{ "[email protected]" : ( 550 ,"User unknown" ) }
>>> s.quit()
In the above example, the message was accepted for delivery to three
of the four addresses, and one was rejected, with the error code
550. If all addresses are accepted, then the method will return an
empty dictionary.
"""
self.ehlo_or_helo_if_needed()
esmtp_opts = []
if self.does_esmtp:
# Hmmm? what's this? -ddm
# self.esmtp_features['7bit']=""
if self.has_extn('size'):
esmtp_opts.append("size=%d" % len(msg))
for option in mail_options:
esmtp_opts.append(option)
(code, resp) = self.mail(from_addr, esmtp_opts)
if code != 250:
self.rset()
raise SMTPSenderRefused(code, resp, from_addr)
senderrs = {}
if isinstance(to_addrs, basestring):
to_addrs = [to_addrs]
for each in to_addrs:
(code, resp) = self.rcpt(each, rcpt_options)
if (code != 250) and (code != 251):
senderrs[each] = (code, resp)
if len(senderrs) == len(to_addrs):
# the server refused all our recipients
self.rset()
raise SMTPRecipientsRefused(senderrs)
(code, resp) = self.data(msg)
if code != 250:
self.rset()
raise SMTPDataError(code, resp)
#if we got here then somebody got our mail
return senderrs
def close(self):
"""Close the connection to the SMTP server."""
try:
file = self.file
self.file = None
if file:
file.close()
finally:
sock = self.sock
self.sock = None
if sock:
sock.close()
def quit(self):
"""Terminate the SMTP session."""
res = self.docmd("quit")
# A new EHLO is required after reconnecting with connect()
self.ehlo_resp = self.helo_resp = None
self.esmtp_features = {}
self.does_esmtp = False
self.close()
return res
if _have_ssl:
class SMTP_SSL(SMTP):
""" This is a subclass derived from SMTP that connects over an SSL
encrypted socket (to use this class you need a socket module that was
compiled with SSL support). If host is not specified, '' (the local
host) is used. If port is omitted, the standard SMTP-over-SSL port
(465) is used. local_hostname has the same meaning as it does in the
SMTP class. keyfile and certfile are also optional - they can contain
a PEM formatted private key and certificate chain file for the SSL
connection.
"""
default_port = SMTP_SSL_PORT
def __init__(self, host='', port=0, local_hostname=None,
keyfile=None, certfile=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
self.keyfile = keyfile
self.certfile = certfile
SMTP.__init__(self, host, port, local_hostname, timeout)
def _get_socket(self, host, port, timeout):
if self.debuglevel > 0:
print>>stderr, 'connect:', (host, port)
new_socket = socket.create_connection((host, port), timeout)
new_socket = ssl.wrap_socket(new_socket, self.keyfile, self.certfile)
self.file = SSLFakeFile(new_socket)
return new_socket
__all__.append("SMTP_SSL")
#
# LMTP extension
#
LMTP_PORT = 2003
class LMTP(SMTP):
"""LMTP - Local Mail Transfer Protocol
The LMTP protocol, which is very similar to ESMTP, is heavily based
on the standard SMTP client. It's common to use Unix sockets for
LMTP, so our connect() method must support that as well as a regular
host:port server. local_hostname has the same meaning as it does in
the SMTP class. To specify a Unix socket, you must use an absolute
path as the host, starting with a '/'.
Authentication is supported, using the regular SMTP mechanism. When
using a Unix socket, LMTP servers generally don't support or require any
authentication, but your mileage might vary."""
ehlo_msg = "lhlo"
def __init__(self, host='', port=LMTP_PORT, local_hostname=None):
"""Initialize a new instance."""
SMTP.__init__(self, host, port, local_hostname)
def connect(self, host='localhost', port=0):
"""Connect to the LMTP daemon, on either a Unix or a TCP socket."""
if host[0] != '/':
return SMTP.connect(self, host, port)
# Handle Unix-domain sockets.
try:
self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.sock.connect(host)
except socket.error:
if self.debuglevel > 0:
print>>stderr, 'connect fail:', host
if self.sock:
self.sock.close()
self.sock = None
raise
(code, msg) = self.getreply()
if self.debuglevel > 0:
print>>stderr, "connect:", msg
return (code, msg)
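# An LMTP usage sketch (the socket path is hypothetical; an absolute
# path selects the Unix-domain branch of connect() above):
#
#     from smtplib import LMTP
#     l = LMTP('/var/run/dovecot/lmtp')
#     l.sendmail('[email protected]', ['[email protected]'],
#                'Subject: test\r\n\r\nbody')
#     l.quit()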
# Test the sendmail method, which tests most of the others.
# Note: This always sends to localhost.
if __name__ == '__main__':
import sys
def prompt(prompt):
sys.stdout.write(prompt + ": ")
return sys.stdin.readline().strip()
fromaddr = prompt("From")
toaddrs = prompt("To").split(',')
print "Enter message, end with ^D:"
msg = ''
while 1:
line = sys.stdin.readline()
if not line:
break
msg = msg + line
print "Message length is %d" % len(msg)
server = SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
"""Routines to help recognizing sound files.
Function whathdr() recognizes various types of sound file headers.
It understands almost all headers that SOX can decode.
The return tuple contains the following items, in this order:
- file type (as SOX understands it)
- sampling rate (0 if unknown or hard to decode)
- number of channels (0 if unknown or hard to decode)
- number of frames in the file (-1 if unknown or hard to decode)
- number of bits/sample, or 'U' for U-LAW, or 'A' for A-LAW
If the file doesn't have a recognizable type, it returns None.
If the file can't be opened, IOError is raised.
To compute the total time, divide the number of frames by the
sampling rate (a frame contains a sample for each channel).
Function what() calls whathdr(). (It used to also use some
heuristics for raw data, but this doesn't work very well.)
Finally, the function test() is a simple main program that calls
what() for all files mentioned on the argument list. For directory
arguments it calls what() for all files in that directory. Default
argument is "." (testing all files in the current directory). The
option -r tells it to recurse down directories found inside
explicitly given directories.
"""
# The file structure is top-down except that the test program and its
# subroutine come last.
__all__ = ["what","whathdr"]
def what(filename):
"""Guess the type of a sound file"""
res = whathdr(filename)
return res
def whathdr(filename):
"""Recognize sound headers"""
f = open(filename, 'rb')
h = f.read(512)
for tf in tests:
res = tf(h, f)
if res:
return res
return None
#-----------------------------------#
# Subroutines per sound header type #
#-----------------------------------#
tests = []
def test_aifc(h, f):
import aifc
if h[:4] != 'FORM':
return None
if h[8:12] == 'AIFC':
fmt = 'aifc'
elif h[8:12] == 'AIFF':
fmt = 'aiff'
else:
return None
f.seek(0)
try:
a = aifc.openfp(f, 'r')
except (EOFError, aifc.Error):
return None
return (fmt, a.getframerate(), a.getnchannels(), \
a.getnframes(), 8*a.getsampwidth())
tests.append(test_aifc)
def test_au(h, f):
if h[:4] == '.snd':
f = get_long_be
elif h[:4] in ('\0ds.', 'dns.'):
f = get_long_le
else:
return None
type = 'au'
hdr_size = f(h[4:8])
data_size = f(h[8:12])
encoding = f(h[12:16])
rate = f(h[16:20])
nchannels = f(h[20:24])
sample_size = 1 # default
if encoding == 1:
sample_bits = 'U'
elif encoding == 2:
sample_bits = 8
elif encoding == 3:
sample_bits = 16
sample_size = 2
else:
sample_bits = '?'
frame_size = sample_size * nchannels
return type, rate, nchannels, data_size//frame_size, sample_bits
tests.append(test_au)
def test_hcom(h, f):
if h[65:69] != 'FSSD' or h[128:132] != 'HCOM':
return None
divisor = get_long_be(h[128+16:128+20])
return 'hcom', 22050//divisor, 1, -1, 8
tests.append(test_hcom)
def test_voc(h, f):
if h[:20] != 'Creative Voice File\032':
return None
sbseek = get_short_le(h[20:22])
rate = 0
if 0 <= sbseek < 500 and h[sbseek] == '\1':
ratecode = ord(h[sbseek+4])
rate = int(1000000.0 / (256 - ratecode))
return 'voc', rate, 1, -1, 8
tests.append(test_voc)
def test_wav(h, f):
# 'RIFF' <len> 'WAVE' 'fmt ' <len>
if h[:4] != 'RIFF' or h[8:12] != 'WAVE' or h[12:16] != 'fmt ':
return None
style = get_short_le(h[20:22])
nchannels = get_short_le(h[22:24])
rate = get_long_le(h[24:28])
sample_bits = get_short_le(h[34:36])
return 'wav', rate, nchannels, -1, sample_bits
tests.append(test_wav)
def test_8svx(h, f):
if h[:4] != 'FORM' or h[8:12] != '8SVX':
return None
# Should decode it to get #channels -- assume always 1
return '8svx', 0, 1, 0, 8
tests.append(test_8svx)
def test_sndt(h, f):
if h[:5] == 'SOUND':
nsamples = get_long_le(h[8:12])
rate = get_short_le(h[20:22])
return 'sndt', rate, 1, nsamples, 8
tests.append(test_sndt)
def test_sndr(h, f):
if h[:2] == '\0\0':
rate = get_short_le(h[2:4])
if 4000 <= rate <= 25000:
return 'sndr', rate, 1, -1, 8
tests.append(test_sndr)
#---------------------------------------------#
# Subroutines to extract numbers from strings #
#---------------------------------------------#
def get_long_be(s):
return (ord(s[0])<<24) | (ord(s[1])<<16) | (ord(s[2])<<8) | ord(s[3])
def get_long_le(s):
return (ord(s[3])<<24) | (ord(s[2])<<16) | (ord(s[1])<<8) | ord(s[0])
def get_short_be(s):
return (ord(s[0])<<8) | ord(s[1])
def get_short_le(s):
return (ord(s[1])<<8) | ord(s[0])
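# --- Worked example (added for illustration): the big- and little-endian
# helpers above differ only in byte order. For the 4-byte string
# '\x00\x00\x01\x00':
#   get_long_be('\x00\x00\x01\x00') == 256
#   get_long_le('\x00\x00\x01\x00') == 65536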
#--------------------#
# Small test program #
#--------------------#
def test():
import sys
recursive = 0
if sys.argv[1:] and sys.argv[1] == '-r':
del sys.argv[1:2]
recursive = 1
try:
if sys.argv[1:]:
testall(sys.argv[1:], recursive, 1)
else:
testall(['.'], recursive, 1)
except KeyboardInterrupt:
sys.stderr.write('\n[Interrupted]\n')
sys.exit(1)
def testall(list, recursive, toplevel):
import sys
import os
for filename in list:
if os.path.isdir(filename):
print filename + '/:',
if recursive or toplevel:
print 'recursing down:'
import glob
names = glob.glob(os.path.join(filename, '*'))
testall(names, recursive, 0)
else:
print '*** directory (use -r) ***'
else:
print filename + ':',
sys.stdout.flush()
try:
print what(filename)
except IOError:
print '*** not found ***'
if __name__ == '__main__':
test()
# Wrapper module for _socket, providing some additional facilities
# implemented in Python.
"""\
This module provides socket operations and some related functions.
On Unix, it supports IP (Internet Protocol) and Unix domain sockets.
On other systems, it only supports IP. Functions specific to a
socket are available as methods of the socket object.
Functions:
socket() -- create a new socket object
socketpair() -- create a pair of new socket objects [*]
fromfd() -- create a socket object from an open file descriptor [*]
gethostname() -- return the current hostname
gethostbyname() -- map a hostname to its IP number
gethostbyaddr() -- map an IP number or hostname to DNS info
getservbyname() -- map a service name and a protocol name to a port number
getprotobyname() -- map a protocol name (e.g. 'tcp') to a number
ntohs(), ntohl() -- convert 16, 32 bit int from network to host byte order
htons(), htonl() -- convert 16, 32 bit int from host to network byte order
inet_aton() -- convert IP addr string (123.45.67.89) to 32-bit packed format
inet_ntoa() -- convert 32-bit packed format IP to string (123.45.67.89)
ssl() -- secure socket layer support (only available if configured)
socket.getdefaulttimeout() -- get the default timeout value
socket.setdefaulttimeout() -- set the default timeout value
create_connection() -- connects to an address, with an optional timeout and
optional source address.
[*] not available on all platforms!
Special objects:
SocketType -- type object for socket objects
error -- exception raised for I/O errors
has_ipv6 -- boolean value indicating if IPv6 is supported
Integer constants:
AF_INET, AF_UNIX -- socket domains (first argument to socket() call)
SOCK_STREAM, SOCK_DGRAM, SOCK_RAW -- socket types (second argument)
Many other constants may be defined; these may be used in calls to
the setsockopt() and getsockopt() methods.
"""
import _socket
from _socket import *
from functools import partial
from types import MethodType
try:
import _ssl
except ImportError:
# no SSL support
pass
else:
def ssl(sock, keyfile=None, certfile=None):
# we do an internal import here because the ssl
# module imports the socket module
import ssl as _realssl
warnings.warn("socket.ssl() is deprecated. Use ssl.wrap_socket() instead.",
DeprecationWarning, stacklevel=2)
return _realssl.sslwrap_simple(sock, keyfile, certfile)
# we need to import the same constants we used to...
from _ssl import SSLError as sslerror
from _ssl import \
RAND_add, \
RAND_status, \
SSL_ERROR_ZERO_RETURN, \
SSL_ERROR_WANT_READ, \
SSL_ERROR_WANT_WRITE, \
SSL_ERROR_WANT_X509_LOOKUP, \
SSL_ERROR_SYSCALL, \
SSL_ERROR_SSL, \
SSL_ERROR_WANT_CONNECT, \
SSL_ERROR_EOF, \
SSL_ERROR_INVALID_ERROR_CODE
try:
from _ssl import RAND_egd
except ImportError:
# LibreSSL does not provide RAND_egd
pass
import os, sys, warnings
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
try:
import errno
except ImportError:
errno = None
EBADF = getattr(errno, 'EBADF', 9)
EINTR = getattr(errno, 'EINTR', 4)
__all__ = ["getfqdn", "create_connection"]
__all__.extend(os._get_exports_list(_socket))
_realsocket = socket
# WSA error codes
if sys.platform.lower().startswith("win"):
errorTab = {}
errorTab[10004] = "The operation was interrupted."
errorTab[10009] = "A bad file handle was passed."
errorTab[10013] = "Permission denied."
errorTab[10014] = "A fault occurred on the network." # WSAEFAULT
errorTab[10022] = "An invalid operation was attempted."
errorTab[10035] = "The socket operation would block"
errorTab[10036] = "A blocking operation is already in progress."
errorTab[10048] = "The network address is in use."
errorTab[10054] = "The connection has been reset."
errorTab[10058] = "The network has been shut down."
errorTab[10060] = "The operation timed out."
errorTab[10061] = "Connection refused."
errorTab[10063] = "The name is too long."
errorTab[10064] = "The host is down."
errorTab[10065] = "The host is unreachable."
__all__.append("errorTab")
def getfqdn(name=''):
"""Get fully qualified domain name from name.
An empty argument is interpreted as meaning the local host.
First the hostname returned by gethostbyaddr() is checked, then
possibly existing aliases. In case no FQDN is available, the hostname
from gethostname() is returned.
"""
name = name.strip()
if not name or name == '0.0.0.0':
name = gethostname()
try:
hostname, aliases, ipaddrs = gethostbyaddr(name)
except error:
pass
else:
aliases.insert(0, hostname)
for name in aliases:
if '.' in name:
break
else:
name = hostname
return name
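# --- Illustrative note (added): an empty argument resolves the local host,
# e.g. getfqdn('') -> 'myhost.example.com' (placeholder output; the actual
# result depends on the local resolver configuration).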
_socketmethods = (
'bind', 'connect', 'connect_ex', 'fileno', 'listen',
'getpeername', 'getsockname', 'getsockopt', 'setsockopt',
'sendall', 'setblocking',
'settimeout', 'gettimeout', 'shutdown')
if os.name == "nt":
_socketmethods = _socketmethods + ('ioctl',)
if sys.platform == "riscos":
_socketmethods = _socketmethods + ('sleeptaskw',)
# All the method names that must be delegated to either the real socket
# object or the _closedsocket object.
_delegate_methods = ("recv", "recvfrom", "recv_into", "recvfrom_into",
"send", "sendto")
class _closedsocket(object):
__slots__ = []
def _dummy(*args):
raise error(EBADF, 'Bad file descriptor')
# All _delegate_methods must also be initialized here.
send = recv = recv_into = sendto = recvfrom = recvfrom_into = _dummy
__getattr__ = _dummy
# Wrapper around platform socket objects. This implements
# a platform-independent dup() functionality. The
# implementation currently relies on reference counting
# to close the underlying socket object.
class _socketobject(object):
__doc__ = _realsocket.__doc__
__slots__ = ["_sock", "__weakref__"] + list(_delegate_methods)
def __init__(self, family=AF_INET, type=SOCK_STREAM, proto=0, _sock=None):
if _sock is None:
_sock = _realsocket(family, type, proto)
self._sock = _sock
for method in _delegate_methods:
setattr(self, method, getattr(_sock, method))
def close(self, _closedsocket=_closedsocket,
_delegate_methods=_delegate_methods, setattr=setattr):
# This function should not reference any globals. See issue #808164.
self._sock = _closedsocket()
dummy = self._sock._dummy
for method in _delegate_methods:
setattr(self, method, dummy)
close.__doc__ = _realsocket.close.__doc__
def accept(self):
sock, addr = self._sock.accept()
return _socketobject(_sock=sock), addr
accept.__doc__ = _realsocket.accept.__doc__
def dup(self):
"""dup() -> socket object
Return a new socket object connected to the same system resource."""
return _socketobject(_sock=self._sock)
def makefile(self, mode='r', bufsize=-1):
"""makefile([mode[, bufsize]]) -> file object
Return a regular file object corresponding to the socket. The mode
and bufsize arguments are as for the built-in open() function."""
return _fileobject(self._sock, mode, bufsize)
family = property(lambda self: self._sock.family, doc="the socket family")
type = property(lambda self: self._sock.type, doc="the socket type")
proto = property(lambda self: self._sock.proto, doc="the socket protocol")
def meth(name,self,*args):
return getattr(self._sock,name)(*args)
for _m in _socketmethods:
p = partial(meth,_m)
p.__name__ = _m
p.__doc__ = getattr(_realsocket,_m).__doc__
m = MethodType(p,None,_socketobject)
setattr(_socketobject,_m,m)
socket = SocketType = _socketobject
class _fileobject(object):
"""Faux file object attached to a socket object."""
default_bufsize = 8192
name = "<socket>"
__slots__ = ["mode", "bufsize", "softspace",
# "closed" is a property, see below
"_sock", "_rbufsize", "_wbufsize", "_rbuf", "_wbuf", "_wbuf_len",
"_close"]
def __init__(self, sock, mode='rb', bufsize=-1, close=False):
self._sock = sock
self.mode = mode # Not actually used in this version
if bufsize < 0:
bufsize = self.default_bufsize
self.bufsize = bufsize
self.softspace = False
# _rbufsize is the suggested recv buffer size. It is *strictly*
# obeyed within readline() for recv calls. If it is larger than
# default_bufsize it will be used for recv calls within read().
if bufsize == 0:
self._rbufsize = 1
elif bufsize == 1:
self._rbufsize = self.default_bufsize
else:
self._rbufsize = bufsize
self._wbufsize = bufsize
# We use StringIO for the read buffer to avoid holding a list
# of variously sized string objects which have been known to
# fragment the heap due to how they are malloc()ed and often
# realloc()ed down much smaller than their original allocation.
self._rbuf = StringIO()
self._wbuf = [] # A list of strings
self._wbuf_len = 0
self._close = close
def _getclosed(self):
return self._sock is None
closed = property(_getclosed, doc="True if the file is closed")
def close(self):
try:
if self._sock:
self.flush()
finally:
if self._close:
self._sock.close()
self._sock = None
def __del__(self):
try:
self.close()
except:
# close() may fail if __init__ didn't complete
pass
def flush(self):
if self._wbuf:
data = "".join(self._wbuf)
self._wbuf = []
self._wbuf_len = 0
buffer_size = max(self._rbufsize, self.default_bufsize)
data_size = len(data)
write_offset = 0
view = memoryview(data)
try:
while write_offset < data_size:
self._sock.sendall(view[write_offset:write_offset+buffer_size])
write_offset += buffer_size
finally:
if write_offset < data_size:
remainder = data[write_offset:]
del view, data # explicit free
self._wbuf.append(remainder)
self._wbuf_len = len(remainder)
def fileno(self):
return self._sock.fileno()
def write(self, data):
data = str(data) # XXX Should really reject non-string non-buffers
if not data:
return
self._wbuf.append(data)
self._wbuf_len += len(data)
if (self._wbufsize == 0 or
(self._wbufsize == 1 and '\n' in data) or
(self._wbufsize > 1 and self._wbuf_len >= self._wbufsize)):
self.flush()
def writelines(self, list):
# XXX We could do better here for very long lists
# XXX Should really reject non-string non-buffers
lines = filter(None, map(str, list))
self._wbuf_len += sum(map(len, lines))
self._wbuf.extend(lines)
if (self._wbufsize <= 1 or
self._wbuf_len >= self._wbufsize):
self.flush()
def read(self, size=-1):
# Use max, disallow tiny reads in a loop as they are very inefficient.
# We never leave read() with any leftover data from a new recv() call
# in our internal buffer.
rbufsize = max(self._rbufsize, self.default_bufsize)
# Our use of StringIO rather than lists of string objects returned by
# recv() minimizes memory usage and fragmentation that occurs when
# rbufsize is large compared to the typical return value of recv().
buf = self._rbuf
buf.seek(0, 2) # seek end
if size < 0:
# Read until EOF
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
while True:
try:
data = self._sock.recv(rbufsize)
except error, e:
if e.args[0] == EINTR:
continue
raise
if not data:
break
buf.write(data)
return buf.getvalue()
else:
# Read until size bytes or EOF seen, whichever comes first
buf_len = buf.tell()
if buf_len >= size:
# Already have size bytes in our buffer? Extract and return.
buf.seek(0)
rv = buf.read(size)
self._rbuf = StringIO()
self._rbuf.write(buf.read())
return rv
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
while True:
left = size - buf_len
# recv() will malloc the amount of memory given as its
# parameter even though it often returns much less data
# than that. The returned data string is short lived
# as we copy it into a StringIO and free it. This avoids
# fragmentation issues on many platforms.
try:
data = self._sock.recv(left)
except error, e:
if e.args[0] == EINTR:
continue
raise
if not data:
break
n = len(data)
if n == size and not buf_len:
# Shortcut. Avoid buffer data copies when:
# - We have no data in our buffer.
# AND
# - Our call to recv returned exactly the
# number of bytes we were asked to read.
return data
if n == left:
buf.write(data)
del data # explicit free
break
assert n <= left, "recv(%d) returned %d bytes" % (left, n)
buf.write(data)
buf_len += n
del data # explicit free
#assert buf_len == buf.tell()
return buf.getvalue()
def readline(self, size=-1):
buf = self._rbuf
buf.seek(0, 2) # seek end
if buf.tell() > 0:
# check if we already have it in our buffer
buf.seek(0)
bline = buf.readline(size)
if bline.endswith('\n') or len(bline) == size:
self._rbuf = StringIO()
self._rbuf.write(buf.read())
return bline
del bline
if size < 0:
# Read until \n or EOF, whichever comes first
if self._rbufsize <= 1:
# Speed up unbuffered case
buf.seek(0)
buffers = [buf.read()]
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
data = None
recv = self._sock.recv
while True:
try:
while data != "\n":
data = recv(1)
if not data:
break
buffers.append(data)
except error, e:
# The try..except to catch EINTR was moved outside the
# recv loop to avoid the per byte overhead.
if e.args[0] == EINTR:
continue
raise
break
return "".join(buffers)
buf.seek(0, 2) # seek end
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
while True:
try:
data = self._sock.recv(self._rbufsize)
except error, e:
if e.args[0] == EINTR:
continue
raise
if not data:
break
nl = data.find('\n')
if nl >= 0:
nl += 1
buf.write(data[:nl])
self._rbuf.write(data[nl:])
del data
break
buf.write(data)
return buf.getvalue()
else:
# Read until size bytes or \n or EOF seen, whichever comes first
buf.seek(0, 2) # seek end
buf_len = buf.tell()
if buf_len >= size:
buf.seek(0)
rv = buf.read(size)
self._rbuf = StringIO()
self._rbuf.write(buf.read())
return rv
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
while True:
try:
data = self._sock.recv(self._rbufsize)
except error, e:
if e.args[0] == EINTR:
continue
raise
if not data:
break
left = size - buf_len
# did we just receive a newline?
nl = data.find('\n', 0, left)
if nl >= 0:
nl += 1
# save the excess data to _rbuf
self._rbuf.write(data[nl:])
if buf_len:
buf.write(data[:nl])
break
else:
# Shortcut. Avoid data copy through buf when returning
# a substring of our first recv().
return data[:nl]
n = len(data)
if n == size and not buf_len:
# Shortcut. Avoid data copy through buf when
# returning exactly all of our first recv().
return data
if n >= left:
buf.write(data[:left])
self._rbuf.write(data[left:])
break
buf.write(data)
buf_len += n
#assert buf_len == buf.tell()
return buf.getvalue()
def readlines(self, sizehint=0):
total = 0
list = []
while True:
line = self.readline()
if not line:
break
list.append(line)
total += len(line)
if sizehint and total >= sizehint:
break
return list
# Iterator protocols
def __iter__(self):
return self
def next(self):
line = self.readline()
if not line:
raise StopIteration
return line
_GLOBAL_DEFAULT_TIMEOUT = object()
def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
source_address=None):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
A host of '' or port 0 tells the OS to use the default.
"""
host, port = address
err = None
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket(af, socktype, proto)
if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
return sock
except error as _:
err = _
if sock is not None:
sock.close()
if err is not None:
raise err
else:
raise error("getaddrinfo returns an empty list")
"""Generic socket server classes.
This module tries to capture the various aspects of defining a server:
For socket-based servers:
- address family:
- AF_INET{,6}: IP (Internet Protocol) sockets (default)
- AF_UNIX: Unix domain sockets
- others, e.g. AF_DECNET are conceivable (see <socket.h>)
- socket type:
- SOCK_STREAM (reliable stream, e.g. TCP)
- SOCK_DGRAM (datagrams, e.g. UDP)
For request-based servers (including socket-based):
- client address verification before further looking at the request
(This is actually a hook for any processing that needs to look
at the request before anything else, e.g. logging)
- how to handle multiple requests:
- synchronous (one request is handled at a time)
- forking (each request is handled by a new process)
- threading (each request is handled by a new thread)
The classes in this module favor the server type that is simplest to
write: a synchronous TCP/IP server. This is bad class design, but
saves some typing. (There's also the issue that a deep class hierarchy
slows down method lookups.)
There are five classes in an inheritance diagram, four of which represent
synchronous servers of four types:
+------------+
| BaseServer |
+------------+
|
v
+-----------+ +------------------+
| TCPServer |------->| UnixStreamServer |
+-----------+ +------------------+
|
v
+-----------+ +--------------------+
| UDPServer |------->| UnixDatagramServer |
+-----------+ +--------------------+
Note that UnixDatagramServer derives from UDPServer, not from
UnixStreamServer -- the only difference between an IP and a Unix
stream server is the address family, which is simply repeated in both
unix server classes.
Forking and threading versions of each type of server can be created
using the ForkingMixIn and ThreadingMixIn mix-in classes. For
instance, a threading UDP server class is created as follows:
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
The Mix-in class must come first, since it overrides a method defined
in UDPServer! Setting the various member variables also changes
the behavior of the underlying server mechanism.
To implement a service, you must derive a class from
BaseRequestHandler and redefine its handle() method. You can then run
various versions of the service by combining one of the server classes
with your request handler class.
The request handler class must be different for datagram or stream
services. This can be hidden by using the request handler
subclasses StreamRequestHandler or DatagramRequestHandler.
Of course, you still have to use your head!
For instance, it makes no sense to use a forking server if the service
contains state in memory that can be modified by requests (since the
modifications in the child process would never reach the initial state
kept in the parent process and passed to each child). In this case,
you can use a threading server, but you will probably have to use
locks to prevent two requests that come in nearly simultaneously from applying
conflicting changes to the server state.
On the other hand, if you are building e.g. an HTTP server, where all
data is stored externally (e.g. in the file system), a synchronous
class will essentially render the service "deaf" while one request is
being handled -- which may be for a very long time if a client is slow
to read all the data it has requested. Here a threading or forking
server is appropriate.
In some cases, it may be appropriate to process part of a request
synchronously, but to finish processing in a forked child depending on
the request data. This can be implemented by using a synchronous
server and doing an explicit fork in the request handler class
handle() method.
Another approach to handling multiple simultaneous requests in an
environment that supports neither threads nor fork (or where these are
too expensive or inappropriate for the service) is to maintain an
explicit table of partially finished requests and to use select() to
decide which request to work on next (or whether to handle a new
incoming request). This is particularly important for stream services
where each client can potentially be connected for a long time (if
threads or subprocesses cannot be used).
Future work:
- Standard classes for Sun RPC (which uses either UDP or TCP)
- Standard mix-in classes to implement various authentication
and encryption schemes
- Standard framework for select-based multiplexing
XXX Open problems:
- What to do with out-of-band data?
BaseServer:
- split generic "request" functionality out into BaseServer class.
Copyright (C) 2000 Luke Kenneth Casson Leighton <[email protected]>
example: read entries from a SQL database (requires overriding
get_request() to return a table entry from the database).
Each entry is processed by a RequestHandlerClass.
"""
# Author of the BaseServer patch: Luke Kenneth Casson Leighton
__version__ = "0.4"
import socket
import select
import sys
import os
import errno
try:
import threading
except ImportError:
import dummy_threading as threading
__all__ = ["TCPServer","UDPServer","ForkingUDPServer","ForkingTCPServer",
"ThreadingUDPServer","ThreadingTCPServer","BaseRequestHandler",
"StreamRequestHandler","DatagramRequestHandler",
"ThreadingMixIn", "ForkingMixIn"]
if hasattr(socket, "AF_UNIX"):
__all__.extend(["UnixStreamServer","UnixDatagramServer",
"ThreadingUnixStreamServer",
"ThreadingUnixDatagramServer"])
def _eintr_retry(func, *args):
"""restart a system call interrupted by EINTR"""
while True:
try:
return func(*args)
except (OSError, select.error) as e:
if e.args[0] != errno.EINTR:
raise
class BaseServer:
"""Base class for server classes.
Methods for the caller:
- __init__(server_address, RequestHandlerClass)
- serve_forever(poll_interval=0.5)
- shutdown()
- handle_request() # if you do not use serve_forever()
- fileno() -> int # for select()
Methods that may be overridden:
- server_bind()
- server_activate()
- get_request() -> request, client_address
- handle_timeout()
- verify_request(request, client_address)
- server_close()
- process_request(request, client_address)
- shutdown_request(request)
- close_request(request)
- handle_error()
Methods for derived classes:
- finish_request(request, client_address)
Class variables that may be overridden by derived classes or
instances:
- timeout
- address_family
- socket_type
- allow_reuse_address
Instance variables:
- RequestHandlerClass
- socket
"""
timeout = None
def __init__(self, server_address, RequestHandlerClass):
"""Constructor. May be extended, do not override."""
self.server_address = server_address
self.RequestHandlerClass = RequestHandlerClass
self.__is_shut_down = threading.Event()
self.__shutdown_request = False
def server_activate(self):
"""Called by constructor to activate the server.
May be overridden.
"""
pass
def serve_forever(self, poll_interval=0.5):
"""Handle one request at a time until shutdown.
Polls for shutdown every poll_interval seconds. Ignores
self.timeout. If you need to do periodic tasks, do them in
another thread.
"""
self.__is_shut_down.clear()
try:
while not self.__shutdown_request:
# XXX: Consider using another file descriptor or
# connecting to the socket to wake this up instead of
# polling. Polling reduces our responsiveness to a
# shutdown request and wastes cpu at all other times.
r, w, e = _eintr_retry(select.select, [self], [], [],
poll_interval)
# bpo-35017: shutdown() called during select(), exit immediately.
if self.__shutdown_request:
break
if self in r:
self._handle_request_noblock()
finally:
self.__shutdown_request = False
self.__is_shut_down.set()
def shutdown(self):
"""Stops the serve_forever loop.
Blocks until the loop has finished. This must be called while
serve_forever() is running in another thread, or it will
deadlock.
"""
self.__shutdown_request = True
self.__is_shut_down.wait()
# The distinction between handling, getting, processing and
# finishing a request is fairly arbitrary. Remember:
#
# - handle_request() is the top-level call. It calls
# select, get_request(), verify_request() and process_request()
# - get_request() is different for stream or datagram sockets
# - process_request() is the place that may fork a new process
# or create a new thread to finish the request
# - finish_request() instantiates the request handler class;
# this constructor will handle the request all by itself
def handle_request(self):
"""Handle one request, possibly blocking.
Respects self.timeout.
"""
# Support people who used socket.settimeout() to escape
# handle_request before self.timeout was available.
timeout = self.socket.gettimeout()
if timeout is None:
timeout = self.timeout
elif self.timeout is not None:
timeout = min(timeout, self.timeout)
fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
if not fd_sets[0]:
self.handle_timeout()
return
self._handle_request_noblock()
def _handle_request_noblock(self):
"""Handle one request, without blocking.
I assume that select.select has returned that the socket is
readable before this function was called, so there should be
no risk of blocking in get_request().
"""
try:
request, client_address = self.get_request()
except socket.error:
return
if self.verify_request(request, client_address):
try:
self.process_request(request, client_address)
except:
self.handle_error(request, client_address)
self.shutdown_request(request)
else:
self.shutdown_request(request)
def handle_timeout(self):
"""Called if no new request arrives within self.timeout.
Overridden by ForkingMixIn.
"""
pass
def verify_request(self, request, client_address):
"""Verify the request. May be overridden.
Return True if we should proceed with this request.
"""
return True
def process_request(self, request, client_address):
"""Call finish_request.
Overridden by ForkingMixIn and ThreadingMixIn.
"""
self.finish_request(request, client_address)
self.shutdown_request(request)
def server_close(self):
"""Called to clean-up the server.
May be overridden.
"""
pass
def finish_request(self, request, client_address):
"""Finish one request by instantiating RequestHandlerClass."""
self.RequestHandlerClass(request, client_address, self)
def shutdown_request(self, request):
"""Called to shutdown and close an individual request."""
self.close_request(request)
def close_request(self, request):
"""Called to clean up an individual request."""
pass
def handle_error(self, request, client_address):
"""Handle an error gracefully. May be overridden.
The default is to print a traceback and continue.
"""
print '-'*40
print 'Exception happened during processing of request from',
print client_address
import traceback
traceback.print_exc() # XXX But this goes to stderr!
print '-'*40
class TCPServer(BaseServer):
"""Base class for various socket-based server classes.
Defaults to synchronous IP stream (i.e., TCP).
Methods for the caller:
- __init__(server_address, RequestHandlerClass, bind_and_activate=True)
- serve_forever(poll_interval=0.5)
- shutdown()
- handle_request() # if you don't use serve_forever()
- fileno() -> int # for select()
Methods that may be overridden:
- server_bind()
- server_activate()
- get_request() -> request, client_address
- handle_timeout()
- verify_request(request, client_address)
- process_request(request, client_address)
- shutdown_request(request)
- close_request(request)
- handle_error()
Methods for derived classes:
- finish_request(request, client_address)
Class variables that may be overridden by derived classes or
instances:
- timeout
- address_family
- socket_type
- request_queue_size (only for stream sockets)
- allow_reuse_address
Instance variables:
- server_address
- RequestHandlerClass
- socket
"""
address_family = socket.AF_INET
socket_type = socket.SOCK_STREAM
request_queue_size = 5
allow_reuse_address = False
def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
"""Constructor. May be extended, do not override."""
BaseServer.__init__(self, server_address, RequestHandlerClass)
self.socket = socket.socket(self.address_family,
self.socket_type)
if bind_and_activate:
try:
self.server_bind()
self.server_activate()
except:
self.server_close()
raise
def server_bind(self):
"""Called by constructor to bind the socket.
May be overridden.
"""
if self.allow_reuse_address:
self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.socket.bind(self.server_address)
self.server_address = self.socket.getsockname()
def server_activate(self):
"""Called by constructor to activate the server.
May be overridden.
"""
self.socket.listen(self.request_queue_size)
def server_close(self):
"""Called to clean-up the server.
May be overridden.
"""
self.socket.close()
def fileno(self):
"""Return socket file number.
Interface required by select().
"""
return self.socket.fileno()
def get_request(self):
"""Get the request and client address from the socket.
May be overridden.
"""
return self.socket.accept()
def shutdown_request(self, request):
"""Called to shutdown and close an individual request."""
try:
#explicitly shutdown. socket.close() merely releases
#the socket and waits for GC to perform the actual close.
request.shutdown(socket.SHUT_WR)
except socket.error:
pass #some platforms may raise ENOTCONN here
self.close_request(request)
def close_request(self, request):
"""Called to clean up an individual request."""
request.close()
class UDPServer(TCPServer):
"""UDP server class."""
allow_reuse_address = False
socket_type = socket.SOCK_DGRAM
max_packet_size = 8192
def get_request(self):
data, client_addr = self.socket.recvfrom(self.max_packet_size)
return (data, self.socket), client_addr
def server_activate(self):
# No need to call listen() for UDP.
pass
def shutdown_request(self, request):
# No need to shutdown anything.
self.close_request(request)
def close_request(self, request):
# No need to close anything.
pass
class ForkingMixIn:
"""Mix-in class to handle each request in a new process."""
timeout = 300
active_children = None
max_children = 40
def collect_children(self):
"""Internal routine to wait for children that have exited."""
if self.active_children is None:
return
# If we're above the max number of children, wait and reap them until
# we go back below threshold. Note that we use waitpid(-1) below to be
# able to collect children in size(<defunct children>) syscalls instead
# of size(<children>): the downside is that this might reap children
# which we didn't spawn, which is why we only resort to this when we're
# above max_children.
while len(self.active_children) >= self.max_children:
try:
pid, _ = os.waitpid(-1, 0)
self.active_children.discard(pid)
except OSError as e:
if e.errno == errno.ECHILD:
# we don't have any children, we're done
self.active_children.clear()
elif e.errno != errno.EINTR:
break
# Now reap all defunct children.
for pid in self.active_children.copy():
try:
pid, _ = os.waitpid(pid, os.WNOHANG)
# if the child hasn't exited yet, pid will be 0 and ignored by
# discard() below
self.active_children.discard(pid)
except OSError as e:
if e.errno == errno.ECHILD:
# someone else reaped it
self.active_children.discard(pid)
def handle_timeout(self):
"""Wait for zombies after self.timeout seconds of inactivity.
May be extended, do not override.
"""
self.collect_children()
def process_request(self, request, client_address):
"""Fork a new subprocess to process the request."""
self.collect_children()
pid = os.fork()
if pid:
# Parent process
if self.active_children is None:
self.active_children = set()
self.active_children.add(pid)
self.close_request(request) #close handle in parent process
return
else:
# Child process.
# This must never return, hence os._exit()!
try:
self.finish_request(request, client_address)
self.shutdown_request(request)
os._exit(0)
except:
try:
self.handle_error(request, client_address)
self.shutdown_request(request)
finally:
os._exit(1)
class ThreadingMixIn:
"""Mix-in class to handle each request in a new thread."""
# Decides how threads will act upon termination of the
# main process
daemon_threads = False
def process_request_thread(self, request, client_address):
"""Same as in BaseServer but as a thread.
In addition, exception handling is done here.
"""
try:
self.finish_request(request, client_address)
self.shutdown_request(request)
except:
self.handle_error(request, client_address)
self.shutdown_request(request)
def process_request(self, request, client_address):
"""Start a new thread to process the request."""
t = threading.Thread(target = self.process_request_thread,
args = (request, client_address))
t.daemon = self.daemon_threads
t.start()
class ForkingUDPServer(ForkingMixIn, UDPServer): pass
class ForkingTCPServer(ForkingMixIn, TCPServer): pass
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
class ThreadingTCPServer(ThreadingMixIn, TCPServer): pass
if hasattr(socket, 'AF_UNIX'):
class UnixStreamServer(TCPServer):
address_family = socket.AF_UNIX
class UnixDatagramServer(UDPServer):
address_family = socket.AF_UNIX
class ThreadingUnixStreamServer(ThreadingMixIn, UnixStreamServer): pass
class ThreadingUnixDatagramServer(ThreadingMixIn, UnixDatagramServer): pass
class BaseRequestHandler:
"""Base class for request handler classes.
This class is instantiated for each request to be handled. The
constructor sets the instance variables request, client_address
and server, and then calls the handle() method. To implement a
specific service, all you need to do is to derive a class which
defines a handle() method.
The handle() method can find the request as self.request, the
client address as self.client_address, and the server (in case it
needs access to per-server information) as self.server. Since a
separate instance is created for each request, the handle() method
can define other arbitrary instance variables.
"""
def __init__(self, request, client_address, server):
self.request = request
self.client_address = client_address
self.server = server
self.setup()
try:
self.handle()
finally:
self.finish()
def setup(self):
pass
def handle(self):
pass
def finish(self):
pass
# The following two classes make it possible to use the same service
# class for stream or datagram servers.
# Each class sets up these instance variables:
# - rfile: a file object from which the request is read
# - wfile: a file object to which the reply is written
# When the handle() method returns, wfile is flushed properly
class StreamRequestHandler(BaseRequestHandler):
"""Define self.rfile and self.wfile for stream sockets."""
# Default buffer sizes for rfile, wfile.
# We default rfile to buffered because otherwise it could be
# really slow for large data (a getc() call per byte); we make
# wfile unbuffered because (a) often after a write() we want to
# read and we need to flush the line; (b) big writes to unbuffered
# files are typically optimized by stdio even when big reads
# aren't.
rbufsize = -1
wbufsize = 0
# A timeout to apply to the request socket, if not None.
timeout = None
# Disable nagle algorithm for this socket, if True.
# Use only when wbufsize != 0, to avoid small packets.
disable_nagle_algorithm = False
def setup(self):
self.connection = self.request
if self.timeout is not None:
self.connection.settimeout(self.timeout)
if self.disable_nagle_algorithm:
self.connection.setsockopt(socket.IPPROTO_TCP,
socket.TCP_NODELAY, True)
self.rfile = self.connection.makefile('rb', self.rbufsize)
self.wfile = self.connection.makefile('wb', self.wbufsize)
def finish(self):
if not self.wfile.closed:
try:
self.wfile.flush()
except socket.error:
# A final socket error may have occurred here, such as
# the local error ECONNABORTED.
pass
self.wfile.close()
self.rfile.close()
class DatagramRequestHandler(BaseRequestHandler):
"""Define self.rfile and self.wfile for datagram sockets."""
def setup(self):
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
self.packet, self.socket = self.request
self.rfile = StringIO(self.packet)
self.wfile = StringIO()
def finish(self):
self.socket.sendto(self.wfile.getvalue(), self.client_address)
# -*- coding: iso-8859-1 -*-
# pysqlite2/dbapi2.py: the DB-API 2.0 interface
#
# Copyright (C) 2004-2005 Gerhard Häring <[email protected]>
#
# This file is part of pysqlite.
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
# 3. This notice may not be removed or altered from any source distribution.
import collections
import datetime
import time
from _sqlite3 import *
paramstyle = "qmark"
threadsafety = 1
apilevel = "2.0"
Date = datetime.date
Time = datetime.time
Timestamp = datetime.datetime
def DateFromTicks(ticks):
return Date(*time.localtime(ticks)[:3])
def TimeFromTicks(ticks):
return Time(*time.localtime(ticks)[3:6])
def TimestampFromTicks(ticks):
return Timestamp(*time.localtime(ticks)[:6])
version_info = tuple([int(x) for x in version.split(".")])
sqlite_version_info = tuple([int(x) for x in sqlite_version.split(".")])
Binary = buffer
collections.Sequence.register(Row)
def register_adapters_and_converters():
def adapt_date(val):
return val.isoformat()
def adapt_datetime(val):
return val.isoformat(" ")
def convert_date(val):
return datetime.date(*map(int, val.split("-")))
def convert_timestamp(val):
datepart, timepart = val.split(" ")
year, month, day = map(int, datepart.split("-"))
timepart_full = timepart.split(".")
hours, minutes, seconds = map(int, timepart_full[0].split(":"))
if len(timepart_full) == 2:
microseconds = int('{:0<6.6}'.format(timepart_full[1].decode()))
else:
microseconds = 0
val = datetime.datetime(year, month, day, hours, minutes, seconds, microseconds)
return val
register_adapter(datetime.date, adapt_date)
register_adapter(datetime.datetime, adapt_datetime)
register_converter("date", convert_date)
register_converter("timestamp", convert_timestamp)
register_adapters_and_converters()
# Clean up namespace
del(register_adapters_and_converters)
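# --- Illustrative sketch (not part of the original module): the adapters
# and converters registered above take effect when a connection is opened
# with detect_types; the in-memory database and schema are placeholder
# assumptions.
def _example_timestamp_roundtrip():
    con = connect(':memory:', detect_types=PARSE_DECLTYPES)
    con.execute('CREATE TABLE t (ts timestamp)')
    con.execute('INSERT INTO t VALUES (?)',
                (datetime.datetime(2020, 9, 15, 12, 0, 0),))
    # The declared column type 'timestamp' routes the value through
    # convert_timestamp(), so a datetime.datetime instance comes back.
    return con.execute('SELECT ts FROM t').fetchone()[0]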
# Mimic the sqlite3 console shell's .dump command
# Author: Paul Kippes <[email protected]>
# Every identifier in sql is quoted based on a comment in sqlite
# documentation "SQLite adds new keywords from time to time when it
# takes on new features. So to prevent your code from being broken by
# future enhancements, you should normally quote any identifier that
# is an English language word, even if you do not have to."
def _iterdump(connection):
"""
Returns an iterator to the dump of the database in an SQL text format.
Used to produce an SQL dump of the database. Useful to save an in-memory
database for later restoration. This function should not be called
directly but instead called from the Connection method, iterdump().
"""
cu = connection.cursor()
yield('BEGIN TRANSACTION;')
# sqlite_master table contains the SQL CREATE statements for the database.
q = """
SELECT "name", "type", "sql"
FROM "sqlite_master"
WHERE "sql" NOT NULL AND
"type" == 'table'
ORDER BY "name"
"""
schema_res = cu.execute(q)
for table_name, type, sql in schema_res.fetchall():
if table_name == 'sqlite_sequence':
yield('DELETE FROM "sqlite_sequence";')
elif table_name == 'sqlite_stat1':
yield('ANALYZE "sqlite_master";')
elif table_name.startswith('sqlite_'):
continue
# NOTE: Virtual table support not implemented
#elif sql.startswith('CREATE VIRTUAL TABLE'):
# qtable = table_name.replace("'", "''")
# yield("INSERT INTO sqlite_master(type,name,tbl_name,rootpage,sql)"\
# "VALUES('table','{0}','{0}',0,'{1}');".format(
# qtable,
# sql.replace("''")))
else:
yield('%s;' % sql)
# Build the insert statement for each row of the current table
table_name_ident = table_name.replace('"', '""')
res = cu.execute('PRAGMA table_info("{0}")'.format(table_name_ident))
column_names = [str(table_info[1]) for table_info in res.fetchall()]
q = """SELECT 'INSERT INTO "{0}" VALUES({1})' FROM "{0}";""".format(
table_name_ident,
",".join("""'||quote("{0}")||'""".format(col.replace('"', '""')) for col in column_names))
query_res = cu.execute(q)
for row in query_res:
yield("%s;" % row[0])
# Now when the type is 'index', 'trigger', or 'view'
q = """
SELECT "name", "type", "sql"
FROM "sqlite_master"
WHERE "sql" NOT NULL AND
"type" IN ('index', 'trigger', 'view')
"""
schema_res = cu.execute(q)
for name, type, sql in schema_res.fetchall():
yield('%s;' % sql)
yield('COMMIT;')
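# --- Illustrative sketch (not part of the original module): _iterdump() is
# normally reached through the Connection method iterdump(), e.g. to save an
# in-memory database to a script file (the filename is a placeholder).
def _example_dump(con, path='dump.sql'):
    f = open(path, 'w')
    try:
        for line in con.iterdump():
            f.write('%s\n' % line)
    finally:
        f.close()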
#-*- coding: ISO-8859-1 -*-
# pysqlite2/__init__.py: the pysqlite2 package.
#
# Copyright (C) 2005 Gerhard Häring <[email protected]>
#
# This file is part of pysqlite.
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
# 3. This notice may not be removed or altered from any source distribution.
def _():
import sys
if sys.platform == 'cli':
import clr
try:
clr.AddReference('IronPython.SQLite')
except:
pass
_()
del _
from dbapi2 import *
#
# Secret Labs' Regular Expression Engine
#
# various symbols used by the regular expression engine.
# run this script to update the _sre include files!
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# See the sre.py file for information on usage and redistribution.
#
"""Internal support module for sre"""
# update when constants are added or removed
MAGIC = 20031017
try:
from _sre import MAXREPEAT
except ImportError:
import _sre
MAXREPEAT = _sre.MAXREPEAT = 65535
# SRE standard exception (access as sre.error)
# should this really be here?
class error(Exception):
pass
# operators
FAILURE = "failure"
SUCCESS = "success"
ANY = "any"
ANY_ALL = "any_all"
ASSERT = "assert"
ASSERT_NOT = "assert_not"
AT = "at"
BIGCHARSET = "bigcharset"
BRANCH = "branch"
CALL = "call"
CATEGORY = "category"
CHARSET = "charset"
GROUPREF = "groupref"
GROUPREF_IGNORE = "groupref_ignore"
GROUPREF_EXISTS = "groupref_exists"
IN = "in"
IN_IGNORE = "in_ignore"
INFO = "info"
JUMP = "jump"
LITERAL = "literal"
LITERAL_IGNORE = "literal_ignore"
MARK = "mark"
MAX_REPEAT = "max_repeat"
MAX_UNTIL = "max_until"
MIN_REPEAT = "min_repeat"
MIN_UNTIL = "min_until"
NEGATE = "negate"
NOT_LITERAL = "not_literal"
NOT_LITERAL_IGNORE = "not_literal_ignore"
RANGE = "range"
REPEAT = "repeat"
REPEAT_ONE = "repeat_one"
SUBPATTERN = "subpattern"
MIN_REPEAT_ONE = "min_repeat_one"
# positions
AT_BEGINNING = "at_beginning"
AT_BEGINNING_LINE = "at_beginning_line"
AT_BEGINNING_STRING = "at_beginning_string"
AT_BOUNDARY = "at_boundary"
AT_NON_BOUNDARY = "at_non_boundary"
AT_END = "at_end"
AT_END_LINE = "at_end_line"
AT_END_STRING = "at_end_string"
AT_LOC_BOUNDARY = "at_loc_boundary"
AT_LOC_NON_BOUNDARY = "at_loc_non_boundary"
AT_UNI_BOUNDARY = "at_uni_boundary"
AT_UNI_NON_BOUNDARY = "at_uni_non_boundary"
# categories
CATEGORY_DIGIT = "category_digit"
CATEGORY_NOT_DIGIT = "category_not_digit"
CATEGORY_SPACE = "category_space"
CATEGORY_NOT_SPACE = "category_not_space"
CATEGORY_WORD = "category_word"
CATEGORY_NOT_WORD = "category_not_word"
CATEGORY_LINEBREAK = "category_linebreak"
CATEGORY_NOT_LINEBREAK = "category_not_linebreak"
CATEGORY_LOC_WORD = "category_loc_word"
CATEGORY_LOC_NOT_WORD = "category_loc_not_word"
CATEGORY_UNI_DIGIT = "category_uni_digit"
CATEGORY_UNI_NOT_DIGIT = "category_uni_not_digit"
CATEGORY_UNI_SPACE = "category_uni_space"
CATEGORY_UNI_NOT_SPACE = "category_uni_not_space"
CATEGORY_UNI_WORD = "category_uni_word"
CATEGORY_UNI_NOT_WORD = "category_uni_not_word"
CATEGORY_UNI_LINEBREAK = "category_uni_linebreak"
CATEGORY_UNI_NOT_LINEBREAK = "category_uni_not_linebreak"
OPCODES = [
# failure=0 success=1 (just because it looks better that way :-)
FAILURE, SUCCESS,
ANY, ANY_ALL,
ASSERT, ASSERT_NOT,
AT,
BRANCH,
CALL,
CATEGORY,
CHARSET, BIGCHARSET,
GROUPREF, GROUPREF_EXISTS, GROUPREF_IGNORE,
IN, IN_IGNORE,
INFO,
JUMP,
LITERAL, LITERAL_IGNORE,
MARK,
MAX_UNTIL,
MIN_UNTIL,
NOT_LITERAL, NOT_LITERAL_IGNORE,
NEGATE,
RANGE,
REPEAT,
REPEAT_ONE,
SUBPATTERN,
MIN_REPEAT_ONE
]
ATCODES = [
AT_BEGINNING, AT_BEGINNING_LINE, AT_BEGINNING_STRING, AT_BOUNDARY,
AT_NON_BOUNDARY, AT_END, AT_END_LINE, AT_END_STRING,
AT_LOC_BOUNDARY, AT_LOC_NON_BOUNDARY, AT_UNI_BOUNDARY,
AT_UNI_NON_BOUNDARY
]
CHCODES = [
CATEGORY_DIGIT, CATEGORY_NOT_DIGIT, CATEGORY_SPACE,
CATEGORY_NOT_SPACE, CATEGORY_WORD, CATEGORY_NOT_WORD,
CATEGORY_LINEBREAK, CATEGORY_NOT_LINEBREAK, CATEGORY_LOC_WORD,
CATEGORY_LOC_NOT_WORD, CATEGORY_UNI_DIGIT, CATEGORY_UNI_NOT_DIGIT,
CATEGORY_UNI_SPACE, CATEGORY_UNI_NOT_SPACE, CATEGORY_UNI_WORD,
CATEGORY_UNI_NOT_WORD, CATEGORY_UNI_LINEBREAK,
CATEGORY_UNI_NOT_LINEBREAK
]
def makedict(list):
d = {}
i = 0
for item in list:
d[item] = i
i = i + 1
return d
OPCODES = makedict(OPCODES)
ATCODES = makedict(ATCODES)
CHCODES = makedict(CHCODES)
# replacement operations for "ignore case" mode
OP_IGNORE = {
GROUPREF: GROUPREF_IGNORE,
IN: IN_IGNORE,
LITERAL: LITERAL_IGNORE,
NOT_LITERAL: NOT_LITERAL_IGNORE
}
AT_MULTILINE = {
AT_BEGINNING: AT_BEGINNING_LINE,
AT_END: AT_END_LINE
}
AT_LOCALE = {
AT_BOUNDARY: AT_LOC_BOUNDARY,
AT_NON_BOUNDARY: AT_LOC_NON_BOUNDARY
}
AT_UNICODE = {
AT_BOUNDARY: AT_UNI_BOUNDARY,
AT_NON_BOUNDARY: AT_UNI_NON_BOUNDARY
}
CH_LOCALE = {
CATEGORY_DIGIT: CATEGORY_DIGIT,
CATEGORY_NOT_DIGIT: CATEGORY_NOT_DIGIT,
CATEGORY_SPACE: CATEGORY_SPACE,
CATEGORY_NOT_SPACE: CATEGORY_NOT_SPACE,
CATEGORY_WORD: CATEGORY_LOC_WORD,
CATEGORY_NOT_WORD: CATEGORY_LOC_NOT_WORD,
CATEGORY_LINEBREAK: CATEGORY_LINEBREAK,
CATEGORY_NOT_LINEBREAK: CATEGORY_NOT_LINEBREAK
}
CH_UNICODE = {
CATEGORY_DIGIT: CATEGORY_UNI_DIGIT,
CATEGORY_NOT_DIGIT: CATEGORY_UNI_NOT_DIGIT,
CATEGORY_SPACE: CATEGORY_UNI_SPACE,
CATEGORY_NOT_SPACE: CATEGORY_UNI_NOT_SPACE,
CATEGORY_WORD: CATEGORY_UNI_WORD,
CATEGORY_NOT_WORD: CATEGORY_UNI_NOT_WORD,
CATEGORY_LINEBREAK: CATEGORY_UNI_LINEBREAK,
CATEGORY_NOT_LINEBREAK: CATEGORY_UNI_NOT_LINEBREAK
}
# flags
SRE_FLAG_TEMPLATE = 1 # template mode (disable backtracking)
SRE_FLAG_IGNORECASE = 2 # case insensitive
SRE_FLAG_LOCALE = 4 # honour system locale
SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string
SRE_FLAG_UNICODE = 32 # use unicode locale
SRE_FLAG_VERBOSE = 64 # ignore whitespace and comments
SRE_FLAG_DEBUG = 128 # debugging
# flags for INFO primitive
SRE_INFO_PREFIX = 1 # has prefix
SRE_INFO_LITERAL = 2 # entire pattern is literal (given by prefix)
SRE_INFO_CHARSET = 4 # pattern starts with character from given set
if __name__ == "__main__":
def dump(f, d, prefix):
items = d.items()
items.sort(key=lambda a: a[1])
for k, v in items:
f.write("#define %s_%s %s\n" % (prefix, k.upper(), v))
f = open("sre_constants.h", "w")
f.write("""\
/*
* Secret Labs' Regular Expression Engine
*
* regular expression matching engine
*
* NOTE: This file is generated by sre_constants.py. If you need
* to change anything in here, edit sre_constants.py and run it.
*
* Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
*
* See the _sre.c file for information on usage and redistribution.
*/
""")
f.write("#define SRE_MAGIC %d\n" % MAGIC)
dump(f, OPCODES, "SRE_OP")
dump(f, ATCODES, "SRE")
dump(f, CHCODES, "SRE")
f.write("#define SRE_FLAG_TEMPLATE %d\n" % SRE_FLAG_TEMPLATE)
f.write("#define SRE_FLAG_IGNORECASE %d\n" % SRE_FLAG_IGNORECASE)
f.write("#define SRE_FLAG_LOCALE %d\n" % SRE_FLAG_LOCALE)
f.write("#define SRE_FLAG_MULTILINE %d\n" % SRE_FLAG_MULTILINE)
f.write("#define SRE_FLAG_DOTALL %d\n" % SRE_FLAG_DOTALL)
f.write("#define SRE_FLAG_UNICODE %d\n" % SRE_FLAG_UNICODE)
f.write("#define SRE_FLAG_VERBOSE %d\n" % SRE_FLAG_VERBOSE)
f.write("#define SRE_INFO_PREFIX %d\n" % SRE_INFO_PREFIX)
f.write("#define SRE_INFO_LITERAL %d\n" % SRE_INFO_LITERAL)
f.write("#define SRE_INFO_CHARSET %d\n" % SRE_INFO_CHARSET)
f.close()
print "done"
#
# Secret Labs' Regular Expression Engine
#
# convert re-style regular expression to sre pattern
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# See the sre.py file for information on usage and redistribution.
#
"""Internal support module for sre"""
# XXX: show string offset and offending character for all errors
import sys
from sre_constants import *
SPECIAL_CHARS = ".\\[{()*+?^$|"
REPEAT_CHARS = "*+?{"
DIGITS = set("0123456789")
OCTDIGITS = set("01234567")
HEXDIGITS = set("0123456789abcdefABCDEF")
ASCIILETTERS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
WHITESPACE = set(" \t\n\r\v\f")
ESCAPES = {
r"\a": (LITERAL, ord("\a")),
r"\b": (LITERAL, ord("\b")),
r"\f": (LITERAL, ord("\f")),
r"\n": (LITERAL, ord("\n")),
r"\r": (LITERAL, ord("\r")),
r"\t": (LITERAL, ord("\t")),
r"\v": (LITERAL, ord("\v")),
r"\\": (LITERAL, ord("\\"))
}
CATEGORIES = {
r"\A": (AT, AT_BEGINNING_STRING), # start of string
r"\b": (AT, AT_BOUNDARY),
r"\B": (AT, AT_NON_BOUNDARY),
r"\d": (IN, [(CATEGORY, CATEGORY_DIGIT)]),
r"\D": (IN, [(CATEGORY, CATEGORY_NOT_DIGIT)]),
r"\s": (IN, [(CATEGORY, CATEGORY_SPACE)]),
r"\S": (IN, [(CATEGORY, CATEGORY_NOT_SPACE)]),
r"\w": (IN, [(CATEGORY, CATEGORY_WORD)]),
r"\W": (IN, [(CATEGORY, CATEGORY_NOT_WORD)]),
r"\Z": (AT, AT_END_STRING), # end of string
}
FLAGS = {
# standard flags
"i": SRE_FLAG_IGNORECASE,
"L": SRE_FLAG_LOCALE,
"m": SRE_FLAG_MULTILINE,
"s": SRE_FLAG_DOTALL,
"x": SRE_FLAG_VERBOSE,
# extensions
"t": SRE_FLAG_TEMPLATE,
"u": SRE_FLAG_UNICODE,
}
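# --- Worked example (added for illustration): inline pattern flags map
# through the table above, e.g. a pattern beginning with (?im) sets
# SRE_FLAG_IGNORECASE | SRE_FLAG_MULTILINE == 2 | 8 == 10.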
class Pattern:
# master pattern object. keeps track of global attributes
def __init__(self):
self.flags = 0
self.open = []
self.groups = 1
self.groupdict = {}
self.lookbehind = 0
def opengroup(self, name=None):
gid = self.groups
self.groups = gid + 1
if name is not None:
ogid = self.groupdict.get(name, None)
if ogid is not None:
raise error, ("redefinition of group name %s as group %d; "
"was group %d" % (repr(name), gid, ogid))
self.groupdict[name] = gid
self.open.append(gid)
return gid
def closegroup(self, gid):
self.open.remove(gid)
def checkgroup(self, gid):
return gid < self.groups and gid not in self.open
class SubPattern:
# a subpattern, in intermediate form
def __init__(self, pattern, data=None):
self.pattern = pattern
if data is None:
data = []
self.data = data
self.width = None
def dump(self, level=0):
seqtypes = (tuple, list)
for op, av in self.data:
print level*" " + op,
if op == IN:
# member sublanguage
print
for op, a in av:
print (level+1)*" " + op, a
elif op == BRANCH:
print
for i, a in enumerate(av[1]):
if i:
print level*" " + "or"
a.dump(level+1)
elif op == GROUPREF_EXISTS:
condgroup, item_yes, item_no = av
print condgroup
item_yes.dump(level+1)
if item_no:
print level*" " + "else"
item_no.dump(level+1)
elif isinstance(av, seqtypes):
nl = 0
for a in av:
if isinstance(a, SubPattern):
if not nl:
print
a.dump(level+1)
nl = 1
else:
print a,
nl = 0
if not nl:
print
else:
print av
def __repr__(self):
return repr(self.data)
def __len__(self):
return len(self.data)
def __delitem__(self, index):
del self.data[index]
def __getitem__(self, index):
if isinstance(index, slice):
return SubPattern(self.pattern, self.data[index])
return self.data[index]
def __setitem__(self, index, code):
self.data[index] = code
def insert(self, index, code):
self.data.insert(index, code)
def append(self, code):
self.data.append(code)
def getwidth(self):
# determine the width (min, max) for this subpattern
if self.width:
return self.width
lo = hi = 0
UNITCODES = (ANY, RANGE, IN, LITERAL, NOT_LITERAL, CATEGORY)
REPEATCODES = (MIN_REPEAT, MAX_REPEAT)
for op, av in self.data:
if op is BRANCH:
i = MAXREPEAT - 1
j = 0
for av in av[1]:
l, h = av.getwidth()
i = min(i, l)
j = max(j, h)
lo = lo + i
hi = hi + j
elif op is CALL:
i, j = av.getwidth()
lo = lo + i
hi = hi + j
elif op is SUBPATTERN:
i, j = av[1].getwidth()
lo = lo + i
hi = hi + j
elif op in REPEATCODES:
i, j = av[2].getwidth()
lo = lo + i * av[0]
hi = hi + j * av[1]
elif op in UNITCODES:
lo = lo + 1
hi = hi + 1
elif op == SUCCESS:
break
self.width = min(lo, MAXREPEAT - 1), min(hi, MAXREPEAT)
return self.width
class Tokenizer:
def __init__(self, string):
self.string = string
self.index = 0
self.__next()
def __next(self):
if self.index >= len(self.string):
self.next = None
return
char = self.string[self.index]
if char[0] == "\\":
try:
c = self.string[self.index + 1]
except IndexError:
raise error, "bogus escape (end of line)"
char = char + c
self.index = self.index + len(char)
self.next = char
def match(self, char, skip=1):
if char == self.next:
if skip:
self.__next()
return 1
return 0
def get(self):
this = self.next
self.__next()
return this
def tell(self):
return self.index, self.next
def seek(self, index):
self.index, self.next = index
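# Illustrative sketch (hypothetical helper, added for clarity): Tokenizer
# hands back one token at a time, where a backslash escape such as \d
# counts as a single two-character token.
def _demo_tokenizer():
    tok = Tokenizer(r"a\d+")
    tokens = []
    while tok.next is not None:
        tokens.append(tok.get())
    return tokens                        # ['a', '\\d', '+']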
def isident(char):
return "a" <= char <= "z" or "A" <= char <= "Z" or char == "_"
def isdigit(char):
return "0" <= char <= "9"
def isname(name):
# check that group name is a valid string
if not isident(name[0]):
return False
for char in name[1:]:
if not isident(char) and not isdigit(char):
return False
return True
def _class_escape(source, escape, nested):
# handle escape code inside character class
code = ESCAPES.get(escape)
if code:
return code
code = CATEGORIES.get(escape)
if code and code[0] == IN:
return code
try:
c = escape[1:2]
if c == "x":
# hexadecimal escape (exactly two digits)
while source.next in HEXDIGITS and len(escape) < 4:
escape = escape + source.get()
escape = escape[2:]
if len(escape) != 2:
raise error, "bogus escape: %s" % repr("\\" + escape)
return LITERAL, int(escape, 16) & 0xff
elif c in OCTDIGITS:
# octal escape (up to three digits)
while source.next in OCTDIGITS and len(escape) < 4:
escape = escape + source.get()
escape = escape[1:]
return LITERAL, int(escape, 8) & 0xff
elif c in DIGITS:
raise error, "bogus escape: %s" % repr(escape)
if len(escape) == 2:
if sys.py3kwarning and c in ASCIILETTERS:
import warnings
if c in 'Uu':
warnings.warn('bad escape %s; Unicode escapes are '
'supported only since Python 3.3' % escape,
FutureWarning, stacklevel=nested + 6)
else:
warnings.warnpy3k('bad escape %s' % escape,
DeprecationWarning, stacklevel=nested + 6)
return LITERAL, ord(escape[1])
except ValueError:
pass
raise error, "bogus escape: %s" % repr(escape)
def _escape(source, escape, state, nested):
# handle escape code in expression
code = CATEGORIES.get(escape)
if code:
return code
code = ESCAPES.get(escape)
if code:
return code
try:
c = escape[1:2]
if c == "x":
# hexadecimal escape
while source.next in HEXDIGITS and len(escape) < 4:
escape = escape + source.get()
if len(escape) != 4:
raise ValueError
return LITERAL, int(escape[2:], 16) & 0xff
elif c == "0":
# octal escape
while source.next in OCTDIGITS and len(escape) < 4:
escape = escape + source.get()
return LITERAL, int(escape[1:], 8) & 0xff
elif c in DIGITS:
# octal escape *or* decimal group reference (sigh)
if source.next in DIGITS:
escape = escape + source.get()
if (escape[1] in OCTDIGITS and escape[2] in OCTDIGITS and
source.next in OCTDIGITS):
# got three octal digits; this is an octal escape
escape = escape + source.get()
return LITERAL, int(escape[1:], 8) & 0xff
# not an octal escape, so this is a group reference
group = int(escape[1:])
if group < state.groups:
if not state.checkgroup(group):
raise error, "cannot refer to open group"
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning, stacklevel=nested + 6)
return GROUPREF, group
raise ValueError
if len(escape) == 2:
if sys.py3kwarning and c in ASCIILETTERS:
import warnings
if c in 'Uu':
warnings.warn('bad escape %s; Unicode escapes are '
'supported only since Python 3.3' % escape,
FutureWarning, stacklevel=nested + 6)
else:
warnings.warnpy3k('bad escape %s' % escape,
DeprecationWarning, stacklevel=nested + 6)
return LITERAL, ord(escape[1])
except ValueError:
pass
raise error, "bogus escape: %s" % repr(escape)
def _parse_sub(source, state, nested):
# parse an alternation: a|b|c
items = []
itemsappend = items.append
sourcematch = source.match
while 1:
itemsappend(_parse(source, state, nested + 1))
if sourcematch("|"):
continue
if not nested:
break
if not source.next or sourcematch(")", 0):
break
else:
raise error, "pattern not properly closed"
if len(items) == 1:
return items[0]
subpattern = SubPattern(state)
subpatternappend = subpattern.append
# check if all items share a common prefix
while 1:
prefix = None
for item in items:
if not item:
break
if prefix is None:
prefix = item[0]
elif item[0] != prefix:
break
else:
# all subitems start with a common "prefix".
# move it out of the branch
for item in items:
del item[0]
subpatternappend(prefix)
continue # check next one
break
# check if the branch can be replaced by a character set
for item in items:
if len(item) != 1 or item[0][0] != LITERAL:
break
else:
# we can store this as a character set instead of a
# branch (the compiler may optimize this even more)
set = []
setappend = set.append
for item in items:
setappend(item[0])
subpatternappend((IN, set))
return subpattern
subpattern.append((BRANCH, (None, items)))
return subpattern
def _parse_sub_cond(source, state, condgroup, nested):
item_yes = _parse(source, state, nested + 1)
if source.match("|"):
item_no = _parse(source, state, nested + 1)
if source.match("|"):
raise error, "conditional backref with more than two branches"
else:
item_no = None
if source.next and not source.match(")", 0):
raise error, "pattern not properly closed"
subpattern = SubPattern(state)
subpattern.append((GROUPREF_EXISTS, (condgroup, item_yes, item_no)))
return subpattern
_PATTERNENDERS = set("|)")
_ASSERTCHARS = set("=!<")
_LOOKBEHINDASSERTCHARS = set("=!")
_REPEATCODES = set([MIN_REPEAT, MAX_REPEAT])
def _parse(source, state, nested):
# parse a simple pattern
subpattern = SubPattern(state)
# precompute constants into local variables
subpatternappend = subpattern.append
sourceget = source.get
sourcematch = source.match
_len = len
PATTERNENDERS = _PATTERNENDERS
ASSERTCHARS = _ASSERTCHARS
LOOKBEHINDASSERTCHARS = _LOOKBEHINDASSERTCHARS
REPEATCODES = _REPEATCODES
while 1:
if source.next in PATTERNENDERS:
break # end of subpattern
this = sourceget()
if this is None:
break # end of pattern
if state.flags & SRE_FLAG_VERBOSE:
# skip whitespace and comments
if this in WHITESPACE:
continue
if this == "#":
while 1:
this = sourceget()
if this in (None, "\n"):
break
continue
if this and this[0] not in SPECIAL_CHARS:
subpatternappend((LITERAL, ord(this)))
elif this == "[":
# character set
set = []
setappend = set.append
## if sourcematch(":"):
## pass # handle character classes
if sourcematch("^"):
setappend((NEGATE, None))
# check remaining characters
start = set[:]
while 1:
this = sourceget()
if this == "]" and set != start:
break
elif this and this[0] == "\\":
code1 = _class_escape(source, this, nested + 1)
elif this:
code1 = LITERAL, ord(this)
else:
raise error, "unexpected end of regular expression"
if sourcematch("-"):
# potential range
this = sourceget()
if this == "]":
if code1[0] is IN:
code1 = code1[1][0]
setappend(code1)
setappend((LITERAL, ord("-")))
break
elif this:
if this[0] == "\\":
code2 = _class_escape(source, this, nested + 1)
else:
code2 = LITERAL, ord(this)
if code1[0] != LITERAL or code2[0] != LITERAL:
raise error, "bad character range"
lo = code1[1]
hi = code2[1]
if hi < lo:
raise error, "bad character range"
setappend((RANGE, (lo, hi)))
else:
raise error, "unexpected end of regular expression"
else:
if code1[0] is IN:
code1 = code1[1][0]
setappend(code1)
# XXX: <fl> should move set optimization to compiler!
if _len(set)==1 and set[0][0] is LITERAL:
subpatternappend(set[0]) # optimization
elif _len(set)==2 and set[0][0] is NEGATE and set[1][0] is LITERAL:
subpatternappend((NOT_LITERAL, set[1][1])) # optimization
else:
# XXX: <fl> should add charmap optimization here
subpatternappend((IN, set))
elif this and this[0] in REPEAT_CHARS:
# repeat previous item
if this == "?":
min, max = 0, 1
elif this == "*":
min, max = 0, MAXREPEAT
elif this == "+":
min, max = 1, MAXREPEAT
elif this == "{":
if source.next == "}":
subpatternappend((LITERAL, ord(this)))
continue
here = source.tell()
min, max = 0, MAXREPEAT
lo = hi = ""
while source.next in DIGITS:
lo = lo + source.get()
if sourcematch(","):
while source.next in DIGITS:
hi = hi + sourceget()
else:
hi = lo
if not sourcematch("}"):
subpatternappend((LITERAL, ord(this)))
source.seek(here)
continue
if lo:
min = int(lo)
if min >= MAXREPEAT:
raise OverflowError("the repetition number is too large")
if hi:
max = int(hi)
if max >= MAXREPEAT:
raise OverflowError("the repetition number is too large")
if max < min:
raise error("bad repeat interval")
else:
raise error, "not supported"
# figure out which item to repeat
if subpattern:
item = subpattern[-1:]
else:
item = None
if not item or (_len(item) == 1 and item[0][0] == AT):
raise error, "nothing to repeat"
if item[0][0] in REPEATCODES:
raise error, "multiple repeat"
if sourcematch("?"):
subpattern[-1] = (MIN_REPEAT, (min, max, item))
else:
subpattern[-1] = (MAX_REPEAT, (min, max, item))
elif this == ".":
subpatternappend((ANY, None))
elif this == "(":
group = 1
name = None
condgroup = None
if sourcematch("?"):
group = 0
# options
if sourcematch("P"):
# python extensions
if sourcematch("<"):
# named group: skip forward to end of name
name = ""
while 1:
char = sourceget()
if char is None:
raise error, "unterminated name"
if char == ">":
break
name = name + char
group = 1
if not name:
raise error("missing group name")
if not isname(name):
raise error("bad character in group name %r" %
name)
elif sourcematch("="):
# named backreference
name = ""
while 1:
char = sourceget()
if char is None:
raise error, "unterminated name"
if char == ")":
break
name = name + char
if not name:
raise error("missing group name")
if not isname(name):
raise error("bad character in backref group name "
"%r" % name)
gid = state.groupdict.get(name)
if gid is None:
msg = "unknown group name: {0!r}".format(name)
raise error(msg)
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning, stacklevel=nested + 6)
subpatternappend((GROUPREF, gid))
continue
else:
char = sourceget()
if char is None:
raise error, "unexpected end of pattern"
raise error, "unknown specifier: ?P%s" % char
elif sourcematch(":"):
# non-capturing group
group = 2
elif sourcematch("#"):
# comment
while 1:
if source.next is None or source.next == ")":
break
sourceget()
if not sourcematch(")"):
raise error, "unbalanced parenthesis"
continue
elif source.next in ASSERTCHARS:
# lookahead assertions
char = sourceget()
dir = 1
if char == "<":
if source.next not in LOOKBEHINDASSERTCHARS:
raise error, "syntax error"
dir = -1 # lookbehind
char = sourceget()
state.lookbehind += 1
p = _parse_sub(source, state, nested + 1)
if dir < 0:
state.lookbehind -= 1
if not sourcematch(")"):
raise error, "unbalanced parenthesis"
if char == "=":
subpatternappend((ASSERT, (dir, p)))
else:
subpatternappend((ASSERT_NOT, (dir, p)))
continue
elif sourcematch("("):
# conditional backreference group
condname = ""
while 1:
char = sourceget()
if char is None:
raise error, "unterminated name"
if char == ")":
break
condname = condname + char
group = 2
if not condname:
raise error("missing group name")
if isname(condname):
condgroup = state.groupdict.get(condname)
if condgroup is None:
msg = "unknown group name: {0!r}".format(condname)
raise error(msg)
else:
try:
condgroup = int(condname)
except ValueError:
raise error, "bad character in group name"
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning, stacklevel=nested + 6)
else:
# flags
if not source.next in FLAGS:
raise error, "unexpected end of pattern"
while source.next in FLAGS:
state.flags = state.flags | FLAGS[sourceget()]
if group:
# parse group contents
if group == 2:
# anonymous group
group = None
else:
group = state.opengroup(name)
if condgroup:
p = _parse_sub_cond(source, state, condgroup, nested + 1)
else:
p = _parse_sub(source, state, nested + 1)
if not sourcematch(")"):
raise error, "unbalanced parenthesis"
if group is not None:
state.closegroup(group)
subpatternappend((SUBPATTERN, (group, p)))
else:
while 1:
char = sourceget()
if char is None:
raise error, "unexpected end of pattern"
if char == ")":
break
raise error, "unknown extension"
elif this == "^":
subpatternappend((AT, AT_BEGINNING))
elif this == "$":
subpattern.append((AT, AT_END))
elif this and this[0] == "\\":
code = _escape(source, this, state, nested + 1)
subpatternappend(code)
else:
raise error, "parser error"
return subpattern
def parse(str, flags=0, pattern=None):
# parse 're' pattern into list of (opcode, argument) tuples
source = Tokenizer(str)
if pattern is None:
pattern = Pattern()
pattern.flags = flags
pattern.str = str
p = _parse_sub(source, pattern, 0)
if (sys.py3kwarning and
(p.pattern.flags & SRE_FLAG_LOCALE) and
(p.pattern.flags & SRE_FLAG_UNICODE)):
import warnings
warnings.warnpy3k("LOCALE and UNICODE flags are incompatible",
DeprecationWarning, stacklevel=5)
tail = source.get()
if tail == ")":
raise error, "unbalanced parenthesis"
elif tail:
raise error, "bogus characters at end of regular expression"
if not (flags & SRE_FLAG_VERBOSE) and p.pattern.flags & SRE_FLAG_VERBOSE:
# the VERBOSE flag was switched on inside the pattern. To be
# on the safe side, we'll parse the whole thing again...
return parse(str, p.pattern.flags)
if flags & SRE_FLAG_DEBUG:
p.dump()
return p
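# Illustrative sketch (hypothetical helper, added for clarity): parse() is
# the module's entry point, turning a pattern string into the intermediate
# (opcode, argument) form that the compiler consumes.
def _demo_parse():
    p = parse(r"(a|bc)\d")
    return p, p.getwidth()               # width bounds are (2, 3)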
def parse_template(source, pattern):
# parse 're' replacement string into list of literals and
# group references
s = Tokenizer(source)
sget = s.get
p = []
a = p.append
def literal(literal, p=p, pappend=a):
if p and p[-1][0] is LITERAL:
p[-1] = LITERAL, p[-1][1] + literal
else:
pappend((LITERAL, literal))
sep = source[:0]
if type(sep) is type(""):
makechar = chr
else:
makechar = unichr
while 1:
this = sget()
if this is None:
break # end of replacement string
if this and this[0] == "\\":
# group
c = this[1:2]
if c == "g":
name = ""
if s.match("<"):
while 1:
char = sget()
if char is None:
raise error, "unterminated group name"
if char == ">":
break
name = name + char
if not name:
raise error, "missing group name"
try:
index = int(name)
if index < 0:
raise error, "negative group number"
except ValueError:
if not isname(name):
raise error, "bad character in group name"
try:
index = pattern.groupindex[name]
except KeyError:
msg = "unknown group name: {0!r}".format(name)
raise IndexError(msg)
a((MARK, index))
elif c == "0":
if s.next in OCTDIGITS:
this = this + sget()
if s.next in OCTDIGITS:
this = this + sget()
literal(makechar(int(this[1:], 8) & 0xff))
elif c in DIGITS:
isoctal = False
if s.next in DIGITS:
this = this + sget()
if (c in OCTDIGITS and this[2] in OCTDIGITS and
s.next in OCTDIGITS):
this = this + sget()
isoctal = True
literal(makechar(int(this[1:], 8) & 0xff))
if not isoctal:
a((MARK, int(this[1:])))
else:
try:
this = makechar(ESCAPES[this][1])
except KeyError:
if sys.py3kwarning and c in ASCIILETTERS:
import warnings
warnings.warnpy3k('bad escape %s' % this,
DeprecationWarning, stacklevel=4)
literal(this)
else:
literal(this)
# convert template to groups and literals lists
i = 0
groups = []
groupsappend = groups.append
literals = [None] * len(p)
for c, s in p:
if c is MARK:
groupsappend((i, s))
# literal[i] is already None
else:
literals[i] = s
i = i + 1
return groups, literals
def expand_template(template, match):
g = match.group
sep = match.string[:0]
groups, literals = template
literals = literals[:]
try:
for index, group in groups:
literals[index] = s = g(group)
if s is None:
raise error, "unmatched group"
except IndexError:
raise error, "invalid group reference"
return sep.join(literals)
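# Illustrative sketch (hypothetical helper, added for clarity): the two
# template helpers above are what re.sub() uses internally. A compiled
# pattern supplies groupindex for named references, and a match object
# supplies the group values.
def _demo_template_expansion():
    import re
    m = re.match(r"(\w+) (\w+)", "hello world")
    template = parse_template(r"\2, \1!", m.re)
    return expand_template(template, m)  # 'world, hello!'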
# Wrapper module for _ssl, providing some additional facilities
# implemented in Python. Written by Bill Janssen.
"""This module provides some more Pythonic support for SSL.
Object types:
SSLSocket -- subtype of socket.socket which does SSL over the socket
Exceptions:
SSLError -- exception raised for I/O errors
Functions:
cert_time_to_seconds -- convert time string used for certificate
notBefore and notAfter functions to integer
seconds past the Epoch (the time values
returned from time.time())
get_server_certificate (ADDR) -- fetch the certificate provided
by the server running at ADDR (a (host, port) pair). By default
no validation of the certificate is performed.
Integer constants:
SSL_ERROR_ZERO_RETURN
SSL_ERROR_WANT_READ
SSL_ERROR_WANT_WRITE
SSL_ERROR_WANT_X509_LOOKUP
SSL_ERROR_SYSCALL
SSL_ERROR_SSL
SSL_ERROR_WANT_CONNECT
SSL_ERROR_EOF
SSL_ERROR_INVALID_ERROR_CODE
The following constants define certificate requirements that one side
allows/requires from the other side:
CERT_NONE - no certificates from the other side are required (or will
be looked at if provided)
CERT_OPTIONAL - certificates are not required, but if provided will be
validated, and if validation fails, the connection will
also fail
CERT_REQUIRED - certificates are required, and will be validated, and
if validation fails, the connection will also fail
The following constants identify various SSL protocol variants:
PROTOCOL_SSLv2
PROTOCOL_SSLv3
PROTOCOL_SSLv23
PROTOCOL_TLS
PROTOCOL_TLSv1
PROTOCOL_TLSv1_1
PROTOCOL_TLSv1_2
The following constants identify various SSL alert message descriptions as per
http://www.iana.org/assignments/tls-parameters/tls-parameters.xml#tls-parameters-6
ALERT_DESCRIPTION_CLOSE_NOTIFY
ALERT_DESCRIPTION_UNEXPECTED_MESSAGE
ALERT_DESCRIPTION_BAD_RECORD_MAC
ALERT_DESCRIPTION_RECORD_OVERFLOW
ALERT_DESCRIPTION_DECOMPRESSION_FAILURE
ALERT_DESCRIPTION_HANDSHAKE_FAILURE
ALERT_DESCRIPTION_BAD_CERTIFICATE
ALERT_DESCRIPTION_UNSUPPORTED_CERTIFICATE
ALERT_DESCRIPTION_CERTIFICATE_REVOKED
ALERT_DESCRIPTION_CERTIFICATE_EXPIRED
ALERT_DESCRIPTION_CERTIFICATE_UNKNOWN
ALERT_DESCRIPTION_ILLEGAL_PARAMETER
ALERT_DESCRIPTION_UNKNOWN_CA
ALERT_DESCRIPTION_ACCESS_DENIED
ALERT_DESCRIPTION_DECODE_ERROR
ALERT_DESCRIPTION_DECRYPT_ERROR
ALERT_DESCRIPTION_PROTOCOL_VERSION
ALERT_DESCRIPTION_INSUFFICIENT_SECURITY
ALERT_DESCRIPTION_INTERNAL_ERROR
ALERT_DESCRIPTION_USER_CANCELLED
ALERT_DESCRIPTION_NO_RENEGOTIATION
ALERT_DESCRIPTION_UNSUPPORTED_EXTENSION
ALERT_DESCRIPTION_CERTIFICATE_UNOBTAINABLE
ALERT_DESCRIPTION_UNRECOGNIZED_NAME
ALERT_DESCRIPTION_BAD_CERTIFICATE_STATUS_RESPONSE
ALERT_DESCRIPTION_BAD_CERTIFICATE_HASH_VALUE
ALERT_DESCRIPTION_UNKNOWN_PSK_IDENTITY
"""
import textwrap
import re
import sys
import os
from collections import namedtuple
from contextlib import closing
import _ssl # if we can't import it, let the error propagate
from _ssl import OPENSSL_VERSION_NUMBER, OPENSSL_VERSION_INFO, OPENSSL_VERSION
from _ssl import _SSLContext
from _ssl import (
SSLError, SSLZeroReturnError, SSLWantReadError, SSLWantWriteError,
SSLSyscallError, SSLEOFError,
)
from _ssl import CERT_NONE, CERT_OPTIONAL, CERT_REQUIRED
from _ssl import txt2obj as _txt2obj, nid2obj as _nid2obj
from _ssl import RAND_status, RAND_add
try:
from _ssl import RAND_egd
except ImportError:
# LibreSSL does not provide RAND_egd
pass
def _import_symbols(prefix):
for n in dir(_ssl):
if n.startswith(prefix):
globals()[n] = getattr(_ssl, n)
_import_symbols('OP_')
_import_symbols('ALERT_DESCRIPTION_')
_import_symbols('SSL_ERROR_')
_import_symbols('PROTOCOL_')
_import_symbols('VERIFY_')
from _ssl import HAS_SNI, HAS_ECDH, HAS_NPN, HAS_ALPN, HAS_TLSv1_3
from _ssl import _OPENSSL_API_VERSION
_PROTOCOL_NAMES = {value: name for name, value in globals().items()
if name.startswith('PROTOCOL_')
and name != 'PROTOCOL_SSLv23'}
PROTOCOL_SSLv23 = PROTOCOL_TLS
try:
_SSLv2_IF_EXISTS = PROTOCOL_SSLv2
except NameError:
_SSLv2_IF_EXISTS = None
from socket import socket, _fileobject, _delegate_methods, error as socket_error
if sys.platform == "win32" or os.name == "nt":
from _ssl import enum_certificates, enum_crls
from socket import socket, AF_INET, SOCK_STREAM, create_connection
from socket import SOL_SOCKET, SO_TYPE
import base64 # for DER-to-PEM translation
import errno
import warnings
if _ssl.HAS_TLS_UNIQUE:
CHANNEL_BINDING_TYPES = ['tls-unique']
else:
CHANNEL_BINDING_TYPES = []
# Disable weak or insecure ciphers by default
# (OpenSSL's default setting is 'DEFAULT:!aNULL:!eNULL')
# Enable a better set of ciphers by default
# This list has been explicitly chosen to:
# * Include TLS 1.3 ChaCha20 and AES-GCM cipher suites
# * Prefer cipher suites that offer perfect forward secrecy (DHE/ECDHE)
# * Prefer ECDHE over DHE for better performance
# * Prefer AEAD over CBC for better performance and security
# * Prefer AES-GCM over ChaCha20 because most platforms have AES-NI
# (ChaCha20 needs OpenSSL 1.1.0 or patched 1.0.2)
# * Prefer any AES-GCM and ChaCha20 over any AES-CBC for better
# performance and security
# * Then use HIGH cipher suites as a fallback
# * Disable NULL authentication, NULL encryption, 3DES and MD5 MACs
# for security reasons
_DEFAULT_CIPHERS = (
'TLS13-AES-256-GCM-SHA384:TLS13-CHACHA20-POLY1305-SHA256:'
'TLS13-AES-128-GCM-SHA256:'
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:'
'ECDH+AES128:DH+AES:ECDH+HIGH:DH+HIGH:RSA+AESGCM:RSA+AES:RSA+HIGH:'
'!aNULL:!eNULL:!MD5:!3DES'
)
# Restricted and more secure ciphers for the server side
# This list has been explicitly chosen to:
# * Include TLS 1.3 ChaCha20 and AES-GCM cipher suites
# * Prefer cipher suites that offer perfect forward secrecy (DHE/ECDHE)
# * Prefer ECDHE over DHE for better performance
# * Prefer AEAD over CBC for better performance and security
# * Prefer AES-GCM over ChaCha20 because most platforms have AES-NI
# * Prefer any AES-GCM and ChaCha20 over any AES-CBC for better
# performance and security
# * Then use HIGH cipher suites as a fallback
# * Disable NULL authentication, NULL encryption, MD5 MACs, DSS, RC4, and
# 3DES for security reasons
_RESTRICTED_SERVER_CIPHERS = (
'TLS13-AES-256-GCM-SHA384:TLS13-CHACHA20-POLY1305-SHA256:'
'TLS13-AES-128-GCM-SHA256:'
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:'
'ECDH+AES128:DH+AES:ECDH+HIGH:DH+HIGH:RSA+AESGCM:RSA+AES:RSA+HIGH:'
'!aNULL:!eNULL:!MD5:!DSS:!RC4:!3DES'
)
class CertificateError(ValueError):
pass
def _dnsname_match(dn, hostname, max_wildcards=1):
"""Matching according to RFC 6125, section 6.4.3
http://tools.ietf.org/html/rfc6125#section-6.4.3
"""
pats = []
if not dn:
return False
pieces = dn.split(r'.')
leftmost = pieces[0]
remainder = pieces[1:]
wildcards = leftmost.count('*')
if wildcards > max_wildcards:
# Issue #17980: avoid denials of service by refusing more
# than one wildcard per fragment. A survey of established
# policy among SSL implementations showed it to be a
# reasonable choice.
raise CertificateError(
"too many wildcards in certificate DNS name: " + repr(dn))
# speed up common case w/o wildcards
if not wildcards:
return dn.lower() == hostname.lower()
# RFC 6125, section 6.4.3, subitem 1.
# The client SHOULD NOT attempt to match a presented identifier in which
# the wildcard character comprises a label other than the left-most label.
if leftmost == '*':
# When '*' is a fragment by itself, it matches a non-empty dotless
# fragment.
pats.append('[^.]+')
elif leftmost.startswith('xn--') or hostname.startswith('xn--'):
# RFC 6125, section 6.4.3, subitem 3.
# The client SHOULD NOT attempt to match a presented identifier
# where the wildcard character is embedded within an A-label or
# U-label of an internationalized domain name.
pats.append(re.escape(leftmost))
else:
# Otherwise, '*' matches any dotless string, e.g. www*
pats.append(re.escape(leftmost).replace(r'\*', '[^.]*'))
# add the remaining fragments, ignore any wildcards
for frag in remainder:
pats.append(re.escape(frag))
pat = re.compile(r'\A' + r'\.'.join(pats) + r'\Z', re.IGNORECASE)
return pat.match(hostname)
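# Illustrative sketch (hypothetical helper, added for clarity): a single
# '*' matches exactly one non-empty DNS label, so a wildcard certificate
# does not cover additional subdomain levels.
def _demo_dnsname_match():
    assert _dnsname_match('*.example.com', 'www.example.com')
    assert not _dnsname_match('*.example.com', 'a.b.example.com')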
def match_hostname(cert, hostname):
"""Verify that *cert* (in decoded format as returned by
SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 and RFC 6125
rules are followed, but IP addresses are not accepted for *hostname*.
CertificateError is raised on failure. On success, the function
returns nothing.
"""
if not cert:
raise ValueError("empty or no certificate, match_hostname needs a "
"SSL socket or SSL context with either "
"CERT_OPTIONAL or CERT_REQUIRED")
dnsnames = []
san = cert.get('subjectAltName', ())
for key, value in san:
if key == 'DNS':
if _dnsname_match(value, hostname):
return
dnsnames.append(value)
if not dnsnames:
# The subject is only checked when there is no dNSName entry
# in subjectAltName
for sub in cert.get('subject', ()):
for key, value in sub:
# XXX according to RFC 2818, the most specific Common Name
# must be used.
if key == 'commonName':
if _dnsname_match(value, hostname):
return
dnsnames.append(value)
if len(dnsnames) > 1:
raise CertificateError("hostname %r "
"doesn't match either of %s"
% (hostname, ', '.join(map(repr, dnsnames))))
elif len(dnsnames) == 1:
raise CertificateError("hostname %r "
"doesn't match %r"
% (hostname, dnsnames[0]))
else:
raise CertificateError("no appropriate commonName or "
"subjectAltName fields were found")
DefaultVerifyPaths = namedtuple("DefaultVerifyPaths",
"cafile capath openssl_cafile_env openssl_cafile openssl_capath_env "
"openssl_capath")
def get_default_verify_paths():
"""Return paths to default cafile and capath.
"""
parts = _ssl.get_default_verify_paths()
# environment vars shadow paths
cafile = os.environ.get(parts[0], parts[1])
capath = os.environ.get(parts[2], parts[3])
return DefaultVerifyPaths(cafile if os.path.isfile(cafile) else None,
capath if os.path.isdir(capath) else None,
*parts)
class _ASN1Object(namedtuple("_ASN1Object", "nid shortname longname oid")):
"""ASN.1 object identifier lookup
"""
__slots__ = ()
def __new__(cls, oid):
return super(_ASN1Object, cls).__new__(cls, *_txt2obj(oid, name=False))
@classmethod
def fromnid(cls, nid):
"""Create _ASN1Object from OpenSSL numeric ID
"""
return super(_ASN1Object, cls).__new__(cls, *_nid2obj(nid))
@classmethod
def fromname(cls, name):
"""Create _ASN1Object from short name, long name or OID
"""
return super(_ASN1Object, cls).__new__(cls, *_txt2obj(name, name=True))
class Purpose(_ASN1Object):
"""SSLContext purpose flags with X509v3 Extended Key Usage objects
"""
Purpose.SERVER_AUTH = Purpose('1.3.6.1.5.5.7.3.1')
Purpose.CLIENT_AUTH = Purpose('1.3.6.1.5.5.7.3.2')
class SSLContext(_SSLContext):
"""An SSLContext holds various SSL-related configuration options and
data, such as certificates and possibly a private key."""
__slots__ = ('protocol', '__weakref__')
_windows_cert_stores = ("CA", "ROOT")
def __new__(cls, protocol, *args, **kwargs):
self = _SSLContext.__new__(cls, protocol)
if protocol != _SSLv2_IF_EXISTS:
self.set_ciphers(_DEFAULT_CIPHERS)
return self
def __init__(self, protocol):
self.protocol = protocol
def wrap_socket(self, sock, server_side=False,
do_handshake_on_connect=True,
suppress_ragged_eofs=True,
server_hostname=None):
return SSLSocket(sock=sock, server_side=server_side,
do_handshake_on_connect=do_handshake_on_connect,
suppress_ragged_eofs=suppress_ragged_eofs,
server_hostname=server_hostname,
_context=self)
def set_npn_protocols(self, npn_protocols):
protos = bytearray()
for protocol in npn_protocols:
b = protocol.encode('ascii')
if len(b) == 0 or len(b) > 255:
raise SSLError('NPN protocols must be 1 to 255 in length')
protos.append(len(b))
protos.extend(b)
self._set_npn_protocols(protos)
def set_alpn_protocols(self, alpn_protocols):
protos = bytearray()
for protocol in alpn_protocols:
b = protocol.encode('ascii')
if len(b) == 0 or len(b) > 255:
raise SSLError('ALPN protocols must be 1 to 255 in length')
protos.append(len(b))
protos.extend(b)
self._set_alpn_protocols(protos)
def _load_windows_store_certs(self, storename, purpose):
certs = bytearray()
try:
for cert, encoding, trust in enum_certificates(storename):
# CA certs are never PKCS#7 encoded
if encoding == "x509_asn":
if trust is True or purpose.oid in trust:
certs.extend(cert)
except OSError:
warnings.warn("unable to enumerate Windows certificate store")
if certs:
self.load_verify_locations(cadata=certs)
return certs
def load_default_certs(self, purpose=Purpose.SERVER_AUTH):
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
if sys.platform == "win32" or os.name == "nt":
for storename in self._windows_cert_stores:
self._load_windows_store_certs(storename, purpose)
self.set_default_verify_paths()
def create_default_context(purpose=Purpose.SERVER_AUTH, cafile=None,
capath=None, cadata=None):
"""Create a SSLContext object with default settings.
NOTE: The protocol and settings may change anytime without prior
deprecation. The values represent a fair balance between maximum
compatibility and security.
"""
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
# SSLContext sets OP_NO_SSLv2, OP_NO_SSLv3, OP_NO_COMPRESSION,
# OP_CIPHER_SERVER_PREFERENCE, OP_SINGLE_DH_USE and OP_SINGLE_ECDH_USE
# by default.
context = SSLContext(PROTOCOL_TLS)
if purpose == Purpose.SERVER_AUTH:
# verify certs and host name in client mode
context.verify_mode = CERT_REQUIRED
context.check_hostname = True
elif purpose == Purpose.CLIENT_AUTH:
context.set_ciphers(_RESTRICTED_SERVER_CIPHERS)
if cafile or capath or cadata:
context.load_verify_locations(cafile, capath, cadata)
elif context.verify_mode != CERT_NONE:
# no explicit cafile, capath or cadata but the verify mode is
# CERT_OPTIONAL or CERT_REQUIRED. Let's try to load default system
# root CA certificates for the given purpose. This may fail silently.
context.load_default_certs(purpose)
return context
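# Illustrative sketch (hypothetical helper, added for clarity): the
# default client-side context arrives with certificate and hostname
# verification switched on.
def _demo_default_context():
    ctx = create_default_context()
    assert ctx.verify_mode == CERT_REQUIRED
    assert ctx.check_hostname
    return ctx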
def _create_unverified_context(protocol=PROTOCOL_TLS, cert_reqs=None,
check_hostname=False, purpose=Purpose.SERVER_AUTH,
certfile=None, keyfile=None,
cafile=None, capath=None, cadata=None):
"""Create a SSLContext object for Python stdlib modules
All Python stdlib modules shall use this function to create SSLContext
objects in order to keep common settings in one place. The configuration
is less restrictive than create_default_context()'s to increase backward
compatibility.
"""
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
# SSLContext sets OP_NO_SSLv2, OP_NO_SSLv3, OP_NO_COMPRESSION,
# OP_CIPHER_SERVER_PREFERENCE, OP_SINGLE_DH_USE and OP_SINGLE_ECDH_USE
# by default.
context = SSLContext(protocol)
if cert_reqs is not None:
context.verify_mode = cert_reqs
context.check_hostname = check_hostname
if keyfile and not certfile:
raise ValueError("certfile must be specified")
if certfile or keyfile:
context.load_cert_chain(certfile, keyfile)
# load CA root certs
if cafile or capath or cadata:
context.load_verify_locations(cafile, capath, cadata)
elif context.verify_mode != CERT_NONE:
# no explicit cafile, capath or cadata but the verify mode is
# CERT_OPTIONAL or CERT_REQUIRED. Let's try to load default system
# root CA certificates for the given purpose. This may fail silently.
context.load_default_certs(purpose)
return context
# Backwards compatibility alias, even though it's not a public name.
_create_stdlib_context = _create_unverified_context
# PEP 493: Verify HTTPS by default, but allow envvar to override that
_https_verify_envvar = 'PYTHONHTTPSVERIFY'
def _get_https_context_factory():
if not sys.flags.ignore_environment:
config_setting = os.environ.get(_https_verify_envvar)
if config_setting == '0':
return _create_unverified_context
return create_default_context
_create_default_https_context = _get_https_context_factory()
# PEP 493: "private" API to configure HTTPS defaults without monkeypatching
def _https_verify_certificates(enable=True):
"""Verify server HTTPS certificates by default?"""
global _create_default_https_context
if enable:
_create_default_https_context = create_default_context
else:
_create_default_https_context = _create_unverified_context
class SSLSocket(socket):
"""This class implements a subtype of socket.socket that wraps
the underlying OS socket in an SSL context when necessary, and
provides read and write methods over that channel."""
def __init__(self, sock=None, keyfile=None, certfile=None,
server_side=False, cert_reqs=CERT_NONE,
ssl_version=PROTOCOL_TLS, ca_certs=None,
do_handshake_on_connect=True,
family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None,
suppress_ragged_eofs=True, npn_protocols=None, ciphers=None,
server_hostname=None,
_context=None):
self._makefile_refs = 0
if _context:
self._context = _context
else:
if server_side and not certfile:
raise ValueError("certfile must be specified for server-side "
"operations")
if keyfile and not certfile:
raise ValueError("certfile must be specified")
if certfile and not keyfile:
keyfile = certfile
self._context = SSLContext(ssl_version)
self._context.verify_mode = cert_reqs
if ca_certs:
self._context.load_verify_locations(ca_certs)
if certfile:
self._context.load_cert_chain(certfile, keyfile)
if npn_protocols:
self._context.set_npn_protocols(npn_protocols)
if ciphers:
self._context.set_ciphers(ciphers)
self.keyfile = keyfile
self.certfile = certfile
self.cert_reqs = cert_reqs
self.ssl_version = ssl_version
self.ca_certs = ca_certs
self.ciphers = ciphers
# Can't use sock.type as other flags (such as SOCK_NONBLOCK) get
# mixed in.
if sock.getsockopt(SOL_SOCKET, SO_TYPE) != SOCK_STREAM:
raise NotImplementedError("only stream sockets are supported")
socket.__init__(self, _sock=sock._sock)
# The initializer for socket overrides the methods send(), recv(), etc.
# in the instance, which we don't need -- but we want to provide the
# methods defined in SSLSocket.
for attr in _delegate_methods:
try:
delattr(self, attr)
except AttributeError:
pass
if server_side and server_hostname:
raise ValueError("server_hostname can only be specified "
"in client mode")
if self._context.check_hostname and not server_hostname:
raise ValueError("check_hostname requires server_hostname")
self.server_side = server_side
self.server_hostname = server_hostname
self.do_handshake_on_connect = do_handshake_on_connect
self.suppress_ragged_eofs = suppress_ragged_eofs
# See if we are connected
try:
self.getpeername()
except socket_error as e:
if e.errno != errno.ENOTCONN:
raise
connected = False
else:
connected = True
self._closed = False
self._sslobj = None
self._connected = connected
if connected:
# create the SSL object
try:
self._sslobj = self._context._wrap_socket(self._sock, server_side,
server_hostname, ssl_sock=self)
if do_handshake_on_connect:
timeout = self.gettimeout()
if timeout == 0.0:
# non-blocking
raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
self.do_handshake()
except (OSError, ValueError):
self.close()
raise
@property
def context(self):
return self._context
@context.setter
def context(self, ctx):
self._context = ctx
self._sslobj.context = ctx
def dup(self):
raise NotImplementedError("Can't dup() %s instances" %
self.__class__.__name__)
def _checkClosed(self, msg=None):
# raise an exception here if you wish to check for spurious closes
pass
def _check_connected(self):
if not self._connected:
# getpeername() will raise ENOTCONN if the socket is really
# not connected; note that we can be connected even without
# _connected being set, e.g. if connect() first returned
# EAGAIN.
self.getpeername()
def read(self, len=1024, buffer=None):
"""Read up to LEN bytes and return them.
Return zero-length string on EOF."""
self._checkClosed()
if not self._sslobj:
raise ValueError("Read on closed or unwrapped SSL socket.")
try:
if buffer is not None:
v = self._sslobj.read(len, buffer)
else:
v = self._sslobj.read(len)
return v
except SSLError as x:
if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:
if buffer is not None:
return 0
else:
return b''
else:
raise
def write(self, data):
"""Write DATA to the underlying SSL channel. Returns
number of bytes of DATA actually transmitted."""
self._checkClosed()
if not self._sslobj:
raise ValueError("Write on closed or unwrapped SSL socket.")
return self._sslobj.write(data)
def getpeercert(self, binary_form=False):
"""Returns a formatted version of the data in the
certificate provided by the other end of the SSL channel.
Return None if no certificate was provided, {} if a
certificate was provided, but not validated."""
self._checkClosed()
self._check_connected()
return self._sslobj.peer_certificate(binary_form)
def selected_npn_protocol(self):
self._checkClosed()
if not self._sslobj or not _ssl.HAS_NPN:
return None
else:
return self._sslobj.selected_npn_protocol()
def selected_alpn_protocol(self):
self._checkClosed()
if not self._sslobj or not _ssl.HAS_ALPN:
return None
else:
return self._sslobj.selected_alpn_protocol()
def cipher(self):
self._checkClosed()
if not self._sslobj:
return None
else:
return self._sslobj.cipher()
def compression(self):
self._checkClosed()
if not self._sslobj:
return None
else:
return self._sslobj.compression()
def send(self, data, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to send() on %s" %
self.__class__)
try:
v = self._sslobj.write(data)
except SSLError as x:
if x.args[0] == SSL_ERROR_WANT_READ:
return 0
elif x.args[0] == SSL_ERROR_WANT_WRITE:
return 0
else:
raise
else:
return v
else:
return self._sock.send(data, flags)
def sendto(self, data, flags_or_addr, addr=None):
self._checkClosed()
if self._sslobj:
raise ValueError("sendto not allowed on instances of %s" %
self.__class__)
elif addr is None:
return self._sock.sendto(data, flags_or_addr)
else:
return self._sock.sendto(data, flags_or_addr, addr)
def sendall(self, data, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to sendall() on %s" %
self.__class__)
amount = len(data)
count = 0
while (count < amount):
v = self.send(data[count:])
count += v
return amount
else:
return socket.sendall(self, data, flags)
def recv(self, buflen=1024, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to recv() on %s" %
self.__class__)
return self.read(buflen)
else:
return self._sock.recv(buflen, flags)
def recv_into(self, buffer, nbytes=None, flags=0):
self._checkClosed()
if buffer and (nbytes is None):
nbytes = len(buffer)
elif nbytes is None:
nbytes = 1024
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to recv_into() on %s" %
self.__class__)
return self.read(nbytes, buffer)
else:
return self._sock.recv_into(buffer, nbytes, flags)
def recvfrom(self, buflen=1024, flags=0):
self._checkClosed()
if self._sslobj:
raise ValueError("recvfrom not allowed on instances of %s" %
self.__class__)
else:
return self._sock.recvfrom(buflen, flags)
def recvfrom_into(self, buffer, nbytes=None, flags=0):
self._checkClosed()
if self._sslobj:
raise ValueError("recvfrom_into not allowed on instances of %s" %
self.__class__)
else:
return self._sock.recvfrom_into(buffer, nbytes, flags)
def pending(self):
self._checkClosed()
if self._sslobj:
return self._sslobj.pending()
else:
return 0
def shutdown(self, how):
self._checkClosed()
self._sslobj = None
socket.shutdown(self, how)
def close(self):
if self._makefile_refs < 1:
self._sslobj = None
socket.close(self)
else:
self._makefile_refs -= 1
def unwrap(self):
if self._sslobj:
s = self._sslobj.shutdown()
self._sslobj = None
return s
else:
raise ValueError("No SSL wrapper around " + str(self))
def _real_close(self):
self._sslobj = None
socket._real_close(self)
def do_handshake(self, block=False):
"""Perform a TLS/SSL handshake."""
self._check_connected()
timeout = self.gettimeout()
try:
if timeout == 0.0 and block:
self.settimeout(None)
self._sslobj.do_handshake()
finally:
self.settimeout(timeout)
if self.context.check_hostname:
if not self.server_hostname:
raise ValueError("check_hostname needs server_hostname "
"argument")
match_hostname(self.getpeercert(), self.server_hostname)
def _real_connect(self, addr, connect_ex):
if self.server_side:
raise ValueError("can't connect in server-side mode")
# Here we assume that the socket is client-side, and not
# connected at the time of the call. We connect it, then wrap it.
if self._connected:
raise ValueError("attempt to connect already-connected SSLSocket!")
self._sslobj = self.context._wrap_socket(self._sock, False, self.server_hostname, ssl_sock=self)
try:
if connect_ex:
rc = socket.connect_ex(self, addr)
else:
rc = None
socket.connect(self, addr)
if not rc:
self._connected = True
if self.do_handshake_on_connect:
self.do_handshake()
return rc
except (OSError, ValueError):
self._sslobj = None
raise
def connect(self, addr):
"""Connects to remote ADDR, and then wraps the connection in
an SSL channel."""
self._real_connect(addr, False)
def connect_ex(self, addr):
"""Connects to remote ADDR, and then wraps the connection in
an SSL channel."""
return self._real_connect(addr, True)
def accept(self):
"""Accepts a new connection from a remote client, and returns
a tuple containing that new connection wrapped with a server-side
SSL channel, and the address of the remote client."""
newsock, addr = socket.accept(self)
newsock = self.context.wrap_socket(newsock,
do_handshake_on_connect=self.do_handshake_on_connect,
suppress_ragged_eofs=self.suppress_ragged_eofs,
server_side=True)
return newsock, addr
def makefile(self, mode='r', bufsize=-1):
"""Make and return a file-like object that
works with the SSL connection. Just use the code
from the socket module."""
self._makefile_refs += 1
# close=True so as to decrement the reference count when done with
# the file-like object.
return _fileobject(self, mode, bufsize, close=True)
def get_channel_binding(self, cb_type="tls-unique"):
"""Get channel binding data for current connection. Raise ValueError
if the requested `cb_type` is not supported. Return bytes of the data
or None if the data is not available (e.g. before the handshake).
"""
if cb_type not in CHANNEL_BINDING_TYPES:
raise ValueError("Unsupported channel binding type")
if cb_type != "tls-unique":
raise NotImplementedError(
"{0} channel binding type not implemented"
.format(cb_type))
if self._sslobj is None:
return None
return self._sslobj.tls_unique_cb()
def version(self):
"""
Return a string identifying the protocol version used by the
current SSL channel, or None if there is no established channel.
"""
if self._sslobj is None:
return None
return self._sslobj.version()
def wrap_socket(sock, keyfile=None, certfile=None,
server_side=False, cert_reqs=CERT_NONE,
ssl_version=PROTOCOL_TLS, ca_certs=None,
do_handshake_on_connect=True,
suppress_ragged_eofs=True,
ciphers=None):
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
server_side=server_side, cert_reqs=cert_reqs,
ssl_version=ssl_version, ca_certs=ca_certs,
do_handshake_on_connect=do_handshake_on_connect,
suppress_ragged_eofs=suppress_ragged_eofs,
ciphers=ciphers)
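# Illustrative sketch (hypothetical helper, added for clarity; performs
# real network I/O): combining create_default_context() with its
# wrap_socket() method to retrieve and verify a server's certificate.
def _demo_fetch_peer_cert(host='example.org', port=443):
    ctx = create_default_context()
    sock = ctx.wrap_socket(create_connection((host, port)),
                           server_hostname=host)
    try:
        return sock.getpeercert()
    finally:
        sock.close()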
# some utility functions
def cert_time_to_seconds(cert_time):
"""Return the time in seconds since the Epoch, given the timestring
representing the "notBefore" or "notAfter" date from a certificate
in ``"%b %d %H:%M:%S %Y %Z"`` strptime format (C locale).
"notBefore" or "notAfter" dates must use UTC (RFC 5280).
Month is one of: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
UTC should be specified as GMT (see ASN1_TIME_print())
"""
from time import strptime
from calendar import timegm
months = (
"Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"
)
time_format = ' %d %H:%M:%S %Y GMT' # NOTE: no month, fixed GMT
try:
month_number = months.index(cert_time[:3].title()) + 1
except ValueError:
raise ValueError('time data %r does not match '
'format "%%b%s"' % (cert_time, time_format))
else:
# found valid month
tt = strptime(cert_time[3:], time_format)
# return an integer, the previous mktime()-based implementation
# returned a float (fractional seconds are always zero here).
return timegm((tt[0], month_number) + tt[2:6])
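# Illustrative sketch (hypothetical helper, added for clarity): the
# timestamp format used in certificate validity fields.
def _demo_cert_time():
    return cert_time_to_seconds('May  9 00:00:00 2007 GMT')  # 1178668800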
PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"
def DER_cert_to_PEM_cert(der_cert_bytes):
"""Takes a certificate in binary DER format and returns the
PEM version of it as a string."""
f = base64.standard_b64encode(der_cert_bytes).decode('ascii')
return (PEM_HEADER + '\n' +
textwrap.fill(f, 64) + '\n' +
PEM_FOOTER + '\n')
def PEM_cert_to_DER_cert(pem_cert_string):
"""Takes a certificate in ASCII PEM format and returns the
DER-encoded version of it as a byte sequence"""
if not pem_cert_string.startswith(PEM_HEADER):
raise ValueError("Invalid PEM encoding; must start with %s"
% PEM_HEADER)
if not pem_cert_string.strip().endswith(PEM_FOOTER):
raise ValueError("Invalid PEM encoding; must end with %s"
% PEM_FOOTER)
d = pem_cert_string.strip()[len(PEM_HEADER):-len(PEM_FOOTER)]
return base64.decodestring(d.encode('ASCII', 'strict'))
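# Illustrative sketch (hypothetical helper, added for clarity): the two
# converters above are inverses of each other.
def _demo_pem_der_roundtrip(der_bytes):
    pem = DER_cert_to_PEM_cert(der_bytes)
    assert PEM_cert_to_DER_cert(pem) == der_bytes
    return pem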
def get_server_certificate(addr, ssl_version=PROTOCOL_TLS, ca_certs=None):
"""Retrieve the certificate from the server at the specified address,
and return it as a PEM-encoded string.
If 'ca_certs' is specified, validate the server cert against it.
If 'ssl_version' is specified, use it in the connection attempt."""
host, port = addr
if ca_certs is not None:
cert_reqs = CERT_REQUIRED
else:
cert_reqs = CERT_NONE
context = _create_stdlib_context(ssl_version,
cert_reqs=cert_reqs,
cafile=ca_certs)
with closing(create_connection(addr)) as sock:
with closing(context.wrap_socket(sock)) as sslsock:
dercert = sslsock.getpeercert(True)
return DER_cert_to_PEM_cert(dercert)
def get_protocol_name(protocol_code):
return _PROTOCOL_NAMES.get(protocol_code, '<unknown>')
# a replacement for the old socket.ssl function
def sslwrap_simple(sock, keyfile=None, certfile=None):
"""A replacement for the old socket.ssl function. Designed
for compatibility with Python 2.5 and earlier. Will disappear in
Python 3.0."""
if hasattr(sock, "_sock"):
sock = sock._sock
ctx = SSLContext(PROTOCOL_SSLv23)
if keyfile or certfile:
ctx.load_cert_chain(certfile, keyfile)
ssl_sock = ctx._wrap_socket(sock, server_side=False)
try:
sock.getpeername()
except socket_error:
# no, no connection yet
pass
else:
# yes, do the handshake
ssl_sock.do_handshake()
return ssl_sock
"""Constants/functions for interpreting results of os.stat() and os.lstat().
Suggested usage: from stat import *
"""
# Indices for stat struct members in the tuple returned by os.stat()
ST_MODE = 0
ST_INO = 1
ST_DEV = 2
ST_NLINK = 3
ST_UID = 4
ST_GID = 5
ST_SIZE = 6
ST_ATIME = 7
ST_MTIME = 8
ST_CTIME = 9
# Extract bits from the mode
def S_IMODE(mode):
return mode & 07777
def S_IFMT(mode):
return mode & 0170000
# Constants used as S_IFMT() for various file types
# (not all are implemented on all systems)
S_IFDIR = 0040000
S_IFCHR = 0020000
S_IFBLK = 0060000
S_IFREG = 0100000
S_IFIFO = 0010000
S_IFLNK = 0120000
S_IFSOCK = 0140000
# Functions to test for each file type
def S_ISDIR(mode):
return S_IFMT(mode) == S_IFDIR
def S_ISCHR(mode):
return S_IFMT(mode) == S_IFCHR
def S_ISBLK(mode):
return S_IFMT(mode) == S_IFBLK
def S_ISREG(mode):
return S_IFMT(mode) == S_IFREG
def S_ISFIFO(mode):
return S_IFMT(mode) == S_IFIFO
def S_ISLNK(mode):
return S_IFMT(mode) == S_IFLNK
def S_ISSOCK(mode):
return S_IFMT(mode) == S_IFSOCK
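# Illustrative sketch (hypothetical helper, added for clarity): combining
# the indices, extractors and predicates above with os.stat().
def _demo_stat(path='.'):
    import os
    mode = os.stat(path)[ST_MODE]
    return S_ISDIR(mode), oct(S_IMODE(mode))   # e.g. (True, '0755')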
# Names for permission bits
S_ISUID = 04000
S_ISGID = 02000
S_ENFMT = S_ISGID
S_ISVTX = 01000
S_IREAD = 00400
S_IWRITE = 00200
S_IEXEC = 00100
S_IRWXU = 00700
S_IRUSR = 00400
S_IWUSR = 00200
S_IXUSR = 00100
S_IRWXG = 00070
S_IRGRP = 00040
S_IWGRP = 00020
S_IXGRP = 00010
S_IRWXO = 00007
S_IROTH = 00004
S_IWOTH = 00002
S_IXOTH = 00001
# Names for file flags
UF_NODUMP = 0x00000001
UF_IMMUTABLE = 0x00000002
UF_APPEND = 0x00000004
UF_OPAQUE = 0x00000008
UF_NOUNLINK = 0x00000010
UF_COMPRESSED = 0x00000020 # OS X: file is hfs-compressed
UF_HIDDEN = 0x00008000 # OS X: file should not be displayed
SF_ARCHIVED = 0x00010000
SF_IMMUTABLE = 0x00020000
SF_APPEND = 0x00040000
SF_NOUNLINK = 0x00100000
SF_SNAPSHOT = 0x00200000
"""Constants for interpreting the results of os.statvfs() and os.fstatvfs()."""
from warnings import warnpy3k
warnpy3k("the statvfs module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
# Indices for statvfs struct members in the tuple returned by
# os.statvfs() and os.fstatvfs().
F_BSIZE = 0 # Preferred file system block size
F_FRSIZE = 1 # Fundamental file system block size
F_BLOCKS = 2 # Total number of file system blocks (FRSIZE)
F_BFREE = 3 # Total number of free blocks
F_BAVAIL = 4 # Free blocks available to non-superuser
F_FILES = 5 # Total number of file nodes
F_FFREE = 6 # Total number of free file nodes
F_FAVAIL = 7 # Free nodes available to non-superuser
F_FLAG = 8 # Flags (see your local statvfs man page)
F_NAMEMAX = 9 # Maximum file name length
"""A collection of string operations (most are no longer used).
Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object. They used to be implemented by
a built-in module called strop, but strop is now obsolete itself.
Public module variables:
whitespace -- a string containing all characters considered whitespace
lowercase -- a string containing all characters considered lowercase letters
uppercase -- a string containing all characters considered uppercase letters
letters -- a string containing all characters considered letters
digits -- a string containing all characters considered decimal digits
hexdigits -- a string containing all characters considered hexadecimal digits
octdigits -- a string containing all characters considered octal digits
punctuation -- a string containing all characters considered punctuation
printable -- a string containing all characters considered printable
"""
# Some strings for ctype-style character classification
whitespace = ' \t\n\r\v\f'
lowercase = 'abcdefghijklmnopqrstuvwxyz'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
letters = lowercase + uppercase
ascii_lowercase = lowercase
ascii_uppercase = uppercase
ascii_letters = ascii_lowercase + ascii_uppercase
digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'
punctuation = """!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
printable = digits + letters + punctuation + whitespace
# Case conversion helpers
# Use str to convert Unicode literal in case of -U
l = map(chr, xrange(256))
_idmap = str('').join(l)
del l
# Functions which aren't available as string methods.
# Capitalize the words in a string, e.g. " aBc dEf " -> "Abc Def".
def capwords(s, sep=None):
"""capwords(s [,sep]) -> string
Split the argument into words using split, capitalize each
word using capitalize, and join the capitalized words using
join. If the optional second argument sep is absent or None,
runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise
sep is used to split and join the words.
"""
return (sep or ' ').join(x.capitalize() for x in s.split(sep))
# Construct a translation string
_idmapL = None
def maketrans(fromstr, tostr):
"""maketrans(frm, to) -> string
Return a translation table (a string of 256 bytes long)
suitable for use in string.translate. The strings frm and to
must be of the same length.
"""
if len(fromstr) != len(tostr):
raise ValueError, "maketrans arguments must have same length"
global _idmapL
if not _idmapL:
_idmapL = list(_idmap)
L = _idmapL[:]
fromstr = map(ord, fromstr)
for i in range(len(fromstr)):
L[fromstr[i]] = tostr[i]
return ''.join(L)
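# Illustrative sketch (hypothetical helper, added for clarity): a
# translation table built by maketrans() plugs straight into str.translate.
def _demo_maketrans():
    table = maketrans('abc', 'xyz')
    return 'aabbcc'.translate(table)     # 'xxyyzz'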
####################################################################
import re as _re
class _multimap:
"""Helper class for combining multiple mappings.
Used by .{safe_,}substitute() to combine the mapping and keyword
arguments.
"""
def __init__(self, primary, secondary):
self._primary = primary
self._secondary = secondary
def __getitem__(self, key):
try:
return self._primary[key]
except KeyError:
return self._secondary[key]
class _TemplateMetaclass(type):
pattern = r"""
%(delim)s(?:
(?P<escaped>%(delim)s) | # Escape sequence of two delimiters
(?P<named>%(id)s) | # delimiter and a Python identifier
{(?P<braced>%(id)s)} | # delimiter and a braced identifier
(?P<invalid>) # Other ill-formed delimiter exprs
)
"""
def __init__(cls, name, bases, dct):
super(_TemplateMetaclass, cls).__init__(name, bases, dct)
if 'pattern' in dct:
pattern = cls.pattern
else:
pattern = _TemplateMetaclass.pattern % {
'delim' : _re.escape(cls.delimiter),
'id' : cls.idpattern,
}
cls.pattern = _re.compile(pattern, _re.IGNORECASE | _re.VERBOSE)
class Template:
"""A string class for supporting $-substitutions."""
__metaclass__ = _TemplateMetaclass
delimiter = '$'
idpattern = r'[_a-z][_a-z0-9]*'
def __init__(self, template):
self.template = template
# Search for $$, $identifier, ${identifier}, and any bare $'s
def _invalid(self, mo):
i = mo.start('invalid')
lines = self.template[:i].splitlines(True)
if not lines:
colno = 1
lineno = 1
else:
colno = i - len(''.join(lines[:-1]))
lineno = len(lines)
raise ValueError('Invalid placeholder in string: line %d, col %d' %
(lineno, colno))
def substitute(*args, **kws):
if not args:
raise TypeError("descriptor 'substitute' of 'Template' object "
"needs an argument")
self, args = args[0], args[1:] # allow the "self" keyword to be passed
if len(args) > 1:
raise TypeError('Too many positional arguments')
if not args:
mapping = kws
elif kws:
mapping = _multimap(kws, args[0])
else:
mapping = args[0]
# Helper function for .sub()
def convert(mo):
# Check the most common path first.
named = mo.group('named') or mo.group('braced')
if named is not None:
val = mapping[named]
# We use this idiom instead of str() because the latter will
# fail if val is a Unicode string containing non-ASCII characters.
return '%s' % (val,)
if mo.group('escaped') is not None:
return self.delimiter
if mo.group('invalid') is not None:
self._invalid(mo)
raise ValueError('Unrecognized named group in pattern',
self.pattern)
return self.pattern.sub(convert, self.template)
def safe_substitute(*args, **kws):
if not args:
raise TypeError("descriptor 'safe_substitute' of 'Template' object "
"needs an argument")
self, args = args[0], args[1:] # allow the "self" keyword to be passed
if len(args) > 1:
raise TypeError('Too many positional arguments')
if not args:
mapping = kws
elif kws:
mapping = _multimap(kws, args[0])
else:
mapping = args[0]
# Helper function for .sub()
def convert(mo):
named = mo.group('named') or mo.group('braced')
if named is not None:
try:
# We use this idiom instead of str() because the latter
# will fail if val is a Unicode string containing non-ASCII characters.
return '%s' % (mapping[named],)
except KeyError:
return mo.group()
if mo.group('escaped') is not None:
return self.delimiter
if mo.group('invalid') is not None:
return mo.group()
raise ValueError('Unrecognized named group in pattern',
self.pattern)
return self.pattern.sub(convert, self.template)
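# Illustrative sketch (hypothetical helper, added for clarity):
# substitute() raises KeyError on a missing placeholder, while
# safe_substitute() leaves it in place.
def _demo_template():
    t = Template('$who likes ${what}')
    assert t.substitute(who='tim', what='kung pao') == 'tim likes kung pao'
    assert Template('$missing').safe_substitute() == '$missing'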
####################################################################
# NOTE: Everything below here is deprecated. Use string methods instead.
# This stuff will go away in Python 3.0.
# Backward compatible names for exceptions
index_error = ValueError
atoi_error = ValueError
atof_error = ValueError
atol_error = ValueError
# convert UPPER CASE letters to lower case
def lower(s):
"""lower(s) -> string
Return a copy of the string s converted to lowercase.
"""
return s.lower()
# Convert lower case letters to UPPER CASE
def upper(s):
"""upper(s) -> string
Return a copy of the string s converted to uppercase.
"""
return s.upper()
# Swap lower case letters and UPPER CASE
def swapcase(s):
"""swapcase(s) -> string
Return a copy of the string s with upper case characters
converted to lowercase and vice versa.
"""
return s.swapcase()
# Strip leading and trailing tabs and spaces
def strip(s, chars=None):
"""strip(s [,chars]) -> string
Return a copy of the string s with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping.
"""
return s.strip(chars)
# Strip leading tabs and spaces
def lstrip(s, chars=None):
"""lstrip(s [,chars]) -> string
Return a copy of the string s with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
"""
return s.lstrip(chars)
# Strip trailing tabs and spaces
def rstrip(s, chars=None):
"""rstrip(s [,chars]) -> string
Return a copy of the string s with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
"""
return s.rstrip(chars)
# Split a string into a list of space/tab-separated words
def split(s, sep=None, maxsplit=-1):
"""split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified or is None, any whitespace string is a separator.
(split and splitfields are synonymous)
"""
return s.split(sep, maxsplit)
splitfields = split
# Split a string into a list of space/tab-separated words
def rsplit(s, sep=None, maxsplit=-1):
"""rsplit(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string, starting at the end of the string and working
to the front. If maxsplit is given, at most maxsplit splits are
done. If sep is not specified or is None, any whitespace string
is a separator.
"""
return s.rsplit(sep, maxsplit)
# Join fields with optional separator
def join(words, sep = ' '):
"""join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
"""
return sep.join(words)
joinfields = join
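# Illustrative sketch (not part of the original module): these deprecated
# wrappers simply delegate to the corresponding str methods.
def _demo_split_join():
    words = split('one two  three')           # runs of whitespace collapse
    assert words == ['one', 'two', 'three']
    assert join(words, '-') == 'one-two-three'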
# Find substring, raise exception if not found
def index(s, *args):
"""index(s, sub [,start [,end]]) -> int
Like find but raises ValueError when the substring is not found.
"""
return s.index(*args)
# Find last substring, raise exception if not found
def rindex(s, *args):
"""rindex(s, sub [,start [,end]]) -> int
Like rfind but raises ValueError when the substring is not found.
"""
return s.rindex(*args)
# Count non-overlapping occurrences of substring
def count(s, *args):
"""count(s, sub[, start[,end]]) -> int
Return the number of occurrences of substring sub in string
s[start:end]. Optional arguments start and end are
interpreted as in slice notation.
"""
return s.count(*args)
# Find substring, return -1 if not found
def find(s, *args):
"""find(s, sub [,start [,end]]) -> in
Return the lowest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
"""
return s.find(*args)
# Find last substring, return -1 if not found
def rfind(s, *args):
"""rfind(s, sub [,start [,end]]) -> int
Return the highest index in s where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
"""
return s.rfind(*args)
# for a bit of speed
_float = float
_int = int
_long = long
# Convert string to float
def atof(s):
"""atof(s) -> float
Return the floating point number represented by the string s.
"""
return _float(s)
# Convert string to integer
def atoi(s, base=10):
"""atoi(s [,base]) -> int
Return the integer represented by the string s in the given
base, which defaults to 10. The string s must consist of one
or more digits, possibly preceded by a sign. If base is 0, it
is chosen from the leading characters of s, 0 for octal, 0x or
0X for hexadecimal. If base is 16, a preceding 0x or 0X is
accepted.
"""
return _int(s, base)
# Convert string to long integer
def atol(s, base=10):
"""atol(s [,base]) -> long
Return the long integer represented by the string s in the
given base, which defaults to 10. The string s must consist
of one or more digits, possibly preceded by a sign. If base
is 0, it is chosen from the leading characters of s, 0 for
octal, 0x or 0X for hexadecimal. If base is 16, a preceding
0x or 0X is accepted. A trailing L or l is not accepted,
unless base is 0.
"""
return _long(s, base)
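# Illustrative sketch (not part of the original module): with base 0 the
# base is inferred from the literal's prefix, as the docstrings describe.
def _demo_atoi_bases():
    assert atoi('42') == 42
    assert atoi('0x1f', 0) == 31    # 0x prefix selects hexadecimal
    assert atoi('010', 0) == 8      # leading 0 selects octal
    assert atol('10', 16) == 16L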
# Left-justify a string
def ljust(s, width, *args):
"""ljust(s, width[, fillchar]) -> string
Return a left-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated. If specified the fillchar is used instead of spaces.
"""
return s.ljust(width, *args)
# Right-justify a string
def rjust(s, width, *args):
"""rjust(s, width[, fillchar]) -> string
Return a right-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated. If specified the fillchar is used instead of spaces.
"""
return s.rjust(width, *args)
# Center a string
def center(s, width, *args):
"""center(s, width[, fillchar]) -> string
Return a center version of s, in a field of the specified
width. padded with spaces as needed. The string is never
truncated. If specified the fillchar is used instead of spaces.
"""
return s.center(width, *args)
# Zero-fill a number, e.g., (12, 3) --> '012' and (-3, 3) --> '-03'
# Decadent feature: the argument may be a string or a number
# (Use of this is deprecated; it should be a string as with ljust c.s.)
def zfill(x, width):
"""zfill(x, width) -> string
Pad a numeric string x with zeros on the left, to fill a field
of the specified width. The string x is never truncated.
"""
if not isinstance(x, basestring):
x = repr(x)
return x.zfill(width)
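# Illustrative sketch (not part of the original module): zfill pads on the
# left and keeps a leading sign in place, e.g. (12, 3) -> '012'.
def _demo_zfill():
    assert zfill(12, 3) == '012'
    assert zfill(-3, 3) == '-03'
    assert zfill('7', 4) == '0007'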
# Expand tabs in a string.
# Doesn't take non-printing chars into account, but does understand \n.
def expandtabs(s, tabsize=8):
"""expandtabs(s [,tabsize]) -> string
Return a copy of the string s with all tab characters replaced
by the appropriate number of spaces, depending on the current
column, and the tabsize (default 8).
"""
return s.expandtabs(tabsize)
# Character translation through look-up table.
def translate(s, table, deletions=""):
"""translate(s,table [,deletions]) -> string
Return a copy of the string s, where all characters occurring
in the optional argument deletions are removed, and the
remaining characters have been mapped through the given
translation table, which must be a string of length 256. The
deletions argument is not allowed for Unicode strings.
"""
if deletions or table is None:
return s.translate(table, deletions)
else:
# Add s[:0] so that if s is Unicode and table is an 8-bit string,
# table is converted to Unicode. This means that table *cannot*
# be a dictionary -- for that feature, use u.translate() directly.
return s.translate(table + s[:0])
# Capitalize a string, e.g. "aBc dEf" -> "Abc def".
def capitalize(s):
"""capitalize(s) -> string
Return a copy of the string s with only its first character
capitalized.
"""
return s.capitalize()
# Substring replacement (global)
def replace(s, old, new, maxreplace=-1):
"""replace (str, old, new[, maxreplace]) -> string
Return a copy of string str with all occurrences of substring
old replaced by new. If the optional argument maxreplace is
given, only the first maxreplace occurrences are replaced.
"""
return s.replace(old, new, maxreplace)
# Try importing optional built-in module "strop" -- if it exists,
# it redefines some string operations that are 100-1000 times faster.
# It also defines values for whitespace, lowercase and uppercase
# that match <ctype.h>'s definitions.
try:
from strop import maketrans, lowercase, uppercase, whitespace
letters = lowercase + uppercase
except ImportError:
pass # Use the original versions
########################################################################
# the Formatter class
# see PEP 3101 for details and purpose of this class
# The hard parts are reused from the C implementation. They're exposed as "_"
# prefixed methods of str and unicode.
# The overall parser is implemented in str._formatter_parser.
# The field name parser is implemented in str._formatter_field_name_split
class Formatter(object):
def format(*args, **kwargs):
if not args:
raise TypeError("descriptor 'format' of 'Formatter' object "
"needs an argument")
self, args = args[0], args[1:] # allow the "self" keyword to be passed
try:
format_string, args = args[0], args[1:] # allow the "format_string" keyword to be passed
except IndexError:
if 'format_string' in kwargs:
format_string = kwargs.pop('format_string')
else:
raise TypeError("format() missing 1 required positional "
"argument: 'format_string'")
return self.vformat(format_string, args, kwargs)
def vformat(self, format_string, args, kwargs):
used_args = set()
result = self._vformat(format_string, args, kwargs, used_args, 2)
self.check_unused_args(used_args, args, kwargs)
return result
def _vformat(self, format_string, args, kwargs, used_args, recursion_depth):
if recursion_depth < 0:
raise ValueError('Max string recursion exceeded')
result = []
for literal_text, field_name, format_spec, conversion in \
self.parse(format_string):
# output the literal text
if literal_text:
result.append(literal_text)
# if there's a field, output it
if field_name is not None:
# this is some markup, find the object and do
# the formatting
# given the field_name, find the object it references
# and the argument it came from
obj, arg_used = self.get_field(field_name, args, kwargs)
used_args.add(arg_used)
# do any conversion on the resulting object
obj = self.convert_field(obj, conversion)
# expand the format spec, if needed
format_spec = self._vformat(format_spec, args, kwargs,
used_args, recursion_depth-1)
# format the object and append to the result
result.append(self.format_field(obj, format_spec))
return ''.join(result)
def get_value(self, key, args, kwargs):
if isinstance(key, (int, long)):
return args[key]
else:
return kwargs[key]
def check_unused_args(self, used_args, args, kwargs):
pass
def format_field(self, value, format_spec):
return format(value, format_spec)
def convert_field(self, value, conversion):
# do any conversion on the resulting object
if conversion is None:
return value
elif conversion == 's':
return str(value)
elif conversion == 'r':
return repr(value)
raise ValueError("Unknown conversion specifier {0!s}".format(conversion))
# returns an iterable that contains tuples of the form:
# (literal_text, field_name, format_spec, conversion)
# literal_text can be zero length
# field_name can be None, in which case there's no
# object to format and output
# if field_name is not None, it is looked up, formatted
# with format_spec and conversion and then used
def parse(self, format_string):
return format_string._formatter_parser()
# given a field_name, find the object it references.
# field_name: the field being looked up, e.g. "0.name"
# or "lookup[3]"
# used_args: a set of which args have been used
# args, kwargs: as passed in to vformat
def get_field(self, field_name, args, kwargs):
first, rest = field_name._formatter_field_name_split()
obj = self.get_value(first, args, kwargs)
# loop through the rest of the field_name, doing
# getattr or getitem as needed
for is_attr, i in rest:
if is_attr:
obj = getattr(obj, i)
else:
obj = obj[i]
return obj, first
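# Illustrative sketch (not part of the original module): Formatter is meant
# to be subclassed; this hypothetical subclass overrides get_value() so
# missing keyword arguments fall back to a default marker.
class _DefaultingFormatter(Formatter):
    def get_value(self, key, args, kwargs):
        if isinstance(key, (int, long)):
            return args[key]
        return kwargs.get(key, '<missing>')

def _demo_formatter():
    f = _DefaultingFormatter()
    assert f.format('{0} and {name}', 'x') == 'x and <missing>'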
r"""File-like objects that read from or write to a string buffer.
This implements (nearly) all stdio methods.
f = StringIO() # ready for writing
f = StringIO(buf) # ready for reading
f.close() # explicitly release resources held
flag = f.isatty() # always false
pos = f.tell() # get current position
f.seek(pos) # set current position
f.seek(pos, mode) # mode 0: absolute; 1: relative; 2: relative to EOF
buf = f.read() # read until EOF
buf = f.read(n) # read up to n bytes
buf = f.readline() # read until end of line ('\n') or EOF
list = f.readlines() # list of f.readline() results until EOF
f.truncate([size]) # truncate file to at most size (default: current pos)
f.write(buf) # write at current position
f.writelines(list) # for line in list: f.write(line)
f.getvalue() # return whole file's contents as a string
Notes:
- Using a real file is often faster (but less convenient).
- There's also a much faster implementation in C, called cStringIO, but
it's not subclassable.
- fileno() is left unimplemented so that code which uses it triggers
an exception early.
- Seeking far beyond EOF and then writing will insert real null
bytes that occupy space in the buffer.
- There's a simple test set (see end of this file).
"""
try:
from errno import EINVAL
except ImportError:
EINVAL = 22
__all__ = ["StringIO"]
def _complain_ifclosed(closed):
if closed:
raise ValueError, "I/O operation on closed file"
class StringIO:
"""class StringIO([buffer])
When a StringIO object is created, it can be initialized to an existing
string by passing the string to the constructor. If no string is given,
the StringIO will start empty.
The StringIO object can accept either Unicode or 8-bit strings, but
mixing the two may take some care. If both are used, 8-bit strings that
cannot be interpreted as 7-bit ASCII (that use the 8th bit) will cause
a UnicodeError to be raised when getvalue() is called.
"""
def __init__(self, buf = ''):
# Force self.buf to be a string or unicode
if not isinstance(buf, basestring):
buf = str(buf)
self.buf = buf
self.len = len(buf)
self.buflist = []
self.pos = 0
self.closed = False
self.softspace = 0
def __iter__(self):
return self
def next(self):
"""A file object is its own iterator, for example iter(f) returns f
(unless f is closed). When a file is used as an iterator, typically
in a for loop (for example, for line in f: print line), the next()
method is called repeatedly. This method returns the next input line,
or raises StopIteration when EOF is hit.
"""
_complain_ifclosed(self.closed)
r = self.readline()
if not r:
raise StopIteration
return r
def close(self):
"""Free the memory buffer.
"""
if not self.closed:
self.closed = True
del self.buf, self.pos
def isatty(self):
"""Returns False because StringIO objects are not connected to a
tty-like device.
"""
_complain_ifclosed(self.closed)
return False
def seek(self, pos, mode = 0):
"""Set the file's current position.
The mode argument is optional and defaults to 0 (absolute file
positioning); other values are 1 (seek relative to the current
position) and 2 (seek relative to the file's end).
There is no return value.
"""
_complain_ifclosed(self.closed)
if self.buflist:
self.buf += ''.join(self.buflist)
self.buflist = []
if mode == 1:
pos += self.pos
elif mode == 2:
pos += self.len
self.pos = max(0, pos)
def tell(self):
"""Return the file's current position."""
_complain_ifclosed(self.closed)
return self.pos
def read(self, n = -1):
"""Read at most size bytes from the file
(less if the read hits EOF before obtaining size bytes).
If the size argument is negative or omitted, read all data until EOF
is reached. The bytes are returned as a string object. An empty
string is returned when EOF is encountered immediately.
"""
_complain_ifclosed(self.closed)
if self.buflist:
self.buf += ''.join(self.buflist)
self.buflist = []
if n is None or n < 0:
newpos = self.len
else:
newpos = min(self.pos+n, self.len)
r = self.buf[self.pos:newpos]
self.pos = newpos
return r
def readline(self, length=None):
r"""Read one entire line from the file.
A trailing newline character is kept in the string (but may be absent
when a file ends with an incomplete line). If the length argument is
present and non-negative, it is a maximum byte count (including the
trailing newline) and an incomplete line may be returned.
An empty string is returned only when EOF is encountered immediately.
Note: Unlike stdio's fgets(), the returned string contains null
characters ('\0') if they occurred in the input.
"""
_complain_ifclosed(self.closed)
if self.buflist:
self.buf += ''.join(self.buflist)
self.buflist = []
i = self.buf.find('\n', self.pos)
if i < 0:
newpos = self.len
else:
newpos = i+1
if length is not None and length >= 0:
if self.pos + length < newpos:
newpos = self.pos + length
r = self.buf[self.pos:newpos]
self.pos = newpos
return r
def readlines(self, sizehint = 0):
"""Read until EOF using readline() and return a list containing the
lines thus read.
If the optional sizehint argument is present, instead of reading up
to EOF, whole lines totalling approximately sizehint bytes are read
(possibly more, to accommodate a final whole line).
"""
total = 0
lines = []
line = self.readline()
while line:
lines.append(line)
total += len(line)
if 0 < sizehint <= total:
break
line = self.readline()
return lines
def truncate(self, size=None):
"""Truncate the file's size.
If the optional size argument is present, the file is truncated to
(at most) that size. The size defaults to the current position.
The current file position is not changed unless the position
is beyond the new file size.
If the specified size exceeds the file's current size, the
file remains unchanged.
"""
_complain_ifclosed(self.closed)
if size is None:
size = self.pos
elif size < 0:
raise IOError(EINVAL, "Negative size not allowed")
elif size < self.pos:
self.pos = size
self.buf = self.getvalue()[:size]
self.len = size
def write(self, s):
"""Write a string to the file.
There is no return value.
"""
_complain_ifclosed(self.closed)
if not s: return
# Force s to be a string or unicode
if not isinstance(s, basestring):
s = str(s)
spos = self.pos
slen = self.len
if spos == slen:
self.buflist.append(s)
self.len = self.pos = spos + len(s)
return
if spos > slen:
self.buflist.append('\0'*(spos - slen))
slen = spos
newpos = spos + len(s)
if spos < slen:
if self.buflist:
self.buf += ''.join(self.buflist)
self.buflist = [self.buf[:spos], s, self.buf[newpos:]]
self.buf = ''
if newpos > slen:
slen = newpos
else:
self.buflist.append(s)
slen = newpos
self.len = slen
self.pos = newpos
def writelines(self, iterable):
"""Write a sequence of strings to the file. The sequence can be any
iterable object producing strings, typically a list of strings. There
is no return value.
(The name is intended to match readlines(); writelines() does not add
line separators.)
"""
write = self.write
for line in iterable:
write(line)
def flush(self):
"""Flush the internal buffer
"""
_complain_ifclosed(self.closed)
def getvalue(self):
"""
Retrieve the entire contents of the "file" at any time before
the StringIO object's close() method is called.
The StringIO object can accept either Unicode or 8-bit strings,
but mixing the two may take some care. If both are used, 8-bit
strings that cannot be interpreted as 7-bit ASCII (that use the
8th bit) will cause a UnicodeError to be raised when getvalue()
is called.
"""
_complain_ifclosed(self.closed)
if self.buflist:
self.buf += ''.join(self.buflist)
self.buflist = []
return self.buf
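# Illustrative sketch (not part of the original module): a minimal
# write/seek/read round trip using the class above.
def _demo_stringio():
    f = StringIO()
    f.write('hello\nworld\n')
    f.seek(0)                        # mode 0: absolute positioning
    assert f.readline() == 'hello\n'
    assert f.read() == 'world\n'
    assert f.getvalue() == 'hello\nworld\n'
    f.close()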
# A little test suite
def test():
import sys
if sys.argv[1:]:
file = sys.argv[1]
else:
file = '/etc/passwd'
lines = open(file, 'r').readlines()
text = open(file, 'r').read()
f = StringIO()
for line in lines[:-2]:
f.write(line)
f.writelines(lines[-2:])
if f.getvalue() != text:
raise RuntimeError, 'write failed'
length = f.tell()
print 'File length =', length
f.seek(len(lines[0]))
f.write(lines[1])
f.seek(0)
print 'First line =', repr(f.readline())
print 'Position =', f.tell()
line = f.readline()
print 'Second line =', repr(line)
f.seek(-len(line), 1)
line2 = f.read(len(line))
if line != line2:
raise RuntimeError, 'bad result after seek back'
f.seek(len(line2), 1)
list = f.readlines()
line = list[-1]
f.seek(f.tell() - len(line))
line2 = f.read()
if line != line2:
raise RuntimeError, 'bad result after seek back from EOF'
print 'Read', len(list), 'more lines'
print 'File length =', f.tell()
if f.tell() != length:
raise RuntimeError, 'bad length'
f.truncate(length/2)
f.seek(0, 2)
print 'Truncated length =', f.tell()
if f.tell() != length/2:
raise RuntimeError, 'truncate did not adjust length'
f.close()
if __name__ == '__main__':
test()
# module 'string' -- A collection of string operations
# Warning: most of the code you see here isn't normally used nowadays. With
# Python 1.6, many of these functions are implemented as methods on the
# standard string object. They used to be implemented by a built-in module
# called strop, but strop is now obsolete itself.
"""Common string manipulations.
Public module variables:
whitespace -- a string containing all characters considered whitespace
lowercase -- a string containing all characters considered lowercase letters
uppercase -- a string containing all characters considered uppercase letters
letters -- a string containing all characters considered letters
digits -- a string containing all characters considered decimal digits
hexdigits -- a string containing all characters considered hexadecimal digits
octdigits -- a string containing all characters considered octal digits
"""
from warnings import warnpy3k
warnpy3k("the stringold module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
# Some strings for ctype-style character classification
whitespace = ' \t\n\r\v\f'
lowercase = 'abcdefghijklmnopqrstuvwxyz'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
letters = lowercase + uppercase
digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'
# Case conversion helpers
_idmap = ''
for i in range(256): _idmap = _idmap + chr(i)
del i
# Backward compatible names for exceptions
index_error = ValueError
atoi_error = ValueError
atof_error = ValueError
atol_error = ValueError
# convert UPPER CASE letters to lower case
def lower(s):
"""lower(s) -> string
Return a copy of the string s converted to lowercase.
"""
return s.lower()
# Convert lower case letters to UPPER CASE
def upper(s):
"""upper(s) -> string
Return a copy of the string s converted to uppercase.
"""
return s.upper()
# Swap lower case letters and UPPER CASE
def swapcase(s):
"""swapcase(s) -> string
Return a copy of the string s with upper case characters
converted to lowercase and vice versa.
"""
return s.swapcase()
# Strip leading and trailing tabs and spaces
def strip(s):
"""strip(s) -> string
Return a copy of the string s with leading and trailing
whitespace removed.
"""
return s.strip()
# Strip leading tabs and spaces
def lstrip(s):
"""lstrip(s) -> string
Return a copy of the string s with leading whitespace removed.
"""
return s.lstrip()
# Strip trailing tabs and spaces
def rstrip(s):
"""rstrip(s) -> string
Return a copy of the string s with trailing whitespace
removed.
"""
return s.rstrip()
# Split a string into a list of space/tab-separated words
def split(s, sep=None, maxsplit=0):
"""split(str [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is nonzero, splits into at most
maxsplit words. If sep is not specified, any whitespace string
is a separator. Maxsplit defaults to 0.
(split and splitfields are synonymous)
"""
return s.split(sep, maxsplit)
splitfields = split
# Join fields with optional separator
def join(words, sep = ' '):
"""join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
"""
return sep.join(words)
joinfields = join
# for a little bit of speed
_apply = apply
# Find substring, raise exception if not found
def index(s, *args):
"""index(s, sub [,start [,end]]) -> int
Like find but raises ValueError when the substring is not found.
"""
return _apply(s.index, args)
# Find last substring, raise exception if not found
def rindex(s, *args):
"""rindex(s, sub [,start [,end]]) -> int
Like rfind but raises ValueError when the substring is not found.
"""
return _apply(s.rindex, args)
# Count non-overlapping occurrences of substring
def count(s, *args):
"""count(s, sub[, start[,end]]) -> int
Return the number of occurrences of substring sub in string
s[start:end]. Optional arguments start and end are
interpreted as in slice notation.
"""
return _apply(s.count, args)
# Find substring, return -1 if not found
def find(s, *args):
"""find(s, sub [,start [,end]]) -> in
Return the lowest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
"""
return _apply(s.find, args)
# Find last substring, return -1 if not found
def rfind(s, *args):
"""rfind(s, sub [,start [,end]]) -> int
Return the highest index in s where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
"""
return _apply(s.rfind, args)
# for a bit of speed
_float = float
_int = int
_long = long
_StringType = type('')
# Convert string to float
def atof(s):
"""atof(s) -> float
Return the floating point number represented by the string s.
"""
if type(s) == _StringType:
return _float(s)
else:
raise TypeError('argument 1: expected string, %s found' %
type(s).__name__)
# Convert string to integer
def atoi(*args):
"""atoi(s [,base]) -> int
Return the integer represented by the string s in the given
base, which defaults to 10. The string s must consist of one
or more digits, possibly preceded by a sign. If base is 0, it
is chosen from the leading characters of s, 0 for octal, 0x or
0X for hexadecimal. If base is 16, a preceding 0x or 0X is
accepted.
"""
try:
s = args[0]
except IndexError:
raise TypeError('function requires at least 1 argument: %d given' %
len(args))
# Don't catch type error resulting from too many arguments to int(). The
# error message isn't compatible but the error type is, and this function
# is complicated enough already.
if type(s) == _StringType:
return _apply(_int, args)
else:
raise TypeError('argument 1: expected string, %s found' %
type(s).__name__)
# Convert string to long integer
def atol(*args):
"""atol(s [,base]) -> long
Return the long integer represented by the string s in the
given base, which defaults to 10. The string s must consist
of one or more digits, possibly preceded by a sign. If base
is 0, it is chosen from the leading characters of s, 0 for
octal, 0x or 0X for hexadecimal. If base is 16, a preceding
0x or 0X is accepted. A trailing L or l is not accepted,
unless base is 0.
"""
try:
s = args[0]
except IndexError:
raise TypeError('function requires at least 1 argument: %d given' %
len(args))
# Don't catch type error resulting from too many arguments to long(). The
# error message isn't compatible but the error type is, and this function
# is complicated enough already.
if type(s) == _StringType:
return _apply(_long, args)
else:
raise TypeError('argument 1: expected string, %s found' %
type(s).__name__)
# Left-justify a string
def ljust(s, width):
"""ljust(s, width) -> string
Return a left-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated.
"""
n = width - len(s)
if n <= 0: return s
return s + ' '*n
# Right-justify a string
def rjust(s, width):
"""rjust(s, width) -> string
Return a right-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated.
"""
n = width - len(s)
if n <= 0: return s
return ' '*n + s
# Center a string
def center(s, width):
"""center(s, width) -> string
Return a centered version of s, in a field of the specified
width, padded with spaces as needed. The string is never
truncated.
"""
n = width - len(s)
if n <= 0: return s
half = n/2
if n%2 and width%2:
# This ensures that center(center(s, i), j) = center(s, j)
half = half+1
return ' '*half + s + ' '*(n-half)
# Zero-fill a number, e.g., (12, 3) --> '012' and (-3, 3) --> '-03'
# Decadent feature: the argument may be a string or a number
# (Use of this is deprecated; it should be a string as with ljust c.s.)
def zfill(x, width):
"""zfill(x, width) -> string
Pad a numeric string x with zeros on the left, to fill a field
of the specified width. The string x is never truncated.
"""
if type(x) == type(''): s = x
else: s = repr(x)
n = len(s)
if n >= width: return s
sign = ''
if s[0] in ('-', '+'):
sign, s = s[0], s[1:]
return sign + '0'*(width-n) + s
# Expand tabs in a string.
# Doesn't take non-printing chars into account, but does understand \n.
def expandtabs(s, tabsize=8):
"""expandtabs(s [,tabsize]) -> string
Return a copy of the string s with all tab characters replaced
by the appropriate number of spaces, depending on the current
column, and the tabsize (default 8).
"""
res = line = ''
for c in s:
if c == '\t':
c = ' '*(tabsize - len(line) % tabsize)
line = line + c
if c == '\n':
res = res + line
line = ''
return res + line
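# Illustrative sketch (not part of the original module): each tab is padded
# out to the next multiple of tabsize, counted from the start of the line.
def _demo_expandtabs():
    assert expandtabs('a\tb', 4) == 'a   b'
    assert expandtabs('ab\tc', 4) == 'ab  c'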
# Character translation through look-up table.
def translate(s, table, deletions=""):
"""translate(s,table [,deletechars]) -> string
Return a copy of the string s, where all characters occurring
in the optional argument deletechars are removed, and the
remaining characters have been mapped through the given
translation table, which must be a string of length 256.
"""
return s.translate(table, deletions)
# Capitalize a string, e.g. "aBc dEf" -> "Abc def".
def capitalize(s):
"""capitalize(s) -> string
Return a copy of the string s with only its first character
capitalized.
"""
return s.capitalize()
# Capitalize the words in a string, e.g. " aBc dEf " -> "Abc Def".
def capwords(s, sep=None):
"""capwords(s, [sep]) -> string
Split the argument into words using split, capitalize each
word using capitalize, and join the capitalized words using
join. Note that this replaces runs of whitespace characters by
a single space.
"""
return join(map(capitalize, s.split(sep)), sep or ' ')
# Construct a translation string
_idmapL = None
def maketrans(fromstr, tostr):
"""maketrans(frm, to) -> string
Return a translation table (a string of 256 bytes long)
suitable for use in string.translate. The strings frm and to
must be of the same length.
"""
if len(fromstr) != len(tostr):
raise ValueError, "maketrans arguments must have same length"
global _idmapL
if not _idmapL:
_idmapL = list(_idmap)
L = _idmapL[:]
fromstr = map(ord, fromstr)
for i in range(len(fromstr)):
L[fromstr[i]] = tostr[i]
return join(L, "")
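# Illustrative sketch (not part of the original module): build a 256-byte
# table with maketrans and feed it to translate, as described above.
def _demo_maketrans():
    table = maketrans('abc', 'xyz')
    assert len(table) == 256
    assert translate('aabbcc', table) == 'xxyyzz'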
# Substring replacement (global)
def replace(s, old, new, maxsplit=0):
"""replace (str, old, new[, maxsplit]) -> string
Return a copy of string str with all occurrences of substring
old replaced by new. If the optional argument maxsplit is
given, only the first maxsplit occurrences are replaced.
"""
return s.replace(old, new, maxsplit)
# XXX: transitional
#
# If string objects do not have methods, then we need to use the old string.py
# library, which uses strop for many more things than just the few outlined
# below.
try:
''.upper
except AttributeError:
from stringold import *
# Try importing optional built-in module "strop" -- if it exists,
# it redefines some string operations that are 100-1000 times faster.
# It also defines values for whitespace, lowercase and uppercase
# that match <ctype.h>'s definitions.
try:
from strop import maketrans, lowercase, uppercase, whitespace
letters = lowercase + uppercase
except ImportError:
pass # Use the original versions
# This file is generated by mkstringprep.py. DO NOT EDIT.
"""Library that exposes various tables found in the StringPrep RFC 3454.
There are two kinds of tables: sets, for which a member test is provided,
and mappings, for which a mapping function is provided.
"""
from unicodedata import ucd_3_2_0 as unicodedata
assert unicodedata.unidata_version == '3.2.0'
def in_table_a1(code):
if unicodedata.category(code) != 'Cn': return False
c = ord(code)
if 0xFDD0 <= c < 0xFDF0: return False
return (c & 0xFFFF) not in (0xFFFE, 0xFFFF)
b1_set = set([173, 847, 6150, 6155, 6156, 6157, 8203, 8204, 8205, 8288, 65279] + range(65024,65040))
def in_table_b1(code):
return ord(code) in b1_set
b3_exceptions = {
0xb5:u'\u03bc', 0xdf:u'ss', 0x130:u'i\u0307', 0x149:u'\u02bcn',
0x17f:u's', 0x1f0:u'j\u030c', 0x345:u'\u03b9', 0x37a:u' \u03b9',
0x390:u'\u03b9\u0308\u0301', 0x3b0:u'\u03c5\u0308\u0301', 0x3c2:u'\u03c3', 0x3d0:u'\u03b2',
0x3d1:u'\u03b8', 0x3d2:u'\u03c5', 0x3d3:u'\u03cd', 0x3d4:u'\u03cb',
0x3d5:u'\u03c6', 0x3d6:u'\u03c0', 0x3f0:u'\u03ba', 0x3f1:u'\u03c1',
0x3f2:u'\u03c3', 0x3f5:u'\u03b5', 0x587:u'\u0565\u0582', 0x1e96:u'h\u0331',
0x1e97:u't\u0308', 0x1e98:u'w\u030a', 0x1e99:u'y\u030a', 0x1e9a:u'a\u02be',
0x1e9b:u'\u1e61', 0x1f50:u'\u03c5\u0313', 0x1f52:u'\u03c5\u0313\u0300', 0x1f54:u'\u03c5\u0313\u0301',
0x1f56:u'\u03c5\u0313\u0342', 0x1f80:u'\u1f00\u03b9', 0x1f81:u'\u1f01\u03b9', 0x1f82:u'\u1f02\u03b9',
0x1f83:u'\u1f03\u03b9', 0x1f84:u'\u1f04\u03b9', 0x1f85:u'\u1f05\u03b9', 0x1f86:u'\u1f06\u03b9',
0x1f87:u'\u1f07\u03b9', 0x1f88:u'\u1f00\u03b9', 0x1f89:u'\u1f01\u03b9', 0x1f8a:u'\u1f02\u03b9',
0x1f8b:u'\u1f03\u03b9', 0x1f8c:u'\u1f04\u03b9', 0x1f8d:u'\u1f05\u03b9', 0x1f8e:u'\u1f06\u03b9',
0x1f8f:u'\u1f07\u03b9', 0x1f90:u'\u1f20\u03b9', 0x1f91:u'\u1f21\u03b9', 0x1f92:u'\u1f22\u03b9',
0x1f93:u'\u1f23\u03b9', 0x1f94:u'\u1f24\u03b9', 0x1f95:u'\u1f25\u03b9', 0x1f96:u'\u1f26\u03b9',
0x1f97:u'\u1f27\u03b9', 0x1f98:u'\u1f20\u03b9', 0x1f99:u'\u1f21\u03b9', 0x1f9a:u'\u1f22\u03b9',
0x1f9b:u'\u1f23\u03b9', 0x1f9c:u'\u1f24\u03b9', 0x1f9d:u'\u1f25\u03b9', 0x1f9e:u'\u1f26\u03b9',
0x1f9f:u'\u1f27\u03b9', 0x1fa0:u'\u1f60\u03b9', 0x1fa1:u'\u1f61\u03b9', 0x1fa2:u'\u1f62\u03b9',
0x1fa3:u'\u1f63\u03b9', 0x1fa4:u'\u1f64\u03b9', 0x1fa5:u'\u1f65\u03b9', 0x1fa6:u'\u1f66\u03b9',
0x1fa7:u'\u1f67\u03b9', 0x1fa8:u'\u1f60\u03b9', 0x1fa9:u'\u1f61\u03b9', 0x1faa:u'\u1f62\u03b9',
0x1fab:u'\u1f63\u03b9', 0x1fac:u'\u1f64\u03b9', 0x1fad:u'\u1f65\u03b9', 0x1fae:u'\u1f66\u03b9',
0x1faf:u'\u1f67\u03b9', 0x1fb2:u'\u1f70\u03b9', 0x1fb3:u'\u03b1\u03b9', 0x1fb4:u'\u03ac\u03b9',
0x1fb6:u'\u03b1\u0342', 0x1fb7:u'\u03b1\u0342\u03b9', 0x1fbc:u'\u03b1\u03b9', 0x1fbe:u'\u03b9',
0x1fc2:u'\u1f74\u03b9', 0x1fc3:u'\u03b7\u03b9', 0x1fc4:u'\u03ae\u03b9', 0x1fc6:u'\u03b7\u0342',
0x1fc7:u'\u03b7\u0342\u03b9', 0x1fcc:u'\u03b7\u03b9', 0x1fd2:u'\u03b9\u0308\u0300', 0x1fd3:u'\u03b9\u0308\u0301',
0x1fd6:u'\u03b9\u0342', 0x1fd7:u'\u03b9\u0308\u0342', 0x1fe2:u'\u03c5\u0308\u0300', 0x1fe3:u'\u03c5\u0308\u0301',
0x1fe4:u'\u03c1\u0313', 0x1fe6:u'\u03c5\u0342', 0x1fe7:u'\u03c5\u0308\u0342', 0x1ff2:u'\u1f7c\u03b9',
0x1ff3:u'\u03c9\u03b9', 0x1ff4:u'\u03ce\u03b9', 0x1ff6:u'\u03c9\u0342', 0x1ff7:u'\u03c9\u0342\u03b9',
0x1ffc:u'\u03c9\u03b9', 0x20a8:u'rs', 0x2102:u'c', 0x2103:u'\xb0c',
0x2107:u'\u025b', 0x2109:u'\xb0f', 0x210b:u'h', 0x210c:u'h',
0x210d:u'h', 0x2110:u'i', 0x2111:u'i', 0x2112:u'l',
0x2115:u'n', 0x2116:u'no', 0x2119:u'p', 0x211a:u'q',
0x211b:u'r', 0x211c:u'r', 0x211d:u'r', 0x2120:u'sm',
0x2121:u'tel', 0x2122:u'tm', 0x2124:u'z', 0x2128:u'z',
0x212c:u'b', 0x212d:u'c', 0x2130:u'e', 0x2131:u'f',
0x2133:u'm', 0x213e:u'\u03b3', 0x213f:u'\u03c0', 0x2145:u'd',
0x3371:u'hpa', 0x3373:u'au', 0x3375:u'ov', 0x3380:u'pa',
0x3381:u'na', 0x3382:u'\u03bca', 0x3383:u'ma', 0x3384:u'ka',
0x3385:u'kb', 0x3386:u'mb', 0x3387:u'gb', 0x338a:u'pf',
0x338b:u'nf', 0x338c:u'\u03bcf', 0x3390:u'hz', 0x3391:u'khz',
0x3392:u'mhz', 0x3393:u'ghz', 0x3394:u'thz', 0x33a9:u'pa',
0x33aa:u'kpa', 0x33ab:u'mpa', 0x33ac:u'gpa', 0x33b4:u'pv',
0x33b5:u'nv', 0x33b6:u'\u03bcv', 0x33b7:u'mv', 0x33b8:u'kv',
0x33b9:u'mv', 0x33ba:u'pw', 0x33bb:u'nw', 0x33bc:u'\u03bcw',
0x33bd:u'mw', 0x33be:u'kw', 0x33bf:u'mw', 0x33c0:u'k\u03c9',
0x33c1:u'm\u03c9', 0x33c3:u'bq', 0x33c6:u'c\u2215kg', 0x33c7:u'co.',
0x33c8:u'db', 0x33c9:u'gy', 0x33cb:u'hp', 0x33cd:u'kk',
0x33ce:u'km', 0x33d7:u'ph', 0x33d9:u'ppm', 0x33da:u'pr',
0x33dc:u'sv', 0x33dd:u'wb', 0xfb00:u'ff', 0xfb01:u'fi',
0xfb02:u'fl', 0xfb03:u'ffi', 0xfb04:u'ffl', 0xfb05:u'st',
0xfb06:u'st', 0xfb13:u'\u0574\u0576', 0xfb14:u'\u0574\u0565', 0xfb15:u'\u0574\u056b',
0xfb16:u'\u057e\u0576', 0xfb17:u'\u0574\u056d', 0x1d400:u'a', 0x1d401:u'b',
0x1d402:u'c', 0x1d403:u'd', 0x1d404:u'e', 0x1d405:u'f',
0x1d406:u'g', 0x1d407:u'h', 0x1d408:u'i', 0x1d409:u'j',
0x1d40a:u'k', 0x1d40b:u'l', 0x1d40c:u'm', 0x1d40d:u'n',
0x1d40e:u'o', 0x1d40f:u'p', 0x1d410:u'q', 0x1d411:u'r',
0x1d412:u's', 0x1d413:u't', 0x1d414:u'u', 0x1d415:u'v',
0x1d416:u'w', 0x1d417:u'x', 0x1d418:u'y', 0x1d419:u'z',
0x1d434:u'a', 0x1d435:u'b', 0x1d436:u'c', 0x1d437:u'd',
0x1d438:u'e', 0x1d439:u'f', 0x1d43a:u'g', 0x1d43b:u'h',
0x1d43c:u'i', 0x1d43d:u'j', 0x1d43e:u'k', 0x1d43f:u'l',
0x1d440:u'm', 0x1d441:u'n', 0x1d442:u'o', 0x1d443:u'p',
0x1d444:u'q', 0x1d445:u'r', 0x1d446:u's', 0x1d447:u't',
0x1d448:u'u', 0x1d449:u'v', 0x1d44a:u'w', 0x1d44b:u'x',
0x1d44c:u'y', 0x1d44d:u'z', 0x1d468:u'a', 0x1d469:u'b',
0x1d46a:u'c', 0x1d46b:u'd', 0x1d46c:u'e', 0x1d46d:u'f',
0x1d46e:u'g', 0x1d46f:u'h', 0x1d470:u'i', 0x1d471:u'j',
0x1d472:u'k', 0x1d473:u'l', 0x1d474:u'm', 0x1d475:u'n',
0x1d476:u'o', 0x1d477:u'p', 0x1d478:u'q', 0x1d479:u'r',
0x1d47a:u's', 0x1d47b:u't', 0x1d47c:u'u', 0x1d47d:u'v',
0x1d47e:u'w', 0x1d47f:u'x', 0x1d480:u'y', 0x1d481:u'z',
0x1d49c:u'a', 0x1d49e:u'c', 0x1d49f:u'd', 0x1d4a2:u'g',
0x1d4a5:u'j', 0x1d4a6:u'k', 0x1d4a9:u'n', 0x1d4aa:u'o',
0x1d4ab:u'p', 0x1d4ac:u'q', 0x1d4ae:u's', 0x1d4af:u't',
0x1d4b0:u'u', 0x1d4b1:u'v', 0x1d4b2:u'w', 0x1d4b3:u'x',
0x1d4b4:u'y', 0x1d4b5:u'z', 0x1d4d0:u'a', 0x1d4d1:u'b',
0x1d4d2:u'c', 0x1d4d3:u'd', 0x1d4d4:u'e', 0x1d4d5:u'f',
0x1d4d6:u'g', 0x1d4d7:u'h', 0x1d4d8:u'i', 0x1d4d9:u'j',
0x1d4da:u'k', 0x1d4db:u'l', 0x1d4dc:u'm', 0x1d4dd:u'n',
0x1d4de:u'o', 0x1d4df:u'p', 0x1d4e0:u'q', 0x1d4e1:u'r',
0x1d4e2:u's', 0x1d4e3:u't', 0x1d4e4:u'u', 0x1d4e5:u'v',
0x1d4e6:u'w', 0x1d4e7:u'x', 0x1d4e8:u'y', 0x1d4e9:u'z',
0x1d504:u'a', 0x1d505:u'b', 0x1d507:u'd', 0x1d508:u'e',
0x1d509:u'f', 0x1d50a:u'g', 0x1d50d:u'j', 0x1d50e:u'k',
0x1d50f:u'l', 0x1d510:u'm', 0x1d511:u'n', 0x1d512:u'o',
0x1d513:u'p', 0x1d514:u'q', 0x1d516:u's', 0x1d517:u't',
0x1d518:u'u', 0x1d519:u'v', 0x1d51a:u'w', 0x1d51b:u'x',
0x1d51c:u'y', 0x1d538:u'a', 0x1d539:u'b', 0x1d53b:u'd',
0x1d53c:u'e', 0x1d53d:u'f', 0x1d53e:u'g', 0x1d540:u'i',
0x1d541:u'j', 0x1d542:u'k', 0x1d543:u'l', 0x1d544:u'm',
0x1d546:u'o', 0x1d54a:u's', 0x1d54b:u't', 0x1d54c:u'u',
0x1d54d:u'v', 0x1d54e:u'w', 0x1d54f:u'x', 0x1d550:u'y',
0x1d56c:u'a', 0x1d56d:u'b', 0x1d56e:u'c', 0x1d56f:u'd',
0x1d570:u'e', 0x1d571:u'f', 0x1d572:u'g', 0x1d573:u'h',
0x1d574:u'i', 0x1d575:u'j', 0x1d576:u'k', 0x1d577:u'l',
0x1d578:u'm', 0x1d579:u'n', 0x1d57a:u'o', 0x1d57b:u'p',
0x1d57c:u'q', 0x1d57d:u'r', 0x1d57e:u's', 0x1d57f:u't',
0x1d580:u'u', 0x1d581:u'v', 0x1d582:u'w', 0x1d583:u'x',
0x1d584:u'y', 0x1d585:u'z', 0x1d5a0:u'a', 0x1d5a1:u'b',
0x1d5a2:u'c', 0x1d5a3:u'd', 0x1d5a4:u'e', 0x1d5a5:u'f',
0x1d5a6:u'g', 0x1d5a7:u'h', 0x1d5a8:u'i', 0x1d5a9:u'j',
0x1d5aa:u'k', 0x1d5ab:u'l', 0x1d5ac:u'm', 0x1d5ad:u'n',
0x1d5ae:u'o', 0x1d5af:u'p', 0x1d5b0:u'q', 0x1d5b1:u'r',
0x1d5b2:u's', 0x1d5b3:u't', 0x1d5b4:u'u', 0x1d5b5:u'v',
0x1d5b6:u'w', 0x1d5b7:u'x', 0x1d5b8:u'y', 0x1d5b9:u'z',
0x1d5d4:u'a', 0x1d5d5:u'b', 0x1d5d6:u'c', 0x1d5d7:u'd',
0x1d5d8:u'e', 0x1d5d9:u'f', 0x1d5da:u'g', 0x1d5db:u'h',
0x1d5dc:u'i', 0x1d5dd:u'j', 0x1d5de:u'k', 0x1d5df:u'l',
0x1d5e0:u'm', 0x1d5e1:u'n', 0x1d5e2:u'o', 0x1d5e3:u'p',
0x1d5e4:u'q', 0x1d5e5:u'r', 0x1d5e6:u's', 0x1d5e7:u't',
0x1d5e8:u'u', 0x1d5e9:u'v', 0x1d5ea:u'w', 0x1d5eb:u'x',
0x1d5ec:u'y', 0x1d5ed:u'z', 0x1d608:u'a', 0x1d609:u'b',
0x1d60a:u'c', 0x1d60b:u'd', 0x1d60c:u'e', 0x1d60d:u'f',
0x1d60e:u'g', 0x1d60f:u'h', 0x1d610:u'i', 0x1d611:u'j',
0x1d612:u'k', 0x1d613:u'l', 0x1d614:u'm', 0x1d615:u'n',
0x1d616:u'o', 0x1d617:u'p', 0x1d618:u'q', 0x1d619:u'r',
0x1d61a:u's', 0x1d61b:u't', 0x1d61c:u'u', 0x1d61d:u'v',
0x1d61e:u'w', 0x1d61f:u'x', 0x1d620:u'y', 0x1d621:u'z',
0x1d63c:u'a', 0x1d63d:u'b', 0x1d63e:u'c', 0x1d63f:u'd',
0x1d640:u'e', 0x1d641:u'f', 0x1d642:u'g', 0x1d643:u'h',
0x1d644:u'i', 0x1d645:u'j', 0x1d646:u'k', 0x1d647:u'l',
0x1d648:u'm', 0x1d649:u'n', 0x1d64a:u'o', 0x1d64b:u'p',
0x1d64c:u'q', 0x1d64d:u'r', 0x1d64e:u's', 0x1d64f:u't',
0x1d650:u'u', 0x1d651:u'v', 0x1d652:u'w', 0x1d653:u'x',
0x1d654:u'y', 0x1d655:u'z', 0x1d670:u'a', 0x1d671:u'b',
0x1d672:u'c', 0x1d673:u'd', 0x1d674:u'e', 0x1d675:u'f',
0x1d676:u'g', 0x1d677:u'h', 0x1d678:u'i', 0x1d679:u'j',
0x1d67a:u'k', 0x1d67b:u'l', 0x1d67c:u'm', 0x1d67d:u'n',
0x1d67e:u'o', 0x1d67f:u'p', 0x1d680:u'q', 0x1d681:u'r',
0x1d682:u's', 0x1d683:u't', 0x1d684:u'u', 0x1d685:u'v',
0x1d686:u'w', 0x1d687:u'x', 0x1d688:u'y', 0x1d689:u'z',
0x1d6a8:u'\u03b1', 0x1d6a9:u'\u03b2', 0x1d6aa:u'\u03b3', 0x1d6ab:u'\u03b4',
0x1d6ac:u'\u03b5', 0x1d6ad:u'\u03b6', 0x1d6ae:u'\u03b7', 0x1d6af:u'\u03b8',
0x1d6b0:u'\u03b9', 0x1d6b1:u'\u03ba', 0x1d6b2:u'\u03bb', 0x1d6b3:u'\u03bc',
0x1d6b4:u'\u03bd', 0x1d6b5:u'\u03be', 0x1d6b6:u'\u03bf', 0x1d6b7:u'\u03c0',
0x1d6b8:u'\u03c1', 0x1d6b9:u'\u03b8', 0x1d6ba:u'\u03c3', 0x1d6bb:u'\u03c4',
0x1d6bc:u'\u03c5', 0x1d6bd:u'\u03c6', 0x1d6be:u'\u03c7', 0x1d6bf:u'\u03c8',
0x1d6c0:u'\u03c9', 0x1d6d3:u'\u03c3', 0x1d6e2:u'\u03b1', 0x1d6e3:u'\u03b2',
0x1d6e4:u'\u03b3', 0x1d6e5:u'\u03b4', 0x1d6e6:u'\u03b5', 0x1d6e7:u'\u03b6',
0x1d6e8:u'\u03b7', 0x1d6e9:u'\u03b8', 0x1d6ea:u'\u03b9', 0x1d6eb:u'\u03ba',
0x1d6ec:u'\u03bb', 0x1d6ed:u'\u03bc', 0x1d6ee:u'\u03bd', 0x1d6ef:u'\u03be',
0x1d6f0:u'\u03bf', 0x1d6f1:u'\u03c0', 0x1d6f2:u'\u03c1', 0x1d6f3:u'\u03b8',
0x1d6f4:u'\u03c3', 0x1d6f5:u'\u03c4', 0x1d6f6:u'\u03c5', 0x1d6f7:u'\u03c6',
0x1d6f8:u'\u03c7', 0x1d6f9:u'\u03c8', 0x1d6fa:u'\u03c9', 0x1d70d:u'\u03c3',
0x1d71c:u'\u03b1', 0x1d71d:u'\u03b2', 0x1d71e:u'\u03b3', 0x1d71f:u'\u03b4',
0x1d720:u'\u03b5', 0x1d721:u'\u03b6', 0x1d722:u'\u03b7', 0x1d723:u'\u03b8',
0x1d724:u'\u03b9', 0x1d725:u'\u03ba', 0x1d726:u'\u03bb', 0x1d727:u'\u03bc',
0x1d728:u'\u03bd', 0x1d729:u'\u03be', 0x1d72a:u'\u03bf', 0x1d72b:u'\u03c0',
0x1d72c:u'\u03c1', 0x1d72d:u'\u03b8', 0x1d72e:u'\u03c3', 0x1d72f:u'\u03c4',
0x1d730:u'\u03c5', 0x1d731:u'\u03c6', 0x1d732:u'\u03c7', 0x1d733:u'\u03c8',
0x1d734:u'\u03c9', 0x1d747:u'\u03c3', 0x1d756:u'\u03b1', 0x1d757:u'\u03b2',
0x1d758:u'\u03b3', 0x1d759:u'\u03b4', 0x1d75a:u'\u03b5', 0x1d75b:u'\u03b6',
0x1d75c:u'\u03b7', 0x1d75d:u'\u03b8', 0x1d75e:u'\u03b9', 0x1d75f:u'\u03ba',
0x1d760:u'\u03bb', 0x1d761:u'\u03bc', 0x1d762:u'\u03bd', 0x1d763:u'\u03be',
0x1d764:u'\u03bf', 0x1d765:u'\u03c0', 0x1d766:u'\u03c1', 0x1d767:u'\u03b8',
0x1d768:u'\u03c3', 0x1d769:u'\u03c4', 0x1d76a:u'\u03c5', 0x1d76b:u'\u03c6',
0x1d76c:u'\u03c7', 0x1d76d:u'\u03c8', 0x1d76e:u'\u03c9', 0x1d781:u'\u03c3',
0x1d790:u'\u03b1', 0x1d791:u'\u03b2', 0x1d792:u'\u03b3', 0x1d793:u'\u03b4',
0x1d794:u'\u03b5', 0x1d795:u'\u03b6', 0x1d796:u'\u03b7', 0x1d797:u'\u03b8',
0x1d798:u'\u03b9', 0x1d799:u'\u03ba', 0x1d79a:u'\u03bb', 0x1d79b:u'\u03bc',
0x1d79c:u'\u03bd', 0x1d79d:u'\u03be', 0x1d79e:u'\u03bf', 0x1d79f:u'\u03c0',
0x1d7a0:u'\u03c1', 0x1d7a1:u'\u03b8', 0x1d7a2:u'\u03c3', 0x1d7a3:u'\u03c4',
0x1d7a4:u'\u03c5', 0x1d7a5:u'\u03c6', 0x1d7a6:u'\u03c7', 0x1d7a7:u'\u03c8',
0x1d7a8:u'\u03c9', 0x1d7bb:u'\u03c3', }
def map_table_b3(code):
r = b3_exceptions.get(ord(code))
if r is not None: return r
return code.lower()
def map_table_b2(a):
al = map_table_b3(a)
b = unicodedata.normalize("NFKC", al)
bl = u"".join([map_table_b3(ch) for ch in b])
c = unicodedata.normalize("NFKC", bl)
if b != c:
return c
else:
return al
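# Illustrative sketch (not part of the original module): table B.3 is plain
# case folding; table B.2 adds an NFKC normalization pass on top of it.
def _demo_stringprep_maps():
    assert map_table_b3(u'A') == u'a'
    assert map_table_b3(u'\xdf') == u'ss'   # German sharp s folds to 'ss'
    assert map_table_b2(u'A') == u'a'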
def in_table_c11(code):
return code == u" "
def in_table_c12(code):
return unicodedata.category(code) == "Zs" and code != u" "
def in_table_c11_c12(code):
return unicodedata.category(code) == "Zs"
def in_table_c21(code):
return ord(code) < 128 and unicodedata.category(code) == "Cc"
c22_specials = set([1757, 1807, 6158, 8204, 8205, 8232, 8233, 65279] + range(8288,8292) + range(8298,8304) + range(65529,65533) + range(119155,119163))
def in_table_c22(code):
c = ord(code)
if c < 128: return False
if unicodedata.category(code) == "Cc": return True
return c in c22_specials
def in_table_c21_c22(code):
return unicodedata.category(code) == "Cc" or \
ord(code) in c22_specials
def in_table_c3(code):
return unicodedata.category(code) == "Co"
def in_table_c4(code):
c = ord(code)
if c < 0xFDD0: return False
if c < 0xFDF0: return True
return (ord(code) & 0xFFFF) in (0xFFFE, 0xFFFF)
def in_table_c5(code):
return unicodedata.category(code) == "Cs"
c6_set = set(range(65529,65534))
def in_table_c6(code):
return ord(code) in c6_set
c7_set = set(range(12272,12284))
def in_table_c7(code):
return ord(code) in c7_set
c8_set = set([832, 833, 8206, 8207] + range(8234,8239) + range(8298,8304))
def in_table_c8(code):
return ord(code) in c8_set
c9_set = set([917505] + range(917536,917632))
def in_table_c9(code):
return ord(code) in c9_set
def in_table_d1(code):
return unicodedata.bidirectional(code) in ("R","AL")
def in_table_d2(code):
return unicodedata.bidirectional(code) == "L"
from _struct import *
from _struct import _clearcache
from _struct import __doc__
# subprocess - Subprocesses with accessible I/O streams
#
# For more information about this module, see PEP 324.
#
# Copyright (c) 2003-2005 by Peter Astrand <[email protected]>
#
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
r"""Subprocesses with accessible I/O streams
This module allows you to spawn processes, connect to their
input/output/error pipes, and obtain their return codes.
For a complete description of this module see the Python documentation.
Main API
========
call(...): Runs a command, waits for it to complete, then returns
the return code.
check_call(...): Same as call() but raises CalledProcessError()
if return code is not 0
check_output(...): Same as check_call() but returns the contents of
stdout instead of a return code
Popen(...): A class for flexibly executing a command in a new process
Constants
---------
PIPE: Special value that indicates a pipe should be created
STDOUT: Special value that indicates that stderr should go to stdout
"""
import sys, os
mswindows = (sys.platform == "win32" or (sys.platform == "cli" and os.name == "nt"))
mono = (sys.platform == "cli" and os.name == "posix")
import types
import traceback
import gc
import signal
import errno
# Exception classes used by this module.
class CalledProcessError(Exception):
"""This exception is raised when a process run by check_call() or
check_output() returns a non-zero exit status.
Attributes:
cmd, returncode, output
"""
def __init__(self, returncode, cmd, output=None):
self.returncode = returncode
self.cmd = cmd
self.output = output
def __str__(self):
return "Command '%s' returned non-zero exit status %d" % (self.cmd, self.returncode)
if mswindows:
import threading
import msvcrt
import _subprocess
class STARTUPINFO:
dwFlags = 0
hStdInput = None
hStdOutput = None
hStdError = None
wShowWindow = 0
class pywintypes:
error = IOError
elif mono:
import threading
_has_poll = False
import clr
if clr.IsNetCoreApp:
clr.AddReference("System.Diagnostics.Process")
from System.Diagnostics import Process
from System.IO import MemoryStream
else:
import select
_has_poll = hasattr(select, 'poll')
try:
import threading
except ImportError:
threading = None
import fcntl
import pickle
# When select or poll has indicated that the file is writable,
# we can write up to _PIPE_BUF bytes without risk of blocking.
# POSIX defines PIPE_BUF as >= 512.
_PIPE_BUF = getattr(select, 'PIPE_BUF', 512)
__all__ = ["Popen", "PIPE", "STDOUT", "call", "check_call",
"check_output", "CalledProcessError"]
if mswindows:
from _subprocess import (CREATE_NEW_CONSOLE, CREATE_NEW_PROCESS_GROUP,
STD_INPUT_HANDLE, STD_OUTPUT_HANDLE,
STD_ERROR_HANDLE, SW_HIDE,
STARTF_USESTDHANDLES, STARTF_USESHOWWINDOW)
__all__.extend(["CREATE_NEW_CONSOLE", "CREATE_NEW_PROCESS_GROUP",
"STD_INPUT_HANDLE", "STD_OUTPUT_HANDLE",
"STD_ERROR_HANDLE", "SW_HIDE",
"STARTF_USESTDHANDLES", "STARTF_USESHOWWINDOW"])
try:
MAXFD = os.sysconf("SC_OPEN_MAX")
except:
MAXFD = 256
_active = []
def _cleanup():
for inst in _active[:]:
res = inst._internal_poll(_deadstate=sys.maxint)
if res is not None:
try:
_active.remove(inst)
except ValueError:
# This can happen if two threads create a new Popen instance.
# It's harmless that it was already removed, so ignore.
pass
PIPE = -1
STDOUT = -2
def _eintr_retry_call(func, *args):
while True:
try:
return func(*args)
except (OSError, IOError) as e:
if e.errno == errno.EINTR:
continue
raise
# XXX This function is only used by multiprocessing and the test suite,
# but it's here so that it can be imported when Python is compiled without
# threads.
def _args_from_interpreter_flags():
"""Return a list of command-line arguments reproducing the current
settings in sys.flags and sys.warnoptions."""
flag_opt_map = {
'debug': 'd',
# 'inspect': 'i',
# 'interactive': 'i',
'optimize': 'O',
'dont_write_bytecode': 'B',
'no_user_site': 's',
'no_site': 'S',
'ignore_environment': 'E',
'verbose': 'v',
'bytes_warning': 'b',
'py3k_warning': '3',
}
args = []
for flag, opt in flag_opt_map.items():
v = getattr(sys.flags, flag)
if v > 0:
args.append('-' + opt * v)
if getattr(sys.flags, 'hash_randomization') != 0:
args.append('-R')
for opt in sys.warnoptions:
args.append('-W' + opt)
return args
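# Illustrative sketch (not part of the original module): the flag list can
# be combined with sys.executable to respawn a worker process that inherits
# the parent interpreter's options.
def _demo_respawn_args():
    return [sys.executable] + _args_from_interpreter_flags() + ['-c', 'pass']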
def call(*popenargs, **kwargs):
"""Run command with arguments. Wait for command to complete, then
return the returncode attribute.
The arguments are the same as for the Popen constructor. Example:
retcode = call(["ls", "-l"])
"""
return Popen(*popenargs, **kwargs).wait()
def check_call(*popenargs, **kwargs):
"""Run command with arguments. Wait for command to complete. If
the exit code was zero then return, otherwise raise
CalledProcessError. The CalledProcessError object will have the
return code in the returncode attribute.
The arguments are the same as for the Popen constructor. Example:
check_call(["ls", "-l"])
"""
retcode = call(*popenargs, **kwargs)
if retcode:
cmd = kwargs.get("args")
if cmd is None:
cmd = popenargs[0]
raise CalledProcessError(retcode, cmd)
return 0
def check_output(*popenargs, **kwargs):
r"""Run command with arguments and return its output as a byte string.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
>>> check_output(["ls", "-l", "/dev/null"])
'crw-rw-rw- 1 root root 1, 3 Oct 18 2007 /dev/null\n'
The stdout argument is not allowed as it is used internally.
To capture standard error in the result, use stderr=STDOUT.
>>> check_output(["/bin/sh", "-c",
... "ls -l non_existent_file ; exit 0"],
... stderr=STDOUT)
'ls: non_existent_file: No such file or directory\n'
"""
if mono:
raise NotImplementedError("check_output not currently supported on .NET platforms")
if 'stdout' in kwargs:
raise ValueError('stdout argument not allowed, it will be overridden.')
process = Popen(stdout=PIPE, *popenargs, **kwargs)
output, unused_err = process.communicate()
retcode = process.poll()
if retcode:
cmd = kwargs.get("args")
if cmd is None:
cmd = popenargs[0]
raise CalledProcessError(retcode, cmd, output=output)
return output
def list2cmdline(seq):
"""
Translate a sequence of arguments into a command line
string, using the same rules as the MS C runtime:
1) Arguments are delimited by white space, which is either a
space or a tab.
2) A string surrounded by double quotation marks is
interpreted as a single argument, regardless of white space
contained within. A quoted string can be embedded in an
argument.
3) A double quotation mark preceded by a backslash is
interpreted as a literal double quotation mark.
4) Backslashes are interpreted literally, unless they
immediately precede a double quotation mark.
5) If backslashes immediately precede a double quotation mark,
every pair of backslashes is interpreted as a literal
backslash. If the number of backslashes is odd, the last
backslash escapes the next double quotation mark as
described in rule 3.
"""
# See
# http://msdn.microsoft.com/en-us/library/17w5ykft.aspx
# or search http://msdn.microsoft.com for
# "Parsing C++ Command-Line Arguments"
result = []
needquote = False
for arg in seq:
bs_buf = []
# Add a space to separate this argument from the others
if result:
result.append(' ')
needquote = (" " in arg) or ("\t" in arg) or not arg
if needquote:
result.append('"')
for c in arg:
if c == '\\':
# Don't know if we need to double yet.
bs_buf.append(c)
elif c == '"':
# Double backslashes.
result.append('\\' * len(bs_buf)*2)
bs_buf = []
result.append('\\"')
else:
# Normal char
if bs_buf:
result.extend(bs_buf)
bs_buf = []
result.append(c)
# Add remaining backslashes, if any.
if bs_buf:
result.extend(bs_buf)
if needquote:
result.extend(bs_buf)
result.append('"')
return ''.join(result)
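# Illustrative sketch (not part of the original module): quoting follows the
# MS C runtime rules enumerated in the docstring above.
def _demo_list2cmdline():
    assert list2cmdline(['a', 'b c']) == 'a "b c"'
    assert list2cmdline(['she said "hi"']) == '"she said \\"hi\\""'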
class Popen(object):
""" Execute a child program in a new process.
For a complete description of the arguments see the Python documentation.
Arguments:
args: A string, or a sequence of program arguments.
bufsize: supplied as the buffering argument to the open() function when
creating the stdin/stdout/stderr pipe file objects
executable: A replacement program to execute.
stdin, stdout and stderr: These specify the executed programs' standard
input, standard output and standard error file handles, respectively.
preexec_fn: (POSIX only) An object to be called in the child process
just before the child is executed.
close_fds: Controls closing or inheriting of file descriptors.
shell: If true, the command will be executed through the shell.
cwd: Sets the current directory before the child is executed.
env: Defines the environment variables for the new process.
universal_newlines: If true, use universal line endings for file
objects stdin, stdout and stderr.
startupinfo and creationflags (Windows only)
Attributes:
stdin, stdout, stderr, pid, returncode
"""
_child_created = False # Set here since __del__ checks it
def __init__(self, args, bufsize=0, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=False, shell=False,
cwd=None, env=None, universal_newlines=False,
startupinfo=None, creationflags=0):
"""Create new Popen instance."""
_cleanup()
if not isinstance(bufsize, (int, long)):
raise TypeError("bufsize must be an integer")
if mswindows:
if preexec_fn is not None:
raise ValueError("preexec_fn is not supported on Windows "
"platforms")
if close_fds and (stdin is not None or stdout is not None or
stderr is not None):
raise ValueError("close_fds is not supported on Windows "
"platforms if you redirect stdin/stdout/stderr")
elif mono:
if preexec_fn:
raise ValueError("preexec_fn is not supported on .NET platforms")
if close_fds:
raise ValueError("close_fds is not supported on .NET platforms")
if universal_newlines:
raise ValueError("universal_newlines is not supported on .NET platforms")
if startupinfo:
raise ValueError("startupinfo is not supported on .NET platforms")
if creationflags:
raise ValueError("creationflags is not supported on .NET platforms")
if stdin is not None and stdin != PIPE:
raise NotImplementedError("Cannont redirect stdin yet.")
if stderr == STDOUT:
raise NotImplementedError("Cannont redirect stderr to stdout yet.")
else:
# POSIX
if startupinfo is not None:
raise ValueError("startupinfo is only supported on Windows "
"platforms")
if creationflags != 0:
raise ValueError("creationflags is only supported on Windows "
"platforms")
self.stdin = None
self.stdout = None
self.stderr = None
self.pid = None
self.returncode = None
self.universal_newlines = universal_newlines
# Input and output objects. The general principle is like
# this:
#
# Parent Child
# ------ -----
# p2cwrite ---stdin---> p2cread
# c2pread <--stdout--- c2pwrite
# errread <--stderr--- errwrite
#
# On POSIX, the child objects are file descriptors. On
# Windows, these are Windows file handles. The parent objects
# are file descriptors on both platforms. The parent objects
# are None when not using PIPEs. The child objects are None
# when not redirecting.
(p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
try:
self._execute_child(args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell, to_close,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite)
except Exception:
# Preserve original exception in case os.close raises.
exc_type, exc_value, exc_trace = sys.exc_info()
for fd in to_close:
try:
if mswindows:
fd.Close()
else:
os.close(fd)
except EnvironmentError:
pass
raise exc_type, exc_value, exc_trace
if mswindows:
if p2cwrite is not None:
p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
if c2pread is not None:
c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
if errread is not None:
errread = msvcrt.open_osfhandle(errread.Detach(), 0)
if p2cwrite is not None:
self.stdin = os.fdopen(p2cwrite, 'wb', bufsize)
if c2pread is not None:
if universal_newlines:
self.stdout = os.fdopen(c2pread, 'rU', bufsize)
else:
self.stdout = os.fdopen(c2pread, 'rb', bufsize)
if errread is not None:
if universal_newlines:
self.stderr = os.fdopen(errread, 'rU', bufsize)
else:
self.stderr = os.fdopen(errread, 'rb', bufsize)
def _translate_newlines(self, data):
data = data.replace("\r\n", "\n")
data = data.replace("\r", "\n")
return data
def __del__(self, _maxint=sys.maxint):
# If __init__ hasn't had a chance to execute (e.g. if it
# was passed an undeclared keyword argument), we don't
# have a _child_created attribute at all.
if not self._child_created:
# We didn't get to successfully create a child process.
return
# In case the child hasn't been waited on, check if it's done.
self._internal_poll(_deadstate=_maxint)
if self.returncode is None and _active is not None:
# Child is still running, keep us alive until we can wait on it.
_active.append(self)
def communicate(self, input=None):
"""Interact with process: Send data to stdin. Read data from
stdout and stderr, until end-of-file is reached. Wait for
process to terminate. The optional input argument should be a
string to be sent to the child process, or None, if no data
should be sent to the child.
communicate() returns a tuple (stdout, stderr)."""
# Optimization: If we are only using one pipe, or no pipe at
# all, using select() or threads is unnecessary.
if [self.stdin, self.stdout, self.stderr].count(None) >= 2:
stdout = None
stderr = None
if self.stdin:
if input:
try:
self.stdin.write(input)
except IOError as e:
if e.errno != errno.EPIPE and e.errno != errno.EINVAL:
raise
self.stdin.close()
elif self.stdout:
stdout = _eintr_retry_call(self.stdout.read)
self.stdout.close()
elif self.stderr:
stderr = _eintr_retry_call(self.stderr.read)
self.stderr.close()
self.wait()
return (stdout, stderr)
return self._communicate(input)
def poll(self):
"""Check if child process has terminated. Set and return returncode
attribute."""
return self._internal_poll()
if mswindows:
#
# Windows methods
#
def _get_handles(self, stdin, stdout, stderr):
"""Construct and return tuple with IO objects:
p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite
"""
to_close = set()
if stdin is None and stdout is None and stderr is None:
return (None, None, None, None, None, None), to_close
p2cread, p2cwrite = None, None
c2pread, c2pwrite = None, None
errread, errwrite = None, None
if stdin is None:
p2cread = _subprocess.GetStdHandle(_subprocess.STD_INPUT_HANDLE)
if p2cread is None:
p2cread, _ = _subprocess.CreatePipe(None, 0)
elif stdin == PIPE:
p2cread, p2cwrite = _subprocess.CreatePipe(None, 0)
elif isinstance(stdin, (int, long)):
p2cread = msvcrt.get_osfhandle(stdin)
else:
# Assuming file-like object
p2cread = msvcrt.get_osfhandle(stdin.fileno())
p2cread = self._make_inheritable(p2cread)
# We just duplicated the handle, it has to be closed at the end
to_close.add(p2cread)
if stdin == PIPE:
to_close.add(p2cwrite)
if stdout is None:
c2pwrite = _subprocess.GetStdHandle(_subprocess.STD_OUTPUT_HANDLE)
if c2pwrite is None:
_, c2pwrite = _subprocess.CreatePipe(None, 0)
elif stdout == PIPE:
c2pread, c2pwrite = _subprocess.CreatePipe(None, 0)
elif isinstance(stdout, (int, long)):
c2pwrite = msvcrt.get_osfhandle(stdout)
else:
# Assuming file-like object
c2pwrite = msvcrt.get_osfhandle(stdout.fileno())
c2pwrite = self._make_inheritable(c2pwrite)
# We just duplicated the handle, it has to be closed at the end
to_close.add(c2pwrite)
if stdout == PIPE:
to_close.add(c2pread)
if stderr is None:
errwrite = _subprocess.GetStdHandle(_subprocess.STD_ERROR_HANDLE)
if errwrite is None:
_, errwrite = _subprocess.CreatePipe(None, 0)
elif stderr == PIPE:
errread, errwrite = _subprocess.CreatePipe(None, 0)
elif stderr == STDOUT:
errwrite = c2pwrite
elif isinstance(stderr, (int, long)):
errwrite = msvcrt.get_osfhandle(stderr)
else:
# Assuming file-like object
errwrite = msvcrt.get_osfhandle(stderr.fileno())
errwrite = self._make_inheritable(errwrite)
# We just duplicated the handle, it has to be closed at the end
to_close.add(errwrite)
if stderr == PIPE:
to_close.add(errread)
return (p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite), to_close
def _make_inheritable(self, handle):
"""Return a duplicate of handle, which is inheritable"""
return _subprocess.DuplicateHandle(_subprocess.GetCurrentProcess(),
handle, _subprocess.GetCurrentProcess(), 0, 1,
_subprocess.DUPLICATE_SAME_ACCESS)
def _find_w9xpopen(self):
"""Find and return absolut path to w9xpopen.exe"""
w9xpopen = os.path.join(
os.path.dirname(_subprocess.GetModuleFileName(0)),
"w9xpopen.exe")
if not os.path.exists(w9xpopen):
# Eeek - file-not-found - possibly an embedding
# situation - see if we can locate it in sys.exec_prefix
w9xpopen = os.path.join(os.path.dirname(sys.exec_prefix),
"w9xpopen.exe")
if not os.path.exists(w9xpopen):
raise RuntimeError("Cannot locate w9xpopen.exe, which is "
"needed for Popen to work with your "
"shell or platform.")
return w9xpopen
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell, to_close,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite):
"""Execute program (MS Windows version)"""
if not isinstance(args, types.StringTypes):
args = list2cmdline(args)
# Process startup details
if startupinfo is None:
startupinfo = STARTUPINFO()
if None not in (p2cread, c2pwrite, errwrite):
startupinfo.dwFlags |= _subprocess.STARTF_USESTDHANDLES
startupinfo.hStdInput = p2cread
startupinfo.hStdOutput = c2pwrite
startupinfo.hStdError = errwrite
if shell:
startupinfo.dwFlags |= _subprocess.STARTF_USESHOWWINDOW
startupinfo.wShowWindow = _subprocess.SW_HIDE
comspec = os.environ.get("COMSPEC", "cmd.exe")
                args = '{} /c "{}"'.format(comspec, args)
if (_subprocess.GetVersion() >= 0x80000000 or
os.path.basename(comspec).lower() == "command.com"):
# Win9x, or using command.com on NT. We need to
# use the w9xpopen intermediate program. For more
# information, see KB Q150956
# (http://web.archive.org/web/20011105084002/http://support.microsoft.com/support/kb/articles/Q150/9/56.asp)
w9xpopen = self._find_w9xpopen()
args = '"%s" %s' % (w9xpopen, args)
# Not passing CREATE_NEW_CONSOLE has been known to
# cause random failures on win9x. Specifically a
# dialog: "Your program accessed mem currently in
# use at xxx" and a hopeful warning about the
                    # stability of your system. The cost is that Ctrl+C
                    # won't kill children.
creationflags |= _subprocess.CREATE_NEW_CONSOLE
def _close_in_parent(fd):
fd.Close()
to_close.remove(fd)
# Start the process
try:
hp, ht, pid, tid = _subprocess.CreateProcess(executable, args,
# no special security
None, None,
int(not close_fds),
creationflags,
env,
cwd,
startupinfo)
except pywintypes.error, e:
# Translate pywintypes.error to WindowsError, which is
# a subclass of OSError. FIXME: We should really
# translate errno using _sys_errlist (or similar), but
# how can this be done from Python?
raise WindowsError(*e.args)
finally:
# Child is launched. Close the parent's copy of those pipe
# handles that only the child should have open. You need
# to make sure that no handles to the write end of the
# output pipe are maintained in this process or else the
# pipe will not close when the child process exits and the
# ReadFile will hang.
if p2cread is not None:
_close_in_parent(p2cread)
if c2pwrite is not None:
_close_in_parent(c2pwrite)
if errwrite is not None:
_close_in_parent(errwrite)
# Retain the process handle, but close the thread handle
self._child_created = True
self._handle = hp
self.pid = pid
ht.Close()
def _internal_poll(self, _deadstate=None,
_WaitForSingleObject=_subprocess.WaitForSingleObject,
_WAIT_OBJECT_0=_subprocess.WAIT_OBJECT_0,
_GetExitCodeProcess=_subprocess.GetExitCodeProcess):
"""Check if child process has terminated. Returns returncode
attribute.
This method is called by __del__, so it can only refer to objects
in its local scope.
"""
if self.returncode is None:
if _WaitForSingleObject(self._handle, 0) == _WAIT_OBJECT_0:
self.returncode = _GetExitCodeProcess(self._handle)
return self.returncode
def wait(self):
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode is None:
_subprocess.WaitForSingleObject(self._handle,
_subprocess.INFINITE)
self.returncode = _subprocess.GetExitCodeProcess(self._handle)
return self.returncode
def _readerthread(self, fh, buffer):
buffer.append(fh.read())
def _communicate(self, input):
stdout = None # Return
stderr = None # Return
if self.stdout:
stdout = []
stdout_thread = threading.Thread(target=self._readerthread,
args=(self.stdout, stdout))
stdout_thread.setDaemon(True)
stdout_thread.start()
if self.stderr:
stderr = []
stderr_thread = threading.Thread(target=self._readerthread,
args=(self.stderr, stderr))
stderr_thread.setDaemon(True)
stderr_thread.start()
if self.stdin:
if input is not None:
try:
self.stdin.write(input)
except IOError as e:
if e.errno == errno.EPIPE:
# communicate() should ignore broken pipe error
pass
elif e.errno == errno.EINVAL:
# bpo-19612, bpo-30418: On Windows, stdin.write()
# fails with EINVAL if the child process exited or
# if the child process is still running but closed
# the pipe.
pass
else:
raise
self.stdin.close()
if self.stdout:
stdout_thread.join()
if self.stderr:
stderr_thread.join()
# All data exchanged. Translate lists into strings.
if stdout is not None:
stdout = stdout[0]
if stderr is not None:
stderr = stderr[0]
# Translate newlines, if requested. We cannot let the file
# object do the translation: It is based on stdio, which is
# impossible to combine with select (unless forcing no
# buffering).
if self.universal_newlines and hasattr(file, 'newlines'):
if stdout:
stdout = self._translate_newlines(stdout)
if stderr:
stderr = self._translate_newlines(stderr)
self.wait()
return (stdout, stderr)
def send_signal(self, sig):
"""Send a signal to the process
"""
if sig == signal.SIGTERM:
self.terminate()
elif sig == signal.CTRL_C_EVENT:
os.kill(self.pid, signal.CTRL_C_EVENT)
elif sig == signal.CTRL_BREAK_EVENT:
os.kill(self.pid, signal.CTRL_BREAK_EVENT)
else:
raise ValueError("Unsupported signal: {}".format(sig))
def terminate(self):
"""Terminates the process
"""
try:
_subprocess.TerminateProcess(self._handle, 1)
except OSError as e:
# ERROR_ACCESS_DENIED (winerror 5) is received when the
# process already died.
if e.winerror != 5:
raise
rc = _subprocess.GetExitCodeProcess(self._handle)
if rc == _subprocess.STILL_ACTIVE:
raise
self.returncode = rc
kill = terminate
elif mono:
def _get_handles(self, stdin, stdout, stderr):
to_close = set()
p2cread, p2cwrite = None, None
c2pread, c2pwrite = None, None
errread, errwrite = None, None
self._redirectStdin = stdin == PIPE
self._redirectStdout = stdout == PIPE
self._redirectStderr = stderr == PIPE
return (p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite), to_close
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell, to_close,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite):
if isinstance(args, types.StringTypes):
args = [args]
else:
args = list(args)
if shell:
args = ["/bin/sh", "-c"] + args
if executable:
args[0] = executable
if executable is None:
executable = args[0]
self.process = Process()
self.process.StartInfo.UseShellExecute = False
self.process.StartInfo.FileName = executable
self.process.StartInfo.Arguments = list2cmdline(args[1:])
if env:
self.process.StartInfo.EnvironmentVariables.Clear()
for k, v in env.items():
self.process.StartInfo.EnvironmentVariables.Add(k, v)
self.process.StartInfo.WorkingDirectory = cwd or os.getcwd()
self.process.StartInfo.RedirectStandardInput = self._redirectStdin
self.process.StartInfo.RedirectStandardOutput = self._redirectStdout
self.process.StartInfo.RedirectStandardError = self._redirectStderr
self.process.Start()
self.pid = self.process.Id
self.stdin = file(self.process.StandardInput.BaseStream) if self._redirectStdin else None
self.stdout = file(self.process.StandardOutput.BaseStream) if self._redirectStdout else None
self.stderr = file(self.process.StandardError.BaseStream) if self._redirectStderr else None
def _internal_poll(self):
return self.process.ExitCode if self.process.HasExited else None
def wait(self):
"""Wait for child process to terminate. Returns returncode
attribute."""
self.process.WaitForExit()
return self.process.ExitCode
def _readerthread(self, fh, buffer):
buffer.append(fh.read())
def _communicate(self, input):
stdout = None # Return
stderr = None # Return
if self.stdout:
stdout = []
stdout_thread = threading.Thread(target=self._readerthread,
args=(self.stdout, stdout))
stdout_thread.setDaemon(True)
stdout_thread.start()
if self.stderr:
stderr = []
stderr_thread = threading.Thread(target=self._readerthread,
args=(self.stderr, stderr))
stderr_thread.setDaemon(True)
stderr_thread.start()
if self.stdin:
if input is not None:
try:
self.stdin.write(input)
except IOError as e:
if e.errno == errno.EPIPE:
# communicate() should ignore broken pipe error
pass
elif (e.errno == errno.EINVAL
and self.poll() is not None):
# Issue #19612: stdin.write() fails with EINVAL
# if the process already exited before the write
pass
else:
raise
self.stdin.close()
if self.stdout:
stdout_thread.join()
if self.stderr:
stderr_thread.join()
# All data exchanged. Translate lists into strings.
if stdout is not None:
stdout = stdout[0]
if stderr is not None:
stderr = stderr[0]
# Translate newlines, if requested. We cannot let the file
# object do the translation: It is based on stdio, which is
# impossible to combine with select (unless forcing no
# buffering).
if self.universal_newlines and hasattr(file, 'newlines'):
if stdout:
stdout = self._translate_newlines(stdout)
if stderr:
stderr = self._translate_newlines(stderr)
self.wait()
return (stdout, stderr)
else:
#
# POSIX methods
#
def _get_handles(self, stdin, stdout, stderr):
"""Construct and return tuple with IO objects:
p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite
"""
to_close = set()
p2cread, p2cwrite = None, None
c2pread, c2pwrite = None, None
errread, errwrite = None, None
if stdin is None:
pass
elif stdin == PIPE:
p2cread, p2cwrite = self.pipe_cloexec()
to_close.update((p2cread, p2cwrite))
elif isinstance(stdin, (int, long)):
p2cread = stdin
else:
# Assuming file-like object
p2cread = stdin.fileno()
if stdout is None:
pass
elif stdout == PIPE:
c2pread, c2pwrite = self.pipe_cloexec()
to_close.update((c2pread, c2pwrite))
elif isinstance(stdout, (int, long)):
c2pwrite = stdout
else:
# Assuming file-like object
c2pwrite = stdout.fileno()
if stderr is None:
pass
elif stderr == PIPE:
errread, errwrite = self.pipe_cloexec()
to_close.update((errread, errwrite))
elif stderr == STDOUT:
if c2pwrite is not None:
errwrite = c2pwrite
else: # child's stdout is not set, use parent's stdout
errwrite = sys.__stdout__.fileno()
elif isinstance(stderr, (int, long)):
errwrite = stderr
else:
# Assuming file-like object
errwrite = stderr.fileno()
return (p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite), to_close
def _set_cloexec_flag(self, fd, cloexec=True):
try:
cloexec_flag = fcntl.FD_CLOEXEC
except AttributeError:
cloexec_flag = 1
old = fcntl.fcntl(fd, fcntl.F_GETFD)
if cloexec:
fcntl.fcntl(fd, fcntl.F_SETFD, old | cloexec_flag)
else:
fcntl.fcntl(fd, fcntl.F_SETFD, old & ~cloexec_flag)
def pipe_cloexec(self):
"""Create a pipe with FDs set CLOEXEC."""
# Pipes' FDs are set CLOEXEC by default because we don't want them
# to be inherited by other subprocesses: the CLOEXEC flag is removed
# from the child's FDs by _dup2(), between fork() and exec().
# This is not atomic: we would need the pipe2() syscall for that.
r, w = os.pipe()
self._set_cloexec_flag(r)
self._set_cloexec_flag(w)
return r, w
def _close_fds(self, but):
if hasattr(os, 'closerange'):
os.closerange(3, but)
os.closerange(but + 1, MAXFD)
else:
for i in xrange(3, MAXFD):
if i == but:
continue
try:
os.close(i)
except:
pass
# Used as a bandaid workaround for https://bugs.python.org/issue27448
# to prevent multiple simultaneous subprocess launches from interfering
# with one another and leaving gc disabled.
if threading:
_disabling_gc_lock = threading.Lock()
else:
class _noop_context_manager(object):
# A dummy context manager that does nothing for the rare
# user of a --without-threads build.
def __enter__(self): pass
def __exit__(self, *args): pass
_disabling_gc_lock = _noop_context_manager()
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell, to_close,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite):
"""Execute program (POSIX version)"""
if isinstance(args, types.StringTypes):
args = [args]
else:
args = list(args)
if shell:
args = ["/bin/sh", "-c"] + args
if executable:
args[0] = executable
if executable is None:
executable = args[0]
def _close_in_parent(fd):
os.close(fd)
to_close.remove(fd)
# For transferring possible exec failure from child to parent
# The first char specifies the exception type: 0 means
# OSError, 1 means some other error.
errpipe_read, errpipe_write = self.pipe_cloexec()
try:
try:
with self._disabling_gc_lock:
gc_was_enabled = gc.isenabled()
# Disable gc to avoid bug where gc -> file_dealloc ->
# write to stderr -> hang.
# https://bugs.python.org/issue1336
gc.disable()
try:
self.pid = os.fork()
except:
if gc_was_enabled:
gc.enable()
raise
self._child_created = True
if self.pid == 0:
# Child
try:
# Close parent's pipe ends
if p2cwrite is not None:
os.close(p2cwrite)
if c2pread is not None:
os.close(c2pread)
if errread is not None:
os.close(errread)
os.close(errpipe_read)
                                # When duping fds, an fd that is 0, 1 or 2 may
                                # be overwritten by an earlier dup2() call, so
                                # dup it out of the way first (#12607).
if c2pwrite == 0:
c2pwrite = os.dup(c2pwrite)
if errwrite == 0 or errwrite == 1:
errwrite = os.dup(errwrite)
# Dup fds for child
def _dup2(a, b):
# dup2() removes the CLOEXEC flag but
# we must do it ourselves if dup2()
# would be a no-op (issue #10806).
if a == b:
self._set_cloexec_flag(a, False)
elif a is not None:
os.dup2(a, b)
_dup2(p2cread, 0)
_dup2(c2pwrite, 1)
_dup2(errwrite, 2)
# Close pipe fds. Make sure we don't close the
# same fd more than once, or standard fds.
closed = { None }
for fd in [p2cread, c2pwrite, errwrite]:
if fd not in closed and fd > 2:
os.close(fd)
closed.add(fd)
if cwd is not None:
os.chdir(cwd)
if preexec_fn:
preexec_fn()
# Close all other fds, if asked for - after
# preexec_fn(), which may open FDs.
if close_fds:
self._close_fds(but=errpipe_write)
if env is None:
os.execvp(executable, args)
else:
os.execvpe(executable, args, env)
except:
exc_type, exc_value, tb = sys.exc_info()
# Save the traceback and attach it to the exception object
exc_lines = traceback.format_exception(exc_type,
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
os.write(errpipe_write, pickle.dumps(exc_value))
finally:
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
os._exit(255)
# Parent
if gc_was_enabled:
gc.enable()
finally:
# be sure the FD is closed no matter what
os.close(errpipe_write)
# Wait for exec to fail or succeed; possibly raising exception
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
pickle_bits = []
while data:
pickle_bits.append(data)
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
data = "".join(pickle_bits)
finally:
if p2cread is not None and p2cwrite is not None:
_close_in_parent(p2cread)
if c2pwrite is not None and c2pread is not None:
_close_in_parent(c2pwrite)
if errwrite is not None and errread is not None:
_close_in_parent(errwrite)
# be sure the FD is closed no matter what
os.close(errpipe_read)
if data != "":
try:
_eintr_retry_call(os.waitpid, self.pid, 0)
except OSError as e:
if e.errno != errno.ECHILD:
raise
child_exception = pickle.loads(data)
raise child_exception
def _handle_exitstatus(self, sts, _WIFSIGNALED=os.WIFSIGNALED,
_WTERMSIG=os.WTERMSIG, _WIFEXITED=os.WIFEXITED,
_WEXITSTATUS=os.WEXITSTATUS, _WIFSTOPPED=os.WIFSTOPPED,
_WSTOPSIG=os.WSTOPSIG):
# This method is called (indirectly) by __del__, so it cannot
# refer to anything outside of its local scope.
if _WIFSIGNALED(sts):
self.returncode = -_WTERMSIG(sts)
elif _WIFEXITED(sts):
self.returncode = _WEXITSTATUS(sts)
elif _WIFSTOPPED(sts):
self.returncode = -_WSTOPSIG(sts)
else:
# Should never happen
raise RuntimeError("Unknown child exit status!")
def _internal_poll(self, _deadstate=None, _waitpid=os.waitpid,
_WNOHANG=os.WNOHANG, _os_error=os.error, _ECHILD=errno.ECHILD):
"""Check if child process has terminated. Returns returncode
attribute.
This method is called by __del__, so it cannot reference anything
outside of the local scope (nor can any methods it calls).
"""
if self.returncode is None:
try:
pid, sts = _waitpid(self.pid, _WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except _os_error as e:
if _deadstate is not None:
self.returncode = _deadstate
if e.errno == _ECHILD:
# This happens if SIGCLD is set to be ignored or
# waiting for child processes has otherwise been
# disabled for our process. This child is dead, we
# can't get the status.
# http://bugs.python.org/issue15756
self.returncode = 0
return self.returncode
def wait(self):
"""Wait for child process to terminate. Returns returncode
attribute."""
while self.returncode is None:
try:
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
except OSError as e:
if e.errno != errno.ECHILD:
raise
# This happens if SIGCLD is set to be ignored or waiting
# for child processes has otherwise been disabled for our
# process. This child is dead, we can't get the status.
pid = self.pid
sts = 0
# Check the pid and loop as waitpid has been known to return
# 0 even without WNOHANG in odd situations. issue14396.
if pid == self.pid:
self._handle_exitstatus(sts)
return self.returncode
def _communicate(self, input):
if self.stdin:
# Flush stdio buffer. This might block, if the user has
# been writing to .stdin in an uncontrolled fashion.
self.stdin.flush()
if not input:
self.stdin.close()
if _has_poll:
stdout, stderr = self._communicate_with_poll(input)
else:
stdout, stderr = self._communicate_with_select(input)
# All data exchanged. Translate lists into strings.
if stdout is not None:
stdout = ''.join(stdout)
if stderr is not None:
stderr = ''.join(stderr)
# Translate newlines, if requested. We cannot let the file
# object do the translation: It is based on stdio, which is
# impossible to combine with select (unless forcing no
# buffering).
if self.universal_newlines and hasattr(file, 'newlines'):
if stdout:
stdout = self._translate_newlines(stdout)
if stderr:
stderr = self._translate_newlines(stderr)
self.wait()
return (stdout, stderr)
def _communicate_with_poll(self, input):
stdout = None # Return
stderr = None # Return
fd2file = {}
fd2output = {}
poller = select.poll()
def register_and_append(file_obj, eventmask):
poller.register(file_obj.fileno(), eventmask)
fd2file[file_obj.fileno()] = file_obj
def close_unregister_and_remove(fd):
poller.unregister(fd)
fd2file[fd].close()
fd2file.pop(fd)
if self.stdin and input:
register_and_append(self.stdin, select.POLLOUT)
select_POLLIN_POLLPRI = select.POLLIN | select.POLLPRI
if self.stdout:
register_and_append(self.stdout, select_POLLIN_POLLPRI)
fd2output[self.stdout.fileno()] = stdout = []
if self.stderr:
register_and_append(self.stderr, select_POLLIN_POLLPRI)
fd2output[self.stderr.fileno()] = stderr = []
input_offset = 0
while fd2file:
try:
ready = poller.poll()
except select.error, e:
if e.args[0] == errno.EINTR:
continue
raise
for fd, mode in ready:
if mode & select.POLLOUT:
chunk = input[input_offset : input_offset + _PIPE_BUF]
try:
input_offset += os.write(fd, chunk)
except OSError as e:
if e.errno == errno.EPIPE:
close_unregister_and_remove(fd)
else:
raise
else:
if input_offset >= len(input):
close_unregister_and_remove(fd)
elif mode & select_POLLIN_POLLPRI:
data = os.read(fd, 4096)
if not data:
close_unregister_and_remove(fd)
fd2output[fd].append(data)
else:
# Ignore hang up or errors.
close_unregister_and_remove(fd)
return (stdout, stderr)
def _communicate_with_select(self, input):
read_set = []
write_set = []
stdout = None # Return
stderr = None # Return
if self.stdin and input:
write_set.append(self.stdin)
if self.stdout:
read_set.append(self.stdout)
stdout = []
if self.stderr:
read_set.append(self.stderr)
stderr = []
input_offset = 0
while read_set or write_set:
try:
rlist, wlist, xlist = select.select(read_set, write_set, [])
except select.error, e:
if e.args[0] == errno.EINTR:
continue
raise
if self.stdin in wlist:
chunk = input[input_offset : input_offset + _PIPE_BUF]
try:
bytes_written = os.write(self.stdin.fileno(), chunk)
except OSError as e:
if e.errno == errno.EPIPE:
self.stdin.close()
write_set.remove(self.stdin)
else:
raise
else:
input_offset += bytes_written
if input_offset >= len(input):
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
data = os.read(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
data = os.read(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
stderr.append(data)
return (stdout, stderr)
def send_signal(self, sig):
"""Send a signal to the process
"""
os.kill(self.pid, sig)
def terminate(self):
"""Terminate the process with SIGTERM
"""
self.send_signal(signal.SIGTERM)
def kill(self):
"""Kill the process with SIGKILL
"""
self.send_signal(signal.SIGKILL)
def _demo_posix():
#
# Example 1: Simple redirection: Get process list
#
plist = Popen(["ps"], stdout=PIPE).communicate()[0]
print "Process list:"
print plist
#
# Example 2: Change uid before executing child
#
if os.getuid() == 0:
p = Popen(["id"], preexec_fn=lambda: os.setuid(100))
p.wait()
#
# Example 3: Connecting several subprocesses
#
print "Looking for 'hda'..."
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
print repr(p2.communicate()[0])
#
# Example 4: Catch execution error
#
print
print "Trying a weird file..."
try:
print Popen(["/this/path/does/not/exist"]).communicate()
except OSError, e:
if e.errno == errno.ENOENT:
print "The file didn't exist. I thought so..."
print "Child traceback:"
print e.child_traceback
else:
print "Error", e.errno
else:
print >>sys.stderr, "Gosh. No error."
def _demo_windows():
#
# Example 1: Connecting several subprocesses
#
print "Looking for 'PROMPT' in set output..."
p1 = Popen("set", stdout=PIPE, shell=True)
p2 = Popen('find "PROMPT"', stdin=p1.stdout, stdout=PIPE)
print repr(p2.communicate()[0])
#
# Example 2: Simple execution of program
#
print "Executing calc..."
p = Popen("calc")
p.wait()
if __name__ == "__main__":
if mswindows:
_demo_windows()
else:
_demo_posix()
"""Stuff to parse Sun and NeXT audio files.
An audio file consists of a header followed by the data. The structure
of the header is as follows.
+---------------+
| magic word |
+---------------+
| header size |
+---------------+
| data size |
+---------------+
| encoding |
+---------------+
| sample rate |
+---------------+
| # of channels |
+---------------+
| info |
| |
+---------------+
The magic word consists of the 4 characters '.snd'. Apart from the
info field, all header fields are 4 bytes in size. They are all
32-bit unsigned integers encoded in big-endian byte order.
The header size field actually gives the offset at which the data starts.
The data size is the physical size of the data. From the other
parameters the number of frames can be calculated.
The encoding gives the way in which audio samples are encoded.
Possible values are listed below.
The info field currently consists of an ASCII string giving a
human-readable description of the audio file. The info field is
padded with NUL bytes to the header size.
Usage.
Reading audio files:
f = sunau.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
When the setpos() and rewind() methods are not used, the seek()
method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' or 'ULAW')
getcompname() -- returns human-readable version of
compression type ('not compressed' matches 'NONE')
getparams() -- returns a tuple consisting of all of the
above in the above order
getmarkers() -- returns None (for compatibility with the
aifc module)
getmark(id) -- raises an error since the mark does not
exist (for compatibility with the aifc module)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell() and the position given to setpos()
are compatible and have nothing to do with the actual position in the
file.
The close() method is called automatically when the class instance
is destroyed.
Writing audio files:
f = sunau.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)-- set all parameters at once
tell() -- return current position in output file
writeframesraw(data)
          -- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, except possibly the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes('') or
close() to patch up the sizes in the header.
The close() method is called automatically when the class instance
is destroyed.
"""
# from <multimedia/audio_filehdr.h>
AUDIO_FILE_MAGIC = 0x2e736e64
AUDIO_FILE_ENCODING_MULAW_8 = 1
AUDIO_FILE_ENCODING_LINEAR_8 = 2
AUDIO_FILE_ENCODING_LINEAR_16 = 3
AUDIO_FILE_ENCODING_LINEAR_24 = 4
AUDIO_FILE_ENCODING_LINEAR_32 = 5
AUDIO_FILE_ENCODING_FLOAT = 6
AUDIO_FILE_ENCODING_DOUBLE = 7
AUDIO_FILE_ENCODING_ADPCM_G721 = 23
AUDIO_FILE_ENCODING_ADPCM_G722 = 24
AUDIO_FILE_ENCODING_ADPCM_G723_3 = 25
AUDIO_FILE_ENCODING_ADPCM_G723_5 = 26
AUDIO_FILE_ENCODING_ALAW_8 = 27
# from <multimedia/audio_hdr.h>
AUDIO_UNKNOWN_SIZE = 0xFFFFFFFFL # ((unsigned)(~0))
_simple_encodings = [AUDIO_FILE_ENCODING_MULAW_8,
AUDIO_FILE_ENCODING_LINEAR_8,
AUDIO_FILE_ENCODING_LINEAR_16,
AUDIO_FILE_ENCODING_LINEAR_24,
AUDIO_FILE_ENCODING_LINEAR_32,
AUDIO_FILE_ENCODING_ALAW_8]
class Error(Exception):
pass
def _read_u32(file):
x = 0L
for i in range(4):
byte = file.read(1)
if byte == '':
raise EOFError
x = x*256 + ord(byte)
return x
def _write_u32(file, x):
data = []
for i in range(4):
d, m = divmod(x, 256)
data.insert(0, m)
x = d
for i in range(4):
file.write(chr(int(data[i])))
class Au_read:
def __init__(self, f):
if type(f) == type(''):
import __builtin__
f = __builtin__.open(f, 'rb')
self.initfp(f)
def __del__(self):
if self._file:
self.close()
def initfp(self, file):
self._file = file
self._soundpos = 0
magic = int(_read_u32(file))
if magic != AUDIO_FILE_MAGIC:
raise Error, 'bad magic number'
self._hdr_size = int(_read_u32(file))
if self._hdr_size < 24:
raise Error, 'header size too small'
if self._hdr_size > 100:
raise Error, 'header size ridiculously large'
self._data_size = _read_u32(file)
if self._data_size != AUDIO_UNKNOWN_SIZE:
self._data_size = int(self._data_size)
self._encoding = int(_read_u32(file))
if self._encoding not in _simple_encodings:
raise Error, 'encoding not (yet) supported'
if self._encoding in (AUDIO_FILE_ENCODING_MULAW_8,
AUDIO_FILE_ENCODING_ALAW_8):
self._sampwidth = 2
self._framesize = 1
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_8:
self._framesize = self._sampwidth = 1
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_16:
self._framesize = self._sampwidth = 2
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_24:
self._framesize = self._sampwidth = 3
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_32:
self._framesize = self._sampwidth = 4
else:
raise Error, 'unknown encoding'
self._framerate = int(_read_u32(file))
self._nchannels = int(_read_u32(file))
self._framesize = self._framesize * self._nchannels
if self._hdr_size > 24:
self._info = file.read(self._hdr_size - 24)
for i in range(len(self._info)):
if self._info[i] == '\0':
self._info = self._info[:i]
break
else:
self._info = ''
try:
self._data_pos = file.tell()
except (AttributeError, IOError):
self._data_pos = None
def getfp(self):
return self._file
def getnchannels(self):
return self._nchannels
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getnframes(self):
if self._data_size == AUDIO_UNKNOWN_SIZE:
return AUDIO_UNKNOWN_SIZE
if self._encoding in _simple_encodings:
return self._data_size // self._framesize
return 0 # XXX--must do some arithmetic here
def getcomptype(self):
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
return 'ULAW'
elif self._encoding == AUDIO_FILE_ENCODING_ALAW_8:
return 'ALAW'
else:
return 'NONE'
def getcompname(self):
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
return 'CCITT G.711 u-law'
elif self._encoding == AUDIO_FILE_ENCODING_ALAW_8:
return 'CCITT G.711 A-law'
else:
return 'not compressed'
def getparams(self):
return self.getnchannels(), self.getsampwidth(), \
self.getframerate(), self.getnframes(), \
self.getcomptype(), self.getcompname()
def getmarkers(self):
return None
def getmark(self, id):
raise Error, 'no marks'
def readframes(self, nframes):
if self._encoding in _simple_encodings:
if nframes == AUDIO_UNKNOWN_SIZE:
data = self._file.read()
else:
data = self._file.read(nframes * self._framesize)
self._soundpos += len(data) // self._framesize
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
import audioop
data = audioop.ulaw2lin(data, self._sampwidth)
return data
return None # XXX--not implemented yet
def rewind(self):
if self._data_pos is None:
raise IOError('cannot seek')
self._file.seek(self._data_pos)
self._soundpos = 0
def tell(self):
return self._soundpos
def setpos(self, pos):
if pos < 0 or pos > self.getnframes():
raise Error, 'position not in range'
if self._data_pos is None:
raise IOError('cannot seek')
self._file.seek(self._data_pos + pos * self._framesize)
self._soundpos = pos
def close(self):
self._file = None
class Au_write:
def __init__(self, f):
if type(f) == type(''):
import __builtin__
f = __builtin__.open(f, 'wb')
self.initfp(f)
def __del__(self):
if self._file:
self.close()
def initfp(self, file):
self._file = file
self._framerate = 0
self._nchannels = 0
self._sampwidth = 0
self._framesize = 0
self._nframes = AUDIO_UNKNOWN_SIZE
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._info = ''
self._comptype = 'ULAW' # default is U-law
def setnchannels(self, nchannels):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if nchannels not in (1, 2, 4):
raise Error, 'only 1, 2, or 4 channels supported'
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error, 'number of channels not set'
return self._nchannels
def setsampwidth(self, sampwidth):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if sampwidth not in (1, 2, 4):
raise Error, 'bad sample width'
self._sampwidth = sampwidth
def getsampwidth(self):
        if not self._sampwidth:
raise Error, 'sample width not specified'
return self._sampwidth
def setframerate(self, framerate):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
self._framerate = framerate
def getframerate(self):
if not self._framerate:
raise Error, 'frame rate not set'
return self._framerate
def setnframes(self, nframes):
if self._nframeswritten:
raise Error, 'cannot change parameters after starting to write'
if nframes < 0:
raise Error, '# of frames cannot be negative'
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, type, name):
if type in ('NONE', 'ULAW'):
self._comptype = type
else:
raise Error, 'unknown compression type'
def getcomptype(self):
return self._comptype
def getcompname(self):
if self._comptype == 'ULAW':
return 'CCITT G.711 u-law'
elif self._comptype == 'ALAW':
return 'CCITT G.711 A-law'
else:
return 'not compressed'
def setparams(self, params):
nchannels, sampwidth, framerate, nframes, comptype, compname = params
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
return self.getnchannels(), self.getsampwidth(), \
self.getframerate(), self.getnframes(), \
self.getcomptype(), self.getcompname()
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
self._ensure_header_written()
if self._comptype == 'ULAW':
import audioop
data = audioop.lin2ulaw(data, self._sampwidth)
nframes = len(data) // self._framesize
self._file.write(data)
self._nframeswritten = self._nframeswritten + nframes
self._datawritten = self._datawritten + len(data)
def writeframes(self, data):
self.writeframesraw(data)
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
def close(self):
if self._file:
try:
self._ensure_header_written()
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
self._file.flush()
finally:
self._file = None
#
# private methods
#
def _ensure_header_written(self):
if not self._nframeswritten:
if not self._nchannels:
raise Error, '# of channels not specified'
if not self._sampwidth:
raise Error, 'sample width not specified'
if not self._framerate:
raise Error, 'frame rate not specified'
self._write_header()
def _write_header(self):
if self._comptype == 'NONE':
if self._sampwidth == 1:
encoding = AUDIO_FILE_ENCODING_LINEAR_8
self._framesize = 1
elif self._sampwidth == 2:
encoding = AUDIO_FILE_ENCODING_LINEAR_16
self._framesize = 2
elif self._sampwidth == 4:
encoding = AUDIO_FILE_ENCODING_LINEAR_32
self._framesize = 4
else:
raise Error, 'internal error'
elif self._comptype == 'ULAW':
encoding = AUDIO_FILE_ENCODING_MULAW_8
self._framesize = 1
else:
raise Error, 'internal error'
self._framesize = self._framesize * self._nchannels
_write_u32(self._file, AUDIO_FILE_MAGIC)
header_size = 25 + len(self._info)
header_size = (header_size + 7) & ~7
_write_u32(self._file, header_size)
if self._nframes == AUDIO_UNKNOWN_SIZE:
length = AUDIO_UNKNOWN_SIZE
else:
length = self._nframes * self._framesize
try:
self._form_length_pos = self._file.tell()
except (AttributeError, IOError):
self._form_length_pos = None
_write_u32(self._file, length)
self._datalength = length
_write_u32(self._file, encoding)
_write_u32(self._file, self._framerate)
_write_u32(self._file, self._nchannels)
self._file.write(self._info)
self._file.write('\0'*(header_size - len(self._info) - 24))
def _patchheader(self):
if self._form_length_pos is None:
raise IOError('cannot seek')
self._file.seek(self._form_length_pos)
_write_u32(self._file, self._datawritten)
self._datalength = self._datawritten
self._file.seek(0, 2)
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Au_read(f)
elif mode in ('w', 'wb'):
return Au_write(f)
else:
raise Error, "mode must be 'r', 'rb', 'w', or 'wb'"
openfp = open
"""Interpret sun audio headers."""
from warnings import warnpy3k
warnpy3k("the sunaudio module has been removed in Python 3.0; "
"use the sunau module instead", stacklevel=2)
del warnpy3k
MAGIC = '.snd'
class error(Exception):
pass
def get_long_be(s):
"""Convert a 4-char value to integer."""
return (ord(s[0])<<24) | (ord(s[1])<<16) | (ord(s[2])<<8) | ord(s[3])
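# For example, get_long_be('.snd') == 0x2e736e64 (the magic word), since
# ord('.') == 0x2e, ord('s') == 0x73, ord('n') == 0x6e, ord('d') == 0x64.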
def gethdr(fp):
"""Read a sound header from an open file."""
if fp.read(4) != MAGIC:
raise error, 'gethdr: bad magic word'
hdr_size = get_long_be(fp.read(4))
data_size = get_long_be(fp.read(4))
encoding = get_long_be(fp.read(4))
sample_rate = get_long_be(fp.read(4))
channels = get_long_be(fp.read(4))
excess = hdr_size - 24
if excess < 0:
raise error, 'gethdr: bad hdr_size'
if excess > 0:
info = fp.read(excess)
else:
info = ''
return (data_size, encoding, sample_rate, channels, info)
def printhdr(file):
"""Read and print the sound header of a named file."""
hdr = gethdr(open(file, 'r'))
data_size, encoding, sample_rate, channels, info = hdr
while info[-1:] == '\0':
info = info[:-1]
print 'File name: ', file
print 'Data size: ', data_size
print 'Encoding: ', encoding
print 'Sample rate:', sample_rate
print 'Channels: ', channels
print 'Info: ', repr(info)
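# A minimal sketch of reading a header directly (the file name is
# hypothetical):
#
#     fp = open('sample.au', 'rb')
#     data_size, encoding, sample_rate, channels, info = gethdr(fp)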
#! /usr/bin/env python
"""Non-terminal symbols of Python grammar (from "graminit.h")."""
# This file is automatically generated; please don't muck it up!
#
# To update the symbols in this file, 'cd' to the top directory of
# the python source tree after building the interpreter and run:
#
# ./python Lib/symbol.py
#--start constants--
single_input = 256
file_input = 257
eval_input = 258
decorator = 259
decorators = 260
decorated = 261
funcdef = 262
parameters = 263
varargslist = 264
fpdef = 265
fplist = 266
stmt = 267
simple_stmt = 268
small_stmt = 269
expr_stmt = 270
augassign = 271
print_stmt = 272
del_stmt = 273
pass_stmt = 274
flow_stmt = 275
break_stmt = 276
continue_stmt = 277
return_stmt = 278
yield_stmt = 279
raise_stmt = 280
import_stmt = 281
import_name = 282
import_from = 283
import_as_name = 284
dotted_as_name = 285
import_as_names = 286
dotted_as_names = 287
dotted_name = 288
global_stmt = 289
exec_stmt = 290
assert_stmt = 291
compound_stmt = 292
if_stmt = 293
while_stmt = 294
for_stmt = 295
try_stmt = 296
with_stmt = 297
with_item = 298
except_clause = 299
suite = 300
testlist_safe = 301
old_test = 302
old_lambdef = 303
test = 304
or_test = 305
and_test = 306
not_test = 307
comparison = 308
comp_op = 309
expr = 310
xor_expr = 311
and_expr = 312
shift_expr = 313
arith_expr = 314
term = 315
factor = 316
power = 317
atom = 318
listmaker = 319
testlist_comp = 320
lambdef = 321
trailer = 322
subscriptlist = 323
subscript = 324
sliceop = 325
exprlist = 326
testlist = 327
dictorsetmaker = 328
classdef = 329
arglist = 330
argument = 331
list_iter = 332
list_for = 333
list_if = 334
comp_iter = 335
comp_for = 336
comp_if = 337
testlist1 = 338
encoding_decl = 339
yield_expr = 340
#--end constants--
sym_name = {}
for _name, _value in globals().items():
if type(_value) is type(0):
sym_name[_value] = _name
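# For example, after the loop above sym_name[256] == 'single_input' and
# sym_name[267] == 'stmt'.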
def main():
import sys
import token
if len(sys.argv) == 1:
sys.argv = sys.argv + ["Include/graminit.h", "Lib/symbol.py"]
token.main()
if __name__ == "__main__":
main()
"""Provide access to Python's configuration information.
"""
import sys
import os
from os.path import pardir, realpath
_INSTALL_SCHEMES = {
'posix_prefix': {
'stdlib': '{base}/lib/python{py_version_short}',
'platstdlib': '{platbase}/lib/python{py_version_short}',
'purelib': '{base}/lib/python{py_version_short}/site-packages',
'platlib': '{platbase}/lib/python{py_version_short}/site-packages',
'include': '{base}/include/python{py_version_short}',
'platinclude': '{platbase}/include/python{py_version_short}',
'scripts': '{base}/bin',
'data': '{base}',
},
'posix_home': {
'stdlib': '{base}/lib/python',
'platstdlib': '{base}/lib/python',
'purelib': '{base}/lib/python',
'platlib': '{base}/lib/python',
'include': '{base}/include/python',
'platinclude': '{base}/include/python',
'scripts': '{base}/bin',
'data' : '{base}',
},
'nt': {
'stdlib': '{base}/Lib',
'platstdlib': '{base}/Lib',
'purelib': '{base}/Lib/site-packages',
'platlib': '{base}/Lib/site-packages',
'include': '{base}/Include',
'platinclude': '{base}/Include',
'scripts': '{base}/Scripts',
'data' : '{base}',
},
'os2': {
'stdlib': '{base}/Lib',
'platstdlib': '{base}/Lib',
'purelib': '{base}/Lib/site-packages',
'platlib': '{base}/Lib/site-packages',
'include': '{base}/Include',
'platinclude': '{base}/Include',
'scripts': '{base}/Scripts',
'data' : '{base}',
},
'os2_home': {
'stdlib': '{userbase}/lib/python{py_version_short}',
'platstdlib': '{userbase}/lib/python{py_version_short}',
'purelib': '{userbase}/lib/python{py_version_short}/site-packages',
'platlib': '{userbase}/lib/python{py_version_short}/site-packages',
'include': '{userbase}/include/python{py_version_short}',
'scripts': '{userbase}/bin',
'data' : '{userbase}',
},
'nt_user': {
'stdlib': '{userbase}/IronPython{py_version_nodot}',
'platstdlib': '{userbase}/IronPython{py_version_nodot}',
'purelib': '{userbase}/IronPython{py_version_nodot}/site-packages',
'platlib': '{userbase}/IronPython{py_version_nodot}/site-packages',
'include': '{userbase}/IronPython{py_version_nodot}/Include',
'scripts': '{userbase}/Scripts',
'data' : '{userbase}',
},
'posix_user': {
'stdlib': '{userbase}/lib/python{py_version_short}',
'platstdlib': '{userbase}/lib/python{py_version_short}',
'purelib': '{userbase}/lib/python{py_version_short}/site-packages',
'platlib': '{userbase}/lib/python{py_version_short}/site-packages',
'include': '{userbase}/include/python{py_version_short}',
'scripts': '{userbase}/bin',
'data' : '{userbase}',
},
'osx_framework_user': {
'stdlib': '{userbase}/lib/python',
'platstdlib': '{userbase}/lib/python',
'purelib': '{userbase}/lib/python/site-packages',
'platlib': '{userbase}/lib/python/site-packages',
'include': '{userbase}/include',
'scripts': '{userbase}/bin',
'data' : '{userbase}',
},
}
_SCHEME_KEYS = ('stdlib', 'platstdlib', 'purelib', 'platlib', 'include',
'scripts', 'data')
_PY_VERSION = sys.version.split()[0]
_PY_VERSION_SHORT = sys.version[:3]
_PY_VERSION_SHORT_NO_DOT = _PY_VERSION[0] + _PY_VERSION[2]
_PREFIX = os.path.normpath(sys.prefix)
_EXEC_PREFIX = os.path.normpath(sys.exec_prefix)
_CONFIG_VARS = None
_USER_BASE = None
def _safe_realpath(path):
try:
return realpath(path)
except OSError:
return path
if sys.executable:
_PROJECT_BASE = os.path.dirname(_safe_realpath(sys.executable))
else:
# sys.executable can be empty if argv[0] has been changed and Python is
# unable to retrieve the real program name
_PROJECT_BASE = _safe_realpath(os.getcwd())
if os.name == "nt" and "pcbuild" in _PROJECT_BASE[-8:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir))
# PC/VS7.1
if os.name == "nt" and "\\pc\\v" in _PROJECT_BASE[-10:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir, pardir))
# PC/VS9.0/amd64
if (os.name == "nt"
and os.path.basename(os.path.dirname(os.path.dirname(_PROJECT_BASE))).lower() == "pc"
and os.path.basename(os.path.dirname(_PROJECT_BASE)).lower() == "vs9.0"):
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir, pardir, pardir))
# PC/AMD64
if os.name == "nt" and "\\pcbuild\\amd64" in _PROJECT_BASE[-14:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir, pardir))
# set for cross builds
if "_PYTHON_PROJECT_BASE" in os.environ:
# the build directory for posix builds
_PROJECT_BASE = os.path.normpath(os.path.abspath("."))
def is_python_build():
for fn in ("Setup.dist", "Setup.local"):
if os.path.isfile(os.path.join(_PROJECT_BASE, "Modules", fn)):
return True
return False
_PYTHON_BUILD = is_python_build()
if _PYTHON_BUILD:
for scheme in ('posix_prefix', 'posix_home'):
_INSTALL_SCHEMES[scheme]['include'] = '{projectbase}/Include'
_INSTALL_SCHEMES[scheme]['platinclude'] = '{srcdir}'
def _subst_vars(s, local_vars):
try:
return s.format(**local_vars)
except KeyError:
try:
return s.format(**os.environ)
except KeyError, var:
raise AttributeError('{%s}' % var)
def _extend_dict(target_dict, other_dict):
target_keys = target_dict.keys()
for key, value in other_dict.items():
if key in target_keys:
continue
target_dict[key] = value
def _expand_vars(scheme, vars):
res = {}
if vars is None:
vars = {}
_extend_dict(vars, get_config_vars())
for key, value in _INSTALL_SCHEMES[scheme].items():
if os.name in ('posix', 'nt'):
value = os.path.expanduser(value)
res[key] = os.path.normpath(_subst_vars(value, vars))
return res
def _get_default_scheme():
if sys.platform == 'cli':
return 'nt'
if os.name == 'posix':
# the default scheme for posix is posix_prefix
return 'posix_prefix'
return os.name
def _getuserbase():
env_base = os.environ.get("IRONPYTHONUSERBASE", None)
def joinuser(*args):
return os.path.expanduser(os.path.join(*args))
# what about 'os2emx', 'riscos' ?
if os.name == "nt":
base = os.environ.get("APPDATA") or "~"
return env_base if env_base else joinuser(base, "Python")
if sys.platform == "darwin":
framework = get_config_var("PYTHONFRAMEWORK")
if framework:
return env_base if env_base else \
joinuser("~", "Library", framework, "%d.%d"
% (sys.version_info[:2]))
return env_base if env_base else joinuser("~", ".local")
def _parse_makefile(filename, vars=None):
"""Parse a Makefile-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
import re
# Regexes needed for parsing Makefile (and similar syntaxes,
# like old-style Setup files).
    _variable_rx = re.compile(r"([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)")
_findvar1_rx = re.compile(r"\$\(([A-Za-z][A-Za-z0-9_]*)\)")
_findvar2_rx = re.compile(r"\${([A-Za-z][A-Za-z0-9_]*)}")
if vars is None:
vars = {}
done = {}
notdone = {}
with open(filename) as f:
lines = f.readlines()
for line in lines:
if line.startswith('#') or line.strip() == '':
continue
m = _variable_rx.match(line)
if m:
n, v = m.group(1, 2)
v = v.strip()
# `$$' is a literal `$' in make
tmpv = v.replace('$$', '')
if "$" in tmpv:
notdone[n] = v
else:
try:
v = int(v)
except ValueError:
# insert literal `$'
done[n] = v.replace('$$', '$')
else:
done[n] = v
# do variable interpolation here
while notdone:
for name in notdone.keys():
value = notdone[name]
m = _findvar1_rx.search(value) or _findvar2_rx.search(value)
if m:
n = m.group(1)
found = True
if n in done:
item = str(done[n])
elif n in notdone:
# get it on a subsequent round
found = False
elif n in os.environ:
# do it like make: fall back to environment
item = os.environ[n]
else:
done[n] = item = ""
if found:
after = value[m.end():]
value = value[:m.start()] + item + after
if "$" in after:
notdone[name] = value
else:
try: value = int(value)
except ValueError:
done[name] = value.strip()
else:
done[name] = value
del notdone[name]
else:
# bogus variable reference; just drop it since we can't deal
del notdone[name]
# strip spurious spaces
for k, v in done.items():
if isinstance(v, str):
done[k] = v.strip()
# save the results in the global dictionary
vars.update(done)
return vars
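# A small sketch of the result (hypothetical Makefile contents):
#
#     CC = gcc
#     CFLAGS = $(CC) -O2
#
# parses to {'CC': 'gcc', 'CFLAGS': 'gcc -O2'} after interpolation.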
def get_makefile_filename():
"""Return the path of the Makefile."""
if _PYTHON_BUILD:
return os.path.join(_PROJECT_BASE, "Makefile")
return os.path.join(get_path('platstdlib'), "config", "Makefile")
# Issue #22199: retain undocumented private name for compatibility
_get_makefile_filename = get_makefile_filename
def _generate_posix_vars():
"""Generate the Python module containing build-time variables."""
import pprint
vars = {}
# load the installed Makefile:
makefile = get_makefile_filename()
try:
_parse_makefile(makefile, vars)
except IOError, e:
msg = "invalid Python installation: unable to open %s" % makefile
if hasattr(e, "strerror"):
msg = msg + " (%s)" % e.strerror
raise IOError(msg)
# load the installed pyconfig.h:
config_h = get_config_h_filename()
try:
with open(config_h) as f:
parse_config_h(f, vars)
except IOError, e:
msg = "invalid Python installation: unable to open %s" % config_h
if hasattr(e, "strerror"):
msg = msg + " (%s)" % e.strerror
raise IOError(msg)
# On AIX, there are wrong paths to the linker scripts in the Makefile
# -- these paths are relative to the Python source, but when installed
# the scripts are in another directory.
if _PYTHON_BUILD:
vars['LDSHARED'] = vars['BLDSHARED']
# There's a chicken-and-egg situation on OS X with regards to the
# _sysconfigdata module after the changes introduced by #15298:
# get_config_vars() is called by get_platform() as part of the
# `make pybuilddir.txt` target -- which is a precursor to the
# _sysconfigdata.py module being constructed. Unfortunately,
# get_config_vars() eventually calls _init_posix(), which attempts
# to import _sysconfigdata, which we won't have built yet. In order
# for _init_posix() to work, if we're on Darwin, just mock up the
# _sysconfigdata module manually and populate it with the build vars.
# This is more than sufficient for ensuring the subsequent call to
# get_platform() succeeds.
name = '_sysconfigdata'
if 'darwin' in sys.platform:
import imp
module = imp.new_module(name)
module.build_time_vars = vars
sys.modules[name] = module
pybuilddir = 'build/lib.%s-%s' % (get_platform(), sys.version[:3])
if hasattr(sys, "gettotalrefcount"):
pybuilddir += '-pydebug'
try:
os.makedirs(pybuilddir)
except OSError:
pass
destfile = os.path.join(pybuilddir, name + '.py')
with open(destfile, 'wb') as f:
f.write('# system configuration generated and used by'
' the sysconfig module\n')
f.write('build_time_vars = ')
pprint.pprint(vars, stream=f)
# Create file used for sys.path fixup -- see Modules/getpath.c
with open('pybuilddir.txt', 'w') as f:
f.write(pybuilddir)
def _init_posix(vars):
"""Initialize the module as appropriate for POSIX systems."""
# _sysconfigdata is generated at build time, see _generate_posix_vars()
from _sysconfigdata import build_time_vars
vars.update(build_time_vars)
def _init_non_posix(vars):
"""Initialize the module as appropriate for NT"""
# set basic install directories
vars['LIBDEST'] = get_path('stdlib')
vars['BINLIBDEST'] = get_path('platstdlib')
vars['INCLUDEPY'] = get_path('include')
vars['SO'] = '.pyd'
vars['EXE'] = '.exe'
vars['VERSION'] = _PY_VERSION_SHORT_NO_DOT
vars['BINDIR'] = os.path.dirname(_safe_realpath(sys.executable))
#
# public APIs
#
def parse_config_h(fp, vars=None):
"""Parse a config.h-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
import re
if vars is None:
vars = {}
define_rx = re.compile("#define ([A-Z][A-Za-z0-9_]+) (.*)\n")
undef_rx = re.compile("/[*] #undef ([A-Z][A-Za-z0-9_]+) [*]/\n")
while True:
line = fp.readline()
if not line:
break
m = define_rx.match(line)
if m:
n, v = m.group(1, 2)
try: v = int(v)
except ValueError: pass
vars[n] = v
else:
m = undef_rx.match(line)
if m:
vars[m.group(1)] = 0
return vars
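# For instance, given these pyconfig.h lines (hypothetical values):
#
#     #define HAVE_UNISTD_H 1
#     /* #undef HAVE_FORK */
#
# parse_config_h() yields {'HAVE_UNISTD_H': 1, 'HAVE_FORK': 0}.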
def get_config_h_filename():
"""Returns the path of pyconfig.h."""
if _PYTHON_BUILD:
if os.name == "nt":
inc_dir = os.path.join(_PROJECT_BASE, "PC")
else:
inc_dir = _PROJECT_BASE
else:
inc_dir = get_path('platinclude')
return os.path.join(inc_dir, 'pyconfig.h')
def get_scheme_names():
"""Returns a tuple containing the schemes names."""
schemes = _INSTALL_SCHEMES.keys()
schemes.sort()
return tuple(schemes)
def get_path_names():
"""Returns a tuple containing the paths names."""
return _SCHEME_KEYS
def get_paths(scheme=_get_default_scheme(), vars=None, expand=True):
"""Returns a mapping containing an install scheme.
``scheme`` is the install scheme name. If not provided, it will
return the default scheme for the current platform.
"""
if expand:
return _expand_vars(scheme, vars)
else:
return _INSTALL_SCHEMES[scheme]
def get_path(name, scheme=_get_default_scheme(), vars=None, expand=True):
"""Returns a path corresponding to the scheme.
``scheme`` is the install scheme name.
"""
return get_paths(scheme, vars, expand)[name]
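# A small sketch (actual values depend on the installation prefix):
#
#     get_path('stdlib', 'nt')     # e.g. '<prefix>\\Lib'
#     get_paths()['purelib']       # site-packages dir of the default scheme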
def get_config_vars(*args):
"""With no arguments, return a dictionary of all configuration
variables relevant for the current platform.
On Unix, this means every variable defined in Python's installed Makefile;
    on Windows and Mac OS it's a much smaller set.
With arguments, return a list of values that result from looking up
each argument in the configuration variable dictionary.
"""
import re
global _CONFIG_VARS
if _CONFIG_VARS is None:
_CONFIG_VARS = {}
# Normalized versions of prefix and exec_prefix are handy to have;
# in fact, these are the standard versions used most places in the
# Distutils.
_CONFIG_VARS['prefix'] = _PREFIX
_CONFIG_VARS['exec_prefix'] = _EXEC_PREFIX
_CONFIG_VARS['py_version'] = _PY_VERSION
_CONFIG_VARS['py_version_short'] = _PY_VERSION_SHORT
_CONFIG_VARS['py_version_nodot'] = _PY_VERSION[0] + _PY_VERSION[2]
_CONFIG_VARS['base'] = _PREFIX
_CONFIG_VARS['platbase'] = _EXEC_PREFIX
_CONFIG_VARS['projectbase'] = _PROJECT_BASE
if os.name in ('nt', 'os2') or sys.platform == 'cli':
_init_non_posix(_CONFIG_VARS)
if os.name == 'posix' and sys.platform != 'cli':
_init_posix(_CONFIG_VARS)
# Setting 'userbase' is done below the call to the
# init function to enable using 'get_config_var' in
# the init-function.
_CONFIG_VARS['userbase'] = _getuserbase()
if 'srcdir' not in _CONFIG_VARS:
_CONFIG_VARS['srcdir'] = _PROJECT_BASE
# Convert srcdir into an absolute path if it appears necessary.
# Normally it is relative to the build directory. However, during
# testing, for example, we might be running a non-installed python
# from a different directory.
if _PYTHON_BUILD and os.name == "posix":
base = _PROJECT_BASE
try:
cwd = os.getcwd()
except OSError:
cwd = None
if (not os.path.isabs(_CONFIG_VARS['srcdir']) and
base != cwd):
# srcdir is relative and we are not in the same directory
# as the executable. Assume executable is in the build
# directory and make srcdir absolute.
srcdir = os.path.join(base, _CONFIG_VARS['srcdir'])
_CONFIG_VARS['srcdir'] = os.path.normpath(srcdir)
# OS X platforms require special customization to handle
# multi-architecture, multi-os-version installers
if sys.platform == 'darwin':
import _osx_support
_osx_support.customize_config_vars(_CONFIG_VARS)
if args:
vals = []
for name in args:
vals.append(_CONFIG_VARS.get(name))
return vals
else:
return _CONFIG_VARS
def get_config_var(name):
"""Return the value of a single variable using the dictionary returned by
'get_config_vars()'.
Equivalent to get_config_vars().get(name)
"""
return get_config_vars().get(name)
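# e.g. get_config_var('py_version_short') returns a string like '2.7'.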
def get_platform():
"""Return a string that identifies the current platform.
This is used mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name
and version and the architecture (as supplied by 'os.uname()'),
    although the exact information included depends on the OS; e.g. for IRIX
the architecture isn't particularly important (IRIX only runs on SGI
hardware), but for Linux the kernel version isn't particularly
important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
win-amd64 (64bit Windows on AMD64 (aka x86_64, Intel64, EM64T, etc.))
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
For other non-POSIX platforms, currently just returns 'sys.platform'.
"""
import re
if os.name == 'nt':
# sniff sys.version for architecture.
prefix = " bit ("
i = sys.version.find(prefix)
if i == -1:
return sys.platform
j = sys.version.find(")", i)
look = sys.version[i+len(prefix):j].lower()
if look == 'amd64':
return 'win-amd64'
if look == 'itanium':
return 'win-ia64'
return sys.platform
# Set for cross builds explicitly
if "_PYTHON_HOST_PLATFORM" in os.environ:
return os.environ["_PYTHON_HOST_PLATFORM"]
if os.name != "posix" or not hasattr(os, 'uname'):
# XXX what about the architecture? NT is Intel or Alpha,
# Mac OS is M68k or PPC, etc.
return sys.platform
# Try to distinguish various flavours of Unix
osname, host, release, version, machine = os.uname()
# Convert the OS name to lowercase, remove '/' characters
# (to accommodate BSD/OS), and translate spaces (for "Power Macintosh")
osname = osname.lower().replace('/', '')
machine = machine.replace(' ', '_')
machine = machine.replace('/', '-')
if osname[:5] == "linux":
# At least on Linux/Intel, 'machine' is the processor --
# i386, etc.
# XXX what about Alpha, SPARC, etc?
return "%s-%s" % (osname, machine)
elif osname[:5] == "sunos":
if release[0] >= "5": # SunOS 5 == Solaris 2
osname = "solaris"
release = "%d.%s" % (int(release[0]) - 3, release[2:])
# We can't use "platform.architecture()[0]" because a
# bootstrap problem. We use a dict to get an error
# if some suspicious happens.
bitness = {2147483647:"32bit", 9223372036854775807:"64bit"}
machine += ".%s" % bitness[sys.maxint]
# fall through to standard osname-release-machine representation
elif osname[:4] == "irix": # could be "irix64"!
return "%s-%s" % (osname, release)
elif osname[:3] == "aix":
return "%s-%s.%s" % (osname, version, release)
elif osname[:6] == "cygwin":
osname = "cygwin"
rel_re = re.compile (r'[\d.]+')
m = rel_re.match(release)
if m:
release = m.group()
elif osname[:6] == "darwin":
import _osx_support
osname, release, machine = _osx_support.get_platform_osx(
get_config_vars(),
osname, release, machine)
return "%s-%s-%s" % (osname, release, machine)
def get_python_version():
return _PY_VERSION_SHORT
def _print_dict(title, data):
for index, (key, value) in enumerate(sorted(data.items())):
if index == 0:
print '%s: ' % (title)
print '\t%s = "%s"' % (key, value)
def _main():
"""Display all information sysconfig detains."""
if '--generate-posix-vars' in sys.argv:
_generate_posix_vars()
return
print 'Platform: "%s"' % get_platform()
print 'Python version: "%s"' % get_python_version()
print 'Current installation scheme: "%s"' % _get_default_scheme()
print
_print_dict('Paths', get_paths())
print
_print_dict('Variables', get_config_vars())
if __name__ == '__main__':
_main()
#! /usr/bin/env python
"""The Tab Nanny despises ambiguous indentation. She knows no mercy.
tabnanny -- Detection of ambiguous indentation
For the time being this module is intended to be called as a script.
However it is possible to import it into an IDE and use the function
check() described below.
Warning: The API provided by this module is likely to change in future
releases; such changes may not be backward compatible.
"""
# Released to the public domain, by Tim Peters, 15 April 1998.
# XXX Note: this is now a standard library module.
# XXX The API needs to undergo changes however; the current code is too
# XXX script-like. This will be addressed later.
__version__ = "6"
import os
import sys
import getopt
import tokenize
if not hasattr(tokenize, 'NL'):
raise ValueError("tokenize.NL doesn't exist -- tokenize module too old")
__all__ = ["check", "NannyNag", "process_tokens"]
verbose = 0
filename_only = 0
def errprint(*args):
sep = ""
for arg in args:
sys.stderr.write(sep + str(arg))
sep = " "
sys.stderr.write("\n")
def main():
global verbose, filename_only
try:
opts, args = getopt.getopt(sys.argv[1:], "qv")
except getopt.error, msg:
errprint(msg)
return
for o, a in opts:
if o == '-q':
filename_only = filename_only + 1
if o == '-v':
verbose = verbose + 1
if not args:
errprint("Usage:", sys.argv[0], "[-v] file_or_directory ...")
return
for arg in args:
check(arg)
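# Illustrative command-line usage (a hedged sketch; the paths are
# placeholders):
#   python tabnanny.py -v some_module.py   # verbose check of a single file
#   python tabnanny.py -q some_package/    # recurse, print offending names only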
class NannyNag(Exception):
"""
Raised by process_tokens() if detecting an ambiguous indent.
Captured and handled in check().
"""
def __init__(self, lineno, msg, line):
self.lineno, self.msg, self.line = lineno, msg, line
def get_lineno(self):
return self.lineno
def get_msg(self):
return self.msg
def get_line(self):
return self.line
def check(file):
"""check(file_or_dir)
If file_or_dir is a directory and not a symbolic link, then recursively
descend the directory tree named by file_or_dir, checking all .py files
along the way. If file_or_dir is an ordinary Python source file, it is
checked for whitespace-related problems. The diagnostic messages are
written to standard output using the print statement.
"""
if os.path.isdir(file) and not os.path.islink(file):
if verbose:
print "%r: listing directory" % (file,)
names = os.listdir(file)
for name in names:
fullname = os.path.join(file, name)
if (os.path.isdir(fullname) and
not os.path.islink(fullname) or
os.path.normcase(name[-3:]) == ".py"):
check(fullname)
return
try:
f = open(file)
except IOError, msg:
errprint("%r: I/O Error: %s" % (file, msg))
return
if verbose > 1:
print "checking %r ..." % file
try:
process_tokens(tokenize.generate_tokens(f.readline))
except tokenize.TokenError, msg:
errprint("%r: Token Error: %s" % (file, msg))
return
except IndentationError, msg:
errprint("%r: Indentation Error: %s" % (file, msg))
return
except NannyNag, nag:
badline = nag.get_lineno()
line = nag.get_line()
if verbose:
print "%r: *** Line %d: trouble in tab city! ***" % (file, badline)
print "offending line: %r" % (line,)
print nag.get_msg()
else:
if ' ' in file: file = '"' + file + '"'
if filename_only: print file
else: print file, badline, repr(line)
return
if verbose:
print "%r: Clean bill of health." % (file,)
class Whitespace:
# the characters used for space and tab
S, T = ' \t'
# members:
# raw
# the original string
# n
# the number of leading whitespace characters in raw
# nt
# the number of tabs in raw[:n]
# norm
# the normal form as a pair (count, trailing), where:
# count
# a tuple such that raw[:n] contains count[i]
# instances of S * i + T
# trailing
# the number of trailing spaces in raw[:n]
# It's A Theorem that m.indent_level(t) ==
# n.indent_level(t) for all t >= 1 iff m.norm == n.norm.
# is_simple
# true iff raw[:n] is of the form (T*)(S*)
def __init__(self, ws):
self.raw = ws
S, T = Whitespace.S, Whitespace.T
count = []
b = n = nt = 0
for ch in self.raw:
if ch == S:
n = n + 1
b = b + 1
elif ch == T:
n = n + 1
nt = nt + 1
if b >= len(count):
count = count + [0] * (b - len(count) + 1)
count[b] = count[b] + 1
b = 0
else:
break
self.n = n
self.nt = nt
self.norm = tuple(count), b
self.is_simple = len(count) <= 1
# return length of longest contiguous run of spaces (whether or not
# preceding a tab)
def longest_run_of_spaces(self):
count, trailing = self.norm
return max(len(count)-1, trailing)
def indent_level(self, tabsize):
# count, il = self.norm
# for i in range(len(count)):
# if count[i]:
# il = il + (i/tabsize + 1)*tabsize * count[i]
# return il
# quicker:
# il = trailing + sum (i/ts + 1)*ts*count[i] =
# trailing + ts * sum (i/ts + 1)*count[i] =
# trailing + ts * sum i/ts*count[i] + count[i] =
# trailing + ts * [(sum i/ts*count[i]) + (sum count[i])] =
# trailing + ts * [(sum i/ts*count[i]) + num_tabs]
# and note that i/ts*count[i] is 0 when i < ts
count, trailing = self.norm
il = 0
for i in range(tabsize, len(count)):
il = il + i/tabsize * count[i]
return trailing + tabsize * (il + self.nt)
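# Worked example of the formula above (illustrative, not part of the
# module): for the prefix "  \t" (two spaces, then one tab) we get
# count == (0, 0, 1), trailing == 0 and nt == 1, so
#   indent_level(1) == 0 + 1*(2 + 1) == 3
#   indent_level(2) == 0 + 2*(1 + 1) == 4
#   indent_level(8) == 0 + 8*(0 + 1) == 8
# which matches expanding the tab literally at each tab size.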
# return true iff self.indent_level(t) == other.indent_level(t)
# for all t >= 1
def equal(self, other):
return self.norm == other.norm
# return a list of tuples (ts, i1, i2) such that
# i1 == self.indent_level(ts) != other.indent_level(ts) == i2.
# Intended to be used after not self.equal(other) is known, in which
# case it will return at least one witnessing tab size.
def not_equal_witness(self, other):
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
a = []
for ts in range(1, n+1):
if self.indent_level(ts) != other.indent_level(ts):
a.append( (ts,
self.indent_level(ts),
other.indent_level(ts)) )
return a
# Return True iff self.indent_level(t) < other.indent_level(t)
# for all t >= 1.
# The algorithm is due to Vincent Broman.
# Easy to prove it's correct.
# XXXpost that.
# Trivial to prove n is sharp (consider T vs ST).
# Unknown whether there's a faster general way. I suspected so at
# first, but no longer.
# For the special (but common!) case where M and N are both of the
# form (T*)(S*), M.less(N) iff M.len() < N.len() and
# M.num_tabs() <= N.num_tabs(). Proof is easy but kinda long-winded.
# XXXwrite that up.
# Note that M is of the form (T*)(S*) iff len(M.norm[0]) <= 1.
def less(self, other):
if self.n >= other.n:
return False
if self.is_simple and other.is_simple:
return self.nt <= other.nt
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
# the self.n >= other.n test already did it for ts=1
for ts in range(2, n+1):
if self.indent_level(ts) >= other.indent_level(ts):
return False
return True
# return a list of tuples (ts, i1, i2) such that
# i1 == self.indent_level(ts) >= other.indent_level(ts) == i2.
# Intended to be used after not self.less(other) is known, in which
# case it will return at least one witnessing tab size.
def not_less_witness(self, other):
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
a = []
for ts in range(1, n+1):
if self.indent_level(ts) >= other.indent_level(ts):
a.append( (ts,
self.indent_level(ts),
other.indent_level(ts)) )
return a
def format_witnesses(w):
firsts = map(lambda tup: str(tup[0]), w)
prefix = "at tab size"
if len(w) > 1:
prefix = prefix + "s"
return prefix + " " + ', '.join(firsts)
def process_tokens(tokens):
INDENT = tokenize.INDENT
DEDENT = tokenize.DEDENT
NEWLINE = tokenize.NEWLINE
JUNK = tokenize.COMMENT, tokenize.NL
indents = [Whitespace("")]
check_equal = 0
for (type, token, start, end, line) in tokens:
if type == NEWLINE:
# a program statement, or ENDMARKER, will eventually follow,
# after some (possibly empty) run of tokens of the form
# (NL | COMMENT)* (INDENT | DEDENT+)?
# If an INDENT appears, setting check_equal is wrong, and will
# be undone when we see the INDENT.
check_equal = 1
elif type == INDENT:
check_equal = 0
thisguy = Whitespace(token)
if not indents[-1].less(thisguy):
witness = indents[-1].not_less_witness(thisguy)
msg = "indent not greater e.g. " + format_witnesses(witness)
raise NannyNag(start[0], msg, line)
indents.append(thisguy)
elif type == DEDENT:
# there's nothing we need to check here! what's important is
# that when the run of DEDENTs ends, the indentation of the
# program statement (or ENDMARKER) that triggered the run is
# equal to what's left at the top of the indents stack
# Ouch! This assert triggers if the last line of the source
# is indented *and* lacks a newline -- then DEDENTs pop out
# of thin air.
# assert check_equal # else no earlier NEWLINE, or an earlier INDENT
check_equal = 1
del indents[-1]
elif check_equal and type not in JUNK:
# this is the first "real token" following a NEWLINE, so it
# must be the first token of the next program statement, or an
# ENDMARKER; the "line" argument exposes the leading whitespace
# for this statement; in the case of ENDMARKER, line is an empty
# string, so will properly match the empty string with which the
# "indents" stack was seeded
check_equal = 0
thisguy = Whitespace(line)
if not indents[-1].equal(thisguy):
witness = indents[-1].not_equal_witness(thisguy)
msg = "indent not equal e.g. " + format_witnesses(witness)
raise NannyNag(start[0], msg, line)
if __name__ == '__main__':
main()
# -*- coding: iso-8859-1 -*-
#-------------------------------------------------------------------
# tarfile.py
#-------------------------------------------------------------------
# Copyright (C) 2002 Lars Gustäbel <[email protected]>
# All rights reserved.
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
"""Read from and write to tar format archives.
"""
__version__ = "$Revision: 85213 $"
# $Source$
version = "0.9.0"
__author__ = "Lars Gust�bel ([email protected])"
__date__ = "$Date$"
__cvsid__ = "$Id$"
__credits__ = "Gustavo Niemeyer, Niels Gust�bel, Richard Townsend."
#---------
# Imports
#---------
from __builtin__ import open as bltn_open
import sys
import os
import shutil
import stat
import errno
import time
import struct
import copy
import re
import operator
try:
import grp, pwd
except ImportError:
grp = pwd = None
# from tarfile import *
__all__ = ["TarFile", "TarInfo", "is_tarfile", "TarError"]
#---------------------------------------------------------
# tar constants
#---------------------------------------------------------
NUL = "\0" # the null character
BLOCKSIZE = 512 # length of processing blocks
RECORDSIZE = BLOCKSIZE * 20 # length of records
GNU_MAGIC = "ustar \0" # magic gnu tar string
POSIX_MAGIC = "ustar\x0000" # magic posix tar string
LENGTH_NAME = 100 # maximum length of a filename
LENGTH_LINK = 100 # maximum length of a linkname
LENGTH_PREFIX = 155 # maximum length of the prefix field
REGTYPE = "0" # regular file
AREGTYPE = "\0" # regular file
LNKTYPE = "1" # link (inside tarfile)
SYMTYPE = "2" # symbolic link
CHRTYPE = "3" # character special device
BLKTYPE = "4" # block special device
DIRTYPE = "5" # directory
FIFOTYPE = "6" # fifo special device
CONTTYPE = "7" # contiguous file
GNUTYPE_LONGNAME = "L" # GNU tar longname
GNUTYPE_LONGLINK = "K" # GNU tar longlink
GNUTYPE_SPARSE = "S" # GNU tar sparse file
XHDTYPE = "x" # POSIX.1-2001 extended header
XGLTYPE = "g" # POSIX.1-2001 global header
SOLARIS_XHDTYPE = "X" # Solaris extended header
USTAR_FORMAT = 0 # POSIX.1-1988 (ustar) format
GNU_FORMAT = 1 # GNU tar format
PAX_FORMAT = 2 # POSIX.1-2001 (pax) format
DEFAULT_FORMAT = GNU_FORMAT
#---------------------------------------------------------
# tarfile constants
#---------------------------------------------------------
# File types that tarfile supports:
SUPPORTED_TYPES = (REGTYPE, AREGTYPE, LNKTYPE,
SYMTYPE, DIRTYPE, FIFOTYPE,
CONTTYPE, CHRTYPE, BLKTYPE,
GNUTYPE_LONGNAME, GNUTYPE_LONGLINK,
GNUTYPE_SPARSE)
# File types that will be treated as a regular file.
REGULAR_TYPES = (REGTYPE, AREGTYPE,
CONTTYPE, GNUTYPE_SPARSE)
# File types that are part of the GNU tar format.
GNU_TYPES = (GNUTYPE_LONGNAME, GNUTYPE_LONGLINK,
GNUTYPE_SPARSE)
# Fields from a pax header that override a TarInfo attribute.
PAX_FIELDS = ("path", "linkpath", "size", "mtime",
"uid", "gid", "uname", "gname")
# Fields in a pax header that are numbers, all other fields
# are treated as strings.
PAX_NUMBER_FIELDS = {
"atime": float,
"ctime": float,
"mtime": float,
"uid": int,
"gid": int,
"size": int
}
#---------------------------------------------------------
# Bits used in the mode field, values in octal.
#---------------------------------------------------------
S_IFLNK = 0120000 # symbolic link
S_IFREG = 0100000 # regular file
S_IFBLK = 0060000 # block device
S_IFDIR = 0040000 # directory
S_IFCHR = 0020000 # character device
S_IFIFO = 0010000 # fifo
TSUID = 04000 # set UID on execution
TSGID = 02000 # set GID on execution
TSVTX = 01000 # reserved
TUREAD = 0400 # read by owner
TUWRITE = 0200 # write by owner
TUEXEC = 0100 # execute/search by owner
TGREAD = 0040 # read by group
TGWRITE = 0020 # write by group
TGEXEC = 0010 # execute/search by group
TOREAD = 0004 # read by other
TOWRITE = 0002 # write by other
TOEXEC = 0001 # execute/search by other
#---------------------------------------------------------
# initialization
#---------------------------------------------------------
ENCODING = sys.getfilesystemencoding()
if ENCODING is None:
ENCODING = sys.getdefaultencoding()
#---------------------------------------------------------
# Some useful functions
#---------------------------------------------------------
def stn(s, length):
"""Convert a python string to a null-terminated string buffer.
"""
return s[:length] + (length - len(s)) * NUL
def nts(s):
"""Convert a null-terminated string field to a python string.
"""
# Use the string up to the first null char.
p = s.find("\0")
if p == -1:
return s
return s[:p]
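# Illustrative round-trip (hedged sketch): tar header fields are
# fixed-width and NUL-padded.
#   >>> stn("usr/bin", 10)
#   'usr/bin\x00\x00\x00'
#   >>> nts(stn("usr/bin", 10))
#   'usr/bin'
# Note that stn() silently truncates strings longer than length.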
def nti(s):
"""Convert a number field to a python number.
"""
# There are two possible encodings for a number field, see
# itn() below.
if s[0] != chr(0200):
try:
n = int(nts(s).strip() or "0", 8)
except ValueError:
raise InvalidHeaderError("invalid header")
else:
n = 0L
for i in xrange(len(s) - 1):
n <<= 8
n += ord(s[i + 1])
return n
def itn(n, digits=8, format=DEFAULT_FORMAT):
"""Convert a python number to a number field.
"""
# POSIX 1003.1-1988 requires numbers to be encoded as a string of
# octal digits followed by a null-byte, this allows values up to
# (8**(digits-1))-1. GNU tar allows storing numbers greater than
# that if necessary. A leading 0200 byte indicates this particular
# encoding, the following digits-1 bytes are a big-endian
# representation. This allows values up to (256**(digits-1))-1.
if 0 <= n < 8 ** (digits - 1):
s = "%0*o" % (digits - 1, n) + NUL
else:
if format != GNU_FORMAT or n >= 256 ** (digits - 1):
raise ValueError("overflow in number field")
if n < 0:
# XXX We mimic GNU tar's behaviour with negative numbers,
# this could raise OverflowError.
n = struct.unpack("L", struct.pack("l", n))[0]
s = ""
for i in xrange(digits - 1):
s = chr(n & 0377) + s
n >>= 8
s = chr(0200) + s
return s
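# Illustrative examples (hedged; the values follow from the two
# encodings described above):
#   >>> itn(511)                    # fits into 7 octal digits plus NUL
#   '0000777\x00'
#   >>> nti(itn(511))
#   511
#   >>> itn(8 ** 7, format=GNU_FORMAT)[0] == chr(0200)  # base-256 marker
#   True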
def uts(s, encoding, errors):
"""Convert a unicode object to a string.
"""
if errors == "utf-8":
# An extra error handler similar to the -o invalid=UTF-8 option
# in POSIX.1-2001. Replace untranslatable characters with their
# UTF-8 representation.
try:
return s.encode(encoding, "strict")
except UnicodeEncodeError:
x = []
for c in s:
try:
x.append(c.encode(encoding, "strict"))
except UnicodeEncodeError:
x.append(c.encode("utf8"))
return "".join(x)
else:
return s.encode(encoding, errors)
def calc_chksums(buf):
"""Calculate the checksum for a member's header by summing up all
characters except for the chksum field which is treated as if
it was filled with spaces. According to the GNU tar sources,
some tars (Sun and NeXT) calculate chksum with signed char,
which will be different if there are chars in the buffer with
the high bit set. So we calculate two checksums, unsigned and
signed.
"""
unsigned_chksum = 256 + sum(struct.unpack("148B", buf[:148]) + struct.unpack("356B", buf[156:512]))
signed_chksum = 256 + sum(struct.unpack("148b", buf[:148]) + struct.unpack("356b", buf[156:512]))
return unsigned_chksum, signed_chksum
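# Clarifying note (added): the constant 256 is the contribution of the
# 8-byte chksum field itself, counted as if it held eight spaces
# (8 * ord(" ") == 256), which is why bytes 148:156 are skipped above.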
def copyfileobj(src, dst, length=None):
"""Copy length bytes from fileobj src to fileobj dst.
If length is None, copy the entire content.
"""
if length == 0:
return
if length is None:
shutil.copyfileobj(src, dst)
return
BUFSIZE = 16 * 1024
blocks, remainder = divmod(length, BUFSIZE)
for b in xrange(blocks):
buf = src.read(BUFSIZE)
if len(buf) < BUFSIZE:
raise IOError("end of file reached")
dst.write(buf)
if remainder != 0:
buf = src.read(remainder)
if len(buf) < remainder:
raise IOError("end of file reached")
dst.write(buf)
return
filemode_table = (
((S_IFLNK, "l"),
(S_IFREG, "-"),
(S_IFBLK, "b"),
(S_IFDIR, "d"),
(S_IFCHR, "c"),
(S_IFIFO, "p")),
((TUREAD, "r"),),
((TUWRITE, "w"),),
((TUEXEC|TSUID, "s"),
(TSUID, "S"),
(TUEXEC, "x")),
((TGREAD, "r"),),
((TGWRITE, "w"),),
((TGEXEC|TSGID, "s"),
(TSGID, "S"),
(TGEXEC, "x")),
((TOREAD, "r"),),
((TOWRITE, "w"),),
((TOEXEC|TSVTX, "t"),
(TSVTX, "T"),
(TOEXEC, "x"))
)
def filemode(mode):
"""Convert a file's mode to a string of the form
-rwxrwxrwx.
Used by TarFile.list()
"""
perm = []
for table in filemode_table:
for bit, char in table:
if mode & bit == bit:
perm.append(char)
break
else:
perm.append("-")
return "".join(perm)
class TarError(Exception):
"""Base exception."""
pass
class ExtractError(TarError):
"""General exception for extract errors."""
pass
class ReadError(TarError):
"""Exception for unreadable tar archives."""
pass
class CompressionError(TarError):
"""Exception for unavailable compression methods."""
pass
class StreamError(TarError):
"""Exception for unsupported operations on stream-like TarFiles."""
pass
class HeaderError(TarError):
"""Base exception for header errors."""
pass
class EmptyHeaderError(HeaderError):
"""Exception for empty headers."""
pass
class TruncatedHeaderError(HeaderError):
"""Exception for truncated headers."""
pass
class EOFHeaderError(HeaderError):
"""Exception for end of file headers."""
pass
class InvalidHeaderError(HeaderError):
"""Exception for invalid headers."""
pass
class SubsequentHeaderError(HeaderError):
"""Exception for missing and invalid extended headers."""
pass
#---------------------------
# internal stream interface
#---------------------------
class _LowLevelFile:
"""Low-level file object. Supports reading and writing.
It is used instead of a regular file object for streaming
access.
"""
def __init__(self, name, mode):
mode = {
"r": os.O_RDONLY,
"w": os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
}[mode]
if hasattr(os, "O_BINARY"):
mode |= os.O_BINARY
self.fd = os.open(name, mode, 0666)
def close(self):
os.close(self.fd)
def read(self, size):
return os.read(self.fd, size)
def write(self, s):
os.write(self.fd, s)
class _Stream:
"""Class that serves as an adapter between TarFile and
a stream-like object. The stream-like object only
needs to have a read() or write() method and is accessed
blockwise. Use of gzip or bzip2 compression is possible.
A stream-like object could be for example: sys.stdin,
sys.stdout, a socket, a tape device etc.
_Stream is intended to be used only internally.
"""
def __init__(self, name, mode, comptype, fileobj, bufsize):
"""Construct a _Stream object.
"""
self._extfileobj = True
if fileobj is None:
fileobj = _LowLevelFile(name, mode)
self._extfileobj = False
if comptype == '*':
# Enable transparent compression detection for the
# stream interface
fileobj = _StreamProxy(fileobj)
comptype = fileobj.getcomptype()
self.name = name or ""
self.mode = mode
self.comptype = comptype
self.fileobj = fileobj
self.bufsize = bufsize
self.buf = ""
self.pos = 0L
self.closed = False
try:
if comptype == "gz":
try:
import zlib
except ImportError:
raise CompressionError("zlib module is not available")
self.zlib = zlib
self.crc = zlib.crc32("") & 0xffffffffL
if mode == "r":
self._init_read_gz()
else:
self._init_write_gz()
elif comptype == "bz2":
try:
import bz2
except ImportError:
raise CompressionError("bz2 module is not available")
if mode == "r":
self.dbuf = ""
self.cmp = bz2.BZ2Decompressor()
else:
self.cmp = bz2.BZ2Compressor()
except:
if not self._extfileobj:
self.fileobj.close()
self.closed = True
raise
def __del__(self):
if hasattr(self, "closed") and not self.closed:
self.close()
def _init_write_gz(self):
"""Initialize for writing with gzip compression.
"""
self.cmp = self.zlib.compressobj(9, self.zlib.DEFLATED,
-self.zlib.MAX_WBITS,
self.zlib.DEF_MEM_LEVEL,
0)
timestamp = struct.pack("<L", long(time.time()))
self.__write("\037\213\010\010%s\002\377" % timestamp)
if type(self.name) is unicode:
self.name = self.name.encode("iso-8859-1", "replace")
if self.name.endswith(".gz"):
self.name = self.name[:-3]
self.__write(self.name + NUL)
def write(self, s):
"""Write string s to the stream.
"""
if self.comptype == "gz":
self.crc = self.zlib.crc32(s, self.crc) & 0xffffffffL
self.pos += len(s)
if self.comptype != "tar":
s = self.cmp.compress(s)
self.__write(s)
def __write(self, s):
"""Write string s to the stream if a whole new block
is ready to be written.
"""
self.buf += s
while len(self.buf) > self.bufsize:
self.fileobj.write(self.buf[:self.bufsize])
self.buf = self.buf[self.bufsize:]
def close(self):
"""Close the _Stream object. No operation should be
done on it afterwards.
"""
if self.closed:
return
self.closed = True
try:
if self.mode == "w" and self.comptype != "tar":
self.buf += self.cmp.flush()
if self.mode == "w" and self.buf:
self.fileobj.write(self.buf)
self.buf = ""
if self.comptype == "gz":
# The native zlib crc is an unsigned 32-bit integer, but
# the Python wrapper implicitly casts that to a signed C
# long. So, on a 32-bit box self.crc may "look negative",
# while the same crc on a 64-bit box may "look positive".
# To avoid irksome warnings from the `struct` module, force
# it to look positive on all boxes.
self.fileobj.write(struct.pack("<L", self.crc & 0xffffffffL))
self.fileobj.write(struct.pack("<L", self.pos & 0xffffFFFFL))
finally:
if not self._extfileobj:
self.fileobj.close()
def _init_read_gz(self):
"""Initialize for reading a gzip compressed fileobj.
"""
self.cmp = self.zlib.decompressobj(-self.zlib.MAX_WBITS)
self.dbuf = ""
# taken from gzip.GzipFile with some alterations
if self.__read(2) != "\037\213":
raise ReadError("not a gzip file")
if self.__read(1) != "\010":
raise CompressionError("unsupported compression method")
flag = ord(self.__read(1))
self.__read(6)
if flag & 4:
xlen = ord(self.__read(1)) + 256 * ord(self.__read(1))
self.read(xlen)
if flag & 8:
while True:
s = self.__read(1)
if not s or s == NUL:
break
if flag & 16:
while True:
s = self.__read(1)
if not s or s == NUL:
break
if flag & 2:
self.__read(2)
def tell(self):
"""Return the stream's file pointer position.
"""
return self.pos
def seek(self, pos=0):
"""Set the stream's file pointer to pos. Negative seeking
is forbidden.
"""
if pos - self.pos >= 0:
blocks, remainder = divmod(pos - self.pos, self.bufsize)
for i in xrange(blocks):
self.read(self.bufsize)
self.read(remainder)
else:
raise StreamError("seeking backwards is not allowed")
return self.pos
def read(self, size=None):
"""Return the next size number of bytes from the stream.
If size is not defined, return all bytes of the stream
up to EOF.
"""
if size is None:
t = []
while True:
buf = self._read(self.bufsize)
if not buf:
break
t.append(buf)
buf = "".join(t)
else:
buf = self._read(size)
self.pos += len(buf)
return buf
def _read(self, size):
"""Return size bytes from the stream.
"""
if self.comptype == "tar":
return self.__read(size)
c = len(self.dbuf)
t = [self.dbuf]
while c < size:
buf = self.__read(self.bufsize)
if not buf:
break
try:
buf = self.cmp.decompress(buf)
except IOError:
raise ReadError("invalid compressed data")
t.append(buf)
c += len(buf)
t = "".join(t)
self.dbuf = t[size:]
return t[:size]
def __read(self, size):
"""Return size bytes from stream. If internal buffer is empty,
read another block from the stream.
"""
c = len(self.buf)
t = [self.buf]
while c < size:
buf = self.fileobj.read(self.bufsize)
if not buf:
break
t.append(buf)
c += len(buf)
t = "".join(t)
self.buf = t[size:]
return t[:size]
# class _Stream
class _StreamProxy(object):
"""Small proxy class that enables transparent compression
detection for the Stream interface (mode 'r|*').
"""
def __init__(self, fileobj):
self.fileobj = fileobj
self.buf = self.fileobj.read(BLOCKSIZE)
def read(self, size):
self.read = self.fileobj.read
return self.buf
def getcomptype(self):
if self.buf.startswith("\037\213\010"):
return "gz"
if self.buf[0:3] == "BZh" and self.buf[4:10] == "1AY&SY":
return "bz2"
return "tar"
def close(self):
self.fileobj.close()
# class StreamProxy
class _BZ2Proxy(object):
"""Small proxy class that enables external file object
support for "r:bz2" and "w:bz2" modes. This is actually
a workaround for a limitation in bz2 module's BZ2File
class which (unlike gzip.GzipFile) has no support for
a file object argument.
"""
blocksize = 16 * 1024
def __init__(self, fileobj, mode):
self.fileobj = fileobj
self.mode = mode
self.name = getattr(self.fileobj, "name", None)
self.init()
def init(self):
import bz2
self.pos = 0
if self.mode == "r":
self.bz2obj = bz2.BZ2Decompressor()
self.fileobj.seek(0)
self.buf = ""
else:
self.bz2obj = bz2.BZ2Compressor()
def read(self, size):
b = [self.buf]
x = len(self.buf)
while x < size:
raw = self.fileobj.read(self.blocksize)
if not raw:
break
data = self.bz2obj.decompress(raw)
b.append(data)
x += len(data)
self.buf = "".join(b)
buf = self.buf[:size]
self.buf = self.buf[size:]
self.pos += len(buf)
return buf
def seek(self, pos):
if pos < self.pos:
self.init()
self.read(pos - self.pos)
def tell(self):
return self.pos
def write(self, data):
self.pos += len(data)
raw = self.bz2obj.compress(data)
self.fileobj.write(raw)
def close(self):
if self.mode == "w":
raw = self.bz2obj.flush()
self.fileobj.write(raw)
# class _BZ2Proxy
#------------------------
# Extraction file object
#------------------------
class _FileInFile(object):
"""A thin wrapper around an existing file object that
provides a part of its data as an individual file
object.
"""
def __init__(self, fileobj, offset, size, sparse=None):
self.fileobj = fileobj
self.offset = offset
self.size = size
self.sparse = sparse
self.position = 0
def tell(self):
"""Return the current file position.
"""
return self.position
def seek(self, position):
"""Seek to a position in the file.
"""
self.position = position
def read(self, size=None):
"""Read data from the file.
"""
if size is None:
size = self.size - self.position
else:
size = min(size, self.size - self.position)
if self.sparse is None:
return self.readnormal(size)
else:
return self.readsparse(size)
def __read(self, size):
buf = self.fileobj.read(size)
if len(buf) != size:
raise ReadError("unexpected end of data")
return buf
def readnormal(self, size):
"""Read operation for regular files.
"""
self.fileobj.seek(self.offset + self.position)
self.position += size
return self.__read(size)
def readsparse(self, size):
"""Read operation for sparse files.
"""
data = []
while size > 0:
buf = self.readsparsesection(size)
if not buf:
break
size -= len(buf)
data.append(buf)
return "".join(data)
def readsparsesection(self, size):
"""Read a single section of a sparse file.
"""
section = self.sparse.find(self.position)
if section is None:
return ""
size = min(size, section.offset + section.size - self.position)
if isinstance(section, _data):
realpos = section.realpos + self.position - section.offset
self.fileobj.seek(self.offset + realpos)
self.position += size
return self.__read(size)
else:
self.position += size
return NUL * size
#class _FileInFile
class ExFileObject(object):
"""File-like object for reading an archive member.
Is returned by TarFile.extractfile().
"""
blocksize = 1024
def __init__(self, tarfile, tarinfo):
self.fileobj = _FileInFile(tarfile.fileobj,
tarinfo.offset_data,
tarinfo.size,
getattr(tarinfo, "sparse", None))
self.name = tarinfo.name
self.mode = "r"
self.closed = False
self.size = tarinfo.size
self.position = 0
self.buffer = ""
def read(self, size=None):
"""Read at most size bytes from the file. If size is not
present or None, read all data until EOF is reached.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
buf = ""
if self.buffer:
if size is None:
buf = self.buffer
self.buffer = ""
else:
buf = self.buffer[:size]
self.buffer = self.buffer[size:]
if size is None:
buf += self.fileobj.read()
else:
buf += self.fileobj.read(size - len(buf))
self.position += len(buf)
return buf
def readline(self, size=-1):
"""Read one entire line from the file. If size is present
and non-negative, return a string with at most that
size, which may be an incomplete line.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
if "\n" in self.buffer:
pos = self.buffer.find("\n") + 1
else:
buffers = [self.buffer]
while True:
buf = self.fileobj.read(self.blocksize)
buffers.append(buf)
if not buf or "\n" in buf:
self.buffer = "".join(buffers)
pos = self.buffer.find("\n") + 1
if pos == 0:
# no newline found.
pos = len(self.buffer)
break
if size != -1:
pos = min(size, pos)
buf = self.buffer[:pos]
self.buffer = self.buffer[pos:]
self.position += len(buf)
return buf
def readlines(self):
"""Return a list with all remaining lines.
"""
result = []
while True:
line = self.readline()
if not line: break
result.append(line)
return result
def tell(self):
"""Return the current file position.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
return self.position
def seek(self, pos, whence=os.SEEK_SET):
"""Seek to a position in the file.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
if whence == os.SEEK_SET:
self.position = min(max(pos, 0), self.size)
elif whence == os.SEEK_CUR:
if pos < 0:
self.position = max(self.position + pos, 0)
else:
self.position = min(self.position + pos, self.size)
elif whence == os.SEEK_END:
self.position = max(min(self.size + pos, self.size), 0)
else:
raise ValueError("Invalid argument")
self.buffer = ""
self.fileobj.seek(self.position)
def close(self):
"""Close the file object.
"""
self.closed = True
def __iter__(self):
"""Get an iterator over the file's lines.
"""
while True:
line = self.readline()
if not line:
break
yield line
#class ExFileObject
#------------------
# Exported Classes
#------------------
class TarInfo(object):
"""Informational class which holds the details about an
archive member given by a tar header block.
TarInfo objects are returned by TarFile.getmember(),
TarFile.getmembers() and TarFile.gettarinfo() and are
usually created internally.
"""
def __init__(self, name=""):
"""Construct a TarInfo object. name is the optional name
of the member.
"""
self.name = name # member name
self.mode = 0644 # file permissions
self.uid = 0 # user id
self.gid = 0 # group id
self.size = 0 # file size
self.mtime = 0 # modification time
self.chksum = 0 # header checksum
self.type = REGTYPE # member type
self.linkname = "" # link name
self.uname = "" # user name
self.gname = "" # group name
self.devmajor = 0 # device major number
self.devminor = 0 # device minor number
self.offset = 0 # the tar header starts here
self.offset_data = 0 # the file's data starts here
self.pax_headers = {} # pax header information
# In pax headers the "name" and "linkname" field are called
# "path" and "linkpath".
def _getpath(self):
return self.name
def _setpath(self, name):
self.name = name
path = property(_getpath, _setpath)
def _getlinkpath(self):
return self.linkname
def _setlinkpath(self, linkname):
self.linkname = linkname
linkpath = property(_getlinkpath, _setlinkpath)
def __repr__(self):
return "<%s %r at %#x>" % (self.__class__.__name__,self.name,id(self))
def get_info(self, encoding, errors):
"""Return the TarInfo's attributes as a dictionary.
"""
info = {
"name": self.name,
"mode": self.mode & 07777,
"uid": self.uid,
"gid": self.gid,
"size": self.size,
"mtime": self.mtime,
"chksum": self.chksum,
"type": self.type,
"linkname": self.linkname,
"uname": self.uname,
"gname": self.gname,
"devmajor": self.devmajor,
"devminor": self.devminor
}
if info["type"] == DIRTYPE and not info["name"].endswith("/"):
info["name"] += "/"
for key in ("name", "linkname", "uname", "gname"):
if type(info[key]) is unicode:
info[key] = info[key].encode(encoding, errors)
return info
def tobuf(self, format=DEFAULT_FORMAT, encoding=ENCODING, errors="strict"):
"""Return a tar header as a string of 512 byte blocks.
"""
info = self.get_info(encoding, errors)
if format == USTAR_FORMAT:
return self.create_ustar_header(info)
elif format == GNU_FORMAT:
return self.create_gnu_header(info)
elif format == PAX_FORMAT:
return self.create_pax_header(info, encoding, errors)
else:
raise ValueError("invalid format")
def create_ustar_header(self, info):
"""Return the object as a ustar header block.
"""
info["magic"] = POSIX_MAGIC
if len(info["linkname"]) > LENGTH_LINK:
raise ValueError("linkname is too long")
if len(info["name"]) > LENGTH_NAME:
info["prefix"], info["name"] = self._posix_split_name(info["name"])
return self._create_header(info, USTAR_FORMAT)
def create_gnu_header(self, info):
"""Return the object as a GNU header block sequence.
"""
info["magic"] = GNU_MAGIC
buf = ""
if len(info["linkname"]) > LENGTH_LINK:
buf += self._create_gnu_long_header(info["linkname"], GNUTYPE_LONGLINK)
if len(info["name"]) > LENGTH_NAME:
buf += self._create_gnu_long_header(info["name"], GNUTYPE_LONGNAME)
return buf + self._create_header(info, GNU_FORMAT)
def create_pax_header(self, info, encoding, errors):
"""Return the object as a ustar header block. If it cannot be
represented this way, prepend a pax extended header sequence
with supplemental information.
"""
info["magic"] = POSIX_MAGIC
pax_headers = self.pax_headers.copy()
# Test string fields for values that exceed the field length or cannot
# be represented in ASCII encoding.
for name, hname, length in (
("name", "path", LENGTH_NAME), ("linkname", "linkpath", LENGTH_LINK),
("uname", "uname", 32), ("gname", "gname", 32)):
if hname in pax_headers:
# The pax header has priority.
continue
val = info[name].decode(encoding, errors)
# Try to encode the string as ASCII.
try:
val.encode("ascii")
except UnicodeEncodeError:
pax_headers[hname] = val
continue
if len(info[name]) > length:
pax_headers[hname] = val
# Test number fields for values that exceed the field limit or values
# that need to be stored as float.
for name, digits in (("uid", 8), ("gid", 8), ("size", 12), ("mtime", 12)):
if name in pax_headers:
# The pax header has priority. Avoid overflow.
info[name] = 0
continue
val = info[name]
if not 0 <= val < 8 ** (digits - 1) or isinstance(val, float):
pax_headers[name] = unicode(val)
info[name] = 0
# Create a pax extended header if necessary.
if pax_headers:
buf = self._create_pax_generic_header(pax_headers)
else:
buf = ""
return buf + self._create_header(info, USTAR_FORMAT)
@classmethod
def create_pax_global_header(cls, pax_headers):
"""Return the object as a pax global header block sequence.
"""
return cls._create_pax_generic_header(pax_headers, type=XGLTYPE)
def _posix_split_name(self, name):
"""Split a name longer than 100 chars into a prefix
and a name part.
"""
prefix = name[:LENGTH_PREFIX + 1]
while prefix and prefix[-1] != "/":
prefix = prefix[:-1]
name = name[len(prefix):]
prefix = prefix[:-1]
if not prefix or len(name) > LENGTH_NAME:
raise ValueError("name is too long")
return prefix, name
@staticmethod
def _create_header(info, format):
"""Return a header block. info is a dictionary with file
information, format must be one of the *_FORMAT constants.
"""
parts = [
stn(info.get("name", ""), 100),
itn(info.get("mode", 0) & 07777, 8, format),
itn(info.get("uid", 0), 8, format),
itn(info.get("gid", 0), 8, format),
itn(info.get("size", 0), 12, format),
itn(info.get("mtime", 0), 12, format),
" ", # checksum field
info.get("type", REGTYPE),
stn(info.get("linkname", ""), 100),
stn(info.get("magic", POSIX_MAGIC), 8),
stn(info.get("uname", ""), 32),
stn(info.get("gname", ""), 32),
itn(info.get("devmajor", 0), 8, format),
itn(info.get("devminor", 0), 8, format),
stn(info.get("prefix", ""), 155)
]
buf = struct.pack("%ds" % BLOCKSIZE, "".join(parts))
chksum = calc_chksums(buf[-BLOCKSIZE:])[0]
buf = buf[:-364] + "%06o\0" % chksum + buf[-357:]
return buf
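# Clarifying note (added): the slice arithmetic above patches the 8-byte
# checksum field at offset 148 (== 512 - 364) with six octal digits and a
# NUL; byte 155 keeps a trailing space from the eight-space placeholder.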
@staticmethod
def _create_payload(payload):
"""Return the string payload filled with zero bytes
up to the next 512 byte border.
"""
blocks, remainder = divmod(len(payload), BLOCKSIZE)
if remainder > 0:
payload += (BLOCKSIZE - remainder) * NUL
return payload
@classmethod
def _create_gnu_long_header(cls, name, type):
"""Return a GNUTYPE_LONGNAME or GNUTYPE_LONGLINK sequence
for name.
"""
name += NUL
info = {}
info["name"] = "././@LongLink"
info["type"] = type
info["size"] = len(name)
info["magic"] = GNU_MAGIC
# create extended header + name blocks.
return cls._create_header(info, USTAR_FORMAT) + \
cls._create_payload(name)
@classmethod
def _create_pax_generic_header(cls, pax_headers, type=XHDTYPE):
"""Return a POSIX.1-2001 extended or global header sequence
that contains a list of keyword, value pairs. The values
must be unicode objects.
"""
records = []
for keyword, value in pax_headers.iteritems():
keyword = keyword.encode("utf8")
value = value.encode("utf8")
l = len(keyword) + len(value) + 3 # ' ' + '=' + '\n'
n = p = 0
while True:
n = l + len(str(p))
if n == p:
break
p = n
records.append("%d %s=%s\n" % (p, keyword, value))
records = "".join(records)
# We use a hardcoded "././@PaxHeader" name like star does
# instead of the one that POSIX recommends.
info = {}
info["name"] = "././@PaxHeader"
info["type"] = type
info["size"] = len(records)
info["magic"] = POSIX_MAGIC
# Create pax header + record blocks.
return cls._create_header(info, USTAR_FORMAT) + \
cls._create_payload(records)
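# Worked example of the length fixed point above (illustrative): for
# keyword "path" and value "foo", l == 4 + 3 + 3 == 10; the loop settles
# on p == 12 and emits the record "12 path=foo\n", whose total length,
# including its own "12 " prefix, is indeed 12 bytes.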
@classmethod
def frombuf(cls, buf):
"""Construct a TarInfo object from a 512 byte string buffer.
"""
if len(buf) == 0:
raise EmptyHeaderError("empty header")
if len(buf) != BLOCKSIZE:
raise TruncatedHeaderError("truncated header")
if buf.count(NUL) == BLOCKSIZE:
raise EOFHeaderError("end of file header")
chksum = nti(buf[148:156])
if chksum not in calc_chksums(buf):
raise InvalidHeaderError("bad checksum")
obj = cls()
obj.buf = buf
obj.name = nts(buf[0:100])
obj.mode = nti(buf[100:108])
obj.uid = nti(buf[108:116])
obj.gid = nti(buf[116:124])
obj.size = nti(buf[124:136])
obj.mtime = nti(buf[136:148])
obj.chksum = chksum
obj.type = buf[156:157]
obj.linkname = nts(buf[157:257])
obj.uname = nts(buf[265:297])
obj.gname = nts(buf[297:329])
obj.devmajor = nti(buf[329:337])
obj.devminor = nti(buf[337:345])
prefix = nts(buf[345:500])
# Old V7 tar format represents a directory as a regular
# file with a trailing slash.
if obj.type == AREGTYPE and obj.name.endswith("/"):
obj.type = DIRTYPE
# Remove redundant slashes from directories.
if obj.isdir():
obj.name = obj.name.rstrip("/")
# Reconstruct a ustar longname.
if prefix and obj.type not in GNU_TYPES:
obj.name = prefix + "/" + obj.name
return obj
@classmethod
def fromtarfile(cls, tarfile):
"""Return the next TarInfo object from TarFile object
tarfile.
"""
buf = tarfile.fileobj.read(BLOCKSIZE)
obj = cls.frombuf(buf)
obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
return obj._proc_member(tarfile)
#--------------------------------------------------------------------------
# The following are methods that are called depending on the type of a
# member. The entry point is _proc_member() which can be overridden in a
# subclass to add custom _proc_*() methods. A _proc_*() method MUST
# implement the following
# operations:
# 1. Set self.offset_data to the position where the data blocks begin,
# if there is data that follows.
# 2. Set tarfile.offset to the position where the next member's header will
# begin.
# 3. Return self or another valid TarInfo object.
def _proc_member(self, tarfile):
"""Choose the right processing method depending on
the type and call it.
"""
if self.type in (GNUTYPE_LONGNAME, GNUTYPE_LONGLINK):
return self._proc_gnulong(tarfile)
elif self.type == GNUTYPE_SPARSE:
return self._proc_sparse(tarfile)
elif self.type in (XHDTYPE, XGLTYPE, SOLARIS_XHDTYPE):
return self._proc_pax(tarfile)
else:
return self._proc_builtin(tarfile)
def _proc_builtin(self, tarfile):
"""Process a builtin type or an unknown type which
will be treated as a regular file.
"""
self.offset_data = tarfile.fileobj.tell()
offset = self.offset_data
if self.isreg() or self.type not in SUPPORTED_TYPES:
# Skip the following data blocks.
offset += self._block(self.size)
tarfile.offset = offset
# Patch the TarInfo object with saved global
# header information.
self._apply_pax_info(tarfile.pax_headers, tarfile.encoding, tarfile.errors)
return self
def _proc_gnulong(self, tarfile):
"""Process the blocks that hold a GNU longname
or longlink member.
"""
buf = tarfile.fileobj.read(self._block(self.size))
# Fetch the next header and process it.
try:
next = self.fromtarfile(tarfile)
except HeaderError:
raise SubsequentHeaderError("missing or bad subsequent header")
# Patch the TarInfo object from the next header with
# the longname information.
next.offset = self.offset
if self.type == GNUTYPE_LONGNAME:
next.name = nts(buf)
elif self.type == GNUTYPE_LONGLINK:
next.linkname = nts(buf)
return next
def _proc_sparse(self, tarfile):
"""Process a GNU sparse header plus extra headers.
"""
buf = self.buf
sp = _ringbuffer()
pos = 386
lastpos = 0L
realpos = 0L
# There are 4 possible sparse structs in the
# first header.
for i in xrange(4):
try:
offset = nti(buf[pos:pos + 12])
numbytes = nti(buf[pos + 12:pos + 24])
except ValueError:
break
if offset > lastpos:
sp.append(_hole(lastpos, offset - lastpos))
sp.append(_data(offset, numbytes, realpos))
realpos += numbytes
lastpos = offset + numbytes
pos += 24
isextended = ord(buf[482])
origsize = nti(buf[483:495])
# If the isextended flag is given,
# there are extra headers to process.
while isextended == 1:
buf = tarfile.fileobj.read(BLOCKSIZE)
pos = 0
for i in xrange(21):
try:
offset = nti(buf[pos:pos + 12])
numbytes = nti(buf[pos + 12:pos + 24])
except ValueError:
break
if offset > lastpos:
sp.append(_hole(lastpos, offset - lastpos))
sp.append(_data(offset, numbytes, realpos))
realpos += numbytes
lastpos = offset + numbytes
pos += 24
isextended = ord(buf[504])
if lastpos < origsize:
sp.append(_hole(lastpos, origsize - lastpos))
self.sparse = sp
self.offset_data = tarfile.fileobj.tell()
tarfile.offset = self.offset_data + self._block(self.size)
self.size = origsize
return self
def _proc_pax(self, tarfile):
"""Process an extended or global header as described in
POSIX.1-2001.
"""
# Read the header information.
buf = tarfile.fileobj.read(self._block(self.size))
# A pax header stores supplemental information for either
# the following file (extended) or all following files
# (global).
if self.type == XGLTYPE:
pax_headers = tarfile.pax_headers
else:
pax_headers = tarfile.pax_headers.copy()
# Parse pax header information. A record looks like this:
# "%d %s=%s\n" % (length, keyword, value). length is the size
# of the complete record including the length field itself and
# the newline. keyword and value are both UTF-8 encoded strings.
regex = re.compile(r"(\d+) ([^=]+)=", re.U)
pos = 0
while True:
match = regex.match(buf, pos)
if not match:
break
length, keyword = match.groups()
length = int(length)
value = buf[match.end(2) + 1:match.start(1) + length - 1]
keyword = keyword.decode("utf8")
value = value.decode("utf8")
pax_headers[keyword] = value
pos += length
# Fetch the next header.
try:
next = self.fromtarfile(tarfile)
except HeaderError:
raise SubsequentHeaderError("missing or bad subsequent header")
if self.type in (XHDTYPE, SOLARIS_XHDTYPE):
# Patch the TarInfo object with the extended header info.
next._apply_pax_info(pax_headers, tarfile.encoding, tarfile.errors)
next.offset = self.offset
if "size" in pax_headers:
# If the extended header replaces the size field,
# we need to recalculate the offset where the next
# header starts.
offset = next.offset_data
if next.isreg() or next.type not in SUPPORTED_TYPES:
offset += next._block(next.size)
tarfile.offset = offset
return next
def _apply_pax_info(self, pax_headers, encoding, errors):
"""Replace fields with supplemental information from a previous
pax extended or global header.
"""
for keyword, value in pax_headers.iteritems():
if keyword not in PAX_FIELDS:
continue
if keyword == "path":
value = value.rstrip("/")
if keyword in PAX_NUMBER_FIELDS:
try:
value = PAX_NUMBER_FIELDS[keyword](value)
except ValueError:
value = 0
else:
value = uts(value, encoding, errors)
setattr(self, keyword, value)
self.pax_headers = pax_headers.copy()
def _block(self, count):
"""Round up a byte count by BLOCKSIZE and return it,
e.g. _block(834) => 1024.
"""
blocks, remainder = divmod(count, BLOCKSIZE)
if remainder:
blocks += 1
return blocks * BLOCKSIZE
def isreg(self):
return self.type in REGULAR_TYPES
def isfile(self):
return self.isreg()
def isdir(self):
return self.type == DIRTYPE
def issym(self):
return self.type == SYMTYPE
def islnk(self):
return self.type == LNKTYPE
def ischr(self):
return self.type == CHRTYPE
def isblk(self):
return self.type == BLKTYPE
def isfifo(self):
return self.type == FIFOTYPE
def issparse(self):
return self.type == GNUTYPE_SPARSE
def isdev(self):
return self.type in (CHRTYPE, BLKTYPE, FIFOTYPE)
# class TarInfo
class TarFile(object):
"""The TarFile Class provides an interface to tar archives.
"""
debug = 0 # May be set from 0 (no msgs) to 3 (all msgs)
dereference = False # If true, add content of linked file to the
# tar file, else the link.
ignore_zeros = False # If true, skips empty or invalid blocks and
# continues processing.
errorlevel = 1 # If 0, fatal errors only appear in debug
# messages (if debug >= 0). If > 0, errors
# are passed to the caller as exceptions.
format = DEFAULT_FORMAT # The format to use when creating an archive.
encoding = ENCODING # Encoding for 8-bit character strings.
errors = None # Error handler for unicode conversion.
tarinfo = TarInfo # The default TarInfo class to use.
fileobject = ExFileObject # The default ExFileObject class to use.
def __init__(self, name=None, mode="r", fileobj=None, format=None,
tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
errors=None, pax_headers=None, debug=None, errorlevel=None):
"""Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
read from an existing archive, 'a' to append data to an existing
file or 'w' to create a new file overwriting an existing one. `mode'
defaults to 'r'.
If `fileobj' is given, it is used for reading or writing data. If it
can be determined, `mode' is overridden by `fileobj's mode.
`fileobj' is not closed when the TarFile is closed.
"""
modes = {"r": "rb", "a": "r+b", "w": "wb"}
if mode not in modes:
raise ValueError("mode must be 'r', 'a' or 'w'")
self.mode = mode
self._mode = modes[mode]
if not fileobj:
if self.mode == "a" and not os.path.exists(name):
# Create nonexistent files in append mode.
self.mode = "w"
self._mode = "wb"
fileobj = bltn_open(name, self._mode)
self._extfileobj = False
else:
if name is None and hasattr(fileobj, "name"):
name = fileobj.name
if hasattr(fileobj, "mode"):
self._mode = fileobj.mode
self._extfileobj = True
self.name = os.path.abspath(name) if name else None
self.fileobj = fileobj
# Init attributes.
if format is not None:
self.format = format
if tarinfo is not None:
self.tarinfo = tarinfo
if dereference is not None:
self.dereference = dereference
if ignore_zeros is not None:
self.ignore_zeros = ignore_zeros
if encoding is not None:
self.encoding = encoding
if errors is not None:
self.errors = errors
elif mode == "r":
self.errors = "utf-8"
else:
self.errors = "strict"
if pax_headers is not None and self.format == PAX_FORMAT:
self.pax_headers = pax_headers
else:
self.pax_headers = {}
if debug is not None:
self.debug = debug
if errorlevel is not None:
self.errorlevel = errorlevel
# Init datastructures.
self.closed = False
self.members = [] # list of members as TarInfo objects
self._loaded = False # flag if all members have been read
self.offset = self.fileobj.tell()
# current position in the archive file
self.inodes = {} # dictionary caching the inodes of
# archive members already added
try:
if self.mode == "r":
self.firstmember = None
self.firstmember = self.next()
if self.mode == "a":
# Move to the end of the archive,
# before the first empty block.
while True:
self.fileobj.seek(self.offset)
try:
tarinfo = self.tarinfo.fromtarfile(self)
self.members.append(tarinfo)
except EOFHeaderError:
self.fileobj.seek(self.offset)
break
except HeaderError, e:
raise ReadError(str(e))
if self.mode in "aw":
self._loaded = True
if self.pax_headers:
buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
self.fileobj.write(buf)
self.offset += len(buf)
except:
if not self._extfileobj:
self.fileobj.close()
self.closed = True
raise
def _getposix(self):
return self.format == USTAR_FORMAT
def _setposix(self, value):
import warnings
warnings.warn("use the format attribute instead", DeprecationWarning,
2)
if value:
self.format = USTAR_FORMAT
else:
self.format = GNU_FORMAT
posix = property(_getposix, _setposix)
#--------------------------------------------------------------------------
# Below are the classmethods which act as alternate constructors to the
# TarFile class. The open() method is the only one that is needed for
# public use; it is the "super"-constructor and is able to select an
# adequate "sub"-constructor for a particular compression using the mapping
# from OPEN_METH.
#
# This concept allows one to subclass TarFile without losing the comfort of
# the super-constructor. A sub-constructor is registered and made available
# by adding it to the mapping in OPEN_METH.
@classmethod
def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
"""Open a tar archive for reading, writing or appending. Return
an appropriate TarFile class.
mode:
'r' or 'r:*' open for reading with transparent compression
'r:' open for reading exclusively uncompressed
'r:gz' open for reading with gzip compression
'r:bz2' open for reading with bzip2 compression
'a' or 'a:' open for appending, creating the file if necessary
'w' or 'w:' open for writing without compression
'w:gz' open for writing with gzip compression
'w:bz2' open for writing with bzip2 compression
'r|*' open a stream of tar blocks with transparent compression
'r|' open an uncompressed stream of tar blocks for reading
'r|gz' open a gzip compressed stream of tar blocks
'r|bz2' open a bzip2 compressed stream of tar blocks
'w|' open an uncompressed stream for writing
'w|gz' open a gzip compressed stream for writing
'w|bz2' open a bzip2 compressed stream for writing
"""
if not name and not fileobj:
raise ValueError("nothing to open")
if mode in ("r", "r:*"):
# Find out which *open() is appropriate for opening the file.
def not_compressed(comptype):
return cls.OPEN_METH[comptype] == 'taropen'
for comptype in sorted(cls.OPEN_METH, key=not_compressed):
func = getattr(cls, cls.OPEN_METH[comptype])
if fileobj is not None:
saved_pos = fileobj.tell()
try:
return func(name, "r", fileobj, **kwargs)
except (ReadError, CompressionError), e:
if fileobj is not None:
fileobj.seek(saved_pos)
continue
raise ReadError("file could not be opened successfully")
elif ":" in mode:
filemode, comptype = mode.split(":", 1)
filemode = filemode or "r"
comptype = comptype or "tar"
# Select the *open() function according to
# given compression.
if comptype in cls.OPEN_METH:
func = getattr(cls, cls.OPEN_METH[comptype])
else:
raise CompressionError("unknown compression type %r" % comptype)
return func(name, filemode, fileobj, **kwargs)
elif "|" in mode:
filemode, comptype = mode.split("|", 1)
filemode = filemode or "r"
comptype = comptype or "tar"
if filemode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'")
stream = _Stream(name, filemode, comptype, fileobj, bufsize)
try:
t = cls(name, filemode, stream, **kwargs)
except:
stream.close()
raise
t._extfileobj = False
return t
elif mode in ("a", "w"):
return cls.taropen(name, mode, fileobj, **kwargs)
raise ValueError("undiscernible mode")
@classmethod
def taropen(cls, name, mode="r", fileobj=None, **kwargs):
"""Open uncompressed tar archive name for reading or writing.
"""
if mode not in ("r", "a", "w"):
raise ValueError("mode must be 'r', 'a' or 'w'")
return cls(name, mode, fileobj, **kwargs)
@classmethod
def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
"""Open gzip compressed tar archive name for reading or writing.
Appending is not allowed.
"""
if mode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'")
try:
import gzip
gzip.GzipFile
except (ImportError, AttributeError):
raise CompressionError("gzip module is not available")
try:
fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
except OSError:
if fileobj is not None and mode == 'r':
raise ReadError("not a gzip file")
raise
try:
t = cls.taropen(name, mode, fileobj, **kwargs)
except IOError:
fileobj.close()
if mode == 'r':
raise ReadError("not a gzip file")
raise
except:
fileobj.close()
raise
t._extfileobj = False
return t
@classmethod
def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
"""Open bzip2 compressed tar archive name for reading or writing.
Appending is not allowed.
"""
if mode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'.")
try:
import bz2
except ImportError:
raise CompressionError("bz2 module is not available")
if fileobj is not None:
fileobj = _BZ2Proxy(fileobj, mode)
else:
fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)
try:
t = cls.taropen(name, mode, fileobj, **kwargs)
except (IOError, EOFError):
fileobj.close()
if mode == 'r':
raise ReadError("not a bzip2 file")
raise
except:
fileobj.close()
raise
t._extfileobj = False
return t
# All *open() methods are registered here.
OPEN_METH = {
"tar": "taropen", # uncompressed tar
"gz": "gzopen", # gzip compressed tar
"bz2": "bz2open" # bzip2 compressed tar
}
#--------------------------------------------------------------------------
# The public methods which TarFile provides:
def close(self):
"""Close the TarFile. In write-mode, two finishing zero blocks are
appended to the archive.
"""
if self.closed:
return
self.closed = True
try:
if self.mode in "aw":
self.fileobj.write(NUL * (BLOCKSIZE * 2))
self.offset += (BLOCKSIZE * 2)
# fill up the end with zero-blocks
# (like option -b20 for tar does)
blocks, remainder = divmod(self.offset, RECORDSIZE)
if remainder > 0:
self.fileobj.write(NUL * (RECORDSIZE - remainder))
finally:
if not self._extfileobj:
self.fileobj.close()
def getmember(self, name):
"""Return a TarInfo object for member `name'. If `name' can not be
found in the archive, KeyError is raised. If a member occurs more
than once in the archive, its last occurrence is assumed to be the
most up-to-date version.
"""
tarinfo = self._getmember(name)
if tarinfo is None:
raise KeyError("filename %r not found" % name)
return tarinfo
def getmembers(self):
"""Return the members of the archive as a list of TarInfo objects. The
list has the same order as the members in the archive.
"""
self._check()
if not self._loaded: # if we want to obtain a list of
self._load() # all members, we first have to
# scan the whole archive.
return self.members
def getnames(self):
"""Return the members of the archive as a list of their names. It has
the same order as the list returned by getmembers().
"""
return [tarinfo.name for tarinfo in self.getmembers()]
def gettarinfo(self, name=None, arcname=None, fileobj=None):
"""Create a TarInfo object from the result of os.stat or equivalent
on an existing file. The file is either named by `name', or
specified as a file object `fileobj' with a file descriptor. If
given, `arcname' specifies an alternative name for the file in the
archive, otherwise, the name is taken from the 'name' attribute of
'fileobj', or the 'name' argument.
"""
self._check("aw")
# When fileobj is given, replace name by
# fileobj's real name.
if fileobj is not None:
name = fileobj.name
# Build the name of the member in the archive.
# Backslashes are converted to forward slashes and
# absolute paths are made relative.
if arcname is None:
arcname = name
drv, arcname = os.path.splitdrive(arcname)
arcname = arcname.replace(os.sep, "/")
arcname = arcname.lstrip("/")
# Now, fill the TarInfo object with
# information specific to the file.
tarinfo = self.tarinfo()
tarinfo.tarfile = self # Not needed
# Use os.stat or os.lstat, depending on platform
# and if symlinks shall be resolved.
if fileobj is None:
if hasattr(os, "lstat") and not self.dereference:
statres = os.lstat(name)
else:
statres = os.stat(name)
else:
statres = os.fstat(fileobj.fileno())
linkname = ""
stmd = statres.st_mode
if stat.S_ISREG(stmd):
inode = (statres.st_ino, statres.st_dev)
if not self.dereference and statres.st_nlink > 1 and \
inode in self.inodes and arcname != self.inodes[inode]:
# Is it a hardlink to an already
# archived file?
type = LNKTYPE
linkname = self.inodes[inode]
else:
# The inode is added only if it's valid.
# For win32 it is always 0.
type = REGTYPE
if inode[0]:
self.inodes[inode] = arcname
elif stat.S_ISDIR(stmd):
type = DIRTYPE
elif stat.S_ISFIFO(stmd):
type = FIFOTYPE
elif stat.S_ISLNK(stmd):
type = SYMTYPE
linkname = os.readlink(name)
elif stat.S_ISCHR(stmd):
type = CHRTYPE
elif stat.S_ISBLK(stmd):
type = BLKTYPE
else:
return None
# Fill the TarInfo object with all
# information we can get.
tarinfo.name = arcname
tarinfo.mode = stmd
tarinfo.uid = statres.st_uid
tarinfo.gid = statres.st_gid
if type == REGTYPE:
tarinfo.size = statres.st_size
else:
tarinfo.size = 0L
tarinfo.mtime = statres.st_mtime
tarinfo.type = type
tarinfo.linkname = linkname
if pwd:
try:
tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
except KeyError:
pass
if grp:
try:
tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
except KeyError:
pass
if type in (CHRTYPE, BLKTYPE):
if hasattr(os, "major") and hasattr(os, "minor"):
tarinfo.devmajor = os.major(statres.st_rdev)
tarinfo.devminor = os.minor(statres.st_rdev)
return tarinfo
def list(self, verbose=True):
"""Print a table of contents to sys.stdout. If `verbose' is False, only
the names of the members are printed. If it is True, an `ls -l'-like
output is produced.
"""
self._check()
for tarinfo in self:
if verbose:
print filemode(tarinfo.mode),
print "%s/%s" % (tarinfo.uname or tarinfo.uid,
tarinfo.gname or tarinfo.gid),
if tarinfo.ischr() or tarinfo.isblk():
print "%10s" % ("%d,%d" \
% (tarinfo.devmajor, tarinfo.devminor)),
else:
print "%10d" % tarinfo.size,
print "%d-%02d-%02d %02d:%02d:%02d" \
% time.localtime(tarinfo.mtime)[:6],
print tarinfo.name + ("/" if tarinfo.isdir() else ""),
if verbose:
if tarinfo.issym():
print "->", tarinfo.linkname,
if tarinfo.islnk():
print "link to", tarinfo.linkname,
print
def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
"""Add the file `name' to the archive. `name' may be any type of file
(directory, fifo, symbolic link, etc.). If given, `arcname'
specifies an alternative name for the file in the archive.
Directories are added recursively by default. This can be avoided by
setting `recursive' to False. `exclude' is a function that should
return True for each filename to be excluded. `filter' is a function
that expects a TarInfo object argument and returns the changed
TarInfo object, if it returns None the TarInfo object will be
excluded from the archive.
"""
self._check("aw")
if arcname is None:
arcname = name
# Exclude pathnames.
if exclude is not None:
import warnings
warnings.warn("use the filter argument instead",
DeprecationWarning, 2)
if exclude(name):
self._dbg(2, "tarfile: Excluded %r" % name)
return
# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
self._dbg(2, "tarfile: Skipped %r" % name)
return
self._dbg(1, name)
# Create a TarInfo object from the file.
tarinfo = self.gettarinfo(name, arcname)
if tarinfo is None:
self._dbg(1, "tarfile: Unsupported type %r" % name)
return
# Change or exclude the TarInfo object.
if filter is not None:
tarinfo = filter(tarinfo)
if tarinfo is None:
self._dbg(2, "tarfile: Excluded %r" % name)
return
# Append the tar header and data to the archive.
if tarinfo.isreg():
with bltn_open(name, "rb") as f:
self.addfile(tarinfo, f)
elif tarinfo.isdir():
self.addfile(tarinfo)
if recursive:
for f in os.listdir(name):
self.add(os.path.join(name, f), os.path.join(arcname, f),
recursive, exclude, filter)
else:
self.addfile(tarinfo)
def addfile(self, tarinfo, fileobj=None):
"""Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
given, tarinfo.size bytes are read from it and added to the archive.
You can create TarInfo objects directly, or by using gettarinfo().
On Windows platforms, `fileobj' should always be opened with mode
'rb' to avoid size mismatches caused by newline translation.
"""
self._check("aw")
tarinfo = copy.copy(tarinfo)
buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
self.fileobj.write(buf)
self.offset += len(buf)
# If there's data to follow, append it.
if fileobj is not None:
copyfileobj(fileobj, self.fileobj, tarinfo.size)
blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
if remainder > 0:
self.fileobj.write(NUL * (BLOCKSIZE - remainder))
blocks += 1
self.offset += blocks * BLOCKSIZE
self.members.append(tarinfo)
def extractall(self, path=".", members=None):
"""Extract all members from the archive to the current working
directory and set owner, modification time and permissions on
directories afterwards. `path' specifies a different directory
to extract to. `members' is optional and must be a subset of the
list returned by getmembers().
"""
directories = []
if members is None:
members = self
for tarinfo in members:
if tarinfo.isdir():
# Extract directories with a safe mode.
directories.append(tarinfo)
tarinfo = copy.copy(tarinfo)
tarinfo.mode = 0700
self.extract(tarinfo, path)
# Reverse sort directories.
directories.sort(key=operator.attrgetter('name'))
directories.reverse()
# Set correct owner, mtime and filemode on directories.
for tarinfo in directories:
dirpath = os.path.join(path, tarinfo.name)
try:
self.chown(tarinfo, dirpath)
self.utime(tarinfo, dirpath)
self.chmod(tarinfo, dirpath)
except ExtractError, e:
if self.errorlevel > 1:
raise
else:
self._dbg(1, "tarfile: %s" % e)
def extract(self, member, path=""):
"""Extract a member from the archive to the current working directory,
using its full name. Its file information is extracted as accurately
as possible. `member' may be a filename or a TarInfo object. You can
specify a different directory using `path'.
"""
self._check("r")
if isinstance(member, basestring):
tarinfo = self.getmember(member)
else:
tarinfo = member
# Prepare the link target for makelink().
if tarinfo.islnk():
tarinfo._link_target = os.path.join(path, tarinfo.linkname)
try:
self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
except EnvironmentError, e:
if self.errorlevel > 0:
raise
else:
if e.filename is None:
self._dbg(1, "tarfile: %s" % e.strerror)
else:
self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
except ExtractError, e:
if self.errorlevel > 1:
raise
else:
self._dbg(1, "tarfile: %s" % e)
def extractfile(self, member):
"""Extract a member from the archive as a file object. `member' may be
a filename or a TarInfo object. If `member' is a regular file, a
file-like object is returned. If `member' is a link, a file-like
object is constructed from the link's target. If `member' is none of
the above, None is returned.
The file-like object is read-only and provides the following
methods: read(), readline(), readlines(), seek() and tell()
"""
self._check("r")
if isinstance(member, basestring):
tarinfo = self.getmember(member)
else:
tarinfo = member
if tarinfo.isreg():
return self.fileobject(self, tarinfo)
elif tarinfo.type not in SUPPORTED_TYPES:
# If a member's type is unknown, it is treated as a
# regular file.
return self.fileobject(self, tarinfo)
elif tarinfo.islnk() or tarinfo.issym():
if isinstance(self.fileobj, _Stream):
# A small but ugly workaround for the case that someone tries
# to extract a (sym)link as a file-object from a non-seekable
# stream of tar blocks.
raise StreamError("cannot extract (sym)link as file object")
else:
# A (sym)link's file object is its target's file object.
return self.extractfile(self._find_link_target(tarinfo))
else:
# If there's no data associated with the member (directory, chrdev,
# blkdev, etc.), return None instead of a file object.
return None
def _extract_member(self, tarinfo, targetpath):
"""Extract the TarInfo object tarinfo to a physical
file called targetpath.
"""
# Fetch the TarInfo object for the given name
# and build the destination pathname, replacing
# forward slashes with platform-specific separators.
targetpath = targetpath.rstrip("/")
targetpath = targetpath.replace("/", os.sep)
# Create all upper directories.
upperdirs = os.path.dirname(targetpath)
if upperdirs and not os.path.exists(upperdirs):
# Create directories that are not part of the archive with
# default permissions.
os.makedirs(upperdirs)
if tarinfo.islnk() or tarinfo.issym():
self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
else:
self._dbg(1, tarinfo.name)
if tarinfo.isreg():
self.makefile(tarinfo, targetpath)
elif tarinfo.isdir():
self.makedir(tarinfo, targetpath)
elif tarinfo.isfifo():
self.makefifo(tarinfo, targetpath)
elif tarinfo.ischr() or tarinfo.isblk():
self.makedev(tarinfo, targetpath)
elif tarinfo.islnk() or tarinfo.issym():
self.makelink(tarinfo, targetpath)
elif tarinfo.type not in SUPPORTED_TYPES:
self.makeunknown(tarinfo, targetpath)
else:
self.makefile(tarinfo, targetpath)
self.chown(tarinfo, targetpath)
if not tarinfo.issym():
self.chmod(tarinfo, targetpath)
self.utime(tarinfo, targetpath)
#--------------------------------------------------------------------------
# Below are the different file methods. They are called via
# _extract_member() when extract() is called. They can be replaced in a
# subclass to implement other functionality.
def makedir(self, tarinfo, targetpath):
"""Make a directory called targetpath.
"""
try:
# Use a safe mode for the directory, the real mode is set
# later in _extract_member().
os.mkdir(targetpath, 0700)
except EnvironmentError, e:
if e.errno != errno.EEXIST:
raise
def makefile(self, tarinfo, targetpath):
"""Make a file called targetpath.
"""
source = self.extractfile(tarinfo)
try:
with bltn_open(targetpath, "wb") as target:
copyfileobj(source, target)
finally:
source.close()
def makeunknown(self, tarinfo, targetpath):
"""Make a file from a TarInfo object with an unknown type
at targetpath.
"""
self.makefile(tarinfo, targetpath)
self._dbg(1, "tarfile: Unknown file type %r, " \
"extracted as regular file." % tarinfo.type)
def makefifo(self, tarinfo, targetpath):
"""Make a fifo called targetpath.
"""
if hasattr(os, "mkfifo"):
os.mkfifo(targetpath)
else:
raise ExtractError("fifo not supported by system")
def makedev(self, tarinfo, targetpath):
"""Make a character or block device called targetpath.
"""
if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
raise ExtractError("special devices not supported by system")
mode = tarinfo.mode
if tarinfo.isblk():
mode |= stat.S_IFBLK
else:
mode |= stat.S_IFCHR
os.mknod(targetpath, mode,
os.makedev(tarinfo.devmajor, tarinfo.devminor))
def makelink(self, tarinfo, targetpath):
"""Make a (symbolic) link called targetpath. If it cannot be created
(platform limitation), we try to make a copy of the referenced file
instead of a link.
"""
if hasattr(os, "symlink") and hasattr(os, "link"):
# For systems that support symbolic and hard links.
if tarinfo.issym():
if os.path.lexists(targetpath):
os.unlink(targetpath)
os.symlink(tarinfo.linkname, targetpath)
else:
# See extract().
if os.path.exists(tarinfo._link_target):
if os.path.lexists(targetpath):
os.unlink(targetpath)
os.link(tarinfo._link_target, targetpath)
else:
self._extract_member(self._find_link_target(tarinfo), targetpath)
else:
try:
self._extract_member(self._find_link_target(tarinfo), targetpath)
except KeyError:
raise ExtractError("unable to resolve link inside archive")
def chown(self, tarinfo, targetpath):
"""Set owner of targetpath according to tarinfo.
"""
if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
# We have to be root to do so.
try:
g = grp.getgrnam(tarinfo.gname)[2]
except KeyError:
g = tarinfo.gid
try:
u = pwd.getpwnam(tarinfo.uname)[2]
except KeyError:
u = tarinfo.uid
try:
if tarinfo.issym() and hasattr(os, "lchown"):
os.lchown(targetpath, u, g)
else:
if sys.platform != "os2emx":
os.chown(targetpath, u, g)
except EnvironmentError, e:
raise ExtractError("could not change owner")
def chmod(self, tarinfo, targetpath):
"""Set file permissions of targetpath according to tarinfo.
"""
if hasattr(os, 'chmod'):
try:
os.chmod(targetpath, tarinfo.mode)
except EnvironmentError, e:
raise ExtractError("could not change mode")
def utime(self, tarinfo, targetpath):
"""Set modification time of targetpath according to tarinfo.
"""
if not hasattr(os, 'utime'):
return
try:
os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
except EnvironmentError, e:
raise ExtractError("could not change modification time")
#--------------------------------------------------------------------------
def next(self):
"""Return the next member of the archive as a TarInfo object, when
TarFile is opened for reading. Return None if there is no more
available.
"""
self._check("ra")
if self.firstmember is not None:
m = self.firstmember
self.firstmember = None
return m
# Advance the file pointer.
if self.offset != self.fileobj.tell():
self.fileobj.seek(self.offset - 1)
if not self.fileobj.read(1):
raise ReadError("unexpected end of data")
# Read the next block.
tarinfo = None
while True:
try:
tarinfo = self.tarinfo.fromtarfile(self)
except EOFHeaderError, e:
if self.ignore_zeros:
self._dbg(2, "0x%X: %s" % (self.offset, e))
self.offset += BLOCKSIZE
continue
except InvalidHeaderError, e:
if self.ignore_zeros:
self._dbg(2, "0x%X: %s" % (self.offset, e))
self.offset += BLOCKSIZE
continue
elif self.offset == 0:
raise ReadError(str(e))
except EmptyHeaderError:
if self.offset == 0:
raise ReadError("empty file")
except TruncatedHeaderError, e:
if self.offset == 0:
raise ReadError(str(e))
except SubsequentHeaderError, e:
raise ReadError(str(e))
break
if tarinfo is not None:
self.members.append(tarinfo)
else:
self._loaded = True
return tarinfo
#--------------------------------------------------------------------------
# Little helper methods:
def _getmember(self, name, tarinfo=None, normalize=False):
"""Find an archive member by name from bottom to top.
If tarinfo is given, it is used as the starting point.
"""
# Ensure that all members have been loaded.
members = self.getmembers()
# Limit the member search list up to tarinfo.
if tarinfo is not None:
members = members[:members.index(tarinfo)]
if normalize:
name = os.path.normpath(name)
for member in reversed(members):
if normalize:
member_name = os.path.normpath(member.name)
else:
member_name = member.name
if name == member_name:
return member
def _load(self):
"""Read through the entire archive file and look for readable
members.
"""
while True:
tarinfo = self.next()
if tarinfo is None:
break
self._loaded = True
def _check(self, mode=None):
"""Check if TarFile is still open, and if the operation's mode
corresponds to TarFile's mode.
"""
if self.closed:
raise IOError("%s is closed" % self.__class__.__name__)
if mode is not None and self.mode not in mode:
raise IOError("bad operation for mode %r" % self.mode)
def _find_link_target(self, tarinfo):
"""Find the target member of a symlink or hardlink member in the
archive.
"""
if tarinfo.issym():
# Always search the entire archive.
linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
limit = None
else:
# Search the archive before the link, because a hard link is
# just a reference to an already archived file.
linkname = tarinfo.linkname
limit = tarinfo
member = self._getmember(linkname, tarinfo=limit, normalize=True)
if member is None:
raise KeyError("linkname %r not found" % linkname)
return member
def __iter__(self):
"""Provide an iterator object.
"""
if self._loaded:
return iter(self.members)
else:
return TarIter(self)
def _dbg(self, level, msg):
"""Write debugging output to sys.stderr.
"""
if level <= self.debug:
print >> sys.stderr, msg
def __enter__(self):
self._check()
return self
def __exit__(self, type, value, traceback):
if type is None:
self.close()
else:
# An exception occurred. We must not call close() because
# it would try to write end-of-archive blocks and padding.
if not self._extfileobj:
self.fileobj.close()
self.closed = True
# class TarFile
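# Example: registering a custom sub-constructor (a minimal sketch).
# The comment block above TarFile.open() describes how a subclass can
# add a compression scheme by extending OPEN_METH. The "xz" name and
# the class below are hypothetical; the opener simply delegates to
# taropen() instead of wrapping fileobj in a real codec.
class _ExampleXzTarFile(TarFile):
    @classmethod
    def xzopen(cls, name, mode="r", fileobj=None, **kwargs):
        # A real implementation would wrap fileobj in an LZMA stream.
        return cls.taropen(name, mode, fileobj, **kwargs)
    OPEN_METH = dict(TarFile.OPEN_METH, xz="xzopen")
# _ExampleXzTarFile.open("archive.tar.xz", "r:xz") now dispatches to
# xzopen() through the OPEN_METH lookup in TarFile.open().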
class TarIter:
"""Iterator Class.
for tarinfo in TarFile(...):
suite...
"""
def __init__(self, tarfile):
"""Construct a TarIter object.
"""
self.tarfile = tarfile
self.index = 0
def __iter__(self):
"""Return iterator object.
"""
return self
def next(self):
"""Return the next item using TarFile's next() method.
When all members have been read, set TarFile as _loaded.
"""
# Fix for SF #1100429: Under rare circumstances it can
# happen that getmembers() is called during iteration,
# which will cause TarIter to stop prematurely.
if self.index == 0 and self.tarfile.firstmember is not None:
tarinfo = self.tarfile.next()
elif self.index < len(self.tarfile.members):
tarinfo = self.tarfile.members[self.index]
elif not self.tarfile._loaded:
tarinfo = self.tarfile.next()
if not tarinfo:
self.tarfile._loaded = True
raise StopIteration
else:
raise StopIteration
self.index += 1
return tarinfo
# Helper classes for sparse file support
class _section:
"""Base class for _data and _hole.
"""
def __init__(self, offset, size):
self.offset = offset
self.size = size
def __contains__(self, offset):
return self.offset <= offset < self.offset + self.size
class _data(_section):
"""Represent a data section in a sparse file.
"""
def __init__(self, offset, size, realpos):
_section.__init__(self, offset, size)
self.realpos = realpos
class _hole(_section):
"""Represent a hole section in a sparse file.
"""
pass
class _ringbuffer(list):
"""Ringbuffer class which increases performance
over a regular list.
"""
def __init__(self):
self.idx = 0
def find(self, offset):
idx = self.idx
while True:
item = self[idx]
if offset in item:
break
idx += 1
if idx == len(self):
idx = 0
if idx == self.idx:
# End of File
return None
self.idx = idx
return item
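# Example: how the sparse-file helpers fit together (a minimal sketch;
# the offsets below are made up). A sparse member is an ordered list of
# _data and _hole sections, and _ringbuffer.find() maps a logical offset
# to the section containing it, remembering the last hit so that mostly
# sequential lookups do not rescan the list from the start.
def _sparse_example():
    sparse = _ringbuffer()
    sparse.append(_data(0, 512, 0))       # 512 real bytes at realpos 0
    sparse.append(_hole(512, 1024))       # zero-filled gap
    sparse.append(_data(1536, 512, 512))  # more real bytes
    return sparse.find(700)               # -> the _hole (512 <= 700 < 1536)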
#---------------------------------------------
# zipfile compatible TarFile class
#---------------------------------------------
TAR_PLAIN = 0 # zipfile.ZIP_STORED
TAR_GZIPPED = 8 # zipfile.ZIP_DEFLATED
class TarFileCompat:
"""TarFile class compatible with standard module zipfile's
ZipFile class.
"""
def __init__(self, file, mode="r", compression=TAR_PLAIN):
from warnings import warnpy3k
warnpy3k("the TarFileCompat class has been removed in Python 3.0",
stacklevel=2)
if compression == TAR_PLAIN:
self.tarfile = TarFile.taropen(file, mode)
elif compression == TAR_GZIPPED:
self.tarfile = TarFile.gzopen(file, mode)
else:
raise ValueError("unknown compression constant")
if mode[0:1] == "r":
members = self.tarfile.getmembers()
for m in members:
m.filename = m.name
m.file_size = m.size
m.date_time = time.gmtime(m.mtime)[:6]
def namelist(self):
return map(lambda m: m.name, self.infolist())
def infolist(self):
return filter(lambda m: m.type in REGULAR_TYPES,
self.tarfile.getmembers())
def printdir(self):
self.tarfile.list()
def testzip(self):
return
def getinfo(self, name):
return self.tarfile.getmember(name)
def read(self, name):
return self.tarfile.extractfile(self.tarfile.getmember(name)).read()
def write(self, filename, arcname=None, compress_type=None):
self.tarfile.add(filename, arcname)
def writestr(self, zinfo, bytes):
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
import calendar
tinfo = TarInfo(zinfo.filename)
tinfo.size = len(bytes)
tinfo.mtime = calendar.timegm(zinfo.date_time)
self.tarfile.addfile(tinfo, StringIO(bytes))
def close(self):
self.tarfile.close()
#class TarFileCompat
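# Example: zipfile-style access through the deprecated TarFileCompat
# class (a minimal sketch; "example.tar" is a hypothetical archive).
# Instantiating it triggers the py3k deprecation warning seen above.
def _compat_example():
    archive = TarFileCompat("example.tar", "r", compression=TAR_PLAIN)
    try:
        for name in archive.namelist():
            print name, archive.getinfo(name).size
    finally:
        archive.close()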
#--------------------
# exported functions
#--------------------
def is_tarfile(name):
"""Return True if name points to a tar archive that we
are able to handle, else return False.
"""
try:
t = open(name)
t.close()
return True
except TarError:
return False
open = TarFile.open
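# Example: typical use of the module-level open() alias (a minimal
# sketch; the archive and directory names are hypothetical). "r:*"
# selects the right decompressor automatically, as documented in
# TarFile.open() above.
def _tarfile_example():
    tar = open("backup.tar.gz", "r:*")
    try:
        tar.list(verbose=False)           # print member names only
        tar.extractall(path="restore")    # recreate the tree under ./restore
    finally:
        tar.close()
    # Writing a new gzip-compressed archive:
    tar = open("new.tar.gz", "w:gz")
    try:
        tar.add("somedir", arcname="somedir")
    finally:
        tar.close()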
r"""TELNET client class.
Based on RFC 854: TELNET Protocol Specification, by J. Postel and
J. Reynolds
Example:
>>> from telnetlib import Telnet
>>> tn = Telnet('www.python.org', 79) # connect to finger port
>>> tn.write('guido\r\n')
>>> print tn.read_all()
Login Name TTY Idle When Where
guido Guido van Rossum pts/2 <Dec 2 11:10> snag.cnri.reston..
>>>
Note that read_some() won't read until eof -- it just reads some data
-- but it guarantees to read at least one byte unless EOF is hit.
It is possible to pass a Telnet object to select.select() in order to
wait until more data is available. Note that in this case,
read_eager() may return '' even if there was data on the socket,
because the protocol negotiation may have eaten the data. This is why
EOFError is needed in some cases to distinguish between "no data" and
"connection closed" (since the socket also appears ready for reading
when it is closed).
To do:
- option negotiation
- timeout should be intrinsic to the connection object instead of an
option on one of the read calls only
"""
# Imported modules
import errno
import sys
import socket
import select
__all__ = ["Telnet"]
# Tunable parameters
DEBUGLEVEL = 0
# Telnet protocol defaults
TELNET_PORT = 23
# Telnet protocol characters (don't change)
IAC = chr(255) # "Interpret As Command"
DONT = chr(254)
DO = chr(253)
WONT = chr(252)
WILL = chr(251)
theNULL = chr(0)
SE = chr(240) # Subnegotiation End
NOP = chr(241) # No Operation
DM = chr(242) # Data Mark
BRK = chr(243) # Break
IP = chr(244) # Interrupt process
AO = chr(245) # Abort output
AYT = chr(246) # Are You There
EC = chr(247) # Erase Character
EL = chr(248) # Erase Line
GA = chr(249) # Go Ahead
SB = chr(250) # Subnegotiation Begin
# Telnet protocol options code (don't change)
# These ones all come from arpa/telnet.h
BINARY = chr(0) # 8-bit data path
ECHO = chr(1) # echo
RCP = chr(2) # prepare to reconnect
SGA = chr(3) # suppress go ahead
NAMS = chr(4) # approximate message size
STATUS = chr(5) # give status
TM = chr(6) # timing mark
RCTE = chr(7) # remote controlled transmission and echo
NAOL = chr(8) # negotiate about output line width
NAOP = chr(9) # negotiate about output page size
NAOCRD = chr(10) # negotiate about CR disposition
NAOHTS = chr(11) # negotiate about horizontal tabstops
NAOHTD = chr(12) # negotiate about horizontal tab disposition
NAOFFD = chr(13) # negotiate about formfeed disposition
NAOVTS = chr(14) # negotiate about vertical tab stops
NAOVTD = chr(15) # negotiate about vertical tab disposition
NAOLFD = chr(16) # negotiate about output LF disposition
XASCII = chr(17) # extended ascii character set
LOGOUT = chr(18) # force logout
BM = chr(19) # byte macro
DET = chr(20) # data entry terminal
SUPDUP = chr(21) # supdup protocol
SUPDUPOUTPUT = chr(22) # supdup output
SNDLOC = chr(23) # send location
TTYPE = chr(24) # terminal type
EOR = chr(25) # end of record
TUID = chr(26) # TACACS user identification
OUTMRK = chr(27) # output marking
TTYLOC = chr(28) # terminal location number
VT3270REGIME = chr(29) # 3270 regime
X3PAD = chr(30) # X.3 PAD
NAWS = chr(31) # window size
TSPEED = chr(32) # terminal speed
LFLOW = chr(33) # remote flow control
LINEMODE = chr(34) # Linemode option
XDISPLOC = chr(35) # X Display Location
OLD_ENVIRON = chr(36) # Old - Environment variables
AUTHENTICATION = chr(37) # Authenticate
ENCRYPT = chr(38) # Encryption option
NEW_ENVIRON = chr(39) # New - Environment variables
# the following ones come from
# http://www.iana.org/assignments/telnet-options
# Unfortunately, that document does not assign identifiers
# to all of them, so we are making them up
TN3270E = chr(40) # TN3270E
XAUTH = chr(41) # XAUTH
CHARSET = chr(42) # CHARSET
RSP = chr(43) # Telnet Remote Serial Port
COM_PORT_OPTION = chr(44) # Com Port Control Option
SUPPRESS_LOCAL_ECHO = chr(45) # Telnet Suppress Local Echo
TLS = chr(46) # Telnet Start TLS
KERMIT = chr(47) # KERMIT
SEND_URL = chr(48) # SEND-URL
FORWARD_X = chr(49) # FORWARD_X
PRAGMA_LOGON = chr(138) # TELOPT PRAGMA LOGON
SSPI_LOGON = chr(139) # TELOPT SSPI LOGON
PRAGMA_HEARTBEAT = chr(140) # TELOPT PRAGMA HEARTBEAT
EXOPL = chr(255) # Extended-Options-List
NOOPT = chr(0)
class Telnet:
"""Telnet interface class.
An instance of this class represents a connection to a telnet
server. The instance is initially not connected; the open()
method must be used to establish a connection. Alternatively, the
host name and optional port number can be passed to the
constructor, too.
Don't try to reopen an already connected instance.
This class has many read_*() methods. Note that some of them
raise EOFError when the end of the connection is read, because
they can return an empty string for other reasons. See the
individual doc strings.
read_until(expected, [timeout])
Read until the expected string has been seen, or a timeout is
hit (default is no timeout); may block.
read_all()
Read all data until EOF; may block.
read_some()
Read at least one byte or EOF; may block.
read_very_eager()
Read all data available already queued or on the socket,
without blocking.
read_eager()
Read either data already queued or some data available on the
socket, without blocking.
read_lazy()
Read all data in the raw queue (processing it first), without
doing any socket I/O.
read_very_lazy()
Reads all data in the cooked queue, without doing any socket
I/O.
read_sb_data()
Reads available data between SB ... SE sequence. Don't block.
set_option_negotiation_callback(callback)
Each time a telnet option is read on the input flow, this callback
(if set) is called with the following parameters :
callback(telnet socket, command, option)
option will be chr(0) when there is no option.
No other action is done afterwards by telnetlib.
"""
def __init__(self, host=None, port=0,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
"""Constructor.
When called without arguments, create an unconnected instance.
With a hostname argument, it connects the instance; port number
and timeout are optional.
"""
self.debuglevel = DEBUGLEVEL
self.host = host
self.port = port
self.timeout = timeout
self.sock = None
self.rawq = ''
self.irawq = 0
self.cookedq = ''
self.eof = 0
self.iacseq = '' # Buffer for IAC sequence.
self.sb = 0 # flag for SB and SE sequence.
self.sbdataq = ''
self.option_callback = None
self._has_poll = hasattr(select, 'poll')
if host is not None:
self.open(host, port, timeout)
def open(self, host, port=0, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
"""Connect to a host.
The optional second argument is the port number, which
defaults to the standard telnet port (23).
Don't try to reopen an already connected instance.
"""
self.eof = 0
if not port:
port = TELNET_PORT
self.host = host
self.port = port
self.timeout = timeout
self.sock = socket.create_connection((host, port), timeout)
def __del__(self):
"""Destructor -- close the connection."""
self.close()
def msg(self, msg, *args):
"""Print a debug message, when the debug level is > 0.
If extra arguments are present, they are substituted in the
message using the standard string formatting operator.
"""
if self.debuglevel > 0:
print 'Telnet(%s,%s):' % (self.host, self.port),
if args:
print msg % args
else:
print msg
def set_debuglevel(self, debuglevel):
"""Set the debug level.
The higher it is, the more debug output you get (on sys.stdout).
"""
self.debuglevel = debuglevel
def close(self):
"""Close the connection."""
sock = self.sock
self.sock = 0
self.eof = 1
self.iacseq = ''
self.sb = 0
if sock:
sock.close()
def get_socket(self):
"""Return the socket object used internally."""
return self.sock
def fileno(self):
"""Return the fileno() of the socket object used internally."""
return self.sock.fileno()
def write(self, buffer):
"""Write a string to the socket, doubling any IAC characters.
Can block if the connection is blocked. May raise
socket.error if the connection is closed.
"""
if IAC in buffer:
buffer = buffer.replace(IAC, IAC+IAC)
self.msg("send %r", buffer)
self.sock.sendall(buffer)
def read_until(self, match, timeout=None):
"""Read until a given string is encountered or until timeout.
When no match is found, return whatever is available instead,
possibly the empty string. Raise EOFError if the connection
is closed and no cooked data is available.
"""
if self._has_poll:
return self._read_until_with_poll(match, timeout)
else:
return self._read_until_with_select(match, timeout)
def _read_until_with_poll(self, match, timeout):
"""Read until a given string is encountered or until timeout.
This method uses select.poll() to implement the timeout.
"""
n = len(match)
call_timeout = timeout
if timeout is not None:
from time import time
time_start = time()
self.process_rawq()
i = self.cookedq.find(match)
if i < 0:
poller = select.poll()
poll_in_or_priority_flags = select.POLLIN | select.POLLPRI
poller.register(self, poll_in_or_priority_flags)
while i < 0 and not self.eof:
try:
# Poll takes its timeout in milliseconds.
ready = poller.poll(None if timeout is None
else 1000 * call_timeout)
except select.error as e:
if e[0] == errno.EINTR:
if timeout is not None:
elapsed = time() - time_start
call_timeout = timeout-elapsed
continue
raise
for fd, mode in ready:
if mode & poll_in_or_priority_flags:
i = max(0, len(self.cookedq)-n)
self.fill_rawq()
self.process_rawq()
i = self.cookedq.find(match, i)
if timeout is not None:
elapsed = time() - time_start
if elapsed >= timeout:
break
call_timeout = timeout-elapsed
poller.unregister(self)
if i >= 0:
i = i + n
buf = self.cookedq[:i]
self.cookedq = self.cookedq[i:]
return buf
return self.read_very_lazy()
def _read_until_with_select(self, match, timeout=None):
"""Read until a given string is encountered or until timeout.
The timeout is implemented using select.select().
"""
n = len(match)
self.process_rawq()
i = self.cookedq.find(match)
if i >= 0:
i = i+n
buf = self.cookedq[:i]
self.cookedq = self.cookedq[i:]
return buf
s_reply = ([self], [], [])
s_args = s_reply
if timeout is not None:
s_args = s_args + (timeout,)
from time import time
time_start = time()
while not self.eof and select.select(*s_args) == s_reply:
i = max(0, len(self.cookedq)-n)
self.fill_rawq()
self.process_rawq()
i = self.cookedq.find(match, i)
if i >= 0:
i = i+n
buf = self.cookedq[:i]
self.cookedq = self.cookedq[i:]
return buf
if timeout is not None:
elapsed = time() - time_start
if elapsed >= timeout:
break
s_args = s_reply + (timeout-elapsed,)
return self.read_very_lazy()
def read_all(self):
"""Read all data until EOF; block until connection closed."""
self.process_rawq()
while not self.eof:
self.fill_rawq()
self.process_rawq()
buf = self.cookedq
self.cookedq = ''
return buf
def read_some(self):
"""Read at least one byte of cooked data unless EOF is hit.
Return '' if EOF is hit. Block if no data is immediately
available.
"""
self.process_rawq()
while not self.cookedq and not self.eof:
self.fill_rawq()
self.process_rawq()
buf = self.cookedq
self.cookedq = ''
return buf
def read_very_eager(self):
"""Read everything that's possible without blocking in I/O (eager).
Raise EOFError if connection closed and no cooked data
available. Return '' if no cooked data available otherwise.
Don't block unless in the midst of an IAC sequence.
"""
self.process_rawq()
while not self.eof and self.sock_avail():
self.fill_rawq()
self.process_rawq()
return self.read_very_lazy()
def read_eager(self):
"""Read readily available data.
Raise EOFError if connection closed and no cooked data
available. Return '' if no cooked data available otherwise.
Don't block unless in the midst of an IAC sequence.
"""
self.process_rawq()
while not self.cookedq and not self.eof and self.sock_avail():
self.fill_rawq()
self.process_rawq()
return self.read_very_lazy()
def read_lazy(self):
"""Process and return data that's already in the queues (lazy).
Raise EOFError if connection closed and no data available.
Return '' if no cooked data available otherwise. Don't block
unless in the midst of an IAC sequence.
"""
self.process_rawq()
return self.read_very_lazy()
def read_very_lazy(self):
"""Return any data available in the cooked queue (very lazy).
Raise EOFError if connection closed and no data available.
Return '' if no cooked data available otherwise. Don't block.
"""
buf = self.cookedq
self.cookedq = ''
if not buf and self.eof and not self.rawq:
raise EOFError, 'telnet connection closed'
return buf
def read_sb_data(self):
"""Return any data available in the SB ... SE queue.
Return '' if no SB ... SE available. Should only be called
after seeing a SB or SE command. When a new SB command is
found, old unread SB data will be discarded. Don't block.
"""
buf = self.sbdataq
self.sbdataq = ''
return buf
def set_option_negotiation_callback(self, callback):
"""Provide a callback function called after each receipt of a telnet option."""
self.option_callback = callback
def process_rawq(self):
"""Transfer from raw queue to cooked queue.
Set self.eof when connection is closed. Don't block unless in
the midst of an IAC sequence.
"""
buf = ['', '']
try:
while self.rawq:
c = self.rawq_getchar()
if not self.iacseq:
if c == theNULL:
continue
if c == "\021":
continue
if c != IAC:
buf[self.sb] = buf[self.sb] + c
continue
else:
self.iacseq += c
elif len(self.iacseq) == 1:
# 'IAC: IAC CMD [OPTION only for WILL/WONT/DO/DONT]'
if c in (DO, DONT, WILL, WONT):
self.iacseq += c
continue
self.iacseq = ''
if c == IAC:
buf[self.sb] = buf[self.sb] + c
else:
if c == SB: # SB ... SE start.
self.sb = 1
self.sbdataq = ''
elif c == SE:
self.sb = 0
self.sbdataq = self.sbdataq + buf[1]
buf[1] = ''
if self.option_callback:
# Callback is supposed to look into
# the sbdataq
self.option_callback(self.sock, c, NOOPT)
else:
# We can't offer automatic processing of
# suboptions. Alas, we should not get any
# unless we did a WILL/DO before.
self.msg('IAC %d not recognized' % ord(c))
elif len(self.iacseq) == 2:
cmd = self.iacseq[1]
self.iacseq = ''
opt = c
if cmd in (DO, DONT):
self.msg('IAC %s %d',
cmd == DO and 'DO' or 'DONT', ord(opt))
if self.option_callback:
self.option_callback(self.sock, cmd, opt)
else:
self.sock.sendall(IAC + WONT + opt)
elif cmd in (WILL, WONT):
self.msg('IAC %s %d',
cmd == WILL and 'WILL' or 'WONT', ord(opt))
if self.option_callback:
self.option_callback(self.sock, cmd, opt)
else:
self.sock.sendall(IAC + DONT + opt)
except EOFError: # raised by self.rawq_getchar()
self.iacseq = '' # Reset on EOF
self.sb = 0
pass
self.cookedq = self.cookedq + buf[0]
self.sbdataq = self.sbdataq + buf[1]
def rawq_getchar(self):
"""Get next char from raw queue.
Block if no data is immediately available. Raise EOFError
when connection is closed.
"""
if not self.rawq:
self.fill_rawq()
if self.eof:
raise EOFError
c = self.rawq[self.irawq]
self.irawq = self.irawq + 1
if self.irawq >= len(self.rawq):
self.rawq = ''
self.irawq = 0
return c
def fill_rawq(self):
"""Fill raw queue from exactly one recv() system call.
Block if no data is immediately available. Set self.eof when
connection is closed.
"""
if self.irawq >= len(self.rawq):
self.rawq = ''
self.irawq = 0
# The buffer size should be fairly small so as to avoid quadratic
# behavior in process_rawq() above
buf = self.sock.recv(50)
self.msg("recv %r", buf)
self.eof = (not buf)
self.rawq = self.rawq + buf
def sock_avail(self):
"""Test whether data is available on the socket."""
return select.select([self], [], [], 0) == ([self], [], [])
def interact(self):
"""Interaction function, emulates a very dumb telnet client."""
if sys.platform == "win32":
self.mt_interact()
return
while 1:
rfd, wfd, xfd = select.select([self, sys.stdin], [], [])
if self in rfd:
try:
text = self.read_eager()
except EOFError:
print '*** Connection closed by remote host ***'
break
if text:
sys.stdout.write(text)
sys.stdout.flush()
if sys.stdin in rfd:
line = sys.stdin.readline()
if not line:
break
self.write(line)
def mt_interact(self):
"""Multithreaded version of interact()."""
import thread
thread.start_new_thread(self.listener, ())
while 1:
line = sys.stdin.readline()
if not line:
break
self.write(line)
def listener(self):
"""Helper for mt_interact() -- this executes in the other thread."""
while 1:
try:
data = self.read_eager()
except EOFError:
print '*** Connection closed by remote host ***'
return
if data:
sys.stdout.write(data)
else:
sys.stdout.flush()
def expect(self, list, timeout=None):
"""Read until one from a list of a regular expressions matches.
The first argument is a list of regular expressions, either
compiled (re.RegexObject instances) or uncompiled (strings).
The optional second argument is a timeout, in seconds; default
is no timeout.
Return a tuple of three items: the index in the list of the
first regular expression that matches; the match object
returned; and the text read up till and including the match.
If EOF is read and no text was read, raise EOFError.
Otherwise, when nothing matches, return (-1, None, text) where
text is the text received so far (may be the empty string if a
timeout happened).
If a regular expression ends with a greedy match (e.g. '.*')
or if more than one expression can match the same input, the
results are nondeterministic, and may depend on the I/O timing.
"""
if self._has_poll:
return self._expect_with_poll(list, timeout)
else:
return self._expect_with_select(list, timeout)
def _expect_with_poll(self, expect_list, timeout=None):
"""Read until one from a list of a regular expressions matches.
This method uses select.poll() to implement the timeout.
"""
re = None
expect_list = expect_list[:]
indices = range(len(expect_list))
for i in indices:
if not hasattr(expect_list[i], "search"):
if not re: import re
expect_list[i] = re.compile(expect_list[i])
call_timeout = timeout
if timeout is not None:
from time import time
time_start = time()
self.process_rawq()
m = None
for i in indices:
m = expect_list[i].search(self.cookedq)
if m:
e = m.end()
text = self.cookedq[:e]
self.cookedq = self.cookedq[e:]
break
if not m:
poller = select.poll()
poll_in_or_priority_flags = select.POLLIN | select.POLLPRI
poller.register(self, poll_in_or_priority_flags)
while not m and not self.eof:
try:
ready = poller.poll(None if timeout is None
else 1000 * call_timeout)
except select.error as e:
if e[0] == errno.EINTR:
if timeout is not None:
elapsed = time() - time_start
call_timeout = timeout-elapsed
continue
raise
for fd, mode in ready:
if mode & poll_in_or_priority_flags:
self.fill_rawq()
self.process_rawq()
for i in indices:
m = expect_list[i].search(self.cookedq)
if m:
e = m.end()
text = self.cookedq[:e]
self.cookedq = self.cookedq[e:]
break
if timeout is not None:
elapsed = time() - time_start
if elapsed >= timeout:
break
call_timeout = timeout-elapsed
poller.unregister(self)
if m:
return (i, m, text)
text = self.read_very_lazy()
if not text and self.eof:
raise EOFError
return (-1, None, text)
def _expect_with_select(self, list, timeout=None):
"""Read until one from a list of a regular expressions matches.
The timeout is implemented using select.select().
"""
re = None
list = list[:]
indices = range(len(list))
for i in indices:
if not hasattr(list[i], "search"):
if not re: import re
list[i] = re.compile(list[i])
if timeout is not None:
from time import time
time_start = time()
while 1:
self.process_rawq()
for i in indices:
m = list[i].search(self.cookedq)
if m:
e = m.end()
text = self.cookedq[:e]
self.cookedq = self.cookedq[e:]
return (i, m, text)
if self.eof:
break
if timeout is not None:
elapsed = time() - time_start
if elapsed >= timeout:
break
s_args = ([self.fileno()], [], [], timeout-elapsed)
r, w, x = select.select(*s_args)
if not r:
break
self.fill_rawq()
text = self.read_very_lazy()
if not text and self.eof:
raise EOFError
return (-1, None, text)
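# Example: an explicit option-negotiation callback (a minimal sketch).
# It mirrors the default behaviour of process_rawq() above -- refuse
# every option the peer proposes -- and ignores the SB/SE notifications
# that arrive with opt == NOOPT. Install it with
# tn.set_option_negotiation_callback(_refuse_all).
def _refuse_all(sock, cmd, opt):
    if cmd in (DO, DONT):
        sock.sendall(IAC + WONT + opt)
    elif cmd in (WILL, WONT):
        sock.sendall(IAC + DONT + opt)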
def test():
"""Test program for telnetlib.
Usage: python telnetlib.py [-d] ... [host [port]]
Default host is localhost; default port is 23.
"""
debuglevel = 0
while sys.argv[1:] and sys.argv[1] == '-d':
debuglevel = debuglevel+1
del sys.argv[1]
host = 'localhost'
if sys.argv[1:]:
host = sys.argv[1]
port = 0
if sys.argv[2:]:
portstr = sys.argv[2]
try:
port = int(portstr)
except ValueError:
port = socket.getservbyname(portstr, 'tcp')
tn = Telnet()
tn.set_debuglevel(debuglevel)
tn.open(host, port, timeout=0.5)
tn.interact()
tn.close()
if __name__ == '__main__':
test()
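# Example: scripted interaction (a minimal sketch; the host, prompts
# and credentials are hypothetical). It shows the read_until()/write()
# pattern from the class docstring; read_all() blocks until the server
# closes the connection.
def _telnet_example():
    tn = Telnet("telnet.example.com", TELNET_PORT, timeout=10)
    tn.read_until("login: ", timeout=5)
    tn.write("guest\r\n")
    tn.read_until("Password: ", timeout=5)
    tn.write("guest\r\n")
    tn.write("exit\r\n")
    print tn.read_all()
    tn.close()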
"""Temporary files.
This module provides generic, low- and high-level interfaces for
creating temporary files and directories. All of the interfaces
provided by this module can be used without fear of race conditions
except for 'mktemp'. 'mktemp' is subject to race conditions and
should not be used; it is provided for backward compatibility only.
This module also provides some data items to the user:
TMP_MAX - maximum number of names that will be tried before
giving up.
template - the default prefix for all temporary names.
You may change this to control the default prefix.
tempdir - If this is set to a string before the first use of
any routine from this module, it will be considered as
another candidate location to store temporary files.
"""
__all__ = [
"NamedTemporaryFile", "TemporaryFile", # high level safe interfaces
"SpooledTemporaryFile",
"mkstemp", "mkdtemp", # low level safe interfaces
"mktemp", # deprecated unsafe interface
"TMP_MAX", "gettempprefix", # constants
"tempdir", "gettempdir"
]
# Imports.
import io as _io
import os as _os
import errno as _errno
from random import Random as _Random
try:
from cStringIO import StringIO as _StringIO
except ImportError:
from StringIO import StringIO as _StringIO
try:
import fcntl as _fcntl
except ImportError:
def _set_cloexec(fd):
pass
else:
def _set_cloexec(fd):
try:
flags = _fcntl.fcntl(fd, _fcntl.F_GETFD, 0)
except IOError:
pass
else:
# flags read successfully, modify
flags |= _fcntl.FD_CLOEXEC
_fcntl.fcntl(fd, _fcntl.F_SETFD, flags)
try:
import thread as _thread
except ImportError:
import dummy_thread as _thread
_allocate_lock = _thread.allocate_lock
_text_openflags = _os.O_RDWR | _os.O_CREAT | _os.O_EXCL
if hasattr(_os, 'O_NOINHERIT'):
_text_openflags |= _os.O_NOINHERIT
if hasattr(_os, 'O_NOFOLLOW'):
_text_openflags |= _os.O_NOFOLLOW
_bin_openflags = _text_openflags
if hasattr(_os, 'O_BINARY'):
_bin_openflags |= _os.O_BINARY
if hasattr(_os, 'TMP_MAX'):
TMP_MAX = _os.TMP_MAX
else:
TMP_MAX = 10000
template = "tmp"
# Internal routines.
_once_lock = _allocate_lock()
if hasattr(_os, "lstat"):
_stat = _os.lstat
elif hasattr(_os, "stat"):
_stat = _os.stat
else:
# Fallback. All we need is something that raises os.error if the
# file doesn't exist.
def _stat(fn):
try:
f = open(fn)
except IOError:
raise _os.error
f.close()
def _exists(fn):
try:
_stat(fn)
except _os.error:
return False
else:
return True
class _RandomNameSequence:
"""An instance of _RandomNameSequence generates an endless
sequence of unpredictable strings which can safely be incorporated
into file names. Each string is six characters long. Multiple
threads can safely use the same instance at the same time.
_RandomNameSequence is an iterator."""
characters = ("abcdefghijklmnopqrstuvwxyz" +
"ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
"0123456789_")
def __init__(self):
self.mutex = _allocate_lock()
self.normcase = _os.path.normcase
@property
def rng(self):
cur_pid = _os.getpid()
if cur_pid != getattr(self, '_rng_pid', None):
self._rng = _Random()
self._rng_pid = cur_pid
return self._rng
def __iter__(self):
return self
def next(self):
m = self.mutex
c = self.characters
choose = self.rng.choice
m.acquire()
try:
letters = [choose(c) for dummy in "123456"]
finally:
m.release()
return self.normcase(''.join(letters))
def _candidate_tempdir_list():
"""Generate a list of candidate temporary directories which
_get_default_tempdir will try."""
dirlist = []
# First, try the environment.
for envname in 'TMPDIR', 'TEMP', 'TMP':
dirname = _os.getenv(envname)
if dirname: dirlist.append(dirname)
# Failing that, try OS-specific locations.
if _os.name == 'riscos':
dirname = _os.getenv('Wimp$ScrapDir')
if dirname: dirlist.append(dirname)
elif _os.name == 'nt':
dirlist.extend([ r'c:\temp', r'c:\tmp', r'\temp', r'\tmp' ])
else:
dirlist.extend([ '/tmp', '/var/tmp', '/usr/tmp' ])
# As a last resort, the current directory.
try:
dirlist.append(_os.getcwd())
except (AttributeError, _os.error):
dirlist.append(_os.curdir)
return dirlist
def _get_default_tempdir():
"""Calculate the default directory to use for temporary files.
This routine should be called exactly once.
We determine whether or not a candidate temp dir is usable by
trying to create and write to a file in that directory. If this
is successful, the test file is deleted. To prevent denial of
service, the name of the test file must be randomized."""
namer = _RandomNameSequence()
dirlist = _candidate_tempdir_list()
flags = _text_openflags
for dir in dirlist:
if dir != _os.curdir:
dir = _os.path.normcase(_os.path.abspath(dir))
# Try only a few names per directory.
for seq in xrange(100):
name = namer.next()
filename = _os.path.join(dir, name)
try:
fd = _os.open(filename, flags, 0o600)
try:
try:
with _io.open(fd, 'wb', closefd=False) as fp:
fp.write(b'blat')
finally:
_os.close(fd)
finally:
_os.unlink(filename)
return dir
except (OSError, IOError) as e:
if e.args[0] == _errno.EEXIST:
continue
if (_os.name == 'nt' and e.args[0] == _errno.EACCES and
_os.path.isdir(dir) and _os.access(dir, _os.W_OK)):
# On windows, when a directory with the chosen name already
# exists, EACCES error code is returned instead of EEXIST.
continue
break # no point trying more names in this directory
raise IOError, (_errno.ENOENT,
("No usable temporary directory found in %s" % dirlist))
_name_sequence = None
def _get_candidate_names():
"""Common setup sequence for all user-callable interfaces."""
global _name_sequence
if _name_sequence is None:
_once_lock.acquire()
try:
if _name_sequence is None:
_name_sequence = _RandomNameSequence()
finally:
_once_lock.release()
return _name_sequence
def _mkstemp_inner(dir, pre, suf, flags):
"""Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""
names = _get_candidate_names()
for seq in xrange(TMP_MAX):
name = names.next()
file = _os.path.join(dir, pre + name + suf)
try:
fd = _os.open(file, flags, 0600)
_set_cloexec(fd)
return (fd, _os.path.abspath(file))
except OSError, e:
if e.errno == _errno.EEXIST:
continue # try again
if (_os.name == 'nt' and e.errno == _errno.EACCES and
_os.path.isdir(dir) and _os.access(dir, _os.W_OK)):
# On windows, when a directory with the chosen name already
# exists, EACCES error code is returned instead of EEXIST.
continue
raise
raise IOError, (_errno.EEXIST, "No usable temporary file name found")
# User visible interfaces.
def gettempprefix():
"""Accessor for tempdir.template."""
return template
tempdir = None
def gettempdir():
"""Accessor for tempfile.tempdir."""
global tempdir
if tempdir is None:
_once_lock.acquire()
try:
if tempdir is None:
tempdir = _get_default_tempdir()
finally:
_once_lock.release()
return tempdir
def mkstemp(suffix="", prefix=template, dir=None, text=False):
"""User-callable function to create and return a unique temporary
file. The return value is a pair (fd, name) where fd is the
file descriptor returned by os.open, and name is the filename.
If 'suffix' is specified, the file name will end with that suffix,
otherwise there will be no suffix.
If 'prefix' is specified, the file name will begin with that prefix,
otherwise a default prefix is used.
If 'dir' is specified, the file will be created in that directory,
otherwise a default directory is used.
If 'text' is specified and true, the file is opened in text
mode. Else (the default) the file is opened in binary mode. On
some operating systems, this makes no difference.
The file is readable and writable only by the creating user ID.
If the operating system uses permission bits to indicate whether a
file is executable, the file is executable by no one. The file
descriptor is not inherited by children of this process.
Caller is responsible for deleting the file when done with it.
"""
if dir is None:
dir = gettempdir()
if text:
flags = _text_openflags
else:
flags = _bin_openflags
return _mkstemp_inner(dir, prefix, suffix, flags)
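# Example: using mkstemp() and cleaning up afterwards (a minimal
# sketch). mkstemp() hands back a raw OS file descriptor, so the
# caller must wrap or close it and unlink the file itself.
def _mkstemp_example():
    fd, path = mkstemp(suffix=".log", prefix="demo-")
    try:
        f = _os.fdopen(fd, "w")
        try:
            f.write("hello\n")
        finally:
            f.close()          # closing the file object closes fd too
    finally:
        _os.unlink(path)       # the caller is responsible for deletion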
def mkdtemp(suffix="", prefix=template, dir=None):
"""User-callable function to create and return a unique temporary
directory. The return value is the pathname of the directory.
Arguments are as for mkstemp, except that the 'text' argument is
not accepted.
The directory is readable, writable, and searchable only by the
creating user.
Caller is responsible for deleting the directory when done with it.
"""
if dir is None:
dir = gettempdir()
names = _get_candidate_names()
for seq in xrange(TMP_MAX):
name = names.next()
file = _os.path.join(dir, prefix + name + suffix)
try:
_os.mkdir(file, 0700)
return file
except OSError, e:
if e.errno == _errno.EEXIST:
continue # try again
if (_os.name == 'nt' and e.errno == _errno.EACCES and
_os.path.isdir(dir) and _os.access(dir, _os.W_OK)):
# On windows, when a directory with the chosen name already
# exists, EACCES error code is returned instead of EEXIST.
continue
raise
raise IOError, (_errno.EEXIST, "No usable temporary directory name found")
def mktemp(suffix="", prefix=template, dir=None):
"""User-callable function to return a unique temporary file name. The
file is not created.
Arguments are as for mkstemp, except that the 'text' argument is
not accepted.
This function is unsafe and should not be used. The file name
refers to a file that did not exist at some point, but by the time
you get around to creating it, someone else may have beaten you to
the punch.
"""
## from warnings import warn as _warn
## _warn("mktemp is a potential security risk to your program",
## RuntimeWarning, stacklevel=2)
if dir is None:
dir = gettempdir()
names = _get_candidate_names()
for seq in xrange(TMP_MAX):
name = names.next()
file = _os.path.join(dir, prefix + name + suffix)
if not _exists(file):
return file
raise IOError, (_errno.EEXIST, "No usable temporary filename found")
class _TemporaryFileWrapper:
"""Temporary file wrapper
This class provides a wrapper around files opened for
temporary use. In particular, it seeks to automatically
remove the file when it is no longer needed.
"""
def __init__(self, file, name, delete=True):
self.file = file
self.name = name
self.close_called = False
self.delete = delete
def __getattr__(self, name):
# Attribute lookups are delegated to the underlying file
# and cached for non-numeric results
# (i.e. methods are cached, closed and friends are not)
file = self.__dict__['file']
a = getattr(file, name)
if not issubclass(type(a), type(0)):
setattr(self, name, a)
return a
# The underlying __enter__ method returns the wrong object
# (self.file) so override it to return the wrapper
def __enter__(self):
self.file.__enter__()
return self
# NT provides delete-on-close as a primitive, so we don't need
# the wrapper to do anything special. We still use it so that
# file.name is useful (i.e. not "(fdopen)") with NamedTemporaryFile.
if _os.name != 'nt':
# Cache the unlinker so we don't get spurious errors at
# shutdown when the module-level "os" is None'd out. Note
# that this must be referenced as self.unlink, because the
# name TemporaryFileWrapper may also get None'd out before
# __del__ is called.
unlink = _os.unlink
def close(self):
if not self.close_called:
self.close_called = True
try:
self.file.close()
finally:
if self.delete:
self.unlink(self.name)
def __del__(self):
self.close()
# Need to trap __exit__ as well to ensure the file gets
# deleted when used in a with statement
def __exit__(self, exc, value, tb):
result = self.file.__exit__(exc, value, tb)
self.close()
return result
else:
def __exit__(self, exc, value, tb):
self.file.__exit__(exc, value, tb)
def NamedTemporaryFile(mode='w+b', bufsize=-1, suffix="",
prefix=template, dir=None, delete=True):
"""Create and return a temporary file.
Arguments:
'prefix', 'suffix', 'dir' -- as for mkstemp.
'mode' -- the mode argument to os.fdopen (default "w+b").
'bufsize' -- the buffer size argument to os.fdopen (default -1).
'delete' -- whether the file is deleted on close (default True).
The file is created as mkstemp() would do it.
Returns an object with a file-like interface; the name of the file
is accessible as its 'name' attribute. The file will be automatically
deleted when it is closed unless the 'delete' argument is set to False.
"""
if dir is None:
dir = gettempdir()
if 'b' in mode:
flags = _bin_openflags
else:
flags = _text_openflags
# Setting O_TEMPORARY in the flags causes the OS to delete
# the file when it is closed. This is only supported by Windows.
if _os.name == 'nt' and delete:
flags |= _os.O_TEMPORARY
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
try:
file = _os.fdopen(fd, mode, bufsize)
return _TemporaryFileWrapper(file, name, delete)
except BaseException:
_os.unlink(name)
_os.close(fd)
raise
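# Example: NamedTemporaryFile with delete=False (a minimal sketch).
# This is useful when another process must open the file by name;
# with delete=False the caller takes over responsibility for removal.
def _named_tempfile_example():
    tmp = NamedTemporaryFile(suffix=".csv", delete=False)
    try:
        tmp.write("a,b\n1,2\n")
        tmp.flush()
        print "wrote", tmp.name    # the path stays valid after close()
    finally:
        tmp.close()
        _os.unlink(tmp.name)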
if _os.name != 'posix' or _os.sys.platform == 'cygwin':
# On non-POSIX and Cygwin systems, assume that we cannot unlink a file
# while it is open.
TemporaryFile = NamedTemporaryFile
else:
def TemporaryFile(mode='w+b', bufsize=-1, suffix="",
prefix=template, dir=None):
"""Create and return a temporary file.
Arguments:
'prefix', 'suffix', 'dir' -- as for mkstemp.
'mode' -- the mode argument to os.fdopen (default "w+b").
'bufsize' -- the buffer size argument to os.fdopen (default -1).
The file is created as mkstemp() would do it.
Returns an object with a file-like interface. The file has no
name, and will cease to exist when it is closed.
"""
if dir is None:
dir = gettempdir()
if 'b' in mode:
flags = _bin_openflags
else:
flags = _text_openflags
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
try:
_os.unlink(name)
return _os.fdopen(fd, mode, bufsize)
except:
_os.close(fd)
raise
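# Sketch of the POSIX behaviour above (names illustrative): the file is
# unlinked as soon as it is created, so it has no visible name and its
# storage is reclaimed automatically when the handle is closed.
#
#     import tempfile
#     tmp = tempfile.TemporaryFile()
#     tmp.write("bytes that never appear under a path")
#     tmp.seek(0)
#     data = tmp.read()
#     tmp.close()               # nothing to unlink; already anonymous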
class SpooledTemporaryFile:
"""Temporary file wrapper, specialized to switch from
StringIO to a real file when it exceeds a certain size or
when a fileno is needed.
"""
_rolled = False
def __init__(self, max_size=0, mode='w+b', bufsize=-1,
suffix="", prefix=template, dir=None):
self._file = _StringIO()
self._max_size = max_size
self._rolled = False
self._TemporaryFileArgs = (mode, bufsize, suffix, prefix, dir)
def _check(self, file):
if self._rolled: return
max_size = self._max_size
if max_size and file.tell() > max_size:
self.rollover()
def rollover(self):
if self._rolled: return
file = self._file
newfile = self._file = TemporaryFile(*self._TemporaryFileArgs)
del self._TemporaryFileArgs
newfile.write(file.getvalue())
newfile.seek(file.tell(), 0)
self._rolled = True
# The method caching trick from NamedTemporaryFile
# won't work here, because _file may change from a
# _StringIO instance to a real file. So we list
# all the methods directly.
# Context management protocol
def __enter__(self):
if self._file.closed:
raise ValueError("Cannot enter context with closed file")
return self
def __exit__(self, exc, value, tb):
self._file.close()
# file protocol
def __iter__(self):
return self._file.__iter__()
def close(self):
self._file.close()
@property
def closed(self):
return self._file.closed
def fileno(self):
self.rollover()
return self._file.fileno()
def flush(self):
self._file.flush()
def isatty(self):
return self._file.isatty()
@property
def mode(self):
try:
return self._file.mode
except AttributeError:
return self._TemporaryFileArgs[0]
@property
def name(self):
try:
return self._file.name
except AttributeError:
return None
def next(self):
# Delegate and *call* the underlying file's next() -- returning the
# bound method itself would break iteration over the wrapper.
return self._file.next()
def read(self, *args):
return self._file.read(*args)
def readline(self, *args):
return self._file.readline(*args)
def readlines(self, *args):
return self._file.readlines(*args)
def seek(self, *args):
self._file.seek(*args)
@property
def softspace(self):
return self._file.softspace
def tell(self):
return self._file.tell()
def truncate(self):
self._file.truncate()
def write(self, s):
file = self._file
rv = file.write(s)
self._check(file)
return rv
def writelines(self, iterable):
file = self._file
rv = file.writelines(iterable)
self._check(file)
return rv
def xreadlines(self, *args):
if hasattr(self._file, 'xreadlines'): # real file
return iter(self._file)
else: # StringIO()
return iter(self._file.readlines(*args))
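# Rollover sketch (illustrative; _rolled is an internal attribute, read
# here only to make the transition visible): writes stay in memory until
# max_size is crossed, then move to a real temporary file.
#
#     import tempfile
#     spool = tempfile.SpooledTemporaryFile(max_size=16)
#     spool.write("tiny")
#     print spool._rolled       # False: still a StringIO in memory
#     spool.write("x" * 100)    # crosses max_size
#     print spool._rolled       # True: now backed by TemporaryFile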
"""Text wrapping and filling.
"""
# Copyright (C) 1999-2001 Gregory P. Ward.
# Copyright (C) 2002, 2003 Python Software Foundation.
# Written by Greg Ward <[email protected]>
__revision__ = "$Id$"
import string, re
try:
_unicode = unicode
except NameError:
# If Python is built without Unicode support, the unicode type
# will not exist. Fake one.
class _unicode(object):
pass
# Do the right thing with boolean values for all known Python versions
# (so this module can be copied to projects that don't depend on Python
# 2.3, e.g. Optik and Docutils) by uncommenting the block of code below.
#try:
# True, False
#except NameError:
# (True, False) = (1, 0)
__all__ = ['TextWrapper', 'wrap', 'fill', 'dedent']
# Hardcode the recognized whitespace characters to the US-ASCII
# whitespace characters. The main reason for doing this is that in
# ISO-8859-1, 0xa0 is non-breaking whitespace, so in certain locales
# that character winds up in string.whitespace. Respecting
# string.whitespace in those cases would 1) make textwrap treat 0xa0 the
# same as any other whitespace char, which is clearly wrong (it's a
# *non-breaking* space), 2) possibly cause problems with Unicode,
# since 0xa0 is not in range(128).
_whitespace = '\t\n\x0b\x0c\r '
class TextWrapper:
"""
Object for wrapping/filling text. The public interface consists of
the wrap() and fill() methods; the other methods are just there for
subclasses to override in order to tweak the default behaviour.
If you want to completely replace the main wrapping algorithm,
you'll probably have to override _wrap_chunks().
Several instance attributes control various aspects of wrapping:
width (default: 70)
the maximum width of wrapped lines (unless break_long_words
is false)
initial_indent (default: "")
string that will be prepended to the first line of wrapped
output. Counts towards the line's width.
subsequent_indent (default: "")
string that will be prepended to all lines save the first
of wrapped output; also counts towards each line's width.
expand_tabs (default: true)
Expand tabs in input text to spaces before further processing.
Each tab will become 1 .. 8 spaces, depending on its position in
its line. If false, each tab is treated as a single character.
replace_whitespace (default: true)
Replace all whitespace characters in the input text by spaces
after tab expansion. Note that if expand_tabs is false and
replace_whitespace is true, every tab will be converted to a
single space!
fix_sentence_endings (default: false)
Ensure that sentence-ending punctuation is always followed
by two spaces. Off by default because the algorithm is
(unavoidably) imperfect.
break_long_words (default: true)
Break words longer than 'width'. If false, those words will not
be broken, and some lines might be longer than 'width'.
break_on_hyphens (default: true)
Allow breaking hyphenated words. If true, wrapping will occur
preferably on whitespace and right after the hyphens in
compound words.
drop_whitespace (default: true)
Drop leading and trailing whitespace from lines.
"""
whitespace_trans = string.maketrans(_whitespace, ' ' * len(_whitespace))
unicode_whitespace_trans = {}
uspace = ord(u' ')
for x in map(ord, _whitespace):
unicode_whitespace_trans[x] = uspace
# This funky little regex is just the trick for splitting
# text up into word-wrappable chunks. E.g.
# "Hello there -- you goof-ball, use the -b option!"
# splits into
# Hello/ /there/ /--/ /you/ /goof-/ball,/ /use/ /the/ /-b/ /option!
# (after stripping out empty strings).
wordsep_re = re.compile(
r'(\s+|' # any whitespace
r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|' # hyphenated words
r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash
# This less funky little regex just splits on recognized spaces. E.g.
# "Hello there -- you goof-ball, use the -b option!"
# splits into
# Hello/ /there/ /--/ /you/ /goof-ball,/ /use/ /the/ /-b/ /option!/
wordsep_simple_re = re.compile(r'(\s+)')
# XXX this is not locale- or charset-aware -- string.lowercase
# is US-ASCII only (and therefore English-only)
sentence_end_re = re.compile(r'[%s]' # lowercase letter
r'[\.\!\?]' # sentence-ending punct.
r'[\"\']?' # optional end-of-quote
r'\Z' # end of chunk
% string.lowercase)
def __init__(self,
width=70,
initial_indent="",
subsequent_indent="",
expand_tabs=True,
replace_whitespace=True,
fix_sentence_endings=False,
break_long_words=True,
drop_whitespace=True,
break_on_hyphens=True):
self.width = width
self.initial_indent = initial_indent
self.subsequent_indent = subsequent_indent
self.expand_tabs = expand_tabs
self.replace_whitespace = replace_whitespace
self.fix_sentence_endings = fix_sentence_endings
self.break_long_words = break_long_words
self.drop_whitespace = drop_whitespace
self.break_on_hyphens = break_on_hyphens
# recompile the regexes for Unicode mode -- done in this clumsy way for
# backwards compatibility because it's rather common to monkey-patch
# the TextWrapper class' wordsep_re attribute.
self.wordsep_re_uni = re.compile(self.wordsep_re.pattern, re.U)
self.wordsep_simple_re_uni = re.compile(
self.wordsep_simple_re.pattern, re.U)
# -- Private methods -----------------------------------------------
# (possibly useful for subclasses to override)
def _munge_whitespace(self, text):
"""_munge_whitespace(text : string) -> string
Munge whitespace in text: expand tabs and convert all other
whitespace characters to spaces. E.g. " foo\\tbar\\n\\nbaz"
becomes " foo bar baz".
"""
if self.expand_tabs:
text = text.expandtabs()
if self.replace_whitespace:
if isinstance(text, str):
text = text.translate(self.whitespace_trans)
elif isinstance(text, _unicode):
text = text.translate(self.unicode_whitespace_trans)
return text
def _split(self, text):
"""_split(text : string) -> [string]
Split the text to wrap into indivisible chunks. Chunks are
not quite the same as words; see _wrap_chunks() for full
details. As an example, the text
Look, goof-ball -- use the -b option!
breaks into the following chunks:
'Look,', ' ', 'goof-', 'ball', ' ', '--', ' ',
'use', ' ', 'the', ' ', '-b', ' ', 'option!'
if break_on_hyphens is True, or in:
'Look,', ' ', 'goof-ball', ' ', '--', ' ',
'use', ' ', 'the', ' ', '-b', ' ', 'option!'
otherwise.
"""
if isinstance(text, _unicode):
if self.break_on_hyphens:
pat = self.wordsep_re_uni
else:
pat = self.wordsep_simple_re_uni
else:
if self.break_on_hyphens:
pat = self.wordsep_re
else:
pat = self.wordsep_simple_re
chunks = pat.split(text)
chunks = filter(None, chunks) # remove empty chunks
return chunks
def _fix_sentence_endings(self, chunks):
"""_fix_sentence_endings(chunks : [string])
Correct for sentence endings buried in 'chunks'. E.g. when the
original text contains "... foo.\\nBar ...", munge_whitespace()
and split() will convert that to [..., "foo.", " ", "Bar", ...]
which has one too few spaces; this method simply changes the one
space to two.
"""
i = 0
patsearch = self.sentence_end_re.search
while i < len(chunks)-1:
if chunks[i+1] == " " and patsearch(chunks[i]):
chunks[i+1] = "  "
i += 2
else:
i += 1
def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
"""_handle_long_word(chunks : [string],
cur_line : [string],
cur_len : int, width : int)
Handle a chunk of text (most likely a word, not whitespace) that
is too long to fit in any line.
"""
# Figure out when indent is larger than the specified width, and make
# sure at least one character is stripped off on every pass
if width < 1:
space_left = 1
else:
space_left = width - cur_len
# If we're allowed to break long words, then do so: put as much
# of the next chunk onto the current line as will fit.
if self.break_long_words:
cur_line.append(reversed_chunks[-1][:space_left])
reversed_chunks[-1] = reversed_chunks[-1][space_left:]
# Otherwise, we have to preserve the long word intact. Only add
# it to the current line if there's nothing already there --
# that minimizes how much we violate the width constraint.
elif not cur_line:
cur_line.append(reversed_chunks.pop())
# If we're not allowed to break long words, and there's already
# text on the current line, do nothing. Next time through the
# main loop of _wrap_chunks(), we'll wind up here again, but
# cur_len will be zero, so the next line will be entirely
# devoted to the long word that we can't handle right now.
def _wrap_chunks(self, chunks):
"""_wrap_chunks(chunks : [string]) -> [string]
Wrap a sequence of text chunks and return a list of lines of
length 'self.width' or less. (If 'break_long_words' is false,
some lines may be longer than this.) Chunks correspond roughly
to words and the whitespace between them: each chunk is
indivisible (modulo 'break_long_words'), but a line break can
come between any two chunks. Chunks should not have internal
whitespace; i.e. a chunk is either all whitespace or a "word".
Whitespace chunks will be removed from the beginning and end of
lines, but apart from that whitespace is preserved.
"""
lines = []
if self.width <= 0:
raise ValueError("invalid width %r (must be > 0)" % self.width)
# Arrange in reverse order so items can be efficiently popped
# from a stack of chunks.
chunks.reverse()
while chunks:
# Start the list of chunks that will make up the current line.
# cur_len is just the length of all the chunks in cur_line.
cur_line = []
cur_len = 0
# Figure out which static string will prefix this line.
if lines:
indent = self.subsequent_indent
else:
indent = self.initial_indent
# Maximum width for this line.
width = self.width - len(indent)
# First chunk on line is whitespace -- drop it, unless this
# is the very beginning of the text (i.e. no lines started yet).
if self.drop_whitespace and chunks[-1].strip() == '' and lines:
del chunks[-1]
while chunks:
l = len(chunks[-1])
# Can at least squeeze this chunk onto the current line.
if cur_len + l <= width:
cur_line.append(chunks.pop())
cur_len += l
# Nope, this line is full.
else:
break
# The current line is full, and the next chunk is too big to
# fit on *any* line (not just this one).
if chunks and len(chunks[-1]) > width:
self._handle_long_word(chunks, cur_line, cur_len, width)
# If the last chunk on this line is all whitespace, drop it.
if self.drop_whitespace and cur_line and cur_line[-1].strip() == '':
del cur_line[-1]
# Convert current line back to a string and store it in list
# of all lines (return value).
if cur_line:
lines.append(indent + ''.join(cur_line))
return lines
# -- Public interface ----------------------------------------------
def wrap(self, text):
"""wrap(text : string) -> [string]
Reformat the single paragraph in 'text' so it fits in lines of
no more than 'self.width' columns, and return a list of wrapped
lines. Tabs in 'text' are expanded with string.expandtabs(),
and all other whitespace characters (including newline) are
converted to space.
"""
text = self._munge_whitespace(text)
chunks = self._split(text)
if self.fix_sentence_endings:
self._fix_sentence_endings(chunks)
return self._wrap_chunks(chunks)
def fill(self, text):
"""fill(text : string) -> string
Reformat the single paragraph in 'text' to fit in lines of no
more than 'self.width' columns, and return a new string
containing the entire wrapped paragraph.
"""
return "\n".join(self.wrap(text))
# -- Convenience interface ---------------------------------------------
def wrap(text, width=70, **kwargs):
"""Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
"""
w = TextWrapper(width=width, **kwargs)
return w.wrap(text)
def fill(text, width=70, **kwargs):
"""Fill a single paragraph of text, returning a new string.
Reformat the single paragraph in 'text' to fit in lines of no more
than 'width' columns, and return a new string containing the entire
wrapped paragraph. As with wrap(), tabs are expanded and other
whitespace characters converted to space. See TextWrapper class for
available keyword args to customize wrapping behaviour.
"""
w = TextWrapper(width=width, **kwargs)
return w.fill(text)
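# Quick sketch of the two convenience entry points (inputs illustrative):
#
#     import textwrap
#     print textwrap.wrap("one two three four", width=10)
#     # ['one two', 'three four']
#     print textwrap.fill("one two three four", width=10)
#     # one two
#     # three four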
# -- Loosely related functionality -------------------------------------
_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
def dedent(text):
"""Remove any common leading whitespace from every line in `text`.
This can be used to make triple-quoted strings line up with the left
edge of the display, while still presenting them in the source code
in indented form.
Note that tabs and spaces are both treated as whitespace, but they
are not equal: the lines " hello" and "\\thello" are
considered to have no common leading whitespace. (This behaviour is
new in Python 2.5; older versions of this module incorrectly
expanded tabs before searching for common leading whitespace.)
Entirely blank lines are normalized to a newline character.
"""
# Look for the longest leading string of spaces and tabs common to
# all lines.
margin = None
text = _whitespace_only_re.sub('', text)
indents = _leading_whitespace_re.findall(text)
for indent in indents:
if margin is None:
margin = indent
# Current line more deeply indented than previous winner:
# no change (previous winner is still on top).
elif indent.startswith(margin):
pass
# Current line consistent with and no deeper than previous winner:
# it's the new winner.
elif margin.startswith(indent):
margin = indent
# Find the largest common whitespace between current line and previous
# winner.
else:
for i, (x, y) in enumerate(zip(margin, indent)):
if x != y:
margin = margin[:i]
break
else:
margin = margin[:len(indent)]
# sanity check (testing/debugging only)
if 0 and margin:
for line in text.split("\n"):
assert not line or line.startswith(margin), \
"line = %r, margin = %r" % (line, margin)
if margin:
text = re.sub(r'(?m)^' + margin, '', text)
return text
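# dedent() sketch (strings illustrative): only the *common* leading
# whitespace is removed, so relative indentation survives.
#
#     import textwrap
#     print textwrap.dedent("    hello\n      world")
#     # hello
#     #   world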
if __name__ == "__main__":
#print dedent("\tfoo\n\tbar")
#print dedent(" \thello there\n \t how are you?")
print dedent("Hello there.\n This is indented.")
s = """Gur Mra bs Clguba, ol Gvz Crgref
Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""
d = {}
for c in (65, 97):
for i in range(26):
d[chr(i+c)] = chr((i+13) % 26 + c)
print "".join([d.get(c, c) for c in s])
"""Thread module emulating a subset of Java's threading model."""
import sys as _sys
try:
import thread
except ImportError:
del _sys.modules[__name__]
raise
import warnings
from collections import deque as _deque
from itertools import count as _count
from time import time as _time, sleep as _sleep
from traceback import format_exc as _format_exc
# Note regarding PEP 8 compliant aliases
# This threading model was originally inspired by Java, and inherited
# the convention of camelCase function and method names from that
# language. While those names are not in any imminent danger of being
# deprecated, starting with Python 2.6, the module now provides a
# PEP 8 compliant alias for any such method name.
# Using the new PEP 8 compliant names also facilitates substitution
# with the multiprocessing module, which doesn't provide the old
# Java inspired names.
# Rename some stuff so "from threading import *" is safe
__all__ = ['activeCount', 'active_count', 'Condition', 'currentThread',
'current_thread', 'enumerate', 'Event',
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread',
'Timer', 'setprofile', 'settrace', 'local', 'stack_size']
_start_new_thread = thread.start_new_thread
_allocate_lock = thread.allocate_lock
_get_ident = thread.get_ident
ThreadError = thread.error
del thread
# sys.exc_clear is used to work around the fact that except blocks
# don't fully clear the exception until 3.0.
warnings.filterwarnings('ignore', category=DeprecationWarning,
module='threading', message='sys.exc_clear')
# Debug support (adapted from ihooks.py).
# All the major classes here derive from _Verbose. We force that to
# be a new-style class so that all the major classes here are new-style.
# This helps debugging (type(instance) is more revealing for instances
# of new-style classes).
_VERBOSE = False
if __debug__:
class _Verbose(object):
def __init__(self, verbose=None):
if verbose is None:
verbose = _VERBOSE
self.__verbose = verbose
def _note(self, format, *args):
if self.__verbose:
format = format % args
# Issue #4188: calling current_thread() can incur an infinite
# recursion if it has to create a DummyThread on the fly.
ident = _get_ident()
try:
name = _active[ident].name
except KeyError:
name = "<OS thread %d>" % ident
format = "%s: %s\n" % (name, format)
_sys.stderr.write(format)
else:
# Disable this when using "python -O"
class _Verbose(object):
def __init__(self, verbose=None):
pass
def _note(self, *args):
pass
# Support for profile and trace hooks
_profile_hook = None
_trace_hook = None
def setprofile(func):
"""Set a profile function for all threads started from the threading module.
The func will be passed to sys.setprofile() for each thread, before its
run() method is called.
"""
global _profile_hook
_profile_hook = func
def settrace(func):
"""Set a trace function for all threads started from the threading module.
The func will be passed to sys.settrace() for each thread, before its run()
method is called.
"""
global _trace_hook
_trace_hook = func
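# Hook sketch (function name illustrative): the registered callable is
# handed to sys.settrace() in each thread started via this module,
# before that thread's run() is invoked.
#
#     import threading
#     def tracer(frame, event, arg):
#         return tracer          # keep tracing nested calls
#     threading.settrace(tracer)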
# Synchronization classes
Lock = _allocate_lock
def RLock(*args, **kwargs):
"""Factory function that returns a new reentrant lock.
A reentrant lock must be released by the thread that acquired it. Once a
thread has acquired a reentrant lock, the same thread may acquire it again
without blocking; the thread must release it once for each time it has
acquired it.
"""
return _RLock(*args, **kwargs)
class _RLock(_Verbose):
"""A reentrant lock must be released by the thread that acquired it. Once a
thread has acquired a reentrant lock, the same thread may acquire it
again without blocking; the thread must release it once for each time it
has acquired it.
"""
def __init__(self, verbose=None):
_Verbose.__init__(self, verbose)
self.__block = _allocate_lock()
self.__owner = None
self.__count = 0
def __repr__(self):
owner = self.__owner
try:
owner = _active[owner].name
except KeyError:
pass
return "<%s owner=%r count=%d>" % (
self.__class__.__name__, owner, self.__count)
def acquire(self, blocking=1):
"""Acquire a lock, blocking or non-blocking.
When invoked without arguments: if this thread already owns the lock,
increment the recursion level by one, and return immediately. Otherwise,
if another thread owns the lock, block until the lock is unlocked. Once
the lock is unlocked (not owned by any thread), then grab ownership, set
the recursion level to one, and return. If more than one thread is
blocked waiting until the lock is unlocked, only one at a time will be
able to grab ownership of the lock. There is no return value in this
case.
When invoked with the blocking argument set to true, do the same thing
as when called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a
call without an argument would block, return false immediately;
otherwise, do the same thing as when called without arguments, and
return true.
"""
me = _get_ident()
if self.__owner == me:
self.__count = self.__count + 1
if __debug__:
self._note("%s.acquire(%s): recursive success", self, blocking)
return 1
rc = self.__block.acquire(blocking)
if rc:
self.__owner = me
self.__count = 1
if __debug__:
self._note("%s.acquire(%s): initial success", self, blocking)
else:
if __debug__:
self._note("%s.acquire(%s): failure", self, blocking)
return rc
__enter__ = acquire
def release(self):
"""Release a lock, decrementing the recursion level.
If after the decrement it is zero, reset the lock to unlocked (not owned
by any thread), and if any other threads are blocked waiting for the
lock to become unlocked, allow exactly one of them to proceed. If after
the decrement the recursion level is still nonzero, the lock remains
locked and owned by the calling thread.
Only call this method when the calling thread owns the lock. A
RuntimeError is raised if this method is called when the lock is
unlocked.
There is no return value.
"""
if self.__owner != _get_ident():
raise RuntimeError("cannot release un-acquired lock")
self.__count = count = self.__count - 1
if not count:
self.__owner = None
self.__block.release()
if __debug__:
self._note("%s.release(): final release", self)
else:
if __debug__:
self._note("%s.release(): non-final release", self)
def __exit__(self, t, v, tb):
self.release()
# Internal methods used by condition variables
def _acquire_restore(self, count_owner):
count, owner = count_owner
self.__block.acquire()
self.__count = count
self.__owner = owner
if __debug__:
self._note("%s._acquire_restore()", self)
def _release_save(self):
if __debug__:
self._note("%s._release_save()", self)
count = self.__count
self.__count = 0
owner = self.__owner
self.__owner = None
self.__block.release()
return (count, owner)
def _is_owned(self):
return self.__owner == _get_ident()
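# Reentrancy sketch (illustrative): the owning thread may re-acquire
# without deadlocking, and must release once per acquire.
#
#     import threading
#     lock = threading.RLock()
#     with lock:
#         with lock:             # same thread: recursion level goes to 2
#             pass               # both exits release; lock is free again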
def Condition(*args, **kwargs):
"""Factory function that returns a new condition variable object.
A condition variable allows one or more threads to wait until they are
notified by another thread.
If the lock argument is given and not None, it must be a Lock or RLock
object, and it is used as the underlying lock. Otherwise, a new RLock object
is created and used as the underlying lock.
"""
return _Condition(*args, **kwargs)
class _Condition(_Verbose):
"""Condition variables allow one or more threads to wait until they are
notified by another thread.
"""
def __init__(self, lock=None, verbose=None):
_Verbose.__init__(self, verbose)
if lock is None:
lock = RLock()
self.__lock = lock
# Export the lock's acquire() and release() methods
self.acquire = lock.acquire
self.release = lock.release
# If the lock defines _release_save() and/or _acquire_restore(),
# these override the default implementations (which just call
# release() and acquire() on the lock). Ditto for _is_owned().
try:
self._release_save = lock._release_save
except AttributeError:
pass
try:
self._acquire_restore = lock._acquire_restore
except AttributeError:
pass
try:
self._is_owned = lock._is_owned
except AttributeError:
pass
self.__waiters = []
def __enter__(self):
return self.__lock.__enter__()
def __exit__(self, *args):
return self.__lock.__exit__(*args)
def __repr__(self):
return "<Condition(%s, %d)>" % (self.__lock, len(self.__waiters))
def _release_save(self):
self.__lock.release() # No state to save
def _acquire_restore(self, x):
self.__lock.acquire() # Ignore saved state
def _is_owned(self):
# Return True if lock is owned by current_thread.
# This method is called only if __lock doesn't have _is_owned().
if self.__lock.acquire(0):
self.__lock.release()
return False
else:
return True
def wait(self, timeout=None):
"""Wait until notified or until a timeout occurs.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notifyAll() call for the same condition
variable in another thread, or until the optional timeout occurs. Once
awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using its
release() method, since this may not actually unlock the lock when it
was acquired multiple times recursively. Instead, an internal interface
of the RLock class is used, which really unlocks it even when it has
been recursively acquired several times. Another internal interface is
then used to restore the recursion level when the lock is reacquired.
"""
if not self._is_owned():
raise RuntimeError("cannot wait on un-acquired lock")
waiter = _allocate_lock()
waiter.acquire()
self.__waiters.append(waiter)
saved_state = self._release_save()
try: # restore state no matter what (e.g., KeyboardInterrupt)
if timeout is None:
waiter.acquire()
if __debug__:
self._note("%s.wait(): got it", self)
else:
# Balancing act: We can't afford a pure busy loop, so we
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive. The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 1/20th of a second (or the timeout time remaining).
endtime = _time() + timeout
delay = 0.0005 # 500 us -> initial delay of 1 ms
while True:
gotit = waiter.acquire(0)
if gotit:
break
remaining = endtime - _time()
if remaining <= 0:
break
delay = min(delay * 2, remaining, .05)
_sleep(delay)
if not gotit:
if __debug__:
self._note("%s.wait(%s): timed out", self, timeout)
try:
self.__waiters.remove(waiter)
except ValueError:
pass
else:
if __debug__:
self._note("%s.wait(%s): got it", self, timeout)
finally:
self._acquire_restore(saved_state)
def notify(self, n=1):
"""Wake up one or more threads waiting on this condition, if any.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method wakes up at most n of the threads waiting for the condition
variable; it is a no-op if no threads are waiting.
"""
if not self._is_owned():
raise RuntimeError("cannot notify on un-acquired lock")
__waiters = self.__waiters
waiters = __waiters[:n]
if not waiters:
if __debug__:
self._note("%s.notify(): no waiters", self)
return
self._note("%s.notify(): notifying %d waiter%s", self, n,
n!=1 and "s" or "")
for waiter in waiters:
waiter.release()
try:
__waiters.remove(waiter)
except ValueError:
pass
def notifyAll(self):
"""Wake up all threads waiting on this condition.
If the calling thread has not acquired the lock when this method
is called, a RuntimeError is raised.
"""
self.notify(len(self.__waiters))
notify_all = notifyAll
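# Classic producer/consumer sketch (names illustrative): wait() is
# always called in a loop that re-checks the predicate, since a wakeup
# does not guarantee the state still holds.
#
#     import threading
#     items, cond = [], threading.Condition()
#
#     def consumer():
#         with cond:
#             while not items:
#                 cond.wait()
#             print items.pop()
#
#     t = threading.Thread(target=consumer)
#     t.start()
#     with cond:
#         items.append("work")
#         cond.notify()
#     t.join()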
def Semaphore(*args, **kwargs):
"""A factory function that returns a new semaphore.
Semaphores manage a counter representing the number of release() calls minus
the number of acquire() calls, plus an initial value. The acquire() method
blocks if necessary until it can return without making the counter
negative. If not given, value defaults to 1.
"""
return _Semaphore(*args, **kwargs)
class _Semaphore(_Verbose):
"""Semaphores manage a counter representing the number of release() calls
minus the number of acquire() calls, plus an initial value. The acquire()
method blocks if necessary until it can return without making the counter
negative. If not given, value defaults to 1.
"""
# After Tim Peters' semaphore class, but not quite the same (no maximum)
def __init__(self, value=1, verbose=None):
if value < 0:
raise ValueError("semaphore initial value must be >= 0")
_Verbose.__init__(self, verbose)
self.__cond = Condition(Lock())
self.__value = value
def acquire(self, blocking=1):
"""Acquire a semaphore, decrementing the internal counter by one.
When invoked without arguments: if the internal counter is larger than
zero on entry, decrement it by one and return immediately. If it is zero
on entry, block, waiting until some other thread has called release() to
make it larger than zero. This is done with proper interlocking so that
if multiple acquire() calls are blocked, release() will wake exactly one
of them up. The implementation may pick one at random, so the order in
which blocked threads are awakened should not be relied on. There is no
return value in this case.
When invoked with blocking set to true, do the same thing as when called
without arguments, and return true.
When invoked with blocking set to false, do not block. If a call without
an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
"""
rc = False
with self.__cond:
while self.__value == 0:
if not blocking:
break
if __debug__:
self._note("%s.acquire(%s): blocked waiting, value=%s",
self, blocking, self.__value)
self.__cond.wait()
else:
self.__value = self.__value - 1
if __debug__:
self._note("%s.acquire: success, value=%s",
self, self.__value)
rc = True
return rc
__enter__ = acquire
def release(self):
"""Release a semaphore, incrementing the internal counter by one.
When the counter is zero on entry and another thread is waiting for it
to become larger than zero again, wake up that thread.
"""
with self.__cond:
self.__value = self.__value + 1
if __debug__:
self._note("%s.release: success, value=%s",
self, self.__value)
self.__cond.notify()
def __exit__(self, t, v, tb):
self.release()
def BoundedSemaphore(*args, **kwargs):
"""A factory function that returns a new bounded semaphore.
A bounded semaphore checks to make sure its current value doesn't exceed its
initial value. If it does, ValueError is raised. In most situations
semaphores are used to guard resources with limited capacity.
If the semaphore is released too many times it's a sign of a bug. If not
given, value defaults to 1.
Like regular semaphores, bounded semaphores manage a counter representing
the number of release() calls minus the number of acquire() calls, plus an
initial value. The acquire() method blocks if necessary until it can return
without making the counter negative. If not given, value defaults to 1.
"""
return _BoundedSemaphore(*args, **kwargs)
class _BoundedSemaphore(_Semaphore):
"""A bounded semaphore checks to make sure its current value doesn't exceed
its initial value. If it does, ValueError is raised. In most situations
semaphores are used to guard resources with limited capacity.
"""
def __init__(self, value=1, verbose=None):
_Semaphore.__init__(self, value, verbose)
self._initial_value = value
def release(self):
"""Release a semaphore, incrementing the internal counter by one.
When the counter is zero on entry and another thread is waiting for it
to become larger than zero again, wake up that thread.
If the number of releases exceeds the number of acquires,
raise a ValueError.
"""
with self._Semaphore__cond:
if self._Semaphore__value >= self._initial_value:
raise ValueError("Semaphore released too many times")
self._Semaphore__value += 1
self._Semaphore__cond.notify()
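# Bounded semaphore sketch (capacity illustrative): two holders fit, a
# third non-blocking acquire fails, and over-releasing raises ValueError.
#
#     import threading
#     sem = threading.BoundedSemaphore(2)
#     sem.acquire(); sem.acquire()
#     print sem.acquire(False)   # False: counter is 0, would block
#     sem.release(); sem.release()
#     # sem.release() here would raise ValueError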
def Event(*args, **kwargs):
"""A factory function that returns a new event.
Events manage a flag that can be set to true with the set() method and reset
to false with the clear() method. The wait() method blocks until the flag is
true.
"""
return _Event(*args, **kwargs)
class _Event(_Verbose):
"""A factory function that returns a new event object. An event manages a
flag that can be set to true with the set() method and reset to false
with the clear() method. The wait() method blocks until the flag is true.
"""
# After Tim Peters' event class (without is_posted())
def __init__(self, verbose=None):
_Verbose.__init__(self, verbose)
self.__cond = Condition(Lock())
self.__flag = False
def _reset_internal_locks(self):
# private! called by Thread._reset_internal_locks by _after_fork()
self.__cond.__init__(Lock())
def isSet(self):
'Return true if and only if the internal flag is true.'
return self.__flag
is_set = isSet
def set(self):
"""Set the internal flag to true.
All threads waiting for the flag to become true are awakened. Threads
that call wait() once the flag is true will not block at all.
"""
with self.__cond:
self.__flag = True
self.__cond.notify_all()
def clear(self):
"""Reset the internal flag to false.
Subsequently, threads calling wait() will block until set() is called to
set the internal flag to true again.
"""
with self.__cond:
self.__flag = False
def wait(self, timeout=None):
"""Block until the internal flag is true.
If the internal flag is true on entry, return immediately. Otherwise,
block until another thread calls set() to set the flag to true, or until
the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
This method returns the internal flag on exit, so it will always return
True except if a timeout is given and the operation times out.
"""
with self.__cond:
if not self.__flag:
self.__cond.wait(timeout)
return self.__flag
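# Event sketch (names illustrative): every waiter is released at once
# when the flag is set, and later wait() calls return immediately.
#
#     import threading
#     ready = threading.Event()
#
#     def worker():
#         ready.wait()           # blocks until the flag becomes true
#         print "released"
#
#     t = threading.Thread(target=worker)
#     t.start()
#     ready.set()
#     t.join()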
# Helper to generate new thread names
_counter = _count().next
_counter() # Consume 0 so first non-main thread has id 1.
def _newname(template="Thread-%d"):
return template % _counter()
# Active thread administration
_active_limbo_lock = _allocate_lock()
_active = {} # maps thread id to Thread object
_limbo = {}
# Main class for threads
class Thread(_Verbose):
"""A class that represents a thread of control.
This class can be safely subclassed in a limited fashion.
"""
__initialized = False
# Need to store a reference to sys.exc_info for printing
# out exceptions when a thread tries to use a global var. during interp.
# shutdown and thus raises an exception about trying to perform some
# operation on/with a NoneType
__exc_info = _sys.exc_info
# Keep sys.exc_clear too to clear the exception just before
# allowing .join() to return.
__exc_clear = _sys.exc_clear
def __init__(self, group=None, target=None, name=None,
args=(), kwargs=None, verbose=None):
"""This constructor should always be called with keyword arguments. Arguments are:
*group* should be None; reserved for future extension when a ThreadGroup
class is implemented.
*target* is the callable object to be invoked by the run()
method. Defaults to None, meaning nothing is called.
*name* is the thread name. By default, a unique name is constructed of
the form "Thread-N" where N is a small decimal number.
*args* is the argument tuple for the target invocation. Defaults to ().
*kwargs* is a dictionary of keyword arguments for the target
invocation. Defaults to {}.
If a subclass overrides the constructor, it must make sure to invoke
the base class constructor (Thread.__init__()) before doing anything
else to the thread.
"""
assert group is None, "group argument must be None for now"
_Verbose.__init__(self, verbose)
if kwargs is None:
kwargs = {}
self.__target = target
self.__name = str(name or _newname())
self.__args = args
self.__kwargs = kwargs
self.__daemonic = self._set_daemon()
self.__ident = None
self.__started = Event()
self.__stopped = False
self.__block = Condition(Lock())
self.__initialized = True
# sys.stderr is not stored in the class like
# sys.exc_info since it can be changed between instances
self.__stderr = _sys.stderr
def _reset_internal_locks(self):
# private! Called by _after_fork() to reset our internal locks as
# they may be in an invalid state leading to a deadlock or crash.
if hasattr(self, '_Thread__block'): # DummyThread deletes self.__block
self.__block.__init__()
self.__started._reset_internal_locks()
@property
def _block(self):
# used by a unittest
return self.__block
def _set_daemon(self):
# Overridden in _MainThread and _DummyThread
return current_thread().daemon
def __repr__(self):
assert self.__initialized, "Thread.__init__() was not called"
status = "initial"
if self.__started.is_set():
status = "started"
if self.__stopped:
status = "stopped"
if self.__daemonic:
status += " daemon"
if self.__ident is not None:
status += " %s" % self.__ident
return "<%s(%s, %s)>" % (self.__class__.__name__, self.__name, status)
def start(self):
"""Start the thread's activity.
It must be called at most once per thread object. It arranges for the
object's run() method to be invoked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the
same thread object.
"""
if not self.__initialized:
raise RuntimeError("thread.__init__() not called")
if self.__started.is_set():
raise RuntimeError("threads can only be started once")
if __debug__:
self._note("%s.start(): starting thread", self)
with _active_limbo_lock:
_limbo[self] = self
try:
_start_new_thread(self.__bootstrap, ())
except Exception:
with _active_limbo_lock:
del _limbo[self]
raise
self.__started.wait()
def run(self):
"""Method representing the thread's activity.
You may override this method in a subclass. The standard run() method
invokes the callable object passed to the object's constructor as the
target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
"""
try:
if self.__target:
self.__target(*self.__args, **self.__kwargs)
finally:
# Avoid a refcycle if the thread is running a function with
# an argument that has a member that points to the thread.
del self.__target, self.__args, self.__kwargs
def __bootstrap(self):
# Wrapper around the real bootstrap code that ignores
# exceptions during interpreter cleanup. Those typically
# happen when a daemon thread wakes up at an unfortunate
# moment, finds the world around it destroyed, and raises some
# random exception *** while trying to report the exception in
# __bootstrap_inner() below ***. Those random exceptions
# don't help anybody, and they confuse users, so we suppress
# them. We suppress them only when it appears that the world
# indeed has already been destroyed, so that exceptions in
# __bootstrap_inner() during normal business hours are properly
# reported. Also, we only suppress them for daemonic threads;
# if a non-daemonic thread encounters this, something else is wrong.
try:
self.__bootstrap_inner()
except:
if self.__daemonic and _sys is None:
return
raise
def _set_ident(self):
self.__ident = _get_ident()
def __bootstrap_inner(self):
try:
self._set_ident()
self.__started.set()
with _active_limbo_lock:
_active[self.__ident] = self
del _limbo[self]
if __debug__:
self._note("%s.__bootstrap(): thread started", self)
if _trace_hook:
self._note("%s.__bootstrap(): registering trace hook", self)
_sys.settrace(_trace_hook)
if _profile_hook:
self._note("%s.__bootstrap(): registering profile hook", self)
_sys.setprofile(_profile_hook)
try:
self.run()
except SystemExit:
if __debug__:
self._note("%s.__bootstrap(): raised SystemExit", self)
except:
if __debug__:
self._note("%s.__bootstrap(): unhandled exception", self)
# If sys.stderr is no more (most likely from interpreter
# shutdown) use self.__stderr. Otherwise still use sys (as in
# _sys) in case sys.stderr was redefined since the creation of
# self.
if _sys and _sys.stderr is not None:
print>>_sys.stderr, ("Exception in thread %s:\n%s" %
(self.name, _format_exc()))
elif self.__stderr is not None:
# Do the best job possible w/o a huge amt. of code to
# approximate a traceback (code ideas from
# Lib/traceback.py)
exc_type, exc_value, exc_tb = self.__exc_info()
try:
print>>self.__stderr, (
"Exception in thread " + self.name +
" (most likely raised during interpreter shutdown):")
print>>self.__stderr, (
"Traceback (most recent call last):")
while exc_tb:
print>>self.__stderr, (
' File "%s", line %s, in %s' %
(exc_tb.tb_frame.f_code.co_filename,
exc_tb.tb_lineno,
exc_tb.tb_frame.f_code.co_name))
exc_tb = exc_tb.tb_next
print>>self.__stderr, ("%s: %s" % (exc_type, exc_value))
# Make sure that exc_tb gets deleted since it is a memory
# hog; deleting everything else is just for thoroughness
finally:
del exc_type, exc_value, exc_tb
else:
if __debug__:
self._note("%s.__bootstrap(): normal return", self)
finally:
# Prevent a race in
# test_threading.test_no_refcycle_through_target when
# the exception keeps the target alive past when we
# assert that it's dead.
self.__exc_clear()
finally:
with _active_limbo_lock:
self.__stop()
try:
# We don't call self.__delete() because it also
# grabs _active_limbo_lock.
del _active[_get_ident()]
except:
pass
def __stop(self):
# DummyThreads delete self.__block, but they have no waiters to
# notify anyway (join() is forbidden on them).
if not hasattr(self, '_Thread__block'):
return
self.__block.acquire()
self.__stopped = True
self.__block.notify_all()
self.__block.release()
def __delete(self):
"Remove current thread from the dict of currently running threads."
# Notes about running with dummy_thread:
#
# Must take care to not raise an exception if dummy_thread is being
# used (and thus this module is being used as an instance of
# dummy_threading). dummy_thread.get_ident() always returns -1 since
# there is only one thread if dummy_thread is being used. Thus
# len(_active) is always <= 1 here, and any Thread instance created
# overwrites the (if any) thread currently registered in _active.
#
# An instance of _MainThread is always created by 'threading'. This
# gets overwritten the instant an instance of Thread is created; both
# threads return -1 from dummy_thread.get_ident() and thus have the
# same key in the dict. So when the _MainThread instance created by
# 'threading' tries to clean itself up when atexit calls this method
# it gets a KeyError if another Thread instance was created.
#
# This all means that KeyError from trying to delete something from
# _active if dummy_threading is being used is a red herring. But
# since it isn't if dummy_threading is *not* being used then don't
# hide the exception.
try:
with _active_limbo_lock:
del _active[_get_ident()]
# There must not be any python code between the previous line
# and after the lock is released. Otherwise a tracing function
# could try to acquire the lock again in the same thread, (in
# current_thread()), and would block.
except KeyError:
if 'dummy_threading' not in _sys.modules:
raise
def join(self, timeout=None):
"""Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is
called terminates -- either normally or through an unhandled exception
or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof). As join() always returns None, you must call
isAlive() after join() to decide whether a timeout happened -- if the
thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will
block until the thread terminates.
A thread can be join()ed many times.
join() raises a RuntimeError if an attempt is made to join the current
thread as that would cause a deadlock. It is also an error to join() a
thread before it has been started and attempts to do so raises the same
exception.
"""
if not self.__initialized:
raise RuntimeError("Thread.__init__() not called")
if not self.__started.is_set():
raise RuntimeError("cannot join thread before it is started")
if self is current_thread():
raise RuntimeError("cannot join current thread")
if __debug__:
if not self.__stopped:
self._note("%s.join(): waiting until thread stops", self)
self.__block.acquire()
try:
if timeout is None:
while not self.__stopped:
self.__block.wait()
if __debug__:
self._note("%s.join(): thread stopped", self)
else:
deadline = _time() + timeout
while not self.__stopped:
delay = deadline - _time()
if delay <= 0:
if __debug__:
self._note("%s.join(): timed out", self)
break
self.__block.wait(delay)
else:
if __debug__:
self._note("%s.join(): thread stopped", self)
finally:
self.__block.release()
@property
def name(self):
"""A string used for identification purposes only.
It has no semantics. Multiple threads may be given the same name. The
initial name is set by the constructor.
"""
assert self.__initialized, "Thread.__init__() not called"
return self.__name
@name.setter
def name(self, name):
assert self.__initialized, "Thread.__init__() not called"
self.__name = str(name)
@property
def ident(self):
"""Thread identifier of this thread or None if it has not been started.
This is a nonzero integer. See the thread.get_ident() function. Thread
identifiers may be recycled when a thread exits and another thread is
created. The identifier is available even after the thread has exited.
"""
assert self.__initialized, "Thread.__init__() not called"
return self.__ident
def isAlive(self):
"""Return whether the thread is alive.
This method returns True just before the run() method starts until just
after the run() method terminates. The module function enumerate()
returns a list of all alive threads.
"""
assert self.__initialized, "Thread.__init__() not called"
return self.__started.is_set() and not self.__stopped
is_alive = isAlive
@property
def daemon(self):
"""A boolean value indicating whether this thread is a daemon thread (True) or not (False).
This must be set before start() is called, otherwise RuntimeError is
raised. Its initial value is inherited from the creating thread; the
main thread is not a daemon thread and therefore all threads created in
the main thread default to daemon = False.
The entire Python program exits when only daemon threads are left.
"""
assert self.__initialized, "Thread.__init__() not called"
return self.__daemonic
@daemon.setter
def daemon(self, daemonic):
if not self.__initialized:
raise RuntimeError("Thread.__init__() not called")
if self.__started.is_set():
raise RuntimeError("cannot set daemon status of active thread");
self.__daemonic = daemonic
def isDaemon(self):
return self.daemon
def setDaemon(self, daemonic):
self.daemon = daemonic
def getName(self):
return self.name
def setName(self, name):
self.name = name
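# The two supported usage styles, sketched with illustrative names:
# either pass a target callable, or subclass and override run().
#
#     import threading
#
#     def work(n):
#         print "crunching", n
#
#     t = threading.Thread(target=work, args=(42,))
#     t.start(); t.join()
#
#     class Worker(threading.Thread):
#         def run(self):
#             print "running in", self.name
#     Worker().start()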
# The timer class was contributed by Itamar Shtull-Trauring
def Timer(*args, **kwargs):
"""Factory function to create a Timer object.
Timers call a function after a specified number of seconds:
t = Timer(30.0, f, args=[], kwargs={})
t.start()
t.cancel() # stop the timer's action if it's still waiting
"""
return _Timer(*args, **kwargs)
class _Timer(Thread):
"""Call a function after a specified number of seconds:
t = Timer(30.0, f, args=[], kwargs={})
t.start()
t.cancel() # stop the timer's action if it's still waiting
"""
def __init__(self, interval, function, args=[], kwargs={}):
Thread.__init__(self)
self.interval = interval
self.function = function
self.args = args
self.kwargs = kwargs
self.finished = Event()
def cancel(self):
"""Stop the timer if it hasn't finished yet"""
self.finished.set()
def run(self):
self.finished.wait(self.interval)
if not self.finished.is_set():
self.function(*self.args, **self.kwargs)
self.finished.set()
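# Timer sketch (interval and callback illustrative): the callback fires
# once after the interval unless cancel() wins the race first.
#
#     import threading
#     def hello():
#         print "fired 1.5 seconds later"
#     t = threading.Timer(1.5, hello)
#     t.start()
#     # t.cancel() before the interval elapses suppresses the call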
# Special thread class to represent the main thread
# This is garbage collected through an exit handler
class _MainThread(Thread):
def __init__(self):
Thread.__init__(self, name="MainThread")
self._Thread__started.set()
self._set_ident()
with _active_limbo_lock:
_active[_get_ident()] = self
def _set_daemon(self):
return False
def _exitfunc(self):
self._Thread__stop()
t = _pickSomeNonDaemonThread()
if t:
if __debug__:
self._note("%s: waiting for other threads", self)
while t:
t.join()
t = _pickSomeNonDaemonThread()
if __debug__:
self._note("%s: exiting", self)
self._Thread__delete()
def _pickSomeNonDaemonThread():
for t in enumerate():
if not t.daemon and t.is_alive():
return t
return None
# Dummy thread class to represent threads not started here.
# These aren't garbage collected when they die, nor can they be waited for.
# If they invoke anything in threading.py that calls current_thread(), they
# leave an entry in the _active dict forever after.
# Their purpose is to return *something* from current_thread().
# They are marked as daemon threads so we won't wait for them
# when we exit (conforming to the previous semantics).
class _DummyThread(Thread):
def __init__(self):
Thread.__init__(self, name=_newname("Dummy-%d"))
# Thread.__block consumes an OS-level locking primitive, which
# can never be used by a _DummyThread. Since a _DummyThread
# instance is immortal, that's bad, so release this resource.
del self._Thread__block
self._Thread__started.set()
self._set_ident()
with _active_limbo_lock:
_active[_get_ident()] = self
def _set_daemon(self):
return True
def join(self, timeout=None):
assert False, "cannot join a dummy thread"
# Global API functions
def currentThread():
"""Return the current Thread object, corresponding to the caller's thread of control.
If the caller's thread of control was not created through the threading
module, a dummy thread object with limited functionality is returned.
"""
try:
return _active[_get_ident()]
except KeyError:
##print "current_thread(): no current thread for", _get_ident()
return _DummyThread()
current_thread = currentThread
def activeCount():
"""Return the number of Thread objects currently alive.
The returned count is equal to the length of the list returned by
enumerate().
"""
with _active_limbo_lock:
return len(_active) + len(_limbo)
active_count = activeCount
def _enumerate():
# Same as enumerate(), but without the lock. Internal use only.
return _active.values() + _limbo.values()
def enumerate():
"""Return a list of all Thread objects currently alive.
The list includes daemonic threads, dummy thread objects created by
current_thread(), and the main thread. It excludes terminated threads and
threads that have not yet been started.
"""
with _active_limbo_lock:
return _active.values() + _limbo.values()
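# Introspection sketch: in a fresh interpreter only the main thread is
# alive, so the census functions agree with each other.
#
#     import threading
#     print threading.current_thread().name   # MainThread
#     print threading.active_count()          # 1, absent other threads
#     for t in threading.enumerate():
#         print t.name, t.is_alive()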
from thread import stack_size
# Create the main thread object,
# and make it available for the interpreter
# (Py_Main) as threading._shutdown.
_shutdown = _MainThread()._exitfunc
# get thread-local implementation, either from the thread
# module, or from the python fallback
try:
from thread import _local as local
except ImportError:
from _threading_local import local
def _after_fork():
# This function is called by Python/ceval.c:PyEval_ReInitThreads which
# is called from PyOS_AfterFork. Here we cleanup threading module state
# that should not exist after a fork.
# Reset _active_limbo_lock, in case we forked while the lock was held
# by another (non-forked) thread. http://bugs.python.org/issue874900
global _active_limbo_lock
_active_limbo_lock = _allocate_lock()
# fork() only copied the current thread; clear references to others.
new_active = {}
current = current_thread()
with _active_limbo_lock:
for thread in _enumerate():
# Any lock/condition variable may be currently locked or in an
# invalid state, so we reinitialize them.
if hasattr(thread, '_reset_internal_locks'):
thread._reset_internal_locks()
if thread is current:
# There is only one active thread. We reset the ident to
# its new value since it can have changed.
ident = _get_ident()
thread._Thread__ident = ident
new_active[ident] = thread
else:
# All the others are already stopped.
thread._Thread__stop()
_limbo.clear()
_active.clear()
_active.update(new_active)
assert len(_active) == 1
# Self-test code
def _test():
class BoundedQueue(_Verbose):
def __init__(self, limit):
_Verbose.__init__(self)
self.mon = RLock()
self.rc = Condition(self.mon)
self.wc = Condition(self.mon)
self.limit = limit
self.queue = _deque()
def put(self, item):
self.mon.acquire()
while len(self.queue) >= self.limit:
self._note("put(%s): queue full", item)
self.wc.wait()
self.queue.append(item)
self._note("put(%s): appended, length now %d",
item, len(self.queue))
self.rc.notify()
self.mon.release()
def get(self):
self.mon.acquire()
while not self.queue:
self._note("get(): queue empty")
self.rc.wait()
item = self.queue.popleft()
self._note("get(): got %s, %d left", item, len(self.queue))
self.wc.notify()
self.mon.release()
return item
class ProducerThread(Thread):
def __init__(self, queue, quota):
Thread.__init__(self, name="Producer")
self.queue = queue
self.quota = quota
def run(self):
from random import random
counter = 0
while counter < self.quota:
counter = counter + 1
self.queue.put("%s.%d" % (self.name, counter))
_sleep(random() * 0.00001)
class ConsumerThread(Thread):
def __init__(self, queue, count):
Thread.__init__(self, name="Consumer")
self.queue = queue
self.count = count
def run(self):
while self.count > 0:
item = self.queue.get()
print item
self.count = self.count - 1
NP = 3
QL = 4
NI = 5
Q = BoundedQueue(QL)
P = []
for i in range(NP):
t = ProducerThread(Q, NI)
t.name = ("Producer-%d" % (i+1))
P.append(t)
C = ConsumerThread(Q, NI*NP)
for t in P:
t.start()
_sleep(0.000001)
C.start()
for t in P:
t.join()
C.join()
if __name__ == '__main__':
_test()
#! /usr/bin/env python
"""Tool for measuring execution time of small code snippets.
This module avoids a number of common traps for measuring execution
times. See also Tim Peters' introduction to the Algorithms chapter in
the Python Cookbook, published by O'Reilly.
Library usage: see the Timer class.
Command line usage:
python timeit.py [-n N] [-r N] [-s S] [-t] [-c] [-h] [--] [statement]
Options:
-n/--number N: how many times to execute 'statement' (default: see below)
-r/--repeat N: how many times to repeat the timer (default 3)
-s/--setup S: statement to be executed once initially (default 'pass')
-t/--time: use time.time() (default on Unix)
-c/--clock: use time.clock() (default on Windows)
-v/--verbose: print raw timing results; repeat for more digits precision
-h/--help: print this usage message and exit
--: separate options from statement, use when statement starts with -
statement: statement to be timed (default 'pass')
A multi-line statement may be given by specifying each line as a
separate argument; indented lines are possible by enclosing an
argument in quotes and using leading spaces. Multiple -s options are
treated similarly.
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
The difference in default timer function is because on Windows,
clock() has microsecond granularity but time()'s granularity is 1/60th
of a second; on Unix, clock() has 1/100th of a second granularity and
time() is much more precise. On either platform, the default timer
functions measure wall clock time, not the CPU time. This means that
other processes running on the same computer may interfere with the
timing. The best thing to do when accurate timing is necessary is to
repeat the timing a few times and use the best time. The -r option is
good for this; the default of 3 repetitions is probably enough in most
cases. On Unix, you can use clock() to measure CPU time.
Note: there is a certain baseline overhead associated with executing a
pass statement. The code here doesn't try to hide it, but you should
be aware of it. The baseline overhead can be measured by invoking the
program without arguments.
The baseline overhead differs between Python versions! Also, to
fairly compare older Python versions to Python 2.3, you may want to
use python -O for the older versions to avoid timing SET_LINENO
instructions.
"""
import gc
import sys
import time
try:
import itertools
except ImportError:
# Must be an older Python version (see timeit() below)
itertools = None
__all__ = ["Timer"]
dummy_src_name = "<timeit-src>"
default_number = 1000000
default_repeat = 3
if sys.platform == "win32":
# On Windows, the best timer is time.clock()
default_timer = time.clock
else:
# On most other platforms the best timer is time.time()
default_timer = time.time
# Don't change the indentation of the template; the reindent() calls
# in Timer.__init__() depend on setup being indented 4 spaces and stmt
# being indented 8 spaces.
template = """
def inner(_it, _timer%(init)s):
%(setup)s
_t0 = _timer()
for _i in _it:
%(stmt)s
_t1 = _timer()
return _t1 - _t0
"""
def reindent(src, indent):
"""Helper to reindent a multi-line statement."""
return src.replace("\n", "\n" + " "*indent)
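# For illustration (hypothetical input): reindent("x = 1\ny = 2", 4)
# returns "x = 1\n    y = 2" -- every line after the first gains four
# leading spaces, matching the template indentation described above.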
def _template_func(setup, func):
"""Create a timer function. Used if the "statement" is a callable."""
def inner(_it, _timer, _func=func):
setup()
_t0 = _timer()
for _i in _it:
_func()
_t1 = _timer()
return _t1 - _t0
return inner
class Timer:
"""Class for timing execution speed of small code snippets.
The constructor takes a statement to be timed, an additional
statement used for setup, and a timer function. Both statements
default to 'pass'; the timer function is platform-dependent (see
module doc string).
To measure the execution time of the first statement, use the
timeit() method. The repeat() method is a convenience to call
timeit() multiple times and return a list of results.
The statements may contain newlines, as long as they don't contain
multi-line string literals.
"""
def __init__(self, stmt="pass", setup="pass", timer=default_timer):
"""Constructor. See class doc string."""
self.timer = timer
ns = {}
if isinstance(stmt, basestring):
# Check that the code can be compiled outside a function
if isinstance(setup, basestring):
compile(setup, dummy_src_name, "exec")
compile(setup + '\n' + stmt, dummy_src_name, "exec")
else:
compile(stmt, dummy_src_name, "exec")
stmt = reindent(stmt, 8)
if isinstance(setup, basestring):
setup = reindent(setup, 4)
src = template % {'stmt': stmt, 'setup': setup, 'init': ''}
elif hasattr(setup, '__call__'):
src = template % {'stmt': stmt, 'setup': '_setup()',
'init': ', _setup=_setup'}
ns['_setup'] = setup
else:
raise ValueError("setup is neither a string nor callable")
self.src = src # Save for traceback display
code = compile(src, dummy_src_name, "exec")
exec code in globals(), ns
self.inner = ns["inner"]
elif hasattr(stmt, '__call__'):
self.src = None
if isinstance(setup, basestring):
_setup = setup
def setup():
exec _setup in globals(), ns
elif not hasattr(setup, '__call__'):
raise ValueError("setup is neither a string nor callable")
self.inner = _template_func(setup, stmt)
else:
raise ValueError("stmt is neither a string nor callable")
def print_exc(self, file=None):
"""Helper to print a traceback from the timed code.
Typical use:
t = Timer(...) # outside the try/except
try:
t.timeit(...) # or t.repeat(...)
except:
t.print_exc()
The advantage over the standard traceback is that source lines
in the compiled template will be displayed.
The optional file argument directs where the traceback is
sent; it defaults to sys.stderr.
"""
import linecache, traceback
if self.src is not None:
linecache.cache[dummy_src_name] = (len(self.src),
None,
self.src.split("\n"),
dummy_src_name)
# else the source is already stored somewhere else
traceback.print_exc(file=file)
def timeit(self, number=default_number):
"""Time 'number' executions of the main statement.
To be precise, this executes the setup statement once, and
then returns the time it takes to execute the main statement
a number of times, as a float measured in seconds. The
argument is the number of times through the loop, defaulting
to one million. The main statement, the setup statement and
the timer function to be used are passed to the constructor.
"""
if itertools:
it = itertools.repeat(None, number)
else:
it = [None] * number
gcold = gc.isenabled()
gc.disable()
try:
timing = self.inner(it, self.timer)
finally:
if gcold:
gc.enable()
return timing
def repeat(self, repeat=default_repeat, number=default_number):
"""Call timeit() a few times.
This is a convenience function that calls timeit()
repeatedly, returning a list of results. The first argument
specifies how many times to call timeit(), defaulting to 3;
the second argument specifies the 'number' argument, defaulting
to one million.
Note: it's tempting to calculate mean and standard deviation
from the result vector and report these. However, this is not
very useful. In a typical case, the lowest value gives a
lower bound for how fast your machine can run the given code
snippet; higher values in the result vector are typically not
caused by variability in Python's speed, but by other
processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be
interested in. After that, you should look at the entire
vector and apply common sense rather than statistics.
"""
r = []
for i in range(repeat):
t = self.timeit(number)
r.append(t)
return r
def timeit(stmt="pass", setup="pass", timer=default_timer,
number=default_number):
"""Convenience function to create Timer object and call timeit method."""
return Timer(stmt, setup, timer).timeit(number)
def repeat(stmt="pass", setup="pass", timer=default_timer,
repeat=default_repeat, number=default_number):
"""Convenience function to create Timer object and call repeat method."""
return Timer(stmt, setup, timer).repeat(repeat, number)
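# Example usage (a minimal sketch; the statement and setup strings are
# illustrative, not part of this module):
#
#     from timeit import Timer
#     t = Timer("d.get('x')", setup="d = {'x': 1}")
#     print t.timeit(100000)          # total seconds for 100000 runs
#     print min(t.repeat(3, 100000))  # best of three, per the advice above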
def main(args=None, _wrap_timer=None):
"""Main program, used when run as a script.
The optional 'args' argument specifies the command line to be parsed,
defaulting to sys.argv[1:].
The return value is an exit code to be passed to sys.exit(); it
may be None to indicate success.
When an exception happens during timing, a traceback is printed to
stderr and the return value is 1. Exceptions at other times
(including the template compilation) are not caught.
'_wrap_timer' is an internal interface used for unit testing. If it
is not None, it must be a callable that accepts a timer function
and returns another timer function (used for unit testing).
"""
if args is None:
args = sys.argv[1:]
import getopt
try:
opts, args = getopt.getopt(args, "n:s:r:tcvh",
["number=", "setup=", "repeat=",
"time", "clock", "verbose", "help"])
except getopt.error, err:
print err
print "use -h/--help for command line help"
return 2
timer = default_timer
stmt = "\n".join(args) or "pass"
number = 0 # auto-determine
setup = []
repeat = default_repeat
verbose = 0
precision = 3
for o, a in opts:
if o in ("-n", "--number"):
number = int(a)
if o in ("-s", "--setup"):
setup.append(a)
if o in ("-r", "--repeat"):
repeat = int(a)
if repeat <= 0:
repeat = 1
if o in ("-t", "--time"):
timer = time.time
if o in ("-c", "--clock"):
timer = time.clock
if o in ("-v", "--verbose"):
if verbose:
precision += 1
verbose += 1
if o in ("-h", "--help"):
print __doc__,
return 0
setup = "\n".join(setup) or "pass"
# Include the current directory, so that local imports work (sys.path
# contains the directory of this script, rather than the current
# directory)
import os
sys.path.insert(0, os.curdir)
if _wrap_timer is not None:
timer = _wrap_timer(timer)
t = Timer(stmt, setup, timer)
if number == 0:
# determine number so that 0.2 <= total time < 2.0
for i in range(1, 10):
number = 10**i
try:
x = t.timeit(number)
except:
t.print_exc()
return 1
if verbose:
print "%d loops -> %.*g secs" % (number, precision, x)
if x >= 0.2:
break
try:
r = t.repeat(repeat, number)
except:
t.print_exc()
return 1
best = min(r)
if verbose:
print "raw times:", " ".join(["%.*g" % (precision, x) for x in r])
print "%d loops," % number,
usec = best * 1e6 / number
if usec < 1000:
print "best of %d: %.*g usec per loop" % (repeat, precision, usec)
else:
msec = usec / 1000
if msec < 1000:
print "best of %d: %.*g msec per loop" % (repeat, precision, msec)
else:
sec = msec / 1000
print "best of %d: %.*g sec per loop" % (repeat, precision, sec)
return None
if __name__ == "__main__":
sys.exit(main())
"""Convert "arbitrary" sound files to AIFF (Apple and SGI's audio format).
Input may be compressed.
Uncompressed file type may be AIFF, WAV, VOC, 8SVX, NeXT/Sun, and others.
An exception is raised if the file is not of a recognized type.
Returned filename is either the input filename or a temporary filename;
in the latter case the caller must ensure that it is removed.
Other temporary files used are removed by the function.
"""
from warnings import warnpy3k
warnpy3k("the toaiff module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import os
import tempfile
import pipes
import sndhdr
__all__ = ["error", "toaiff"]
table = {}
t = pipes.Template()
t.append('sox -t au - -t aiff -r 8000 -', '--')
table['au'] = t
# XXX The following is actually sub-optimal.
# XXX The HCOM sampling rate can be 22k, 22k/2, 22k/3 or 22k/4.
# XXX We must force the output sampling rate else the SGI won't play
# XXX files sampled at 5.5k or 7.333k; however this means that files
# XXX sampled at 11k are unnecessarily expanded.
# XXX Similar comments apply to some other file types.
t = pipes.Template()
t.append('sox -t hcom - -t aiff -r 22050 -', '--')
table['hcom'] = t
t = pipes.Template()
t.append('sox -t voc - -t aiff -r 11025 -', '--')
table['voc'] = t
t = pipes.Template()
t.append('sox -t wav - -t aiff -', '--')
table['wav'] = t
t = pipes.Template()
t.append('sox -t 8svx - -t aiff -r 16000 -', '--')
table['8svx'] = t
t = pipes.Template()
t.append('sox -t sndt - -t aiff -r 16000 -', '--')
table['sndt'] = t
t = pipes.Template()
t.append('sox -t sndr - -t aiff -r 16000 -', '--')
table['sndr'] = t
uncompress = pipes.Template()
uncompress.append('uncompress', '--')
class error(Exception):
pass
def toaiff(filename):
temps = []
ret = None
try:
ret = _toaiff(filename, temps)
finally:
for temp in temps[:]:
if temp != ret:
try:
os.unlink(temp)
except os.error:
pass
temps.remove(temp)
return ret
def _toaiff(filename, temps):
if filename[-2:] == '.Z':
(fd, fname) = tempfile.mkstemp()
os.close(fd)
temps.append(fname)
sts = uncompress.copy(filename, fname)
if sts:
raise error, filename + ': uncompress failed'
else:
fname = filename
try:
ftype = sndhdr.whathdr(fname)
if ftype:
ftype = ftype[0] # All we're interested in
except IOError, msg:
if type(msg) == type(()) and len(msg) == 2 and \
type(msg[0]) == type(0) and type(msg[1]) == type(''):
msg = msg[1]
if type(msg) != type(''):
msg = repr(msg)
raise error, filename + ': ' + msg
if ftype == 'aiff':
return fname
if ftype is None or ftype not in table:
raise error, '%s: unsupported audio file type %r' % (filename, ftype)
(fd, temp) = tempfile.mkstemp()
os.close(fd)
temps.append(temp)
sts = table[ftype].copy(fname, temp)
if sts:
raise error, filename + ': conversion to aiff failed'
return temp
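# Example usage (a sketch; assumes the external sox/uncompress tools are
# installed and 'sample.wav' is a hypothetical input file):
#
#     import os, toaiff
#     aiff_name = toaiff.toaiff('sample.wav')
#     try:
#         pass  # ... read or play aiff_name ...
#     finally:
#         if aiff_name != 'sample.wav':
#             os.unlink(aiff_name)  # caller must remove a temporary result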
"""Token constants (from "token.h")."""
# This file is automatically generated; please don't muck it up!
#
# To update the symbols in this file, 'cd' to the top directory of
# the python source tree after building the interpreter and run:
#
# ./python Lib/token.py
#--start constants--
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
LPAR = 7
RPAR = 8
LSQB = 9
RSQB = 10
COLON = 11
COMMA = 12
SEMI = 13
PLUS = 14
MINUS = 15
STAR = 16
SLASH = 17
VBAR = 18
AMPER = 19
LESS = 20
GREATER = 21
EQUAL = 22
DOT = 23
PERCENT = 24
BACKQUOTE = 25
LBRACE = 26
RBRACE = 27
EQEQUAL = 28
NOTEQUAL = 29
LESSEQUAL = 30
GREATEREQUAL = 31
TILDE = 32
CIRCUMFLEX = 33
LEFTSHIFT = 34
RIGHTSHIFT = 35
DOUBLESTAR = 36
PLUSEQUAL = 37
MINEQUAL = 38
STAREQUAL = 39
SLASHEQUAL = 40
PERCENTEQUAL = 41
AMPEREQUAL = 42
VBAREQUAL = 43
CIRCUMFLEXEQUAL = 44
LEFTSHIFTEQUAL = 45
RIGHTSHIFTEQUAL = 46
DOUBLESTAREQUAL = 47
DOUBLESLASH = 48
DOUBLESLASHEQUAL = 49
AT = 50
OP = 51
ERRORTOKEN = 52
N_TOKENS = 53
NT_OFFSET = 256
#--end constants--
tok_name = {}
for _name, _value in globals().items():
if type(_value) is type(0):
tok_name[_value] = _name
del _name, _value
def ISTERMINAL(x):
return x < NT_OFFSET
def ISNONTERMINAL(x):
return x >= NT_OFFSET
def ISEOF(x):
return x == ENDMARKER
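# Example usage (a minimal sketch): mapping numeric token types back to
# names and classifying them with the predicates above:
#
#     import token
#     print token.tok_name[token.NAME]            # -> 'NAME'
#     print token.ISTERMINAL(token.NUMBER)        # -> True
#     print token.ISNONTERMINAL(token.NT_OFFSET)  # -> True
#     print token.ISEOF(token.ENDMARKER)          # -> True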
def main():
import re
import sys
args = sys.argv[1:]
inFileName = args and args[0] or "Include/token.h"
outFileName = "Lib/token.py"
if len(args) > 1:
outFileName = args[1]
try:
fp = open(inFileName)
except IOError, err:
sys.stdout.write("I/O error: %s\n" % str(err))
sys.exit(1)
lines = fp.read().split("\n")
fp.close()
prog = re.compile(
"#define[ \t][ \t]*([A-Z0-9][A-Z0-9_]*)[ \t][ \t]*([0-9][0-9]*)",
re.IGNORECASE)
tokens = {}
for line in lines:
match = prog.match(line)
if match:
name, val = match.group(1, 2)
val = int(val)
tokens[val] = name # reverse so we can sort them...
keys = tokens.keys()
keys.sort()
# load the output skeleton from the target:
try:
fp = open(outFileName)
except IOError, err:
sys.stderr.write("I/O error: %s\n" % str(err))
sys.exit(2)
format = fp.read().split("\n")
fp.close()
try:
start = format.index("#--start constants--") + 1
end = format.index("#--end constants--")
except ValueError:
sys.stderr.write("target does not contain format markers")
sys.exit(3)
lines = []
for val in keys:
lines.append("%s = %d" % (tokens[val], val))
format[start:end] = lines
try:
fp = open(outFileName, 'w')
except IOError, err:
sys.stderr.write("I/O error: %s\n" % str(err))
sys.exit(4)
fp.write("\n".join(format))
fp.close()
if __name__ == "__main__":
main()
"""Tokenization help for Python programs.
generate_tokens(readline) is a generator that breaks a stream of
text into Python tokens. It accepts a readline-like method which is called
repeatedly to get the next line of input (or "" for EOF). It generates
5-tuples with these members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators.
Older entry points
tokenize_loop(readline, tokeneater)
tokenize(readline, tokeneater=printtoken)
are the same, except instead of generating tokens, tokeneater is a callback
function to which the 5 fields described above are passed as 5 arguments,
each time a new token is found."""
__author__ = 'Ka-Ping Yee <[email protected]>'
__credits__ = ('GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, '
'Skip Montanaro, Raymond Hettinger')
from itertools import chain
import string, re
from token import *
import token
__all__ = [x for x in dir(token) if not x.startswith("_")]
__all__ += ["COMMENT", "tokenize", "generate_tokens", "NL", "untokenize"]
del x
del token
COMMENT = N_TOKENS
tok_name[COMMENT] = 'COMMENT'
NL = N_TOKENS + 1
tok_name[NL] = 'NL'
N_TOKENS += 2
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'
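# For illustration: group('a', 'b') yields '(a|b)', any('x') yields '(x)*',
# and maybe('y') yields '(y)?'; the patterns below are composed from these.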
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'[a-zA-Z_]\w*'
Hexnumber = r'0[xX][\da-fA-F]+[lL]?'
Octnumber = r'(0[oO][0-7]+)|(0[0-7]*)[lL]?'
Binnumber = r'0[bB][01]+[lL]?'
Decnumber = r'[1-9]\d*[lL]?'
Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?\d+'
Pointfloat = group(r'\d+\.\d*', r'\.\d+') + maybe(Exponent)
Expfloat = r'\d+' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'\d+[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)
# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
Triple = group("[uUbB]?[rR]?'''", '[uUbB]?[rR]?"""')
# Single-line ' or " string.
String = group(r"[uUbB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
r'[uUbB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*"')
# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"<>", r"!=",
r"//=?",
r"[+\-*/%&|^=<>]=?",
r"~")
Bracket = '[][(){}]'
Special = group(r'\r?\n', r'[:;.,`@]')
Funny = group(Operator, Bracket, Special)
PlainToken = group(Number, Funny, String, Name)
Token = Ignore + PlainToken
# First (or only) line of ' or " string.
ContStr = group(r"[uUbB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
group("'", r'\\\r?\n'),
r'[uUbB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n|\Z', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name)
tokenprog, pseudoprog, single3prog, double3prog = map(
re.compile, (Token, PseudoToken, Single3, Double3))
endprogs = {"'": re.compile(Single), '"': re.compile(Double),
"'''": single3prog, '"""': double3prog,
"r'''": single3prog, 'r"""': double3prog,
"u'''": single3prog, 'u"""': double3prog,
"ur'''": single3prog, 'ur"""': double3prog,
"R'''": single3prog, 'R"""': double3prog,
"U'''": single3prog, 'U"""': double3prog,
"uR'''": single3prog, 'uR"""': double3prog,
"Ur'''": single3prog, 'Ur"""': double3prog,
"UR'''": single3prog, 'UR"""': double3prog,
"b'''": single3prog, 'b"""': double3prog,
"br'''": single3prog, 'br"""': double3prog,
"B'''": single3prog, 'B"""': double3prog,
"bR'''": single3prog, 'bR"""': double3prog,
"Br'''": single3prog, 'Br"""': double3prog,
"BR'''": single3prog, 'BR"""': double3prog,
'r': None, 'R': None, 'u': None, 'U': None,
'b': None, 'B': None}
triple_quoted = {}
for t in ("'''", '"""',
"r'''", 'r"""', "R'''", 'R"""',
"u'''", 'u"""', "U'''", 'U"""',
"ur'''", 'ur"""', "Ur'''", 'Ur"""',
"uR'''", 'uR"""', "UR'''", 'UR"""',
"b'''", 'b"""', "B'''", 'B"""',
"br'''", 'br"""', "Br'''", 'Br"""',
"bR'''", 'bR"""', "BR'''", 'BR"""'):
triple_quoted[t] = t
single_quoted = {}
for t in ("'", '"',
"r'", 'r"', "R'", 'R"',
"u'", 'u"', "U'", 'U"',
"ur'", 'ur"', "Ur'", 'Ur"',
"uR'", 'uR"', "UR'", 'UR"',
"b'", 'b"', "B'", 'B"',
"br'", 'br"', "Br'", 'Br"',
"bR'", 'bR"', "BR'", 'BR"' ):
single_quoted[t] = t
tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
def printtoken(type, token, srow_scol, erow_ecol, line): # for testing
srow, scol = srow_scol
erow, ecol = erow_ecol
print "%d,%d-%d,%d:\t%s\t%s" % \
(srow, scol, erow, ecol, tok_name[type], repr(token))
def tokenize(readline, tokeneater=printtoken):
"""
The tokenize() function accepts two parameters: one representing the
input stream, and one providing an output mechanism for tokenize().
The first parameter, readline, must be a callable object which provides
the same interface as the readline() method of built-in file objects.
Each call to the function should return one line of input as a string.
The second parameter, tokeneater, must also be a callable object. It is
called once for each token, with five arguments, corresponding to the
tuples generated by generate_tokens().
"""
try:
tokenize_loop(readline, tokeneater)
except StopTokenizing:
pass
# backwards compatible interface
def tokenize_loop(readline, tokeneater):
for token_info in generate_tokens(readline):
tokeneater(*token_info)
class Untokenizer:
def __init__(self):
self.tokens = []
self.prev_row = 1
self.prev_col = 0
def add_whitespace(self, start):
row, col = start
if row < self.prev_row or row == self.prev_row and col < self.prev_col:
raise ValueError("start ({},{}) precedes previous end ({},{})"
.format(row, col, self.prev_row, self.prev_col))
row_offset = row - self.prev_row
if row_offset:
self.tokens.append("\\\n" * row_offset)
self.prev_col = 0
col_offset = col - self.prev_col
if col_offset:
self.tokens.append(" " * col_offset)
def untokenize(self, iterable):
it = iter(iterable)
indents = []
startline = False
for t in it:
if len(t) == 2:
self.compat(t, it)
break
tok_type, token, start, end, line = t
if tok_type == ENDMARKER:
break
if tok_type == INDENT:
indents.append(token)
continue
elif tok_type == DEDENT:
indents.pop()
self.prev_row, self.prev_col = end
continue
elif tok_type in (NEWLINE, NL):
startline = True
elif startline and indents:
indent = indents[-1]
if start[1] >= len(indent):
self.tokens.append(indent)
self.prev_col = len(indent)
startline = False
self.add_whitespace(start)
self.tokens.append(token)
self.prev_row, self.prev_col = end
if tok_type in (NEWLINE, NL):
self.prev_row += 1
self.prev_col = 0
return "".join(self.tokens)
def compat(self, token, iterable):
indents = []
toks_append = self.tokens.append
startline = token[0] in (NEWLINE, NL)
prevstring = False
for tok in chain([token], iterable):
toknum, tokval = tok[:2]
if toknum in (NAME, NUMBER):
tokval += ' '
# Insert a space between two consecutive strings
if toknum == STRING:
if prevstring:
tokval = ' ' + tokval
prevstring = True
else:
prevstring = False
if toknum == INDENT:
indents.append(tokval)
continue
elif toknum == DEDENT:
indents.pop()
continue
elif toknum in (NEWLINE, NL):
startline = True
elif startline and indents:
toks_append(indents[-1])
startline = False
toks_append(tokval)
def untokenize(iterable):
"""Transform tokens back into Python source code.
Each element returned by the iterable must be a token sequence
with at least two elements, a token number and token value. If
only two tokens are passed, the resulting output is poor.
Round-trip invariant for full input:
Untokenized source will match input source exactly
Round-trip invariant for limited input:
# Output text will tokenize back to the input
t1 = [tok[:2] for tok in generate_tokens(f.readline)]
newcode = untokenize(t1)
readline = iter(newcode.splitlines(1)).next
t2 = [tok[:2] for tok in generate_tokens(readline)]
assert t1 == t2
"""
ut = Untokenizer()
return ut.untokenize(iterable)
def generate_tokens(readline):
"""
The generate_tokens() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
readline() method of built-in file objects. Each call to the function
should return one line of input as a string. Alternatively, readline
can be a callable object that raises StopIteration when exhausted:
readline = open(myfile).next # Example of alternate readline
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed is the
logical line; continuation lines are included.
"""
lnum = parenlev = continued = 0
namechars, numchars = string.ascii_letters + '_', '0123456789'
contstr, needcont = '', 0
contline = None
indents = [0]
while 1: # loop over lines in stream
try:
line = readline()
except StopIteration:
line = ''
lnum += 1
pos, max = 0, len(line)
if contstr: # continued string
if not line:
raise TokenError, ("EOF in multi-line string", strstart)
endmatch = endprog.match(line)
if endmatch:
pos = end = endmatch.end(0)
yield (STRING, contstr + line[:end],
strstart, (lnum, end), contline + line)
contstr, needcont = '', 0
contline = None
elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
yield (ERRORTOKEN, contstr + line,
strstart, (lnum, len(line)), contline)
contstr = ''
contline = None
continue
else:
contstr = contstr + line
contline = contline + line
continue
elif parenlev == 0 and not continued: # new statement
if not line: break
column = 0
while pos < max: # measure leading whitespace
if line[pos] == ' ':
column += 1
elif line[pos] == '\t':
column = (column//tabsize + 1)*tabsize
elif line[pos] == '\f':
column = 0
else:
break
pos += 1
if pos == max:
break
if line[pos] in '#\r\n': # skip comments or blank lines
if line[pos] == '#':
comment_token = line[pos:].rstrip('\r\n')
nl_pos = pos + len(comment_token)
yield (COMMENT, comment_token,
(lnum, pos), (lnum, pos + len(comment_token)), line)
yield (NL, line[nl_pos:],
(lnum, nl_pos), (lnum, len(line)), line)
else:
yield ((NL, COMMENT)[line[pos] == '#'], line[pos:],
(lnum, pos), (lnum, len(line)), line)
continue
if column > indents[-1]: # count indents or dedents
indents.append(column)
yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
while column < indents[-1]:
if column not in indents:
raise IndentationError(
"unindent does not match any outer indentation level",
("<tokenize>", lnum, pos, line))
indents = indents[:-1]
yield (DEDENT, '', (lnum, pos), (lnum, pos), line)
else: # continued statement
if not line:
raise TokenError, ("EOF in multi-line statement", (lnum, 0))
continued = 0
while pos < max:
pseudomatch = pseudoprog.match(line, pos)
if pseudomatch: # scan for tokens
start, end = pseudomatch.span(1)
spos, epos, pos = (lnum, start), (lnum, end), end
if start == end:
continue
token, initial = line[start:end], line[start]
if initial in numchars or \
(initial == '.' and token != '.'): # ordinary number
yield (NUMBER, token, spos, epos, line)
elif initial in '\r\n':
yield (NL if parenlev > 0 else NEWLINE,
token, spos, epos, line)
elif initial == '#':
assert not token.endswith("\n")
yield (COMMENT, token, spos, epos, line)
elif token in triple_quoted:
endprog = endprogs[token]
endmatch = endprog.match(line, pos)
if endmatch: # all on one line
pos = endmatch.end(0)
token = line[start:pos]
yield (STRING, token, spos, (lnum, pos), line)
else:
strstart = (lnum, start) # multiple lines
contstr = line[start:]
contline = line
break
elif initial in single_quoted or \
token[:2] in single_quoted or \
token[:3] in single_quoted:
if token[-1] == '\n': # continued string
strstart = (lnum, start)
endprog = (endprogs[initial] or endprogs[token[1]] or
endprogs[token[2]])
contstr, needcont = line[start:], 1
contline = line
break
else: # ordinary string
yield (STRING, token, spos, epos, line)
elif initial in namechars: # ordinary name
yield (NAME, token, spos, epos, line)
elif initial == '\\': # continued stmt
continued = 1
else:
if initial in '([{':
parenlev += 1
elif initial in ')]}':
parenlev -= 1
yield (OP, token, spos, epos, line)
else:
yield (ERRORTOKEN, line[pos],
(lnum, pos), (lnum, pos+1), line)
pos += 1
for indent in indents[1:]: # pop remaining indent levels
yield (DEDENT, '', (lnum, 0), (lnum, 0), '')
yield (ENDMARKER, '', (lnum, 0), (lnum, 0), '')
if __name__ == '__main__': # testing
import sys
if len(sys.argv) > 1:
tokenize(open(sys.argv[1]).readline)
else:
tokenize(sys.stdin.readline)
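# Example usage of generate_tokens() (a minimal sketch; the source string
# is illustrative):
#
#     from StringIO import StringIO
#     import tokenize
#     src = "x = 1\n"
#     readline = StringIO(src).readline
#     for tok_type, tok_string, start, end, line in \
#             tokenize.generate_tokens(readline):
#         print tokenize.tok_name[tok_type], repr(tok_string)
#     # prints NAME 'x', OP '=', NUMBER '1', NEWLINE '\n', ENDMARKER ''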
#!/usr/bin/env python
# portions copyright 2001, Autonomous Zones Industries, Inc., all rights...
# err... reserved and offered to the public under the terms of the
# Python 2.2 license.
# Author: Zooko O'Whielacronx
# http://zooko.com/
# mailto:[email protected]
#
# Copyright 2000, Mojam Media, Inc., all rights reserved.
# Author: Skip Montanaro
#
# Copyright 1999, Bioreason, Inc., all rights reserved.
# Author: Andrew Dalke
#
# Copyright 1995-1997, Automatrix, Inc., all rights reserved.
# Author: Skip Montanaro
#
# Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved.
#
#
# Permission to use, copy, modify, and distribute this Python software and
# its associated documentation for any purpose without fee is hereby
# granted, provided that the above copyright notice appears in all copies,
# and that both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of neither Automatrix,
# Bioreason or Mojam Media be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior permission.
#
"""program/module to trace Python program or function execution
Sample use, command line:
trace.py -c -f counts --ignore-dir '$prefix' spam.py eggs
trace.py -t --ignore-dir '$prefix' spam.py eggs
trace.py --trackcalls spam.py eggs
Sample use, programmatically
import sys
# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both.
tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix,], trace=0,
count=1)
# run the new command using the given tracer
tracer.run('main()')
# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
"""
import linecache
import os
import re
import sys
import time
import token
import tokenize
import inspect
import gc
import dis
try:
import cPickle
pickle = cPickle
except ImportError:
import pickle
try:
import threading
except ImportError:
_settrace = sys.settrace
def _unsettrace():
sys.settrace(None)
else:
def _settrace(func):
threading.settrace(func)
sys.settrace(func)
def _unsettrace():
sys.settrace(None)
threading.settrace(None)
def usage(outfile):
outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]
Meta-options:
--help Display this help then exit.
--version Output version information then exit.
Otherwise, exactly one of the following three options must be given:
-t, --trace Print each line to sys.stdout before it is executed.
-c, --count Count the number of times each line is executed
and write the counts to <module>.cover for each
module executed, in the module's directory.
See also `--coverdir', `--file', `--no-report' below.
-l, --listfuncs Keep track of which functions are executed at least
once and write the results to sys.stdout after the
program exits.
-T, --trackcalls Keep track of caller/called pairs and write the
results to sys.stdout after the program exits.
-r, --report Generate a report from a counts file; do not execute
any code. `--file' must specify the results file to
read, which must have been created in a previous run
with `--count --file=FILE'.
Modifiers:
-f, --file=<file> File to accumulate counts over several runs.
-R, --no-report Do not generate the coverage report files.
Useful if you want to accumulate over several runs.
-C, --coverdir=<dir> Directory where the report files go. The coverage
report for <package>.<module> is written to file
<dir>/<package>/<module>.cover.
-m, --missing Annotate executable lines that were not executed
with '>>>>>> '.
-s, --summary Write a brief summary on stdout for each file.
(Can only be used with --count or --report.)
-g, --timing Prefix each line with the time since the program started.
Only used while tracing.
Filters, may be repeated multiple times:
--ignore-module=<mod> Ignore the given module(s) and its submodules
(if it is a package). Accepts comma separated
list of module names
--ignore-dir=<dir> Ignore files in the given directory (multiple
directories can be joined by os.pathsep).
""" % sys.argv[0])
PRAGMA_NOCOVER = "#pragma NO COVER"
# Simple rx to find lines with no code.
rx_blank = re.compile(r'^\s*(#.*)?$')
class Ignore:
def __init__(self, modules = None, dirs = None):
self._mods = modules or []
self._dirs = dirs or []
self._dirs = map(os.path.normpath, self._dirs)
self._ignore = { '<string>': 1 }
def names(self, filename, modulename):
if modulename in self._ignore:
return self._ignore[modulename]
# haven't seen this one before, so see if the module name is
# on the ignore list. Need to take some care since ignoring
# "cmp" musn't mean ignoring "cmpcache" but ignoring
# "Spam" must also mean ignoring "Spam.Eggs".
for mod in self._mods:
if mod == modulename: # Identical names, so ignore
self._ignore[modulename] = 1
return 1
# check if the module is a proper submodule of something on
# the ignore list
n = len(mod)
# (will not overflow since if the first n characters are the
# same and the name has not already occurred, then the size
# of "name" is greater than that of "mod")
if mod == modulename[:n] and modulename[n] == '.':
self._ignore[modulename] = 1
return 1
# Now check that __file__ isn't in one of the directories
if filename is None:
# must be a built-in, so we must ignore
self._ignore[modulename] = 1
return 1
# Ignore a file when it contains one of the ignorable paths
for d in self._dirs:
# The '+ os.sep' is to ensure that d is a parent directory,
# as compared to cases like:
# d = "/usr/local"
# filename = "/usr/local.py"
# or
# d = "/usr/local.py"
# filename = "/usr/local.py"
if filename.startswith(d + os.sep):
self._ignore[modulename] = 1
return 1
# Tried the different ways, so we don't ignore this module
self._ignore[modulename] = 0
return 0
def modname(path):
"""Return a plausible module name for the patch."""
base = os.path.basename(path)
filename, ext = os.path.splitext(base)
return filename
def fullmodname(path):
"""Return a plausible module name for the path."""
# If the file 'path' is part of a package, then the filename isn't
# enough to uniquely identify it. Try to do the right thing by
# looking in sys.path for the longest matching prefix. We'll
# assume that the rest is the package name.
comparepath = os.path.normcase(path)
longest = ""
for dir in sys.path:
dir = os.path.normcase(dir)
if comparepath.startswith(dir) and comparepath[len(dir)] == os.sep:
if len(dir) > len(longest):
longest = dir
if longest:
base = path[len(longest) + 1:]
else:
base = path
# the drive letter is never part of the module name
drive, base = os.path.splitdrive(base)
base = base.replace(os.sep, ".")
if os.altsep:
base = base.replace(os.altsep, ".")
filename, ext = os.path.splitext(base)
return filename.lstrip(".")
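# For illustration (hypothetical paths): modname("/a/b/spam.py") returns
# "spam", while fullmodname() would map a file found under a sys.path
# entry, e.g. a site-packages file "email/utils.py", to "email.utils".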
class CoverageResults:
def __init__(self, counts=None, calledfuncs=None, infile=None,
callers=None, outfile=None):
self.counts = counts
if self.counts is None:
self.counts = {}
self.counter = self.counts.copy() # map (filename, lineno) to count
self.calledfuncs = calledfuncs
if self.calledfuncs is None:
self.calledfuncs = {}
self.calledfuncs = self.calledfuncs.copy()
self.callers = callers
if self.callers is None:
self.callers = {}
self.callers = self.callers.copy()
self.infile = infile
self.outfile = outfile
if self.infile:
# Try to merge existing counts file.
try:
counts, calledfuncs, callers = \
pickle.load(open(self.infile, 'rb'))
self.update(self.__class__(counts, calledfuncs, callers))
except (IOError, EOFError, ValueError), err:
print >> sys.stderr, ("Skipping counts file %r: %s"
% (self.infile, err))
def update(self, other):
"""Merge in the data from another CoverageResults"""
counts = self.counts
calledfuncs = self.calledfuncs
callers = self.callers
other_counts = other.counts
other_calledfuncs = other.calledfuncs
other_callers = other.callers
for key in other_counts.keys():
counts[key] = counts.get(key, 0) + other_counts[key]
for key in other_calledfuncs.keys():
calledfuncs[key] = 1
for key in other_callers.keys():
callers[key] = 1
def write_results(self, show_missing=True, summary=False, coverdir=None):
"""
@param coverdir
"""
if self.calledfuncs:
print
print "functions called:"
calls = self.calledfuncs.keys()
calls.sort()
for filename, modulename, funcname in calls:
print ("filename: %s, modulename: %s, funcname: %s"
% (filename, modulename, funcname))
if self.callers:
print
print "calling relationships:"
calls = self.callers.keys()
calls.sort()
lastfile = lastcfile = ""
for ((pfile, pmod, pfunc), (cfile, cmod, cfunc)) in calls:
if pfile != lastfile:
print
print "***", pfile, "***"
lastfile = pfile
lastcfile = ""
if cfile != pfile and lastcfile != cfile:
print " -->", cfile
lastcfile = cfile
print " %s.%s -> %s.%s" % (pmod, pfunc, cmod, cfunc)
# turn the counts data ("(filename, lineno) = count") into something
# accessible on a per-file basis
per_file = {}
for filename, lineno in self.counts.keys():
lines_hit = per_file[filename] = per_file.get(filename, {})
lines_hit[lineno] = self.counts[(filename, lineno)]
# accumulate summary info, if needed
sums = {}
for filename, count in per_file.iteritems():
# skip some "files" we don't care about...
if filename == "<string>":
continue
if filename.startswith("<doctest "):
continue
if filename.endswith((".pyc", ".pyo")):
filename = filename[:-1]
if coverdir is None:
dir = os.path.dirname(os.path.abspath(filename))
modulename = modname(filename)
else:
dir = coverdir
if not os.path.exists(dir):
os.makedirs(dir)
modulename = fullmodname(filename)
# If desired, get a list of the line numbers which represent
# executable content (returned as a dict for better lookup speed)
if show_missing:
lnotab = find_executable_linenos(filename)
else:
lnotab = {}
source = linecache.getlines(filename)
coverpath = os.path.join(dir, modulename + ".cover")
n_hits, n_lines = self.write_results_file(coverpath, source,
lnotab, count)
if summary and n_lines:
percent = 100 * n_hits // n_lines
sums[modulename] = n_lines, percent, modulename, filename
if summary and sums:
mods = sums.keys()
mods.sort()
print "lines cov% module (path)"
for m in mods:
n_lines, percent, modulename, filename = sums[m]
print "%5d %3d%% %s (%s)" % sums[m]
if self.outfile:
# try and store counts and module info into self.outfile
try:
pickle.dump((self.counts, self.calledfuncs, self.callers),
open(self.outfile, 'wb'), 1)
except IOError, err:
print >> sys.stderr, "Can't save counts files because %s" % err
def write_results_file(self, path, lines, lnotab, lines_hit):
"""Return a coverage results file in path."""
try:
outfile = open(path, "w")
except IOError, err:
print >> sys.stderr, ("trace: Could not open %r for writing: %s "
"- skipping" % (path, err))
return 0, 0
n_lines = 0
n_hits = 0
for i, line in enumerate(lines):
lineno = i + 1
# do the blank/comment match to try to mark more lines
# (help the reader find stuff that hasn't been covered)
if lineno in lines_hit:
outfile.write("%5d: " % lines_hit[lineno])
n_hits += 1
n_lines += 1
elif rx_blank.match(line):
outfile.write(" ")
else:
# lines preceded by no marks weren't hit
# Highlight them if so indicated, unless the line contains
# #pragma: NO COVER
if lineno in lnotab and PRAGMA_NOCOVER not in lines[i]:
outfile.write(">>>>>> ")
n_lines += 1
else:
outfile.write(" ")
outfile.write(lines[i].expandtabs(8))
outfile.close()
return n_hits, n_lines
def find_lines_from_code(code, strs):
"""Return dict where keys are lines in the line number table."""
linenos = {}
for _, lineno in dis.findlinestarts(code):
if lineno not in strs:
linenos[lineno] = 1
return linenos
def find_lines(code, strs):
"""Return lineno dict for all code objects reachable from code."""
# get all of the lineno information from the code of this scope level
linenos = find_lines_from_code(code, strs)
# and check the constants for references to other code objects
for c in code.co_consts:
if inspect.iscode(c):
# find another code object, so recurse into it
linenos.update(find_lines(c, strs))
return linenos
def find_strings(filename):
"""Return a dict of possible docstring positions.
The dict maps line numbers to strings. There is an entry for
line that contains only a string or a part of a triple-quoted
string.
"""
d = {}
# If the first token is a string, then it's the module docstring.
# Add this special case so that the test in the loop passes.
prev_ttype = token.INDENT
f = open(filename)
for ttype, tstr, start, end, line in tokenize.generate_tokens(f.readline):
if ttype == token.STRING:
if prev_ttype == token.INDENT:
sline, scol = start
eline, ecol = end
for i in range(sline, eline + 1):
d[i] = 1
prev_ttype = ttype
f.close()
return d
def find_executable_linenos(filename):
"""Return dict where keys are line numbers in the line number table."""
try:
prog = open(filename, "rU").read()
except IOError, err:
print >> sys.stderr, ("Not printing coverage data for %r: %s"
% (filename, err))
return {}
code = compile(prog, filename, "exec")
strs = find_strings(filename)
return find_lines(code, strs)
class Trace:
def __init__(self, count=1, trace=1, countfuncs=0, countcallers=0,
ignoremods=(), ignoredirs=(), infile=None, outfile=None,
timing=False):
"""
@param count true iff it should count number of times each
line is executed
@param trace true iff it should print out each line that is
being counted
@param countfuncs true iff it should just output a list of
(filename, modulename, funcname,) for functions
that were called at least once; This overrides
`count' and `trace'
@param ignoremods a list of the names of modules to ignore
@param ignoredirs a list of the names of directories to ignore
all of the (recursive) contents of
@param infile file from which to read stored counts to be
added into the results
@param outfile file in which to write the results
@param timing true iff timing information be displayed
"""
self.infile = infile
self.outfile = outfile
self.ignore = Ignore(ignoremods, ignoredirs)
self.counts = {} # keys are (filename, linenumber)
self.blabbed = {} # for debugging
self.pathtobasename = {} # for memoizing os.path.basename
self.donothing = 0
self.trace = trace
self._calledfuncs = {}
self._callers = {}
self._caller_cache = {}
self.start_time = None
if timing:
self.start_time = time.time()
if countcallers:
self.globaltrace = self.globaltrace_trackcallers
elif countfuncs:
self.globaltrace = self.globaltrace_countfuncs
elif trace and count:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_trace_and_count
elif trace:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_trace
elif count:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_count
else:
# Ahem -- do nothing? Okay.
self.donothing = 1
def run(self, cmd):
import __main__
dict = __main__.__dict__
self.runctx(cmd, dict, dict)
def runctx(self, cmd, globals=None, locals=None):
if globals is None: globals = {}
if locals is None: locals = {}
if not self.donothing:
_settrace(self.globaltrace)
try:
exec cmd in globals, locals
finally:
if not self.donothing:
_unsettrace()
def runfunc(self, func, *args, **kw):
result = None
if not self.donothing:
sys.settrace(self.globaltrace)
try:
result = func(*args, **kw)
finally:
if not self.donothing:
sys.settrace(None)
return result
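# Example usage of runfunc() (a sketch; 'compute' is a hypothetical
# zero-argument function defined by the caller):
#
#     tracer = Trace(count=1, trace=0)
#     result = tracer.runfunc(compute)
#     tracer.results().write_results(coverdir="/tmp")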
def file_module_function_of(self, frame):
code = frame.f_code
filename = code.co_filename
if filename:
modulename = modname(filename)
else:
modulename = None
funcname = code.co_name
clsname = None
if code in self._caller_cache:
if self._caller_cache[code] is not None:
clsname = self._caller_cache[code]
else:
self._caller_cache[code] = None
## use of gc.get_referrers() was suggested by Michael Hudson
# all functions which refer to this code object
funcs = [f for f in gc.get_referrers(code)
if inspect.isfunction(f)]
# require len(funcs) == 1 to avoid ambiguity caused by calls to
# new.function(): "In the face of ambiguity, refuse the
# temptation to guess."
if len(funcs) == 1:
dicts = [d for d in gc.get_referrers(funcs[0])
if isinstance(d, dict)]
if len(dicts) == 1:
classes = [c for c in gc.get_referrers(dicts[0])
if hasattr(c, "__bases__")]
if len(classes) == 1:
# ditto for new.classobj()
clsname = classes[0].__name__
# cache the result - assumption is that new.* is
# not called later to disturb this relationship
# _caller_cache could be flushed if functions in
# the new module get called.
self._caller_cache[code] = clsname
if clsname is not None:
funcname = "%s.%s" % (clsname, funcname)
return filename, modulename, funcname
def globaltrace_trackcallers(self, frame, why, arg):
"""Handler for call events.
Adds information about who called who to the self._callers dict.
"""
if why == 'call':
# XXX Should do a better job of identifying methods
this_func = self.file_module_function_of(frame)
parent_func = self.file_module_function_of(frame.f_back)
self._callers[(parent_func, this_func)] = 1
def globaltrace_countfuncs(self, frame, why, arg):
"""Handler for call events.
Adds (filename, modulename, funcname) to the self._calledfuncs dict.
"""
if why == 'call':
this_func = self.file_module_function_of(frame)
self._calledfuncs[this_func] = 1
def globaltrace_lt(self, frame, why, arg):
"""Handler for call events.
If the code block being entered is to be ignored, returns `None',
else returns self.localtrace.
"""
if why == 'call':
code = frame.f_code
filename = frame.f_globals.get('__file__', None)
if filename:
# XXX modname() doesn't work right for packages, so
# the ignore support won't work right for packages
modulename = modname(filename)
if modulename is not None:
ignore_it = self.ignore.names(filename, modulename)
if not ignore_it:
if self.trace:
print (" --- modulename: %s, funcname: %s"
% (modulename, code.co_name))
return self.localtrace
else:
return None
def localtrace_trace_and_count(self, frame, why, arg):
if why == "line":
# record the file name and line number of every trace
filename = frame.f_code.co_filename
lineno = frame.f_lineno
key = filename, lineno
self.counts[key] = self.counts.get(key, 0) + 1
if self.start_time:
print '%.2f' % (time.time() - self.start_time),
bname = os.path.basename(filename)
print "%s(%d): %s" % (bname, lineno,
linecache.getline(filename, lineno)),
return self.localtrace
def localtrace_trace(self, frame, why, arg):
if why == "line":
# record the file name and line number of every trace
filename = frame.f_code.co_filename
lineno = frame.f_lineno
if self.start_time:
print '%.2f' % (time.time() - self.start_time),
bname = os.path.basename(filename)
print "%s(%d): %s" % (bname, lineno,
linecache.getline(filename, lineno)),
return self.localtrace
def localtrace_count(self, frame, why, arg):
if why == "line":
filename = frame.f_code.co_filename
lineno = frame.f_lineno
key = filename, lineno
self.counts[key] = self.counts.get(key, 0) + 1
return self.localtrace
def results(self):
return CoverageResults(self.counts, infile=self.infile,
outfile=self.outfile,
calledfuncs=self._calledfuncs,
callers=self._callers)
def _err_exit(msg):
sys.stderr.write("%s: %s\n" % (sys.argv[0], msg))
sys.exit(1)
def main(argv=None):
import getopt
if argv is None:
argv = sys.argv
try:
opts, prog_argv = getopt.getopt(argv[1:], "tcrRf:d:msC:lTg",
["help", "version", "trace", "count",
"report", "no-report", "summary",
"file=", "missing",
"ignore-module=", "ignore-dir=",
"coverdir=", "listfuncs",
"trackcalls", "timing"])
except getopt.error, msg:
sys.stderr.write("%s: %s\n" % (sys.argv[0], msg))
sys.stderr.write("Try `%s --help' for more information\n"
% sys.argv[0])
sys.exit(1)
trace = 0
count = 0
report = 0
no_report = 0
counts_file = None
missing = 0
ignore_modules = []
ignore_dirs = []
coverdir = None
summary = 0
listfuncs = False
countcallers = False
timing = False
for opt, val in opts:
if opt == "--help":
usage(sys.stdout)
sys.exit(0)
if opt == "--version":
sys.stdout.write("trace 2.0\n")
sys.exit(0)
if opt == "-T" or opt == "--trackcalls":
countcallers = True
continue
if opt == "-l" or opt == "--listfuncs":
listfuncs = True
continue
if opt == "-g" or opt == "--timing":
timing = True
continue
if opt == "-t" or opt == "--trace":
trace = 1
continue
if opt == "-c" or opt == "--count":
count = 1
continue
if opt == "-r" or opt == "--report":
report = 1
continue
if opt == "-R" or opt == "--no-report":
no_report = 1
continue
if opt == "-f" or opt == "--file":
counts_file = val
continue
if opt == "-m" or opt == "--missing":
missing = 1
continue
if opt == "-C" or opt == "--coverdir":
coverdir = val
continue
if opt == "-s" or opt == "--summary":
summary = 1
continue
if opt == "--ignore-module":
for mod in val.split(","):
ignore_modules.append(mod.strip())
continue
if opt == "--ignore-dir":
for s in val.split(os.pathsep):
s = os.path.expandvars(s)
# should I also call expanduser? (after all, could use $HOME)
s = s.replace("$prefix",
os.path.join(sys.prefix, "lib",
"python" + sys.version[:3]))
s = s.replace("$exec_prefix",
os.path.join(sys.exec_prefix, "lib",
"python" + sys.version[:3]))
s = os.path.normpath(s)
ignore_dirs.append(s)
continue
assert 0, "Should never get here"
if listfuncs and (count or trace):
_err_exit("cannot specify both --listfuncs and (--trace or --count)")
if not (count or trace or report or listfuncs or countcallers):
_err_exit("must specify one of --trace, --count, --report, "
"--listfuncs, or --trackcalls")
if report and no_report:
_err_exit("cannot specify both --report and --no-report")
if report and not counts_file:
_err_exit("--report requires a --file")
if no_report and len(prog_argv) == 0:
_err_exit("missing name of file to run")
# everything is ready
if report:
results = CoverageResults(infile=counts_file, outfile=counts_file)
results.write_results(missing, summary=summary, coverdir=coverdir)
else:
sys.argv = prog_argv
progname = prog_argv[0]
sys.path[0] = os.path.split(progname)[0]
t = Trace(count, trace, countfuncs=listfuncs,
countcallers=countcallers, ignoremods=ignore_modules,
ignoredirs=ignore_dirs, infile=counts_file,
outfile=counts_file, timing=timing)
try:
with open(progname) as fp:
code = compile(fp.read(), progname, 'exec')
# try to emulate __main__ namespace as much as possible
globs = {
'__file__': progname,
'__name__': '__main__',
'__package__': None,
'__cached__': None,
}
t.runctx(code, globs, globs)
except IOError, err:
_err_exit("Cannot run file %r because: %s" % (sys.argv[0], err))
except SystemExit:
pass
results = t.results()
if not no_report:
results.write_results(missing, summary=summary, coverdir=coverdir)
if __name__=='__main__':
main()
"""Extract, format and print information about Python stack traces."""
import linecache
import sys
import types
__all__ = ['extract_stack', 'extract_tb', 'format_exception',
'format_exception_only', 'format_list', 'format_stack',
'format_tb', 'print_exc', 'format_exc', 'print_exception',
'print_last', 'print_stack', 'print_tb', 'tb_lineno']
def _print(file, str='', terminator='\n'):
file.write(str+terminator)
def print_list(extracted_list, file=None):
"""Print the list of tuples as returned by extract_tb() or
extract_stack() as a formatted stack trace to the given file."""
if file is None:
file = sys.stderr
for filename, lineno, name, line in extracted_list:
_print(file,
' File "%s", line %d, in %s' % (filename,lineno,name))
if line:
_print(file, ' %s' % line.strip())
def format_list(extracted_list):
"""Format a list of traceback entry tuples for printing.
Given a list of tuples as returned by extract_tb() or
extract_stack(), return a list of strings ready for printing.
Each string in the resulting list corresponds to the item with the
same index in the argument list. Each string ends in a newline;
the strings may contain internal newlines as well, for those items
whose source text line is not None.
"""
list = []
for filename, lineno, name, line in extracted_list:
item = ' File "%s", line %d, in %s\n' % (filename,lineno,name)
if line:
item = item + ' %s\n' % line.strip()
list.append(item)
return list
def print_tb(tb, limit=None, file=None):
"""Print up to 'limit' stack trace entries from the traceback 'tb'.
If 'limit' is omitted or None, all entries are printed. If 'file'
is omitted or None, the output goes to sys.stderr; otherwise
'file' should be an open file or file-like object with a write()
method.
"""
if file is None:
file = sys.stderr
if limit is None:
if hasattr(sys, 'tracebacklimit'):
limit = sys.tracebacklimit
n = 0
while tb is not None and (limit is None or n < limit):
f = tb.tb_frame
lineno = tb.tb_lineno
co = f.f_code
filename = co.co_filename
name = co.co_name
_print(file,
' File "%s", line %d, in %s' % (filename, lineno, name))
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
if line: _print(file, ' ' + line.strip())
tb = tb.tb_next
n = n+1
def format_tb(tb, limit = None):
"""A shorthand for 'format_list(extract_tb(tb, limit))'."""
return format_list(extract_tb(tb, limit))
def extract_tb(tb, limit = None):
"""Return list of up to limit pre-processed entries from traceback.
This is useful for alternate formatting of stack traces. If
'limit' is omitted or None, all entries are extracted. A
pre-processed stack trace entry is a quadruple (filename, line
number, function name, text) representing the information that is
usually printed for a stack trace. The text is a string with
leading and trailing whitespace stripped; if the source is not
available it is None.
"""
if limit is None:
if hasattr(sys, 'tracebacklimit'):
limit = sys.tracebacklimit
list = []
n = 0
while tb is not None and (limit is None or n < limit):
f = tb.tb_frame
lineno = tb.tb_lineno
co = f.f_code
filename = co.co_filename
name = co.co_name
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
if line: line = line.strip()
else: line = None
list.append((filename, lineno, name, line))
tb = tb.tb_next
n = n+1
return list
def print_exception(etype, value, tb, limit=None, file=None):
"""Print exception up to 'limit' stack trace entries from 'tb' to 'file'.
This differs from print_tb() in the following ways: (1) if
traceback is not None, it prints a header "Traceback (most recent
call last):"; (2) it prints the exception type and value after the
stack trace; (3) if type is SyntaxError and value has the
appropriate format, it prints the line where the syntax error
occurred with a caret on the next line indicating the approximate
position of the error.
"""
if file is None:
file = sys.stderr
if tb:
_print(file, 'Traceback (most recent call last):')
print_tb(tb, limit, file)
lines = format_exception_only(etype, value)
for line in lines:
_print(file, line, '')
def format_exception(etype, value, tb, limit = None):
"""Format a stack trace and the exception information.
The arguments have the same meaning as the corresponding arguments
to print_exception(). The return value is a list of strings, each
ending in a newline and some containing internal newlines. When
these lines are concatenated and printed, exactly the same text is
printed as does print_exception().
"""
if tb:
list = ['Traceback (most recent call last):\n']
list = list + format_tb(tb, limit)
else:
list = []
list = list + format_exception_only(etype, value)
return list
def format_exception_only(etype, value):
"""Format the exception part of a traceback.
The arguments are the exception type and value such as given by
sys.last_type and sys.last_value. The return value is a list of
strings, each ending in a newline.
Normally, the list contains a single string; however, for
SyntaxError exceptions, it contains several lines that (when
printed) display detailed information about where the syntax
error occurred.
The message indicating which exception occurred is always the last
string in the list.
"""
# An instance should not have a meaningful value parameter, but
# sometimes does, particularly for string exceptions, such as
# >>> raise string1, string2 # deprecated
#
# Clear these out first because issubtype(string1, SyntaxError)
# would raise another exception and mask the original problem.
if (isinstance(etype, BaseException) or
isinstance(etype, types.InstanceType) or
etype is None or type(etype) is str):
return [_format_final_exc_line(etype, value)]
stype = etype.__name__
if not issubclass(etype, SyntaxError):
return [_format_final_exc_line(stype, value)]
# It was a syntax error; show exactly where the problem was found.
lines = []
try:
msg, (filename, lineno, offset, badline) = value.args
except Exception:
pass
else:
filename = filename or "<string>"
lines.append(' File "%s", line %d\n' % (filename, lineno))
if badline is not None:
lines.append(' %s\n' % badline.strip())
if offset is not None:
caretspace = badline.rstrip('\n')
offset = min(len(caretspace), offset) - 1
caretspace = caretspace[:offset].lstrip()
# non-space whitespace (like tabs) must be kept for alignment
caretspace = ((c.isspace() and c or ' ') for c in caretspace)
lines.append(' %s^\n' % ''.join(caretspace))
value = msg
lines.append(_format_final_exc_line(stype, value))
return lines
def _format_final_exc_line(etype, value):
"""Return a list of a single line -- normal case for format_exception_only"""
valuestr = _some_str(value)
if value is None or not valuestr:
line = "%s\n" % etype
else:
line = "%s: %s\n" % (etype, valuestr)
return line
def _some_str(value):
try:
return str(value)
except Exception:
pass
try:
value = unicode(value)
return value.encode("ascii", "backslashreplace")
except Exception:
pass
return '<unprintable %s object>' % type(value).__name__
def print_exc(limit=None, file=None):
"""Shorthand for 'print_exception(sys.exc_type, sys.exc_value, sys.exc_traceback, limit, file)'.
(In fact, it uses sys.exc_info() to retrieve the same information
in a thread-safe way.)"""
if file is None:
file = sys.stderr
try:
etype, value, tb = sys.exc_info()
print_exception(etype, value, tb, limit, file)
finally:
etype = value = tb = None
def format_exc(limit=None):
"""Like print_exc() but return a string."""
try:
etype, value, tb = sys.exc_info()
return ''.join(format_exception(etype, value, tb, limit))
finally:
etype = value = tb = None
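# Illustrative sketch (not part of the original module): format_exc() is the
# usual way to capture the active traceback as a string from inside an
# except block; a hypothetical session:
#
#     >>> try:
#     ...     1 / 0
#     ... except ZeroDivisionError:
#     ...     text = format_exc()
#     >>> text.splitlines()[-1]
#     'ZeroDivisionError: integer division or modulo by zero'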
def print_last(limit=None, file=None):
"""This is a shorthand for 'print_exception(sys.last_type,
sys.last_value, sys.last_traceback, limit, file)'."""
if not hasattr(sys, "last_type"):
raise ValueError("no last exception")
if file is None:
file = sys.stderr
print_exception(sys.last_type, sys.last_value, sys.last_traceback,
limit, file)
def print_stack(f=None, limit=None, file=None):
"""Print a stack trace from its invocation point.
The optional 'f' argument can be used to specify an alternate
stack frame at which to start. The optional 'limit' and 'file'
arguments have the same meaning as for print_exception().
"""
if f is None:
try:
raise ZeroDivisionError
except ZeroDivisionError:
f = sys.exc_info()[2].tb_frame.f_back
print_list(extract_stack(f, limit), file)
def format_stack(f=None, limit=None):
"""Shorthand for 'format_list(extract_stack(f, limit))'."""
if f is None:
try:
raise ZeroDivisionError
except ZeroDivisionError:
f = sys.exc_info()[2].tb_frame.f_back
return format_list(extract_stack(f, limit))
def extract_stack(f=None, limit=None):
"""Extract the raw traceback from the current stack frame.
The return value has the same format as for extract_tb(). The
optional 'f' and 'limit' arguments have the same meaning as for
print_stack(). Each item in the list is a quadruple (filename,
line number, function name, text), and the entries are in order
from oldest to newest stack frame.
"""
if f is None:
try:
raise ZeroDivisionError
except ZeroDivisionError:
f = sys.exc_info()[2].tb_frame.f_back
if limit is None:
if hasattr(sys, 'tracebacklimit'):
limit = sys.tracebacklimit
list = []
n = 0
while f is not None and (limit is None or n < limit):
lineno = f.f_lineno
co = f.f_code
filename = co.co_filename
name = co.co_name
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
if line: line = line.strip()
else: line = None
list.append((filename, lineno, name, line))
f = f.f_back
n = n+1
list.reverse()
return list
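# Illustrative sketch (not part of the original module): the last quadruple
# returned by extract_stack() describes the calling frame itself; a
# hypothetical session:
#
#     >>> filename, lineno, name, line = extract_stack()[-1]
#     >>> name        # the calling frame; '<module>' when called at top level
#     '<module>'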
def tb_lineno(tb):
"""Calculate correct line number of traceback given in tb.
Obsolete in 2.3.
"""
return tb.tb_lineno
import signal
import weakref
from functools import wraps
__unittest = True
class _InterruptHandler(object):
def __init__(self, default_handler):
self.called = False
self.original_handler = default_handler
if isinstance(default_handler, (int, long)):
if default_handler == signal.SIG_DFL:
# Pretend it's signal.default_int_handler instead.
default_handler = signal.default_int_handler
elif default_handler == signal.SIG_IGN:
# Not quite the same thing as SIG_IGN, but the closest we
# can make it: do nothing.
def default_handler(unused_signum, unused_frame):
pass
else:
raise TypeError("expected SIGINT signal handler to be "
"signal.SIG_IGN, signal.SIG_DFL, or a "
"callable object")
self.default_handler = default_handler
def __call__(self, signum, frame):
installed_handler = signal.getsignal(signal.SIGINT)
if installed_handler is not self:
# if we aren't the installed handler, then delegate immediately
# to the default handler
self.default_handler(signum, frame)
if self.called:
self.default_handler(signum, frame)
self.called = True
for result in _results.keys():
result.stop()
_results = weakref.WeakKeyDictionary()
def registerResult(result):
_results[result] = 1
def removeResult(result):
return bool(_results.pop(result, None))
_interrupt_handler = None
def installHandler():
global _interrupt_handler
if _interrupt_handler is None:
default_handler = signal.getsignal(signal.SIGINT)
_interrupt_handler = _InterruptHandler(default_handler)
signal.signal(signal.SIGINT, _interrupt_handler)
def removeHandler(method=None):
if method is not None:
@wraps(method)
def inner(*args, **kwargs):
initial = signal.getsignal(signal.SIGINT)
removeHandler()
try:
return method(*args, **kwargs)
finally:
signal.signal(signal.SIGINT, initial)
return inner
global _interrupt_handler
if _interrupt_handler is not None:
signal.signal(signal.SIGINT, _interrupt_handler.original_handler)
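# Illustrative sketch (not part of the original module): installHandler()
# makes a first Ctrl-C stop the registered test runs gracefully, while
# removeHandler() used as a decorator restores the original SIGINT handler
# for the duration of one test. TestBreak below is hypothetical.
#
#     installHandler()
#
#     class TestBreak(unittest.TestCase):
#         @removeHandler
#         def test_signal_handling(self):
#             pass  # Ctrl-C behaves normally inside this test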
"""TestSuite"""
import sys
from . import case
from . import util
__unittest = True
def _call_if_exists(parent, attr):
func = getattr(parent, attr, lambda: None)
func()
class BaseTestSuite(object):
"""A simple test suite that doesn't provide class or module shared fixtures.
"""
def __init__(self, tests=()):
self._tests = []
self.addTests(tests)
def __repr__(self):
return "<%s tests=%s>" % (util.strclass(self.__class__), list(self))
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return list(self) == list(other)
def __ne__(self, other):
return not self == other
# Can't guarantee hash invariant, so flag as unhashable
__hash__ = None
def __iter__(self):
return iter(self._tests)
def countTestCases(self):
cases = 0
for test in self:
cases += test.countTestCases()
return cases
def addTest(self, test):
# sanity checks
if not hasattr(test, '__call__'):
raise TypeError("{} is not callable".format(repr(test)))
if isinstance(test, type) and issubclass(test,
(case.TestCase, TestSuite)):
raise TypeError("TestCases and TestSuites must be instantiated "
"before passing them to addTest()")
self._tests.append(test)
def addTests(self, tests):
if isinstance(tests, basestring):
raise TypeError("tests must be an iterable of tests, not a string")
for test in tests:
self.addTest(test)
def run(self, result):
for test in self:
if result.shouldStop:
break
test(result)
return result
def __call__(self, *args, **kwds):
return self.run(*args, **kwds)
def debug(self):
"""Run the tests without collecting errors in a TestResult"""
for test in self:
test.debug()
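# Illustrative sketch (not part of the original module): suites are built
# from already-instantiated tests and run against a result object; MyTest
# is hypothetical.
#
#     suite = TestSuite()
#     suite.addTest(MyTest('test_one'))
#     suite.addTests([MyTest('test_two'), MyTest('test_three')])
#     result = unittest.TestResult()
#     suite.run(result)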
class TestSuite(BaseTestSuite):
"""A test suite is a composite test consisting of a number of TestCases.
For use, create an instance of TestSuite, then add test case instances.
When all tests have been added, the suite can be passed to a test
runner, such as TextTestRunner. It will run the individual test cases
in the order in which they were added, aggregating the results. When
subclassing, do not forget to call the base class constructor.
"""
def run(self, result, debug=False):
topLevel = False
if getattr(result, '_testRunEntered', False) is False:
result._testRunEntered = topLevel = True
for test in self:
if result.shouldStop:
break
if _isnotsuite(test):
self._tearDownPreviousClass(test, result)
self._handleModuleFixture(test, result)
self._handleClassSetUp(test, result)
result._previousTestClass = test.__class__
if (getattr(test.__class__, '_classSetupFailed', False) or
getattr(result, '_moduleSetUpFailed', False)):
continue
if not debug:
test(result)
else:
test.debug()
if topLevel:
self._tearDownPreviousClass(None, result)
self._handleModuleTearDown(result)
result._testRunEntered = False
return result
def debug(self):
"""Run the tests without collecting errors in a TestResult"""
debug = _DebugResult()
self.run(debug, True)
################################
def _handleClassSetUp(self, test, result):
previousClass = getattr(result, '_previousTestClass', None)
currentClass = test.__class__
if currentClass == previousClass:
return
if result._moduleSetUpFailed:
return
if getattr(currentClass, "__unittest_skip__", False):
return
try:
currentClass._classSetupFailed = False
except TypeError:
# test may actually be a function
# so its class will be a builtin-type
pass
setUpClass = getattr(currentClass, 'setUpClass', None)
if setUpClass is not None:
_call_if_exists(result, '_setupStdout')
try:
setUpClass()
except Exception as e:
if isinstance(result, _DebugResult):
raise
currentClass._classSetupFailed = True
className = util.strclass(currentClass)
errorName = 'setUpClass (%s)' % className
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _get_previous_module(self, result):
previousModule = None
previousClass = getattr(result, '_previousTestClass', None)
if previousClass is not None:
previousModule = previousClass.__module__
return previousModule
def _handleModuleFixture(self, test, result):
previousModule = self._get_previous_module(result)
currentModule = test.__class__.__module__
if currentModule == previousModule:
return
self._handleModuleTearDown(result)
result._moduleSetUpFailed = False
try:
module = sys.modules[currentModule]
except KeyError:
return
setUpModule = getattr(module, 'setUpModule', None)
if setUpModule is not None:
_call_if_exists(result, '_setupStdout')
try:
setUpModule()
            except Exception as e:
if isinstance(result, _DebugResult):
raise
result._moduleSetUpFailed = True
errorName = 'setUpModule (%s)' % currentModule
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _addClassOrModuleLevelException(self, result, exception, errorName):
error = _ErrorHolder(errorName)
addSkip = getattr(result, 'addSkip', None)
if addSkip is not None and isinstance(exception, case.SkipTest):
addSkip(error, str(exception))
else:
result.addError(error, sys.exc_info())
def _handleModuleTearDown(self, result):
previousModule = self._get_previous_module(result)
if previousModule is None:
return
if result._moduleSetUpFailed:
return
try:
module = sys.modules[previousModule]
except KeyError:
return
tearDownModule = getattr(module, 'tearDownModule', None)
if tearDownModule is not None:
_call_if_exists(result, '_setupStdout')
try:
tearDownModule()
except Exception as e:
if isinstance(result, _DebugResult):
raise
errorName = 'tearDownModule (%s)' % previousModule
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _tearDownPreviousClass(self, test, result):
previousClass = getattr(result, '_previousTestClass', None)
currentClass = test.__class__
if currentClass == previousClass:
return
if getattr(previousClass, '_classSetupFailed', False):
return
if getattr(result, '_moduleSetUpFailed', False):
return
if getattr(previousClass, "__unittest_skip__", False):
return
tearDownClass = getattr(previousClass, 'tearDownClass', None)
if tearDownClass is not None:
_call_if_exists(result, '_setupStdout')
try:
tearDownClass()
            except Exception as e:
if isinstance(result, _DebugResult):
raise
className = util.strclass(previousClass)
errorName = 'tearDownClass (%s)' % className
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
class _ErrorHolder(object):
"""
Placeholder for a TestCase inside a result. As far as a TestResult
is concerned, this looks exactly like a unit test. Used to insert
arbitrary errors into a test suite run.
"""
# Inspired by the ErrorHolder from Twisted:
# http://twistedmatrix.com/trac/browser/trunk/twisted/trial/runner.py
# attribute used by TestResult._exc_info_to_string
failureException = None
def __init__(self, description):
self.description = description
def id(self):
return self.description
def shortDescription(self):
return None
def __repr__(self):
return "<ErrorHolder description=%r>" % (self.description,)
def __str__(self):
return self.id()
def run(self, result):
# could call result.addError(...) - but this test-like object
# shouldn't be run anyway
pass
def __call__(self, result):
return self.run(result)
def countTestCases(self):
return 0
def _isnotsuite(test):
"A crude way to tell apart testcases and suites with duck-typing"
try:
iter(test)
except TypeError:
return True
return False
class _DebugResult(object):
"Used by the TestSuite to hold previous class when running in debug."
_previousTestClass = None
_moduleSetUpFailed = False
shouldStop = False
"""Various utility functions."""
from collections import namedtuple, OrderedDict
__unittest = True
_MAX_LENGTH = 80
def safe_repr(obj, short=False):
try:
result = repr(obj)
except Exception:
result = object.__repr__(obj)
if not short or len(result) < _MAX_LENGTH:
return result
return result[:_MAX_LENGTH] + ' [truncated]...'
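# Illustrative sketch (not part of the original module): with short=True the
# repr is capped at _MAX_LENGTH characters plus a marker; a hypothetical
# session:
#
#     >>> r = safe_repr('x' * 100, short=True)
#     >>> len(r), r.endswith(' [truncated]...')
#     (95, True)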
def strclass(cls):
return "%s.%s" % (cls.__module__, cls.__name__)
def sorted_list_difference(expected, actual):
"""Finds elements in only one or the other of two, sorted input lists.
Returns a two-element tuple of lists. The first list contains those
elements in the "expected" list but not in the "actual" list, and the
second contains those elements in the "actual" list but not in the
"expected" list. Duplicate elements in either input list are ignored.
"""
i = j = 0
missing = []
unexpected = []
while True:
try:
e = expected[i]
a = actual[j]
if e < a:
missing.append(e)
i += 1
while expected[i] == e:
i += 1
elif e > a:
unexpected.append(a)
j += 1
while actual[j] == a:
j += 1
else:
i += 1
try:
while expected[i] == e:
i += 1
finally:
j += 1
while actual[j] == a:
j += 1
except IndexError:
missing.extend(expected[i:])
unexpected.extend(actual[j:])
break
return missing, unexpected
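# Illustrative sketch (not part of the original module):
#
#     >>> sorted_list_difference([1, 2, 3, 5], [2, 3, 4])
#     ([1, 5], [4])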
def unorderable_list_difference(expected, actual, ignore_duplicate=False):
"""Same behavior as sorted_list_difference but
for lists of unorderable items (like dicts).
As it does a linear search per item (remove) it
has O(n*n) performance.
"""
missing = []
unexpected = []
while expected:
item = expected.pop()
try:
actual.remove(item)
except ValueError:
missing.append(item)
if ignore_duplicate:
for lst in expected, actual:
try:
while True:
lst.remove(item)
except ValueError:
pass
if ignore_duplicate:
while actual:
item = actual.pop()
unexpected.append(item)
try:
while True:
actual.remove(item)
except ValueError:
pass
return missing, unexpected
# anything left in actual is unexpected
return missing, actual
_Mismatch = namedtuple('Mismatch', 'actual expected value')
def _count_diff_all_purpose(actual, expected):
'Returns list of (cnt_act, cnt_exp, elem) triples where the counts differ'
# elements need not be hashable
s, t = list(actual), list(expected)
m, n = len(s), len(t)
NULL = object()
result = []
for i, elem in enumerate(s):
if elem is NULL:
continue
cnt_s = cnt_t = 0
for j in range(i, m):
if s[j] == elem:
cnt_s += 1
s[j] = NULL
for j, other_elem in enumerate(t):
if other_elem == elem:
cnt_t += 1
t[j] = NULL
if cnt_s != cnt_t:
diff = _Mismatch(cnt_s, cnt_t, elem)
result.append(diff)
for i, elem in enumerate(t):
if elem is NULL:
continue
cnt_t = 0
for j in range(i, n):
if t[j] == elem:
cnt_t += 1
t[j] = NULL
diff = _Mismatch(0, cnt_t, elem)
result.append(diff)
return result
def _ordered_count(iterable):
'Return dict of element counts, in the order they were first seen'
c = OrderedDict()
for elem in iterable:
c[elem] = c.get(elem, 0) + 1
return c
def _count_diff_hashable(actual, expected):
'Returns list of (cnt_act, cnt_exp, elem) triples where the counts differ'
# elements must be hashable
s, t = _ordered_count(actual), _ordered_count(expected)
result = []
for elem, cnt_s in s.items():
cnt_t = t.get(elem, 0)
if cnt_s != cnt_t:
diff = _Mismatch(cnt_s, cnt_t, elem)
result.append(diff)
for elem, cnt_t in t.items():
if elem not in s:
diff = _Mismatch(0, cnt_t, elem)
result.append(diff)
return result
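# Illustrative sketch (not part of the original module): both diff helpers
# report elements whose counts differ between the two sequences, e.g.
#
#     >>> _count_diff_hashable([1, 1, 2], [1, 2, 2])
#     [Mismatch(actual=2, expected=1, value=1), Mismatch(actual=1, expected=2, value=2)]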
"""
Python unit testing framework, based on Erich Gamma's JUnit and Kent Beck's
Smalltalk testing framework.
This module contains the core framework classes that form the basis of
specific test cases and suites (TestCase, TestSuite etc.), and also a
text-based utility class for running the tests and reporting the results
(TextTestRunner).
Simple usage:
import unittest
class IntegerArithmeticTestCase(unittest.TestCase):
def testAdd(self): ## test method names begin 'test*'
self.assertEqual((1 + 2), 3)
self.assertEqual(0 + 1, 1)
def testMultiply(self):
self.assertEqual((0 * 10), 0)
self.assertEqual((5 * 8), 40)
if __name__ == '__main__':
unittest.main()
Further information is available in the bundled documentation, and from
http://docs.python.org/library/unittest.html
Copyright (c) 1999-2003 Steve Purcell
Copyright (c) 2003-2010 Python Software Foundation
This module is free software, and you may redistribute it and/or modify
it under the same terms as Python itself, so long as this copyright message
and disclaimer are retained in their original form.
IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF
THIS CODE, EVEN IF THE AUTHOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE CODE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS,
AND THERE IS NO OBLIGATION WHATSOEVER TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
"""
__all__ = ['TestResult', 'TestCase', 'TestSuite',
'TextTestRunner', 'TestLoader', 'FunctionTestCase', 'main',
'defaultTestLoader', 'SkipTest', 'skip', 'skipIf', 'skipUnless',
'expectedFailure', 'TextTestResult', 'installHandler',
'registerResult', 'removeResult', 'removeHandler']
# Expose obsolete functions for backwards compatibility
__all__.extend(['getTestCaseNames', 'makeSuite', 'findTestCases'])
__unittest = True
from .result import TestResult
from .case import (TestCase, FunctionTestCase, SkipTest, skip, skipIf,
skipUnless, expectedFailure)
from .suite import BaseTestSuite, TestSuite
from .loader import (TestLoader, defaultTestLoader, makeSuite, getTestCaseNames,
findTestCases)
from .main import TestProgram, main
from .runner import TextTestRunner, TextTestResult
from .signals import installHandler, registerResult, removeResult, removeHandler
# deprecated
_TextTestResult = TextTestResult
"""Main entry point"""
import sys
if sys.argv[0].endswith("__main__.py"):
sys.argv[0] = "python -m unittest"
__unittest = True
from .main import main, TestProgram, USAGE_AS_MAIN
TestProgram.USAGE = USAGE_AS_MAIN
main(module=None)
"""Open an arbitrary URL.
See the following document for more info on URLs:
"Names and Addresses, URIs, URLs, URNs, URCs", at
http://www.w3.org/pub/WWW/Addressing/Overview.html
See also the HTTP spec (from which the error codes are derived):
"HTTP - Hypertext Transfer Protocol", at
http://www.w3.org/pub/WWW/Protocols/
Related standards and specs:
- RFC1808: the "relative URL" spec. (authoritative status)
- RFC1738 - the "URL standard". (authoritative status)
- RFC1630 - the "URI spec". (informational status)
The object returned by URLopener().open(file) will differ per
protocol. All you know is that it has methods read(), readline(),
readlines(), fileno(), close() and info(). The read*(), fileno()
and close() methods work like those of open files.
The info() method returns a mimetools.Message object which can be
used to query various info about the object, if available.
(mimetools.Message objects are queried with the getheader() method.)
"""
import string
import socket
import os
import time
import sys
import base64
import re
from urlparse import urljoin as basejoin
__all__ = ["urlopen", "URLopener", "FancyURLopener", "urlretrieve",
"urlcleanup", "quote", "quote_plus", "unquote", "unquote_plus",
"urlencode", "url2pathname", "pathname2url", "splittag",
"localhost", "thishost", "ftperrors", "basejoin", "unwrap",
"splittype", "splithost", "splituser", "splitpasswd", "splitport",
"splitnport", "splitquery", "splitattr", "splitvalue",
"getproxies"]
__version__ = '1.17' # XXX This version is not always updated :-(
MAXFTPCACHE = 10 # Trim the ftp cache beyond this size
# Helper for non-unix systems
if os.name == 'nt':
from nturl2path import url2pathname, pathname2url
elif os.name == 'riscos':
from rourl2path import url2pathname, pathname2url
else:
def url2pathname(pathname):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
return unquote(pathname)
def pathname2url(pathname):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
return quote(pathname)
# This really consists of two pieces:
# (1) a class which handles opening of all sorts of URLs
# (plus assorted utilities etc.)
# (2) a set of functions for parsing URLs
# XXX Should these be separated out into different modules?
# Shortcut for basic usage
_urlopener = None
def urlopen(url, data=None, proxies=None, context=None):
"""Create a file-like object for the specified URL to read from."""
from warnings import warnpy3k
warnpy3k("urllib.urlopen() has been removed in Python 3.0 in "
"favor of urllib2.urlopen()", stacklevel=2)
global _urlopener
if proxies is not None or context is not None:
opener = FancyURLopener(proxies=proxies, context=context)
elif not _urlopener:
opener = FancyURLopener()
_urlopener = opener
else:
opener = _urlopener
if data is None:
return opener.open(url)
else:
return opener.open(url, data)
def urlretrieve(url, filename=None, reporthook=None, data=None, context=None):
global _urlopener
if context is not None:
opener = FancyURLopener(context=context)
elif not _urlopener:
_urlopener = opener = FancyURLopener()
else:
opener = _urlopener
return opener.retrieve(url, filename, reporthook, data)
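# Illustrative sketch (not part of the original module); the URL and
# filename below are hypothetical:
#
#     f = urlopen('http://www.example.com/')
#     page = f.read()
#     f.close()
#     filename, headers = urlretrieve('http://www.example.com/',
#                                     'example.html')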
def urlcleanup():
if _urlopener:
_urlopener.cleanup()
_safe_quoters.clear()
ftpcache.clear()
# check for SSL
try:
import ssl
except ImportError:
_have_ssl = False
else:
_have_ssl = True
# exception raised when downloaded size does not match content-length
class ContentTooShortError(IOError):
def __init__(self, message, content):
IOError.__init__(self, message)
self.content = content
ftpcache = {}
class URLopener:
"""Class to open URLs.
This is a class rather than just a subroutine because we may need
more than one set of global protocol-specific options.
Note -- this is a base class for those who don't want the
automatic handling of errors type 302 (relocated) and 401
(authorization needed)."""
__tempfiles = None
version = "Python-urllib/%s" % __version__
# Constructor
def __init__(self, proxies=None, context=None, **x509):
if proxies is None:
proxies = getproxies()
assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
self.proxies = proxies
self.key_file = x509.get('key_file')
self.cert_file = x509.get('cert_file')
self.context = context
self.addheaders = [('User-Agent', self.version), ('Accept', '*/*')]
self.__tempfiles = []
self.__unlink = os.unlink # See cleanup()
self.tempcache = None
# Undocumented feature: if you assign {} to tempcache,
# it is used to cache files retrieved with
# self.retrieve(). This is not enabled by default
# since it does not work for changing documents (and I
# haven't got the logic to check expiration headers
# yet).
self.ftpcache = ftpcache
# Undocumented feature: you can use a different
# ftp cache by assigning to the .ftpcache member;
# in case you want logically independent URL openers
# XXX This is not threadsafe. Bah.
def __del__(self):
self.close()
def close(self):
self.cleanup()
def cleanup(self):
# This code sometimes runs when the rest of this module
# has already been deleted, so it can't use any globals
# or import anything.
if self.__tempfiles:
for file in self.__tempfiles:
try:
self.__unlink(file)
except OSError:
pass
del self.__tempfiles[:]
if self.tempcache:
self.tempcache.clear()
def addheader(self, *args):
"""Add a header to be used by the HTTP interface only
e.g. u.addheader('Accept', 'sound/basic')"""
self.addheaders.append(args)
# External interface
def open(self, fullurl, data=None):
"""Use URLopener().open(file) instead of open(file, 'r')."""
fullurl = unwrap(toBytes(fullurl))
        # percent-encode the URL, working around broken servers that choke
        # on, e.g., unencoded spaces within URL paths.
fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")
if self.tempcache and fullurl in self.tempcache:
filename, headers = self.tempcache[fullurl]
fp = open(filename, 'rb')
return addinfourl(fp, headers, fullurl)
urltype, url = splittype(fullurl)
if not urltype:
urltype = 'file'
if urltype in self.proxies:
proxy = self.proxies[urltype]
urltype, proxyhost = splittype(proxy)
host, selector = splithost(proxyhost)
url = (host, fullurl) # Signal special case to open_*()
else:
proxy = None
name = 'open_' + urltype
self.type = urltype
name = name.replace('-', '_')
# bpo-35907: disallow the file reading with the type not allowed
if not hasattr(self, name) or name == 'open_local_file':
if proxy:
return self.open_unknown_proxy(proxy, fullurl, data)
else:
return self.open_unknown(fullurl, data)
try:
if data is None:
return getattr(self, name)(url)
else:
return getattr(self, name)(url, data)
except socket.error, msg:
raise IOError, ('socket error', msg), sys.exc_info()[2]
def open_unknown(self, fullurl, data=None):
"""Overridable interface to open unknown URL type."""
type, url = splittype(fullurl)
raise IOError, ('url error', 'unknown url type', type)
def open_unknown_proxy(self, proxy, fullurl, data=None):
"""Overridable interface to open unknown URL type."""
type, url = splittype(fullurl)
raise IOError, ('url error', 'invalid proxy for %s' % type, proxy)
# External interface
def retrieve(self, url, filename=None, reporthook=None, data=None):
"""retrieve(url) returns (filename, headers) for a local object
or (tempfilename, headers) for a remote object."""
url = unwrap(toBytes(url))
if self.tempcache and url in self.tempcache:
return self.tempcache[url]
type, url1 = splittype(url)
if filename is None and (not type or type == 'file'):
try:
fp = self.open_local_file(url1)
hdrs = fp.info()
fp.close()
return url2pathname(splithost(url1)[1]), hdrs
except IOError:
pass
fp = self.open(url, data)
try:
headers = fp.info()
if filename:
tfp = open(filename, 'wb')
else:
import tempfile
garbage, path = splittype(url)
garbage, path = splithost(path or "")
path, garbage = splitquery(path or "")
path, garbage = splitattr(path or "")
suffix = os.path.splitext(path)[1]
(fd, filename) = tempfile.mkstemp(suffix)
self.__tempfiles.append(filename)
tfp = os.fdopen(fd, 'wb')
try:
result = filename, headers
if self.tempcache is not None:
self.tempcache[url] = result
bs = 1024*8
size = -1
read = 0
blocknum = 0
if "content-length" in headers:
size = int(headers["Content-Length"])
if reporthook:
reporthook(blocknum, bs, size)
while 1:
block = fp.read(bs)
if block == "":
break
read += len(block)
tfp.write(block)
blocknum += 1
if reporthook:
reporthook(blocknum, bs, size)
finally:
tfp.close()
finally:
fp.close()
# raise exception if actual size does not match content-length header
if size >= 0 and read < size:
raise ContentTooShortError("retrieval incomplete: got only %i out "
"of %i bytes" % (read, size), result)
return result
# Each method named open_<type> knows how to open that type of URL
def open_http(self, url, data=None):
"""Use HTTP protocol."""
import httplib
user_passwd = None
        proxy_passwd = None
if isinstance(url, str):
host, selector = splithost(url)
if host:
user_passwd, host = splituser(host)
host = unquote(host)
realhost = host
else:
host, selector = url
# check whether the proxy contains authorization information
proxy_passwd, host = splituser(host)
# now we proceed with the url we want to obtain
urltype, rest = splittype(selector)
url = rest
user_passwd = None
if urltype.lower() != 'http':
realhost = None
else:
realhost, rest = splithost(rest)
if realhost:
user_passwd, realhost = splituser(realhost)
if user_passwd:
selector = "%s://%s%s" % (urltype, realhost, rest)
if proxy_bypass(realhost):
host = realhost
#print "proxy via http:", host, selector
if not host: raise IOError, ('http error', 'no host given')
if proxy_passwd:
proxy_passwd = unquote(proxy_passwd)
proxy_auth = base64.b64encode(proxy_passwd).strip()
else:
proxy_auth = None
if user_passwd:
user_passwd = unquote(user_passwd)
auth = base64.b64encode(user_passwd).strip()
else:
auth = None
h = httplib.HTTP(host)
if data is not None:
h.putrequest('POST', selector)
h.putheader('Content-Type', 'application/x-www-form-urlencoded')
h.putheader('Content-Length', '%d' % len(data))
else:
h.putrequest('GET', selector)
if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
if auth: h.putheader('Authorization', 'Basic %s' % auth)
if realhost: h.putheader('Host', realhost)
for args in self.addheaders: h.putheader(*args)
h.endheaders(data)
errcode, errmsg, headers = h.getreply()
fp = h.getfile()
if errcode == -1:
if fp: fp.close()
# something went wrong with the HTTP status line
raise IOError, ('http protocol error', 0,
'got a bad status line', None)
# According to RFC 2616, "2xx" code indicates that the client's
# request was successfully received, understood, and accepted.
if (200 <= errcode < 300):
return addinfourl(fp, headers, "http:" + url, errcode)
else:
if data is None:
return self.http_error(url, fp, errcode, errmsg, headers)
else:
return self.http_error(url, fp, errcode, errmsg, headers, data)
def http_error(self, url, fp, errcode, errmsg, headers, data=None):
"""Handle http errors.
Derived class can override this, or provide specific handlers
named http_error_DDD where DDD is the 3-digit error code."""
# First check if there's a specific handler for this error
name = 'http_error_%d' % errcode
if hasattr(self, name):
method = getattr(self, name)
if data is None:
result = method(url, fp, errcode, errmsg, headers)
else:
result = method(url, fp, errcode, errmsg, headers, data)
if result: return result
return self.http_error_default(url, fp, errcode, errmsg, headers)
def http_error_default(self, url, fp, errcode, errmsg, headers):
"""Default error handler: close the connection and raise IOError."""
fp.close()
raise IOError, ('http error', errcode, errmsg, headers)
if _have_ssl:
def open_https(self, url, data=None):
"""Use HTTPS protocol."""
import httplib
user_passwd = None
proxy_passwd = None
if isinstance(url, str):
host, selector = splithost(url)
if host:
user_passwd, host = splituser(host)
host = unquote(host)
realhost = host
else:
host, selector = url
                # determine whether the proxy contains authorization information
proxy_passwd, host = splituser(host)
urltype, rest = splittype(selector)
url = rest
user_passwd = None
if urltype.lower() != 'https':
realhost = None
else:
realhost, rest = splithost(rest)
if realhost:
user_passwd, realhost = splituser(realhost)
if user_passwd:
selector = "%s://%s%s" % (urltype, realhost, rest)
#print "proxy via https:", host, selector
if not host: raise IOError, ('https error', 'no host given')
if proxy_passwd:
proxy_passwd = unquote(proxy_passwd)
proxy_auth = base64.b64encode(proxy_passwd).strip()
else:
proxy_auth = None
if user_passwd:
user_passwd = unquote(user_passwd)
auth = base64.b64encode(user_passwd).strip()
else:
auth = None
h = httplib.HTTPS(host, 0,
key_file=self.key_file,
cert_file=self.cert_file,
context=self.context)
if data is not None:
h.putrequest('POST', selector)
h.putheader('Content-Type',
'application/x-www-form-urlencoded')
h.putheader('Content-Length', '%d' % len(data))
else:
h.putrequest('GET', selector)
if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
if auth: h.putheader('Authorization', 'Basic %s' % auth)
if realhost: h.putheader('Host', realhost)
for args in self.addheaders: h.putheader(*args)
h.endheaders(data)
errcode, errmsg, headers = h.getreply()
fp = h.getfile()
if errcode == -1:
if fp: fp.close()
# something went wrong with the HTTP status line
raise IOError, ('http protocol error', 0,
'got a bad status line', None)
# According to RFC 2616, "2xx" code indicates that the client's
# request was successfully received, understood, and accepted.
if (200 <= errcode < 300):
return addinfourl(fp, headers, "https:" + url, errcode)
else:
if data is None:
return self.http_error(url, fp, errcode, errmsg, headers)
else:
return self.http_error(url, fp, errcode, errmsg, headers,
data)
def open_file(self, url):
"""Use local file or FTP depending on form of URL."""
if not isinstance(url, str):
raise IOError, ('file error', 'proxy support for file protocol currently not implemented')
if url[:2] == '//' and url[2:3] != '/' and url[2:12].lower() != 'localhost/':
return self.open_ftp(url)
else:
return self.open_local_file(url)
def open_local_file(self, url):
"""Use local file."""
import mimetypes, mimetools, email.utils
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
host, file = splithost(url)
localname = url2pathname(file)
try:
stats = os.stat(localname)
except OSError, e:
raise IOError(e.errno, e.strerror, e.filename)
size = stats.st_size
modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
mtype = mimetypes.guess_type(url)[0]
headers = mimetools.Message(StringIO(
'Content-Type: %s\nContent-Length: %d\nLast-modified: %s\n' %
(mtype or 'text/plain', size, modified)))
if not host:
urlfile = file
if file[:1] == '/':
urlfile = 'file://' + file
elif file[:2] == './':
raise ValueError("local file url may start with / or file:. Unknown url of type: %s" % url)
return addinfourl(open(localname, 'rb'),
headers, urlfile)
host, port = splitport(host)
if not port \
and socket.gethostbyname(host) in (localhost(), thishost()):
urlfile = file
if file[:1] == '/':
urlfile = 'file://' + file
return addinfourl(open(localname, 'rb'),
headers, urlfile)
raise IOError, ('local file error', 'not on local host')
def open_ftp(self, url):
"""Use FTP protocol."""
if not isinstance(url, str):
raise IOError, ('ftp error', 'proxy support for ftp protocol currently not implemented')
import mimetypes, mimetools
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
host, path = splithost(url)
if not host: raise IOError, ('ftp error', 'no host given')
host, port = splitport(host)
user, host = splituser(host)
if user: user, passwd = splitpasswd(user)
else: passwd = None
host = unquote(host)
user = user or ''
passwd = passwd or ''
host = socket.gethostbyname(host)
if not port:
import ftplib
port = ftplib.FTP_PORT
else:
port = int(port)
path, attrs = splitattr(path)
path = unquote(path)
dirs = path.split('/')
dirs, file = dirs[:-1], dirs[-1]
if dirs and not dirs[0]: dirs = dirs[1:]
if dirs and not dirs[0]: dirs[0] = '/'
key = user, host, port, '/'.join(dirs)
# XXX thread unsafe!
if len(self.ftpcache) > MAXFTPCACHE:
# Prune the cache, rather arbitrarily
for k in self.ftpcache.keys():
if k != key:
v = self.ftpcache[k]
del self.ftpcache[k]
v.close()
try:
            if key not in self.ftpcache:
self.ftpcache[key] = \
ftpwrapper(user, passwd, host, port, dirs)
if not file: type = 'D'
else: type = 'I'
for attr in attrs:
attr, value = splitvalue(attr)
if attr.lower() == 'type' and \
value in ('a', 'A', 'i', 'I', 'd', 'D'):
type = value.upper()
(fp, retrlen) = self.ftpcache[key].retrfile(file, type)
mtype = mimetypes.guess_type("ftp:" + url)[0]
headers = ""
if mtype:
headers += "Content-Type: %s\n" % mtype
if retrlen is not None and retrlen >= 0:
headers += "Content-Length: %d\n" % retrlen
headers = mimetools.Message(StringIO(headers))
return addinfourl(fp, headers, "ftp:" + url)
except ftperrors(), msg:
raise IOError, ('ftp error', msg), sys.exc_info()[2]
def open_data(self, url, data=None):
"""Use "data" URL."""
if not isinstance(url, str):
raise IOError, ('data error', 'proxy support for data protocol currently not implemented')
# ignore POSTed data
#
# syntax of data URLs:
# dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
# mediatype := [ type "/" subtype ] *( ";" parameter )
# data := *urlchar
# parameter := attribute "=" value
import mimetools
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
try:
[type, data] = url.split(',', 1)
except ValueError:
raise IOError, ('data error', 'bad data URL')
if not type:
type = 'text/plain;charset=US-ASCII'
semi = type.rfind(';')
if semi >= 0 and '=' not in type[semi:]:
encoding = type[semi+1:]
type = type[:semi]
else:
encoding = ''
msg = []
msg.append('Date: %s'%time.strftime('%a, %d %b %Y %H:%M:%S GMT',
time.gmtime(time.time())))
msg.append('Content-type: %s' % type)
if encoding == 'base64':
data = base64.decodestring(data)
else:
data = unquote(data)
msg.append('Content-Length: %d' % len(data))
msg.append('')
msg.append(data)
msg = '\n'.join(msg)
f = StringIO(msg)
headers = mimetools.Message(f, 0)
#f.fileno = None # needed for addinfourl
return addinfourl(f, headers, url)
class FancyURLopener(URLopener):
"""Derived class with handlers for errors we can handle (perhaps)."""
def __init__(self, *args, **kwargs):
URLopener.__init__(self, *args, **kwargs)
self.auth_cache = {}
self.tries = 0
self.maxtries = 10
def http_error_default(self, url, fp, errcode, errmsg, headers):
"""Default error handling -- don't raise an exception."""
return addinfourl(fp, headers, "http:" + url, errcode)
def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 302 -- relocated (temporarily)."""
self.tries += 1
try:
if self.maxtries and self.tries >= self.maxtries:
if hasattr(self, "http_error_500"):
meth = self.http_error_500
else:
meth = self.http_error_default
return meth(url, fp, 500,
"Internal Server Error: Redirect Recursion",
headers)
result = self.redirect_internal(url, fp, errcode, errmsg,
headers, data)
return result
finally:
self.tries = 0
def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
if 'location' in headers:
newurl = headers['location']
elif 'uri' in headers:
newurl = headers['uri']
else:
return
fp.close()
# In case the server sent a relative URL, join with original:
newurl = basejoin(self.type + ":" + url, newurl)
# For security reasons we do not allow redirects to protocols
# other than HTTP, HTTPS or FTP.
newurl_lower = newurl.lower()
if not (newurl_lower.startswith('http://') or
newurl_lower.startswith('https://') or
newurl_lower.startswith('ftp://')):
raise IOError('redirect error', errcode,
errmsg + " - Redirection to url '%s' is not allowed" %
newurl,
headers)
return self.open(newurl)
def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 301 -- also relocated (permanently)."""
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
def http_error_303(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 303 -- also relocated (essentially identical to 302)."""
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
def http_error_307(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 307 -- relocated, but turn POST into error."""
if data is None:
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
else:
return self.http_error_default(url, fp, errcode, errmsg, headers)
def http_error_401(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 401 -- authentication required.
This function supports Basic authentication only."""
        if 'www-authenticate' not in headers:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
stuff = headers['www-authenticate']
import re
match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
if not match:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
scheme, realm = match.groups()
if scheme.lower() != 'basic':
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
name = 'retry_' + self.type + '_basic_auth'
if data is None:
return getattr(self,name)(url, realm)
else:
return getattr(self,name)(url, realm, data)
def http_error_407(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 407 -- proxy authentication required.
This function supports Basic authentication only."""
        if 'proxy-authenticate' not in headers:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
stuff = headers['proxy-authenticate']
import re
match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
if not match:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
scheme, realm = match.groups()
if scheme.lower() != 'basic':
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
name = 'retry_proxy_' + self.type + '_basic_auth'
if data is None:
return getattr(self,name)(url, realm)
else:
return getattr(self,name)(url, realm, data)
def retry_proxy_http_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
newurl = 'http://' + host + selector
proxy = self.proxies['http']
urltype, proxyhost = splittype(proxy)
proxyhost, proxyselector = splithost(proxyhost)
i = proxyhost.find('@') + 1
proxyhost = proxyhost[i:]
user, passwd = self.get_user_passwd(proxyhost, realm, i)
if not (user or passwd): return None
proxyhost = quote(user, safe='') + ':' + quote(passwd, safe='') + '@' + proxyhost
self.proxies['http'] = 'http://' + proxyhost + proxyselector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_proxy_https_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
newurl = 'https://' + host + selector
proxy = self.proxies['https']
urltype, proxyhost = splittype(proxy)
proxyhost, proxyselector = splithost(proxyhost)
i = proxyhost.find('@') + 1
proxyhost = proxyhost[i:]
user, passwd = self.get_user_passwd(proxyhost, realm, i)
if not (user or passwd): return None
proxyhost = quote(user, safe='') + ':' + quote(passwd, safe='') + '@' + proxyhost
self.proxies['https'] = 'https://' + proxyhost + proxyselector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_http_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
i = host.find('@') + 1
host = host[i:]
user, passwd = self.get_user_passwd(host, realm, i)
if not (user or passwd): return None
host = quote(user, safe='') + ':' + quote(passwd, safe='') + '@' + host
newurl = 'http://' + host + selector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_https_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
i = host.find('@') + 1
host = host[i:]
user, passwd = self.get_user_passwd(host, realm, i)
if not (user or passwd): return None
host = quote(user, safe='') + ':' + quote(passwd, safe='') + '@' + host
newurl = 'https://' + host + selector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def get_user_passwd(self, host, realm, clear_cache=0):
key = realm + '@' + host.lower()
if key in self.auth_cache:
if clear_cache:
del self.auth_cache[key]
else:
return self.auth_cache[key]
user, passwd = self.prompt_user_passwd(host, realm)
if user or passwd: self.auth_cache[key] = (user, passwd)
return user, passwd
def prompt_user_passwd(self, host, realm):
"""Override this in a GUI environment!"""
import getpass
try:
user = raw_input("Enter username for %s at %s: " % (realm,
host))
passwd = getpass.getpass("Enter password for %s in %s at %s: " %
(user, realm, host))
return user, passwd
except KeyboardInterrupt:
print
return None, None
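# Illustrative sketch (not part of the original module): GUI applications
# are expected to override prompt_user_passwd() rather than read from
# stdin; the subclass and credentials below are hypothetical.
#
#     class KeyringURLopener(FancyURLopener):
#         def prompt_user_passwd(self, host, realm):
#             return 'user', 'secret'  # e.g. look these up in a keyring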
# Utility functions
_localhost = None
def localhost():
"""Return the IP address of the magic hostname 'localhost'."""
global _localhost
if _localhost is None:
_localhost = socket.gethostbyname('localhost')
return _localhost
_thishost = None
def thishost():
"""Return the IP address of the current host."""
global _thishost
if _thishost is None:
try:
_thishost = socket.gethostbyname(socket.gethostname())
except socket.gaierror:
_thishost = socket.gethostbyname('localhost')
return _thishost
_ftperrors = None
def ftperrors():
"""Return the set of errors raised by the FTP class."""
global _ftperrors
if _ftperrors is None:
import ftplib
_ftperrors = ftplib.all_errors
return _ftperrors
_noheaders = None
def noheaders():
"""Return an empty mimetools.Message object."""
global _noheaders
if _noheaders is None:
import mimetools
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
_noheaders = mimetools.Message(StringIO(), 0)
_noheaders.fp.close() # Recycle file descriptor
return _noheaders
# Utility classes
class ftpwrapper:
"""Class used by open_ftp() for cache of open FTP connections."""
def __init__(self, user, passwd, host, port, dirs,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
persistent=True):
self.user = user
self.passwd = passwd
self.host = host
self.port = port
self.dirs = dirs
self.timeout = timeout
self.refcount = 0
self.keepalive = persistent
try:
self.init()
except:
self.close()
raise
def init(self):
import ftplib
self.busy = 0
self.ftp = ftplib.FTP()
self.ftp.connect(self.host, self.port, self.timeout)
self.ftp.login(self.user, self.passwd)
_target = '/'.join(self.dirs)
self.ftp.cwd(_target)
def retrfile(self, file, type):
import ftplib
self.endtransfer()
if type in ('d', 'D'): cmd = 'TYPE A'; isdir = 1
else: cmd = 'TYPE ' + type; isdir = 0
try:
self.ftp.voidcmd(cmd)
except ftplib.all_errors:
self.init()
self.ftp.voidcmd(cmd)
conn = None
if file and not isdir:
# Try to retrieve as a file
try:
cmd = 'RETR ' + file
conn, retrlen = self.ftp.ntransfercmd(cmd)
except ftplib.error_perm, reason:
if str(reason)[:3] != '550':
raise IOError, ('ftp error', reason), sys.exc_info()[2]
if not conn:
# Set transfer mode to ASCII!
self.ftp.voidcmd('TYPE A')
# Try a directory listing. Verify that directory exists.
if file:
pwd = self.ftp.pwd()
try:
try:
self.ftp.cwd(file)
except ftplib.error_perm, reason:
raise IOError, ('ftp error', reason), sys.exc_info()[2]
finally:
self.ftp.cwd(pwd)
cmd = 'LIST ' + file
else:
cmd = 'LIST'
conn, retrlen = self.ftp.ntransfercmd(cmd)
self.busy = 1
ftpobj = addclosehook(conn.makefile('rb'), self.file_close)
self.refcount += 1
conn.close()
# Pass back both a suitably decorated object and a retrieval length
return (ftpobj, retrlen)
def endtransfer(self):
if not self.busy:
return
self.busy = 0
try:
self.ftp.voidresp()
except ftperrors():
pass
def close(self):
self.keepalive = False
if self.refcount <= 0:
self.real_close()
def file_close(self):
self.endtransfer()
self.refcount -= 1
if self.refcount <= 0 and not self.keepalive:
self.real_close()
def real_close(self):
self.endtransfer()
try:
self.ftp.close()
except ftperrors():
pass
class addbase:
"""Base class for addinfo and addclosehook."""
def __init__(self, fp):
self.fp = fp
self.read = self.fp.read
self.readline = self.fp.readline
if hasattr(self.fp, "readlines"): self.readlines = self.fp.readlines
if hasattr(self.fp, "fileno"):
self.fileno = self.fp.fileno
else:
self.fileno = lambda: None
if hasattr(self.fp, "__iter__"):
self.__iter__ = self.fp.__iter__
if hasattr(self.fp, "next"):
self.next = self.fp.next
def __repr__(self):
return '<%s at %r whose fp = %r>' % (self.__class__.__name__,
id(self), self.fp)
def close(self):
self.read = None
self.readline = None
self.readlines = None
self.fileno = None
if self.fp: self.fp.close()
self.fp = None
class addclosehook(addbase):
"""Class to add a close hook to an open file."""
def __init__(self, fp, closehook, *hookargs):
addbase.__init__(self, fp)
self.closehook = closehook
self.hookargs = hookargs
def close(self):
try:
closehook = self.closehook
hookargs = self.hookargs
if closehook:
self.closehook = None
self.hookargs = None
closehook(*hookargs)
finally:
addbase.close(self)
class addinfo(addbase):
"""class to add an info() method to an open file."""
def __init__(self, fp, headers):
addbase.__init__(self, fp)
self.headers = headers
def info(self):
return self.headers
class addinfourl(addbase):
"""class to add info() and geturl() methods to an open file."""
def __init__(self, fp, headers, url, code=None):
addbase.__init__(self, fp)
self.headers = headers
self.url = url
self.code = code
def info(self):
return self.headers
def getcode(self):
return self.code
def geturl(self):
return self.url
# Utilities to parse URLs (most of these return None for missing parts):
# unwrap('<URL:type://host/path>') --> 'type://host/path'
# splittype('type:opaquestring') --> 'type', 'opaquestring'
# splithost('//host[:port]/path') --> 'host[:port]', '/path'
# splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'
# splitpasswd('user:passwd') -> 'user', 'passwd'
# splitport('host:port') --> 'host', 'port'
# splitquery('/path?query') --> '/path', 'query'
# splittag('/path#tag') --> '/path', 'tag'
# splitattr('/path;attr1=value1;attr2=value2;...') ->
# '/path', ['attr1=value1', 'attr2=value2', ...]
# splitvalue('attr=value') --> 'attr', 'value'
# unquote('abc%20def') -> 'abc def'
# quote('abc def') -> 'abc%20def'
try:
unicode
except NameError:
def _is_unicode(x):
return 0
else:
def _is_unicode(x):
return isinstance(x, unicode)
def toBytes(url):
"""toBytes(u"URL") --> 'URL'."""
# Most URL schemes require ASCII. If that changes, the conversion
# can be relaxed
if _is_unicode(url):
try:
url = url.encode("ASCII")
except UnicodeError:
raise UnicodeError("URL " + repr(url) +
" contains non-ASCII characters")
return url
def unwrap(url):
"""unwrap('<URL:type://host/path>') --> 'type://host/path'."""
url = url.strip()
if url[:1] == '<' and url[-1:] == '>':
url = url[1:-1].strip()
if url[:4] == 'URL:': url = url[4:].strip()
return url
_typeprog = None
def splittype(url):
"""splittype('type:opaquestring') --> 'type', 'opaquestring'."""
global _typeprog
if _typeprog is None:
import re
_typeprog = re.compile('^([^/:]+):')
match = _typeprog.match(url)
if match:
scheme = match.group(1)
return scheme.lower(), url[len(scheme) + 1:]
return None, url
_hostprog = None
def splithost(url):
"""splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
global _hostprog
if _hostprog is None:
_hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
match = _hostprog.match(url)
if match:
host_port = match.group(1)
path = match.group(2)
if path and not path.startswith('/'):
path = '/' + path
return host_port, path
return None, url
_userprog = None
def splituser(host):
"""splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'."""
global _userprog
if _userprog is None:
import re
_userprog = re.compile('^(.*)@(.*)$')
match = _userprog.match(host)
if match: return match.group(1, 2)
return None, host
_passwdprog = None
def splitpasswd(user):
"""splitpasswd('user:passwd') -> 'user', 'passwd'."""
global _passwdprog
if _passwdprog is None:
import re
_passwdprog = re.compile('^([^:]*):(.*)$',re.S)
match = _passwdprog.match(user)
if match: return match.group(1, 2)
return user, None
# splittag('/path#tag') --> '/path', 'tag'
_portprog = None
def splitport(host):
"""splitport('host:port') --> 'host', 'port'."""
global _portprog
if _portprog is None:
import re
_portprog = re.compile('^(.*):([0-9]*)$')
match = _portprog.match(host)
if match:
host, port = match.groups()
if port:
return host, port
return host, None
_nportprog = None
def splitnport(host, defport=-1):
"""Split host and port, returning numeric port.
Return given default port if no ':' found; defaults to -1.
    Return numerical port if a valid number is found after ':'.
Return None if ':' but not a valid number."""
global _nportprog
if _nportprog is None:
import re
_nportprog = re.compile('^(.*):(.*)$')
match = _nportprog.match(host)
if match:
host, port = match.group(1, 2)
if port:
try:
nport = int(port)
except ValueError:
nport = None
return host, nport
return host, defport
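# Illustrative sketch (not part of the original module):
#
#     >>> splitnport('www.example.com:8080')
#     ('www.example.com', 8080)
#     >>> splitnport('www.example.com')
#     ('www.example.com', -1)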
_queryprog = None
def splitquery(url):
"""splitquery('/path?query') --> '/path', 'query'."""
global _queryprog
if _queryprog is None:
import re
        _queryprog = re.compile(r'^(.*)\?([^?]*)$')
match = _queryprog.match(url)
if match: return match.group(1, 2)
return url, None
_tagprog = None
def splittag(url):
"""splittag('/path#tag') --> '/path', 'tag'."""
global _tagprog
if _tagprog is None:
import re
_tagprog = re.compile('^(.*)#([^#]*)$')
match = _tagprog.match(url)
if match: return match.group(1, 2)
return url, None
def splitattr(url):
"""splitattr('/path;attr1=value1;attr2=value2;...') ->
'/path', ['attr1=value1', 'attr2=value2', ...]."""
words = url.split(';')
return words[0], words[1:]
_valueprog = None
def splitvalue(attr):
"""splitvalue('attr=value') --> 'attr', 'value'."""
global _valueprog
if _valueprog is None:
import re
_valueprog = re.compile('^([^=]*)=(.*)$')
match = _valueprog.match(attr)
if match: return match.group(1, 2)
return attr, None
# urlparse contains a duplicate of this method to avoid a circular import. If
# you update this method, also update the copy in urlparse. This code
# duplication does not exist in Python3.
_hexdig = '0123456789ABCDEFabcdef'
_hextochr = dict((a + b, chr(int(a + b, 16)))
for a in _hexdig for b in _hexdig)
_asciire = re.compile('([\x00-\x7f]+)')
def unquote(s, recurse=False):
"""unquote('abc%20def') -> 'abc def'."""
if _is_unicode(s) and not recurse:
if '%' not in s:
return s
bits = _asciire.split(s)
res = [bits[0]]
append = res.append
for i in range(1, len(bits), 2):
append(unquote(str(bits[i]), True).decode('latin1'))
append(bits[i + 1])
return ''.join(res)
bits = s.split('%')
# fastpath
if len(bits) == 1:
return s
res = [bits[0]]
append = res.append
for item in bits[1:]:
try:
append(_hextochr[item[:2]])
append(item[2:])
except KeyError:
append('%')
append(item)
return ''.join(res)
def unquote_plus(s):
"""unquote('%7e/abc+def') -> '~/abc def'"""
s = s.replace('+', ' ')
return unquote(s)
always_safe = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
'abcdefghijklmnopqrstuvwxyz'
'0123456789' '_.-')
_safe_map = {}
for i, c in zip(xrange(256), str(bytearray(xrange(256)))):
_safe_map[c] = c if (i < 128 and c in always_safe) else '%{:02X}'.format(i)
_safe_quoters = {}
def quote(s, safe='/'):
"""quote('abc def') -> 'abc%20def'
Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.
RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.
By default, the quote function is intended for quoting the path
section of a URL. Thus, it will not encode '/'. This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.
"""
# fastpath
if not s:
if s is None:
raise TypeError('None object cannot be quoted')
return s
cachekey = (safe, always_safe)
try:
(quoter, safe) = _safe_quoters[cachekey]
except KeyError:
safe_map = _safe_map.copy()
safe_map.update([(c, c) for c in safe])
quoter = safe_map.__getitem__
safe = always_safe + safe
_safe_quoters[cachekey] = (quoter, safe)
if not s.rstrip(safe):
return s
return ''.join(map(quoter, s))
def quote_plus(s, safe=''):
"""Quote the query fragment of a URL; replacing ' ' with '+'"""
if ' ' in s:
s = quote(s, safe + ' ')
return s.replace(' ', '+')
return quote(s, safe)
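# Illustrative sketch (not part of the original module):
#
#     >>> quote('/~user/file name.txt')
#     '/%7Euser/file%20name.txt'
#     >>> quote_plus('a b&c')
#     'a+b%26c'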
def urlencode(query, doseq=0):
"""Encode a sequence of two-element tuples or dictionary into a URL query string.
If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the
parameters in the output will match the order of parameters in the
input.
"""
if hasattr(query,"items"):
# mapping objects
query = query.items()
else:
# it's a bother at times that strings and string-like objects are
# sequences...
try:
# non-sequence items should not work with len()
# non-empty strings will fail this
if len(query) and not isinstance(query[0], tuple):
raise TypeError
# zero-length sequences of all types will get here and succeed,
# but that's a minor nit - since the original implementation
# allowed empty dicts that type of behavior probably should be
# preserved for consistency
except TypeError:
ty,va,tb = sys.exc_info()
raise TypeError, "not a valid non-string sequence or mapping object", tb
l = []
if not doseq:
# preserve old behavior
for k, v in query:
k = quote_plus(str(k))
v = quote_plus(str(v))
l.append(k + '=' + v)
else:
for k, v in query:
k = quote_plus(str(k))
if isinstance(v, str):
v = quote_plus(v)
l.append(k + '=' + v)
elif _is_unicode(v):
# is there a reasonable way to convert to ASCII?
# encode generates a string, but "replace" or "ignore"
# lose information and "strict" can raise UnicodeError
v = quote_plus(v.encode("ASCII","replace"))
l.append(k + '=' + v)
else:
try:
# is this a sufficient test for sequence-ness?
len(v)
except TypeError:
# not a sequence
v = quote_plus(str(v))
l.append(k + '=' + v)
else:
# loop over the sequence
for elt in v:
l.append(k + '=' + quote_plus(str(elt)))
return '&'.join(l)
# Proxy handling
def getproxies_environment():
"""Return a dictionary of scheme -> proxy server URL mappings.
Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention.  In order to prefer lowercase
    variables, the environment is processed in two passes: the first pass
    matches variables of any case, the second only lowercase ones.
If you need a different way, you can pass a proxies dictionary to the
[Fancy]URLopener constructor.
"""
# Get all variables
proxies = {}
for name, value in os.environ.items():
name = name.lower()
if value and name[-6:] == '_proxy':
proxies[name[:-6]] = value
# CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
# (non-all-lowercase) as it may be set from the web server by a "Proxy:"
# header from the client
# If "proxy" is lowercase, it will still be used thanks to the next block
if 'REQUEST_METHOD' in os.environ:
proxies.pop('http', None)
# Get lowercase variables
for name, value in os.environ.items():
if name[-6:] == '_proxy':
name = name.lower()
if value:
proxies[name[:-6]] = value
else:
proxies.pop(name[:-6], None)
return proxies
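# Hedged example of the convention above: with http_proxy=http://proxy:3128
# and no_proxy=.example.com set in the environment, this returns
# {'http': 'http://proxy:3128', 'no': '.example.com'} -- the no_proxy value
# lands under the key 'no', which proxy_bypass_environment consumes below.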
def proxy_bypass_environment(host, proxies=None):
"""Test if proxies should not be used for a particular host.
Checks the proxies dict for the value of no_proxy, which should be a
list of comma separated DNS suffixes, or '*' for all hosts.
"""
if proxies is None:
proxies = getproxies_environment()
# don't bypass, if no_proxy isn't specified
try:
no_proxy = proxies['no']
except KeyError:
return 0
# '*' is special case for always bypass
if no_proxy == '*':
return 1
# strip port off host
hostonly, port = splitport(host)
# check if the host ends with any of the DNS suffixes
no_proxy_list = [proxy.strip() for proxy in no_proxy.split(',')]
for name in no_proxy_list:
if name:
name = name.lstrip('.') # ignore leading dots
name = re.escape(name)
pattern = r'(.+\.)?%s$' % name
if (re.match(pattern, hostonly, re.I)
or re.match(pattern, host, re.I)):
return 1
# otherwise, don't bypass
return 0
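# Illustrative sketch of the suffix matching above:
#   >>> proxy_bypass_environment('www.example.com', {'no': '.example.com'})
#   1
#   >>> proxy_bypass_environment('example.org', {'no': 'example.com'})
#   0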
if sys.platform == 'darwin':
from _scproxy import _get_proxy_settings, _get_proxies
def proxy_bypass_macosx_sysconf(host):
"""
Return True iff this host shouldn't be accessed using a proxy
This function uses the MacOSX framework SystemConfiguration
to fetch the proxy information.
"""
import re
import socket
from fnmatch import fnmatch
hostonly, port = splitport(host)
def ip2num(ipAddr):
parts = ipAddr.split('.')
parts = map(int, parts)
if len(parts) != 4:
parts = (parts + [0, 0, 0, 0])[:4]
return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]
proxy_settings = _get_proxy_settings()
# Check for simple host names:
if '.' not in host:
if proxy_settings['exclude_simple']:
return True
hostIP = None
for value in proxy_settings.get('exceptions', ()):
# Items in the list are strings like these: *.local, 169.254/16
if not value: continue
m = re.match(r"(\d+(?:\.\d+)*)(/\d+)?", value)
if m is not None:
if hostIP is None:
try:
hostIP = socket.gethostbyname(hostonly)
hostIP = ip2num(hostIP)
except socket.error:
continue
base = ip2num(m.group(1))
mask = m.group(2)
if mask is None:
mask = 8 * (m.group(1).count('.') + 1)
else:
mask = int(mask[1:])
mask = 32 - mask
if (hostIP >> mask) == (base >> mask):
return True
elif fnmatch(host, value):
return True
return False
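# Worked example of the exception matching above (a sketch, not a doctest):
# for the entry '169.254/16', base = ip2num('169.254') == 0xA9FE0000 and the
# effective shift is 32 - 16 == 16, so a host resolving to 169.254.1.1
# (0xA9FE0101) satisfies (hostIP >> 16) == (base >> 16) and is bypassed.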
def getproxies_macosx_sysconf():
"""Return a dictionary of scheme -> proxy server URL mappings.
This function uses the MacOSX framework SystemConfiguration
to fetch the proxy information.
"""
return _get_proxies()
def proxy_bypass(host):
"""Return True, if a host should be bypassed.
Checks proxy settings gathered from the environment, if specified, or
from the MacOSX framework SystemConfiguration.
"""
proxies = getproxies_environment()
if proxies:
return proxy_bypass_environment(host, proxies)
else:
return proxy_bypass_macosx_sysconf(host)
def getproxies():
return getproxies_environment() or getproxies_macosx_sysconf()
elif os.name == 'nt':
def getproxies_registry():
"""Return a dictionary of scheme -> proxy server URL mappings.
Win32 uses the registry to store proxies.
"""
proxies = {}
try:
import _winreg
except ImportError:
# Std module, so should be around - but you never know!
return proxies
try:
internetSettings = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER,
r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')
proxyEnable = _winreg.QueryValueEx(internetSettings,
'ProxyEnable')[0]
if proxyEnable:
# Returned as Unicode, but causes problems if not converted to ASCII
proxyServer = str(_winreg.QueryValueEx(internetSettings,
'ProxyServer')[0])
if '=' in proxyServer:
# Per-protocol settings
for p in proxyServer.split(';'):
protocol, address = p.split('=', 1)
# See if address has a type:// prefix
import re
if not re.match('^([^/:]+)://', address):
address = '%s://%s' % (protocol, address)
proxies[protocol] = address
else:
# Use one setting for all protocols
if proxyServer[:5] == 'http:':
proxies['http'] = proxyServer
else:
proxies['http'] = 'http://%s' % proxyServer
proxies['https'] = 'https://%s' % proxyServer
proxies['ftp'] = 'ftp://%s' % proxyServer
internetSettings.Close()
except (WindowsError, ValueError, TypeError):
# Either registry key not found etc, or the value in an
# unexpected format.
# proxies already set up to be empty so nothing to do
pass
return proxies
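# Sketch of the per-protocol branch above: a ProxyServer value such as
# 'http=proxy:80;https=sproxy:443' has no type:// prefixes, so it yields
# {'http': 'http://proxy:80', 'https': 'https://sproxy:443'}.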
def getproxies():
"""Return a dictionary of scheme -> proxy server URL mappings.
Returns settings gathered from the environment, if specified,
or the registry.
"""
return getproxies_environment() or getproxies_registry()
def proxy_bypass_registry(host):
try:
import _winreg
import re
except ImportError:
# Std modules, so should be around - but you never know!
return 0
try:
internetSettings = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER,
r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')
proxyEnable = _winreg.QueryValueEx(internetSettings,
'ProxyEnable')[0]
proxyOverride = str(_winreg.QueryValueEx(internetSettings,
'ProxyOverride')[0])
# ^^^^ Returned as Unicode, but causes problems if not converted to ASCII
except WindowsError:
return 0
if not proxyEnable or not proxyOverride:
return 0
# try to make a host list from name and IP address.
rawHost, port = splitport(host)
host = [rawHost]
try:
addr = socket.gethostbyname(rawHost)
if addr != rawHost:
host.append(addr)
except socket.error:
pass
try:
fqdn = socket.getfqdn(rawHost)
if fqdn != rawHost:
host.append(fqdn)
except socket.error:
pass
# make a check value list from the registry entry: replace the
# '<local>' string by the localhost entry and the corresponding
# canonical entry.
proxyOverride = proxyOverride.split(';')
# now check if we match one of the registry values.
for test in proxyOverride:
if test == '<local>':
if '.' not in rawHost:
return 1
test = test.replace(".", r"\.") # mask dots
test = test.replace("*", r".*") # change glob sequence
test = test.replace("?", r".") # change glob char
for val in host:
# print "%s <--> %s" %( test, val )
if re.match(test, val, re.I):
return 1
return 0
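# Sketch of the glob translation above: an override entry '*.internal.example'
# becomes the regex '.*\.internal\.example', which is then matched
# case-insensitively against the raw host, its resolved IP and its FQDN.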
def proxy_bypass(host):
"""Return True, if the host should be bypassed.
Checks proxy settings gathered from the environment, if specified,
or the registry.
"""
proxies = getproxies_environment()
if proxies:
return proxy_bypass_environment(host, proxies)
else:
return proxy_bypass_registry(host)
else:
# By default use environment variables
getproxies = getproxies_environment
proxy_bypass = proxy_bypass_environment
# Test and time quote() and unquote()
def test1():
s = ''
for i in range(256): s = s + chr(i)
s = s*4
t0 = time.time()
qs = quote(s)
uqs = unquote(qs)
t1 = time.time()
if uqs != s:
print 'Wrong!'
print repr(s)
print repr(qs)
print repr(uqs)
print round(t1 - t0, 3), 'sec'
def reporthook(blocknum, blocksize, totalsize):
# Report during remote transfers
print "Block number: %d, Block size: %d, Total size: %d" % (
blocknum, blocksize, totalsize)
"""An extensible library for opening URLs using a variety of protocols
The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.
The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
urlopen(url, data=None) -- Basic usage is the same as the original
urllib: pass the URL and, optionally, data to POST to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of a URL. Raises a URLError (a subclass of
IOError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the arguments is a subclass of a default
handler, the argument will be installed instead of the default.
install_opener -- Installs a new opener as the default opener.
objects of interest:
OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.
Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.
BaseHandler -- The base class from which all handlers derive.
exceptions:
URLError -- A subclass of IOError, individual protocols have their own
specific subclass.
HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
as an exceptional event or valid response.
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib2
# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='geheim$parole')
proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)
# install it
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.python.org/')
"""
# XXX issues:
# If an authentication error handler that tries to perform
# authentication for some reason but fails, how should the error be
# signalled? The client needs to know the HTTP error code. But if
# the handler knows what the problem was, e.g., that it didn't know
# the hash algorithm requested in the challenge, it would be good to
# pass that information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation
# Possible extensions:
# complex proxies XXX not sure what exactly was meant by this
# abstract factory for opener
import base64
import hashlib
import httplib
import mimetools
import os
import posixpath
import random
import re
import socket
import sys
import time
import urlparse
import bisect
import warnings
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
# check for SSL
try:
import ssl
except ImportError:
_have_ssl = False
else:
_have_ssl = True
from urllib import (unwrap, unquote, splittype, splithost, quote,
addinfourl, splitport, splittag, toBytes,
splitattr, ftpwrapper, splituser, splitpasswd, splitvalue)
# support for FileHandler, proxies via environment variables
from urllib import localhost, url2pathname, getproxies, proxy_bypass
# used in User-Agent header sent
__version__ = sys.version[:3]
_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
cafile=None, capath=None, cadefault=False, context=None):
global _opener
if cafile or capath or cadefault:
if context is not None:
raise ValueError(
"You can't pass both context and any of cafile, capath, and "
"cadefault"
)
if not _have_ssl:
raise ValueError('SSL support not available')
context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH,
cafile=cafile,
capath=capath)
https_handler = HTTPSHandler(context=context)
opener = build_opener(https_handler)
elif context:
https_handler = HTTPSHandler(context=context)
opener = build_opener(https_handler)
elif _opener is None:
_opener = opener = build_opener()
else:
opener = _opener
return opener.open(url, data, timeout)
def install_opener(opener):
global _opener
_opener = opener
# do these error classes make sense?
# make sure all of the IOError stuff is overridden. we just want to be
# subtypes.
class URLError(IOError):
# URLError is a sub-type of IOError, but it doesn't share any of
# the implementation. need to override __init__ and __str__.
# It sets self.args for compatibility with other EnvironmentError
# subclasses, but args doesn't have the typical format with errno in
# slot 0 and strerror in slot 1. This may be better than nothing.
def __init__(self, reason):
self.args = reason,
self.reason = reason
def __str__(self):
return '<urlopen error %s>' % self.reason
class HTTPError(URLError, addinfourl):
"""Raised when HTTP error occurs, but also acts like non-error return"""
__super_init = addinfourl.__init__
def __init__(self, url, code, msg, hdrs, fp):
self.code = code
self.msg = msg
self.hdrs = hdrs
self.fp = fp
self.filename = url
# The addinfourl classes depend on fp being a valid file
# object. In some cases, the HTTPError may not have a valid
# file object. If this happens, the simplest workaround is to
# not initialize the base classes.
if fp is not None:
self.__super_init(fp, hdrs, url, code)
def __str__(self):
return 'HTTP Error %s: %s' % (self.code, self.msg)
# since URLError specifies a .reason attribute, HTTPError should also
# provide this attribute. See issue13211 for discussion.
@property
def reason(self):
return self.msg
def info(self):
return self.hdrs
# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$")
def request_host(request):
"""Return request-host, as defined by RFC 2965.
Variation from RFC: returned value is lowercased, for convenient
comparison.
"""
url = request.get_full_url()
host = urlparse.urlparse(url)[1]
if host == "":
host = request.get_header("Host", "")
# remove port, if present
host = _cut_port_re.sub("", host, 1)
return host.lower()
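# Illustrative sketch: for a Request for 'http://WWW.Example.com:80/x',
# request_host returns 'www.example.com' (port removed, lowercased).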
class Request:
def __init__(self, url, data=None, headers={},
origin_req_host=None, unverifiable=False):
# unwrap('<URL:type://host/path>') --> 'type://host/path'
self.__original = unwrap(url)
self.__original, self.__fragment = splittag(self.__original)
self.type = None
# self.__r_type is what's left after doing the splittype
self.host = None
self.port = None
self._tunnel_host = None
self.data = data
self.headers = {}
for key, value in headers.items():
self.add_header(key, value)
self.unredirected_hdrs = {}
if origin_req_host is None:
origin_req_host = request_host(self)
self.origin_req_host = origin_req_host
self.unverifiable = unverifiable
def __getattr__(self, attr):
# XXX this is a fallback mechanism to guard against these
# methods getting called in a non-standard order. this may be
# too complicated and/or unnecessary.
# XXX should the __r_XXX attributes be public?
if attr in ('_Request__r_type', '_Request__r_host'):
getattr(self, 'get_' + attr[12:])()
return self.__dict__[attr]
raise AttributeError, attr
def get_method(self):
if self.has_data():
return "POST"
else:
return "GET"
# XXX these helper methods are lame
def add_data(self, data):
self.data = data
def has_data(self):
return self.data is not None
def get_data(self):
return self.data
def get_full_url(self):
if self.__fragment:
return '%s#%s' % (self.__original, self.__fragment)
else:
return self.__original
def get_type(self):
if self.type is None:
self.type, self.__r_type = splittype(self.__original)
if self.type is None:
raise ValueError, "unknown url type: %s" % self.__original
return self.type
def get_host(self):
if self.host is None:
self.host, self.__r_host = splithost(self.__r_type)
if self.host:
self.host = unquote(self.host)
return self.host
def get_selector(self):
return self.__r_host
def set_proxy(self, host, type):
if self.type == 'https' and not self._tunnel_host:
self._tunnel_host = self.host
else:
self.type = type
self.__r_host = self.__original
self.host = host
def has_proxy(self):
return self.__r_host == self.__original
def get_origin_req_host(self):
return self.origin_req_host
def is_unverifiable(self):
return self.unverifiable
def add_header(self, key, val):
# useful for something like authentication
self.headers[key.capitalize()] = val
def add_unredirected_header(self, key, val):
# will not be added to a redirected request
self.unredirected_hdrs[key.capitalize()] = val
def has_header(self, header_name):
return (header_name in self.headers or
header_name in self.unredirected_hdrs)
def get_header(self, header_name, default=None):
return self.headers.get(
header_name,
self.unredirected_hdrs.get(header_name, default))
def header_items(self):
hdrs = self.unredirected_hdrs.copy()
hdrs.update(self.headers)
return hdrs.items()
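# Minimal usage sketch for Request (illustrative):
#   req = Request('http://www.example.com/', headers={'X-Test': '1'})
#   req.get_method()   # 'GET'; becomes 'POST' once data is set
#   req.get_host()     # 'www.example.com'
#   req.get_type()     # 'http'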
class OpenerDirector:
def __init__(self):
client_version = "Python-urllib/%s" % __version__
self.addheaders = [('User-agent', client_version)]
# self.handlers is retained only for backward compatibility
self.handlers = []
# manage the individual handlers
self.handle_open = {}
self.handle_error = {}
self.process_response = {}
self.process_request = {}
def add_handler(self, handler):
if not hasattr(handler, "add_parent"):
raise TypeError("expected BaseHandler instance, got %r" %
type(handler))
added = False
for meth in dir(handler):
if meth in ["redirect_request", "do_open", "proxy_open"]:
# oops, coincidental match
continue
i = meth.find("_")
protocol = meth[:i]
condition = meth[i+1:]
if condition.startswith("error"):
j = condition.find("_") + i + 1
kind = meth[j+1:]
try:
kind = int(kind)
except ValueError:
pass
lookup = self.handle_error.get(protocol, {})
self.handle_error[protocol] = lookup
elif condition == "open":
kind = protocol
lookup = self.handle_open
elif condition == "response":
kind = protocol
lookup = self.process_response
elif condition == "request":
kind = protocol
lookup = self.process_request
else:
continue
handlers = lookup.setdefault(kind, [])
if handlers:
bisect.insort(handlers, handler)
else:
handlers.append(handler)
added = True
if added:
bisect.insort(self.handlers, handler)
handler.add_parent(self)
def close(self):
# Only exists for backwards compatibility.
pass
def _call_chain(self, chain, kind, meth_name, *args):
# Handlers raise an exception if no one else should try to handle
# the request, or return None if they can't but another handler
# could. Otherwise, they return the response.
handlers = chain.get(kind, ())
for handler in handlers:
func = getattr(handler, meth_name)
result = func(*args)
if result is not None:
return result
def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
# accept a URL or a Request object
if isinstance(fullurl, basestring):
req = Request(fullurl, data)
else:
req = fullurl
if data is not None:
req.add_data(data)
req.timeout = timeout
protocol = req.get_type()
# pre-process request
meth_name = protocol+"_request"
for processor in self.process_request.get(protocol, []):
meth = getattr(processor, meth_name)
req = meth(req)
response = self._open(req, data)
# post-process response
meth_name = protocol+"_response"
for processor in self.process_response.get(protocol, []):
meth = getattr(processor, meth_name)
response = meth(req, response)
return response
def _open(self, req, data=None):
result = self._call_chain(self.handle_open, 'default',
'default_open', req)
if result:
return result
protocol = req.get_type()
result = self._call_chain(self.handle_open, protocol, protocol +
'_open', req)
if result:
return result
return self._call_chain(self.handle_open, 'unknown',
'unknown_open', req)
def error(self, proto, *args):
if proto in ('http', 'https'):
# XXX http[s] protocols are special-cased
dict = self.handle_error['http'] # https is not different from http
proto = args[2] # YUCK!
meth_name = 'http_error_%s' % proto
http_err = 1
orig_args = args
else:
dict = self.handle_error
meth_name = proto + '_error'
http_err = 0
args = (dict, proto, meth_name) + args
result = self._call_chain(*args)
if result:
return result
if http_err:
args = (dict, 'default', 'http_error_default') + orig_args
return self._call_chain(*args)
# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both
def build_opener(*handlers):
"""Create an opener object from a list of handlers.
The opener will use several default handlers, including support
for HTTP, FTP and when applicable, HTTPS.
If any of the handlers passed as arguments are subclasses of the
default handlers, the default handlers will not be used.
"""
import types
def isclass(obj):
return isinstance(obj, (types.ClassType, type))
opener = OpenerDirector()
default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]
if hasattr(httplib, 'HTTPS'):
default_classes.append(HTTPSHandler)
skip = set()
for klass in default_classes:
for check in handlers:
if isclass(check):
if issubclass(check, klass):
skip.add(klass)
elif isinstance(check, klass):
skip.add(klass)
for klass in skip:
default_classes.remove(klass)
for klass in default_classes:
opener.add_handler(klass())
for h in handlers:
if isclass(h):
h = h()
opener.add_handler(h)
return opener
class BaseHandler:
handler_order = 500
def add_parent(self, parent):
self.parent = parent
def close(self):
# Only exists for backwards compatibility
pass
def __lt__(self, other):
if not hasattr(other, "handler_order"):
# Try to preserve the old behavior of having custom classes
# inserted after default ones (works only for custom user
# classes which are not aware of handler_order).
return True
return self.handler_order < other.handler_order
class HTTPErrorProcessor(BaseHandler):
"""Process HTTP error responses."""
handler_order = 1000 # after all other processing
def http_response(self, request, response):
code, msg, hdrs = response.code, response.msg, response.info()
# According to RFC 2616, "2xx" code indicates that the client's
# request was successfully received, understood, and accepted.
if not (200 <= code < 300):
response = self.parent.error(
'http', request, response, code, msg, hdrs)
return response
https_response = http_response
class HTTPDefaultErrorHandler(BaseHandler):
def http_error_default(self, req, fp, code, msg, hdrs):
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
class HTTPRedirectHandler(BaseHandler):
# maximum number of redirections to any single URL
# this is needed because of the state that cookies introduce
max_repeats = 4
# maximum total number of redirections (regardless of URL) before
# assuming we're in a loop
max_redirections = 10
def redirect_request(self, req, fp, code, msg, headers, newurl):
"""Return a Request or None in response to a redirect.
This is called by the http_error_30x methods when a
redirection response is received. If a redirection should
take place, return a new Request to allow http_error_30x to
perform the redirect. Otherwise, raise HTTPError if no-one
else should try to handle this url. Return None if you can't
but another Handler might.
"""
m = req.get_method()
if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
or code in (301, 302, 303) and m == "POST"):
# Strictly (according to RFC 2616), 301 or 302 in response
# to a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib2, in this case). In practice,
# essentially all clients do redirect in this case, so we
# do the same.
# be lenient with URIs containing a space
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type")
)
return Request(newurl,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
unverifiable=True)
else:
raise HTTPError(req.get_full_url(), code, msg, headers, fp)
# Implementation note: To avoid the server sending us into an
# infinite loop, the request object needs to track what URLs we
# have already seen. Do this by adding a handler-specific
# attribute to the Request object.
def http_error_302(self, req, fp, code, msg, headers):
# Some servers (incorrectly) return multiple Location headers
# (so probably same goes for URI). Use first header.
if 'location' in headers:
newurl = headers.getheaders('location')[0]
elif 'uri' in headers:
newurl = headers.getheaders('uri')[0]
else:
return
# fix a possible malformed URL
urlparts = urlparse.urlparse(newurl)
if not urlparts.path and urlparts.netloc:
urlparts = list(urlparts)
urlparts[2] = "/"
newurl = urlparse.urlunparse(urlparts)
newurl = urlparse.urljoin(req.get_full_url(), newurl)
# For security reasons we do not allow redirects to protocols
# other than HTTP, HTTPS or FTP.
newurl_lower = newurl.lower()
if not (newurl_lower.startswith('http://') or
newurl_lower.startswith('https://') or
newurl_lower.startswith('ftp://')):
raise HTTPError(newurl, code,
msg + " - Redirection to url '%s' is not allowed" %
newurl,
headers, fp)
# XXX Probably want to forget about the state of the current
# request, although that might interact poorly with other
# handlers that also use handler-specific request attributes
new = self.redirect_request(req, fp, code, msg, headers, newurl)
if new is None:
return
# loop detection
# .redirect_dict has a key url if url was previously visited.
if hasattr(req, 'redirect_dict'):
visited = new.redirect_dict = req.redirect_dict
if (visited.get(newurl, 0) >= self.max_repeats or
len(visited) >= self.max_redirections):
raise HTTPError(req.get_full_url(), code,
self.inf_msg + msg, headers, fp)
else:
visited = new.redirect_dict = req.redirect_dict = {}
visited[newurl] = visited.get(newurl, 0) + 1
# Don't close the fp until we are sure that we won't use it
# with HTTPError.
fp.read()
fp.close()
return self.parent.open(new, timeout=req.timeout)
http_error_301 = http_error_303 = http_error_307 = http_error_302
inf_msg = "The HTTP server returned a redirect error that would " \
"lead to an infinite loop.\n" \
"The last 30x error message was:\n"
def _parse_proxy(proxy):
"""Return (scheme, user, password, host/port) given a URL or an authority.
If a URL is supplied, it must have an authority (host:port) component.
According to RFC 3986, having an authority component means the URL must
have two slashes after the scheme:
>>> _parse_proxy('file:/ftp.example.com/')
Traceback (most recent call last):
ValueError: proxy URL with no authority: 'file:/ftp.example.com/'
The first three items of the returned tuple may be None.
Examples of authority parsing:
>>> _parse_proxy('proxy.example.com')
(None, None, None, 'proxy.example.com')
>>> _parse_proxy('proxy.example.com:3128')
(None, None, None, 'proxy.example.com:3128')
The authority component may optionally include userinfo (assumed to be
username:password):
>>> _parse_proxy('joe:[email protected]')
(None, 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('joe:[email protected]:3128')
(None, 'joe', 'password', 'proxy.example.com:3128')
Same examples, but with URLs instead:
>>> _parse_proxy('http://proxy.example.com/')
('http', None, None, 'proxy.example.com')
>>> _parse_proxy('http://proxy.example.com:3128/')
('http', None, None, 'proxy.example.com:3128')
>>> _parse_proxy('http://joe:[email protected]/')
('http', 'joe', 'password', 'proxy.example.com')
>>> _parse_proxy('http://joe:[email protected]:3128')
('http', 'joe', 'password', 'proxy.example.com:3128')
Everything after the authority is ignored:
>>> _parse_proxy('ftp://joe:[email protected]/rubbish:3128')
('ftp', 'joe', 'password', 'proxy.example.com')
Test for no trailing '/' case:
>>> _parse_proxy('http://joe:[email protected]')
('http', 'joe', 'password', 'proxy.example.com')
"""
scheme, r_scheme = splittype(proxy)
if not r_scheme.startswith("/"):
# authority
scheme = None
authority = proxy
else:
# URL
if not r_scheme.startswith("//"):
raise ValueError("proxy URL with no authority: %r" % proxy)
# We have an authority, so for RFC 3986-compliant URLs (by ss. 3.2.2
# and 3.3), the path is empty or starts with '/'
end = r_scheme.find("/", 2)
if end == -1:
end = None
authority = r_scheme[2:end]
userinfo, hostport = splituser(authority)
if userinfo is not None:
user, password = splitpasswd(userinfo)
else:
user = password = None
return scheme, user, password, hostport
class ProxyHandler(BaseHandler):
# Proxies must be in front
handler_order = 100
def __init__(self, proxies=None):
if proxies is None:
proxies = getproxies()
assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
self.proxies = proxies
for type, url in proxies.items():
setattr(self, '%s_open' % type,
lambda r, proxy=url, type=type, meth=self.proxy_open: \
meth(r, proxy, type))
def proxy_open(self, req, proxy, type):
orig_type = req.get_type()
proxy_type, user, password, hostport = _parse_proxy(proxy)
if proxy_type is None:
proxy_type = orig_type
if req.host and proxy_bypass(req.host):
return None
if user and password:
user_pass = '%s:%s' % (unquote(user), unquote(password))
creds = base64.b64encode(user_pass).strip()
req.add_header('Proxy-authorization', 'Basic ' + creds)
hostport = unquote(hostport)
req.set_proxy(hostport, proxy_type)
if orig_type == proxy_type or orig_type == 'https':
# let other handlers take care of it
return None
else:
# need to start over, because the other handlers don't
# grok the proxy's URL type
# e.g. if we have a constructor arg proxies like so:
# {'http': 'ftp://proxy.example.com'}, we may end up turning
# a request for http://acme.example.com/a into one for
# ftp://proxy.example.com/a
return self.parent.open(req, timeout=req.timeout)
class HTTPPasswordMgr:
def __init__(self):
self.passwd = {}
def add_password(self, realm, uri, user, passwd):
# uri could be a single URI or a sequence
if isinstance(uri, basestring):
uri = [uri]
if realm not in self.passwd:
self.passwd[realm] = {}
for default_port in True, False:
reduced_uri = tuple(
[self.reduce_uri(u, default_port) for u in uri])
self.passwd[realm][reduced_uri] = (user, passwd)
def find_user_password(self, realm, authuri):
domains = self.passwd.get(realm, {})
for default_port in True, False:
reduced_authuri = self.reduce_uri(authuri, default_port)
for uris, authinfo in domains.iteritems():
for uri in uris:
if self.is_suburi(uri, reduced_authuri):
return authinfo
return None, None
def reduce_uri(self, uri, default_port=True):
"""Accept authority or URI and extract only the authority and path."""
# note HTTP URLs do not have a userinfo component
parts = urlparse.urlsplit(uri)
if parts[1]:
# URI
scheme = parts[0]
authority = parts[1]
path = parts[2] or '/'
else:
# host or host:port
scheme = None
authority = uri
path = '/'
host, port = splitport(authority)
if default_port and port is None and scheme is not None:
dport = {"http": 80,
"https": 443,
}.get(scheme)
if dport is not None:
authority = "%s:%d" % (host, dport)
return authority, path
def is_suburi(self, base, test):
"""Check if test is below base in a URI tree
Both args must be URIs in reduced form.
"""
if base == test:
return True
if base[0] != test[0]:
return False
common = posixpath.commonprefix((base[1], test[1]))
if len(common) == len(base[1]):
return True
return False
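# Illustrative sketch of reduce_uri(), which makes stored and looked-up URIs
# comparable:
#   >>> HTTPPasswordMgr().reduce_uri('http://example.com/path/doc.html')
#   ('example.com:80', '/path/doc.html')
#   >>> HTTPPasswordMgr().reduce_uri('example.com')
#   ('example.com', '/')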
class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):
def find_user_password(self, realm, authuri):
user, password = HTTPPasswordMgr.find_user_password(self, realm,
authuri)
if user is not None:
return user, password
return HTTPPasswordMgr.find_user_password(self, None, authuri)
class AbstractBasicAuthHandler:
# XXX this allows for multiple auth-schemes, but will stupidly pick
# the last one with a realm specified.
# allow for double- and single-quoted realm values
# (single quotes are a violation of the RFC, but appear in the wild)
rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
'realm=(["\']?)([^"\']*)\\2', re.I)
# XXX could pre-emptively send auth info already accepted (RFC 2617,
# end of section 2, and section 1.2 immediately after "credentials"
# production).
def __init__(self, password_mgr=None):
if password_mgr is None:
password_mgr = HTTPPasswordMgr()
self.passwd = password_mgr
self.add_password = self.passwd.add_password
def http_error_auth_reqed(self, authreq, host, req, headers):
# host may be an authority (without userinfo) or a URL with an
# authority
# XXX could be multiple headers
authreq = headers.get(authreq, None)
if authreq:
mo = AbstractBasicAuthHandler.rx.search(authreq)
if mo:
scheme, quote, realm = mo.groups()
if quote not in ['"', "'"]:
warnings.warn("Basic Auth Realm was unquoted",
UserWarning, 2)
if scheme.lower() == 'basic':
return self.retry_http_basic_auth(host, req, realm)
def retry_http_basic_auth(self, host, req, realm):
user, pw = self.passwd.find_user_password(realm, host)
if pw is not None:
raw = "%s:%s" % (user, pw)
auth = 'Basic %s' % base64.b64encode(raw).strip()
if req.get_header(self.auth_header, None) == auth:
return None
req.add_unredirected_header(self.auth_header, auth)
return self.parent.open(req, timeout=req.timeout)
else:
return None
class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
auth_header = 'Authorization'
def http_error_401(self, req, fp, code, msg, headers):
url = req.get_full_url()
response = self.http_error_auth_reqed('www-authenticate',
url, req, headers)
return response
class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
auth_header = 'Proxy-authorization'
def http_error_407(self, req, fp, code, msg, headers):
# http_error_auth_reqed requires that there is no userinfo component in
# authority. Assume there isn't one, since urllib2 does not (and
# should not, RFC 3986 s. 3.2.1) support requests for URLs containing
# userinfo.
authority = req.get_host()
response = self.http_error_auth_reqed('proxy-authenticate',
authority, req, headers)
return response
def randombytes(n):
"""Return n random bytes."""
# Use /dev/urandom if it is available. Fall back to random module
# if not. It might be worthwhile to extend this function to use
# other platform-specific mechanisms for getting random bytes.
if os.path.exists("/dev/urandom"):
f = open("/dev/urandom")
s = f.read(n)
f.close()
return s
else:
L = [chr(random.randrange(0, 256)) for i in range(n)]
return "".join(L)
class AbstractDigestAuthHandler:
# Digest authentication is specified in RFC 2617.
# XXX The client does not inspect the Authentication-Info header
# in a successful response.
# XXX It should be possible to test this implementation against
# a mock server that just generates a static set of challenges.
# XXX qop="auth-int" supports is shaky
def __init__(self, passwd=None):
if passwd is None:
passwd = HTTPPasswordMgr()
self.passwd = passwd
self.add_password = self.passwd.add_password
self.retried = 0
self.nonce_count = 0
self.last_nonce = None
def reset_retry_count(self):
self.retried = 0
def http_error_auth_reqed(self, auth_header, host, req, headers):
authreq = headers.get(auth_header, None)
if self.retried > 5:
# Don't fail endlessly - if we failed once, we'll probably
# fail a second time. Hm. Unless the Password Manager is
# prompting for the information. Crap. This isn't great
# but it's better than the current 'repeat until recursion
# depth exceeded' approach <wink>
raise HTTPError(req.get_full_url(), 401, "digest auth failed",
headers, None)
else:
self.retried += 1
if authreq:
scheme = authreq.split()[0]
if scheme.lower() == 'digest':
return self.retry_http_digest_auth(req, authreq)
def retry_http_digest_auth(self, req, auth):
token, challenge = auth.split(' ', 1)
chal = parse_keqv_list(parse_http_list(challenge))
auth = self.get_authorization(req, chal)
if auth:
auth_val = 'Digest %s' % auth
if req.headers.get(self.auth_header, None) == auth_val:
return None
req.add_unredirected_header(self.auth_header, auth_val)
resp = self.parent.open(req, timeout=req.timeout)
return resp
def get_cnonce(self, nonce):
# The cnonce-value is an opaque
# quoted string value provided by the client and used by both client
# and server to avoid chosen plaintext attacks, to provide mutual
# authentication, and to provide some message integrity protection.
# This isn't a fabulous effort, but it's probably Good Enough.
dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
randombytes(8))).hexdigest()
return dig[:16]
def get_authorization(self, req, chal):
try:
realm = chal['realm']
nonce = chal['nonce']
qop = chal.get('qop')
algorithm = chal.get('algorithm', 'MD5')
# mod_digest doesn't send an opaque, even though it isn't
# supposed to be optional
opaque = chal.get('opaque', None)
except KeyError:
return None
H, KD = self.get_algorithm_impls(algorithm)
if H is None:
return None
user, pw = self.passwd.find_user_password(realm, req.get_full_url())
if user is None:
return None
# XXX not implemented yet
if req.has_data():
entdig = self.get_entity_digest(req.get_data(), chal)
else:
entdig = None
A1 = "%s:%s:%s" % (user, realm, pw)
A2 = "%s:%s" % (req.get_method(),
# XXX selector: what about proxies and full urls
req.get_selector())
if qop == 'auth':
if nonce == self.last_nonce:
self.nonce_count += 1
else:
self.nonce_count = 1
self.last_nonce = nonce
ncvalue = '%08x' % self.nonce_count
cnonce = self.get_cnonce(nonce)
noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
respdig = KD(H(A1), noncebit)
elif qop is None:
respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
else:
# XXX handle auth-int.
raise URLError("qop '%s' is not supported." % qop)
# XXX should the partial digests be encoded too?
base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
'response="%s"' % (user, realm, nonce, req.get_selector(),
respdig)
if opaque:
base += ', opaque="%s"' % opaque
if entdig:
base += ', digest="%s"' % entdig
base += ', algorithm="%s"' % algorithm
if qop:
base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
return base
def get_algorithm_impls(self, algorithm):
# algorithm should be case-insensitive according to RFC2617
algorithm = algorithm.upper()
# lambdas assume digest modules are imported at the top level
if algorithm == 'MD5':
H = lambda x: hashlib.md5(x).hexdigest()
elif algorithm == 'SHA':
H = lambda x: hashlib.sha1(x).hexdigest()
# XXX MD5-sess
else:
raise ValueError("Unsupported digest authentication "
"algorithm %r" % algorithm.lower())
KD = lambda s, d: H("%s:%s" % (s, d))
return H, KD
def get_entity_digest(self, data, chal):
# XXX not implemented yet
return None
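# Sketch of the RFC 2617 computation wired up in get_authorization above,
# for qop="auth" and algorithm="MD5":
#   HA1      = MD5(user ":" realm ":" password)
#   HA2      = MD5(method ":" selector)
#   response = MD5(HA1 ":" nonce ":" nc ":" cnonce ":auth:" HA2)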
class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
"""An authentication protocol defined by RFC 2069
Digest authentication improves on basic authentication because it
does not transmit passwords in the clear.
"""
auth_header = 'Authorization'
handler_order = 490 # before Basic auth
def http_error_401(self, req, fp, code, msg, headers):
host = urlparse.urlparse(req.get_full_url())[1]
retry = self.http_error_auth_reqed('www-authenticate',
host, req, headers)
self.reset_retry_count()
return retry
class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
auth_header = 'Proxy-Authorization'
handler_order = 490 # before Basic auth
def http_error_407(self, req, fp, code, msg, headers):
host = req.get_host()
retry = self.http_error_auth_reqed('proxy-authenticate',
host, req, headers)
self.reset_retry_count()
return retry
class AbstractHTTPHandler(BaseHandler):
def __init__(self, debuglevel=0):
self._debuglevel = debuglevel
def set_http_debuglevel(self, level):
self._debuglevel = level
def do_request_(self, request):
host = request.get_host()
if not host:
raise URLError('no host given')
if request.has_data(): # POST
data = request.get_data()
if not request.has_header('Content-type'):
request.add_unredirected_header(
'Content-type',
'application/x-www-form-urlencoded')
if not request.has_header('Content-length'):
request.add_unredirected_header(
'Content-length', '%d' % len(data))
sel_host = host
if request.has_proxy():
scheme, sel = splittype(request.get_selector())
sel_host, sel_path = splithost(sel)
if not request.has_header('Host'):
request.add_unredirected_header('Host', sel_host)
for name, value in self.parent.addheaders:
name = name.capitalize()
if not request.has_header(name):
request.add_unredirected_header(name, value)
return request
def do_open(self, http_class, req, **http_conn_args):
"""Return an addinfourl object for the request, using http_class.
http_class must implement the HTTPConnection API from httplib.
The addinfourl return value is a file-like object. It also
has methods and attributes including:
- info(): return a mimetools.Message object for the headers
- geturl(): return the original request URL
- code: HTTP status code
"""
host = req.get_host()
if not host:
raise URLError('no host given')
# will parse host:port
h = http_class(host, timeout=req.timeout, **http_conn_args)
h.set_debuglevel(self._debuglevel)
headers = dict(req.unredirected_hdrs)
headers.update(dict((k, v) for k, v in req.headers.items()
if k not in headers))
# We want to make an HTTP/1.1 request, but the addinfourl
# class isn't prepared to deal with a persistent connection.
# It will try to read all remaining data from the socket,
# which will block while the server waits for the next request.
# So make sure the connection gets closed after the (only)
# request.
headers["Connection"] = "close"
headers = dict(
(name.title(), val) for name, val in headers.items())
if req._tunnel_host:
tunnel_headers = {}
proxy_auth_hdr = "Proxy-Authorization"
if proxy_auth_hdr in headers:
tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
# Proxy-Authorization should not be sent to origin
# server.
del headers[proxy_auth_hdr]
h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
try:
h.request(req.get_method(), req.get_selector(), req.data, headers)
except socket.error, err: # XXX what error?
h.close()
raise URLError(err)
else:
try:
r = h.getresponse(buffering=True)
except TypeError: # buffering kw not supported
r = h.getresponse()
# Pick apart the HTTPResponse object to get the addinfourl
# object initialized properly.
# Wrap the HTTPResponse object in socket's file object adapter
# for Windows. That adapter calls recv(), so delegate recv()
# to read(). This weird wrapping allows the returned object to
# have readline() and readlines() methods.
# XXX It might be better to extract the read buffering code
# out of socket._fileobject() and into a base class.
r.recv = r.read
fp = socket._fileobject(r, close=True)
resp = addinfourl(fp, r.msg, req.get_full_url())
resp.code = r.status
resp.msg = r.reason
return resp
class HTTPHandler(AbstractHTTPHandler):
def http_open(self, req):
return self.do_open(httplib.HTTPConnection, req)
http_request = AbstractHTTPHandler.do_request_
if hasattr(httplib, 'HTTPS'):
class HTTPSHandler(AbstractHTTPHandler):
def __init__(self, debuglevel=0, context=None):
AbstractHTTPHandler.__init__(self, debuglevel)
self._context = context
def https_open(self, req):
return self.do_open(httplib.HTTPSConnection, req,
context=self._context)
https_request = AbstractHTTPHandler.do_request_
class HTTPCookieProcessor(BaseHandler):
def __init__(self, cookiejar=None):
import cookielib
if cookiejar is None:
cookiejar = cookielib.CookieJar()
self.cookiejar = cookiejar
def http_request(self, request):
self.cookiejar.add_cookie_header(request)
return request
def http_response(self, request, response):
self.cookiejar.extract_cookies(response, request)
return response
https_request = http_request
https_response = http_response
class UnknownHandler(BaseHandler):
def unknown_open(self, req):
type = req.get_type()
raise URLError('unknown url type: %s' % type)
def parse_keqv_list(l):
"""Parse list of key=value strings where keys are not duplicated."""
parsed = {}
for elt in l:
k, v = elt.split('=', 1)
if v[0] == '"' and v[-1] == '"':
v = v[1:-1]
parsed[k] = v
return parsed
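# Illustrative sketch: parse_keqv_list(['realm="example"', 'qop=auth'])
# returns {'realm': 'example', 'qop': 'auth'} (surrounding double quotes
# are stripped from values).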
def parse_http_list(s):
"""Parse lists as described by RFC 2068 Section 2.
In particular, parse comma-separated lists where the elements of
the list may include quoted-strings. A quoted-string could
contain a comma. A non-quoted string could have quotes in the
middle. Neither commas nor quotes count if they are escaped.
Only double-quotes count, not single-quotes.
"""
res = []
part = ''
escape = quote = False
for cur in s:
if escape:
part += cur
escape = False
continue
if quote:
if cur == '\\':
escape = True
continue
elif cur == '"':
quote = False
part += cur
continue
if cur == ',':
res.append(part)
part = ''
continue
if cur == '"':
quote = True
part += cur
# append last part
if part:
res.append(part)
return [part.strip() for part in res]
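# Illustrative sketch of the quoting rules above:
#   >>> parse_http_list('a, "b, c", d')
#   ['a', '"b, c"', 'd']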
def _safe_gethostbyname(host):
try:
return socket.gethostbyname(host)
except socket.gaierror:
return None
class FileHandler(BaseHandler):
# Use local file or FTP depending on form of URL
def file_open(self, req):
url = req.get_selector()
if url[:2] == '//' and url[2:3] != '/' and (req.host and
req.host != 'localhost'):
req.type = 'ftp'
return self.parent.open(req)
else:
return self.open_local_file(req)
# names for the localhost
names = None
def get_names(self):
if FileHandler.names is None:
try:
FileHandler.names = tuple(
socket.gethostbyname_ex('localhost')[2] +
socket.gethostbyname_ex(socket.gethostname())[2])
except socket.gaierror:
FileHandler.names = (socket.gethostbyname('localhost'),)
return FileHandler.names
# not entirely sure what the rules are here
def open_local_file(self, req):
import email.utils
import mimetypes
host = req.get_host()
filename = req.get_selector()
localfile = url2pathname(filename)
try:
stats = os.stat(localfile)
size = stats.st_size
modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
mtype = mimetypes.guess_type(filename)[0]
headers = mimetools.Message(StringIO(
'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
(mtype or 'text/plain', size, modified)))
if host:
host, port = splitport(host)
if not host or \
(not port and _safe_gethostbyname(host) in self.get_names()):
if host:
origurl = 'file://' + host + filename
else:
origurl = 'file://' + filename
return addinfourl(open(localfile, 'rb'), headers, origurl)
except OSError, msg:
# urllib2 users shouldn't expect OSErrors coming from urlopen()
raise URLError(msg)
raise URLError('file not on local host')
class FTPHandler(BaseHandler):
def ftp_open(self, req):
import ftplib
import mimetypes
host = req.get_host()
if not host:
raise URLError('ftp error: no host given')
host, port = splitport(host)
if port is None:
port = ftplib.FTP_PORT
else:
port = int(port)
# username/password handling
user, host = splituser(host)
if user:
user, passwd = splitpasswd(user)
else:
passwd = None
host = unquote(host)
user = user or ''
passwd = passwd or ''
try:
host = socket.gethostbyname(host)
except socket.error, msg:
raise URLError(msg)
path, attrs = splitattr(req.get_selector())
dirs = path.split('/')
dirs = map(unquote, dirs)
dirs, file = dirs[:-1], dirs[-1]
if dirs and not dirs[0]:
dirs = dirs[1:]
try:
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
type = file and 'I' or 'D'
for attr in attrs:
attr, value = splitvalue(attr)
if attr.lower() == 'type' and \
value in ('a', 'A', 'i', 'I', 'd', 'D'):
type = value.upper()
fp, retrlen = fw.retrfile(file, type)
headers = ""
mtype = mimetypes.guess_type(req.get_full_url())[0]
if mtype:
headers += "Content-type: %s\n" % mtype
if retrlen is not None and retrlen >= 0:
headers += "Content-length: %d\n" % retrlen
sf = StringIO(headers)
headers = mimetools.Message(sf)
return addinfourl(fp, headers, req.get_full_url())
except ftplib.all_errors, msg:
raise URLError, ('ftp error: %s' % msg), sys.exc_info()[2]
def connect_ftp(self, user, passwd, host, port, dirs, timeout):
fw = ftpwrapper(user, passwd, host, port, dirs, timeout,
persistent=False)
## fw.ftp.set_debuglevel(1)
return fw
class CacheFTPHandler(FTPHandler):
# XXX would be nice to have pluggable cache strategies
# XXX this stuff is definitely not thread safe
def __init__(self):
self.cache = {}
self.timeout = {}
self.soonest = 0
self.delay = 60
self.max_conns = 16
def setTimeout(self, t):
self.delay = t
def setMaxConns(self, m):
self.max_conns = m
def connect_ftp(self, user, passwd, host, port, dirs, timeout):
key = user, host, port, '/'.join(dirs), timeout
if key in self.cache:
self.timeout[key] = time.time() + self.delay
else:
self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
self.timeout[key] = time.time() + self.delay
self.check_cache()
return self.cache[key]
def check_cache(self):
# first check for old ones
t = time.time()
if self.soonest <= t:
for k, v in self.timeout.items():
if v < t:
self.cache[k].close()
del self.cache[k]
del self.timeout[k]
self.soonest = min(self.timeout.values())
# then check the size
if len(self.cache) == self.max_conns:
for k, v in self.timeout.items():
if v == self.soonest:
del self.cache[k]
del self.timeout[k]
break
self.soonest = min(self.timeout.values())
def clear_cache(self):
for conn in self.cache.values():
conn.close()
self.cache.clear()
self.timeout.clear()
"""Parse (absolute and relative) URLs.
The urlparse module is based upon the following RFC specifications.
RFC 3986 (STD66): "Uniform Resource Identifiers" by T. Berners-Lee, R. Fielding
and L. Masinter, January 2005.
RFC 2732 : "Format for Literal IPv6 Addresses in URL's by R.Hinden, B.Carpenter
and L.Masinter, December 1999.
RFC 2396: "Uniform Resource Identifiers (URI)": Generic Syntax by T.
Berners-Lee, R. Fielding, and L. Masinter, August 1998.
RFC 2368: "The mailto URL scheme", by P.Hoffman , L Masinter, J. Zwinski, July 1998.
RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June
1995.
RFC 1738: "Uniform Resource Locators (URL)" by T. Berners-Lee, L. Masinter, M.
McCahill, December 1994
RFC 3986 is considered the current standard, and any future changes to
the urlparse module should conform to it. The urlparse module is
currently not entirely compliant with this RFC: owing to de facto
parsing scenarios and for backward-compatibility purposes, some
parsing quirks from older RFCs are retained. The test cases in
test_urlparse.py provide a good indicator of parsing behavior.
"""
import re
__all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
"urlsplit", "urlunsplit", "parse_qs", "parse_qsl"]
# A classification of schemes ('' means apply by default)
uses_relative = ['ftp', 'http', 'gopher', 'nntp', 'imap',
'wais', 'file', 'https', 'shttp', 'mms',
'prospero', 'rtsp', 'rtspu', '', 'sftp',
'svn', 'svn+ssh']
uses_netloc = ['ftp', 'http', 'gopher', 'nntp', 'telnet',
'imap', 'wais', 'file', 'mms', 'https', 'shttp',
'snews', 'prospero', 'rtsp', 'rtspu', 'rsync', '',
'svn', 'svn+ssh', 'sftp','nfs','git', 'git+ssh']
uses_params = ['ftp', 'hdl', 'prospero', 'http', 'imap',
'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
'mms', '', 'sftp', 'tel']
# These are not actually used anymore, but should stay for backwards
# compatibility. (They are undocumented, but have a public-looking name.)
non_hierarchical = ['gopher', 'hdl', 'mailto', 'news',
'telnet', 'wais', 'imap', 'snews', 'sip', 'sips']
uses_query = ['http', 'wais', 'imap', 'https', 'shttp', 'mms',
'gopher', 'rtsp', 'rtspu', 'sip', 'sips', '']
uses_fragment = ['ftp', 'hdl', 'http', 'gopher', 'news',
'nntp', 'wais', 'https', 'shttp', 'snews',
'file', 'prospero', '']
# Characters valid in scheme names
scheme_chars = ('abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
'0123456789'
'+-.')
MAX_CACHE_SIZE = 20
_parse_cache = {}
def clear_cache():
"""Clear the parse cache."""
_parse_cache.clear()
class ResultMixin(object):
"""Shared methods for the parsed result objects."""
@property
def username(self):
netloc = self.netloc
if "@" in netloc:
userinfo = netloc.rsplit("@", 1)[0]
if ":" in userinfo:
userinfo = userinfo.split(":", 1)[0]
return userinfo
return None
@property
def password(self):
netloc = self.netloc
if "@" in netloc:
userinfo = netloc.rsplit("@", 1)[0]
if ":" in userinfo:
return userinfo.split(":", 1)[1]
return None
@property
def hostname(self):
netloc = self.netloc.split('@')[-1]
if '[' in netloc and ']' in netloc:
return netloc.split(']')[0][1:].lower()
elif ':' in netloc:
return netloc.split(':')[0].lower()
elif netloc == '':
return None
else:
return netloc.lower()
@property
def port(self):
netloc = self.netloc.split('@')[-1].split(']')[-1]
if ':' in netloc:
port = netloc.split(':')[1]
if port:
port = int(port, 10)
# verify legal port
if (0 <= port <= 65535):
return port
return None
from collections import namedtuple
class SplitResult(namedtuple('SplitResult', 'scheme netloc path query fragment'), ResultMixin):
__slots__ = ()
def geturl(self):
return urlunsplit(self)
class ParseResult(namedtuple('ParseResult', 'scheme netloc path params query fragment'), ResultMixin):
__slots__ = ()
def geturl(self):
return urlunparse(self)
def urlparse(url, scheme='', allow_fragments=True):
"""Parse a URL into 6 components:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
Note that we don't break the components up into smaller bits
(e.g. netloc is a single string) and we don't expand % escapes."""
tuple = urlsplit(url, scheme, allow_fragments)
scheme, netloc, url, query, fragment = tuple
if scheme in uses_params and ';' in url:
url, params = _splitparams(url)
else:
params = ''
return ParseResult(scheme, netloc, url, params, query, fragment)
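# Illustrative sketch (scheme 'http' uses params, so ';' splits them off):
#   >>> urlparse('http://example.com/a;b?c=1#top')
#   ParseResult(scheme='http', netloc='example.com', path='/a', params='b',
#               query='c=1', fragment='top')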
def _splitparams(url):
if '/' in url:
i = url.find(';', url.rfind('/'))
if i < 0:
return url, ''
else:
i = url.find(';')
return url[:i], url[i+1:]
def _splitnetloc(url, start=0):
delim = len(url) # position of end of domain part of url, default is end
for c in '/?#': # look for delimiters; the order is NOT important
wdelim = url.find(c, start) # find first of this delim
if wdelim >= 0: # if found
delim = min(delim, wdelim) # use earliest delim position
return url[start:delim], url[delim:] # return (domain, rest)
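# Illustrative sketch: _splitnetloc('//example.com/path?q', 2) returns
# ('example.com', '/path?q') -- the earliest of '/', '?' or '#' ends the
# netloc.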
def _checknetloc(netloc):
if not netloc or not isinstance(netloc, unicode):
return
# looking for characters like \u2100 that expand to 'a/c'
# IDNA uses NFKC equivalence, so normalize for this check
import unicodedata
n = netloc.replace(u'@', u'') # ignore characters already included
n = n.replace(u':', u'') # but not the surrounding text
n = n.replace(u'#', u'')
n = n.replace(u'?', u'')
netloc2 = unicodedata.normalize('NFKC', n)
if n == netloc2:
return
for c in '/?#@:':
if c in netloc2:
raise ValueError("netloc %r contains invalid characters "
"under NFKC normalization"
% netloc)
def urlsplit(url, scheme='', allow_fragments=True):
"""Parse a URL into 5 components:
<scheme>://<netloc>/<path>?<query>#<fragment>
Return a 5-tuple: (scheme, netloc, path, query, fragment).
Note that we don't break the components up into smaller bits
(e.g. netloc is a single string) and we don't expand % escapes."""
allow_fragments = bool(allow_fragments)
key = url, scheme, allow_fragments, type(url), type(scheme)
cached = _parse_cache.get(key, None)
if cached:
return cached
if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
clear_cache()
netloc = query = fragment = ''
i = url.find(':')
if i > 0:
if url[:i] == 'http': # optimize the common case
scheme = url[:i].lower()
url = url[i+1:]
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
v = SplitResult(scheme, netloc, url, query, fragment)
_parse_cache[key] = v
return v
for c in url[:i]:
if c not in scheme_chars:
break
else:
# make sure "url" is not actually a port number (in which case
# "scheme" is really part of the path)
rest = url[i+1:]
if not rest or any(c not in '0123456789' for c in rest):
# not a port number
scheme, url = url[:i].lower(), rest
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
v = SplitResult(scheme, netloc, url, query, fragment)
_parse_cache[key] = v
return v
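# Illustrative sketch combining urlsplit() with the ResultMixin properties:
#   >>> r = urlsplit('http://u:[email protected]:8042/p?q=1#f')
#   >>> (r.hostname, r.port, r.username, r.password)
#   ('example.com', 8042, 'u', 'p')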
def urlunparse(data):
"""Put a parsed URL back together again. This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had redundant delimiters, e.g. a ? with an empty query
(the draft states that these are equivalent)."""
scheme, netloc, url, params, query, fragment = data
if params:
url = "%s;%s" % (url, params)
return urlunsplit((scheme, netloc, url, query, fragment))
def urlunsplit(data):
"""Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The data argument can be any five-item iterable.
This may result in a slightly different, but equivalent URL, if the URL that
was parsed originally had unnecessary delimiters (for example, a ? with an
empty query; the RFC states that these are equivalent)."""
scheme, netloc, url, query, fragment = data
if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
if url and url[:1] != '/': url = '/' + url
url = '//' + (netloc or '') + url
if scheme:
url = scheme + ':' + url
if query:
url = url + '?' + query
if fragment:
url = url + '#' + fragment
return url
def urljoin(base, url, allow_fragments=True):
"""Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter."""
if not base:
return url
if not url:
return base
bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
urlparse(base, '', allow_fragments)
scheme, netloc, path, params, query, fragment = \
urlparse(url, bscheme, allow_fragments)
if scheme != bscheme or scheme not in uses_relative:
return url
if scheme in uses_netloc:
if netloc:
return urlunparse((scheme, netloc, path,
params, query, fragment))
netloc = bnetloc
if path[:1] == '/':
return urlunparse((scheme, netloc, path,
params, query, fragment))
if not path and not params:
path = bpath
params = bparams
if not query:
query = bquery
return urlunparse((scheme, netloc, path,
params, query, fragment))
segments = bpath.split('/')[:-1] + path.split('/')
# XXX The stuff below is bogus in various ways...
if segments[-1] == '.':
segments[-1] = ''
while '.' in segments:
segments.remove('.')
while 1:
i = 1
n = len(segments) - 1
while i < n:
if (segments[i] == '..'
and segments[i-1] not in ('', '..')):
del segments[i-1:i+1]
break
i = i+1
else:
break
if segments == ['', '..']:
segments[-1] = ''
elif len(segments) >= 2 and segments[-1] == '..':
segments[-2:] = ['']
return urlunparse((scheme, netloc, '/'.join(segments),
params, query, fragment))
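# Example usage of urljoin() (a sketch): relative references are resolved
# against the base, with '.' and '..' segments collapsed as above.
#   >>> urljoin('http://example.com/a/b/c', '../d')
#   'http://example.com/a/d'
#   >>> urljoin('http://example.com/a/b/c', '/e')
#   'http://example.com/e'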
def urldefrag(url):
"""Removes any existing fragment from URL.
Returns a tuple of the defragmented URL and the fragment. If
the URL contained no fragments, the second element is the
empty string.
"""
if '#' in url:
s, n, p, a, q, frag = urlparse(url)
defrag = urlunparse((s, n, p, a, q, ''))
return defrag, frag
else:
return url, ''
try:
unicode
except NameError:
def _is_unicode(x):
return 0
else:
def _is_unicode(x):
return isinstance(x, unicode)
# unquote method for parse_qs and parse_qsl
# Cannot use directly from urllib as it would create a circular reference
# because urllib uses urlparse methods (urljoin). If you update this function,
# update it also in urllib. This code duplication does not exist in Python 3.
_hexdig = '0123456789ABCDEFabcdef'
_hextochr = dict((a+b, chr(int(a+b,16)))
for a in _hexdig for b in _hexdig)
_asciire = re.compile('([\x00-\x7f]+)')
def unquote(s, recurse=False):
"""unquote('abc%20def') -> 'abc def'."""
if _is_unicode(s) and not recurse:
if '%' not in s:
return s
bits = _asciire.split(s)
res = [bits[0]]
append = res.append
for i in range(1, len(bits), 2):
append(unquote(str(bits[i]), True).decode('latin1'))
append(bits[i + 1])
return ''.join(res)
bits = s.split('%')
# fastpath
if len(bits) == 1:
return s
res = [bits[0]]
append = res.append
for item in bits[1:]:
try:
append(_hextochr[item[:2]])
append(item[2:])
except KeyError:
append('%')
append(item)
return ''.join(res)
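# Example usage of unquote() (a sketch): percent escapes are decoded, but '+'
# is left alone here; parse_qs/parse_qsl handle the '+'-to-space translation.
#   >>> unquote('%7Edata/%41+b')
#   '~data/A+b'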
def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
"""Parse a query given as a string argument.
Arguments:
qs: percent-encoded query string to be parsed
keep_blank_values: flag indicating whether blank values in
percent-encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
max_num_fields: int. If set, then throws a ValueError if there
are more than n fields read by parse_qsl().
"""
dict = {}
for name, value in parse_qsl(qs, keep_blank_values, strict_parsing,
max_num_fields):
if name in dict:
dict[name].append(value)
else:
dict[name] = [value]
return dict
def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
"""Parse a query given as a string argument.
Arguments:
qs: percent-encoded query string to be parsed
keep_blank_values: flag indicating whether blank values in
percent-encoded queries should be treated as blank strings. A
true value indicates that blanks should be retained as blank
strings. The default false value indicates that blank values
are to be ignored and treated as if they were not included.
strict_parsing: flag indicating what to do with parsing errors. If
false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
max_num_fields: int. If set, then throws a ValueError if there
are more than n fields read by parse_qsl().
    Returns a list of (name, value) pairs, as G-d intended.
"""
# If max_num_fields is defined then check that the number of fields
# is less than max_num_fields. This prevents a memory exhaustion DOS
# attack via post bodies with many fields.
if max_num_fields is not None:
num_fields = 1 + qs.count('&') + qs.count(';')
if max_num_fields < num_fields:
raise ValueError('Max number of fields exceeded')
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
r = []
for name_value in pairs:
if not name_value and not strict_parsing:
continue
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError, "bad query field: %r" % (name_value,)
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
else:
continue
if len(nv[1]) or keep_blank_values:
name = unquote(nv[0].replace('+', ' '))
value = unquote(nv[1].replace('+', ' '))
r.append((name, value))
return r
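# Example usage (a sketch): parse_qsl() preserves order as (name, value)
# pairs, while parse_qs() folds repeated names into lists.
#   >>> parse_qsl('a=1&a=2&b=spam+eggs')
#   [('a', '1'), ('a', '2'), ('b', 'spam eggs')]
#   >>> parse_qs('a=1&a=2&b=spam+eggs')
#   {'a': ['1', '2'], 'b': ['spam eggs']}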
"""Hook to allow user-specified customization code to run.
As a policy, Python doesn't run user-specified code on startup of
Python programs (interactive sessions execute the script specified in
the PYTHONSTARTUP environment variable if it exists).
However, some programs or sites may find it convenient to allow users
to have a standard customization file, which gets run when a program
requests it. This module implements such a mechanism. A program
that wishes to use the mechanism must execute the statement
import user
The user module looks for a file .pythonrc.py in the user's home
directory and if it can be opened, execfile()s it in its own global
namespace. Errors during this phase are not caught; that's up to the
program that imports the user module, if it wishes.
The user's .pythonrc.py could conceivably test for sys.version if it
wishes to do different things depending on the Python version.
"""
from warnings import warnpy3k
warnpy3k("the user module has been removed in Python 3.0", stacklevel=2)
del warnpy3k
import os
home = os.curdir # Default
if 'HOME' in os.environ:
home = os.environ['HOME']
elif os.name == 'posix':
home = os.path.expanduser("~/")
elif os.name == 'nt': # Contributed by Jeff Bauer
if 'HOMEPATH' in os.environ:
if 'HOMEDRIVE' in os.environ:
home = os.environ['HOMEDRIVE'] + os.environ['HOMEPATH']
else:
home = os.environ['HOMEPATH']
pythonrc = os.path.join(home, ".pythonrc.py")
try:
f = open(pythonrc)
except IOError:
pass
else:
f.close()
execfile(pythonrc)
"""A more or less complete user-defined wrapper around dictionary objects."""
class UserDict:
def __init__(*args, **kwargs):
if not args:
raise TypeError("descriptor '__init__' of 'UserDict' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
            raise TypeError('expected at most 1 argument, got %d' % len(args))
if args:
dict = args[0]
elif 'dict' in kwargs:
dict = kwargs.pop('dict')
import warnings
warnings.warn("Passing 'dict' as keyword argument is "
"deprecated", PendingDeprecationWarning,
stacklevel=2)
else:
dict = None
self.data = {}
if dict is not None:
self.update(dict)
if len(kwargs):
self.update(kwargs)
def __repr__(self): return repr(self.data)
def __cmp__(self, dict):
if isinstance(dict, UserDict):
return cmp(self.data, dict.data)
else:
return cmp(self.data, dict)
__hash__ = None # Avoid Py3k warning
def __len__(self): return len(self.data)
def __getitem__(self, key):
if key in self.data:
return self.data[key]
if hasattr(self.__class__, "__missing__"):
return self.__class__.__missing__(self, key)
raise KeyError(key)
def __setitem__(self, key, item): self.data[key] = item
def __delitem__(self, key): del self.data[key]
def clear(self): self.data.clear()
def copy(self):
if self.__class__ is UserDict:
return UserDict(self.data.copy())
import copy
data = self.data
try:
self.data = {}
c = copy.copy(self)
finally:
self.data = data
c.update(self)
return c
def keys(self): return self.data.keys()
def items(self): return self.data.items()
def iteritems(self): return self.data.iteritems()
def iterkeys(self): return self.data.iterkeys()
def itervalues(self): return self.data.itervalues()
def values(self): return self.data.values()
def has_key(self, key): return key in self.data
def update(*args, **kwargs):
if not args:
raise TypeError("descriptor 'update' of 'UserDict' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
            raise TypeError('expected at most 1 argument, got %d' % len(args))
if args:
dict = args[0]
elif 'dict' in kwargs:
dict = kwargs.pop('dict')
import warnings
warnings.warn("Passing 'dict' as keyword argument is deprecated",
PendingDeprecationWarning, stacklevel=2)
else:
dict = None
if dict is None:
pass
elif isinstance(dict, UserDict):
self.data.update(dict.data)
elif isinstance(dict, type({})) or not hasattr(dict, 'items'):
self.data.update(dict)
else:
for k, v in dict.items():
self[k] = v
if len(kwargs):
self.data.update(kwargs)
def get(self, key, failobj=None):
if key not in self:
return failobj
return self[key]
def setdefault(self, key, failobj=None):
if key not in self:
self[key] = failobj
return self[key]
def pop(self, key, *args):
return self.data.pop(key, *args)
def popitem(self):
return self.data.popitem()
def __contains__(self, key):
return key in self.data
@classmethod
def fromkeys(cls, iterable, value=None):
d = cls()
for key in iterable:
d[key] = value
return d
class IterableUserDict(UserDict):
def __iter__(self):
return iter(self.data)
import _abcoll
_abcoll.MutableMapping.register(IterableUserDict)
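# A minimal subclassing sketch (LowerDict is hypothetical, not part of the
# module): UserDict keeps its state in self.data, so a subclass can hook item
# access without touching the real dict type.
#   >>> class LowerDict(IterableUserDict):
#   ...     def __setitem__(self, key, item):
#   ...         self.data[key.lower()] = item
#   >>> d = LowerDict()
#   >>> d['KEY'] = 1
#   >>> d.keys()
#   ['key']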
class DictMixin:
# Mixin defining all dictionary methods for classes that already have
# a minimum dictionary interface including getitem, setitem, delitem,
# and keys. Without knowledge of the subclass constructor, the mixin
# does not define __init__() or copy(). In addition to the four base
# methods, progressively more efficiency comes with defining
# __contains__(), __iter__(), and iteritems().
# second level definitions support higher levels
def __iter__(self):
for k in self.keys():
yield k
def has_key(self, key):
try:
self[key]
except KeyError:
return False
return True
def __contains__(self, key):
return self.has_key(key)
# third level takes advantage of second level definitions
def iteritems(self):
for k in self:
yield (k, self[k])
def iterkeys(self):
return self.__iter__()
# fourth level uses definitions from lower levels
def itervalues(self):
for _, v in self.iteritems():
yield v
def values(self):
return [v for _, v in self.iteritems()]
def items(self):
return list(self.iteritems())
def clear(self):
for key in self.keys():
del self[key]
def setdefault(self, key, default=None):
try:
return self[key]
except KeyError:
self[key] = default
return default
def pop(self, key, *args):
if len(args) > 1:
raise TypeError, "pop expected at most 2 arguments, got "\
+ repr(1 + len(args))
try:
value = self[key]
except KeyError:
if args:
return args[0]
raise
del self[key]
return value
def popitem(self):
try:
k, v = self.iteritems().next()
except StopIteration:
raise KeyError, 'container is empty'
del self[k]
return (k, v)
def update(self, other=None, **kwargs):
# Make progressively weaker assumptions about "other"
if other is None:
pass
elif hasattr(other, 'iteritems'): # iteritems saves memory and lookups
for k, v in other.iteritems():
self[k] = v
elif hasattr(other, 'keys'):
for k in other.keys():
self[k] = other[k]
else:
for k, v in other:
self[k] = v
if kwargs:
self.update(kwargs)
def get(self, key, default=None):
try:
return self[key]
except KeyError:
return default
def __repr__(self):
return repr(dict(self.iteritems()))
def __cmp__(self, other):
if other is None:
return 1
if isinstance(other, DictMixin):
other = dict(other.iteritems())
return cmp(dict(self.iteritems()), other)
def __len__(self):
return len(self.keys())
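# A minimal DictMixin sketch (Store is hypothetical): supply the four base
# methods and the mixin derives the rest of the mapping API.
#   >>> class Store(DictMixin):
#   ...     def __init__(self): self._d = {}
#   ...     def __getitem__(self, k): return self._d[k]
#   ...     def __setitem__(self, k, v): self._d[k] = v
#   ...     def __delitem__(self, k): del self._d[k]
#   ...     def keys(self): return self._d.keys()
#   >>> s = Store(); s['a'] = 1
#   >>> s.items(), s.get('b', 0)
#   ([('a', 1)], 0)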
"""A more or less complete user-defined wrapper around list objects."""
import collections
class UserList(collections.MutableSequence):
def __init__(self, initlist=None):
self.data = []
if initlist is not None:
# XXX should this accept an arbitrary sequence?
if type(initlist) == type(self.data):
self.data[:] = initlist
elif isinstance(initlist, UserList):
self.data[:] = initlist.data[:]
else:
self.data = list(initlist)
def __repr__(self): return repr(self.data)
def __lt__(self, other): return self.data < self.__cast(other)
def __le__(self, other): return self.data <= self.__cast(other)
def __eq__(self, other): return self.data == self.__cast(other)
def __ne__(self, other): return self.data != self.__cast(other)
def __gt__(self, other): return self.data > self.__cast(other)
def __ge__(self, other): return self.data >= self.__cast(other)
def __cast(self, other):
if isinstance(other, UserList): return other.data
else: return other
def __cmp__(self, other):
return cmp(self.data, self.__cast(other))
__hash__ = None # Mutable sequence, so not hashable
def __contains__(self, item): return item in self.data
def __len__(self): return len(self.data)
def __getitem__(self, i): return self.data[i]
def __setitem__(self, i, item): self.data[i] = item
def __delitem__(self, i): del self.data[i]
def __getslice__(self, i, j):
i = max(i, 0); j = max(j, 0)
return self.__class__(self.data[i:j])
def __setslice__(self, i, j, other):
i = max(i, 0); j = max(j, 0)
if isinstance(other, UserList):
self.data[i:j] = other.data
elif isinstance(other, type(self.data)):
self.data[i:j] = other
else:
self.data[i:j] = list(other)
def __delslice__(self, i, j):
i = max(i, 0); j = max(j, 0)
del self.data[i:j]
def __add__(self, other):
if isinstance(other, UserList):
return self.__class__(self.data + other.data)
elif isinstance(other, type(self.data)):
return self.__class__(self.data + other)
else:
return self.__class__(self.data + list(other))
def __radd__(self, other):
if isinstance(other, UserList):
return self.__class__(other.data + self.data)
elif isinstance(other, type(self.data)):
return self.__class__(other + self.data)
else:
return self.__class__(list(other) + self.data)
def __iadd__(self, other):
if isinstance(other, UserList):
self.data += other.data
elif isinstance(other, type(self.data)):
self.data += other
else:
self.data += list(other)
return self
def __mul__(self, n):
return self.__class__(self.data*n)
__rmul__ = __mul__
def __imul__(self, n):
self.data *= n
return self
def append(self, item): self.data.append(item)
def insert(self, i, item): self.data.insert(i, item)
def pop(self, i=-1): return self.data.pop(i)
def remove(self, item): self.data.remove(item)
def count(self, item): return self.data.count(item)
def index(self, item, *args): return self.data.index(item, *args)
def reverse(self): self.data.reverse()
def sort(self, *args, **kwds): self.data.sort(*args, **kwds)
def extend(self, other):
if isinstance(other, UserList):
self.data.extend(other.data)
else:
self.data.extend(other)
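# A brief UserList sketch (EvenList is hypothetical): every operation funnels
# through self.data, so overriding a single method specializes behaviour.
#   >>> class EvenList(UserList):
#   ...     def append(self, item):
#   ...         if item % 2 == 0:
#   ...             self.data.append(item)
#   >>> l = EvenList([2, 4]); l.append(3); l.append(6)
#   >>> l
#   [2, 4, 6]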
#!/usr/bin/env python
## vim:ts=4:et:nowrap
"""A user-defined wrapper around string objects
Note: string objects have grown methods in Python 1.6.
This module requires Python 1.6 or later.
"""
import sys
import collections
__all__ = ["UserString","MutableString"]
class UserString(collections.Sequence):
def __init__(self, seq):
if isinstance(seq, basestring):
self.data = seq
elif isinstance(seq, UserString):
self.data = seq.data[:]
else:
self.data = str(seq)
def __str__(self): return str(self.data)
def __repr__(self): return repr(self.data)
def __int__(self): return int(self.data)
def __long__(self): return long(self.data)
def __float__(self): return float(self.data)
def __complex__(self): return complex(self.data)
def __hash__(self): return hash(self.data)
def __cmp__(self, string):
if isinstance(string, UserString):
return cmp(self.data, string.data)
else:
return cmp(self.data, string)
def __contains__(self, char):
return char in self.data
def __len__(self): return len(self.data)
def __getitem__(self, index): return self.__class__(self.data[index])
def __getslice__(self, start, end):
start = max(start, 0); end = max(end, 0)
return self.__class__(self.data[start:end])
def __add__(self, other):
if isinstance(other, UserString):
return self.__class__(self.data + other.data)
elif isinstance(other, basestring):
return self.__class__(self.data + other)
else:
return self.__class__(self.data + str(other))
def __radd__(self, other):
if isinstance(other, basestring):
return self.__class__(other + self.data)
else:
return self.__class__(str(other) + self.data)
def __mul__(self, n):
return self.__class__(self.data*n)
__rmul__ = __mul__
def __mod__(self, args):
return self.__class__(self.data % args)
# the following methods are defined in alphabetical order:
def capitalize(self): return self.__class__(self.data.capitalize())
def center(self, width, *args):
return self.__class__(self.data.center(width, *args))
def count(self, sub, start=0, end=sys.maxint):
return self.data.count(sub, start, end)
def decode(self, encoding=None, errors=None): # XXX improve this?
if encoding:
if errors:
return self.__class__(self.data.decode(encoding, errors))
else:
return self.__class__(self.data.decode(encoding))
else:
return self.__class__(self.data.decode())
def encode(self, encoding=None, errors=None): # XXX improve this?
if encoding:
if errors:
return self.__class__(self.data.encode(encoding, errors))
else:
return self.__class__(self.data.encode(encoding))
else:
return self.__class__(self.data.encode())
def endswith(self, suffix, start=0, end=sys.maxint):
return self.data.endswith(suffix, start, end)
def expandtabs(self, tabsize=8):
return self.__class__(self.data.expandtabs(tabsize))
def find(self, sub, start=0, end=sys.maxint):
return self.data.find(sub, start, end)
def index(self, sub, start=0, end=sys.maxint):
return self.data.index(sub, start, end)
def isalpha(self): return self.data.isalpha()
def isalnum(self): return self.data.isalnum()
def isdecimal(self): return self.data.isdecimal()
def isdigit(self): return self.data.isdigit()
def islower(self): return self.data.islower()
def isnumeric(self): return self.data.isnumeric()
def isspace(self): return self.data.isspace()
def istitle(self): return self.data.istitle()
def isupper(self): return self.data.isupper()
def join(self, seq): return self.data.join(seq)
def ljust(self, width, *args):
return self.__class__(self.data.ljust(width, *args))
def lower(self): return self.__class__(self.data.lower())
def lstrip(self, chars=None): return self.__class__(self.data.lstrip(chars))
def partition(self, sep):
return self.data.partition(sep)
def replace(self, old, new, maxsplit=-1):
return self.__class__(self.data.replace(old, new, maxsplit))
def rfind(self, sub, start=0, end=sys.maxint):
return self.data.rfind(sub, start, end)
def rindex(self, sub, start=0, end=sys.maxint):
return self.data.rindex(sub, start, end)
def rjust(self, width, *args):
return self.__class__(self.data.rjust(width, *args))
def rpartition(self, sep):
return self.data.rpartition(sep)
def rstrip(self, chars=None): return self.__class__(self.data.rstrip(chars))
def split(self, sep=None, maxsplit=-1):
return self.data.split(sep, maxsplit)
def rsplit(self, sep=None, maxsplit=-1):
return self.data.rsplit(sep, maxsplit)
def splitlines(self, keepends=0): return self.data.splitlines(keepends)
def startswith(self, prefix, start=0, end=sys.maxint):
return self.data.startswith(prefix, start, end)
def strip(self, chars=None): return self.__class__(self.data.strip(chars))
def swapcase(self): return self.__class__(self.data.swapcase())
def title(self): return self.__class__(self.data.title())
def translate(self, *args):
return self.__class__(self.data.translate(*args))
def upper(self): return self.__class__(self.data.upper())
def zfill(self, width): return self.__class__(self.data.zfill(width))
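# A short UserString sketch (Shout is hypothetical): methods returning strings
# are wrapped back into self.__class__, so chained calls stay in the subclass.
#   >>> class Shout(UserString):
#   ...     def __str__(self):
#   ...         return self.data.upper()
#   >>> str(Shout('hi').strip().center(6))
#   '  HI  '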
class MutableString(UserString, collections.MutableSequence):
"""mutable string objects
    Python strings are immutable objects. This has the advantage that
    strings may be used as dictionary keys. If this property isn't needed
    and you insist on changing string values in place instead, you may cheat
    and use MutableString.
    But the purpose of this class is an educational one: to prevent
    people from inventing their own mutable string class derived
    from UserString and thereby forgetting to remove (override) the
    __hash__ method inherited from UserString. This would lead to
    errors that would be very hard to track down.
A faster and better solution is to rewrite your program using lists."""
def __init__(self, string=""):
from warnings import warnpy3k
warnpy3k('the class UserString.MutableString has been removed in '
'Python 3.0', stacklevel=2)
self.data = string
# We inherit object.__hash__, so we must deny this explicitly
__hash__ = None
def __setitem__(self, index, sub):
if isinstance(index, slice):
if isinstance(sub, UserString):
sub = sub.data
elif not isinstance(sub, basestring):
sub = str(sub)
start, stop, step = index.indices(len(self.data))
if step == -1:
start, stop = stop+1, start+1
sub = sub[::-1]
elif step != 1:
# XXX(twouters): I guess we should be reimplementing
# the extended slice assignment/deletion algorithm here...
raise TypeError, "invalid step in slicing assignment"
start = min(start, stop)
self.data = self.data[:start] + sub + self.data[stop:]
else:
if index < 0:
index += len(self.data)
if index < 0 or index >= len(self.data): raise IndexError
self.data = self.data[:index] + sub + self.data[index+1:]
def __delitem__(self, index):
if isinstance(index, slice):
start, stop, step = index.indices(len(self.data))
if step == -1:
start, stop = stop+1, start+1
elif step != 1:
# XXX(twouters): see same block in __setitem__
raise TypeError, "invalid step in slicing deletion"
start = min(start, stop)
self.data = self.data[:start] + self.data[stop:]
else:
if index < 0:
index += len(self.data)
if index < 0 or index >= len(self.data): raise IndexError
self.data = self.data[:index] + self.data[index+1:]
def __setslice__(self, start, end, sub):
start = max(start, 0); end = max(end, 0)
if isinstance(sub, UserString):
self.data = self.data[:start]+sub.data+self.data[end:]
elif isinstance(sub, basestring):
self.data = self.data[:start]+sub+self.data[end:]
else:
self.data = self.data[:start]+str(sub)+self.data[end:]
def __delslice__(self, start, end):
start = max(start, 0); end = max(end, 0)
self.data = self.data[:start] + self.data[end:]
def immutable(self):
return UserString(self.data)
def __iadd__(self, other):
if isinstance(other, UserString):
self.data += other.data
elif isinstance(other, basestring):
self.data += other
else:
self.data += str(other)
return self
def __imul__(self, n):
self.data *= n
return self
def insert(self, index, value):
self[index:index] = value
if __name__ == "__main__":
# execute the regression test to stdout, if called as a script:
import os
called_in_dir, called_as = os.path.split(sys.argv[0])
called_as, py = os.path.splitext(called_as)
if '-q' in sys.argv:
from test import test_support
test_support.verbose = 0
__import__('test.test_' + called_as.lower())
#! /usr/bin/env python
# Copyright 1994 by Lance Ellinghouse
# Cathedral City, California Republic, United States of America.
# All Rights Reserved
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Lance Ellinghouse
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO
# THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE
# FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
# Modified by Jack Jansen, CWI, July 1995:
# - Use binascii module to do the actual line-by-line conversion
#   between ASCII and binary. This results in a 1000-fold speedup. The C
#   version is still 5 times faster, though.
# - Arguments more compliant with the Python standard
"""Implementation of the UUencode and UUdecode functions.
encode(in_file, out_file [,name, mode])
decode(in_file [, out_file, mode])
"""
import binascii
import os
import sys
__all__ = ["Error", "encode", "decode"]
class Error(Exception):
pass
def encode(in_file, out_file, name=None, mode=None):
"""Uuencode file"""
#
# If in_file is a pathname open it and change defaults
#
opened_files = []
try:
if in_file == '-':
in_file = sys.stdin
elif isinstance(in_file, basestring):
if name is None:
name = os.path.basename(in_file)
if mode is None:
try:
mode = os.stat(in_file).st_mode
except AttributeError:
pass
in_file = open(in_file, 'rb')
opened_files.append(in_file)
#
# Open out_file if it is a pathname
#
if out_file == '-':
out_file = sys.stdout
elif isinstance(out_file, basestring):
out_file = open(out_file, 'wb')
opened_files.append(out_file)
#
# Set defaults for name and mode
#
if name is None:
name = '-'
if mode is None:
mode = 0666
#
# Remove newline chars from name
#
name = name.replace('\n','\\n')
name = name.replace('\r','\\r')
#
# Write the data
#
out_file.write('begin %o %s\n' % ((mode&0777),name))
data = in_file.read(45)
while len(data) > 0:
out_file.write(binascii.b2a_uu(data))
data = in_file.read(45)
out_file.write(' \nend\n')
finally:
for f in opened_files:
f.close()
def decode(in_file, out_file=None, mode=None, quiet=0):
"""Decode uuencoded file"""
#
# Open the input file, if needed.
#
opened_files = []
if in_file == '-':
in_file = sys.stdin
elif isinstance(in_file, basestring):
in_file = open(in_file)
opened_files.append(in_file)
try:
#
# Read until a begin is encountered or we've exhausted the file
#
while True:
hdr = in_file.readline()
if not hdr:
raise Error('No valid begin line found in input file')
if not hdr.startswith('begin'):
continue
hdrfields = hdr.split(' ', 2)
if len(hdrfields) == 3 and hdrfields[0] == 'begin':
try:
int(hdrfields[1], 8)
break
except ValueError:
pass
if out_file is None:
out_file = hdrfields[2].rstrip()
if os.path.exists(out_file):
raise Error('Cannot overwrite existing file: %s' % out_file)
if mode is None:
mode = int(hdrfields[1], 8)
#
# Open the output file
#
if out_file == '-':
out_file = sys.stdout
elif isinstance(out_file, basestring):
fp = open(out_file, 'wb')
try:
                os.chmod(out_file, mode)
except AttributeError:
pass
out_file = fp
opened_files.append(out_file)
#
# Main decoding loop
#
s = in_file.readline()
while s and s.strip() != 'end':
try:
data = binascii.a2b_uu(s)
except binascii.Error, v:
# Workaround for broken uuencoders by /Fredrik Lundh
nbytes = (((ord(s[0])-32) & 63) * 4 + 5) // 3
data = binascii.a2b_uu(s[:nbytes])
if not quiet:
sys.stderr.write("Warning: %s\n" % v)
out_file.write(data)
s = in_file.readline()
if not s:
raise Error('Truncated input file')
finally:
for f in opened_files:
f.close()
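# Round-trip sketch for encode()/decode() (file names are illustrative and the
# output paths must not already exist):
#   >>> encode('input.bin', 'encoded.uu')
#   >>> decode('encoded.uu', 'copy.bin')
#   >>> open('input.bin', 'rb').read() == open('copy.bin', 'rb').read()
#   True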
def test():
"""uuencode/uudecode main program"""
import optparse
parser = optparse.OptionParser(usage='usage: %prog [-d] [-t] [input [output]]')
parser.add_option('-d', '--decode', dest='decode', help='Decode (instead of encode)?', default=False, action='store_true')
parser.add_option('-t', '--text', dest='text', help='data is text, encoded format unix-compatible text?', default=False, action='store_true')
(options, args) = parser.parse_args()
if len(args) > 2:
parser.error('incorrect number of arguments')
sys.exit(1)
input = sys.stdin
output = sys.stdout
if len(args) > 0:
input = args[0]
if len(args) > 1:
output = args[1]
if options.decode:
if options.text:
if isinstance(output, basestring):
output = open(output, 'w')
else:
print sys.argv[0], ': cannot do -t to stdout'
sys.exit(1)
decode(input, output)
else:
if options.text:
if isinstance(input, basestring):
input = open(input, 'r')
else:
print sys.argv[0], ': cannot do -t from stdin'
sys.exit(1)
encode(input, output)
if __name__ == '__main__':
test()
r"""UUID objects (universally unique identifiers) according to RFC 4122.
This module provides immutable UUID objects (class UUID) and the functions
uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5
UUIDs as specified in RFC 4122.
If all you want is a unique ID, you should probably call uuid1() or uuid4().
Note that uuid1() may compromise privacy since it creates a UUID containing
the computer's network address. uuid4() creates a random UUID.
Typical usage:
>>> import uuid
# make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
# make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')
# make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')
# make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
# make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')
# convert a UUID to a string of hex digits in standard form
>>> str(x)
'00010203-0405-0607-0809-0a0b0c0d0e0f'
# get the raw 16 bytes of the UUID
>>> x.bytes
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
# make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')
"""
import os
__author__ = 'Ka-Ping Yee <[email protected]>'
RESERVED_NCS, RFC_4122, RESERVED_MICROSOFT, RESERVED_FUTURE = [
'reserved for NCS compatibility', 'specified in RFC 4122',
'reserved for Microsoft compatibility', 'reserved for future definition']
class UUID(object):
"""Instances of the UUID class represent UUIDs as specified in RFC 4122.
UUID objects are immutable, hashable, and usable as dictionary keys.
Converting a UUID to a string with str() yields something in the form
'12345678-1234-1234-1234-123456789abc'. The UUID constructor accepts
five possible forms: a similar string of hexadecimal digits, or a tuple
of six integer fields (with 32-bit, 16-bit, 16-bit, 8-bit, 8-bit, and
48-bit values respectively) as an argument named 'fields', or a string
of 16 bytes (with all the integer fields in big-endian order) as an
argument named 'bytes', or a string of 16 bytes (with the first three
fields in little-endian order) as an argument named 'bytes_le', or a
single 128-bit integer as an argument named 'int'.
UUIDs have these read-only attributes:
bytes the UUID as a 16-byte string (containing the six
integer fields in big-endian byte order)
bytes_le the UUID as a 16-byte string (with time_low, time_mid,
and time_hi_version in little-endian byte order)
fields a tuple of the six integer fields of the UUID,
which are also available as six individual attributes
and two derived attributes:
time_low the first 32 bits of the UUID
time_mid the next 16 bits of the UUID
time_hi_version the next 16 bits of the UUID
clock_seq_hi_variant the next 8 bits of the UUID
clock_seq_low the next 8 bits of the UUID
node the last 48 bits of the UUID
time the 60-bit timestamp
clock_seq the 14-bit sequence number
hex the UUID as a 32-character hexadecimal string
int the UUID as a 128-bit integer
urn the UUID as a URN as specified in RFC 4122
variant the UUID variant (one of the constants RESERVED_NCS,
RFC_4122, RESERVED_MICROSOFT, or RESERVED_FUTURE)
version the UUID version number (1 through 5, meaningful only
when the variant is RFC_4122)
"""
def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
int=None, version=None):
r"""Create a UUID from either a string of 32 hexadecimal digits,
a string of 16 bytes as the 'bytes' argument, a string of 16 bytes
in little-endian order as the 'bytes_le' argument, a tuple of six
integers (32-bit time_low, 16-bit time_mid, 16-bit time_hi_version,
8-bit clock_seq_hi_variant, 8-bit clock_seq_low, 48-bit node) as
the 'fields' argument, or a single 128-bit integer as the 'int'
argument. When a string of hex digits is given, curly braces,
hyphens, and a URN prefix are all optional. For example, these
expressions all yield the same UUID:
UUID('{12345678-1234-5678-1234-567812345678}')
UUID('12345678123456781234567812345678')
UUID('urn:uuid:12345678-1234-5678-1234-567812345678')
UUID(bytes='\x12\x34\x56\x78'*4)
UUID(bytes_le='\x78\x56\x34\x12\x34\x12\x78\x56' +
'\x12\x34\x56\x78\x12\x34\x56\x78')
UUID(fields=(0x12345678, 0x1234, 0x5678, 0x12, 0x34, 0x567812345678))
UUID(int=0x12345678123456781234567812345678)
Exactly one of 'hex', 'bytes', 'bytes_le', 'fields', or 'int' must
be given. The 'version' argument is optional; if given, the resulting
UUID will have its variant and version set according to RFC 4122,
overriding the given 'hex', 'bytes', 'bytes_le', 'fields', or 'int'.
"""
if [hex, bytes, bytes_le, fields, int].count(None) != 4:
raise TypeError('need one of hex, bytes, bytes_le, fields, or int')
if hex is not None:
hex = hex.replace('urn:', '').replace('uuid:', '')
hex = hex.strip('{}').replace('-', '')
if len(hex) != 32:
raise ValueError('badly formed hexadecimal UUID string')
int = long(hex, 16)
if bytes_le is not None:
if len(bytes_le) != 16:
raise ValueError('bytes_le is not a 16-char string')
bytes = (bytes_le[3] + bytes_le[2] + bytes_le[1] + bytes_le[0] +
bytes_le[5] + bytes_le[4] + bytes_le[7] + bytes_le[6] +
bytes_le[8:])
if bytes is not None:
if len(bytes) != 16:
raise ValueError('bytes is not a 16-char string')
int = long(('%02x'*16) % tuple(map(ord, bytes)), 16)
if fields is not None:
if len(fields) != 6:
raise ValueError('fields is not a 6-tuple')
(time_low, time_mid, time_hi_version,
clock_seq_hi_variant, clock_seq_low, node) = fields
if not 0 <= time_low < 1<<32L:
raise ValueError('field 1 out of range (need a 32-bit value)')
if not 0 <= time_mid < 1<<16L:
raise ValueError('field 2 out of range (need a 16-bit value)')
if not 0 <= time_hi_version < 1<<16L:
raise ValueError('field 3 out of range (need a 16-bit value)')
if not 0 <= clock_seq_hi_variant < 1<<8L:
raise ValueError('field 4 out of range (need an 8-bit value)')
if not 0 <= clock_seq_low < 1<<8L:
raise ValueError('field 5 out of range (need an 8-bit value)')
if not 0 <= node < 1<<48L:
raise ValueError('field 6 out of range (need a 48-bit value)')
clock_seq = (clock_seq_hi_variant << 8L) | clock_seq_low
int = ((time_low << 96L) | (time_mid << 80L) |
(time_hi_version << 64L) | (clock_seq << 48L) | node)
if int is not None:
if not 0 <= int < 1<<128L:
raise ValueError('int is out of range (need a 128-bit value)')
if version is not None:
if not 1 <= version <= 5:
raise ValueError('illegal version number')
# Set the variant to RFC 4122.
int &= ~(0xc000 << 48L)
int |= 0x8000 << 48L
# Set the version number.
int &= ~(0xf000 << 64L)
int |= version << 76L
self.__dict__['int'] = int
def __cmp__(self, other):
if isinstance(other, UUID):
return cmp(self.int, other.int)
return NotImplemented
def __hash__(self):
return hash(self.int)
def __int__(self):
return self.int
def __repr__(self):
return 'UUID(%r)' % str(self)
def __setattr__(self, name, value):
raise TypeError('UUID objects are immutable')
def __str__(self):
hex = '%032x' % self.int
return '%s-%s-%s-%s-%s' % (
hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
def get_bytes(self):
bytes = ''
for shift in range(0, 128, 8):
bytes = chr((self.int >> shift) & 0xff) + bytes
return bytes
bytes = property(get_bytes)
def get_bytes_le(self):
bytes = self.bytes
return (bytes[3] + bytes[2] + bytes[1] + bytes[0] +
bytes[5] + bytes[4] + bytes[7] + bytes[6] + bytes[8:])
bytes_le = property(get_bytes_le)
def get_fields(self):
return (self.time_low, self.time_mid, self.time_hi_version,
self.clock_seq_hi_variant, self.clock_seq_low, self.node)
fields = property(get_fields)
def get_time_low(self):
return self.int >> 96L
time_low = property(get_time_low)
def get_time_mid(self):
return (self.int >> 80L) & 0xffff
time_mid = property(get_time_mid)
def get_time_hi_version(self):
return (self.int >> 64L) & 0xffff
time_hi_version = property(get_time_hi_version)
def get_clock_seq_hi_variant(self):
return (self.int >> 56L) & 0xff
clock_seq_hi_variant = property(get_clock_seq_hi_variant)
def get_clock_seq_low(self):
return (self.int >> 48L) & 0xff
clock_seq_low = property(get_clock_seq_low)
def get_time(self):
return (((self.time_hi_version & 0x0fffL) << 48L) |
(self.time_mid << 32L) | self.time_low)
time = property(get_time)
def get_clock_seq(self):
return (((self.clock_seq_hi_variant & 0x3fL) << 8L) |
self.clock_seq_low)
clock_seq = property(get_clock_seq)
def get_node(self):
return self.int & 0xffffffffffff
node = property(get_node)
def get_hex(self):
return '%032x' % self.int
hex = property(get_hex)
def get_urn(self):
return 'urn:uuid:' + str(self)
urn = property(get_urn)
def get_variant(self):
if not self.int & (0x8000 << 48L):
return RESERVED_NCS
elif not self.int & (0x4000 << 48L):
return RFC_4122
elif not self.int & (0x2000 << 48L):
return RESERVED_MICROSOFT
else:
return RESERVED_FUTURE
variant = property(get_variant)
def get_version(self):
# The version bits are only meaningful for RFC 4122 UUIDs.
if self.variant == RFC_4122:
return int((self.int >> 76L) & 0xf)
version = property(get_version)
def _popen(command, args):
import os
path = os.environ.get("PATH", os.defpath).split(os.pathsep)
path.extend(('/sbin', '/usr/sbin'))
for dir in path:
executable = os.path.join(dir, command)
if (os.path.exists(executable) and
os.access(executable, os.F_OK | os.X_OK) and
not os.path.isdir(executable)):
break
else:
return None
# LC_ALL to ensure English output, 2>/dev/null to prevent output on
# stderr (Note: we don't have an example where the words we search for
# are actually localized, but in theory some system could do so.)
cmd = 'LC_ALL=C %s %s 2>/dev/null' % (executable, args)
return os.popen(cmd)
def _find_mac(command, args, hw_identifiers, get_index):
try:
pipe = _popen(command, args)
if not pipe:
return
with pipe:
for line in pipe:
words = line.lower().rstrip().split()
for i in range(len(words)):
if words[i] in hw_identifiers:
try:
word = words[get_index(i)]
mac = int(word.replace(':', ''), 16)
if mac:
return mac
except (ValueError, IndexError):
# Virtual interfaces, such as those provided by
# VPNs, do not have a colon-delimited MAC address
# as expected, but a 16-byte HWAddr separated by
# dashes. These should be ignored in favor of a
# real MAC address
pass
except IOError:
pass
except OSError:
# this is because we get a WindowsError on IronPython
pass
def _ifconfig_getnode():
"""Get the hardware address on Unix by running ifconfig."""
# This works on Linux ('' or '-a'), Tru64 ('-av'), but not all Unixes.
keywords = ('hwaddr', 'ether', 'address:', 'lladdr')
for args in ('', '-a', '-av'):
mac = _find_mac('ifconfig', args, keywords, lambda i: i+1)
if mac:
return mac
def _arp_getnode():
"""Get the hardware address on Unix by running arp."""
import os, socket
try:
ip_addr = socket.gethostbyname(socket.gethostname())
except EnvironmentError:
return None
# Try getting the MAC addr from arp based on our IP address (Solaris).
mac = _find_mac('arp', '-an', [ip_addr], lambda i: -1)
if mac:
return mac
# This works on OpenBSD
mac = _find_mac('arp', '-an', [ip_addr], lambda i: i+1)
if mac:
return mac
# This works on Linux, FreeBSD and NetBSD
mac = _find_mac('arp', '-an', ['(%s)' % ip_addr],
lambda i: i+2)
if mac:
return mac
def _lanscan_getnode():
"""Get the hardware address on Unix by running lanscan."""
# This might work on HP-UX.
return _find_mac('lanscan', '-ai', ['lan0'], lambda i: 0)
def _netstat_getnode():
"""Get the hardware address on Unix by running netstat."""
# This might work on AIX, Tru64 UNIX and presumably on IRIX.
try:
pipe = _popen('netstat', '-ia')
if not pipe:
return
with pipe:
words = pipe.readline().rstrip().split()
try:
i = words.index('Address')
except ValueError:
return
for line in pipe:
try:
words = line.rstrip().split()
word = words[i]
if len(word) == 17 and word.count(':') == 5:
mac = int(word.replace(':', ''), 16)
if mac:
return mac
except (ValueError, IndexError):
pass
except OSError:
pass
def _ipconfig_getnode():
"""Get the hardware address on Windows by running ipconfig.exe."""
import os, re
dirs = ['', r'c:\windows\system32', r'c:\winnt\system32']
try:
import ctypes
buffer = ctypes.create_string_buffer(300)
ctypes.windll.kernel32.GetSystemDirectoryA(buffer, 300)
dirs.insert(0, buffer.value.decode('mbcs'))
except:
pass
for dir in dirs:
try:
pipe = os.popen(os.path.join(dir, 'ipconfig') + ' /all')
except IOError:
continue
with pipe:
for line in pipe:
value = line.split(':')[-1].strip().lower()
if re.match('(?:[0-9a-f][0-9a-f]-){5}[0-9a-f][0-9a-f]$', value):
return int(value.replace('-', ''), 16)
def _netbios_getnode():
"""Get the hardware address on Windows using NetBIOS calls.
See http://support.microsoft.com/kb/118623 for details."""
import win32wnet, netbios
ncb = netbios.NCB()
ncb.Command = netbios.NCBENUM
ncb.Buffer = adapters = netbios.LANA_ENUM()
adapters._pack()
if win32wnet.Netbios(ncb) != 0:
return
adapters._unpack()
for i in range(adapters.length):
ncb.Reset()
ncb.Command = netbios.NCBRESET
ncb.Lana_num = ord(adapters.lana[i])
if win32wnet.Netbios(ncb) != 0:
continue
ncb.Reset()
ncb.Command = netbios.NCBASTAT
ncb.Lana_num = ord(adapters.lana[i])
ncb.Callname = '*'.ljust(16)
ncb.Buffer = status = netbios.ADAPTER_STATUS()
if win32wnet.Netbios(ncb) != 0:
continue
status._unpack()
bytes = map(ord, status.adapter_address)
return ((bytes[0]<<40L) + (bytes[1]<<32L) + (bytes[2]<<24L) +
(bytes[3]<<16L) + (bytes[4]<<8L) + bytes[5])
# Thanks to Thomas Heller for ctypes and for his help with its use here.
# If ctypes is available, use it to find system routines for UUID generation.
_uuid_generate_time = _UuidCreate = None
try:
import ctypes, ctypes.util
import sys, os
# The uuid_generate_* routines are provided by libuuid on at least
# Linux and FreeBSD, and provided by libc on Mac OS X.
_libnames = ['uuid']
if not sys.platform.startswith('win') and os.name != 'nt':
_libnames.append('c')
for libname in _libnames:
try:
lib = ctypes.CDLL(ctypes.util.find_library(libname))
except:
continue
if hasattr(lib, 'uuid_generate_time'):
_uuid_generate_time = lib.uuid_generate_time
break
del _libnames
# The uuid_generate_* functions are broken on MacOS X 10.5, as noted
# in issue #8621 the function generates the same sequence of values
# in the parent process and all children created using fork (unless
# those children use exec as well).
#
# Assume that the uuid_generate functions are broken from 10.5 onward,
# the test can be adjusted when a later version is fixed.
if sys.platform == 'darwin':
import os
if int(os.uname()[2].split('.')[0]) >= 9:
_uuid_generate_time = None
# On Windows prior to 2000, UuidCreate gives a UUID containing the
# hardware address. On Windows 2000 and later, UuidCreate makes a
# random UUID and UuidCreateSequential gives a UUID containing the
# hardware address. These routines are provided by the RPC runtime.
# NOTE: at least on Tim's WinXP Pro SP2 desktop box, while the last
# 6 bytes returned by UuidCreateSequential are fixed, they don't appear
# to bear any relationship to the MAC address of any network device
# on the box.
try:
lib = ctypes.windll.rpcrt4
except:
lib = None
_UuidCreate = getattr(lib, 'UuidCreateSequential',
getattr(lib, 'UuidCreate', None))
except:
pass
def _unixdll_getnode():
"""Get the hardware address on Unix using ctypes."""
_buffer = ctypes.create_string_buffer(16)
_uuid_generate_time(_buffer)
return UUID(bytes=_buffer.raw).node
def _windll_getnode():
"""Get the hardware address on Windows using ctypes."""
_buffer = ctypes.create_string_buffer(16)
if _UuidCreate(_buffer) == 0:
return UUID(bytes=_buffer.raw).node
def _random_getnode():
"""Get a random node ID, with eighth bit set as suggested by RFC 4122."""
import random
return random.randrange(0, 1<<48L) | 0x010000000000L
_node = None
_NODE_GETTERS_WIN32 = [_windll_getnode, _netbios_getnode, _ipconfig_getnode]
_NODE_GETTERS_UNIX = [_unixdll_getnode, _ifconfig_getnode, _arp_getnode,
_lanscan_getnode, _netstat_getnode]
def getnode():
"""Get the hardware address as a 48-bit positive integer.
The first time this runs, it may launch a separate program, which could
be quite slow. If all attempts to obtain the hardware address fail, we
choose a random 48-bit number with its eighth bit set to 1 as recommended
in RFC 4122.
"""
global _node
if _node is not None:
return _node
import sys
if sys.platform == 'win32':
getters = _NODE_GETTERS_WIN32
else:
getters = _NODE_GETTERS_UNIX
for getter in getters + [_random_getnode]:
try:
_node = getter()
except:
continue
if (_node is not None) and (0 <= _node < (1 << 48)):
return _node
assert False, '_random_getnode() returned invalid value: {}'.format(_node)
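# Example (a sketch): getnode() caches its result, so only the first call may
# be slow, and the value always fits in 48 bits.
#   >>> node = getnode()
#   >>> node == getnode(), 0 <= node < (1 << 48)
#   (True, True)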
_last_timestamp = None
def uuid1(node=None, clock_seq=None):
"""Generate a UUID from a host ID, sequence number, and the current time.
If 'node' is not given, getnode() is used to obtain the hardware
address. If 'clock_seq' is given, it is used as the sequence number;
otherwise a random 14-bit sequence number is chosen."""
# When the system provides a version-1 UUID generator, use it (but don't
# use UuidCreate here because its UUIDs don't conform to RFC 4122).
if _uuid_generate_time and node is clock_seq is None:
_buffer = ctypes.create_string_buffer(16)
_uuid_generate_time(_buffer)
return UUID(bytes=_buffer.raw)
global _last_timestamp
import time
nanoseconds = int(time.time() * 1e9)
# 0x01b21dd213814000 is the number of 100-ns intervals between the
# UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
timestamp = int(nanoseconds//100) + 0x01b21dd213814000L
if _last_timestamp is not None and timestamp <= _last_timestamp:
timestamp = _last_timestamp + 1
_last_timestamp = timestamp
if clock_seq is None:
import random
clock_seq = random.randrange(1<<14L) # instead of stable storage
time_low = timestamp & 0xffffffffL
time_mid = (timestamp >> 32L) & 0xffffL
time_hi_version = (timestamp >> 48L) & 0x0fffL
clock_seq_low = clock_seq & 0xffL
clock_seq_hi_variant = (clock_seq >> 8L) & 0x3fL
if node is None:
node = getnode()
return UUID(fields=(time_low, time_mid, time_hi_version,
clock_seq_hi_variant, clock_seq_low, node), version=1)
def uuid3(namespace, name):
"""Generate a UUID from the MD5 hash of a namespace UUID and a name."""
from hashlib import md5
hash = md5(namespace.bytes + name).digest()
return UUID(bytes=hash[:16], version=3)
def uuid4():
"""Generate a random UUID."""
return UUID(bytes=os.urandom(16), version=4)
def uuid5(namespace, name):
"""Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""
from hashlib import sha1
hash = sha1(namespace.bytes + name).digest()
return UUID(bytes=hash[:16], version=5)
# The following standard UUIDs are for use with uuid3() or uuid5().
NAMESPACE_DNS = UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_URL = UUID('6ba7b811-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_OID = UUID('6ba7b812-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_X500 = UUID('6ba7b814-9dad-11d1-80b4-00c04fd430c8')
"""Python part of the warnings subsystem."""
# Note: function level imports should *not* be used
# in this module as it may cause import lock deadlock.
# See bug 683658.
import linecache
import sys
import types
__all__ = ["warn", "warn_explicit", "showwarning",
"formatwarning", "filterwarnings", "simplefilter",
"resetwarnings", "catch_warnings"]
def warnpy3k(message, category=None, stacklevel=1):
"""Issue a deprecation warning for Python 3.x related changes.
Warnings are omitted unless Python is started with the -3 option.
"""
if sys.py3kwarning:
if category is None:
category = DeprecationWarning
warn(message, category, stacklevel+1)
def _show_warning(message, category, filename, lineno, file=None, line=None):
"""Hook to write a warning to a file; replace if you like."""
if file is None:
file = sys.stderr
if file is None:
# sys.stderr is None - warnings get lost
return
try:
file.write(formatwarning(message, category, filename, lineno, line))
except (IOError, UnicodeError):
pass # the file (probably stderr) is invalid - this warning gets lost.
# Keep a working version around in case the deprecation of the old API is
# triggered.
showwarning = _show_warning
def formatwarning(message, category, filename, lineno, line=None):
"""Function to format a warning the standard way."""
try:
unicodetype = unicode
except NameError:
unicodetype = ()
try:
message = str(message)
except UnicodeEncodeError:
pass
s = "%s: %s: %s\n" % (lineno, category.__name__, message)
line = linecache.getline(filename, lineno) if line is None else line
if line:
line = line.strip()
if isinstance(s, unicodetype) and isinstance(line, str):
line = unicode(line, 'latin1')
s += " %s\n" % line
if isinstance(s, unicodetype) and isinstance(filename, str):
enc = sys.getfilesystemencoding()
if enc:
try:
filename = unicode(filename, enc)
except UnicodeDecodeError:
pass
s = "%s:%s" % (filename, s)
return s
def filterwarnings(action, message="", category=Warning, module="", lineno=0,
append=0):
"""Insert an entry into the list of warnings filters (at the front).
'action' -- one of "error", "ignore", "always", "default", "module",
or "once"
'message' -- a regex that the warning message must match
'category' -- a class that the warning must be a subclass of
'module' -- a regex that the module name must match
'lineno' -- an integer line number, 0 matches all warnings
'append' -- if true, append to the list of filters
"""
import re
assert action in ("error", "ignore", "always", "default", "module",
"once"), "invalid action: %r" % (action,)
assert isinstance(message, basestring), "message must be a string"
assert isinstance(category, (type, types.ClassType)), \
"category must be a class"
assert issubclass(category, Warning), "category must be a Warning subclass"
assert isinstance(module, basestring), "module must be a string"
assert isinstance(lineno, (int, long)) and lineno >= 0, \
"lineno must be an int >= 0"
item = (action, re.compile(message, re.I), category,
re.compile(module), int(lineno))
if append:
filters.append(item)
else:
filters.insert(0, item)
def simplefilter(action, category=Warning, lineno=0, append=0):
"""Insert a simple entry into the list of warnings filters (at the front).
A simple filter matches all modules and messages.
'action' -- one of "error", "ignore", "always", "default", "module",
or "once"
'category' -- a class that the warning must be a subclass of
'lineno' -- an integer line number, 0 matches all warnings
'append' -- if true, append to the list of filters
"""
assert action in ("error", "ignore", "always", "default", "module",
"once"), "invalid action: %r" % (action,)
assert isinstance(lineno, (int, long)) and lineno >= 0, \
"lineno must be an int >= 0"
item = (action, None, category, None, int(lineno))
if append:
filters.append(item)
else:
filters.insert(0, item)
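# Filter usage sketch: entries are consulted front to back, so the most recent
# insert wins. Turning a category into an error:
#   >>> simplefilter("error", DeprecationWarning)
#   >>> warn("old API", DeprecationWarning)    # now raises DeprecationWarning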
def resetwarnings():
"""Clear the list of warning filters, so that no filters are active."""
filters[:] = []
class _OptionError(Exception):
"""Exception used by option processing helpers."""
pass
# Helper to process -W options passed via sys.warnoptions
def _processoptions(args):
for arg in args:
try:
_setoption(arg)
except _OptionError, msg:
print >>sys.stderr, "Invalid -W option ignored:", msg
# Helper for _processoptions()
def _setoption(arg):
import re
parts = arg.split(':')
if len(parts) > 5:
raise _OptionError("too many fields (max 5): %r" % (arg,))
while len(parts) < 5:
parts.append('')
action, message, category, module, lineno = [s.strip()
for s in parts]
action = _getaction(action)
message = re.escape(message)
category = _getcategory(category)
module = re.escape(module)
if module:
module = module + '$'
if lineno:
try:
lineno = int(lineno)
if lineno < 0:
raise ValueError
except (ValueError, OverflowError):
raise _OptionError("invalid lineno %r" % (lineno,))
else:
lineno = 0
filterwarnings(action, message, category, module, lineno)
# Helper for _setoption()
def _getaction(action):
if not action:
return "default"
if action == "all": return "always" # Alias
for a in ('default', 'always', 'ignore', 'module', 'once', 'error'):
if a.startswith(action):
return a
raise _OptionError("invalid action: %r" % (action,))
# Helper for _setoption()
def _getcategory(category):
import re
if not category:
return Warning
if re.match("^[a-zA-Z0-9_]+$", category):
try:
cat = eval(category)
except NameError:
raise _OptionError("unknown warning category: %r" % (category,))
else:
i = category.rfind(".")
module = category[:i]
klass = category[i+1:]
try:
m = __import__(module, None, None, [klass])
except ImportError:
raise _OptionError("invalid module name: %r" % (module,))
try:
cat = getattr(m, klass)
except AttributeError:
raise _OptionError("unknown warning category: %r" % (category,))
if not issubclass(cat, Warning):
raise _OptionError("invalid warning category: %r" % (category,))
return cat
# Code typically replaced by _warnings
def warn(message, category=None, stacklevel=1):
"""Issue a warning, or maybe ignore it or raise an exception."""
# Check if message is already a Warning object
if isinstance(message, Warning):
category = message.__class__
# Check category argument
if category is None:
category = UserWarning
assert issubclass(category, Warning)
# Get context information
try:
caller = sys._getframe(stacklevel)
except ValueError:
globals = sys.__dict__
lineno = 1
else:
globals = caller.f_globals
lineno = caller.f_lineno
if '__name__' in globals:
module = globals['__name__']
else:
module = "<string>"
filename = globals.get('__file__')
if filename:
fnl = filename.lower()
if fnl.endswith((".pyc", ".pyo")):
filename = filename[:-1]
else:
if module == "__main__":
try:
filename = sys.argv[0]
except AttributeError:
# embedded interpreters don't have sys.argv, see bug #839151
filename = '__main__'
if not filename:
filename = module
registry = globals.setdefault("__warningregistry__", {})
warn_explicit(message, category, filename, lineno, module, registry,
globals)
def warn_explicit(message, category, filename, lineno,
module=None, registry=None, module_globals=None):
lineno = int(lineno)
if module is None:
module = filename or "<unknown>"
if module[-3:].lower() == ".py":
module = module[:-3] # XXX What about leading pathname?
if registry is None:
registry = {}
if isinstance(message, Warning):
text = str(message)
category = message.__class__
else:
text = message
message = category(message)
key = (text, category, lineno)
# Quick test for common case
if registry.get(key):
return
# Search the filters
for item in filters:
action, msg, cat, mod, ln = item
if ((msg is None or msg.match(text)) and
issubclass(category, cat) and
(mod is None or mod.match(module)) and
(ln == 0 or lineno == ln)):
break
else:
action = defaultaction
# Early exit actions
if action == "ignore":
registry[key] = 1
return
# Prime the linecache for formatting, in case the
# "file" is actually in a zipfile or something.
linecache.getlines(filename, module_globals)
if action == "error":
raise message
# Other actions
if action == "once":
registry[key] = 1
oncekey = (text, category)
if onceregistry.get(oncekey):
return
onceregistry[oncekey] = 1
elif action == "always":
pass
elif action == "module":
registry[key] = 1
altkey = (text, category, 0)
if registry.get(altkey):
return
registry[altkey] = 1
elif action == "default":
registry[key] = 1
else:
# Unrecognized actions are errors
raise RuntimeError(
"Unrecognized action (%r) in warnings.filters:\n %s" %
(action, item))
# Print message and context
showwarning(message, category, filename, lineno)
class WarningMessage(object):
"""Holds the result of a single showwarning() call."""
_WARNING_DETAILS = ("message", "category", "filename", "lineno", "file",
"line")
def __init__(self, message, category, filename, lineno, file=None,
line=None):
self.message = message
self.category = category
self.filename = filename
self.lineno = lineno
self.file = file
self.line = line
self._category_name = category.__name__ if category else None
def __str__(self):
return ("{message : %r, category : %r, filename : %r, lineno : %s, "
"line : %r}" % (self.message, self._category_name,
self.filename, self.lineno, self.line))
class catch_warnings(object):
"""A context manager that copies and restores the warnings filter upon
exiting the context.
The 'record' argument specifies whether warnings should be captured by a
custom implementation of warnings.showwarning() and be appended to a list
returned by the context manager. Otherwise None is returned by the context
    manager. The objects appended to the list are WarningMessage instances
    whose attributes mirror the arguments to showwarning().
The 'module' argument is to specify an alternative module to the module
named 'warnings' and imported under that name. This argument is only useful
when testing the warnings module itself.
"""
def __init__(self, record=False, module=None):
"""Specify whether to record warnings and if an alternative module
should be used other than sys.modules['warnings'].
For compatibility with Python 3.0, please consider all arguments to be
keyword-only.
"""
self._record = record
self._module = sys.modules['warnings'] if module is None else module
self._entered = False
def __repr__(self):
args = []
if self._record:
args.append("record=True")
if self._module is not sys.modules['warnings']:
args.append("module=%r" % self._module)
name = type(self).__name__
return "%s(%s)" % (name, ", ".join(args))
def __enter__(self):
if self._entered:
raise RuntimeError("Cannot enter %r twice" % self)
self._entered = True
self._filters = self._module.filters
self._module.filters = self._filters[:]
self._showwarning = self._module.showwarning
if self._record:
log = []
def showwarning(*args, **kwargs):
log.append(WarningMessage(*args, **kwargs))
self._module.showwarning = showwarning
return log
else:
return None
def __exit__(self, *exc_info):
if not self._entered:
raise RuntimeError("Cannot exit %r without entering first" % self)
self._module.filters = self._filters
self._module.showwarning = self._showwarning
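# Editor's sketch (not part of the module, never called): the common testing
# idiom for catch_warnings -- record warnings instead of printing them, with
# the filter state restored on exit.
def _catch_warnings_example():
    with catch_warnings(record=True) as log:
        simplefilter("always")          # make sure nothing is suppressed
        warn("spam is deprecated", DeprecationWarning)
    # log holds WarningMessage objects; the original filters are back now
    return [(w.category.__name__, str(w.message)) for w in log]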
# filters contains a sequence of filter 5-tuples
# The components of the 5-tuple are:
# - an action: error, ignore, always, default, module, or once
# - a compiled regex that must match the warning message
# - a class representing the warning category
# - a compiled regex that must match the module that is being warned
# - a line number for the line being warned about, or 0 to mean any line
# If either of the compiled regexes is None, it matches anything.
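# Editor's sketch (not part of the module, never called): the shape of one
# compiled entry in 'filters', matching the 5-tuple layout described above.
def _filter_entry_example():
    import re
    return ("ignore",                    # action
            re.compile("^deprecated"),   # message regex (None matches any)
            DeprecationWarning,          # warning category
            re.compile("mymodule"),      # module regex (None matches any)
            0)                           # line number; 0 means any line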
_warnings_defaults = False
try:
from _warnings import (filters, default_action, once_registry,
warn, warn_explicit)
defaultaction = default_action
onceregistry = once_registry
_warnings_defaults = True
except ImportError:
filters = []
defaultaction = "default"
onceregistry = {}
# Module initialization
_processoptions(sys.warnoptions)
if not _warnings_defaults:
silence = [ImportWarning, PendingDeprecationWarning]
# Don't silence DeprecationWarning if -3 or -Q was used.
if not sys.py3kwarning and not sys.flags.division_warning:
silence.append(DeprecationWarning)
for cls in silence:
simplefilter("ignore", category=cls)
bytes_warning = sys.flags.bytes_warning
if bytes_warning > 1:
bytes_action = "error"
elif bytes_warning:
bytes_action = "default"
else:
bytes_action = "ignore"
simplefilter(bytes_action, category=BytesWarning, append=1)
del _warnings_defaults
"""Stuff to parse WAVE files.
Usage:
Reading WAVE files:
f = wave.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
When the setpos() and rewind() methods are not used, the seek()
method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' for linear samples)
getcompname() -- returns human-readable version of
                compression type ('not compressed' for linear samples)
getparams() -- returns a tuple consisting of all of the
above in the above order
getmarkers() -- returns None (for compatibility with the
aifc module)
getmark(id) -- raises an error since the mark does not
exist (for compatibility with the aifc module)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell() and the position given to setpos()
are compatible and have nothing to do with the actual position in the
file.
The close() method is called automatically when the class instance
is destroyed.
Writing WAVE files:
f = wave.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)
-- set all parameters at once
tell() -- return current position in output file
writeframesraw(data)
                -- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, except possibly the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes('') or
close() to patch up the sizes in the header.
The close() method is called automatically when the class instance
is destroyed.
"""
import __builtin__
__all__ = ["open", "openfp", "Error"]
class Error(Exception):
pass
WAVE_FORMAT_PCM = 0x0001
_array_fmts = None, 'b', 'h', None, 'i'
import struct
import sys
from chunk import Chunk
def _byteswap3(data):
    # Reverse the byte order of each 3-byte (24-bit) sample.
    ba = bytearray(data)
    ba[::3] = data[2::3]
    ba[2::3] = data[::3]
    return bytes(ba)
class Wave_read:
"""Variables used in this class:
    These variables are available to the user through appropriate
methods of this class:
_file -- the open file with methods read(), close(), and seek()
set through the __init__() method
_nchannels -- the number of audio channels
available through the getnchannels() method
_nframes -- the number of audio frames
available through the getnframes() method
_sampwidth -- the number of bytes per audio sample
available through the getsampwidth() method
_framerate -- the sampling frequency
available through the getframerate() method
_comptype -- the AIFF-C compression type ('NONE' if AIFF)
available through the getcomptype() method
_compname -- the human-readable AIFF-C compression type
                available through the getcompname() method
_soundpos -- the position in the audio stream
available through the tell() method, set through the
setpos() method
These variables are used internally only:
_fmt_chunk_read -- 1 iff the FMT chunk has been read
_data_seek_needed -- 1 iff positioned correctly in audio
file for readframes()
_data_chunk -- instantiation of a chunk class for the DATA chunk
_framesize -- size of one frame in the file
"""
def initfp(self, file):
self._convert = None
self._soundpos = 0
self._file = Chunk(file, bigendian = 0)
if self._file.getname() != 'RIFF':
raise Error, 'file does not start with RIFF id'
if self._file.read(4) != 'WAVE':
raise Error, 'not a WAVE file'
self._fmt_chunk_read = 0
self._data_chunk = None
while 1:
self._data_seek_needed = 1
try:
chunk = Chunk(self._file, bigendian = 0)
except EOFError:
break
chunkname = chunk.getname()
if chunkname == 'fmt ':
self._read_fmt_chunk(chunk)
self._fmt_chunk_read = 1
elif chunkname == 'data':
if not self._fmt_chunk_read:
raise Error, 'data chunk before fmt chunk'
self._data_chunk = chunk
self._nframes = chunk.chunksize // self._framesize
self._data_seek_needed = 0
break
chunk.skip()
if not self._fmt_chunk_read or not self._data_chunk:
raise Error, 'fmt chunk and/or data chunk missing'
def __init__(self, f):
self._i_opened_the_file = None
if isinstance(f, basestring):
f = __builtin__.open(f, 'rb')
self._i_opened_the_file = f
# else, assume it is an open file object already
try:
self.initfp(f)
except:
if self._i_opened_the_file:
f.close()
raise
def __del__(self):
self.close()
#
# User visible methods.
#
def getfp(self):
return self._file
def rewind(self):
self._data_seek_needed = 1
self._soundpos = 0
def close(self):
self._file = None
file = self._i_opened_the_file
if file:
self._i_opened_the_file = None
file.close()
def tell(self):
return self._soundpos
def getnchannels(self):
return self._nchannels
def getnframes(self):
return self._nframes
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
def getparams(self):
return self.getnchannels(), self.getsampwidth(), \
self.getframerate(), self.getnframes(), \
self.getcomptype(), self.getcompname()
def getmarkers(self):
return None
def getmark(self, id):
raise Error, 'no marks'
def setpos(self, pos):
if pos < 0 or pos > self._nframes:
raise Error, 'position not in range'
self._soundpos = pos
self._data_seek_needed = 1
def readframes(self, nframes):
if self._data_seek_needed:
self._data_chunk.seek(0, 0)
pos = self._soundpos * self._framesize
if pos:
self._data_chunk.seek(pos, 0)
self._data_seek_needed = 0
if nframes == 0:
return ''
if self._sampwidth in (2, 4) and sys.byteorder == 'big':
# unfortunately the fromfile() method does not take
# something that only looks like a file object, so
# we have to reach into the innards of the chunk object
import array
chunk = self._data_chunk
data = array.array(_array_fmts[self._sampwidth])
assert data.itemsize == self._sampwidth
nitems = nframes * self._nchannels
if nitems * self._sampwidth > chunk.chunksize - chunk.size_read:
nitems = (chunk.chunksize - chunk.size_read) // self._sampwidth
data.fromfile(chunk.file.file, nitems)
# "tell" data chunk how much was read
chunk.size_read = chunk.size_read + nitems * self._sampwidth
# do the same for the outermost chunk
chunk = chunk.file
chunk.size_read = chunk.size_read + nitems * self._sampwidth
data.byteswap()
data = data.tostring()
else:
data = self._data_chunk.read(nframes * self._framesize)
if self._sampwidth == 3 and sys.byteorder == 'big':
data = _byteswap3(data)
if self._convert and data:
data = self._convert(data)
self._soundpos = self._soundpos + len(data) // (self._nchannels * self._sampwidth)
return data
#
# Internal methods.
#
def _read_fmt_chunk(self, chunk):
wFormatTag, self._nchannels, self._framerate, dwAvgBytesPerSec, wBlockAlign = struct.unpack('<HHLLH', chunk.read(14))
if wFormatTag == WAVE_FORMAT_PCM:
sampwidth = struct.unpack('<H', chunk.read(2))[0]
self._sampwidth = (sampwidth + 7) // 8
else:
raise Error, 'unknown format: %r' % (wFormatTag,)
self._framesize = self._nchannels * self._sampwidth
self._comptype = 'NONE'
self._compname = 'not compressed'
class Wave_write:
"""Variables used in this class:
These variables are user settable through appropriate methods
of this class:
_file -- the open file with methods write(), close(), tell(), seek()
set through the __init__() method
_comptype -- the AIFF-C compression type ('NONE' in AIFF)
set through the setcomptype() or setparams() method
_compname -- the human-readable AIFF-C compression type
set through the setcomptype() or setparams() method
_nchannels -- the number of audio channels
set through the setnchannels() or setparams() method
_sampwidth -- the number of bytes per audio sample
set through the setsampwidth() or setparams() method
_framerate -- the sampling frequency
set through the setframerate() or setparams() method
_nframes -- the number of audio frames written to the header
set through the setnframes() or setparams() method
These variables are used internally only:
_datalength -- the size of the audio samples written to the header
_nframeswritten -- the number of frames actually written
_datawritten -- the size of the audio samples actually written
"""
def __init__(self, f):
self._i_opened_the_file = None
if isinstance(f, basestring):
f = __builtin__.open(f, 'wb')
self._i_opened_the_file = f
try:
self.initfp(f)
except:
if self._i_opened_the_file:
f.close()
raise
def initfp(self, file):
self._file = file
self._convert = None
self._nchannels = 0
self._sampwidth = 0
self._framerate = 0
self._nframes = 0
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._headerwritten = False
def __del__(self):
self.close()
#
# User visible methods.
#
def setnchannels(self, nchannels):
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
if nchannels < 1:
raise Error, 'bad # of channels'
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error, 'number of channels not set'
return self._nchannels
def setsampwidth(self, sampwidth):
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
if sampwidth < 1 or sampwidth > 4:
raise Error, 'bad sample width'
self._sampwidth = sampwidth
def getsampwidth(self):
if not self._sampwidth:
raise Error, 'sample width not set'
return self._sampwidth
def setframerate(self, framerate):
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
if framerate <= 0:
raise Error, 'bad frame rate'
self._framerate = framerate
def getframerate(self):
if not self._framerate:
raise Error, 'frame rate not set'
return self._framerate
def setnframes(self, nframes):
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, comptype, compname):
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
if comptype not in ('NONE',):
raise Error, 'unsupported compression type'
self._comptype = comptype
self._compname = compname
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
def setparams(self, params):
nchannels, sampwidth, framerate, nframes, comptype, compname = params
if self._datawritten:
raise Error, 'cannot change parameters after starting to write'
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
if not self._nchannels or not self._sampwidth or not self._framerate:
raise Error, 'not all parameters set'
return self._nchannels, self._sampwidth, self._framerate, \
self._nframes, self._comptype, self._compname
def setmark(self, id, pos, name):
raise Error, 'setmark() not supported'
def getmark(self, id):
raise Error, 'no marks'
def getmarkers(self):
return None
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
self._ensure_header_written(len(data))
nframes = len(data) // (self._sampwidth * self._nchannels)
if self._convert:
data = self._convert(data)
if self._sampwidth in (2, 4) and sys.byteorder == 'big':
import array
a = array.array(_array_fmts[self._sampwidth])
a.fromstring(data)
data = a
assert data.itemsize == self._sampwidth
data.byteswap()
data.tofile(self._file)
self._datawritten = self._datawritten + len(data) * self._sampwidth
else:
if self._sampwidth == 3 and sys.byteorder == 'big':
data = _byteswap3(data)
self._file.write(data)
self._datawritten = self._datawritten + len(data)
self._nframeswritten = self._nframeswritten + nframes
def writeframes(self, data):
self.writeframesraw(data)
if self._datalength != self._datawritten:
self._patchheader()
def close(self):
try:
if self._file:
self._ensure_header_written(0)
if self._datalength != self._datawritten:
self._patchheader()
self._file.flush()
finally:
self._file = None
file = self._i_opened_the_file
if file:
self._i_opened_the_file = None
file.close()
#
# Internal methods.
#
def _ensure_header_written(self, datasize):
if not self._headerwritten:
if not self._nchannels:
raise Error, '# channels not specified'
if not self._sampwidth:
raise Error, 'sample width not specified'
if not self._framerate:
raise Error, 'sampling rate not specified'
self._write_header(datasize)
def _write_header(self, initlength):
assert not self._headerwritten
self._file.write('RIFF')
if not self._nframes:
self._nframes = initlength / (self._nchannels * self._sampwidth)
self._datalength = self._nframes * self._nchannels * self._sampwidth
self._form_length_pos = self._file.tell()
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
36 + self._datalength, 'WAVE', 'fmt ', 16,
WAVE_FORMAT_PCM, self._nchannels, self._framerate,
self._nchannels * self._framerate * self._sampwidth,
self._nchannels * self._sampwidth,
self._sampwidth * 8, 'data'))
self._data_length_pos = self._file.tell()
self._file.write(struct.pack('<L', self._datalength))
self._headerwritten = True
def _patchheader(self):
assert self._headerwritten
if self._datawritten == self._datalength:
return
curpos = self._file.tell()
self._file.seek(self._form_length_pos, 0)
self._file.write(struct.pack('<L', 36 + self._datawritten))
self._file.seek(self._data_length_pos, 0)
self._file.write(struct.pack('<L', self._datawritten))
self._file.seek(curpos, 0)
self._datalength = self._datawritten
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Wave_read(f)
elif mode in ('w', 'wb'):
return Wave_write(f)
else:
raise Error, "mode must be 'r', 'rb', 'w', or 'wb'"
openfp = open # B/W compatibility
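# Editor's sketch (not part of the module, never called): a minimal write/read
# round trip with the classes above; 'tmp.wav' is a hypothetical path.
def _wave_roundtrip_example(path='tmp.wav'):
    w = open(path, 'wb')                  # wave.open, defined above
    w.setnchannels(1)                     # mono
    w.setsampwidth(2)                     # 16-bit samples
    w.setframerate(8000)                  # 8 kHz
    w.writeframes('\x00\x00' * 8000)      # one second of silence
    w.close()                             # patches the header sizes
    r = open(path, 'rb')
    params = r.getparams()                # (1, 2, 8000, 8000, 'NONE', 'not compressed')
    data = r.readframes(r.getnframes())
    r.close()
    return params, len(data)              # len(data) == 16000 bytes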
"""Weak reference support for Python.
This module is an implementation of PEP 205:
http://www.python.org/dev/peps/pep-0205/
"""
# Naming convention: Variables named "wr" are weak reference objects;
# they are called this instead of "ref" to avoid name collisions with
# the module-global ref() function imported from _weakref.
import UserDict
from _weakref import (
getweakrefcount,
getweakrefs,
ref,
proxy,
CallableProxyType,
ProxyType,
ReferenceType,
_remove_dead_weakref)
from _weakrefset import WeakSet, _IterationGuard
from exceptions import ReferenceError
ProxyTypes = (ProxyType, CallableProxyType)
__all__ = ["ref", "proxy", "getweakrefcount", "getweakrefs",
"WeakKeyDictionary", "ReferenceError", "ReferenceType", "ProxyType",
"CallableProxyType", "ProxyTypes", "WeakValueDictionary", 'WeakSet']
class WeakValueDictionary(UserDict.UserDict):
"""Mapping class that references values weakly.
Entries in the dictionary will be discarded when no strong
reference to the value exists anymore
"""
# We inherit the constructor without worrying about the input
# dictionary; since it uses our .update() method, we get the right
# checks (if the other dictionary is a WeakValueDictionary,
# objects are unwrapped on the way out, and we always wrap on the
# way in).
def __init__(*args, **kw):
if not args:
raise TypeError("descriptor '__init__' of 'WeakValueDictionary' "
"object needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(wr.key)
else:
# Atomic removal is necessary since this function
# can be called asynchronously by the GC
_atomic_removal(self.data, wr.key)
self._remove = remove
# A list of keys to be removed
self._pending_removals = []
self._iterating = set()
UserDict.UserDict.__init__(self, *args, **kw)
def _commit_removals(self):
l = self._pending_removals
d = self.data
# We shouldn't encounter any KeyError, because this method should
# always be called *before* mutating the dict.
while l:
key = l.pop()
_remove_dead_weakref(d, key)
def __getitem__(self, key):
if self._pending_removals:
self._commit_removals()
o = self.data[key]()
if o is None:
raise KeyError, key
else:
return o
def __delitem__(self, key):
if self._pending_removals:
self._commit_removals()
del self.data[key]
def __contains__(self, key):
if self._pending_removals:
self._commit_removals()
try:
o = self.data[key]()
except KeyError:
return False
return o is not None
def has_key(self, key):
if self._pending_removals:
self._commit_removals()
try:
o = self.data[key]()
except KeyError:
return False
return o is not None
def __repr__(self):
return "<WeakValueDictionary at %s>" % id(self)
def __setitem__(self, key, value):
if self._pending_removals:
self._commit_removals()
self.data[key] = KeyedRef(value, self._remove, key)
def clear(self):
if self._pending_removals:
self._commit_removals()
self.data.clear()
def copy(self):
if self._pending_removals:
self._commit_removals()
new = WeakValueDictionary()
for key, wr in self.data.items():
o = wr()
if o is not None:
new[key] = o
return new
__copy__ = copy
def __deepcopy__(self, memo):
from copy import deepcopy
if self._pending_removals:
self._commit_removals()
new = self.__class__()
for key, wr in self.data.items():
o = wr()
if o is not None:
new[deepcopy(key, memo)] = o
return new
def get(self, key, default=None):
if self._pending_removals:
self._commit_removals()
try:
wr = self.data[key]
except KeyError:
return default
else:
o = wr()
if o is None:
                # This should only happen if the referent died between
                # the dict lookup and the weakref call; treat as missing
return default
else:
return o
def items(self):
if self._pending_removals:
self._commit_removals()
L = []
for key, wr in self.data.items():
o = wr()
if o is not None:
L.append((key, o))
return L
def iteritems(self):
if self._pending_removals:
self._commit_removals()
with _IterationGuard(self):
for wr in self.data.itervalues():
value = wr()
if value is not None:
yield wr.key, value
def iterkeys(self):
if self._pending_removals:
self._commit_removals()
with _IterationGuard(self):
for k in self.data.iterkeys():
yield k
__iter__ = iterkeys
def itervaluerefs(self):
"""Return an iterator that yields the weak references to the values.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the values around longer than needed.
"""
if self._pending_removals:
self._commit_removals()
with _IterationGuard(self):
for wr in self.data.itervalues():
yield wr
def itervalues(self):
if self._pending_removals:
self._commit_removals()
with _IterationGuard(self):
for wr in self.data.itervalues():
obj = wr()
if obj is not None:
yield obj
def popitem(self):
if self._pending_removals:
self._commit_removals()
while 1:
key, wr = self.data.popitem()
o = wr()
if o is not None:
return key, o
def pop(self, key, *args):
if self._pending_removals:
self._commit_removals()
try:
o = self.data.pop(key)()
except KeyError:
o = None
if o is None:
if args:
return args[0]
else:
raise KeyError, key
else:
return o
def setdefault(self, key, default=None):
if self._pending_removals:
self._commit_removals()
try:
o = self.data[key]()
except KeyError:
o = None
if o is None:
self.data[key] = KeyedRef(default, self._remove, key)
return default
else:
return o
def update(*args, **kwargs):
if not args:
raise TypeError("descriptor 'update' of 'WeakValueDictionary' "
"object needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
dict = args[0] if args else None
if self._pending_removals:
self._commit_removals()
d = self.data
if dict is not None:
if not hasattr(dict, "items"):
dict = type({})(dict)
for key, o in dict.items():
d[key] = KeyedRef(o, self._remove, key)
if len(kwargs):
self.update(kwargs)
def valuerefs(self):
"""Return a list of weak references to the values.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the values around longer than needed.
"""
if self._pending_removals:
self._commit_removals()
return self.data.values()
def values(self):
if self._pending_removals:
self._commit_removals()
L = []
for wr in self.data.values():
o = wr()
if o is not None:
L.append(o)
return L
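# Editor's sketch (not part of the module, never called): entries disappear
# once the last strong reference to a value is gone. On IronPython/.NET the
# collection is not immediate, hence the explicit gc.collect().
def _weakvalue_example():
    class Value(object):        # hypothetical value type (must be weakref-able)
        pass
    d = WeakValueDictionary()
    v = Value()
    d['k'] = v
    assert d['k'] is v          # alive while 'v' holds a strong reference
    del v                       # drop the only strong reference
    import gc
    gc.collect()
    return 'k' in d             # False once the value has been collected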
class KeyedRef(ref):
"""Specialized reference that includes a key corresponding to the value.
This is used in the WeakValueDictionary to avoid having to create
a function object for each key stored in the mapping. A shared
callback object can use the 'key' attribute of a KeyedRef instead
of getting a reference to the key from an enclosing scope.
"""
__slots__ = "key",
def __new__(type, ob, callback, key):
self = ref.__new__(type, ob, callback)
self.key = key
return self
def __init__(self, ob, callback, key):
super(KeyedRef, self).__init__(ob, callback)
class WeakKeyDictionary(UserDict.UserDict):
""" Mapping class that references keys weakly.
Entries in the dictionary will be discarded when there is no
longer a strong reference to the key. This can be used to
associate additional data with an object owned by other parts of
an application without adding attributes to those objects. This
can be especially useful with objects that override attribute
accesses.
"""
def __init__(self, dict=None):
self.data = {}
def remove(k, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(k)
else:
del self.data[k]
self._remove = remove
# A list of dead weakrefs (keys to be removed)
self._pending_removals = []
self._iterating = set()
if dict is not None:
self.update(dict)
def _commit_removals(self):
# NOTE: We don't need to call this method before mutating the dict,
# because a dead weakref never compares equal to a live weakref,
# even if they happened to refer to equal objects.
# However, it means keys may already have been removed.
l = self._pending_removals
d = self.data
while l:
try:
del d[l.pop()]
except KeyError:
pass
def __delitem__(self, key):
del self.data[ref(key)]
def __getitem__(self, key):
return self.data[ref(key)]
def __repr__(self):
return "<WeakKeyDictionary at %s>" % id(self)
def __setitem__(self, key, value):
self.data[ref(key, self._remove)] = value
def copy(self):
new = WeakKeyDictionary()
for key, value in self.data.items():
o = key()
if o is not None:
new[o] = value
return new
__copy__ = copy
def __deepcopy__(self, memo):
from copy import deepcopy
new = self.__class__()
for key, value in self.data.items():
o = key()
if o is not None:
new[o] = deepcopy(value, memo)
return new
def get(self, key, default=None):
return self.data.get(ref(key),default)
def has_key(self, key):
try:
wr = ref(key)
except TypeError:
return 0
return wr in self.data
def __contains__(self, key):
try:
wr = ref(key)
except TypeError:
return 0
return wr in self.data
def items(self):
L = []
for key, value in self.data.items():
o = key()
if o is not None:
L.append((o, value))
return L
def iteritems(self):
with _IterationGuard(self):
for wr, value in self.data.iteritems():
key = wr()
if key is not None:
yield key, value
def iterkeyrefs(self):
"""Return an iterator that yields the weak references to the keys.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the keys around longer than needed.
"""
with _IterationGuard(self):
for wr in self.data.iterkeys():
yield wr
def iterkeys(self):
with _IterationGuard(self):
for wr in self.data.iterkeys():
obj = wr()
if obj is not None:
yield obj
__iter__ = iterkeys
def itervalues(self):
with _IterationGuard(self):
for value in self.data.itervalues():
yield value
def keyrefs(self):
"""Return a list of weak references to the keys.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the keys around longer than needed.
"""
return self.data.keys()
def keys(self):
L = []
for wr in self.data.keys():
o = wr()
if o is not None:
L.append(o)
return L
def popitem(self):
while 1:
key, value = self.data.popitem()
o = key()
if o is not None:
return o, value
def pop(self, key, *args):
return self.data.pop(ref(key), *args)
def setdefault(self, key, default=None):
return self.data.setdefault(ref(key, self._remove),default)
def update(self, dict=None, **kwargs):
d = self.data
if dict is not None:
if not hasattr(dict, "items"):
dict = type({})(dict)
for key, value in dict.items():
d[ref(key, self._remove)] = value
if len(kwargs):
self.update(kwargs)
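# Editor's sketch (not part of the module, never called): attaching extra data
# to objects owned elsewhere, without keeping those objects alive.
def _weakkey_example():
    class Node(object):         # hypothetical key type (must be weakref-able)
        pass
    meta = WeakKeyDictionary()
    n = Node()
    meta[n] = {'color': 'red'}
    assert meta[n]['color'] == 'red'
    del n                       # the entry goes away with its key...
    import gc
    gc.collect()                # ...once the key has actually been collected
    return len(meta.keys())     # 0 afterwards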
#! /usr/bin/env python
"""Interfaces for launching and remotely controlling Web browsers."""
# Maintained by Georg Brandl.
import os
import shlex
import sys
import stat
import subprocess
import time
__all__ = ["Error", "open", "open_new", "open_new_tab", "get", "register"]
class Error(Exception):
pass
_browsers = {} # Dictionary of available browser controllers
_tryorder = [] # Preference order of available browsers
def register(name, klass, instance=None, update_tryorder=1):
"""Register a browser connector and, optionally, connection."""
_browsers[name.lower()] = [klass, instance]
if update_tryorder > 0:
_tryorder.append(name)
elif update_tryorder < 0:
_tryorder.insert(0, name)
def get(using=None):
"""Return a browser launcher instance appropriate for the environment."""
if using is not None:
alternatives = [using]
else:
alternatives = _tryorder
for browser in alternatives:
if '%s' in browser:
# User gave us a command line, split it into name and args
browser = shlex.split(browser)
if browser[-1] == '&':
return BackgroundBrowser(browser[:-1])
else:
return GenericBrowser(browser)
else:
# User gave us a browser name or path.
try:
command = _browsers[browser.lower()]
except KeyError:
command = _synthesize(browser)
if command[1] is not None:
return command[1]
elif command[0] is not None:
return command[0]()
raise Error("could not locate runnable browser")
# Please note: the following definition hides a builtin function.
# It is recommended to do "import webbrowser" and use webbrowser.open(url)
# instead of "from webbrowser import *".
def open(url, new=0, autoraise=True):
for name in _tryorder:
browser = get(name)
if browser.open(url, new, autoraise):
return True
return False
def open_new(url):
return open(url, 1)
def open_new_tab(url):
return open(url, 2)
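# Editor's sketch (not part of the module, never called): picking a controller
# explicitly instead of using the module-level open(); the URL is hypothetical.
def _webbrowser_example():
    try:
        browser = get()                       # best controller for this platform
    except Error:
        return False                          # no runnable browser found
    return browser.open_new_tab('http://example.com/')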
def _synthesize(browser, update_tryorder=1):
"""Attempt to synthesize a controller base on existing controllers.
This is useful to create a controller when a user specifies a path to
an entry in the BROWSER environment variable -- we can copy a general
controller to operate using a specific installation of the desired
browser in this way.
If we can't create a controller in this way, or if there is no
executable for the requested browser, return [None, None].
"""
cmd = browser.split()[0]
if not _iscommand(cmd):
return [None, None]
name = os.path.basename(cmd)
try:
command = _browsers[name.lower()]
except KeyError:
return [None, None]
# now attempt to clone to fit the new name:
controller = command[1]
if controller and name.lower() == controller.basename:
import copy
controller = copy.copy(controller)
controller.name = browser
controller.basename = os.path.basename(browser)
register(browser, None, controller, update_tryorder)
return [None, controller]
return [None, None]
if sys.platform[:3] == "win" or (sys.platform == 'cli' and os.name == 'nt'):
def _isexecutable(cmd):
cmd = cmd.lower()
if os.path.isfile(cmd) and cmd.endswith((".exe", ".bat")):
return True
for ext in ".exe", ".bat":
if os.path.isfile(cmd + ext):
return True
return False
else:
def _isexecutable(cmd):
if os.path.isfile(cmd):
mode = os.stat(cmd)[stat.ST_MODE]
if mode & stat.S_IXUSR or mode & stat.S_IXGRP or mode & stat.S_IXOTH:
return True
return False
def _iscommand(cmd):
"""Return True if cmd is executable or can be found on the executable
search path."""
if _isexecutable(cmd):
return True
path = os.environ.get("PATH")
if not path:
return False
for d in path.split(os.pathsep):
exe = os.path.join(d, cmd)
if _isexecutable(exe):
return True
return False
# General parent classes
class BaseBrowser(object):
"""Parent class for all browsers. Do not use directly."""
args = ['%s']
def __init__(self, name=""):
self.name = name
self.basename = name
def open(self, url, new=0, autoraise=True):
raise NotImplementedError
def open_new(self, url):
return self.open(url, 1)
def open_new_tab(self, url):
return self.open(url, 2)
class GenericBrowser(BaseBrowser):
"""Class for all browsers started with a command
and without remote functionality."""
def __init__(self, name):
if isinstance(name, basestring):
self.name = name
self.args = ["%s"]
else:
# name should be a list with arguments
self.name = name[0]
self.args = name[1:]
self.basename = os.path.basename(self.name)
def open(self, url, new=0, autoraise=True):
cmdline = [self.name] + [arg.replace("%s", url)
for arg in self.args]
try:
if sys.platform[:3] == "win" or (sys.platform == 'cli' and os.name == 'nt'):
p = subprocess.Popen(cmdline)
else:
p = subprocess.Popen(cmdline, close_fds=True)
return not p.wait()
except OSError:
return False
class BackgroundBrowser(GenericBrowser):
"""Class for all browsers which are to be started in the
background."""
def open(self, url, new=0, autoraise=True):
cmdline = [self.name] + [arg.replace("%s", url)
for arg in self.args]
try:
if sys.platform[:3] == "win" or (sys.platform == 'cli' and os.name == 'nt'):
p = subprocess.Popen(cmdline)
else:
setsid = getattr(os, 'setsid', None)
if not setsid:
setsid = getattr(os, 'setpgrp', None)
p = subprocess.Popen(cmdline, close_fds=True, preexec_fn=setsid)
return (p.poll() is None)
except OSError:
return False
class UnixBrowser(BaseBrowser):
"""Parent class for all Unix browsers with remote functionality."""
raise_opts = None
remote_args = ['%action', '%s']
remote_action = None
remote_action_newwin = None
remote_action_newtab = None
background = False
redirect_stdout = True
def _invoke(self, args, remote, autoraise):
raise_opt = []
if remote and self.raise_opts:
# use autoraise argument only for remote invocation
autoraise = int(autoraise)
opt = self.raise_opts[autoraise]
if opt: raise_opt = [opt]
cmdline = [self.name] + raise_opt + args
if remote or self.background:
inout = file(os.devnull, "r+")
else:
# for TTY browsers, we need stdin/out
inout = None
# if possible, put browser in separate process group, so
# keyboard interrupts don't affect browser as well as Python
setsid = getattr(os, 'setsid', None)
if not setsid:
setsid = getattr(os, 'setpgrp', None)
p = subprocess.Popen(cmdline, close_fds=True, stdin=inout,
stdout=(self.redirect_stdout and inout or None),
stderr=inout, preexec_fn=setsid)
if remote:
# wait five seconds. If the subprocess is not finished, the
# remote invocation has (hopefully) started a new instance.
time.sleep(1)
rc = p.poll()
if rc is None:
time.sleep(4)
rc = p.poll()
if rc is None:
return True
# if remote call failed, open() will try direct invocation
return not rc
elif self.background:
if p.poll() is None:
return True
else:
return False
else:
return not p.wait()
def open(self, url, new=0, autoraise=True):
if new == 0:
action = self.remote_action
elif new == 1:
action = self.remote_action_newwin
elif new == 2:
if self.remote_action_newtab is None:
action = self.remote_action_newwin
else:
action = self.remote_action_newtab
else:
raise Error("Bad 'new' parameter to open(); " +
"expected 0, 1, or 2, got %s" % new)
args = [arg.replace("%s", url).replace("%action", action)
for arg in self.remote_args]
success = self._invoke(args, True, autoraise)
if not success:
# remote invocation failed, try straight way
args = [arg.replace("%s", url) for arg in self.args]
return self._invoke(args, False, False)
else:
return True
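# Editor's note (illustration): for the Mozilla class below, new=2 turns
# remote_args into ['-remote', 'openURL(http://example.com/,new-tab)'] after
# the %s/%action substitutions performed in open() above.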
class Mozilla(UnixBrowser):
"""Launcher class for Mozilla/Netscape browsers."""
raise_opts = ["-noraise", "-raise"]
remote_args = ['-remote', 'openURL(%s%action)']
remote_action = ""
remote_action_newwin = ",new-window"
remote_action_newtab = ",new-tab"
background = True
Netscape = Mozilla
class Galeon(UnixBrowser):
"""Launcher class for Galeon/Epiphany browsers."""
raise_opts = ["-noraise", ""]
remote_args = ['%action', '%s']
remote_action = "-n"
remote_action_newwin = "-w"
background = True
class Chrome(UnixBrowser):
"Launcher class for Google Chrome browser."
remote_args = ['%action', '%s']
remote_action = ""
remote_action_newwin = "--new-window"
remote_action_newtab = ""
background = True
Chromium = Chrome
class Opera(UnixBrowser):
"Launcher class for Opera browser."
remote_args = ['%action', '%s']
remote_action = ""
remote_action_newwin = "--new-window"
remote_action_newtab = ""
background = True
class Elinks(UnixBrowser):
"Launcher class for Elinks browsers."
remote_args = ['-remote', 'openURL(%s%action)']
remote_action = ""
remote_action_newwin = ",new-window"
remote_action_newtab = ",new-tab"
background = False
# elinks doesn't like its stdout to be redirected -
# it uses redirected stdout as a signal to do -dump
redirect_stdout = False
class Konqueror(BaseBrowser):
"""Controller for the KDE File Manager (kfm, or Konqueror).
See the output of ``kfmclient --commands``
for more information on the Konqueror remote-control interface.
"""
def open(self, url, new=0, autoraise=True):
# XXX Currently I know no way to prevent KFM from opening a new win.
if new == 2:
action = "newTab"
else:
action = "openURL"
devnull = file(os.devnull, "r+")
# if possible, put browser in separate process group, so
# keyboard interrupts don't affect browser as well as Python
setsid = getattr(os, 'setsid', None)
if not setsid:
setsid = getattr(os, 'setpgrp', None)
try:
p = subprocess.Popen(["kfmclient", action, url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull)
except OSError:
# fall through to next variant
pass
else:
p.wait()
            # kfmclient's return code unfortunately carries no meaning
return True
try:
p = subprocess.Popen(["konqueror", "--silent", url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull,
preexec_fn=setsid)
except OSError:
# fall through to next variant
pass
else:
if p.poll() is None:
# Should be running now.
return True
try:
p = subprocess.Popen(["kfm", "-d", url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull,
preexec_fn=setsid)
except OSError:
return False
else:
return (p.poll() is None)
class Grail(BaseBrowser):
# There should be a way to maintain a connection to Grail, but the
# Grail remote control protocol doesn't really allow that at this
# point. It probably never will!
def _find_grail_rc(self):
import glob
import pwd
import socket
import tempfile
tempdir = os.path.join(tempfile.gettempdir(),
".grail-unix")
user = pwd.getpwuid(os.getuid())[0]
filename = os.path.join(tempdir, user + "-*")
maybes = glob.glob(filename)
if not maybes:
return None
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
for fn in maybes:
# need to PING each one until we find one that's live
try:
s.connect(fn)
except socket.error:
# no good; attempt to clean it out, but don't fail:
try:
os.unlink(fn)
except IOError:
pass
else:
return s
def _remote(self, action):
s = self._find_grail_rc()
if not s:
return 0
s.send(action)
s.close()
return 1
def open(self, url, new=0, autoraise=True):
if new:
ok = self._remote("LOADNEW " + url)
else:
ok = self._remote("LOAD " + url)
return ok
#
# Platform support for Unix
#
# These are the right tests because all these Unix browsers require either
# a console terminal or an X display to run.
def register_X_browsers():
# use xdg-open if around
if _iscommand("xdg-open"):
register("xdg-open", None, BackgroundBrowser("xdg-open"))
# The default GNOME3 browser
if "GNOME_DESKTOP_SESSION_ID" in os.environ and _iscommand("gvfs-open"):
register("gvfs-open", None, BackgroundBrowser("gvfs-open"))
# The default GNOME browser
if "GNOME_DESKTOP_SESSION_ID" in os.environ and _iscommand("gnome-open"):
register("gnome-open", None, BackgroundBrowser("gnome-open"))
# The default KDE browser
if "KDE_FULL_SESSION" in os.environ and _iscommand("kfmclient"):
register("kfmclient", Konqueror, Konqueror("kfmclient"))
if _iscommand("x-www-browser"):
register("x-www-browser", None, BackgroundBrowser("x-www-browser"))
# The Mozilla/Netscape browsers
for browser in ("mozilla-firefox", "firefox",
"mozilla-firebird", "firebird",
"iceweasel", "iceape",
"seamonkey", "mozilla", "netscape"):
if _iscommand(browser):
register(browser, None, Mozilla(browser))
# Konqueror/kfm, the KDE browser.
if _iscommand("kfm"):
register("kfm", Konqueror, Konqueror("kfm"))
elif _iscommand("konqueror"):
register("konqueror", Konqueror, Konqueror("konqueror"))
# Gnome's Galeon and Epiphany
for browser in ("galeon", "epiphany"):
if _iscommand(browser):
register(browser, None, Galeon(browser))
# Skipstone, another Gtk/Mozilla based browser
if _iscommand("skipstone"):
register("skipstone", None, BackgroundBrowser("skipstone"))
# Google Chrome/Chromium browsers
for browser in ("google-chrome", "chrome", "chromium", "chromium-browser"):
if _iscommand(browser):
register(browser, None, Chrome(browser))
# Opera, quite popular
if _iscommand("opera"):
register("opera", None, Opera("opera"))
# Next, Mosaic -- old but still in use.
if _iscommand("mosaic"):
register("mosaic", None, BackgroundBrowser("mosaic"))
# Grail, the Python browser. Does anybody still use it?
if _iscommand("grail"):
register("grail", Grail, None)
# Prefer X browsers if present
if os.environ.get("DISPLAY"):
register_X_browsers()
# Also try console browsers
if os.environ.get("TERM"):
if _iscommand("www-browser"):
register("www-browser", None, GenericBrowser("www-browser"))
# The Links/elinks browsers <http://artax.karlin.mff.cuni.cz/~mikulas/links/>
if _iscommand("links"):
register("links", None, GenericBrowser("links"))
if _iscommand("elinks"):
register("elinks", None, Elinks("elinks"))
# The Lynx browser <http://lynx.isc.org/>, <http://lynx.browser.org/>
if _iscommand("lynx"):
register("lynx", None, GenericBrowser("lynx"))
# The w3m browser <http://w3m.sourceforge.net/>
if _iscommand("w3m"):
register("w3m", None, GenericBrowser("w3m"))
#
# Platform support for Windows
#
if sys.platform[:3] == "win" or (sys.platform == 'cli' and os.name == 'nt'):
class WindowsDefault(BaseBrowser):
def open(self, url, new=0, autoraise=True):
try:
os.startfile(url)
except WindowsError:
# [Error 22] No application is associated with the specified
# file for this operation: '<URL>'
return False
else:
return True
_tryorder = []
_browsers = {}
# First try to use the default Windows browser
register("windows-default", WindowsDefault)
# Detect some common Windows browsers, fallback to IE
iexplore = os.path.join(os.environ.get("PROGRAMFILES", "C:\\Program Files"),
"Internet Explorer\\IEXPLORE.EXE")
for browser in ("firefox", "firebird", "seamonkey", "mozilla",
"netscape", "opera", iexplore):
if _iscommand(browser):
register(browser, None, BackgroundBrowser(browser))
#
# Platform support for MacOS
#
if sys.platform == 'darwin':
# Adapted from patch submitted to SourceForge by Steven J. Burr
class MacOSX(BaseBrowser):
"""Launcher class for Aqua browsers on Mac OS X
Optionally specify a browser name on instantiation. Note that this
will not work for Aqua browsers if the user has moved the application
package after installation.
If no browser is specified, the default browser, as specified in the
Internet System Preferences panel, will be used.
"""
def __init__(self, name):
self.name = name
def open(self, url, new=0, autoraise=True):
assert "'" not in url
# hack for local urls
if not ':' in url:
url = 'file:'+url
# new must be 0 or 1
new = int(bool(new))
if self.name == "default":
# User called open, open_new or get without a browser parameter
script = 'open location "%s"' % url.replace('"', '%22') # opens in default browser
else:
# User called get and chose a browser
if self.name == "OmniWeb":
toWindow = ""
else:
# Include toWindow parameter of OpenURL command for browsers
# that support it. 0 == new window; -1 == existing
toWindow = "toWindow %d" % (new - 1)
cmd = 'OpenURL "%s"' % url.replace('"', '%22')
script = '''tell application "%s"
activate
%s %s
end tell''' % (self.name, cmd, toWindow)
# Open pipe to AppleScript through osascript command
osapipe = os.popen("osascript", "w")
if osapipe is None:
return False
# Write script to osascript's stdin
osapipe.write(script)
rc = osapipe.close()
return not rc
class MacOSXOSAScript(BaseBrowser):
def __init__(self, name):
self._name = name
def open(self, url, new=0, autoraise=True):
if self._name == 'default':
script = 'open location "%s"' % url.replace('"', '%22') # opens in default browser
else:
script = '''
tell application "%s"
activate
open location "%s"
end
'''%(self._name, url.replace('"', '%22'))
osapipe = os.popen("osascript", "w")
if osapipe is None:
return False
osapipe.write(script)
rc = osapipe.close()
return not rc
# Don't clear _tryorder or _browsers since OS X can use above Unix support
# (but we prefer using the OS X specific stuff)
register("safari", None, MacOSXOSAScript('safari'), -1)
register("firefox", None, MacOSXOSAScript('firefox'), -1)
register("chrome", None, MacOSXOSAScript('chrome'), -1)
register("MacOSX", None, MacOSXOSAScript('default'), -1)
#
# Platform support for OS/2
#
if sys.platform[:3] == "os2" and _iscommand("netscape"):
_tryorder = []
_browsers = {}
register("os2netscape", None,
GenericBrowser(["start", "netscape", "%s"]), -1)
# OK, now that we know what the default preference orders for each
# platform are, allow user to override them with the BROWSER variable.
if "BROWSER" in os.environ:
_userchoices = os.environ["BROWSER"].split(os.pathsep)
_userchoices.reverse()
# Treat choices in same way as if passed into get() but do register
# and prepend to _tryorder
for cmdline in _userchoices:
if cmdline != '':
cmd = _synthesize(cmdline, -1)
if cmd[1] is None:
register(cmdline, None, GenericBrowser(cmdline), -1)
cmdline = None # to make del work if _userchoices was empty
del cmdline
del _userchoices
# what to do if _tryorder is now empty?
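# Editor's note (illustration): a value such as BROWSER='firefox %s &:lynx'
# (entries separated by os.pathsep) makes a background 'firefox <url>' command
# line the first choice and lynx the second, ahead of the platform defaults.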
def main():
import getopt
usage = """Usage: %s [-n | -t] url
-n: open new window
-t: open new tab""" % sys.argv[0]
try:
opts, args = getopt.getopt(sys.argv[1:], 'ntd')
except getopt.error, msg:
print >>sys.stderr, msg
print >>sys.stderr, usage
sys.exit(1)
new_win = 0
for o, a in opts:
if o == '-n': new_win = 1
elif o == '-t': new_win = 2
if len(args) != 1:
print >>sys.stderr, usage
sys.exit(1)
url = args[0]
open(url, new_win)
print "\a"
if __name__ == "__main__":
main()
#! /usr/bin/env python
"""Guess which db package to use to open a db file."""
import os
import struct
import sys
try:
import dbm
_dbmerror = dbm.error
except ImportError:
dbm = None
# just some sort of valid exception which might be raised in the
# dbm test
_dbmerror = IOError
def whichdb(filename):
"""Guess which db package to use to open a db file.
Return values:
- None if the database file can't be read;
    - empty string if the file can be read but can't be recognized;
- the module name (e.g. "dbm" or "gdbm") if recognized.
Importing the given module may still fail, and opening the
database using that module may still fail.
"""
# Check for dbm first -- this has a .pag and a .dir file
try:
f = open(filename + os.extsep + "pag", "rb")
f.close()
# dbm linked with gdbm on OS/2 doesn't have .dir file
if not (dbm.library == "GNU gdbm" and sys.platform == "os2emx"):
f = open(filename + os.extsep + "dir", "rb")
f.close()
return "dbm"
except IOError:
# some dbm emulations based on Berkeley DB generate a .db file
# some do not, but they should be caught by the dbhash checks
try:
f = open(filename + os.extsep + "db", "rb")
f.close()
# guarantee we can actually open the file using dbm
# kind of overkill, but since we are dealing with emulations
# it seems like a prudent step
if dbm is not None:
d = dbm.open(filename)
d.close()
return "dbm"
except (IOError, _dbmerror):
pass
# Check for dumbdbm next -- this has a .dir and a .dat file
try:
# First check for presence of files
os.stat(filename + os.extsep + "dat")
size = os.stat(filename + os.extsep + "dir").st_size
# dumbdbm files with no keys are empty
if size == 0:
return "dumbdbm"
f = open(filename + os.extsep + "dir", "rb")
try:
if f.read(1) in ("'", '"'):
return "dumbdbm"
finally:
f.close()
except (OSError, IOError):
pass
# See if the file exists, return None if not
try:
f = open(filename, "rb")
except IOError:
return None
# Read the start of the file -- the magic number
s16 = f.read(16)
f.close()
s = s16[0:4]
# Return "" if not at least 4 bytes
if len(s) != 4:
return ""
# Convert to 4-byte int in native byte order -- return "" if impossible
try:
(magic,) = struct.unpack("=l", s)
except struct.error:
return ""
# Check for GNU dbm
if magic in (0x13579ace, 0x13579acd, 0x13579acf):
return "gdbm"
# Check for old Berkeley db hash file format v2
if magic in (0x00061561, 0x61150600):
return "bsddb185"
# Later versions of Berkeley db hash file have a 12-byte pad in
# front of the file type
try:
(magic,) = struct.unpack("=l", s16[-4:])
except struct.error:
return ""
# Check for BSD hash
if magic in (0x00061561, 0x61150600):
return "dbhash"
# Unknown
return ""
if __name__ == "__main__":
for filename in sys.argv[1:]:
print whichdb(filename) or "UNKNOWN", filename
#-*- coding: ISO-8859-1 -*-
def _():
import sys
if sys.platform == 'cli':
import clr
try:
clr.AddReference('IronPython.Wpf')
        except:
            # the WPF assembly may be unavailable; swallow the error and let
            # the 'from _wpf import *' below fail if WPF is truly unusable
            pass
_()
del _
from _wpf import *
"""Base classes for server/gateway implementations"""
from types import StringType
from util import FileWrapper, guess_scheme, is_hop_by_hop
from headers import Headers
import sys, os, time
__all__ = ['BaseHandler', 'SimpleHandler', 'BaseCGIHandler', 'CGIHandler']
try:
dict
except NameError:
def dict(items):
d = {}
for k,v in items:
d[k] = v
return d
# Uncomment for 2.2 compatibility.
#try:
# True
# False
#except NameError:
# True = not None
# False = not True
# Weekday and month names for HTTP date/time formatting; always English!
_weekdayname = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
_monthname = [None, # Dummy so we can use 1-based month numbers
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
def format_date_time(timestamp):
year, month, day, hh, mm, ss, wd, y, z = time.gmtime(timestamp)
return "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
_weekdayname[wd], day, _monthname[month], year, hh, mm, ss
)
class BaseHandler:
"""Manage the invocation of a WSGI application"""
# Configuration parameters; can override per-subclass or per-instance
wsgi_version = (1,0)
wsgi_multithread = True
wsgi_multiprocess = True
wsgi_run_once = False
origin_server = True # We are transmitting direct to client
http_version = "1.0" # Version that should be used for response
server_software = None # String name of server software, if any
# os_environ is used to supply configuration from the OS environment:
# by default it's a copy of 'os.environ' as of import time, but you can
# override this in e.g. your __init__ method.
os_environ = dict(os.environ.items())
# Collaborator classes
wsgi_file_wrapper = FileWrapper # set to None to disable
headers_class = Headers # must be a Headers-like class
# Error handling (also per-subclass or per-instance)
traceback_limit = None # Print entire traceback to self.get_stderr()
error_status = "500 Internal Server Error"
error_headers = [('Content-Type','text/plain')]
error_body = "A server error occurred. Please contact the administrator."
# State variables (don't mess with these)
status = result = None
headers_sent = False
headers = None
bytes_sent = 0
def run(self, application):
"""Invoke the application"""
# Note to self: don't move the close()! Asynchronous servers shouldn't
# call close() from finish_response(), so if you close() anywhere but
# the double-error branch here, you'll break asynchronous servers by
# prematurely closing. Async servers must return from 'run()' without
# closing if there might still be output to iterate over.
try:
self.setup_environ()
self.result = application(self.environ, self.start_response)
self.finish_response()
except:
try:
self.handle_error()
except:
# If we get an error handling an error, just give up already!
self.close()
raise # ...and let the actual server figure it out.
def setup_environ(self):
"""Set up the environment for one request"""
env = self.environ = self.os_environ.copy()
self.add_cgi_vars()
env['wsgi.input'] = self.get_stdin()
env['wsgi.errors'] = self.get_stderr()
env['wsgi.version'] = self.wsgi_version
env['wsgi.run_once'] = self.wsgi_run_once
env['wsgi.url_scheme'] = self.get_scheme()
env['wsgi.multithread'] = self.wsgi_multithread
env['wsgi.multiprocess'] = self.wsgi_multiprocess
if self.wsgi_file_wrapper is not None:
env['wsgi.file_wrapper'] = self.wsgi_file_wrapper
if self.origin_server and self.server_software:
env.setdefault('SERVER_SOFTWARE',self.server_software)
def finish_response(self):
"""Send any iterable data, then close self and the iterable
Subclasses intended for use in asynchronous servers will
want to redefine this method, such that it sets up callbacks
in the event loop to iterate over the data, and to call
'self.close()' once the response is finished.
"""
try:
if not self.result_is_file() or not self.sendfile():
for data in self.result:
self.write(data)
self.finish_content()
finally:
self.close()
def get_scheme(self):
"""Return the URL scheme being used"""
return guess_scheme(self.environ)
def set_content_length(self):
"""Compute Content-Length or switch to chunked encoding if possible"""
try:
blocks = len(self.result)
except (TypeError,AttributeError,NotImplementedError):
pass
else:
if blocks==1:
self.headers['Content-Length'] = str(self.bytes_sent)
return
# XXX Try for chunked encoding if origin server and client is 1.1
def cleanup_headers(self):
"""Make any necessary header changes or defaults
Subclasses can extend this to add other defaults.
"""
if 'Content-Length' not in self.headers:
self.set_content_length()
def start_response(self, status, headers,exc_info=None):
"""'start_response()' callable as specified by PEP 333"""
if exc_info:
try:
if self.headers_sent:
# Re-raise original exception if headers sent
raise exc_info[0], exc_info[1], exc_info[2]
finally:
exc_info = None # avoid dangling circular ref
elif self.headers is not None:
raise AssertionError("Headers already set!")
assert type(status) is StringType,"Status must be a string"
assert len(status)>=4,"Status must be at least 4 characters"
assert int(status[:3]),"Status message must begin w/3-digit code"
assert status[3]==" ", "Status message must have a space after code"
if __debug__:
for name,val in headers:
assert type(name) is StringType,"Header names must be strings"
assert type(val) is StringType,"Header values must be strings"
assert not is_hop_by_hop(name),"Hop-by-hop headers not allowed"
self.status = status
self.headers = self.headers_class(headers)
return self.write
def send_preamble(self):
"""Transmit version/status/date/server, via self._write()"""
if self.origin_server:
if self.client_is_modern():
self._write('HTTP/%s %s\r\n' % (self.http_version,self.status))
if 'Date' not in self.headers:
self._write(
'Date: %s\r\n' % format_date_time(time.time())
)
if self.server_software and 'Server' not in self.headers:
self._write('Server: %s\r\n' % self.server_software)
else:
self._write('Status: %s\r\n' % self.status)
def write(self, data):
"""'write()' callable as specified by PEP 333"""
assert type(data) is StringType,"write() argument must be string"
if not self.status:
raise AssertionError("write() before start_response()")
elif not self.headers_sent:
# Before the first output, send the stored headers
self.bytes_sent = len(data) # make sure we know content-length
self.send_headers()
else:
self.bytes_sent += len(data)
# XXX check Content-Length and truncate if too many bytes written?
self._write(data)
self._flush()
def sendfile(self):
"""Platform-specific file transmission
Override this method in subclasses to support platform-specific
file transmission. It is only called if the application's
return iterable ('self.result') is an instance of
'self.wsgi_file_wrapper'.
This method should return a true value if it was able to actually
transmit the wrapped file-like object using a platform-specific
approach. It should return a false value if normal iteration
should be used instead. An exception can be raised to indicate
that transmission was attempted, but failed.
NOTE: this method should call 'self.send_headers()' if
'self.headers_sent' is false and it is going to attempt direct
transmission of the file.
"""
return False # No platform-specific transmission by default
def finish_content(self):
"""Ensure headers and content have both been sent"""
if not self.headers_sent:
# Only zero Content-Length if not set by the application (so
# that HEAD requests can be satisfied properly, see #3839)
self.headers.setdefault('Content-Length', "0")
self.send_headers()
else:
pass # XXX check if content-length was too short?
def close(self):
"""Close the iterable (if needed) and reset all instance vars
Subclasses may want to also drop the client connection.
"""
try:
if hasattr(self.result,'close'):
self.result.close()
finally:
self.result = self.headers = self.status = self.environ = None
self.bytes_sent = 0; self.headers_sent = False
def send_headers(self):
"""Transmit headers to the client, via self._write()"""
self.cleanup_headers()
self.headers_sent = True
if not self.origin_server or self.client_is_modern():
self.send_preamble()
self._write(str(self.headers))
def result_is_file(self):
"""True if 'self.result' is an instance of 'self.wsgi_file_wrapper'"""
wrapper = self.wsgi_file_wrapper
return wrapper is not None and isinstance(self.result,wrapper)
def client_is_modern(self):
"""True if client can accept status and headers"""
return self.environ['SERVER_PROTOCOL'].upper() != 'HTTP/0.9'
def log_exception(self,exc_info):
"""Log the 'exc_info' tuple in the server log
Subclasses may override to retarget the output or change its format.
"""
try:
from traceback import print_exception
stderr = self.get_stderr()
print_exception(
exc_info[0], exc_info[1], exc_info[2],
self.traceback_limit, stderr
)
stderr.flush()
finally:
exc_info = None
def handle_error(self):
"""Log current error, and send error output to client if possible"""
self.log_exception(sys.exc_info())
if not self.headers_sent:
self.result = self.error_output(self.environ, self.start_response)
self.finish_response()
# XXX else: attempt advanced recovery techniques for HTML or text?
def error_output(self, environ, start_response):
"""WSGI mini-app to create error output
By default, this just uses the 'error_status', 'error_headers',
and 'error_body' attributes to generate an output page. It can
be overridden in a subclass to dynamically generate diagnostics,
choose an appropriate message for the user's preferred language, etc.
Note, however, that it's not recommended from a security perspective to
spit out diagnostics to any old user; ideally, you should have to do
something special to enable diagnostic output, which is why we don't
include any here!
"""
start_response(self.error_status,self.error_headers[:],sys.exc_info())
return [self.error_body]
# Pure abstract methods; *must* be overridden in subclasses
def _write(self,data):
"""Override in subclass to buffer data for send to client
It's okay if this method actually transmits the data; BaseHandler
just separates write and flush operations for greater efficiency
when the underlying system actually has such a distinction.
"""
raise NotImplementedError
def _flush(self):
"""Override in subclass to force sending of recent '_write()' calls
It's okay if this method is a no-op (i.e., if '_write()' actually
sends the data).
"""
raise NotImplementedError
def get_stdin(self):
"""Override in subclass to return suitable 'wsgi.input'"""
raise NotImplementedError
def get_stderr(self):
"""Override in subclass to return suitable 'wsgi.errors'"""
raise NotImplementedError
def add_cgi_vars(self):
"""Override in subclass to insert CGI variables in 'self.environ'"""
raise NotImplementedError
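# --- Illustrative sketch (not part of the original source): a minimal WSGI
# application satisfying the start_response() contract that BaseHandler
# enforces above.  The function name and body text are made up for
# demonstration.
def _example_wsgi_app(environ, start_response):
    # Status must be '3-digit-code SP reason'; headers must be a list of
    # (name, value) string tuples -- exactly what start_response() asserts.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['hello from %s\n' % environ.get('PATH_INFO', '/')]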
class SimpleHandler(BaseHandler):
"""Handler that's just initialized with streams, environment, etc.
This handler subclass is intended for synchronous HTTP/1.0 origin servers,
and handles sending the entire response output, given the correct inputs.
Usage::
handler = SimpleHandler(
inp,out,err,env, multithread=False, multiprocess=True
)
handler.run(app)"""
def __init__(self,stdin,stdout,stderr,environ,
multithread=True, multiprocess=False
):
self.stdin = stdin
self.stdout = stdout
self.stderr = stderr
self.base_env = environ
self.wsgi_multithread = multithread
self.wsgi_multiprocess = multiprocess
def get_stdin(self):
return self.stdin
def get_stderr(self):
return self.stderr
def add_cgi_vars(self):
self.environ.update(self.base_env)
def _write(self,data):
self.stdout.write(data)
self._write = self.stdout.write
def _flush(self):
self.stdout.flush()
self._flush = self.stdout.flush
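# --- Illustrative sketch (not part of the original source): running a WSGI
# app through SimpleHandler with in-memory streams, assuming the
# _example_wsgi_app sketch above.  The environ keys shown are the minimum
# this code path touches; a real server supplies the full CGI set.
def _demo_simple_handler():
    from StringIO import StringIO
    out, err = StringIO(), StringIO()
    env = {'SERVER_PROTOCOL': 'HTTP/1.0', 'SERVER_NAME': 'localhost',
           'SERVER_PORT': '80', 'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    handler = SimpleHandler(StringIO(''), out, err, env)
    handler.run(_example_wsgi_app)
    return out.getvalue()   # status line, headers, blank line, then the body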
class BaseCGIHandler(SimpleHandler):
"""CGI-like systems using input/output/error streams and environ mapping
Usage::
handler = BaseCGIHandler(inp,out,err,env)
handler.run(app)
This handler class is useful for gateway protocols like ReadyExec and
FastCGI, that have usable input/output/error streams and an environment
mapping. It's also the base class for CGIHandler, which just uses
sys.stdin, os.environ, and so on.
The constructor also takes keyword arguments 'multithread' and
'multiprocess' (defaulting to 'True' and 'False' respectively) to control
the configuration sent to the application. It sets 'origin_server' to
False (to enable CGI-like output), and assumes that 'wsgi.run_once' is
False.
"""
origin_server = False
class CGIHandler(BaseCGIHandler):
"""CGI-based invocation via sys.stdin/stdout/stderr and os.environ
Usage::
CGIHandler().run(app)
The difference between this class and BaseCGIHandler is that it always
uses 'wsgi.run_once' of 'True', 'wsgi.multithread' of 'False', and
'wsgi.multiprocess' of 'True'. It does not take any initialization
parameters, but always uses 'sys.stdin', 'os.environ', and friends.
If you need to override any of these parameters, use BaseCGIHandler
instead.
"""
wsgi_run_once = True
# Do not allow os.environ to leak between requests in Google App Engine
# and other multi-run CGI use cases. This is not easily testable.
# See http://bugs.python.org/issue7250
os_environ = {}
def __init__(self):
BaseCGIHandler.__init__(
self, sys.stdin, sys.stdout, sys.stderr, dict(os.environ.items()),
multithread=False, multiprocess=True
)
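# --- Illustrative sketch (not part of the original source): a complete CGI
# script reduces to one call, since CGIHandler pulls everything from
# sys.stdin/stdout/stderr and os.environ.  Assumes the _example_wsgi_app
# sketch above.
def _demo_cgi_script():
    # Writes a 'Status: ...' CGI-style response to sys.stdout.
    CGIHandler().run(_example_wsgi_app)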
"""Manage HTTP Response Headers
Much of this module is red-handedly pilfered from email.message in the stdlib,
so portions are Copyright (C) 2001,2002 Python Software Foundation, and were
written by Barry Warsaw.
"""
from types import ListType, TupleType
# Regular expression that matches `special' characters in parameters, the
# existence of which force quoting of the parameter value.
import re
tspecials = re.compile(r'[ \(\)<>@,;:\\"/\[\]\?=]')
def _formatparam(param, value=None, quote=1):
"""Convenience function to format and return a key=value pair.
This will quote the value if needed or if quote is true.
"""
if value is not None and len(value) > 0:
if quote or tspecials.search(value):
value = value.replace('\\', '\\\\').replace('"', r'\"')
return '%s="%s"' % (param, value)
else:
return '%s=%s' % (param, value)
else:
return param
class Headers:
"""Manage a collection of HTTP response headers"""
def __init__(self,headers):
if type(headers) is not ListType:
raise TypeError("Headers must be a list of name/value tuples")
self._headers = headers
def __len__(self):
"""Return the total number of headers, including duplicates."""
return len(self._headers)
def __setitem__(self, name, val):
"""Set the value of a header."""
del self[name]
self._headers.append((name, val))
def __delitem__(self,name):
"""Delete all occurrences of a header, if present.
Does *not* raise an exception if the header is missing.
"""
name = name.lower()
self._headers[:] = [kv for kv in self._headers if kv[0].lower() != name]
def __getitem__(self,name):
"""Get the first header value for 'name'
Return None if the header is missing instead of raising an exception.
Note that if the header appears multiple times, exactly which
occurrence gets returned is undefined. Use get_all() to get all
the values matching a header field name.
"""
return self.get(name)
def has_key(self, name):
"""Return true if the message contains the header."""
return self.get(name) is not None
__contains__ = has_key
def get_all(self, name):
"""Return a list of all the values for the named field.
These will be sorted in the order they appeared in the original header
list or were added to this instance, and may contain duplicates. Any
fields deleted and re-inserted are always appended to the header list.
If no fields exist with the given name, returns an empty list.
"""
name = name.lower()
return [kv[1] for kv in self._headers if kv[0].lower()==name]
def get(self,name,default=None):
"""Get the first header value for 'name', or return 'default'"""
name = name.lower()
for k,v in self._headers:
if k.lower()==name:
return v
return default
def keys(self):
"""Return a list of all the header field names.
These will be sorted in the order they appeared in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return [k for k, v in self._headers]
def values(self):
"""Return a list of all header values.
These will be sorted in the order they appeared in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return [v for k, v in self._headers]
def items(self):
"""Get all the header fields and values.
These will be sorted in the order they were in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return self._headers[:]
def __repr__(self):
return "Headers(%r)" % self._headers
def __str__(self):
"""str() returns the formatted headers, complete with end line,
suitable for direct HTTP transmission."""
return '\r\n'.join(["%s: %s" % kv for kv in self._headers]+['',''])
def setdefault(self,name,value):
"""Return first matching header value for 'name', or 'value'
If there is no header named 'name', add a new header with name 'name'
and value 'value'."""
result = self.get(name)
if result is None:
self._headers.append((name,value))
return value
else:
return result
def add_header(self, _name, _value, **_params):
"""Extended header setting.
_name is the header field to add. keyword arguments can be used to set
additional parameters for the header field, with underscores converted
to dashes. Normally the parameter will be added as key="value" unless
value is None, in which case only the key will be added.
Example:
h.add_header('content-disposition', 'attachment', filename='bud.gif')
Note that unlike the corresponding 'email.message' method, this does
*not* handle '(charset, language, value)' tuples: all values must be
strings or None.
"""
parts = []
if _value is not None:
parts.append(_value)
for k, v in _params.items():
if v is None:
parts.append(k.replace('_', '-'))
else:
parts.append(_formatparam(k.replace('_', '-'), v))
self._headers.append((_name, "; ".join(parts)))
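# --- Illustrative sketch (not part of the original module): typical use of
# the Headers wrapper over a WSGI-style header list.  Names and values here
# are made up for demonstration.
def _demo_headers():
    h = Headers([('Content-Type', 'text/plain')])
    h['X-Sample'] = 'one'          # __setitem__ replaces any existing values
    h.add_header('Content-Disposition', 'attachment', filename='bud.gif')
    h.setdefault('Content-Length', '0')          # only added if missing
    return str(h)   # formatted header block ending in a blank line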
"""BaseHTTPServer that implements the Python WSGI protocol (PEP 333, rev 1.21)
This is both an example of how WSGI can be implemented, and a basis for running
simple web applications on a local machine, such as might be done when testing
or debugging an application. It has not been reviewed for security issues,
however, and we strongly recommend that you use a "real" web server for
production use.
For example usage, see the 'if __name__=="__main__"' block at the end of the
module. See also the BaseHTTPServer module docs for other API information.
"""
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import urllib, sys
from wsgiref.handlers import SimpleHandler
__version__ = "0.1"
__all__ = ['WSGIServer', 'WSGIRequestHandler', 'demo_app', 'make_server']
server_version = "WSGIServer/" + __version__
sys_version = "Python/" + sys.version.split()[0]
software_version = server_version + ' ' + sys_version
class ServerHandler(SimpleHandler):
server_software = software_version
def close(self):
try:
self.request_handler.log_request(
self.status.split(' ',1)[0], self.bytes_sent
)
finally:
SimpleHandler.close(self)
class WSGIServer(HTTPServer):
"""BaseHTTPServer that implements the Python WSGI protocol"""
application = None
def server_bind(self):
"""Override server_bind to store the server name."""
HTTPServer.server_bind(self)
self.setup_environ()
def setup_environ(self):
# Set up base environment
env = self.base_environ = {}
env['SERVER_NAME'] = self.server_name
env['GATEWAY_INTERFACE'] = 'CGI/1.1'
env['SERVER_PORT'] = str(self.server_port)
env['REMOTE_HOST']=''
env['CONTENT_LENGTH']=''
env['SCRIPT_NAME'] = ''
def get_app(self):
return self.application
def set_app(self,application):
self.application = application
class WSGIRequestHandler(BaseHTTPRequestHandler):
server_version = "WSGIServer/" + __version__
def get_environ(self):
env = self.server.base_environ.copy()
env['SERVER_PROTOCOL'] = self.request_version
env['REQUEST_METHOD'] = self.command
if '?' in self.path:
path,query = self.path.split('?',1)
else:
path,query = self.path,''
env['PATH_INFO'] = urllib.unquote(path)
env['QUERY_STRING'] = query
host = self.address_string()
if host != self.client_address[0]:
env['REMOTE_HOST'] = host
env['REMOTE_ADDR'] = self.client_address[0]
if self.headers.typeheader is None:
env['CONTENT_TYPE'] = self.headers.type
else:
env['CONTENT_TYPE'] = self.headers.typeheader
length = self.headers.getheader('content-length')
if length:
env['CONTENT_LENGTH'] = length
for h in self.headers.headers:
k,v = h.split(':',1)
k=k.replace('-','_').upper(); v=v.strip()
if k in env:
continue # skip content length, type, etc.
if 'HTTP_'+k in env:
env['HTTP_'+k] += ','+v # comma-separate multiple headers
else:
env['HTTP_'+k] = v
return env
def get_stderr(self):
return sys.stderr
def handle(self):
"""Handle a single HTTP request"""
self.raw_requestline = self.rfile.readline(65537)
if len(self.raw_requestline) > 65536:
self.requestline = ''
self.request_version = ''
self.command = ''
self.send_error(414)
return
if not self.parse_request(): # An error code has been sent, just exit
return
handler = ServerHandler(
self.rfile, self.wfile, self.get_stderr(), self.get_environ()
)
handler.request_handler = self # backpointer for logging
handler.run(self.server.get_app())
def demo_app(environ,start_response):
from StringIO import StringIO
stdout = StringIO()
print >>stdout, "Hello world!"
print >>stdout
h = environ.items(); h.sort()
for k,v in h:
print >>stdout, k,'=', repr(v)
start_response("200 OK", [('Content-Type','text/plain')])
return [stdout.getvalue()]
def make_server(
host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
"""Create a new WSGI server listening on `host` and `port` for `app`"""
server = server_class((host, port), handler_class)
server.set_app(app)
return server
if __name__ == '__main__':
httpd = make_server('', 8000, demo_app)
sa = httpd.socket.getsockname()
print "Serving HTTP on", sa[0], "port", sa[1], "..."
import webbrowser
webbrowser.open('http://localhost:8000/xyz?abc')
httpd.handle_request() # serve one request, then exit
httpd.server_close()
"""Miscellaneous WSGI-related Utilities"""
import posixpath
__all__ = [
'FileWrapper', 'guess_scheme', 'application_uri', 'request_uri',
'shift_path_info', 'setup_testing_defaults',
]
class FileWrapper:
"""Wrapper to convert file-like objects to iterables"""
def __init__(self, filelike, blksize=8192):
self.filelike = filelike
self.blksize = blksize
if hasattr(filelike,'close'):
self.close = filelike.close
def __getitem__(self,key):
data = self.filelike.read(self.blksize)
if data:
return data
raise IndexError
def __iter__(self):
return self
def next(self):
data = self.filelike.read(self.blksize)
if data:
return data
raise StopIteration
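# --- Illustrative sketch (not part of the original module): FileWrapper
# turns any object with .read() into a block iterator, which is how a WSGI
# app would stream a file-like body.
def _demo_file_wrapper():
    from StringIO import StringIO
    wrapper = FileWrapper(StringIO('x' * 20000), blksize=8192)
    return [len(chunk) for chunk in wrapper]     # [8192, 8192, 3616]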
def guess_scheme(environ):
"""Return a guess for whether 'wsgi.url_scheme' should be 'http' or 'https'
"""
if environ.get("HTTPS") in ('yes','on','1'):
return 'https'
else:
return 'http'
def application_uri(environ):
"""Return the application's base URI (no PATH_INFO or QUERY_STRING)"""
url = environ['wsgi.url_scheme']+'://'
from urllib import quote
if environ.get('HTTP_HOST'):
url += environ['HTTP_HOST']
else:
url += environ['SERVER_NAME']
if environ['wsgi.url_scheme'] == 'https':
if environ['SERVER_PORT'] != '443':
url += ':' + environ['SERVER_PORT']
else:
if environ['SERVER_PORT'] != '80':
url += ':' + environ['SERVER_PORT']
url += quote(environ.get('SCRIPT_NAME') or '/')
return url
def request_uri(environ, include_query=1):
"""Return the full request URI, optionally including the query string"""
url = application_uri(environ)
from urllib import quote
path_info = quote(environ.get('PATH_INFO',''),safe='/;=,')
if not environ.get('SCRIPT_NAME'):
url += path_info[1:]
else:
url += path_info
if include_query and environ.get('QUERY_STRING'):
url += '?' + environ['QUERY_STRING']
return url
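# --- Illustrative sketch (not part of the original module): reconstructing
# URLs from a hand-built environ.  The host and paths are made up.
def _demo_uris():
    env = {'wsgi.url_scheme': 'http', 'HTTP_HOST': 'example.com',
           'SERVER_NAME': 'example.com', 'SERVER_PORT': '80',
           'SCRIPT_NAME': '/app', 'PATH_INFO': '/page', 'QUERY_STRING': 'q=1'}
    # ('http://example.com/app', 'http://example.com/app/page?q=1')
    return application_uri(env), request_uri(env)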
def shift_path_info(environ):
"""Shift a name from PATH_INFO to SCRIPT_NAME, returning it
If there are no remaining path segments in PATH_INFO, return None.
Note: 'environ' is modified in-place; use a copy if you need to keep
the original PATH_INFO or SCRIPT_NAME.
Note: when PATH_INFO is just a '/', this returns '' and appends a trailing
'/' to SCRIPT_NAME, even though empty path segments are normally ignored,
and SCRIPT_NAME doesn't normally end in a '/'. This is intentional
behavior, to ensure that an application can tell the difference between
'/x' and '/x/' when traversing to objects.
"""
path_info = environ.get('PATH_INFO','')
if not path_info:
return None
path_parts = path_info.split('/')
path_parts[1:-1] = [p for p in path_parts[1:-1] if p and p != '.']
name = path_parts[1]
del path_parts[1]
script_name = environ.get('SCRIPT_NAME','')
script_name = posixpath.normpath(script_name+'/'+name)
if script_name.endswith('/'):
script_name = script_name[:-1]
if not name and not script_name.endswith('/'):
script_name += '/'
environ['SCRIPT_NAME'] = script_name
environ['PATH_INFO'] = '/'.join(path_parts)
# Special case: '/.' on PATH_INFO doesn't get stripped,
# because we don't strip the last element of PATH_INFO
# if there's only one path part left. Instead of fixing this
# above, we fix it here so that PATH_INFO gets normalized to
# an empty string in the environ.
if name=='.':
name = None
return name
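# --- Illustrative sketch (not part of the original module): repeated calls
# consume PATH_INFO one segment at a time, which is how routing middleware
# typically traverses a URL.
def _demo_shift_path_info():
    env = {'SCRIPT_NAME': '', 'PATH_INFO': '/a/b'}
    first = shift_path_info(env)    # 'a'; SCRIPT_NAME='/a', PATH_INFO='/b'
    second = shift_path_info(env)   # 'b'; SCRIPT_NAME='/a/b', PATH_INFO=''
    third = shift_path_info(env)    # None; nothing left to shift
    return first, second, third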
def setup_testing_defaults(environ):
"""Update 'environ' with trivial defaults for testing purposes
This adds various parameters required for WSGI, including HTTP_HOST,
SERVER_NAME, SERVER_PORT, REQUEST_METHOD, SCRIPT_NAME, PATH_INFO,
and all of the wsgi.* variables. It only supplies default values,
and does not replace any existing settings for these variables.
This routine is intended to make it easier for unit tests of WSGI
servers and applications to set up dummy environments. It should *not*
be used by actual WSGI servers or applications, since the data is fake!
"""
environ.setdefault('SERVER_NAME','127.0.0.1')
environ.setdefault('SERVER_PROTOCOL','HTTP/1.0')
environ.setdefault('HTTP_HOST',environ['SERVER_NAME'])
environ.setdefault('REQUEST_METHOD','GET')
if 'SCRIPT_NAME' not in environ and 'PATH_INFO' not in environ:
environ.setdefault('SCRIPT_NAME','')
environ.setdefault('PATH_INFO','/')
environ.setdefault('wsgi.version', (1,0))
environ.setdefault('wsgi.run_once', 0)
environ.setdefault('wsgi.multithread', 0)
environ.setdefault('wsgi.multiprocess', 0)
from StringIO import StringIO
environ.setdefault('wsgi.input', StringIO(""))
environ.setdefault('wsgi.errors', StringIO())
environ.setdefault('wsgi.url_scheme',guess_scheme(environ))
if environ['wsgi.url_scheme']=='http':
environ.setdefault('SERVER_PORT', '80')
elif environ['wsgi.url_scheme']=='https':
environ.setdefault('SERVER_PORT', '443')
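# --- Illustrative sketch (not part of the original module): a dummy environ
# needs only the keys under test; setup_testing_defaults fills in the rest.
def _demo_testing_defaults():
    env = {'PATH_INFO': '/x'}
    setup_testing_defaults(env)
    return request_uri(env)         # 'http://127.0.0.1/x'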
_hoppish = {
'connection':1, 'keep-alive':1, 'proxy-authenticate':1,
'proxy-authorization':1, 'te':1, 'trailers':1, 'transfer-encoding':1,
'upgrade':1
}.__contains__
def is_hop_by_hop(header_name):
"""Return true if 'header_name' is an HTTP/1.1 "Hop-by-Hop" header"""
return _hoppish(header_name.lower())
# (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
# Also licenced under the Apache License, 2.0: http://opensource.org/licenses/apache2.0.php
# Licensed to PSF under a Contributor Agreement
"""
Middleware to check for obedience to the WSGI specification.
Some of the things this checks:
* Signature of the application and start_response (including that
keyword arguments are not used).
* Environment checks:
- Environment is a dictionary (and not a subclass).
- That all the required keys are in the environment: REQUEST_METHOD,
SERVER_NAME, SERVER_PORT, wsgi.version, wsgi.input, wsgi.errors,
wsgi.multithread, wsgi.multiprocess, wsgi.run_once
- That HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH are not in the
environment (these headers should appear as CONTENT_LENGTH and
CONTENT_TYPE).
- Warns if QUERY_STRING is missing, as the cgi module acts
unpredictably in that case.
- That CGI-style variables (that don't contain a .) have
(non-unicode) string values
- That wsgi.version is a tuple
- That wsgi.url_scheme is 'http' or 'https' (@@: is this too
restrictive?)
- Warns if the REQUEST_METHOD is not known (@@: probably too
restrictive).
- That SCRIPT_NAME and PATH_INFO are empty or start with /
- That at least one of SCRIPT_NAME or PATH_INFO is set.
- That CONTENT_LENGTH is a positive integer.
- That SCRIPT_NAME is not '/' (it should be '', and PATH_INFO should
be '/').
- That wsgi.input has the methods read, readline, readlines, and
__iter__
- That wsgi.errors has the methods flush, write, writelines
* The status is a string, contains a space, starts with an integer,
and that integer is in range (> 100).
* That the headers is a list (not a subclass, not another kind of
sequence).
* That the items of the headers are tuples of strings.
* That there is no 'status' header (that is used in CGI, but not in
WSGI).
* That the header names don't contain newlines or colons, don't end in
_ or -, and that header values don't contain character codes below 037.
* That Content-Type is given if there is content (CGI often has a
default content type, but WSGI does not).
* That no Content-Type is given when there is no content (@@: is this
too restrictive?)
* That the exc_info argument to start_response is a tuple or None.
* That all calls to the writer are with strings, and no other methods
on the writer are accessed.
* That wsgi.input is used properly:
- .read() is called with zero or one argument
- That it returns a string
- That readline, readlines, and __iter__ return strings
- That .close() is not called
- No other methods are provided
* That wsgi.errors is used properly:
- .write() and .writelines() are called with a string
- That .close() is not called, and no other methods are provided.
* The response iterator:
- That it is not a string (it should be a list of a single string; a
string will work, but perform horribly).
- That .next() returns a string
- That the iterator is not iterated over until start_response has
been called (that can signal either a server or application
error).
- That .close() is called (doesn't raise exception, only prints to
sys.stderr, because we only know it isn't called when the object
is garbage collected).
"""
__all__ = ['validator']
import re
import sys
from types import DictType, StringType, TupleType, ListType
import warnings
header_re = re.compile(r'^[a-zA-Z][a-zA-Z0-9\-_]*$')
bad_header_value_re = re.compile(r'[\000-\037]')
class WSGIWarning(Warning):
"""
Raised in response to WSGI-spec-related warnings
"""
def assert_(cond, *args):
if not cond:
raise AssertionError(*args)
def validator(application):
"""
When applied between a WSGI server and a WSGI application, this
middleware will check for WSGI compliance on a number of levels.
This middleware does not modify the request or response in any
way, but will raise an AssertionError if anything seems off
(except for a failure to close the application iterator, which
will be printed to stderr -- there's no way to raise an exception
at that point).
"""
def lint_app(*args, **kw):
assert_(len(args) == 2, "Two arguments required")
assert_(not kw, "No keyword arguments allowed")
environ, start_response = args
check_environ(environ)
# We use this to check if the application returns without
# calling start_response:
start_response_started = []
def start_response_wrapper(*args, **kw):
assert_(len(args) == 2 or len(args) == 3, (
"Invalid number of arguments: %s" % (args,)))
assert_(not kw, "No keyword arguments allowed")
status = args[0]
headers = args[1]
if len(args) == 3:
exc_info = args[2]
else:
exc_info = None
check_status(status)
check_headers(headers)
check_content_type(status, headers)
check_exc_info(exc_info)
start_response_started.append(None)
return WriteWrapper(start_response(*args))
environ['wsgi.input'] = InputWrapper(environ['wsgi.input'])
environ['wsgi.errors'] = ErrorWrapper(environ['wsgi.errors'])
iterator = application(environ, start_response_wrapper)
assert_(iterator is not None and iterator != False,
"The application must return an iterator, if only an empty list")
check_iterator(iterator)
return IteratorWrapper(iterator, start_response_started)
return lint_app
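# --- Illustrative sketch (not part of the original module): wrapping a
# trivial app in validator() and driving it by hand.  The inline app and
# no-op start_response are made up for demonstration.
def _demo_validator():
    from wsgiref.util import setup_testing_defaults
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['ok']
    environ = {}
    setup_testing_defaults(environ)
    def start_response(status, headers):
        pass
    result = validator(app)(environ, start_response)
    body = ''.join(result)
    result.close()                  # the wrapper insists close() is called
    return body                     # 'ok'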
class InputWrapper:
def __init__(self, wsgi_input):
self.input = wsgi_input
def read(self, *args):
assert_(len(args) <= 1)
v = self.input.read(*args)
assert_(type(v) is type(""))
return v
def readline(self):
v = self.input.readline()
assert_(type(v) is type(""))
return v
def readlines(self, *args):
assert_(len(args) <= 1)
lines = self.input.readlines(*args)
assert_(type(lines) is type([]))
for line in lines:
assert_(type(line) is type(""))
return lines
def __iter__(self):
while 1:
line = self.readline()
if not line:
return
yield line
def close(self):
assert_(0, "input.close() must not be called")
class ErrorWrapper:
def __init__(self, wsgi_errors):
self.errors = wsgi_errors
def write(self, s):
assert_(type(s) is type(""))
self.errors.write(s)
def flush(self):
self.errors.flush()
def writelines(self, seq):
for line in seq:
self.write(line)
def close(self):
assert_(0, "errors.close() must not be called")
class WriteWrapper:
def __init__(self, wsgi_writer):
self.writer = wsgi_writer
def __call__(self, s):
assert_(type(s) is type(""))
self.writer(s)
class PartialIteratorWrapper:
def __init__(self, wsgi_iterator):
self.iterator = wsgi_iterator
def __iter__(self):
# We want to make sure __iter__ is called
return IteratorWrapper(self.iterator, None)
class IteratorWrapper:
def __init__(self, wsgi_iterator, check_start_response):
self.original_iterator = wsgi_iterator
self.iterator = iter(wsgi_iterator)
self.closed = False
self.check_start_response = check_start_response
def __iter__(self):
return self
def next(self):
assert_(not self.closed,
"Iterator read after closed")
v = self.iterator.next()
if self.check_start_response is not None:
assert_(self.check_start_response,
"The application returns and we started iterating over its body, but start_response has not yet been called")
self.check_start_response = None
return v
def close(self):
self.closed = True
if hasattr(self.original_iterator, 'close'):
self.original_iterator.close()
def __del__(self):
if not self.closed:
sys.stderr.write(
"Iterator garbage collected without being closed")
assert_(self.closed,
"Iterator garbage collected without being closed")
def check_environ(environ):
assert_(type(environ) is DictType,
"Environment is not of the right type: %r (environment: %r)"
% (type(environ), environ))
for key in ['REQUEST_METHOD', 'SERVER_NAME', 'SERVER_PORT',
'wsgi.version', 'wsgi.input', 'wsgi.errors',
'wsgi.multithread', 'wsgi.multiprocess',
'wsgi.run_once']:
assert_(key in environ,
"Environment missing required key: %r" % (key,))
for key in ['HTTP_CONTENT_TYPE', 'HTTP_CONTENT_LENGTH']:
assert_(key not in environ,
"Environment should not have the key: %s "
"(use %s instead)" % (key, key[5:]))
if 'QUERY_STRING' not in environ:
warnings.warn(
'QUERY_STRING is not in the WSGI environment; the cgi '
'module will use sys.argv when this variable is missing, '
'so application errors are more likely',
WSGIWarning)
for key in environ.keys():
if '.' in key:
# Extension, we don't care about its type
continue
assert_(type(environ[key]) is StringType,
"Environmental variable %s is not a string: %r (value: %r)"
% (key, type(environ[key]), environ[key]))
assert_(type(environ['wsgi.version']) is TupleType,
"wsgi.version should be a tuple (%r)" % (environ['wsgi.version'],))
assert_(environ['wsgi.url_scheme'] in ('http', 'https'),
"wsgi.url_scheme unknown: %r" % environ['wsgi.url_scheme'])
check_input(environ['wsgi.input'])
check_errors(environ['wsgi.errors'])
# @@: these need filling out:
if environ['REQUEST_METHOD'] not in (
'GET', 'HEAD', 'POST', 'OPTIONS', 'PATCH', 'PUT', 'DELETE', 'TRACE'):
warnings.warn(
"Unknown REQUEST_METHOD: %r" % environ['REQUEST_METHOD'],
WSGIWarning)
assert_(not environ.get('SCRIPT_NAME')
or environ['SCRIPT_NAME'].startswith('/'),
"SCRIPT_NAME doesn't start with /: %r" % environ['SCRIPT_NAME'])
assert_(not environ.get('PATH_INFO')
or environ['PATH_INFO'].startswith('/'),
"PATH_INFO doesn't start with /: %r" % environ['PATH_INFO'])
if environ.get('CONTENT_LENGTH'):
assert_(int(environ['CONTENT_LENGTH']) >= 0,
"Invalid CONTENT_LENGTH: %r" % environ['CONTENT_LENGTH'])
if not environ.get('SCRIPT_NAME'):
assert_('PATH_INFO' in environ,
"One of SCRIPT_NAME or PATH_INFO are required (PATH_INFO "
"should at least be '/' if SCRIPT_NAME is empty)")
assert_(environ.get('SCRIPT_NAME') != '/',
"SCRIPT_NAME cannot be '/'; it should instead be '', and "
"PATH_INFO should be '/'")
def check_input(wsgi_input):
for attr in ['read', 'readline', 'readlines', '__iter__']:
assert_(hasattr(wsgi_input, attr),
"wsgi.input (%r) doesn't have the attribute %s"
% (wsgi_input, attr))
def check_errors(wsgi_errors):
for attr in ['flush', 'write', 'writelines']:
assert_(hasattr(wsgi_errors, attr),
"wsgi.errors (%r) doesn't have the attribute %s"
% (wsgi_errors, attr))
def check_status(status):
assert_(type(status) is StringType,
"Status must be a string (not %r)" % status)
# Implicitly check that we can turn it into an integer:
status_code = status.split(None, 1)[0]
assert_(len(status_code) == 3,
"Status codes must be three characters: %r" % status_code)
status_int = int(status_code)
assert_(status_int >= 100, "Status code is invalid: %r" % status_int)
if len(status) < 4 or status[3] != ' ':
warnings.warn(
"The status string (%r) should be a three-digit integer "
"followed by a single space and a status explanation"
% status, WSGIWarning)
def check_headers(headers):
assert_(type(headers) is ListType,
"Headers (%r) must be of type list: %r"
% (headers, type(headers)))
header_names = {}
for item in headers:
assert_(type(item) is TupleType,
"Individual headers (%r) must be of type tuple: %r"
% (item, type(item)))
assert_(len(item) == 2)
name, value = item
assert_(name.lower() != 'status',
"The Status header cannot be used; it conflicts with CGI "
"script, and HTTP status is not given through headers "
"(value: %r)." % value)
header_names[name.lower()] = None
assert_('\n' not in name and ':' not in name,
"Header names may not contain ':' or '\\n': %r" % name)
assert_(header_re.search(name), "Bad header name: %r" % name)
assert_(not name.endswith('-') and not name.endswith('_'),
"Names may not end in '-' or '_': %r" % name)
if bad_header_value_re.search(value):
assert_(0, "Bad header value: %r (bad char: %r)"
% (value, bad_header_value_re.search(value).group(0)))
def check_content_type(status, headers):
code = int(status.split(None, 1)[0])
# @@: need one more person to verify this interpretation of RFC 2616
# http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
NO_MESSAGE_BODY = (204, 304)
for name, value in headers:
if name.lower() == 'content-type':
if code not in NO_MESSAGE_BODY:
return
assert_(0, ("Content-Type header found in a %s response, "
"which must not return content.") % code)
if code not in NO_MESSAGE_BODY:
assert_(0, "No Content-Type header found in headers (%s)" % headers)
def check_exc_info(exc_info):
assert_(exc_info is None or type(exc_info) is type(()),
"exc_info (%r) is not a tuple: %r" % (exc_info, type(exc_info)))
# More exc_info checks?
def check_iterator(iterator):
# Technically a string is legal, which is why it's a really bad
# idea, because it may cause the response to be returned
# character-by-character
assert_(not isinstance(iterator, str),
"You should not return a string as your application iterator, "
"instead return a single-item list containing that string.")
"""wsgiref -- a WSGI (PEP 333) Reference Library
Current Contents:
* util -- Miscellaneous useful functions and wrappers
* headers -- Manage response headers
* handlers -- base classes for server/gateway implementations
* simple_server -- a simple BaseHTTPServer that supports WSGI
* validate -- validation wrapper that sits between an app and a server
to detect errors in either
To-Do:
* cgi_gateway -- Run WSGI apps under CGI (pending a deployment standard)
* cgi_wrapper -- Run CGI apps under WSGI
* router -- a simple middleware component that handles URL traversal
"""
"""Implements (a subset of) Sun XDR -- eXternal Data Representation.
See: RFC 1014
"""
import struct
try:
from cStringIO import StringIO as _StringIO
except ImportError:
from StringIO import StringIO as _StringIO
from functools import wraps
__all__ = ["Error", "Packer", "Unpacker", "ConversionError"]
# exceptions
class Error(Exception):
"""Exception class for this module. Use:
except xdrlib.Error, var:
# var has the Error instance for the exception
Public ivars:
msg -- contains the message
"""
def __init__(self, msg):
self.msg = msg
def __repr__(self):
return repr(self.msg)
def __str__(self):
return str(self.msg)
class ConversionError(Error):
pass
def raise_conversion_error(function):
""" Wrap any raised struct.errors in a ConversionError. """
@wraps(function)
def result(self, value):
try:
return function(self, value)
except struct.error as e:
raise ConversionError(e.args[0])
return result
class Packer:
"""Pack various data representations into a buffer."""
def __init__(self):
self.reset()
def reset(self):
self.__buf = _StringIO()
def get_buffer(self):
return self.__buf.getvalue()
# backwards compatibility
get_buf = get_buffer
@raise_conversion_error
def pack_uint(self, x):
self.__buf.write(struct.pack('>L', x))
@raise_conversion_error
def pack_int(self, x):
self.__buf.write(struct.pack('>l', x))
pack_enum = pack_int
def pack_bool(self, x):
if x: self.__buf.write('\0\0\0\1')
else: self.__buf.write('\0\0\0\0')
def pack_uhyper(self, x):
try:
self.pack_uint(x>>32 & 0xffffffffL)
except (TypeError, struct.error) as e:
raise ConversionError(e.args[0])
try:
self.pack_uint(x & 0xffffffffL)
except (TypeError, struct.error) as e:
raise ConversionError(e.args[0])
pack_hyper = pack_uhyper
@raise_conversion_error
def pack_float(self, x):
self.__buf.write(struct.pack('>f', x))
@raise_conversion_error
def pack_double(self, x):
self.__buf.write(struct.pack('>d', x))
def pack_fstring(self, n, s):
if n < 0:
raise ValueError, 'fstring size must be nonnegative'
data = s[:n]
n = ((n+3)//4)*4
data = data + (n - len(data)) * '\0'
self.__buf.write(data)
pack_fopaque = pack_fstring
def pack_string(self, s):
n = len(s)
self.pack_uint(n)
self.pack_fstring(n, s)
pack_opaque = pack_string
pack_bytes = pack_string
def pack_list(self, list, pack_item):
for item in list:
self.pack_uint(1)
pack_item(item)
self.pack_uint(0)
def pack_farray(self, n, list, pack_item):
if len(list) != n:
raise ValueError, 'wrong array size'
for item in list:
pack_item(item)
def pack_array(self, list, pack_item):
n = len(list)
self.pack_uint(n)
self.pack_farray(n, list, pack_item)
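# --- Illustrative sketch (not part of the original module): packing a few
# values; every item lands on a 4-byte big-endian boundary per XDR rules.
def _demo_packer():
    p = Packer()
    p.pack_int(-1)          # '\xff\xff\xff\xff'
    p.pack_uint(7)          # '\x00\x00\x00\x07'
    p.pack_string('hi')     # length word, then 'hi' padded to 'hi\x00\x00'
    return p.get_buffer()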
class Unpacker:
"""Unpacks various data representations from the given buffer."""
def __init__(self, data):
self.reset(data)
def reset(self, data):
self.__buf = data
self.__pos = 0
def get_position(self):
return self.__pos
def set_position(self, position):
self.__pos = position
def get_buffer(self):
return self.__buf
def done(self):
if self.__pos < len(self.__buf):
raise Error('unextracted data remains')
def unpack_uint(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
x = struct.unpack('>L', data)[0]
try:
return int(x)
except OverflowError:
return x
def unpack_int(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
return struct.unpack('>l', data)[0]
unpack_enum = unpack_int
def unpack_bool(self):
return bool(self.unpack_int())
def unpack_uhyper(self):
hi = self.unpack_uint()
lo = self.unpack_uint()
return long(hi)<<32 | lo
def unpack_hyper(self):
x = self.unpack_uhyper()
if x >= 0x8000000000000000L:
x = x - 0x10000000000000000L
return x
def unpack_float(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
return struct.unpack('>f', data)[0]
def unpack_double(self):
i = self.__pos
self.__pos = j = i+8
data = self.__buf[i:j]
if len(data) < 8:
raise EOFError
return struct.unpack('>d', data)[0]
def unpack_fstring(self, n):
if n < 0:
raise ValueError, 'fstring size must be nonnegative'
i = self.__pos
j = i + (n+3)//4*4
if j > len(self.__buf):
raise EOFError
self.__pos = j
return self.__buf[i:i+n]
unpack_fopaque = unpack_fstring
def unpack_string(self):
n = self.unpack_uint()
return self.unpack_fstring(n)
unpack_opaque = unpack_string
unpack_bytes = unpack_string
def unpack_list(self, unpack_item):
list = []
while 1:
x = self.unpack_uint()
if x == 0: break
if x != 1:
raise ConversionError, '0 or 1 expected, got %r' % (x,)
item = unpack_item()
list.append(item)
return list
def unpack_farray(self, n, unpack_item):
list = []
for i in range(n):
list.append(unpack_item())
return list
def unpack_array(self, unpack_item):
n = self.unpack_uint()
return self.unpack_farray(n, unpack_item)
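# --- Illustrative sketch (not part of the original module): a Packer ->
# Unpacker round trip; done() verifies the buffer was fully consumed.
def _demo_xdr_roundtrip():
    p = Packer()
    p.pack_array([1, 2, 3], p.pack_int)
    u = Unpacker(p.get_buffer())
    values = u.unpack_array(u.unpack_int)
    u.done()                # raises Error if unextracted data remains
    return values           # [1, 2, 3]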
"""Registration facilities for DOM. This module should not be used
directly. Instead, the functions getDOMImplementation and
registerDOMImplementation should be imported from xml.dom."""
from xml.dom.minicompat import * # isinstance, StringTypes
# This is a list of well-known implementations. Well-known names
# should be published by posting to [email protected], and are
# subsequently recorded in this file.
import sys
well_known_implementations = {
'minidom':'xml.dom.minidom',
'4DOM': 'xml.dom.DOMImplementation',
}
# DOM implementations not officially registered should register
# themselves via registerDOMImplementation() below.
registered = {}
def registerDOMImplementation(name, factory):
"""registerDOMImplementation(name, factory)
Register the factory function with the name. The factory function
should return an object which implements the DOMImplementation
interface. The factory function can either return the same object,
or a new one (e.g. if that implementation supports some
customization)."""
registered[name] = factory
def _good_enough(dom, features):
"_good_enough(dom, features) -> Return 1 if the dom offers the features"
for f,v in features:
if not dom.hasFeature(f,v):
return 0
return 1
def getDOMImplementation(name = None, features = ()):
"""getDOMImplementation(name = None, features = ()) -> DOM implementation.
Return a suitable DOM implementation. The name is either
well-known, the module name of a DOM implementation, or None. If
it is not None, imports the corresponding module and returns
DOMImplementation object if the import succeeds.
If name is not given, consider the available implementations to
find one with the required feature set. If no implementation can
be found, raise an ImportError. The features list must be a sequence
of (feature, version) pairs which are passed to hasFeature."""
import os
creator = None
mod = well_known_implementations.get(name)
if mod:
mod = __import__(mod, {}, {}, ['getDOMImplementation'])
return mod.getDOMImplementation()
elif name:
return registered[name]()
elif not sys.flags.ignore_environment and "PYTHON_DOM" in os.environ:
return getDOMImplementation(name = os.environ["PYTHON_DOM"])
# User did not specify a name, try implementations in arbitrary
# order, returning the one that has the required features
if isinstance(features, StringTypes):
features = _parse_feature_string(features)
for creator in registered.values():
dom = creator()
if _good_enough(dom, features):
return dom
for creator in well_known_implementations.keys():
try:
dom = getDOMImplementation(name = creator)
except StandardError: # typically ImportError, or AttributeError
continue
if _good_enough(dom, features):
return dom
raise ImportError,"no suitable DOM implementation found"
def _parse_feature_string(s):
features = []
parts = s.split()
i = 0
length = len(parts)
while i < length:
feature = parts[i]
if feature[0] in "0123456789":
raise ValueError, "bad feature name: %r" % (feature,)
i = i + 1
version = None
if i < length:
v = parts[i]
if v[0] in "0123456789":
i = i + 1
version = v
features.append((feature, version))
return tuple(features)
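# --- Illustrative sketch (not part of the original module): resolving the
# well-known 'minidom' implementation and creating a document with it.
def _demo_getDOMImplementation():
    impl = getDOMImplementation('minidom')
    doc = impl.createDocument(None, 'root', None)
    return doc.documentElement.tagName      # 'root'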
"""Facility to use the Expat parser to load a minidom instance
from a string or file.
This avoids all the overhead of SAX and pulldom to gain performance.
"""
# Warning!
#
# This module is tightly bound to the implementation details of the
# minidom DOM and can't be used with other DOM implementations. This
# is due, in part, to a lack of appropriate methods in the DOM (there is
# no way to create Entity and Notation nodes via the DOM Level 2
# interface), and for performance. The latter is the cause of some fairly
# cryptic code.
#
# Performance hacks:
#
# - .character_data_handler() has an extra case in which continuing
# data is appended to an existing Text node; this can be a
# speedup since pyexpat can break up character data into multiple
# callbacks even though we set the buffer_text attribute on the
# parser. This also gives us the advantage that we don't need a
# separate normalization pass.
#
# - Determining that a node exists is done using an identity comparison
# with None rather than a truth test; this avoids searching for and
# calling any methods on the node object if it exists. (A rather
# nice speedup is achieved this way as well!)
from xml.dom import xmlbuilder, minidom, Node
from xml.dom import EMPTY_NAMESPACE, EMPTY_PREFIX, XMLNS_NAMESPACE
from xml.parsers import expat
from xml.dom.minidom import _append_child, _set_attribute_node
from xml.dom.NodeFilter import NodeFilter
from xml.dom.minicompat import *
TEXT_NODE = Node.TEXT_NODE
CDATA_SECTION_NODE = Node.CDATA_SECTION_NODE
DOCUMENT_NODE = Node.DOCUMENT_NODE
FILTER_ACCEPT = xmlbuilder.DOMBuilderFilter.FILTER_ACCEPT
FILTER_REJECT = xmlbuilder.DOMBuilderFilter.FILTER_REJECT
FILTER_SKIP = xmlbuilder.DOMBuilderFilter.FILTER_SKIP
FILTER_INTERRUPT = xmlbuilder.DOMBuilderFilter.FILTER_INTERRUPT
theDOMImplementation = minidom.getDOMImplementation()
# Expat typename -> TypeInfo
_typeinfo_map = {
"CDATA": minidom.TypeInfo(None, "cdata"),
"ENUM": minidom.TypeInfo(None, "enumeration"),
"ENTITY": minidom.TypeInfo(None, "entity"),
"ENTITIES": minidom.TypeInfo(None, "entities"),
"ID": minidom.TypeInfo(None, "id"),
"IDREF": minidom.TypeInfo(None, "idref"),
"IDREFS": minidom.TypeInfo(None, "idrefs"),
"NMTOKEN": minidom.TypeInfo(None, "nmtoken"),
"NMTOKENS": minidom.TypeInfo(None, "nmtokens"),
}
class ElementInfo(object):
__slots__ = '_attr_info', '_model', 'tagName'
def __init__(self, tagName, model=None):
self.tagName = tagName
self._attr_info = []
self._model = model
def __getstate__(self):
return self._attr_info, self._model, self.tagName
def __setstate__(self, state):
self._attr_info, self._model, self.tagName = state
def getAttributeType(self, aname):
for info in self._attr_info:
if info[1] == aname:
t = info[-2]
if t[0] == "(":
return _typeinfo_map["ENUM"]
else:
return _typeinfo_map[info[-2]]
return minidom._no_type
def getAttributeTypeNS(self, namespaceURI, localName):
return minidom._no_type
def isElementContent(self):
if self._model:
type = self._model[0]
return type not in (expat.model.XML_CTYPE_ANY,
expat.model.XML_CTYPE_MIXED)
else:
return False
def isEmpty(self):
if self._model:
return self._model[0] == expat.model.XML_CTYPE_EMPTY
else:
return False
def isId(self, aname):
for info in self._attr_info:
if info[1] == aname:
return info[-2] == "ID"
return False
def isIdNS(self, euri, ename, auri, aname):
# not sure this is meaningful
return self.isId((auri, aname))
def _intern(builder, s):
return builder._intern_setdefault(s, s)
def _parse_ns_name(builder, name):
assert ' ' in name
parts = name.split(' ')
intern = builder._intern_setdefault
if len(parts) == 3:
uri, localname, prefix = parts
prefix = intern(prefix, prefix)
qname = "%s:%s" % (prefix, localname)
qname = intern(qname, qname)
localname = intern(localname, localname)
else:
uri, localname = parts
prefix = EMPTY_PREFIX
qname = localname = intern(localname, localname)
return intern(uri, uri), localname, prefix, qname
class ParseEscape(Exception):
    """Exception raised to short-circuit parsing in InternalSubsetExtractor."""
    pass
class ExpatBuilder:
"""Document builder that uses Expat to build a ParsedXML.DOM document
instance."""
def __init__(self, options=None):
if options is None:
options = xmlbuilder.Options()
self._options = options
if self._options.filter is not None:
self._filter = FilterVisibilityController(self._options.filter)
else:
self._filter = None
# This *really* doesn't do anything in this case, so
# override it with something fast & minimal.
self._finish_start_element = id
self._parser = None
self.reset()
def createParser(self):
"""Create a new parser object."""
return expat.ParserCreate()
def getParser(self):
"""Return the parser object, creating a new one if needed."""
if not self._parser:
self._parser = self.createParser()
self._intern_setdefault = self._parser.intern.setdefault
self._parser.buffer_text = True
self._parser.ordered_attributes = True
self._parser.specified_attributes = True
self.install(self._parser)
return self._parser
def reset(self):
"""Free all data structures used during DOM construction."""
self.document = theDOMImplementation.createDocument(
EMPTY_NAMESPACE, None, None)
self.curNode = self.document
self._elem_info = self.document._elem_info
self._cdata = False
def install(self, parser):
"""Install the callbacks needed to build the DOM into the parser."""
# This creates circular references!
parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
parser.StartElementHandler = self.first_element_handler
parser.EndElementHandler = self.end_element_handler
parser.ProcessingInstructionHandler = self.pi_handler
if self._options.entities:
parser.EntityDeclHandler = self.entity_decl_handler
parser.NotationDeclHandler = self.notation_decl_handler
if self._options.comments:
parser.CommentHandler = self.comment_handler
if self._options.cdata_sections:
parser.StartCdataSectionHandler = self.start_cdata_section_handler
parser.EndCdataSectionHandler = self.end_cdata_section_handler
parser.CharacterDataHandler = self.character_data_handler_cdata
else:
parser.CharacterDataHandler = self.character_data_handler
parser.ExternalEntityRefHandler = self.external_entity_ref_handler
parser.XmlDeclHandler = self.xml_decl_handler
parser.ElementDeclHandler = self.element_decl_handler
parser.AttlistDeclHandler = self.attlist_decl_handler
def parseFile(self, file):
"""Parse a document from a file object, returning the document
node."""
parser = self.getParser()
first_buffer = True
try:
while 1:
buffer = file.read(16*1024)
if not buffer:
break
parser.Parse(buffer, 0)
if first_buffer and self.document.documentElement:
self._setup_subset(buffer)
first_buffer = False
parser.Parse("", True)
except ParseEscape:
pass
doc = self.document
self.reset()
self._parser = None
return doc
def parseString(self, string):
"""Parse a document from a string, returning the document node."""
parser = self.getParser()
try:
parser.Parse(string, True)
self._setup_subset(string)
except ParseEscape:
pass
doc = self.document
self.reset()
self._parser = None
return doc
def _setup_subset(self, buffer):
"""Load the internal subset if there might be one."""
if self.document.doctype:
extractor = InternalSubsetExtractor()
extractor.parseString(buffer)
subset = extractor.getSubset()
self.document.doctype.internalSubset = subset
def start_doctype_decl_handler(self, doctypeName, systemId, publicId,
has_internal_subset):
doctype = self.document.implementation.createDocumentType(
doctypeName, publicId, systemId)
doctype.ownerDocument = self.document
_append_child(self.document, doctype)
self.document.doctype = doctype
if self._filter and self._filter.acceptNode(doctype) == FILTER_REJECT:
self.document.doctype = None
del self.document.childNodes[-1]
doctype = None
self._parser.EntityDeclHandler = None
self._parser.NotationDeclHandler = None
if has_internal_subset:
if doctype is not None:
doctype.entities._seq = []
doctype.notations._seq = []
self._parser.CommentHandler = None
self._parser.ProcessingInstructionHandler = None
self._parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler
def end_doctype_decl_handler(self):
if self._options.comments:
self._parser.CommentHandler = self.comment_handler
self._parser.ProcessingInstructionHandler = self.pi_handler
if not (self._elem_info or self._filter):
self._finish_end_element = id
def pi_handler(self, target, data):
node = self.document.createProcessingInstruction(target, data)
_append_child(self.curNode, node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
self.curNode.removeChild(node)
def character_data_handler_cdata(self, data):
childNodes = self.curNode.childNodes
if self._cdata:
if ( self._cdata_continue
and childNodes[-1].nodeType == CDATA_SECTION_NODE):
childNodes[-1].appendData(data)
return
node = self.document.createCDATASection(data)
self._cdata_continue = True
elif childNodes and childNodes[-1].nodeType == TEXT_NODE:
node = childNodes[-1]
value = node.data + data
d = node.__dict__
d['data'] = d['nodeValue'] = value
return
else:
node = minidom.Text()
d = node.__dict__
d['data'] = d['nodeValue'] = data
d['ownerDocument'] = self.document
_append_child(self.curNode, node)
def character_data_handler(self, data):
childNodes = self.curNode.childNodes
if childNodes and childNodes[-1].nodeType == TEXT_NODE:
node = childNodes[-1]
d = node.__dict__
d['data'] = d['nodeValue'] = node.data + data
return
node = minidom.Text()
d = node.__dict__
d['data'] = d['nodeValue'] = node.data + data
d['ownerDocument'] = self.document
_append_child(self.curNode, node)
def entity_decl_handler(self, entityName, is_parameter_entity, value,
base, systemId, publicId, notationName):
if is_parameter_entity:
# we don't care about parameter entities for the DOM
return
if not self._options.entities:
return
node = self.document._create_entity(entityName, publicId,
systemId, notationName)
if value is not None:
# internal entity
# node *should* be readonly, but we'll cheat
child = self.document.createTextNode(value)
node.childNodes.append(child)
self.document.doctype.entities._seq.append(node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
del self.document.doctype.entities._seq[-1]
def notation_decl_handler(self, notationName, base, systemId, publicId):
node = self.document._create_notation(notationName, publicId, systemId)
self.document.doctype.notations._seq.append(node)
# mirror entity_decl_handler above: drop the node when the filter rejects it
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
del self.document.doctype.notations._seq[-1]
def comment_handler(self, data):
node = self.document.createComment(data)
_append_child(self.curNode, node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
self.curNode.removeChild(node)
def start_cdata_section_handler(self):
self._cdata = True
self._cdata_continue = False
def end_cdata_section_handler(self):
self._cdata = False
self._cdata_continue = False
def external_entity_ref_handler(self, context, base, systemId, publicId):
return 1
def first_element_handler(self, name, attributes):
if self._filter is None and not self._elem_info:
self._finish_end_element = id
self.getParser().StartElementHandler = self.start_element_handler
self.start_element_handler(name, attributes)
def start_element_handler(self, name, attributes):
node = self.document.createElement(name)
_append_child(self.curNode, node)
self.curNode = node
if attributes:
for i in range(0, len(attributes), 2):
a = minidom.Attr(attributes[i], EMPTY_NAMESPACE,
None, EMPTY_PREFIX)
value = attributes[i+1]
d = a.childNodes[0].__dict__
d['data'] = d['nodeValue'] = value
d = a.__dict__
d['value'] = d['nodeValue'] = value
d['ownerDocument'] = self.document
_set_attribute_node(node, a)
if node is not self.document.documentElement:
self._finish_start_element(node)
def _finish_start_element(self, node):
if self._filter:
# To be general, we'd have to call isSameNode(), but this
# is sufficient for minidom:
if node is self.document.documentElement:
return
filt = self._filter.startContainer(node)
if filt == FILTER_REJECT:
# ignore this node & all descendants
Rejecter(self)
elif filt == FILTER_SKIP:
# ignore this node, but make its children become
# children of the parent node
Skipper(self)
else:
return
self.curNode = node.parentNode
node.parentNode.removeChild(node)
node.unlink()
# If this ever changes, Namespaces.end_element_handler() needs to
# be changed to match.
#
def end_element_handler(self, name):
curNode = self.curNode
self.curNode = curNode.parentNode
self._finish_end_element(curNode)
def _finish_end_element(self, curNode):
info = self._elem_info.get(curNode.tagName)
if info:
self._handle_white_text_nodes(curNode, info)
if self._filter:
if curNode is self.document.documentElement:
return
if self._filter.acceptNode(curNode) == FILTER_REJECT:
self.curNode.removeChild(curNode)
curNode.unlink()
def _handle_white_text_nodes(self, node, info):
if (self._options.whitespace_in_element_content
or not info.isElementContent()):
return
# We have element type information and should remove ignorable
# whitespace; identify for text nodes which contain only
# whitespace.
L = []
for child in node.childNodes:
if child.nodeType == TEXT_NODE and not child.data.strip():
L.append(child)
# Remove ignorable whitespace from the tree.
for child in L:
node.removeChild(child)
def element_decl_handler(self, name, model):
info = self._elem_info.get(name)
if info is None:
self._elem_info[name] = ElementInfo(name, model)
else:
assert info._model is None
info._model = model
def attlist_decl_handler(self, elem, name, type, default, required):
info = self._elem_info.get(elem)
if info is None:
info = ElementInfo(elem)
self._elem_info[elem] = info
info._attr_info.append(
[None, name, None, None, default, 0, type, required])
def xml_decl_handler(self, version, encoding, standalone):
self.document.version = version
self.document.encoding = encoding
# This is still a little ugly, thanks to the pyexpat API. ;-(
if standalone >= 0:
if standalone:
self.document.standalone = True
else:
self.document.standalone = False
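# --- Illustrative sketch (not part of the original module): parsing a small
# document with the builder directly; the XML snippet is made up.
def _demo_expat_builder():
    doc = ExpatBuilder().parseString('<root a="1">text</root>')
    elem = doc.documentElement
    return elem.tagName, elem.getAttribute('a'), elem.firstChild.data
    # ('root', '1', 'text')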
# Don't include FILTER_INTERRUPT, since that's checked separately
# where allowed.
_ALLOWED_FILTER_RETURNS = (FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP)
class FilterVisibilityController(object):
"""Wrapper around a DOMBuilderFilter which implements the checks
to make the whatToShow filter attribute work."""
__slots__ = 'filter',
def __init__(self, filter):
self.filter = filter
def startContainer(self, node):
mask = self._nodetype_mask[node.nodeType]
if self.filter.whatToShow & mask:
val = self.filter.startContainer(node)
if val == FILTER_INTERRUPT:
raise ParseEscape
if val not in _ALLOWED_FILTER_RETURNS:
raise ValueError, \
"startContainer() returned illegal value: " + repr(val)
return val
else:
return FILTER_ACCEPT
def acceptNode(self, node):
mask = self._nodetype_mask[node.nodeType]
if self.filter.whatToShow & mask:
val = self.filter.acceptNode(node)
if val == FILTER_INTERRUPT:
raise ParseEscape
if val == FILTER_SKIP:
# move all child nodes to the parent, and remove this node
parent = node.parentNode
for child in node.childNodes[:]:
parent.appendChild(child)
# node is handled by the caller
return FILTER_REJECT
if val not in _ALLOWED_FILTER_RETURNS:
raise ValueError, \
"acceptNode() returned illegal value: " + repr(val)
return val
else:
return FILTER_ACCEPT
_nodetype_mask = {
Node.ELEMENT_NODE: NodeFilter.SHOW_ELEMENT,
Node.ATTRIBUTE_NODE: NodeFilter.SHOW_ATTRIBUTE,
Node.TEXT_NODE: NodeFilter.SHOW_TEXT,
Node.CDATA_SECTION_NODE: NodeFilter.SHOW_CDATA_SECTION,
Node.ENTITY_REFERENCE_NODE: NodeFilter.SHOW_ENTITY_REFERENCE,
Node.ENTITY_NODE: NodeFilter.SHOW_ENTITY,
Node.PROCESSING_INSTRUCTION_NODE: NodeFilter.SHOW_PROCESSING_INSTRUCTION,
Node.COMMENT_NODE: NodeFilter.SHOW_COMMENT,
Node.DOCUMENT_NODE: NodeFilter.SHOW_DOCUMENT,
Node.DOCUMENT_TYPE_NODE: NodeFilter.SHOW_DOCUMENT_TYPE,
Node.DOCUMENT_FRAGMENT_NODE: NodeFilter.SHOW_DOCUMENT_FRAGMENT,
Node.NOTATION_NODE: NodeFilter.SHOW_NOTATION,
}
class FilterCrutch(object):
__slots__ = '_builder', '_level', '_old_start', '_old_end'
def __init__(self, builder):
self._level = 0
self._builder = builder
parser = builder._parser
self._old_start = parser.StartElementHandler
self._old_end = parser.EndElementHandler
parser.StartElementHandler = self.start_element_handler
parser.EndElementHandler = self.end_element_handler
class Rejecter(FilterCrutch):
__slots__ = ()
def __init__(self, builder):
FilterCrutch.__init__(self, builder)
parser = builder._parser
for name in ("ProcessingInstructionHandler",
"CommentHandler",
"CharacterDataHandler",
"StartCdataSectionHandler",
"EndCdataSectionHandler",
"ExternalEntityRefHandler",
):
setattr(parser, name, None)
def start_element_handler(self, *args):
self._level = self._level + 1
def end_element_handler(self, *args):
if self._level == 0:
# restore the old handlers
parser = self._builder._parser
self._builder.install(parser)
parser.StartElementHandler = self._old_start
parser.EndElementHandler = self._old_end
else:
self._level = self._level - 1
class Skipper(FilterCrutch):
__slots__ = ()
def start_element_handler(self, *args):
node = self._builder.curNode
self._old_start(*args)
if self._builder.curNode is not node:
self._level = self._level + 1
def end_element_handler(self, *args):
if self._level == 0:
# We're popping back out of the node we're skipping, so we
# shouldn't need to do anything but reset the handlers.
self._builder._parser.StartElementHandler = self._old_start
self._builder._parser.EndElementHandler = self._old_end
self._builder = None
else:
self._level = self._level - 1
self._old_end(*args)
# Framework document used by the fragment builder.
# Filled in with strings for the doctype, the internal subset, and the
# namespace attributes.
_FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID = \
"http://xml.python.org/entities/fragment-builder/internal"
_FRAGMENT_BUILDER_TEMPLATE = (
'''\
<!DOCTYPE wrapper
%%s [
<!ENTITY fragment-builder-internal
SYSTEM "%s">
%%s
]>
<wrapper %%s
>&fragment-builder-internal;</wrapper>'''
% _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID)
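# For illustration (hypothetical values): with ident = 'SYSTEM "spam.dtd"',
# an empty extracted subset, and no namespace attributes, the template above
# expands to a wrapper document of roughly this shape:
#
#   <!DOCTYPE wrapper
#     SYSTEM "spam.dtd" [
#     <!ENTITY fragment-builder-internal
#       SYSTEM "http://xml.python.org/entities/fragment-builder/internal">
#   ]>
#   <wrapper
#   >&fragment-builder-internal;</wrapper>
#
# Expanding that entity reference is what drives
# external_entity_ref_handler() in FragmentBuilder below, which parses the
# caller's fragment source into a DocumentFragment.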
class FragmentBuilder(ExpatBuilder):
"""Builder which constructs document fragments given XML source
text and a context node.
The context node is expected to provide information about the
namespace declarations which are in scope at the start of the
fragment.
"""
def __init__(self, context, options=None):
if context.nodeType == DOCUMENT_NODE:
self.originalDocument = context
self.context = context
else:
self.originalDocument = context.ownerDocument
self.context = context
ExpatBuilder.__init__(self, options)
def reset(self):
ExpatBuilder.reset(self)
self.fragment = None
def parseFile(self, file):
"""Parse a document fragment from a file object, returning the
fragment node."""
return self.parseString(file.read())
def parseString(self, string):
"""Parse a document fragment from a string, returning the
fragment node."""
self._source = string
parser = self.getParser()
doctype = self.originalDocument.doctype
ident = ""
if doctype:
subset = doctype.internalSubset or self._getDeclarations()
if doctype.publicId:
ident = ('PUBLIC "%s" "%s"'
% (doctype.publicId, doctype.systemId))
elif doctype.systemId:
ident = 'SYSTEM "%s"' % doctype.systemId
else:
subset = ""
nsattrs = self._getNSattrs() # get ns decls from node's ancestors
document = _FRAGMENT_BUILDER_TEMPLATE % (ident, subset, nsattrs)
try:
parser.Parse(document, 1)
except:
self.reset()
raise
fragment = self.fragment
self.reset()
## self._parser = None
return fragment
def _getDeclarations(self):
"""Re-create the internal subset from the DocumentType node.
This is only needed if we don't already have the
internalSubset as a string.
"""
doctype = self.context.ownerDocument.doctype
s = ""
if doctype:
for i in range(doctype.notations.length):
notation = doctype.notations.item(i)
if s:
s = s + "\n "
s = "%s<!NOTATION %s" % (s, notation.nodeName)
if notation.publicId:
s = '%s PUBLIC "%s"\n "%s">' \
% (s, notation.publicId, notation.systemId)
else:
s = '%s SYSTEM "%s">' % (s, notation.systemId)
for i in range(doctype.entities.length):
entity = doctype.entities.item(i)
if s:
s = s + "\n "
s = "%s<!ENTITY %s" % (s, entity.nodeName)
if entity.publicId:
s = '%s PUBLIC "%s"\n "%s"' \
% (s, entity.publicId, entity.systemId)
elif entity.systemId:
s = '%s SYSTEM "%s"' % (s, entity.systemId)
else:
s = '%s "%s"' % (s, entity.firstChild.data)
if entity.notationName:
s = "%s NOTATION %s" % (s, entity.notationName)
s = s + ">"
return s
def _getNSattrs(self):
return ""
def external_entity_ref_handler(self, context, base, systemId, publicId):
if systemId == _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID:
# this entref is the one that we made to put the subtree
# in; all of our given input is parsed in here.
old_document = self.document
old_cur_node = self.curNode
parser = self._parser.ExternalEntityParserCreate(context)
# put the real document back, parse into the fragment to return
self.document = self.originalDocument
self.fragment = self.document.createDocumentFragment()
self.curNode = self.fragment
try:
parser.Parse(self._source, 1)
finally:
self.curNode = old_cur_node
self.document = old_document
self._source = None
return -1
else:
return ExpatBuilder.external_entity_ref_handler(
self, context, base, systemId, publicId)
class Namespaces:
"""Mix-in class for builders; adds support for namespaces."""
def _initNamespaces(self):
# list of (prefix, uri) ns declarations. Namespace attrs are
# constructed from this and added to the element's attrs.
self._ns_ordered_prefixes = []
def createParser(self):
"""Create a new namespace-handling parser."""
parser = expat.ParserCreate(namespace_separator=" ")
parser.namespace_prefixes = True
return parser
def install(self, parser):
"""Insert the namespace-handlers onto the parser."""
ExpatBuilder.install(self, parser)
if self._options.namespace_declarations:
parser.StartNamespaceDeclHandler = (
self.start_namespace_decl_handler)
def start_namespace_decl_handler(self, prefix, uri):
"""Push this namespace declaration on our storage."""
self._ns_ordered_prefixes.append((prefix, uri))
def start_element_handler(self, name, attributes):
if ' ' in name:
uri, localname, prefix, qname = _parse_ns_name(self, name)
else:
uri = EMPTY_NAMESPACE
qname = name
localname = None
prefix = EMPTY_PREFIX
node = minidom.Element(qname, uri, prefix, localname)
node.ownerDocument = self.document
_append_child(self.curNode, node)
self.curNode = node
if self._ns_ordered_prefixes:
for prefix, uri in self._ns_ordered_prefixes:
if prefix:
a = minidom.Attr(_intern(self, 'xmlns:' + prefix),
XMLNS_NAMESPACE, prefix, "xmlns")
else:
a = minidom.Attr("xmlns", XMLNS_NAMESPACE,
"xmlns", EMPTY_PREFIX)
d = a.childNodes[0].__dict__
d['data'] = d['nodeValue'] = uri
d = a.__dict__
d['value'] = d['nodeValue'] = uri
d['ownerDocument'] = self.document
_set_attribute_node(node, a)
del self._ns_ordered_prefixes[:]
if attributes:
_attrs = node._attrs
_attrsNS = node._attrsNS
for i in range(0, len(attributes), 2):
aname = attributes[i]
value = attributes[i+1]
if ' ' in aname:
uri, localname, prefix, qname = _parse_ns_name(self, aname)
a = minidom.Attr(qname, uri, localname, prefix)
_attrs[qname] = a
_attrsNS[(uri, localname)] = a
else:
a = minidom.Attr(aname, EMPTY_NAMESPACE,
aname, EMPTY_PREFIX)
_attrs[aname] = a
_attrsNS[(EMPTY_NAMESPACE, aname)] = a
d = a.childNodes[0].__dict__
d['data'] = d['nodeValue'] = value
d = a.__dict__
d['ownerDocument'] = self.document
d['value'] = d['nodeValue'] = value
d['ownerElement'] = node
if __debug__:
# This only adds some asserts to the original
# end_element_handler(), so we only define this when -O is not
# used. If changing one, be sure to check the other to see if
# it needs to be changed as well.
#
def end_element_handler(self, name):
curNode = self.curNode
if ' ' in name:
uri, localname, prefix, qname = _parse_ns_name(self, name)
assert (curNode.namespaceURI == uri
and curNode.localName == localname
and curNode.prefix == prefix), \
"element stack messed up! (namespace)"
else:
assert curNode.nodeName == name, \
"element stack messed up - bad nodeName"
assert curNode.namespaceURI == EMPTY_NAMESPACE, \
"element stack messed up - bad namespaceURI"
self.curNode = curNode.parentNode
self._finish_end_element(curNode)
class ExpatBuilderNS(Namespaces, ExpatBuilder):
"""Document builder that supports namespaces."""
def reset(self):
ExpatBuilder.reset(self)
self._initNamespaces()
class FragmentBuilderNS(Namespaces, FragmentBuilder):
"""Fragment builder that supports namespaces."""
def reset(self):
FragmentBuilder.reset(self)
self._initNamespaces()
def _getNSattrs(self):
"""Return string of namespace attributes from this element and
ancestors."""
# XXX This needs to be re-written to walk the ancestors of the
# context to build up the namespace information from
# declarations, elements, and attributes found in context.
# Otherwise we have to store a bunch more data on the DOM
# (though that *might* be more reliable -- not clear).
attrs = ""
context = self.context
L = []
while context:
if hasattr(context, '_ns_prefix_uri'):
for prefix, uri in context._ns_prefix_uri.items():
# add every new NS decl from context to L and attrs string
if prefix in L:
continue
L.append(prefix)
if prefix:
declname = "xmlns:" + prefix
else:
declname = "xmlns"
if attrs:
attrs = "%s\n %s='%s'" % (attrs, declname, uri)
else:
attrs = " %s='%s'" % (declname, uri)
context = context.parentNode
return attrs
class ParseEscape(Exception):
"""Exception raised to short-circuit parsing in InternalSubsetExtractor."""
pass
class InternalSubsetExtractor(ExpatBuilder):
"""XML processor which can rip out the internal document type subset."""
subset = None
def getSubset(self):
"""Return the internal subset as a string."""
return self.subset
def parseFile(self, file):
try:
ExpatBuilder.parseFile(self, file)
except ParseEscape:
pass
def parseString(self, string):
try:
ExpatBuilder.parseString(self, string)
except ParseEscape:
pass
def install(self, parser):
parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
parser.StartElementHandler = self.start_element_handler
def start_doctype_decl_handler(self, name, publicId, systemId,
has_internal_subset):
if has_internal_subset:
parser = self.getParser()
self.subset = []
parser.DefaultHandler = self.subset.append
parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler
else:
raise ParseEscape()
def end_doctype_decl_handler(self):
s = ''.join(self.subset).replace('\r\n', '\n').replace('\r', '\n')
self.subset = s
raise ParseEscape()
def start_element_handler(self, name, attrs):
raise ParseEscape()
def parse(file, namespaces=True):
"""Parse a document, returning the resulting Document node.
'file' may be either a file name or an open file object.
"""
if namespaces:
builder = ExpatBuilderNS()
else:
builder = ExpatBuilder()
if isinstance(file, StringTypes):
fp = open(file, 'rb')
try:
result = builder.parseFile(fp)
finally:
fp.close()
else:
result = builder.parseFile(file)
return result
def parseString(string, namespaces=True):
"""Parse a document from a string, returning the resulting
Document node.
"""
if namespaces:
builder = ExpatBuilderNS()
else:
builder = ExpatBuilder()
return builder.parseString(string)
def parseFragment(file, context, namespaces=True):
"""Parse a fragment of a document, given the context from which it
was originally extracted. context should be the parent of the
node(s) which are in the fragment.
'file' may be either a file name or an open file object.
"""
if namespaces:
builder = FragmentBuilderNS(context)
else:
builder = FragmentBuilder(context)
if isinstance(file, StringTypes):
fp = open(file, 'rb')
try:
result = builder.parseFile(fp)
finally:
fp.close()
else:
result = builder.parseFile(file)
return result
def parseFragmentString(string, context, namespaces=True):
"""Parse a fragment of a document from a string, given the context
from which it was originally extracted. context should be the
parent of the node(s) which are in the fragment.
"""
if namespaces:
builder = FragmentBuilderNS(context)
else:
builder = FragmentBuilder(context)
return builder.parseString(string)
def makeBuilder(options):
"""Create a builder based on an Options object."""
if options.namespaces:
return ExpatBuilderNS(options)
else:
return ExpatBuilder(options)
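# Hedged usage sketch (added for illustration; not part of this module):
# driving the convenience functions above.  The XML strings are made up.
def _example_expatbuilder_usage():
    # Build a namespace-aware Document from a string.
    doc = parseString("<root xmlns='urn:example'><child/></root>")
    root = doc.documentElement
    assert root.namespaceURI == 'urn:example'
    # Parse a fragment using that element as context; the result is a
    # DocumentFragment owned by the same Document.
    frag = parseFragmentString("<child>text</child>", root)
    return doc, frag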
"""Python version compatibility support for minidom."""
# This module should only be imported using "import *".
#
# The following names are defined:
#
# NodeList -- lightest possible NodeList implementation
#
# EmptyNodeList -- lightest possible NodeList that is guaranteed to
# remain empty (immutable)
#
# StringTypes -- tuple of defined string types
#
# defproperty -- function used in conjunction with GetattrMagic;
# using these together is needed to make them work
# as efficiently as possible in both Python 2.2+
# and older versions. For example:
#
# class MyClass(GetattrMagic):
# def _get_myattr(self):
# return something
#
# defproperty(MyClass, "myattr",
# "return some value")
#
# For Python 2.2 and newer, this will construct a
# property object on the class, which avoids
# needing to override __getattr__(). It will only
# work for read-only attributes.
#
# For older versions of Python, inheriting from
# GetattrMagic will use the traditional
# __getattr__() hackery to achieve the same effect,
# but less efficiently.
#
# defproperty() should be used for each version of
# the relevant _get_<property>() function.
__all__ = ["NodeList", "EmptyNodeList", "StringTypes", "defproperty"]
import xml.dom
try:
unicode
except NameError:
StringTypes = type(''),
else:
StringTypes = type(''), type(unicode(''))
class NodeList(list):
__slots__ = ()
def item(self, index):
if 0 <= index < len(self):
return self[index]
def _get_length(self):
return len(self)
def _set_length(self, value):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute 'length'")
length = property(_get_length, _set_length,
doc="The number of nodes in the NodeList.")
# For backward compatibility
def __setstate__(self, state):
if state is None:
state = []
self[:] = state
class EmptyNodeList(tuple):
__slots__ = ()
def __add__(self, other):
NL = NodeList()
NL.extend(other)
return NL
def __radd__(self, other):
NL = NodeList()
NL.extend(other)
return NL
def item(self, index):
return None
def _get_length(self):
return 0
def _set_length(self, value):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute 'length'")
length = property(_get_length, _set_length,
doc="The number of nodes in the NodeList.")
def defproperty(klass, name, doc):
get = getattr(klass, ("_get_" + name)).im_func
def set(self, value, name=name):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute " + repr(name))
assert not hasattr(klass, "_set_" + name), \
"expected not to find _set_" + name
prop = property(get, set, doc=doc)
setattr(klass, name, prop)
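# Hedged sketch (illustrative, not part of minicompat): wiring a read-only
# attribute with defproperty().  _Example and _get_double are hypothetical.
class _Example(object):
    def __init__(self, n):
        self._n = n

    def _get_double(self):
        return self._n * 2

defproperty(_Example, "double", doc="Twice the stored value.")
# _Example(21).double == 42; assigning to .double raises
# xml.dom.NoModificationAllowedErr.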
"""Simple implementation of the Level 1 DOM.
Namespaces and other minor Level 2 features are also supported.
parse("foo.xml")
parseString("<foo><bar/></foo>")
Todo:
=====
* convenience methods for getting elements and text.
* more testing
* bring some of the writer and linearizer code into conformance with this
interface
* SAX 2 namespaces
"""
import xml.dom
from xml.dom import EMPTY_NAMESPACE, EMPTY_PREFIX, XMLNS_NAMESPACE, domreg
from xml.dom.minicompat import *
from xml.dom.xmlbuilder import DOMImplementationLS, DocumentLS
# This is used by the ID-cache invalidation checks; the list isn't
# actually complete, since the nodes being checked will never be the
# DOCUMENT_NODE or DOCUMENT_FRAGMENT_NODE. (The node being checked is
# the node being added or removed, not the node being modified.)
#
_nodeTypes_with_children = (xml.dom.Node.ELEMENT_NODE,
xml.dom.Node.ENTITY_REFERENCE_NODE)
class Node(xml.dom.Node):
namespaceURI = None # this is non-null only for elements and attributes
parentNode = None
ownerDocument = None
nextSibling = None
previousSibling = None
prefix = EMPTY_PREFIX # non-null only for NS elements and attributes
def __nonzero__(self):
return True
    def toxml(self, encoding=None):
        return self.toprettyxml("", "", encoding)
    def toprettyxml(self, indent="\t", newl="\n", encoding=None):
# indent = the indentation string to prepend, per level
# newl = the newline string to append
writer = _get_StringIO()
if encoding is not None:
import codecs
# Can't use codecs.getwriter to preserve 2.0 compatibility
writer = codecs.lookup(encoding)[3](writer)
if self.nodeType == Node.DOCUMENT_NODE:
# Can pass encoding only to document, to put it into XML header
self.writexml(writer, "", indent, newl, encoding)
else:
self.writexml(writer, "", indent, newl)
return writer.getvalue()
def hasChildNodes(self):
if self.childNodes:
return True
else:
return False
def _get_childNodes(self):
return self.childNodes
def _get_firstChild(self):
if self.childNodes:
return self.childNodes[0]
def _get_lastChild(self):
if self.childNodes:
return self.childNodes[-1]
def insertBefore(self, newChild, refChild):
if newChild.nodeType == self.DOCUMENT_FRAGMENT_NODE:
for c in tuple(newChild.childNodes):
self.insertBefore(c, refChild)
### The DOM does not clearly specify what to return in this case
return newChild
if newChild.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(newChild), repr(self)))
if newChild.parentNode is not None:
newChild.parentNode.removeChild(newChild)
if refChild is None:
self.appendChild(newChild)
else:
try:
index = self.childNodes.index(refChild)
except ValueError:
raise xml.dom.NotFoundErr()
if newChild.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
self.childNodes.insert(index, newChild)
newChild.nextSibling = refChild
refChild.previousSibling = newChild
if index:
node = self.childNodes[index-1]
node.nextSibling = newChild
newChild.previousSibling = node
else:
newChild.previousSibling = None
newChild.parentNode = self
return newChild
def appendChild(self, node):
if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
for c in tuple(node.childNodes):
self.appendChild(c)
### The DOM does not clearly specify what to return in this case
return node
if node.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
elif node.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
if node.parentNode is not None:
node.parentNode.removeChild(node)
_append_child(self, node)
node.nextSibling = None
return node
def replaceChild(self, newChild, oldChild):
if newChild.nodeType == self.DOCUMENT_FRAGMENT_NODE:
refChild = oldChild.nextSibling
self.removeChild(oldChild)
return self.insertBefore(newChild, refChild)
if newChild.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(newChild), repr(self)))
if newChild is oldChild:
return
if newChild.parentNode is not None:
newChild.parentNode.removeChild(newChild)
try:
index = self.childNodes.index(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
self.childNodes[index] = newChild
newChild.parentNode = self
oldChild.parentNode = None
if (newChild.nodeType in _nodeTypes_with_children
or oldChild.nodeType in _nodeTypes_with_children):
_clear_id_cache(self)
newChild.nextSibling = oldChild.nextSibling
newChild.previousSibling = oldChild.previousSibling
oldChild.nextSibling = None
oldChild.previousSibling = None
if newChild.previousSibling:
newChild.previousSibling.nextSibling = newChild
if newChild.nextSibling:
newChild.nextSibling.previousSibling = newChild
return oldChild
def removeChild(self, oldChild):
try:
self.childNodes.remove(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
if oldChild.nextSibling is not None:
oldChild.nextSibling.previousSibling = oldChild.previousSibling
if oldChild.previousSibling is not None:
oldChild.previousSibling.nextSibling = oldChild.nextSibling
oldChild.nextSibling = oldChild.previousSibling = None
if oldChild.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
oldChild.parentNode = None
return oldChild
def normalize(self):
L = []
for child in self.childNodes:
if child.nodeType == Node.TEXT_NODE:
if not child.data:
# empty text node; discard
if L:
L[-1].nextSibling = child.nextSibling
if child.nextSibling:
child.nextSibling.previousSibling = child.previousSibling
child.unlink()
elif L and L[-1].nodeType == child.nodeType:
# collapse text node
node = L[-1]
node.data = node.data + child.data
node.nextSibling = child.nextSibling
if child.nextSibling:
child.nextSibling.previousSibling = node
child.unlink()
else:
L.append(child)
else:
L.append(child)
if child.nodeType == Node.ELEMENT_NODE:
child.normalize()
self.childNodes[:] = L
def cloneNode(self, deep):
return _clone_node(self, deep, self.ownerDocument or self)
def isSupported(self, feature, version):
return self.ownerDocument.implementation.hasFeature(feature, version)
def _get_localName(self):
        # Overridden in Element and Attr where localName can be non-null
return None
# Node interfaces from Level 3 (WD 9 April 2002)
def isSameNode(self, other):
return self is other
def getInterface(self, feature):
if self.isSupported(feature, None):
return self
else:
return None
# The "user data" functions use a dictionary that is only present
# if some user data has been set, so be careful not to assume it
# exists.
def getUserData(self, key):
try:
return self._user_data[key][0]
except (AttributeError, KeyError):
return None
def setUserData(self, key, data, handler):
old = None
try:
d = self._user_data
except AttributeError:
d = {}
self._user_data = d
if key in d:
old = d[key][0]
if data is None:
# ignore handlers passed for None
handler = None
if old is not None:
del d[key]
else:
d[key] = (data, handler)
return old
def _call_user_data_handler(self, operation, src, dst):
if hasattr(self, "_user_data"):
for key, (data, handler) in self._user_data.items():
if handler is not None:
handler.handle(operation, key, data, src, dst)
# minidom-specific API:
def unlink(self):
self.parentNode = self.ownerDocument = None
if self.childNodes:
for child in self.childNodes:
child.unlink()
self.childNodes = NodeList()
self.previousSibling = None
self.nextSibling = None
defproperty(Node, "firstChild", doc="First child node, or None.")
defproperty(Node, "lastChild", doc="Last child node, or None.")
defproperty(Node, "localName", doc="Namespace-local name of this node.")
def _append_child(self, node):
    # fast path with fewer checks; usable by DOM builders if careful
childNodes = self.childNodes
if childNodes:
last = childNodes[-1]
node.__dict__["previousSibling"] = last
last.__dict__["nextSibling"] = node
childNodes.append(node)
node.__dict__["parentNode"] = self
def _in_document(node):
# return True iff node is part of a document tree
while node is not None:
if node.nodeType == Node.DOCUMENT_NODE:
return True
node = node.parentNode
return False
def _write_data(writer, data):
"Writes datachars to writer."
if data:
        data = data.replace("&", "&amp;").replace("<", "&lt;"). \
                    replace("\"", "&quot;").replace(">", "&gt;")
writer.write(data)
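# For example, _write_data(writer, 'a < b & "c"') writes the escaped form
# a &lt; b &amp; &quot;c&quot;.  Apostrophes are deliberately left alone:
# writexml() always quotes attribute values with double quotes.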
def _get_elements_by_tagName_helper(parent, name, rc):
for node in parent.childNodes:
if node.nodeType == Node.ELEMENT_NODE and \
(name == "*" or node.tagName == name):
rc.append(node)
_get_elements_by_tagName_helper(node, name, rc)
return rc
def _get_elements_by_tagName_ns_helper(parent, nsURI, localName, rc):
for node in parent.childNodes:
if node.nodeType == Node.ELEMENT_NODE:
if ((localName == "*" or node.localName == localName) and
(nsURI == "*" or node.namespaceURI == nsURI)):
rc.append(node)
_get_elements_by_tagName_ns_helper(node, nsURI, localName, rc)
return rc
class DocumentFragment(Node):
nodeType = Node.DOCUMENT_FRAGMENT_NODE
nodeName = "#document-fragment"
nodeValue = None
attributes = None
parentNode = None
_child_node_types = (Node.ELEMENT_NODE,
Node.TEXT_NODE,
Node.CDATA_SECTION_NODE,
Node.ENTITY_REFERENCE_NODE,
Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE,
Node.NOTATION_NODE)
def __init__(self):
self.childNodes = NodeList()
class Attr(Node):
nodeType = Node.ATTRIBUTE_NODE
attributes = None
ownerElement = None
specified = False
_is_id = False
_child_node_types = (Node.TEXT_NODE, Node.ENTITY_REFERENCE_NODE)
def __init__(self, qName, namespaceURI=EMPTY_NAMESPACE, localName=None,
prefix=None):
# skip setattr for performance
d = self.__dict__
d["nodeName"] = d["name"] = qName
d["namespaceURI"] = namespaceURI
d["prefix"] = prefix
d['childNodes'] = NodeList()
# Add the single child node that represents the value of the attr
self.childNodes.append(Text())
# nodeValue and value are set elsewhere
def _get_localName(self):
return self.nodeName.split(":", 1)[-1]
def _get_specified(self):
return self.specified
def __setattr__(self, name, value):
d = self.__dict__
if name in ("value", "nodeValue"):
d["value"] = d["nodeValue"] = value
d2 = self.childNodes[0].__dict__
d2["data"] = d2["nodeValue"] = value
if self.ownerElement is not None:
_clear_id_cache(self.ownerElement)
elif name in ("name", "nodeName"):
d["name"] = d["nodeName"] = value
if self.ownerElement is not None:
_clear_id_cache(self.ownerElement)
else:
d[name] = value
def _set_prefix(self, prefix):
nsuri = self.namespaceURI
if prefix == "xmlns":
if nsuri and nsuri != XMLNS_NAMESPACE:
raise xml.dom.NamespaceErr(
"illegal use of 'xmlns' prefix for the wrong namespace")
d = self.__dict__
d['prefix'] = prefix
if prefix is None:
newName = self.localName
else:
newName = "%s:%s" % (prefix, self.localName)
if self.ownerElement:
_clear_id_cache(self.ownerElement)
d['nodeName'] = d['name'] = newName
def _set_value(self, value):
d = self.__dict__
d['value'] = d['nodeValue'] = value
if self.ownerElement:
_clear_id_cache(self.ownerElement)
self.childNodes[0].data = value
def unlink(self):
# This implementation does not call the base implementation
# since most of that is not needed, and the expense of the
# method call is not warranted. We duplicate the removal of
# children, but that's all we needed from the base class.
elem = self.ownerElement
if elem is not None:
del elem._attrs[self.nodeName]
del elem._attrsNS[(self.namespaceURI, self.localName)]
if self._is_id:
self._is_id = False
elem._magic_id_nodes -= 1
self.ownerDocument._magic_id_count -= 1
for child in self.childNodes:
child.unlink()
del self.childNodes[:]
def _get_isId(self):
if self._is_id:
return True
doc = self.ownerDocument
elem = self.ownerElement
if doc is None or elem is None:
return False
info = doc._get_elem_info(elem)
if info is None:
return False
if self.namespaceURI:
return info.isIdNS(self.namespaceURI, self.localName)
else:
return info.isId(self.nodeName)
def _get_schemaType(self):
doc = self.ownerDocument
elem = self.ownerElement
if doc is None or elem is None:
return _no_type
info = doc._get_elem_info(elem)
if info is None:
return _no_type
if self.namespaceURI:
return info.getAttributeTypeNS(self.namespaceURI, self.localName)
else:
return info.getAttributeType(self.nodeName)
defproperty(Attr, "isId", doc="True if this attribute is an ID.")
defproperty(Attr, "localName", doc="Namespace-local name of this attribute.")
defproperty(Attr, "schemaType", doc="Schema type for this attribute.")
class NamedNodeMap(object):
"""The attribute list is a transient interface to the underlying
dictionaries. Mutations here will change the underlying element's
dictionary.
Ordering is imposed artificially and does not reflect the order of
attributes as found in an input document.
"""
__slots__ = ('_attrs', '_attrsNS', '_ownerElement')
def __init__(self, attrs, attrsNS, ownerElement):
self._attrs = attrs
self._attrsNS = attrsNS
self._ownerElement = ownerElement
def _get_length(self):
return len(self._attrs)
def item(self, index):
try:
return self[self._attrs.keys()[index]]
except IndexError:
return None
def items(self):
L = []
for node in self._attrs.values():
L.append((node.nodeName, node.value))
return L
def itemsNS(self):
L = []
for node in self._attrs.values():
L.append(((node.namespaceURI, node.localName), node.value))
return L
def has_key(self, key):
if isinstance(key, StringTypes):
return key in self._attrs
else:
return key in self._attrsNS
def keys(self):
return self._attrs.keys()
def keysNS(self):
return self._attrsNS.keys()
def values(self):
return self._attrs.values()
def get(self, name, value=None):
return self._attrs.get(name, value)
__len__ = _get_length
__hash__ = None # Mutable type can't be correctly hashed
def __cmp__(self, other):
if self._attrs is getattr(other, "_attrs", None):
return 0
else:
return cmp(id(self), id(other))
def __getitem__(self, attname_or_tuple):
if isinstance(attname_or_tuple, tuple):
return self._attrsNS[attname_or_tuple]
else:
return self._attrs[attname_or_tuple]
# same as set
def __setitem__(self, attname, value):
if isinstance(value, StringTypes):
try:
node = self._attrs[attname]
except KeyError:
node = Attr(attname)
node.ownerDocument = self._ownerElement.ownerDocument
self.setNamedItem(node)
node.value = value
else:
if not isinstance(value, Attr):
raise TypeError, "value must be a string or Attr object"
node = value
self.setNamedItem(node)
def getNamedItem(self, name):
try:
return self._attrs[name]
except KeyError:
return None
def getNamedItemNS(self, namespaceURI, localName):
try:
return self._attrsNS[(namespaceURI, localName)]
except KeyError:
return None
def removeNamedItem(self, name):
n = self.getNamedItem(name)
if n is not None:
_clear_id_cache(self._ownerElement)
del self._attrs[n.nodeName]
del self._attrsNS[(n.namespaceURI, n.localName)]
if 'ownerElement' in n.__dict__:
n.__dict__['ownerElement'] = None
return n
else:
raise xml.dom.NotFoundErr()
def removeNamedItemNS(self, namespaceURI, localName):
n = self.getNamedItemNS(namespaceURI, localName)
if n is not None:
_clear_id_cache(self._ownerElement)
del self._attrsNS[(n.namespaceURI, n.localName)]
del self._attrs[n.nodeName]
if 'ownerElement' in n.__dict__:
n.__dict__['ownerElement'] = None
return n
else:
raise xml.dom.NotFoundErr()
def setNamedItem(self, node):
if not isinstance(node, Attr):
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
old = self._attrs.get(node.name)
if old:
old.unlink()
self._attrs[node.name] = node
self._attrsNS[(node.namespaceURI, node.localName)] = node
node.ownerElement = self._ownerElement
_clear_id_cache(node.ownerElement)
return old
def setNamedItemNS(self, node):
return self.setNamedItem(node)
def __delitem__(self, attname_or_tuple):
node = self[attname_or_tuple]
_clear_id_cache(node.ownerElement)
node.unlink()
def __getstate__(self):
return self._attrs, self._attrsNS, self._ownerElement
def __setstate__(self, state):
self._attrs, self._attrsNS, self._ownerElement = state
defproperty(NamedNodeMap, "length",
doc="Number of nodes in the NamedNodeMap.")
AttributeList = NamedNodeMap
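# Hedged usage sketch (illustrative only) of the NamedNodeMap returned by
# Element.attributes; Element is defined further below.
def _example_attr_map(elem):
    attrs = elem.attributes        # fresh NamedNodeMap over elem's dicts
    pairs = attrs.items()          # list of (name, value) pairs
    node = attrs.get('id')         # the Attr *node* (not its value), or None
    return pairs, node, attrs.length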
class TypeInfo(object):
__slots__ = 'namespace', 'name'
def __init__(self, namespace, name):
self.namespace = namespace
self.name = name
def __repr__(self):
if self.namespace:
return "<TypeInfo %r (from %r)>" % (self.name, self.namespace)
else:
return "<TypeInfo %r>" % self.name
def _get_name(self):
return self.name
def _get_namespace(self):
return self.namespace
_no_type = TypeInfo(None, None)
class Element(Node):
nodeType = Node.ELEMENT_NODE
nodeValue = None
schemaType = _no_type
_magic_id_nodes = 0
_child_node_types = (Node.ELEMENT_NODE,
Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE,
Node.TEXT_NODE,
Node.CDATA_SECTION_NODE,
Node.ENTITY_REFERENCE_NODE)
def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
localName=None):
self.tagName = self.nodeName = tagName
self.prefix = prefix
self.namespaceURI = namespaceURI
self.childNodes = NodeList()
self._attrs = {} # attributes are double-indexed:
self._attrsNS = {} # tagName -> Attribute
# URI,localName -> Attribute
# in the future: consider lazy generation
                           # of attribute objects; this is too tricky
# for now because of headaches with
# namespaces.
def _get_localName(self):
return self.tagName.split(":", 1)[-1]
def _get_tagName(self):
return self.tagName
def unlink(self):
for attr in self._attrs.values():
attr.unlink()
self._attrs = None
self._attrsNS = None
Node.unlink(self)
def getAttribute(self, attname):
try:
return self._attrs[attname].value
except KeyError:
return ""
def getAttributeNS(self, namespaceURI, localName):
try:
return self._attrsNS[(namespaceURI, localName)].value
except KeyError:
return ""
def setAttribute(self, attname, value):
attr = self.getAttributeNode(attname)
if attr is None:
attr = Attr(attname)
# for performance
d = attr.__dict__
d["value"] = d["nodeValue"] = value
d["ownerDocument"] = self.ownerDocument
self.setAttributeNode(attr)
elif value != attr.value:
d = attr.__dict__
d["value"] = d["nodeValue"] = value
if attr.isId:
_clear_id_cache(self)
def setAttributeNS(self, namespaceURI, qualifiedName, value):
prefix, localname = _nssplit(qualifiedName)
attr = self.getAttributeNodeNS(namespaceURI, localname)
if attr is None:
# for performance
attr = Attr(qualifiedName, namespaceURI, localname, prefix)
d = attr.__dict__
d["prefix"] = prefix
d["nodeName"] = qualifiedName
d["value"] = d["nodeValue"] = value
d["ownerDocument"] = self.ownerDocument
self.setAttributeNode(attr)
else:
d = attr.__dict__
if value != attr.value:
d["value"] = d["nodeValue"] = value
if attr.isId:
_clear_id_cache(self)
if attr.prefix != prefix:
d["prefix"] = prefix
d["nodeName"] = qualifiedName
def getAttributeNode(self, attrname):
return self._attrs.get(attrname)
def getAttributeNodeNS(self, namespaceURI, localName):
return self._attrsNS.get((namespaceURI, localName))
def setAttributeNode(self, attr):
if attr.ownerElement not in (None, self):
raise xml.dom.InuseAttributeErr("attribute node already owned")
old1 = self._attrs.get(attr.name, None)
if old1 is not None:
self.removeAttributeNode(old1)
old2 = self._attrsNS.get((attr.namespaceURI, attr.localName), None)
if old2 is not None and old2 is not old1:
self.removeAttributeNode(old2)
_set_attribute_node(self, attr)
if old1 is not attr:
# It might have already been part of this node, in which case
# it doesn't represent a change, and should not be returned.
return old1
if old2 is not attr:
return old2
setAttributeNodeNS = setAttributeNode
def removeAttribute(self, name):
try:
attr = self._attrs[name]
except KeyError:
raise xml.dom.NotFoundErr()
self.removeAttributeNode(attr)
def removeAttributeNS(self, namespaceURI, localName):
try:
attr = self._attrsNS[(namespaceURI, localName)]
except KeyError:
raise xml.dom.NotFoundErr()
self.removeAttributeNode(attr)
def removeAttributeNode(self, node):
if node is None:
raise xml.dom.NotFoundErr()
try:
self._attrs[node.name]
except KeyError:
raise xml.dom.NotFoundErr()
_clear_id_cache(self)
node.unlink()
# Restore this since the node is still useful and otherwise
# unlinked
node.ownerDocument = self.ownerDocument
removeAttributeNodeNS = removeAttributeNode
def hasAttribute(self, name):
return name in self._attrs
def hasAttributeNS(self, namespaceURI, localName):
return (namespaceURI, localName) in self._attrsNS
def getElementsByTagName(self, name):
return _get_elements_by_tagName_helper(self, name, NodeList())
def getElementsByTagNameNS(self, namespaceURI, localName):
return _get_elements_by_tagName_ns_helper(
self, namespaceURI, localName, NodeList())
def __repr__(self):
return "<DOM Element: %s at %#x>" % (self.tagName, id(self))
def writexml(self, writer, indent="", addindent="", newl=""):
# indent = current indentation
# addindent = indentation to add to higher levels
# newl = newline string
writer.write(indent+"<" + self.tagName)
attrs = self._get_attributes()
a_names = attrs.keys()
a_names.sort()
for a_name in a_names:
writer.write(" %s=\"" % a_name)
_write_data(writer, attrs[a_name].value)
writer.write("\"")
if self.childNodes:
writer.write(">")
if (len(self.childNodes) == 1 and
self.childNodes[0].nodeType == Node.TEXT_NODE):
self.childNodes[0].writexml(writer, '', '', '')
else:
writer.write(newl)
for node in self.childNodes:
node.writexml(writer, indent+addindent, addindent, newl)
writer.write(indent)
writer.write("</%s>%s" % (self.tagName, newl))
else:
writer.write("/>%s"%(newl))
def _get_attributes(self):
return NamedNodeMap(self._attrs, self._attrsNS, self)
def hasAttributes(self):
if self._attrs:
return True
else:
return False
# DOM Level 3 attributes, based on the 22 Oct 2002 draft
def setIdAttribute(self, name):
idAttr = self.getAttributeNode(name)
self.setIdAttributeNode(idAttr)
def setIdAttributeNS(self, namespaceURI, localName):
idAttr = self.getAttributeNodeNS(namespaceURI, localName)
self.setIdAttributeNode(idAttr)
def setIdAttributeNode(self, idAttr):
if idAttr is None or not self.isSameNode(idAttr.ownerElement):
raise xml.dom.NotFoundErr()
if _get_containing_entref(self) is not None:
raise xml.dom.NoModificationAllowedErr()
if not idAttr._is_id:
idAttr.__dict__['_is_id'] = True
self._magic_id_nodes += 1
self.ownerDocument._magic_id_count += 1
_clear_id_cache(self)
defproperty(Element, "attributes",
doc="NamedNodeMap of attributes on the element.")
defproperty(Element, "localName",
doc="Namespace-local name of this element.")
def _set_attribute_node(element, attr):
_clear_id_cache(element)
element._attrs[attr.name] = attr
element._attrsNS[(attr.namespaceURI, attr.localName)] = attr
# This creates a circular reference, but Element.unlink()
# breaks the cycle since the references to the attribute
# dictionaries are tossed.
attr.__dict__['ownerElement'] = element
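# Hedged usage sketch for the Element attribute API above; the tag and
# attribute names are made up.
def _example_element_attrs(doc):
    e = doc.createElement("item")
    e.setAttribute("id", "i1")            # stores an Attr node on e
    assert e.getAttribute("id") == "i1"
    assert e.hasAttribute("id")
    e.setAttributeNS("urn:example", "ex:flag", "yes")
    assert e.getAttributeNS("urn:example", "flag") == "yes"
    e.removeAttribute("id")
    assert e.getAttribute("id") == ""     # missing attributes read as ""
    return e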
class Childless:
"""Mixin that makes childless-ness easy to implement and avoids
the complexity of the Node methods that deal with children.
"""
attributes = None
childNodes = EmptyNodeList()
firstChild = None
lastChild = None
def _get_firstChild(self):
return None
def _get_lastChild(self):
return None
def appendChild(self, node):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes cannot have children")
def hasChildNodes(self):
return False
def insertBefore(self, newChild, refChild):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes do not have children")
def removeChild(self, oldChild):
raise xml.dom.NotFoundErr(
self.nodeName + " nodes do not have children")
def normalize(self):
# For childless nodes, normalize() has nothing to do.
pass
def replaceChild(self, newChild, oldChild):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes do not have children")
class ProcessingInstruction(Childless, Node):
nodeType = Node.PROCESSING_INSTRUCTION_NODE
def __init__(self, target, data):
self.target = self.nodeName = target
self.data = self.nodeValue = data
def _get_data(self):
return self.data
def _set_data(self, value):
d = self.__dict__
d['data'] = d['nodeValue'] = value
def _get_target(self):
return self.target
def _set_target(self, value):
d = self.__dict__
d['target'] = d['nodeName'] = value
def __setattr__(self, name, value):
if name == "data" or name == "nodeValue":
self.__dict__['data'] = self.__dict__['nodeValue'] = value
elif name == "target" or name == "nodeName":
self.__dict__['target'] = self.__dict__['nodeName'] = value
else:
self.__dict__[name] = value
def writexml(self, writer, indent="", addindent="", newl=""):
writer.write("%s<?%s %s?>%s" % (indent,self.target, self.data, newl))
class CharacterData(Childless, Node):
def _get_length(self):
return len(self.data)
__len__ = _get_length
def _get_data(self):
return self.__dict__['data']
def _set_data(self, data):
d = self.__dict__
d['data'] = d['nodeValue'] = data
_get_nodeValue = _get_data
_set_nodeValue = _set_data
def __setattr__(self, name, value):
if name == "data" or name == "nodeValue":
self.__dict__['data'] = self.__dict__['nodeValue'] = value
else:
self.__dict__[name] = value
def __repr__(self):
data = self.data
if len(data) > 10:
dotdotdot = "..."
else:
dotdotdot = ""
return '<DOM %s node "%r%s">' % (
self.__class__.__name__, data[0:10], dotdotdot)
def substringData(self, offset, count):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
return self.data[offset:offset+count]
def appendData(self, arg):
self.data = self.data + arg
def insertData(self, offset, arg):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if arg:
self.data = "%s%s%s" % (
self.data[:offset], arg, self.data[offset:])
def deleteData(self, offset, count):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
if count:
self.data = self.data[:offset] + self.data[offset+count:]
def replaceData(self, offset, count, arg):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
if count:
self.data = "%s%s%s" % (
self.data[:offset], arg, self.data[offset+count:])
defproperty(CharacterData, "length", doc="Length of the string data.")
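# Hedged sketch of the CharacterData editing operations above, shown on a
# Text node (Text subclasses CharacterData just below).
def _example_chardata(doc):
    t = doc.createTextNode("Hello world")
    assert t.substringData(0, 5) == "Hello"
    t.insertData(5, ",")                  # "Hello, world"
    t.deleteData(5, 1)                    # back to "Hello world"
    t.replaceData(6, 5, "there")          # "Hello there"
    assert t.length == len(t.data)
    return t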
class Text(CharacterData):
# Make sure we don't add an instance __dict__ if we don't already
# have one, at least when that's possible:
# XXX this does not work, CharacterData is an old-style class
# __slots__ = ()
nodeType = Node.TEXT_NODE
nodeName = "#text"
attributes = None
def splitText(self, offset):
if offset < 0 or offset > len(self.data):
raise xml.dom.IndexSizeErr("illegal offset value")
newText = self.__class__()
newText.data = self.data[offset:]
newText.ownerDocument = self.ownerDocument
next = self.nextSibling
if self.parentNode and self in self.parentNode.childNodes:
if next is None:
self.parentNode.appendChild(newText)
else:
self.parentNode.insertBefore(newText, next)
self.data = self.data[:offset]
return newText
def writexml(self, writer, indent="", addindent="", newl=""):
_write_data(writer, "%s%s%s" % (indent, self.data, newl))
# DOM Level 3 (WD 9 April 2002)
def _get_wholeText(self):
L = [self.data]
n = self.previousSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
L.insert(0, n.data)
n = n.previousSibling
else:
break
n = self.nextSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
L.append(n.data)
n = n.nextSibling
else:
break
return ''.join(L)
def replaceWholeText(self, content):
# XXX This needs to be seriously changed if minidom ever
# supports EntityReference nodes.
parent = self.parentNode
n = self.previousSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
next = n.previousSibling
parent.removeChild(n)
n = next
else:
break
n = self.nextSibling
if not content:
parent.removeChild(self)
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
next = n.nextSibling
parent.removeChild(n)
n = next
else:
break
if content:
d = self.__dict__
d['data'] = content
d['nodeValue'] = content
return self
else:
return None
def _get_isWhitespaceInElementContent(self):
if self.data.strip():
return False
elem = _get_containing_element(self)
if elem is None:
return False
info = self.ownerDocument._get_elem_info(elem)
if info is None:
return False
else:
return info.isElementContent()
defproperty(Text, "isWhitespaceInElementContent",
doc="True iff this text node contains only whitespace"
" and is in element content.")
defproperty(Text, "wholeText",
doc="The text of all logically-adjacent text nodes.")
def _get_containing_element(node):
c = node.parentNode
while c is not None:
if c.nodeType == Node.ELEMENT_NODE:
return c
c = c.parentNode
return None
def _get_containing_entref(node):
c = node.parentNode
while c is not None:
if c.nodeType == Node.ENTITY_REFERENCE_NODE:
return c
c = c.parentNode
return None
class Comment(Childless, CharacterData):
nodeType = Node.COMMENT_NODE
nodeName = "#comment"
def __init__(self, data):
self.data = self.nodeValue = data
def writexml(self, writer, indent="", addindent="", newl=""):
if "--" in self.data:
raise ValueError("'--' is not allowed in a comment node")
writer.write("%s<!--%s-->%s" % (indent, self.data, newl))
class CDATASection(Text):
# Make sure we don't add an instance __dict__ if we don't already
# have one, at least when that's possible:
# XXX this does not work, Text is an old-style class
# __slots__ = ()
nodeType = Node.CDATA_SECTION_NODE
nodeName = "#cdata-section"
def writexml(self, writer, indent="", addindent="", newl=""):
if self.data.find("]]>") >= 0:
raise ValueError("']]>' not allowed in a CDATA section")
writer.write("<![CDATA[%s]]>" % self.data)
class ReadOnlySequentialNamedNodeMap(object):
__slots__ = '_seq',
def __init__(self, seq=()):
# seq should be a list or tuple
self._seq = seq
def __len__(self):
return len(self._seq)
def _get_length(self):
return len(self._seq)
def getNamedItem(self, name):
for n in self._seq:
if n.nodeName == name:
return n
def getNamedItemNS(self, namespaceURI, localName):
for n in self._seq:
if n.namespaceURI == namespaceURI and n.localName == localName:
return n
def __getitem__(self, name_or_tuple):
if isinstance(name_or_tuple, tuple):
node = self.getNamedItemNS(*name_or_tuple)
else:
node = self.getNamedItem(name_or_tuple)
if node is None:
raise KeyError, name_or_tuple
return node
def item(self, index):
if index < 0:
return None
try:
return self._seq[index]
except IndexError:
return None
def removeNamedItem(self, name):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def removeNamedItemNS(self, namespaceURI, localName):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def setNamedItem(self, node):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def setNamedItemNS(self, node):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def __getstate__(self):
return [self._seq]
def __setstate__(self, state):
self._seq = state[0]
defproperty(ReadOnlySequentialNamedNodeMap, "length",
doc="Number of entries in the NamedNodeMap.")
class Identified:
"""Mix-in class that supports the publicId and systemId attributes."""
# XXX this does not work, this is an old-style class
# __slots__ = 'publicId', 'systemId'
def _identified_mixin_init(self, publicId, systemId):
self.publicId = publicId
self.systemId = systemId
def _get_publicId(self):
return self.publicId
def _get_systemId(self):
return self.systemId
class DocumentType(Identified, Childless, Node):
nodeType = Node.DOCUMENT_TYPE_NODE
nodeValue = None
name = None
publicId = None
systemId = None
internalSubset = None
def __init__(self, qualifiedName):
self.entities = ReadOnlySequentialNamedNodeMap()
self.notations = ReadOnlySequentialNamedNodeMap()
if qualifiedName:
prefix, localname = _nssplit(qualifiedName)
self.name = localname
self.nodeName = self.name
def _get_internalSubset(self):
return self.internalSubset
def cloneNode(self, deep):
if self.ownerDocument is None:
            # an unowned doctype may be cloned; owned ones return None below
clone = DocumentType(None)
clone.name = self.name
clone.nodeName = self.name
operation = xml.dom.UserDataHandler.NODE_CLONED
if deep:
clone.entities._seq = []
clone.notations._seq = []
for n in self.notations._seq:
notation = Notation(n.nodeName, n.publicId, n.systemId)
clone.notations._seq.append(notation)
n._call_user_data_handler(operation, n, notation)
for e in self.entities._seq:
entity = Entity(e.nodeName, e.publicId, e.systemId,
e.notationName)
entity.actualEncoding = e.actualEncoding
entity.encoding = e.encoding
entity.version = e.version
clone.entities._seq.append(entity)
e._call_user_data_handler(operation, e, entity)
self._call_user_data_handler(operation, self, clone)
return clone
else:
return None
def writexml(self, writer, indent="", addindent="", newl=""):
writer.write("<!DOCTYPE ")
writer.write(self.name)
if self.publicId:
writer.write("%s PUBLIC '%s'%s '%s'"
% (newl, self.publicId, newl, self.systemId))
elif self.systemId:
writer.write("%s SYSTEM '%s'" % (newl, self.systemId))
if self.internalSubset is not None:
writer.write(" [")
writer.write(self.internalSubset)
writer.write("]")
writer.write(">"+newl)
class Entity(Identified, Node):
attributes = None
nodeType = Node.ENTITY_NODE
nodeValue = None
actualEncoding = None
encoding = None
version = None
def __init__(self, name, publicId, systemId, notation):
self.nodeName = name
self.notationName = notation
self.childNodes = NodeList()
self._identified_mixin_init(publicId, systemId)
def _get_actualEncoding(self):
return self.actualEncoding
def _get_encoding(self):
return self.encoding
def _get_version(self):
return self.version
def appendChild(self, newChild):
raise xml.dom.HierarchyRequestErr(
"cannot append children to an entity node")
def insertBefore(self, newChild, refChild):
raise xml.dom.HierarchyRequestErr(
"cannot insert children below an entity node")
def removeChild(self, oldChild):
raise xml.dom.HierarchyRequestErr(
"cannot remove children from an entity node")
def replaceChild(self, newChild, oldChild):
raise xml.dom.HierarchyRequestErr(
"cannot replace children of an entity node")
class Notation(Identified, Childless, Node):
nodeType = Node.NOTATION_NODE
nodeValue = None
def __init__(self, name, publicId, systemId):
self.nodeName = name
self._identified_mixin_init(publicId, systemId)
class DOMImplementation(DOMImplementationLS):
_features = [("core", "1.0"),
("core", "2.0"),
("core", None),
("xml", "1.0"),
("xml", "2.0"),
("xml", None),
("ls-load", "3.0"),
("ls-load", None),
]
def hasFeature(self, feature, version):
if version == "":
version = None
return (feature.lower(), version) in self._features
def createDocument(self, namespaceURI, qualifiedName, doctype):
if doctype and doctype.parentNode is not None:
raise xml.dom.WrongDocumentErr(
"doctype object owned by another DOM tree")
doc = self._create_document()
add_root_element = not (namespaceURI is None
and qualifiedName is None
and doctype is None)
if not qualifiedName and add_root_element:
# The spec is unclear what to raise here; SyntaxErr
# would be the other obvious candidate. Since Xerces raises
# InvalidCharacterErr, and since SyntaxErr is not listed
# for createDocument, that seems to be the better choice.
# XXX: need to check for illegal characters here and in
# createElement.
            # DOM Level III clears this up when talking about the return
            # value of this function.  If namespaceURI, qName and DocType
            # are all Null, the document is returned without a document
            # element; otherwise, if doctype or namespaceURI is not None,
            # we are back to the problem above.
raise xml.dom.InvalidCharacterErr("Element with no name")
if add_root_element:
prefix, localname = _nssplit(qualifiedName)
if prefix == "xml" \
and namespaceURI != "http://www.w3.org/XML/1998/namespace":
raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
if prefix and not namespaceURI:
raise xml.dom.NamespaceErr(
"illegal use of prefix without namespaces")
element = doc.createElementNS(namespaceURI, qualifiedName)
if doctype:
doc.appendChild(doctype)
doc.appendChild(element)
if doctype:
doctype.parentNode = doctype.ownerDocument = doc
doc.doctype = doctype
doc.implementation = self
return doc
def createDocumentType(self, qualifiedName, publicId, systemId):
doctype = DocumentType(qualifiedName)
doctype.publicId = publicId
doctype.systemId = systemId
return doctype
# DOM Level 3 (WD 9 April 2002)
def getInterface(self, feature):
if self.hasFeature(feature, None):
return self
else:
return None
# internal
def _create_document(self):
return Document()
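# Hedged sketch (illustrative names) of driving DOMImplementation directly.
def _example_implementation():
    impl = DOMImplementation()
    assert impl.hasFeature("core", "2.0")
    dt = impl.createDocumentType("greeting", None, "greeting.dtd")
    doc = impl.createDocument(None, "greeting", dt)
    assert doc.doctype is dt
    assert doc.documentElement.tagName == "greeting"
    return doc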
class ElementInfo(object):
"""Object that represents content-model information for an element.
This implementation is not expected to be used in practice; DOM
builders should provide implementations which do the right thing
    using information available to them.
"""
__slots__ = 'tagName',
def __init__(self, name):
self.tagName = name
def getAttributeType(self, aname):
return _no_type
def getAttributeTypeNS(self, namespaceURI, localName):
return _no_type
def isElementContent(self):
return False
def isEmpty(self):
"""Returns true iff this element is declared to have an EMPTY
content model."""
return False
def isId(self, aname):
"""Returns true iff the named attribute is a DTD-style ID."""
return False
def isIdNS(self, namespaceURI, localName):
"""Returns true iff the identified attribute is a DTD-style ID."""
return False
def __getstate__(self):
return self.tagName
def __setstate__(self, state):
self.tagName = state
def _clear_id_cache(node):
if node.nodeType == Node.DOCUMENT_NODE:
node._id_cache.clear()
node._id_search_stack = None
elif _in_document(node):
node.ownerDocument._id_cache.clear()
        node.ownerDocument._id_search_stack = None
class Document(Node, DocumentLS):
_child_node_types = (Node.ELEMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE, Node.DOCUMENT_TYPE_NODE)
nodeType = Node.DOCUMENT_NODE
nodeName = "#document"
nodeValue = None
attributes = None
doctype = None
parentNode = None
previousSibling = nextSibling = None
implementation = DOMImplementation()
# Document attributes from Level 3 (WD 9 April 2002)
actualEncoding = None
encoding = None
standalone = None
version = None
strictErrorChecking = False
errorHandler = None
documentURI = None
_magic_id_count = 0
def __init__(self):
self.childNodes = NodeList()
# mapping of (namespaceURI, localName) -> ElementInfo
# and tagName -> ElementInfo
self._elem_info = {}
self._id_cache = {}
self._id_search_stack = None
def _get_elem_info(self, element):
if element.namespaceURI:
key = element.namespaceURI, element.localName
else:
key = element.tagName
return self._elem_info.get(key)
def _get_actualEncoding(self):
return self.actualEncoding
def _get_doctype(self):
return self.doctype
def _get_documentURI(self):
return self.documentURI
def _get_encoding(self):
return self.encoding
def _get_errorHandler(self):
return self.errorHandler
def _get_standalone(self):
return self.standalone
def _get_strictErrorChecking(self):
return self.strictErrorChecking
def _get_version(self):
return self.version
def appendChild(self, node):
if node.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
if node.parentNode is not None:
# This needs to be done before the next test since this
# may *be* the document element, in which case it should
# end up re-ordered to the end.
node.parentNode.removeChild(node)
if node.nodeType == Node.ELEMENT_NODE \
and self._get_documentElement():
raise xml.dom.HierarchyRequestErr(
"two document elements disallowed")
return Node.appendChild(self, node)
def removeChild(self, oldChild):
try:
self.childNodes.remove(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
oldChild.nextSibling = oldChild.previousSibling = None
oldChild.parentNode = None
if self.documentElement is oldChild:
self.documentElement = None
return oldChild
def _get_documentElement(self):
for node in self.childNodes:
if node.nodeType == Node.ELEMENT_NODE:
return node
def unlink(self):
if self.doctype is not None:
self.doctype.unlink()
self.doctype = None
Node.unlink(self)
def cloneNode(self, deep):
if not deep:
return None
clone = self.implementation.createDocument(None, None, None)
clone.encoding = self.encoding
clone.standalone = self.standalone
clone.version = self.version
for n in self.childNodes:
childclone = _clone_node(n, deep, clone)
assert childclone.ownerDocument.isSameNode(clone)
clone.childNodes.append(childclone)
if childclone.nodeType == Node.DOCUMENT_NODE:
assert clone.documentElement is None
elif childclone.nodeType == Node.DOCUMENT_TYPE_NODE:
assert clone.doctype is None
clone.doctype = childclone
childclone.parentNode = clone
self._call_user_data_handler(xml.dom.UserDataHandler.NODE_CLONED,
self, clone)
return clone
def createDocumentFragment(self):
d = DocumentFragment()
d.ownerDocument = self
return d
def createElement(self, tagName):
e = Element(tagName)
e.ownerDocument = self
return e
def createTextNode(self, data):
if not isinstance(data, StringTypes):
raise TypeError, "node contents must be a string"
t = Text()
t.data = data
t.ownerDocument = self
return t
def createCDATASection(self, data):
if not isinstance(data, StringTypes):
raise TypeError, "node contents must be a string"
c = CDATASection()
c.data = data
c.ownerDocument = self
return c
def createComment(self, data):
c = Comment(data)
c.ownerDocument = self
return c
def createProcessingInstruction(self, target, data):
p = ProcessingInstruction(target, data)
p.ownerDocument = self
return p
def createAttribute(self, qName):
a = Attr(qName)
a.ownerDocument = self
a.value = ""
return a
def createElementNS(self, namespaceURI, qualifiedName):
prefix, localName = _nssplit(qualifiedName)
e = Element(qualifiedName, namespaceURI, prefix)
e.ownerDocument = self
return e
def createAttributeNS(self, namespaceURI, qualifiedName):
prefix, localName = _nssplit(qualifiedName)
a = Attr(qualifiedName, namespaceURI, localName, prefix)
a.ownerDocument = self
a.value = ""
return a
# A couple of implementation-specific helpers to create node types
# not supported by the W3C DOM specs:
def _create_entity(self, name, publicId, systemId, notationName):
e = Entity(name, publicId, systemId, notationName)
e.ownerDocument = self
return e
def _create_notation(self, name, publicId, systemId):
n = Notation(name, publicId, systemId)
n.ownerDocument = self
return n
def getElementById(self, id):
if id in self._id_cache:
return self._id_cache[id]
if not (self._elem_info or self._magic_id_count):
return None
stack = self._id_search_stack
if stack is None:
# we never searched before, or the cache has been cleared
stack = [self.documentElement]
self._id_search_stack = stack
elif not stack:
# Previous search was completed and cache is still valid;
# no matching node.
return None
result = None
while stack:
node = stack.pop()
# add child elements to stack for continued searching
stack.extend([child for child in node.childNodes
if child.nodeType in _nodeTypes_with_children])
# check this node
info = self._get_elem_info(node)
if info:
# We have to process all ID attributes before
# returning in order to get all the attributes set to
# be IDs using Element.setIdAttribute*().
for attr in node.attributes.values():
if attr.namespaceURI:
if info.isIdNS(attr.namespaceURI, attr.localName):
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif not node._magic_id_nodes:
break
elif info.isId(attr.name):
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif not node._magic_id_nodes:
break
elif attr._is_id:
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif node._magic_id_nodes == 1:
break
elif node._magic_id_nodes:
for attr in node.attributes.values():
if attr._is_id:
self._id_cache[attr.value] = node
if attr.value == id:
result = node
if result is not None:
break
return result
def getElementsByTagName(self, name):
return _get_elements_by_tagName_helper(self, name, NodeList())
def getElementsByTagNameNS(self, namespaceURI, localName):
return _get_elements_by_tagName_ns_helper(
self, namespaceURI, localName, NodeList())
def isSupported(self, feature, version):
return self.implementation.hasFeature(feature, version)
def importNode(self, node, deep):
if node.nodeType == Node.DOCUMENT_NODE:
raise xml.dom.NotSupportedErr("cannot import document nodes")
elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
raise xml.dom.NotSupportedErr("cannot import document type nodes")
return _clone_node(node, deep, self)
def writexml(self, writer, indent="", addindent="", newl="",
encoding = None):
if encoding is None:
writer.write('<?xml version="1.0" ?>'+newl)
else:
writer.write('<?xml version="1.0" encoding="%s"?>%s' % (encoding, newl))
for node in self.childNodes:
node.writexml(writer, indent, addindent, newl)
# DOM Level 3 (WD 9 April 2002)
def renameNode(self, n, namespaceURI, name):
if n.ownerDocument is not self:
raise xml.dom.WrongDocumentErr(
"cannot rename nodes from other documents;\n"
"expected %s,\nfound %s" % (self, n.ownerDocument))
if n.nodeType not in (Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE):
raise xml.dom.NotSupportedErr(
"renameNode() only applies to element and attribute nodes")
if namespaceURI != EMPTY_NAMESPACE:
if ':' in name:
prefix, localName = name.split(':', 1)
if ( prefix == "xmlns"
and namespaceURI != xml.dom.XMLNS_NAMESPACE):
raise xml.dom.NamespaceErr(
"illegal use of 'xmlns' prefix")
else:
if ( name == "xmlns"
and namespaceURI != xml.dom.XMLNS_NAMESPACE
and n.nodeType == Node.ATTRIBUTE_NODE):
raise xml.dom.NamespaceErr(
"illegal use of the 'xmlns' attribute")
prefix = None
localName = name
else:
prefix = None
localName = None
if n.nodeType == Node.ATTRIBUTE_NODE:
element = n.ownerElement
if element is not None:
is_id = n._is_id
element.removeAttributeNode(n)
else:
element = None
# avoid __setattr__
d = n.__dict__
d['prefix'] = prefix
d['localName'] = localName
d['namespaceURI'] = namespaceURI
d['nodeName'] = name
if n.nodeType == Node.ELEMENT_NODE:
d['tagName'] = name
else:
# attribute node
d['name'] = name
if element is not None:
element.setAttributeNode(n)
if is_id:
element.setIdAttributeNode(n)
# It's not clear from a semantic perspective whether we should
# call the user data handlers for the NODE_RENAMED event since
# we're re-using the existing node. The draft spec has been
# interpreted as meaning "no, don't call the handler unless a
# new node is created."
return n
defproperty(Document, "documentElement",
doc="Top-level element of this document.")
def _clone_node(node, deep, newOwnerDocument):
"""
Clone a node and give it the new owner document.
Called by Node.cloneNode and Document.importNode
"""
if node.ownerDocument.isSameNode(newOwnerDocument):
operation = xml.dom.UserDataHandler.NODE_CLONED
else:
operation = xml.dom.UserDataHandler.NODE_IMPORTED
if node.nodeType == Node.ELEMENT_NODE:
clone = newOwnerDocument.createElementNS(node.namespaceURI,
node.nodeName)
for attr in node.attributes.values():
clone.setAttributeNS(attr.namespaceURI, attr.nodeName, attr.value)
a = clone.getAttributeNodeNS(attr.namespaceURI, attr.localName)
a.specified = attr.specified
if deep:
for child in node.childNodes:
c = _clone_node(child, deep, newOwnerDocument)
clone.appendChild(c)
elif node.nodeType == Node.DOCUMENT_FRAGMENT_NODE:
clone = newOwnerDocument.createDocumentFragment()
if deep:
for child in node.childNodes:
c = _clone_node(child, deep, newOwnerDocument)
clone.appendChild(c)
elif node.nodeType == Node.TEXT_NODE:
clone = newOwnerDocument.createTextNode(node.data)
elif node.nodeType == Node.CDATA_SECTION_NODE:
clone = newOwnerDocument.createCDATASection(node.data)
elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
clone = newOwnerDocument.createProcessingInstruction(node.target,
node.data)
elif node.nodeType == Node.COMMENT_NODE:
clone = newOwnerDocument.createComment(node.data)
elif node.nodeType == Node.ATTRIBUTE_NODE:
clone = newOwnerDocument.createAttributeNS(node.namespaceURI,
node.nodeName)
clone.specified = True
clone.value = node.value
elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
assert node.ownerDocument is not newOwnerDocument
operation = xml.dom.UserDataHandler.NODE_IMPORTED
clone = newOwnerDocument.implementation.createDocumentType(
node.name, node.publicId, node.systemId)
clone.ownerDocument = newOwnerDocument
if deep:
clone.entities._seq = []
clone.notations._seq = []
for n in node.notations._seq:
notation = Notation(n.nodeName, n.publicId, n.systemId)
notation.ownerDocument = newOwnerDocument
clone.notations._seq.append(notation)
if hasattr(n, '_call_user_data_handler'):
n._call_user_data_handler(operation, n, notation)
for e in node.entities._seq:
entity = Entity(e.nodeName, e.publicId, e.systemId,
e.notationName)
entity.actualEncoding = e.actualEncoding
entity.encoding = e.encoding
entity.version = e.version
entity.ownerDocument = newOwnerDocument
clone.entities._seq.append(entity)
if hasattr(e, '_call_user_data_handler'):
e._call_user_data_handler(operation, e, entity)
else:
# Note the cloning of Document and DocumentType nodes is
# implementation specific. minidom handles those cases
# directly in the cloneNode() methods.
raise xml.dom.NotSupportedErr("Cannot clone node %s" % repr(node))
    # Check for _call_user_data_handler() since this could conceivably be
# used with other DOM implementations (one of the FourThought
# DOMs, perhaps?).
if hasattr(node, '_call_user_data_handler'):
node._call_user_data_handler(operation, node, clone)
return clone
def _nssplit(qualifiedName):
fields = qualifiedName.split(':', 1)
if len(fields) == 2:
return fields
else:
return (None, fields[0])
def _get_StringIO():
# we can't use cStringIO since it doesn't support Unicode strings
from StringIO import StringIO
return StringIO()
def _do_pulldom_parse(func, args, kwargs):
events = func(*args, **kwargs)
toktype, rootNode = events.getEvent()
events.expandNode(rootNode)
events.clear()
return rootNode
def parse(file, parser=None, bufsize=None):
"""Parse a file into a DOM by filename or file object."""
if parser is None and not bufsize:
from xml.dom import expatbuilder
return expatbuilder.parse(file)
else:
from xml.dom import pulldom
return _do_pulldom_parse(pulldom.parse, (file,),
{'parser': parser, 'bufsize': bufsize})
def parseString(string, parser=None):
"""Parse a file into a DOM from a string."""
if parser is None:
from xml.dom import expatbuilder
return expatbuilder.parseString(string)
else:
from xml.dom import pulldom
return _do_pulldom_parse(pulldom.parseString, (string,),
{'parser': parser})
def getDOMImplementation(features=None):
if features:
if isinstance(features, StringTypes):
features = domreg._parse_feature_string(features)
for f, v in features:
if not Document.implementation.hasFeature(f, v):
return None
return Document.implementation
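# Usage sketch (not part of minidom itself; a minimal, illustrative
# exercise of the public entry points defined above, assuming the module
# is importable as xml.dom.minidom):
from xml.dom import minidom

doc = minidom.parseString('<root><item id="1"/></root>')
assert doc.documentElement.tagName == 'root'
assert doc.getElementsByTagName('item')[0].getAttribute('id') == '1'
clone = doc.cloneNode(True)   # deep copy; cloneNode(False) returns None here
assert clone.toxml() == doc.toxml()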
"""Simple API for XML (SAX) implementation for Python.
This module provides an implementation of the SAX 2 interface;
information about the Java version of the interface can be found at
http://www.megginson.com/SAX/. The Python version of the interface is
documented at <...>.
This package contains the following modules:
handler -- Base classes and constants which define the SAX 2 API for
the 'client-side' of SAX for Python.
saxutils -- Implementation of the convenience classes commonly used to
work with SAX.
xmlreader -- Base classes and constants which define the SAX 2 API for
the parsers used with SAX for Python.
expatreader -- Driver that allows use of the Expat parser with SAX.
"""
from xmlreader import InputSource
from handler import ContentHandler, ErrorHandler
from _exceptions import SAXException, SAXNotRecognizedException, \
SAXParseException, SAXNotSupportedException, \
SAXReaderNotAvailable
def parse(source, handler, errorHandler=ErrorHandler()):
parser = make_parser()
parser.setContentHandler(handler)
parser.setErrorHandler(errorHandler)
parser.parse(source)
def parseString(string, handler, errorHandler=ErrorHandler()):
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
if errorHandler is None:
errorHandler = ErrorHandler()
parser = make_parser()
parser.setContentHandler(handler)
parser.setErrorHandler(errorHandler)
inpsrc = InputSource()
inpsrc.setByteStream(StringIO(string))
parser.parse(inpsrc)
# this is the parser list used by the make_parser function if no
# alternatives are given as parameters to the function
default_parser_list = ["xml.sax.expatreader"]
# tell modulefinder that importing sax potentially imports expatreader
_false = 0
if _false:
import xml.sax.expatreader
import os, sys
if not sys.flags.ignore_environment and "PY_SAX_PARSER" in os.environ:
default_parser_list = os.environ["PY_SAX_PARSER"].split(",")
del os
_key = "python.xml.sax.parser"
if sys.platform[:4] == "java" and sys.registry.containsKey(_key):
default_parser_list = sys.registry.getProperty(_key).split(",")
def make_parser(parser_list = []):
"""Creates and returns a SAX parser.
Creates the first parser it is able to instantiate of the ones
given in the list created by doing parser_list +
default_parser_list. The lists must contain the names of Python
modules containing both a SAX parser and a create_parser function."""
for parser_name in parser_list + default_parser_list:
try:
return _create_parser(parser_name)
except ImportError,e:
import sys
if parser_name in sys.modules:
# The parser module was found, but importing it
# failed unexpectedly, pass this exception through
raise
except SAXReaderNotAvailable:
# The parser module detected that it won't work properly,
# so try the next one
pass
raise SAXReaderNotAvailable("No parsers found", None)
# --- Internal utility methods used by make_parser
if sys.platform[ : 4] == "java":
def _create_parser(parser_name):
from org.python.core import imp
drv_module = imp.importName(parser_name, 0, globals())
return drv_module.create_parser()
else:
def _create_parser(parser_name):
drv_module = __import__(parser_name,{},{},['create_parser'])
return drv_module.create_parser()
del sys
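# Usage sketch (not part of the package itself; a minimal SAX client
# built on the parse helpers defined above; the handler class name is
# made up for the example):
import xml.sax

class CountingHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.count = 0
    def startElement(self, name, attrs):
        self.count += 1

handler = CountingHandler()
xml.sax.parseString('<a><b/><c/></a>', handler)
assert handler.count == 3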
"""Core XML support for Python.
This package contains four sub-packages:
dom -- The W3C Document Object Model. This supports DOM Level 1 +
Namespaces.
parsers -- Python wrappers for XML parsers (currently only supports Expat).
sax -- The Simple API for XML, developed by XML-Dev, led by David
Megginson and ported to Python by Lars Marius Garshol. This
supports the SAX 2 API.
etree -- The ElementTree XML library. This is a subset of the full
ElementTree XML release.
"""
__all__ = ["dom", "parsers", "sax", "etree"]
_MINIMUM_XMLPLUS_VERSION = (0, 8, 4)
try:
import _xmlplus
except ImportError:
pass
else:
try:
v = _xmlplus.version_info
except AttributeError:
# _xmlplus is too old; ignore it
pass
else:
if v >= _MINIMUM_XMLPLUS_VERSION:
import sys
_xmlplus.__path__.extend(__path__)
sys.modules[__name__] = _xmlplus
else:
del v
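# Usage sketch (illustrative): the sub-packages listed in the docstring
# above are reached with ordinary imports, e.g.:
from xml.dom import minidom          # DOM
from xml.sax import make_parser      # SAX
from xml.etree import ElementTree    # ElementTree subset
from xml.parsers import expat        # low-level Expat wrapper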
"""A parser for XML, using the derived class as static DTD."""
# Author: Sjoerd Mullender.
import re
import string
import warnings
warnings.warn("The xmllib module is obsolete. Use xml.sax instead.",
DeprecationWarning, 2)
del warnings
version = '0.3'
class Error(RuntimeError):
pass
# Regular expressions used for parsing
_S = '[ \t\r\n]+' # white space
_opS = '[ \t\r\n]*' # optional white space
_Name = '[a-zA-Z_:][-a-zA-Z0-9._:]*' # valid XML name
_QStr = "(?:'[^']*'|\"[^\"]*\")" # quoted XML string
illegal = re.compile('[^\t\r\n -\176\240-\377]') # illegal chars in content
interesting = re.compile('[]&<]')
amp = re.compile('&')
ref = re.compile('&(' + _Name + '|#[0-9]+|#x[0-9a-fA-F]+)[^-a-zA-Z0-9._:]')
entityref = re.compile('&(?P<name>' + _Name + ')[^-a-zA-Z0-9._:]')
charref = re.compile('&#(?P<char>[0-9]+[^0-9]|x[0-9a-fA-F]+[^0-9a-fA-F])')
space = re.compile(_S + '$')
newline = re.compile('\n')
attrfind = re.compile(
_S + '(?P<name>' + _Name + ')'
'(' + _opS + '=' + _opS +
'(?P<value>'+_QStr+'|[-a-zA-Z0-9.:+*%?!\(\)_#=~]+))?')
starttagopen = re.compile('<' + _Name)
starttagend = re.compile(_opS + '(?P<slash>/?)>')
starttagmatch = re.compile('<(?P<tagname>'+_Name+')'
'(?P<attrs>(?:'+attrfind.pattern+')*)'+
starttagend.pattern)
endtagopen = re.compile('</')
endbracket = re.compile(_opS + '>')
endbracketfind = re.compile('(?:[^>\'"]|'+_QStr+')*>')
tagfind = re.compile(_Name)
cdataopen = re.compile(r'<!\[CDATA\[')
cdataclose = re.compile(r'\]\]>')
# this matches one of the following:
# SYSTEM SystemLiteral
# PUBLIC PubidLiteral SystemLiteral
_SystemLiteral = '(?P<%s>'+_QStr+')'
_PublicLiteral = '(?P<%s>"[-\'\(\)+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*"|' \
"'[-\(\)+,./:=?;!*#@$_%% \n\ra-zA-Z0-9]*')"
_ExternalId = '(?:SYSTEM|' \
'PUBLIC'+_S+_PublicLiteral%'pubid'+ \
')'+_S+_SystemLiteral%'syslit'
doctype = re.compile('<!DOCTYPE'+_S+'(?P<name>'+_Name+')'
'(?:'+_S+_ExternalId+')?'+_opS)
xmldecl = re.compile('<\?xml'+_S+
'version'+_opS+'='+_opS+'(?P<version>'+_QStr+')'+
'(?:'+_S+'encoding'+_opS+'='+_opS+
"(?P<encoding>'[A-Za-z][-A-Za-z0-9._]*'|"
'"[A-Za-z][-A-Za-z0-9._]*"))?'
'(?:'+_S+'standalone'+_opS+'='+_opS+
'(?P<standalone>\'(?:yes|no)\'|"(?:yes|no)"))?'+
_opS+'\?>')
procopen = re.compile(r'<\?(?P<proc>' + _Name + ')' + _opS)
procclose = re.compile(_opS + r'\?>')
commentopen = re.compile('<!--')
commentclose = re.compile('-->')
doubledash = re.compile('--')
attrtrans = string.maketrans(' \r\n\t', '    ')
# definitions for XML namespaces
_NCName = '[a-zA-Z_][-a-zA-Z0-9._]*' # XML Name, minus the ":"
ncname = re.compile(_NCName + '$')
qname = re.compile('(?:(?P<prefix>' + _NCName + '):)?' # optional prefix
'(?P<local>' + _NCName + ')$')
xmlns = re.compile('xmlns(?::(?P<ncname>'+_NCName+'))?$')
# XML parser base class -- find tags and call handler functions.
# Usage: p = XMLParser(); p.feed(data); ...; p.close().
# The dtd is defined by deriving a class which defines methods with
# special names to handle tags: start_foo and end_foo to handle <foo>
# and </foo>, respectively. The data between tags is passed to the
# parser by calling self.handle_data() with some data as argument (the
# data may be split up in arbitrary chunks).
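# A slightly fuller sketch of that pattern (illustrative only; the tag
# name 'greeting' and the handler bodies are made up for the example):
#
#     class GreetingParser(XMLParser):
#         def start_greeting(self, attrs):    # called for <greeting ...>
#             print 'greeting from', attrs.get('who')
#         def end_greeting(self):             # called for </greeting>
#             print 'done'
#
#     p = GreetingParser()
#     p.feed('<greeting who="world">hello</greeting>')
#     p.close()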
class XMLParser:
attributes = {} # default, to be overridden
elements = {} # default, to be overridden
# parsing options, settable using keyword args in __init__
__accept_unquoted_attributes = 0
__accept_missing_endtag_name = 0
__map_case = 0
__accept_utf8 = 0
__translate_attribute_references = 1
# Interface -- initialize and reset this instance
def __init__(self, **kw):
self.__fixed = 0
if 'accept_unquoted_attributes' in kw:
self.__accept_unquoted_attributes = kw['accept_unquoted_attributes']
if 'accept_missing_endtag_name' in kw:
self.__accept_missing_endtag_name = kw['accept_missing_endtag_name']
if 'map_case' in kw:
self.__map_case = kw['map_case']
if 'accept_utf8' in kw:
self.__accept_utf8 = kw['accept_utf8']
if 'translate_attribute_references' in kw:
self.__translate_attribute_references = kw['translate_attribute_references']
self.reset()
def __fixelements(self):
self.__fixed = 1
self.elements = {}
self.__fixdict(self.__dict__)
self.__fixclass(self.__class__)
def __fixclass(self, kl):
self.__fixdict(kl.__dict__)
for k in kl.__bases__:
self.__fixclass(k)
def __fixdict(self, dict):
for key in dict.keys():
if key[:6] == 'start_':
tag = key[6:]
start, end = self.elements.get(tag, (None, None))
if start is None:
self.elements[tag] = getattr(self, key), end
elif key[:4] == 'end_':
tag = key[4:]
start, end = self.elements.get(tag, (None, None))
if end is None:
self.elements[tag] = start, getattr(self, key)
# Interface -- reset this instance. Loses all unprocessed data
def reset(self):
self.rawdata = ''
self.stack = []
self.nomoretags = 0
self.literal = 0
self.lineno = 1
self.__at_start = 1
self.__seen_doctype = None
self.__seen_starttag = 0
self.__use_namespaces = 0
self.__namespaces = {'xml':None} # xml is implicitly declared
# backward compatibility hack: if elements not overridden,
# fill it in ourselves
if self.elements is XMLParser.elements:
self.__fixelements()
# For derived classes only -- enter literal mode (CDATA) till EOF
def setnomoretags(self):
self.nomoretags = self.literal = 1
# For derived classes only -- enter literal mode (CDATA)
def setliteral(self, *args):
self.literal = 1
# Interface -- feed some data to the parser. Call this as
# often as you want, with as little or as much text as you
# want (may include '\n'). (This just saves the text, all the
# processing is done by goahead().)
def feed(self, data):
self.rawdata = self.rawdata + data
self.goahead(0)
# Interface -- handle the remaining data
def close(self):
self.goahead(1)
if self.__fixed:
self.__fixed = 0
# remove self.elements so that we don't leak
del self.elements
# Interface -- translate references
def translate_references(self, data, all = 1):
if not self.__translate_attribute_references:
return data
i = 0
while 1:
res = amp.search(data, i)
if res is None:
return data
s = res.start(0)
res = ref.match(data, s)
if res is None:
self.syntax_error("bogus `&'")
i = s+1
continue
i = res.end(0)
str = res.group(1)
rescan = 0
if str[0] == '#':
if str[1] == 'x':
str = chr(int(str[2:], 16))
else:
str = chr(int(str[1:]))
if data[i - 1] != ';':
self.syntax_error("`;' missing after char reference")
i = i-1
elif all:
if str in self.entitydefs:
str = self.entitydefs[str]
rescan = 1
elif data[i - 1] != ';':
self.syntax_error("bogus `&'")
i = s + 1 # just past the &
continue
else:
self.syntax_error("reference to unknown entity `&%s;'" % str)
str = '&' + str + ';'
elif data[i - 1] != ';':
self.syntax_error("bogus `&'")
i = s + 1 # just past the &
continue
# when we get here, str contains the translated text and i points
# to the end of the string that is to be replaced
data = data[:s] + str + data[i:]
if rescan:
i = s
else:
i = s + len(str)
# Interface - return a dictionary of all namespaces currently valid
def getnamespace(self):
nsdict = {}
for t, d, nst in self.stack:
nsdict.update(d)
return nsdict
# Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
def goahead(self, end):
rawdata = self.rawdata
i = 0
n = len(rawdata)
while i < n:
if i > 0:
self.__at_start = 0
if self.nomoretags:
data = rawdata[i:n]
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
i = n
break
res = interesting.search(rawdata, i)
if res:
j = res.start(0)
else:
j = n
if i < j:
data = rawdata[i:j]
if self.__at_start and space.match(data) is None:
self.syntax_error('illegal data at start of file')
self.__at_start = 0
if not self.stack and space.match(data) is None:
self.syntax_error('data not in content')
if not self.__accept_utf8 and illegal.search(data):
self.syntax_error('illegal character in content')
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
i = j
if i == n: break
if rawdata[i] == '<':
if starttagopen.match(rawdata, i):
if self.literal:
data = rawdata[i]
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
i = i+1
continue
k = self.parse_starttag(i)
if k < 0: break
self.__seen_starttag = 1
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
if endtagopen.match(rawdata, i):
k = self.parse_endtag(i)
if k < 0: break
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
if commentopen.match(rawdata, i):
if self.literal:
data = rawdata[i]
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
i = i+1
continue
k = self.parse_comment(i)
if k < 0: break
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
if cdataopen.match(rawdata, i):
k = self.parse_cdata(i)
if k < 0: break
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
res = xmldecl.match(rawdata, i)
if res:
if not self.__at_start:
self.syntax_error("<?xml?> declaration not at start of document")
version, encoding, standalone = res.group('version',
'encoding',
'standalone')
if version[1:-1] != '1.0':
raise Error('only XML version 1.0 supported')
if encoding: encoding = encoding[1:-1]
if standalone: standalone = standalone[1:-1]
self.handle_xml(encoding, standalone)
i = res.end(0)
continue
res = procopen.match(rawdata, i)
if res:
k = self.parse_proc(i)
if k < 0: break
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
res = doctype.match(rawdata, i)
if res:
if self.literal:
data = rawdata[i]
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
i = i+1
continue
if self.__seen_doctype:
self.syntax_error('multiple DOCTYPE elements')
if self.__seen_starttag:
self.syntax_error('DOCTYPE not at beginning of document')
k = self.parse_doctype(res)
if k < 0: break
self.__seen_doctype = res.group('name')
if self.__map_case:
self.__seen_doctype = self.__seen_doctype.lower()
self.lineno = self.lineno + rawdata[i:k].count('\n')
i = k
continue
elif rawdata[i] == '&':
if self.literal:
data = rawdata[i]
self.handle_data(data)
i = i+1
continue
res = charref.match(rawdata, i)
if res is not None:
i = res.end(0)
if rawdata[i-1] != ';':
self.syntax_error("`;' missing in charref")
i = i-1
if not self.stack:
self.syntax_error('data not in content')
self.handle_charref(res.group('char')[:-1])
self.lineno = self.lineno + res.group(0).count('\n')
continue
res = entityref.match(rawdata, i)
if res is not None:
i = res.end(0)
if rawdata[i-1] != ';':
self.syntax_error("`;' missing in entityref")
i = i-1
name = res.group('name')
if self.__map_case:
name = name.lower()
if name in self.entitydefs:
self.rawdata = rawdata = rawdata[:res.start(0)] + self.entitydefs[name] + rawdata[i:]
n = len(rawdata)
i = res.start(0)
else:
self.unknown_entityref(name)
self.lineno = self.lineno + res.group(0).count('\n')
continue
elif rawdata[i] == ']':
if self.literal:
data = rawdata[i]
self.handle_data(data)
i = i+1
continue
if n-i < 3:
break
if cdataclose.match(rawdata, i):
self.syntax_error("bogus `]]>'")
self.handle_data(rawdata[i])
i = i+1
continue
else:
raise Error('neither < nor & ??')
# We get here only if incomplete matches but
# nothing else
break
# end while
if i > 0:
self.__at_start = 0
if end and i < n:
data = rawdata[i]
self.syntax_error("bogus `%s'" % data)
if not self.__accept_utf8 and illegal.search(data):
self.syntax_error('illegal character in content')
self.handle_data(data)
self.lineno = self.lineno + data.count('\n')
self.rawdata = rawdata[i+1:]
return self.goahead(end)
self.rawdata = rawdata[i:]
if end:
if not self.__seen_starttag:
self.syntax_error('no elements in file')
if self.stack:
self.syntax_error('missing end tags')
while self.stack:
self.finish_endtag(self.stack[-1][0])
# Internal -- parse comment, return length or -1 if not terminated
def parse_comment(self, i):
rawdata = self.rawdata
if rawdata[i:i+4] != '<!--':
raise Error('unexpected call to handle_comment')
res = commentclose.search(rawdata, i+4)
if res is None:
return -1
if doubledash.search(rawdata, i+4, res.start(0)):
self.syntax_error("`--' inside comment")
if rawdata[res.start(0)-1] == '-':
self.syntax_error('comment cannot end in three dashes')
if not self.__accept_utf8 and \
illegal.search(rawdata, i+4, res.start(0)):
self.syntax_error('illegal character in comment')
self.handle_comment(rawdata[i+4: res.start(0)])
return res.end(0)
# Internal -- handle DOCTYPE tag, return length or -1 if not terminated
def parse_doctype(self, res):
rawdata = self.rawdata
n = len(rawdata)
name = res.group('name')
if self.__map_case:
name = name.lower()
pubid, syslit = res.group('pubid', 'syslit')
if pubid is not None:
pubid = pubid[1:-1] # remove quotes
pubid = ' '.join(pubid.split()) # normalize
if syslit is not None: syslit = syslit[1:-1] # remove quotes
j = k = res.end(0)
if k >= n:
return -1
if rawdata[k] == '[':
level = 0
k = k+1
dq = sq = 0
while k < n:
c = rawdata[k]
if not sq and c == '"':
dq = not dq
elif not dq and c == "'":
sq = not sq
elif sq or dq:
pass
elif level <= 0 and c == ']':
res = endbracket.match(rawdata, k+1)
if res is None:
return -1
self.handle_doctype(name, pubid, syslit, rawdata[j+1:k])
return res.end(0)
elif c == '<':
level = level + 1
elif c == '>':
level = level - 1
if level < 0:
self.syntax_error("bogus `>' in DOCTYPE")
k = k+1
res = endbracketfind.match(rawdata, k)
if res is None:
return -1
if endbracket.match(rawdata, k) is None:
self.syntax_error('garbage in DOCTYPE')
self.handle_doctype(name, pubid, syslit, None)
return res.end(0)
# Internal -- handle CDATA tag, return length or -1 if not terminated
def parse_cdata(self, i):
rawdata = self.rawdata
if rawdata[i:i+9] != '<![CDATA[':
raise Error('unexpected call to parse_cdata')
res = cdataclose.search(rawdata, i+9)
if res is None:
return -1
if not self.__accept_utf8 and \
illegal.search(rawdata, i+9, res.start(0)):
self.syntax_error('illegal character in CDATA')
if not self.stack:
self.syntax_error('CDATA not in content')
self.handle_cdata(rawdata[i+9:res.start(0)])
return res.end(0)
__xml_namespace_attributes = {'ns':None, 'src':None, 'prefix':None}
# Internal -- handle a processing instruction tag
def parse_proc(self, i):
rawdata = self.rawdata
end = procclose.search(rawdata, i)
if end is None:
return -1
j = end.start(0)
if not self.__accept_utf8 and illegal.search(rawdata, i+2, j):
self.syntax_error('illegal character in processing instruction')
res = tagfind.match(rawdata, i+2)
if res is None:
raise Error('unexpected call to parse_proc')
k = res.end(0)
name = res.group(0)
if self.__map_case:
name = name.lower()
if name == 'xml:namespace':
self.syntax_error('old-fashioned namespace declaration')
self.__use_namespaces = -1
# namespace declaration
# this must come after the <?xml?> declaration (if any)
# and before the <!DOCTYPE> (if any).
if self.__seen_doctype or self.__seen_starttag:
self.syntax_error('xml:namespace declaration too late in document')
attrdict, namespace, k = self.parse_attributes(name, k, j)
if namespace:
self.syntax_error('namespace declaration inside namespace declaration')
for attrname in attrdict.keys():
if not attrname in self.__xml_namespace_attributes:
self.syntax_error("unknown attribute `%s' in xml:namespace tag" % attrname)
if not 'ns' in attrdict or not 'prefix' in attrdict:
self.syntax_error('xml:namespace without required attributes')
prefix = attrdict.get('prefix')
if ncname.match(prefix) is None:
self.syntax_error('xml:namespace illegal prefix value')
return end.end(0)
if prefix in self.__namespaces:
self.syntax_error('xml:namespace prefix not unique')
self.__namespaces[prefix] = attrdict['ns']
else:
if name.lower() == 'xml':
self.syntax_error('illegal processing instruction target name')
self.handle_proc(name, rawdata[k:j])
return end.end(0)
# Internal -- parse attributes between i and j
def parse_attributes(self, tag, i, j):
rawdata = self.rawdata
attrdict = {}
namespace = {}
while i < j:
res = attrfind.match(rawdata, i)
if res is None:
break
attrname, attrvalue = res.group('name', 'value')
if self.__map_case:
attrname = attrname.lower()
i = res.end(0)
if attrvalue is None:
self.syntax_error("no value specified for attribute `%s'" % attrname)
attrvalue = attrname
elif attrvalue[:1] == "'" == attrvalue[-1:] or \
attrvalue[:1] == '"' == attrvalue[-1:]:
attrvalue = attrvalue[1:-1]
elif not self.__accept_unquoted_attributes:
self.syntax_error("attribute `%s' value not quoted" % attrname)
res = xmlns.match(attrname)
if res is not None:
# namespace declaration
ncname = res.group('ncname')
namespace[ncname or ''] = attrvalue or None
if not self.__use_namespaces:
self.__use_namespaces = len(self.stack)+1
continue
if '<' in attrvalue:
self.syntax_error("`<' illegal in attribute value")
if attrname in attrdict:
self.syntax_error("attribute `%s' specified twice" % attrname)
attrvalue = attrvalue.translate(attrtrans)
attrdict[attrname] = self.translate_references(attrvalue)
return attrdict, namespace, i
# Internal -- handle starttag, return length or -1 if not terminated
def parse_starttag(self, i):
rawdata = self.rawdata
# i points to start of tag
end = endbracketfind.match(rawdata, i+1)
if end is None:
return -1
tag = starttagmatch.match(rawdata, i)
if tag is None or tag.end(0) != end.end(0):
self.syntax_error('garbage in starttag')
return end.end(0)
nstag = tagname = tag.group('tagname')
if self.__map_case:
nstag = tagname = nstag.lower()
if not self.__seen_starttag and self.__seen_doctype and \
tagname != self.__seen_doctype:
self.syntax_error('starttag does not match DOCTYPE')
if self.__seen_starttag and not self.stack:
self.syntax_error('multiple elements on top level')
k, j = tag.span('attrs')
attrdict, nsdict, k = self.parse_attributes(tagname, k, j)
self.stack.append((tagname, nsdict, nstag))
if self.__use_namespaces:
res = qname.match(tagname)
else:
res = None
if res is not None:
prefix, nstag = res.group('prefix', 'local')
if prefix is None:
prefix = ''
ns = None
for t, d, nst in self.stack:
if prefix in d:
ns = d[prefix]
if ns is None and prefix != '':
ns = self.__namespaces.get(prefix)
if ns is not None:
nstag = ns + ' ' + nstag
elif prefix != '':
nstag = prefix + ':' + nstag # undo split
self.stack[-1] = tagname, nsdict, nstag
# translate namespace of attributes
attrnamemap = {} # map from new name to old name (used for error reporting)
for key in attrdict.keys():
attrnamemap[key] = key
if self.__use_namespaces:
nattrdict = {}
for key, val in attrdict.items():
okey = key
res = qname.match(key)
if res is not None:
aprefix, key = res.group('prefix', 'local')
if self.__map_case:
key = key.lower()
if aprefix is not None:
ans = None
for t, d, nst in self.stack:
if aprefix in d:
ans = d[aprefix]
if ans is None:
ans = self.__namespaces.get(aprefix)
if ans is not None:
key = ans + ' ' + key
else:
key = aprefix + ':' + key
nattrdict[key] = val
attrnamemap[key] = okey
attrdict = nattrdict
attributes = self.attributes.get(nstag)
if attributes is not None:
for key in attrdict.keys():
if not key in attributes:
self.syntax_error("unknown attribute `%s' in tag `%s'" % (attrnamemap[key], tagname))
for key, val in attributes.items():
if val is not None and not key in attrdict:
attrdict[key] = val
method = self.elements.get(nstag, (None, None))[0]
self.finish_starttag(nstag, attrdict, method)
if tag.group('slash') == '/':
self.finish_endtag(tagname)
return tag.end(0)
# Internal -- parse endtag
def parse_endtag(self, i):
rawdata = self.rawdata
end = endbracketfind.match(rawdata, i+1)
if end is None:
return -1
res = tagfind.match(rawdata, i+2)
if res is None:
if self.literal:
self.handle_data(rawdata[i])
return i+1
if not self.__accept_missing_endtag_name:
self.syntax_error('no name specified in end tag')
tag = self.stack[-1][0]
k = i+2
else:
tag = res.group(0)
if self.__map_case:
tag = tag.lower()
if self.literal:
if not self.stack or tag != self.stack[-1][0]:
self.handle_data(rawdata[i])
return i+1
k = res.end(0)
if endbracket.match(rawdata, k) is None:
self.syntax_error('garbage in end tag')
self.finish_endtag(tag)
return end.end(0)
# Internal -- finish processing of start tag
def finish_starttag(self, tagname, attrdict, method):
if method is not None:
self.handle_starttag(tagname, method, attrdict)
else:
self.unknown_starttag(tagname, attrdict)
# Internal -- finish processing of end tag
def finish_endtag(self, tag):
self.literal = 0
if not tag:
self.syntax_error('name-less end tag')
found = len(self.stack) - 1
if found < 0:
self.unknown_endtag(tag)
return
else:
found = -1
for i in range(len(self.stack)):
if tag == self.stack[i][0]:
found = i
if found == -1:
self.syntax_error('unopened end tag')
return
while len(self.stack) > found:
if found < len(self.stack) - 1:
self.syntax_error('missing close tag for %s' % self.stack[-1][2])
nstag = self.stack[-1][2]
method = self.elements.get(nstag, (None, None))[1]
if method is not None:
self.handle_endtag(nstag, method)
else:
self.unknown_endtag(nstag)
if self.__use_namespaces == len(self.stack):
self.__use_namespaces = 0
del self.stack[-1]
# Overridable -- handle xml processing instruction
def handle_xml(self, encoding, standalone):
pass
# Overridable -- handle DOCTYPE
def handle_doctype(self, tag, pubid, syslit, data):
pass
# Overridable -- handle start tag
def handle_starttag(self, tag, method, attrs):
method(attrs)
# Overridable -- handle end tag
def handle_endtag(self, tag, method):
method()
# Example -- handle character reference, no need to override
def handle_charref(self, name):
try:
if name[0] == 'x':
n = int(name[1:], 16)
else:
n = int(name)
except ValueError:
self.unknown_charref(name)
return
if not 0 <= n <= 255:
self.unknown_charref(name)
return
self.handle_data(chr(n))
# Definition of entities -- derived classes may override
    entitydefs = {'lt': '&#60;',        # must use charref
                  'gt': '&#62;',
                  'amp': '&#38;',       # must use charref
                  'quot': '"',
                  'apos': "'",
                  }
# Example -- handle data, should be overridden
def handle_data(self, data):
pass
# Example -- handle cdata, could be overridden
def handle_cdata(self, data):
pass
# Example -- handle comment, could be overridden
def handle_comment(self, data):
pass
# Example -- handle processing instructions, could be overridden
def handle_proc(self, name, data):
pass
# Example -- handle relatively harmless syntax errors, could be overridden
def syntax_error(self, message):
raise Error('Syntax error at line %d: %s' % (self.lineno, message))
# To be overridden -- handlers for unknown objects
def unknown_starttag(self, tag, attrs): pass
def unknown_endtag(self, tag): pass
def unknown_charref(self, ref): pass
def unknown_entityref(self, name):
self.syntax_error("reference to unknown entity `&%s;'" % name)
class TestXMLParser(XMLParser):
def __init__(self, **kw):
self.testdata = ""
XMLParser.__init__(self, **kw)
def handle_xml(self, encoding, standalone):
self.flush()
print 'xml: encoding =',encoding,'standalone =',standalone
def handle_doctype(self, tag, pubid, syslit, data):
self.flush()
print 'DOCTYPE:',tag, repr(data)
def handle_data(self, data):
self.testdata = self.testdata + data
if len(repr(self.testdata)) >= 70:
self.flush()
def flush(self):
data = self.testdata
if data:
self.testdata = ""
print 'data:', repr(data)
def handle_cdata(self, data):
self.flush()
print 'cdata:', repr(data)
def handle_proc(self, name, data):
self.flush()
print 'processing:',name,repr(data)
def handle_comment(self, data):
self.flush()
r = repr(data)
if len(r) > 68:
r = r[:32] + '...' + r[-32:]
print 'comment:', r
def syntax_error(self, message):
print 'error at line %d:' % self.lineno, message
def unknown_starttag(self, tag, attrs):
self.flush()
if not attrs:
print 'start tag: <' + tag + '>'
else:
print 'start tag: <' + tag,
for name, value in attrs.items():
print name + '=' + '"' + value + '"',
print '>'
def unknown_endtag(self, tag):
self.flush()
print 'end tag: </' + tag + '>'
def unknown_entityref(self, ref):
self.flush()
print '*** unknown entity ref: &' + ref + ';'
def unknown_charref(self, ref):
self.flush()
print '*** unknown char ref: &#' + ref + ';'
def close(self):
XMLParser.close(self)
self.flush()
def test(args = None):
import sys, getopt
from time import time
if not args:
args = sys.argv[1:]
opts, args = getopt.getopt(args, 'st')
klass = TestXMLParser
do_time = 0
for o, a in opts:
if o == '-s':
klass = XMLParser
elif o == '-t':
do_time = 1
if args:
file = args[0]
else:
file = 'test.xml'
if file == '-':
f = sys.stdin
else:
try:
f = open(file, 'r')
except IOError, msg:
print file, ":", msg
sys.exit(1)
data = f.read()
if f is not sys.stdin:
f.close()
x = klass()
t0 = time()
try:
if do_time:
x.feed(data)
x.close()
else:
for c in data:
x.feed(c)
x.close()
except Error, msg:
t1 = time()
print msg
if do_time:
print 'total time: %g' % (t1-t0)
sys.exit(1)
t1 = time()
if do_time:
print 'total time: %g' % (t1-t0)
if __name__ == '__main__':
test()
#
# XML-RPC CLIENT LIBRARY
# $Id$
#
# an XML-RPC client interface for Python.
#
# the marshalling and response parser code can also be used to
# implement XML-RPC servers.
#
# Notes:
# this version is designed to work with Python 2.1 or newer.
#
# History:
# 1999-01-14 fl Created
# 1999-01-15 fl Changed dateTime to use localtime
# 1999-01-16 fl Added Binary/base64 element, default to RPC2 service
# 1999-01-19 fl Fixed array data element (from Skip Montanaro)
# 1999-01-21 fl Fixed dateTime constructor, etc.
# 1999-02-02 fl Added fault handling, handle empty sequences, etc.
# 1999-02-10 fl Fixed problem with empty responses (from Skip Montanaro)
# 1999-06-20 fl Speed improvements, pluggable parsers/transports (0.9.8)
# 2000-11-28 fl Changed boolean to check the truth value of its argument
# 2001-02-24 fl Added encoding/Unicode/SafeTransport patches
# 2001-02-26 fl Added compare support to wrappers (0.9.9/1.0b1)
# 2001-03-28 fl Make sure response tuple is a singleton
# 2001-03-29 fl Don't require empty params element (from Nicholas Riley)
# 2001-06-10 fl Folded in _xmlrpclib accelerator support (1.0b2)
# 2001-08-20 fl Base xmlrpclib.Error on built-in Exception (from Paul Prescod)
# 2001-09-03 fl Allow Transport subclass to override getparser
# 2001-09-10 fl Lazy import of urllib, cgi, xmllib (20x import speedup)
# 2001-10-01 fl Remove containers from memo cache when done with them
# 2001-10-01 fl Use faster escape method (80% dumps speedup)
# 2001-10-02 fl More dumps microtuning
# 2001-10-04 fl Make sure import expat gets a parser (from Guido van Rossum)
# 2001-10-10 sm Allow long ints to be passed as ints if they don't overflow
# 2001-10-17 sm Test for int and long overflow (allows use on 64-bit systems)
# 2001-11-12 fl Use repr() to marshal doubles (from Paul Felix)
# 2002-03-17 fl Avoid buffered read when possible (from James Rucker)
# 2002-04-07 fl Added pythondoc comments
# 2002-04-16 fl Added __str__ methods to datetime/binary wrappers
# 2002-05-15 fl Added error constants (from Andrew Kuchling)
# 2002-06-27 fl Merged with Python CVS version
# 2002-10-22 fl Added basic authentication (based on code from Phillip Eby)
# 2003-01-22 sm Add support for the bool type
# 2003-02-27 gvr Remove apply calls
# 2003-04-24 sm Use cStringIO if available
# 2003-04-25 ak Add support for nil
# 2003-06-15 gn Add support for time.struct_time
# 2003-07-12 gp Correct marshalling of Faults
# 2003-10-31 mvl Add multicall support
# 2004-08-20 mvl Bump minimum supported Python version to 2.1
# 2014-12-02 ch/doko Add workaround for gzip bomb vulnerability
#
# Copyright (c) 1999-2002 by Secret Labs AB.
# Copyright (c) 1999-2002 by Fredrik Lundh.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The XML-RPC client interface is
#
# Copyright (c) 1999-2002 by Secret Labs AB
# Copyright (c) 1999-2002 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
#
# things to look into some day:
# TODO: sort out True/False/boolean issues for Python 2.3
"""
An XML-RPC client interface for Python.
The marshalling and response parser code can also be used to
implement XML-RPC servers.
Exported exceptions:
Error Base class for client errors
ProtocolError Indicates an HTTP protocol error
ResponseError Indicates a broken response package
Fault Indicates an XML-RPC fault package
Exported classes:
ServerProxy Represents a logical connection to an XML-RPC server
MultiCall Executor of boxcarred xmlrpc requests
Boolean boolean wrapper to generate a "boolean" XML-RPC value
DateTime dateTime wrapper for an ISO 8601 string or time tuple or
localtime integer value to generate a "dateTime.iso8601"
XML-RPC value
Binary binary data wrapper
SlowParser Slow but safe standard parser (based on xmllib)
Marshaller Generate an XML-RPC params chunk from a Python data structure
Unmarshaller Unmarshal an XML-RPC response from incoming XML event message
Transport Handles an HTTP transaction to an XML-RPC server
SafeTransport Handles an HTTPS transaction to an XML-RPC server
Exported constants:
True
False
Exported functions:
boolean Convert any Python value to an XML-RPC boolean
getparser Create instance of the fastest available parser & attach
to an unmarshalling object
dumps Convert an argument tuple or a Fault instance to an XML-RPC
request (or response, if the methodresponse option is used).
loads Convert an XML-RPC packet to unmarshalled data plus a method
name (None if not present).
"""
import re, string, time, operator
from types import *
import socket
import errno
import httplib
try:
import gzip
except ImportError:
gzip = None #python can be built without zlib/gzip support
# --------------------------------------------------------------------
# Internal stuff
try:
unicode
except NameError:
unicode = None # unicode support not available
try:
import datetime
except ImportError:
datetime = None
try:
_bool_is_builtin = False.__class__.__name__ == "bool"
except NameError:
_bool_is_builtin = 0
def _decode(data, encoding, is8bit=re.compile("[\x80-\xff]").search):
# decode non-ascii string (if possible)
if unicode and encoding and is8bit(data):
data = unicode(data, encoding)
return data
def escape(s, replace=string.replace):
    s = replace(s, "&", "&amp;")
    s = replace(s, "<", "&lt;")
    return replace(s, ">", "&gt;",)
if unicode:
def _stringify(string):
# convert to 7-bit ascii if possible
try:
return string.encode("ascii")
except UnicodeError:
return string
else:
def _stringify(string):
return string
__version__ = "1.0.1"
# xmlrpc integer limits
MAXINT = 2L**31-1
MININT = -2L**31
# --------------------------------------------------------------------
# Error constants (from Dan Libby's specification at
# http://xmlrpc-epi.sourceforge.net/specs/rfc.fault_codes.php)
# Ranges of errors
PARSE_ERROR = -32700
SERVER_ERROR = -32600
APPLICATION_ERROR = -32500
SYSTEM_ERROR = -32400
TRANSPORT_ERROR = -32300
# Specific errors
NOT_WELLFORMED_ERROR = -32700
UNSUPPORTED_ENCODING = -32701
INVALID_ENCODING_CHAR = -32702
INVALID_XMLRPC = -32600
METHOD_NOT_FOUND = -32601
INVALID_METHOD_PARAMS = -32602
INTERNAL_ERROR = -32603
# --------------------------------------------------------------------
# Exceptions
##
# Base class for all kinds of client-side errors.
class Error(Exception):
"""Base class for client errors."""
def __str__(self):
return repr(self)
##
# Indicates an HTTP-level protocol error. This is raised by the HTTP
# transport layer, if the server returns an error code other than 200
# (OK).
#
# @param url The target URL.
# @param errcode The HTTP error code.
# @param errmsg The HTTP error message.
# @param headers The HTTP header dictionary.
class ProtocolError(Error):
"""Indicates an HTTP protocol error."""
def __init__(self, url, errcode, errmsg, headers):
Error.__init__(self)
self.url = url
self.errcode = errcode
self.errmsg = errmsg
self.headers = headers
def __repr__(self):
return (
"<ProtocolError for %s: %s %s>" %
(self.url, self.errcode, self.errmsg)
)
##
# Indicates a broken XML-RPC response package. This exception is
# raised by the unmarshalling layer, if the XML-RPC response is
# malformed.
class ResponseError(Error):
"""Indicates a broken response package."""
pass
##
# Indicates an XML-RPC fault response package. This exception is
# raised by the unmarshalling layer, if the XML-RPC response contains
# a fault string. This exception can also be used as a class, to
# generate a fault XML-RPC message.
#
# @param faultCode The XML-RPC fault code.
# @param faultString The XML-RPC fault string.
class Fault(Error):
"""Indicates an XML-RPC fault package."""
def __init__(self, faultCode, faultString, **extra):
Error.__init__(self)
self.faultCode = faultCode
self.faultString = faultString
def __repr__(self):
return (
"<Fault %s: %s>" %
(self.faultCode, repr(self.faultString))
)
# --------------------------------------------------------------------
# Special values
##
# Wrapper for XML-RPC boolean values. Use the xmlrpclib.True and
# xmlrpclib.False constants, or the xmlrpclib.boolean() function, to
# generate boolean XML-RPC values.
#
# @param value A boolean value. Any true value is interpreted as True,
# all other values are interpreted as False.
from sys import modules
mod_dict = modules[__name__].__dict__
if _bool_is_builtin:
boolean = Boolean = bool
# to avoid breaking code which references xmlrpclib.{True,False}
mod_dict['True'] = True
mod_dict['False'] = False
else:
class Boolean:
"""Boolean-value wrapper.
Use True or False to generate a "boolean" XML-RPC value.
"""
def __init__(self, value = 0):
self.value = operator.truth(value)
def encode(self, out):
out.write("<value><boolean>%d</boolean></value>\n" % self.value)
def __cmp__(self, other):
if isinstance(other, Boolean):
other = other.value
return cmp(self.value, other)
def __repr__(self):
if self.value:
return "<Boolean True at %x>" % id(self)
else:
return "<Boolean False at %x>" % id(self)
def __int__(self):
return self.value
def __nonzero__(self):
return self.value
mod_dict['True'] = Boolean(1)
mod_dict['False'] = Boolean(0)
##
# Map true or false value to XML-RPC boolean values.
#
# @def boolean(value)
# @param value A boolean value. Any true value is mapped to True,
# all other values are mapped to False.
# @return xmlrpclib.True or xmlrpclib.False.
# @see Boolean
# @see True
# @see False
def boolean(value, _truefalse=(False, True)):
"""Convert any Python value to XML-RPC 'boolean'."""
return _truefalse[operator.truth(value)]
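# A quick illustration of the mapping (sketch):
#
#     boolean(0)        # -> False
#     boolean('abc')    # -> True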
del modules, mod_dict
##
# Wrapper for XML-RPC DateTime values. This converts a time value to
# the format used by XML-RPC.
# <p>
# The value can be given as a string in the format
# "yyyymmddThh:mm:ss", as a 9-item time tuple (as returned by
# time.localtime()), or an integer value (as returned by time.time()).
# The wrapper uses time.localtime() to convert an integer to a time
# tuple.
#
# @param value The time, given as an ISO 8601 string, a time
# tuple, or an integer time value.
def _strftime(value):
if datetime:
if isinstance(value, datetime.datetime):
return "%04d%02d%02dT%02d:%02d:%02d" % (
value.year, value.month, value.day,
value.hour, value.minute, value.second)
if not isinstance(value, (TupleType, time.struct_time)):
if value == 0:
value = time.time()
value = time.localtime(value)
return "%04d%02d%02dT%02d:%02d:%02d" % value[:6]
class DateTime:
"""DateTime wrapper for an ISO 8601 string or time tuple or
localtime integer value to generate 'dateTime.iso8601' XML-RPC
value.
"""
def __init__(self, value=0):
if isinstance(value, StringType):
self.value = value
else:
self.value = _strftime(value)
def make_comparable(self, other):
if isinstance(other, DateTime):
s = self.value
o = other.value
elif datetime and isinstance(other, datetime.datetime):
s = self.value
o = other.strftime("%Y%m%dT%H:%M:%S")
elif isinstance(other, basestring):
s = self.value
o = other
elif hasattr(other, "timetuple"):
s = self.timetuple()
o = other.timetuple()
else:
otype = (hasattr(other, "__class__")
and other.__class__.__name__
or type(other))
raise TypeError("Can't compare %s and %s" %
(self.__class__.__name__, otype))
return s, o
def __lt__(self, other):
s, o = self.make_comparable(other)
return s < o
def __le__(self, other):
s, o = self.make_comparable(other)
return s <= o
def __gt__(self, other):
s, o = self.make_comparable(other)
return s > o
def __ge__(self, other):
s, o = self.make_comparable(other)
return s >= o
def __eq__(self, other):
s, o = self.make_comparable(other)
return s == o
def __ne__(self, other):
s, o = self.make_comparable(other)
return s != o
def timetuple(self):
return time.strptime(self.value, "%Y%m%dT%H:%M:%S")
def __cmp__(self, other):
s, o = self.make_comparable(other)
return cmp(s, o)
##
# Get date/time value.
#
# @return Date/time value, as an ISO 8601 string.
def __str__(self):
return self.value
def __repr__(self):
return "<DateTime %s at %x>" % (repr(self.value), id(self))
def decode(self, data):
data = str(data)
self.value = string.strip(data)
def encode(self, out):
out.write("<value><dateTime.iso8601>")
out.write(self.value)
out.write("</dateTime.iso8601></value>\n")
def _datetime(data):
# decode xml element contents into a DateTime structure.
value = DateTime()
value.decode(data)
return value
def _datetime_type(data):
t = time.strptime(data, "%Y%m%dT%H:%M:%S")
return datetime.datetime(*tuple(t)[:6])
##
# Wrapper for binary data. This can be used to transport any kind
# of binary data over XML-RPC, using BASE64 encoding.
#
# @param data An 8-bit string containing arbitrary data.
import base64
try:
import cStringIO as StringIO
except ImportError:
import StringIO
class Binary:
"""Wrapper for binary data."""
def __init__(self, data=None):
self.data = data
##
# Get buffer contents.
#
# @return Buffer contents, as an 8-bit string.
def __str__(self):
return self.data or ""
def __cmp__(self, other):
if isinstance(other, Binary):
other = other.data
return cmp(self.data, other)
def decode(self, data):
self.data = base64.decodestring(data)
def encode(self, out):
out.write("<value><base64>\n")
base64.encode(StringIO.StringIO(self.data), out)
out.write("</base64></value>\n")
def _binary(data):
# decode xml element contents into a Binary structure
value = Binary()
value.decode(data)
return value
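# Usage sketch (illustrative): Binary carries raw 8-bit data and
# base64-encodes it on the wire.
#
#     b = Binary('\x00\xffraw bytes')
#     out = StringIO.StringIO()
#     b.encode(out)    # writes <value><base64>...</base64></value>
#     b2 = Binary()
#     b2.decode(base64.encodestring(b.data))
#     assert b2.data == b.data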
WRAPPERS = (DateTime, Binary)
if not _bool_is_builtin:
WRAPPERS = WRAPPERS + (Boolean,)
# --------------------------------------------------------------------
# XML parsers
try:
# optional xmlrpclib accelerator
import _xmlrpclib
FastParser = _xmlrpclib.Parser
FastUnmarshaller = _xmlrpclib.Unmarshaller
except (AttributeError, ImportError):
FastParser = FastUnmarshaller = None
try:
import _xmlrpclib
FastMarshaller = _xmlrpclib.Marshaller
except (AttributeError, ImportError):
FastMarshaller = None
try:
from xml.parsers import expat
if not hasattr(expat, "ParserCreate"):
raise ImportError
except ImportError:
ExpatParser = None # expat not available
else:
class ExpatParser:
# fast expat parser for Python 2.0 and later.
def __init__(self, target):
self._parser = parser = expat.ParserCreate(None, None)
self._target = target
parser.StartElementHandler = target.start
parser.EndElementHandler = target.end
parser.CharacterDataHandler = target.data
encoding = None
if not parser.returns_unicode:
encoding = "utf-8"
target.xml(encoding, None)
def feed(self, data):
self._parser.Parse(data, 0)
def close(self):
try:
parser = self._parser
except AttributeError:
pass
else:
del self._target, self._parser # get rid of circular references
parser.Parse("", 1) # end of data
class SlowParser:
"""Default XML parser (based on xmllib.XMLParser)."""
# this is the slowest parser.
def __init__(self, target):
import xmllib # lazy subclassing (!)
if xmllib.XMLParser not in SlowParser.__bases__:
SlowParser.__bases__ = (xmllib.XMLParser,)
self.handle_xml = target.xml
self.unknown_starttag = target.start
self.handle_data = target.data
self.handle_cdata = target.data
self.unknown_endtag = target.end
try:
xmllib.XMLParser.__init__(self, accept_utf8=1)
except TypeError:
xmllib.XMLParser.__init__(self) # pre-2.0
# --------------------------------------------------------------------
# XML-RPC marshalling and unmarshalling code
##
# XML-RPC marshaller.
#
# @param encoding Default encoding for 8-bit strings. The default
# value is None (interpreted as UTF-8).
# @see dumps
class Marshaller:
"""Generate an XML-RPC params chunk from a Python data structure.
Create a Marshaller instance for each set of parameters, and use
the "dumps" method to convert your data (represented as a tuple)
to an XML-RPC params chunk. To write a fault response, pass a
Fault instance instead. You may prefer to use the "dumps" module
function for this purpose.
"""
# by the way, if you don't understand what's going on in here,
# that's perfectly ok.
def __init__(self, encoding=None, allow_none=0):
self.memo = {}
self.data = None
self.encoding = encoding
self.allow_none = allow_none
dispatch = {}
def dumps(self, values):
out = []
write = out.append
dump = self.__dump
if isinstance(values, Fault):
# fault instance
write("<fault>\n")
dump({'faultCode': values.faultCode,
'faultString': values.faultString},
write)
write("</fault>\n")
else:
# parameter block
# FIXME: the xml-rpc specification allows us to leave out
# the entire <params> block if there are no parameters.
# however, changing this may break older code (including
# old versions of xmlrpclib.py), so this is better left as
# is for now. See @XMLRPC3 for more information. /F
write("<params>\n")
for v in values:
write("<param>\n")
dump(v, write)
write("</param>\n")
write("</params>\n")
result = string.join(out, "")
return result
def __dump(self, value, write):
try:
f = self.dispatch[type(value)]
except KeyError:
# check if this object can be marshalled as a structure
try:
value.__dict__
except:
raise TypeError, "cannot marshal %s objects" % type(value)
# check if this class is a sub-class of a basic type,
# because we don't know how to marshal these types
# (e.g. a string sub-class)
for type_ in type(value).__mro__:
if type_ in self.dispatch.keys():
raise TypeError, "cannot marshal %s objects" % type(value)
f = self.dispatch[InstanceType]
f(self, value, write)
def dump_nil (self, value, write):
if not self.allow_none:
raise TypeError, "cannot marshal None unless allow_none is enabled"
write("<value><nil/></value>")
dispatch[NoneType] = dump_nil
def dump_int(self, value, write):
# in case ints are > 32 bits
if value > MAXINT or value < MININT:
raise OverflowError, "int exceeds XML-RPC limits"
write("<value><int>")
write(str(value))
write("</int></value>\n")
dispatch[IntType] = dump_int
if _bool_is_builtin:
def dump_bool(self, value, write):
write("<value><boolean>")
write(value and "1" or "0")
write("</boolean></value>\n")
dispatch[bool] = dump_bool
def dump_long(self, value, write):
if value > MAXINT or value < MININT:
raise OverflowError, "long int exceeds XML-RPC limits"
write("<value><int>")
write(str(int(value)))
write("</int></value>\n")
dispatch[LongType] = dump_long
def dump_double(self, value, write):
write("<value><double>")
write(repr(value))
write("</double></value>\n")
dispatch[FloatType] = dump_double
def dump_string(self, value, write, escape=escape):
write("<value><string>")
write(escape(value))
write("</string></value>\n")
dispatch[StringType] = dump_string
if unicode:
def dump_unicode(self, value, write, escape=escape):
write("<value><string>")
write(escape(value).encode(self.encoding, 'xmlcharrefreplace'))
write("</string></value>\n")
dispatch[UnicodeType] = dump_unicode
def dump_array(self, value, write):
i = id(value)
if i in self.memo:
raise TypeError, "cannot marshal recursive sequences"
self.memo[i] = None
dump = self.__dump
write("<value><array><data>\n")
for v in value:
dump(v, write)
write("</data></array></value>\n")
del self.memo[i]
dispatch[TupleType] = dump_array
dispatch[ListType] = dump_array
def dump_struct(self, value, write, escape=escape):
i = id(value)
if i in self.memo:
raise TypeError, "cannot marshal recursive dictionaries"
self.memo[i] = None
dump = self.__dump
write("<value><struct>\n")
for k, v in value.items():
write("<member>\n")
if type(k) is StringType:
k = escape(k)
elif unicode and type(k) is UnicodeType:
k = escape(k).encode(self.encoding, 'xmlcharrefreplace')
else:
raise TypeError, "dictionary key must be string"
write("<name>%s</name>\n" % k)
dump(v, write)
write("</member>\n")
write("</struct></value>\n")
del self.memo[i]
dispatch[DictType] = dump_struct
if datetime:
def dump_datetime(self, value, write):
write("<value><dateTime.iso8601>")
write(_strftime(value))
write("</dateTime.iso8601></value>\n")
dispatch[datetime.datetime] = dump_datetime
def dump_instance(self, value, write):
# check for special wrappers
if value.__class__ in WRAPPERS:
self.write = write
value.encode(self)
del self.write
else:
# store instance attributes as a struct (really?)
self.dump_struct(value.__dict__, write)
dispatch[InstanceType] = dump_instance
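# Illustrative example of driving the marshaller directly; the values are
# made up and the output is abbreviated. The module-level dumps() function
# below is the usual entry point.
#
#   >>> m = Marshaller(encoding="utf-8", allow_none=1)
#   >>> m.dumps((1, "two"))
#   '<params>\n<param>\n<value><int>1</int></value>\n</param>\n...</params>\n'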
##
# XML-RPC unmarshaller.
#
# @see loads
class Unmarshaller:
"""Unmarshal an XML-RPC response, based on incoming XML event
messages (start, data, end). Call close() to get the resulting
data structure.
Note that this reader is fairly tolerant, and gladly accepts bogus
XML-RPC data without complaining (but not bogus XML).
"""
# and again, if you don't understand what's going on in here,
# that's perfectly ok.
def __init__(self, use_datetime=0):
self._type = None
self._stack = []
self._marks = []
self._data = []
self._value = False
self._methodname = None
self._encoding = "utf-8"
self.append = self._stack.append
self._use_datetime = use_datetime
if use_datetime and not datetime:
raise ValueError, "the datetime module is not available"
def close(self):
# return response tuple and target method
if self._type is None or self._marks:
raise ResponseError()
if self._type == "fault":
raise Fault(**self._stack[0])
return tuple(self._stack)
def getmethodname(self):
return self._methodname
#
# event handlers
def xml(self, encoding, standalone):
self._encoding = encoding
# FIXME: assert standalone == 1 ???
def start(self, tag, attrs):
# prepare to handle this element
if tag == "array" or tag == "struct":
self._marks.append(len(self._stack))
self._data = []
if self._value and tag not in self.dispatch:
raise ResponseError("unknown tag %r" % tag)
self._value = (tag == "value")
def data(self, text):
self._data.append(text)
def end(self, tag, join=string.join):
# call the appropriate end tag handler
try:
f = self.dispatch[tag]
except KeyError:
pass # unknown tag ?
else:
return f(self, join(self._data, ""))
#
# accelerator support
def end_dispatch(self, tag, data):
# dispatch data
try:
f = self.dispatch[tag]
except KeyError:
pass # unknown tag ?
else:
return f(self, data)
#
# element decoders
dispatch = {}
def end_nil (self, data):
self.append(None)
self._value = 0
dispatch["nil"] = end_nil
def end_boolean(self, data):
if data == "0":
self.append(False)
elif data == "1":
self.append(True)
else:
raise TypeError, "bad boolean value"
self._value = 0
dispatch["boolean"] = end_boolean
def end_int(self, data):
self.append(int(data))
self._value = 0
dispatch["i4"] = end_int
dispatch["i8"] = end_int
dispatch["int"] = end_int
def end_double(self, data):
self.append(float(data))
self._value = 0
dispatch["double"] = end_double
def end_string(self, data):
if self._encoding:
data = _decode(data, self._encoding)
self.append(_stringify(data))
self._value = 0
dispatch["string"] = end_string
dispatch["name"] = end_string # struct keys are always strings
def end_array(self, data):
mark = self._marks.pop()
# map arrays to Python lists
self._stack[mark:] = [self._stack[mark:]]
self._value = 0
dispatch["array"] = end_array
def end_struct(self, data):
mark = self._marks.pop()
# map structs to Python dictionaries
dict = {}
items = self._stack[mark:]
for i in range(0, len(items), 2):
dict[_stringify(items[i])] = items[i+1]
self._stack[mark:] = [dict]
self._value = 0
dispatch["struct"] = end_struct
def end_base64(self, data):
value = Binary()
value.decode(data)
self.append(value)
self._value = 0
dispatch["base64"] = end_base64
def end_dateTime(self, data):
value = DateTime()
value.decode(data)
if self._use_datetime:
value = _datetime_type(data)
self.append(value)
dispatch["dateTime.iso8601"] = end_dateTime
def end_value(self, data):
# if we stumble upon a value element with no internal
# elements, treat it as a string element
if self._value:
self.end_string(data)
dispatch["value"] = end_value
def end_params(self, data):
self._type = "params"
dispatch["params"] = end_params
def end_fault(self, data):
self._type = "fault"
dispatch["fault"] = end_fault
def end_methodName(self, data):
if self._encoding:
data = _decode(data, self._encoding)
self._methodname = data
self._type = "methodName" # no params
dispatch["methodName"] = end_methodName
## Multicall support
#
class _MultiCallMethod:
# some lesser magic to store calls made to a MultiCall object
# for batch execution
def __init__(self, call_list, name):
self.__call_list = call_list
self.__name = name
def __getattr__(self, name):
return _MultiCallMethod(self.__call_list, "%s.%s" % (self.__name, name))
def __call__(self, *args):
self.__call_list.append((self.__name, args))
class MultiCallIterator:
"""Iterates over the results of a multicall. Exceptions are
raised in response to xmlrpc faults."""
def __init__(self, results):
self.results = results
def __getitem__(self, i):
item = self.results[i]
if type(item) == type({}):
raise Fault(item['faultCode'], item['faultString'])
elif type(item) == type([]):
return item[0]
else:
raise ValueError,\
"unexpected type in multicall result"
class MultiCall:
"""server -> an object used to boxcar method calls
server should be a ServerProxy object.
Methods can be added to the MultiCall using normal
method call syntax e.g.:
multicall = MultiCall(server_proxy)
multicall.add(2,3)
multicall.get_address("Guido")
To execute the multicall, call the MultiCall object e.g.:
add_result, address = multicall()
"""
def __init__(self, server):
self.__server = server
self.__call_list = []
def __repr__(self):
return "<MultiCall at %x>" % id(self)
__str__ = __repr__
def __getattr__(self, name):
return _MultiCallMethod(self.__call_list, name)
def __call__(self):
marshalled_list = []
for name, args in self.__call_list:
marshalled_list.append({'methodName' : name, 'params' : args})
return MultiCallIterator(self.__server.system.multicall(marshalled_list))
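# For the docstring example above, the boxcarred payload handed to
# system.multicall would look like:
#   [{'methodName': 'add', 'params': (2, 3)},
#    {'methodName': 'get_address', 'params': ('Guido',)}]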
# --------------------------------------------------------------------
# convenience functions
##
# Create a parser object, and connect it to an unmarshalling instance.
# This function picks the fastest available XML parser.
#
# return A (parser, unmarshaller) tuple.
def getparser(use_datetime=0):
"""getparser() -> parser, unmarshaller
Create an instance of the fastest available parser, and attach it
to an unmarshalling object. Return both objects.
"""
if use_datetime and not datetime:
raise ValueError, "the datetime module is not available"
if FastParser and FastUnmarshaller:
if use_datetime:
mkdatetime = _datetime_type
else:
mkdatetime = _datetime
target = FastUnmarshaller(True, False, _binary, mkdatetime, Fault)
parser = FastParser(target)
else:
target = Unmarshaller(use_datetime=use_datetime)
if FastParser:
parser = FastParser(target)
elif ExpatParser:
parser = ExpatParser(target)
else:
parser = SlowParser(target)
return parser, target
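# Illustrative example (made-up XML): feed a response through whichever
# parser getparser() selected, then collect the result:
#
#   >>> p, u = getparser()
#   >>> p.feed("<methodResponse><params><param>"
#   ...        "<value><int>7</int></value>"
#   ...        "</param></params></methodResponse>")
#   >>> p.close()
#   >>> u.close()
#   (7,)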
##
# Convert a Python tuple or a Fault instance to an XML-RPC packet.
#
# @def dumps(params, **options)
# @param params A tuple or Fault instance.
# @keyparam methodname If given, create a methodCall request for
# this method name.
# @keyparam methodresponse If given, create a methodResponse packet.
# If used with a tuple, the tuple must be a singleton (that is,
# it must contain exactly one element).
# @keyparam encoding The packet encoding.
# @return A string containing marshalled data.
def dumps(params, methodname=None, methodresponse=None, encoding=None,
allow_none=0):
"""data [,options] -> marshalled data
Convert an argument tuple or a Fault instance to an XML-RPC
request (or response, if the methodresponse option is used).
In addition to the data object, the following options can be given
as keyword arguments:
methodname: the method name for a methodCall packet
methodresponse: true to create a methodResponse packet.
If this option is used with a tuple, the tuple must be
a singleton (i.e. it can contain only one element).
encoding: the packet encoding (default is UTF-8)
All 8-bit strings in the data structure are assumed to use the
packet encoding. Unicode strings are automatically converted,
where necessary.
"""
assert isinstance(params, TupleType) or isinstance(params, Fault),\
"argument must be tuple or Fault instance"
if isinstance(params, Fault):
methodresponse = 1
elif methodresponse and isinstance(params, TupleType):
assert len(params) == 1, "response tuple must be a singleton"
if not encoding:
encoding = "utf-8"
if FastMarshaller:
m = FastMarshaller(encoding)
else:
m = Marshaller(encoding, allow_none)
data = m.dumps(params)
if encoding != "utf-8":
xmlheader = "<?xml version='1.0' encoding='%s'?>\n" % str(encoding)
else:
xmlheader = "<?xml version='1.0'?>\n" # utf-8 is default
# standard XML-RPC wrappings
if methodname:
# a method call
if not isinstance(methodname, StringType):
methodname = methodname.encode(encoding, 'xmlcharrefreplace')
data = (
xmlheader,
"<methodCall>\n"
"<methodName>", methodname, "</methodName>\n",
data,
"</methodCall>\n"
)
elif methodresponse:
# a method response, or a fault structure
data = (
xmlheader,
"<methodResponse>\n",
data,
"</methodResponse>\n"
)
else:
return data # return as is
return string.join(data, "")
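# Illustrative example (output abbreviated):
#
#   >>> dumps((2, 3), methodname="add")
#   "<?xml version='1.0'?>\n<methodCall>\n<methodName>add</methodName>\n..."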
##
# Convert an XML-RPC packet to a Python object. If the XML-RPC packet
# represents a fault condition, this function raises a Fault exception.
#
# @param data An XML-RPC packet, given as an 8-bit string.
# @return A tuple containing the unpacked data, and the method name
# (None if not present).
# @see Fault
def loads(data, use_datetime=0):
"""data -> unmarshalled data, method name
Convert an XML-RPC packet to unmarshalled data plus a method
name (None if not present).
If the XML-RPC packet represents a fault condition, this function
raises a Fault exception.
"""
p, u = getparser(use_datetime=use_datetime)
p.feed(data)
p.close()
return u.close(), u.getmethodname()
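# Illustrative example: a round trip through dumps() and loads():
#
#   >>> params, method = loads(dumps((2, 3), methodname="add"))
#   >>> params, method
#   ((2, 3), 'add')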
##
# Encode a string using the gzip content encoding, as specified by the
# Content-Encoding: gzip
# HTTP header and described in RFC 1952.
#
# @param data the unencoded data
# @return the encoded data
def gzip_encode(data):
"""data -> gzip encoded data
Encode data using the gzip content encoding as described in RFC 1952
"""
if not gzip:
raise NotImplementedError
f = StringIO.StringIO()
gzf = gzip.GzipFile(mode="wb", fileobj=f, compresslevel=1)
gzf.write(data)
gzf.close()
encoded = f.getvalue()
f.close()
return encoded
##
# Decode a string using the gzip content encoding, as specified by the
# Content-Encoding: gzip
# HTTP header and described in RFC 1952.
#
# @param data The encoded data
# @keyparam max_decode Maximum bytes to decode (20MB default), use negative
# values for unlimited decoding
# @return the unencoded data
# @raises ValueError if data is not correctly coded.
# @raises ValueError if max gzipped payload length exceeded
def gzip_decode(data, max_decode=20971520):
"""gzip encoded data -> unencoded data
Decode data using the gzip content encoding as described in RFC 1952
"""
if not gzip:
raise NotImplementedError
f = StringIO.StringIO(data)
gzf = gzip.GzipFile(mode="rb", fileobj=f)
try:
if max_decode < 0: # no limit
decoded = gzf.read()
else:
decoded = gzf.read(max_decode + 1)
except IOError:
raise ValueError("invalid data")
f.close()
gzf.close()
if max_decode >= 0 and len(decoded) > max_decode:
raise ValueError("max gzipped payload length exceeded")
return decoded
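# Illustrative example: gzip_encode() and gzip_decode() are inverses
# (both require the optional gzip module):
#
#   >>> gzip_decode(gzip_encode("hello"))
#   'hello'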
##
# Return a decoded file-like object for the gzip encoding
# as described in RFC 1952.
#
# @param response A stream supporting a read() method
# @return a file-like object that the decoded data can be read() from
class GzipDecodedResponse(gzip.GzipFile if gzip else object):
"""a file-like object to decode a response encoded with the gzip
method, as described in RFC 1952.
"""
def __init__(self, response):
#response doesn't support tell() and read(), required by
#GzipFile
if not gzip:
raise NotImplementedError
self.stringio = StringIO.StringIO(response.read())
gzip.GzipFile.__init__(self, mode="rb", fileobj=self.stringio)
def close(self):
try:
gzip.GzipFile.close(self)
finally:
self.stringio.close()
# --------------------------------------------------------------------
# request dispatcher
class _Method:
# some magic to bind an XML-RPC method to an RPC server.
# supports "nested" methods (e.g. examples.getStateName)
def __init__(self, send, name):
self.__send = send
self.__name = name
def __getattr__(self, name):
return _Method(self.__send, "%s.%s" % (self.__name, name))
def __call__(self, *args):
return self.__send(self.__name, args)
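# Illustrative example with a stand-in send function: attribute access
# accumulates the dotted method name before the call is dispatched:
#
#   >>> m = _Method(lambda name, args: (name, args), "examples")
#   >>> m.getStateName(41)
#   ('examples.getStateName', (41,))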
##
# Standard transport class for XML-RPC over HTTP.
# <p>
# You can create custom transports by subclassing this method, and
# overriding selected methods.
class Transport:
"""Handles an HTTP transaction to an XML-RPC server."""
# client identifier (may be overridden)
user_agent = "xmlrpclib.py/%s (by www.pythonware.com)" % __version__
#if true, we'll request gzip encoding
accept_gzip_encoding = True
# if positive, encode request using gzip if it exceeds this threshold
# note that many servers will get confused, so only use it if you know
# that they can decode such a request
encode_threshold = None #None = don't encode
def __init__(self, use_datetime=0):
self._use_datetime = use_datetime
self._connection = (None, None)
self._extra_headers = []
##
# Send a complete request, and parse the response.
# Retry request if a cached connection has disconnected.
#
# @param host Target host.
# @param handler Target RPC handler.
# @param request_body XML-RPC request body.
# @param verbose Debugging flag.
# @return Parsed response.
def request(self, host, handler, request_body, verbose=0):
#retry request once if cached connection has gone cold
for i in (0, 1):
try:
return self.single_request(host, handler, request_body, verbose)
except socket.error, e:
if i or e.errno not in (errno.ECONNRESET, errno.ECONNABORTED, errno.EPIPE):
raise
except httplib.BadStatusLine: #close after we sent request
if i:
raise
##
# Send a complete request, and parse the response.
#
# @param host Target host.
# @param handler Target RPC handler.
# @param request_body XML-RPC request body.
# @param verbose Debugging flag.
# @return Parsed response.
def single_request(self, host, handler, request_body, verbose=0):
# issue XML-RPC request
h = self.make_connection(host)
if verbose:
h.set_debuglevel(1)
try:
self.send_request(h, handler, request_body)
self.send_host(h, host)
self.send_user_agent(h)
self.send_content(h, request_body)
response = h.getresponse(buffering=True)
if response.status == 200:
self.verbose = verbose
return self.parse_response(response)
except Fault:
raise
except Exception:
# All unexpected errors leave connection in
# a strange state, so we clear it.
self.close()
raise
#discard any response data and raise exception
if (response.getheader("content-length", 0)):
response.read()
raise ProtocolError(
host + handler,
response.status, response.reason,
response.msg,
)
##
# Create parser.
#
# @return A 2-tuple containing a parser and an unmarshaller.
def getparser(self):
# get parser and unmarshaller
return getparser(use_datetime=self._use_datetime)
##
# Get authorization info from host parameter
# Host may be a string, or a (host, x509-dict) tuple; if a string,
# it is checked for a "user:pw@host" format, and a "Basic
# Authentication" header is added if appropriate.
#
# @param host Host descriptor (URL or (URL, x509 info) tuple).
# @return A 3-tuple containing (actual host, extra headers,
# x509 info). The header and x509 fields may be None.
def get_host_info(self, host):
x509 = {}
if isinstance(host, TupleType):
host, x509 = host
import urllib
auth, host = urllib.splituser(host)
if auth:
import base64
auth = base64.encodestring(urllib.unquote(auth))
auth = string.join(string.split(auth), "") # get rid of whitespace
extra_headers = [
("Authorization", "Basic " + auth)
]
else:
extra_headers = None
return host, extra_headers, x509
##
# Connect to server.
#
# @param host Target host.
# @return A connection handle.
def make_connection(self, host):
#return an existing connection if possible. This allows
#HTTP/1.1 keep-alive.
if self._connection and host == self._connection[0]:
return self._connection[1]
# create a HTTP connection object from a host descriptor
chost, self._extra_headers, x509 = self.get_host_info(host)
#store the host argument along with the connection object
self._connection = host, httplib.HTTPConnection(chost)
return self._connection[1]
##
# Clear any cached connection object.
# Used in the event of socket errors.
#
def close(self):
host, connection = self._connection
if connection:
self._connection = (None, None)
connection.close()
##
# Send request header.
#
# @param connection Connection handle.
# @param handler Target RPC handler.
# @param request_body XML-RPC body.
def send_request(self, connection, handler, request_body):
if (self.accept_gzip_encoding and gzip):
connection.putrequest("POST", handler, skip_accept_encoding=True)
connection.putheader("Accept-Encoding", "gzip")
else:
connection.putrequest("POST", handler)
##
# Send host name.
#
# @param connection Connection handle.
# @param host Host name.
#
# Note: This function doesn't actually add the "Host"
# header anymore, it is done as part of the connection.putrequest() in
# send_request() above.
def send_host(self, connection, host):
extra_headers = self._extra_headers
if extra_headers:
if isinstance(extra_headers, DictType):
extra_headers = extra_headers.items()
for key, value in extra_headers:
connection.putheader(key, value)
##
# Send user-agent identifier.
#
# @param connection Connection handle.
def send_user_agent(self, connection):
connection.putheader("User-Agent", self.user_agent)
##
# Send request body.
#
# @param connection Connection handle.
# @param request_body XML-RPC request body.
def send_content(self, connection, request_body):
connection.putheader("Content-Type", "text/xml")
#optionally encode the request
if (self.encode_threshold is not None and
self.encode_threshold < len(request_body) and
gzip):
connection.putheader("Content-Encoding", "gzip")
request_body = gzip_encode(request_body)
connection.putheader("Content-Length", str(len(request_body)))
connection.endheaders(request_body)
##
# Parse response.
#
# @param file Stream.
# @return Response tuple and target method.
def parse_response(self, response):
# read response data from httpresponse, and parse it
# Check for new http response object, else it is a file object
if hasattr(response,'getheader'):
if response.getheader("Content-Encoding", "") == "gzip":
stream = GzipDecodedResponse(response)
else:
stream = response
else:
stream = response
p, u = self.getparser()
while 1:
data = stream.read(1024)
if not data:
break
if self.verbose:
print "body:", repr(data)
p.feed(data)
if stream is not response:
stream.close()
p.close()
return u.close()
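# Illustrative sketch of a custom transport: subclass Transport and
# override selected attributes or methods, then pass an instance to
# ServerProxy (defined below). The names here are hypothetical.
#
#   class MyTransport(Transport):
#       user_agent = "my-client/1.0"        # custom User-Agent header
#       accept_gzip_encoding = False        # never ask for gzip
#
#   proxy = ServerProxy("http://localhost:8000", transport=MyTransport())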
##
# Standard transport class for XML-RPC over HTTPS.
class SafeTransport(Transport):
"""Handles an HTTPS transaction to an XML-RPC server."""
def __init__(self, use_datetime=0, context=None):
Transport.__init__(self, use_datetime=use_datetime)
self.context = context
# FIXME: mostly untested
def make_connection(self, host):
if self._connection and host == self._connection[0]:
return self._connection[1]
# create a HTTPS connection object from a host descriptor
# host may be a string, or a (host, x509-dict) tuple
try:
HTTPS = httplib.HTTPSConnection
except AttributeError:
raise NotImplementedError(
"your version of httplib doesn't support HTTPS"
)
else:
chost, self._extra_headers, x509 = self.get_host_info(host)
self._connection = host, HTTPS(chost, None, context=self.context, **(x509 or {}))
return self._connection[1]
##
# Standard server proxy. This class establishes a virtual connection
# to an XML-RPC server.
# <p>
# This class is available as ServerProxy and Server. New code should
# use ServerProxy, to avoid confusion.
#
# @def ServerProxy(uri, **options)
# @param uri The connection point on the server.
# @keyparam transport A transport factory, compatible with the
# standard transport class.
# @keyparam encoding The default encoding used for 8-bit strings
# (default is UTF-8).
# @keyparam verbose Use a true value to enable debugging output.
# (printed to standard output).
# @see Transport
class ServerProxy:
"""uri [,options] -> a logical connection to an XML-RPC server
uri is the connection point on the server, given as
scheme://host/target.
The standard implementation always supports the "http" scheme. If
SSL socket support is available (Python 2.0), it also supports
"https".
If the target part and the slash preceding it are both omitted,
"/RPC2" is assumed.
The following options can be given as keyword arguments:
transport: a transport factory
encoding: the request encoding (default is UTF-8)
All 8-bit strings passed to the server proxy are assumed to use
the given encoding.
"""
def __init__(self, uri, transport=None, encoding=None, verbose=0,
allow_none=0, use_datetime=0, context=None):
# establish a "logical" server connection
if unicode and isinstance(uri, unicode):
uri = uri.encode('ISO-8859-1')
# get the url
import urllib
type, uri = urllib.splittype(uri)
if type not in ("http", "https"):
raise IOError, "unsupported XML-RPC protocol"
self.__host, self.__handler = urllib.splithost(uri)
if not self.__handler:
self.__handler = "/RPC2"
if transport is None:
if type == "https":
transport = SafeTransport(use_datetime=use_datetime, context=context)
else:
transport = Transport(use_datetime=use_datetime)
self.__transport = transport
self.__encoding = encoding
self.__verbose = verbose
self.__allow_none = allow_none
def __close(self):
self.__transport.close()
def __request(self, methodname, params):
# call a method on the remote server
request = dumps(params, methodname, encoding=self.__encoding,
allow_none=self.__allow_none)
response = self.__transport.request(
self.__host,
self.__handler,
request,
verbose=self.__verbose
)
if len(response) == 1:
response = response[0]
return response
def __repr__(self):
return (
"<ServerProxy for %s%s>" %
(self.__host, self.__handler)
)
__str__ = __repr__
def __getattr__(self, name):
# magic method dispatcher
return _Method(self.__request, name)
# note: to call a remote object with a non-standard name, use
# result = getattr(server, "strange-python-name")(args)
def __call__(self, attr):
"""A workaround to get special attributes on the ServerProxy
without interfering with the magic __getattr__
"""
if attr == "close":
return self.__close
elif attr == "transport":
return self.__transport
raise AttributeError("Attribute %r not found" % (attr,))
# compatibility
Server = ServerProxy
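# Illustrative example (placeholder URL; a running server is assumed),
# including the __call__ escape hatch for special attributes:
#
#   >>> proxy = ServerProxy("http://localhost:8000")
#   >>> proxy.add(2, 3)          # dispatched via _Method/__getattr__
#   5
#   >>> proxy("transport")       # the Transport instance in use
#   >>> proxy("close")()         # drop the cached connection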
# --------------------------------------------------------------------
# test code
if __name__ == "__main__":
server = ServerProxy("http://localhost:8000")
print server
multi = MultiCall(server)
multi.pow(2, 9)
multi.add(5, 1)
multi.add(24, 11)
try:
for response in multi():
print response
except Error, v:
print "ERROR", v
"""
Read and write ZIP files.
"""
import struct, os, time, sys, shutil
import binascii, cStringIO, stat
import io
import re
import string
try:
import zlib # We may need its compression method
crc32 = zlib.crc32
except ImportError:
zlib = None
crc32 = binascii.crc32
__all__ = ["BadZipfile", "error", "ZIP_STORED", "ZIP_DEFLATED", "is_zipfile",
"ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile" ]
class BadZipfile(Exception):
pass
class LargeZipFile(Exception):
"""
Raised when a zipfile being written requires ZIP64 extensions
and those extensions are disabled.
"""
error = BadZipfile # The exception raised by this module
ZIP64_LIMIT = (1 << 31) - 1
ZIP_FILECOUNT_LIMIT = (1 << 16) - 1
ZIP_MAX_COMMENT = (1 << 16) - 1
# constants for Zip file compression methods
ZIP_STORED = 0
ZIP_DEFLATED = 8
# Other ZIP compression methods not supported
# Below are some formats and associated data for reading/writing headers using
# the struct module. The names and structures of headers/records are those used
# in the PKWARE description of the ZIP file format:
# http://www.pkware.com/documents/casestudies/APPNOTE.TXT
# (URL valid as of January 2008)
# The "end of central directory" structure, magic number, size, and indices
# (section V.I in the format document)
structEndArchive = "<4s4H2LH"
stringEndArchive = "PK\005\006"
sizeEndCentDir = struct.calcsize(structEndArchive)
_ECD_SIGNATURE = 0
_ECD_DISK_NUMBER = 1
_ECD_DISK_START = 2
_ECD_ENTRIES_THIS_DISK = 3
_ECD_ENTRIES_TOTAL = 4
_ECD_SIZE = 5
_ECD_OFFSET = 6
_ECD_COMMENT_SIZE = 7
# These last two indices are not part of the structure as defined in the
# spec, but they are used internally by this module as a convenience
_ECD_COMMENT = 8
_ECD_LOCATION = 9
# The "central directory" structure, magic number, size, and indices
# of entries in the structure (section V.F in the format document)
structCentralDir = "<4s4B4HL2L5H2L"
stringCentralDir = "PK\001\002"
sizeCentralDir = struct.calcsize(structCentralDir)
# indexes of entries in the central directory structure
_CD_SIGNATURE = 0
_CD_CREATE_VERSION = 1
_CD_CREATE_SYSTEM = 2
_CD_EXTRACT_VERSION = 3
_CD_EXTRACT_SYSTEM = 4
_CD_FLAG_BITS = 5
_CD_COMPRESS_TYPE = 6
_CD_TIME = 7
_CD_DATE = 8
_CD_CRC = 9
_CD_COMPRESSED_SIZE = 10
_CD_UNCOMPRESSED_SIZE = 11
_CD_FILENAME_LENGTH = 12
_CD_EXTRA_FIELD_LENGTH = 13
_CD_COMMENT_LENGTH = 14
_CD_DISK_NUMBER_START = 15
_CD_INTERNAL_FILE_ATTRIBUTES = 16
_CD_EXTERNAL_FILE_ATTRIBUTES = 17
_CD_LOCAL_HEADER_OFFSET = 18
# The "local file header" structure, magic number, size, and indices
# (section V.A in the format document)
structFileHeader = "<4s2B4HL2L2H"
stringFileHeader = "PK\003\004"
sizeFileHeader = struct.calcsize(structFileHeader)
_FH_SIGNATURE = 0
_FH_EXTRACT_VERSION = 1
_FH_EXTRACT_SYSTEM = 2
_FH_GENERAL_PURPOSE_FLAG_BITS = 3
_FH_COMPRESSION_METHOD = 4
_FH_LAST_MOD_TIME = 5
_FH_LAST_MOD_DATE = 6
_FH_CRC = 7
_FH_COMPRESSED_SIZE = 8
_FH_UNCOMPRESSED_SIZE = 9
_FH_FILENAME_LENGTH = 10
_FH_EXTRA_FIELD_LENGTH = 11
# The "Zip64 end of central directory locator" structure, magic number, and size
structEndArchive64Locator = "<4sLQL"
stringEndArchive64Locator = "PK\x06\x07"
sizeEndCentDir64Locator = struct.calcsize(structEndArchive64Locator)
# The "Zip64 end of central directory" record, magic number, size, and indices
# (section V.G in the format document)
structEndArchive64 = "<4sQ2H2L4Q"
stringEndArchive64 = "PK\x06\x06"
sizeEndCentDir64 = struct.calcsize(structEndArchive64)
_CD64_SIGNATURE = 0
_CD64_DIRECTORY_RECSIZE = 1
_CD64_CREATE_VERSION = 2
_CD64_EXTRACT_VERSION = 3
_CD64_DISK_NUMBER = 4
_CD64_DISK_NUMBER_START = 5
_CD64_NUMBER_ENTRIES_THIS_DISK = 6
_CD64_NUMBER_ENTRIES_TOTAL = 7
_CD64_DIRECTORY_SIZE = 8
_CD64_OFFSET_START_CENTDIR = 9
_DD_SIGNATURE = 0x08074b50
_EXTRA_FIELD_STRUCT = struct.Struct('<HH')
def _strip_extra(extra, xids):
# Remove Extra Fields with specified IDs.
unpack = _EXTRA_FIELD_STRUCT.unpack
modified = False
buffer = []
start = i = 0
while i + 4 <= len(extra):
xid, xlen = unpack(extra[i : i + 4])
j = i + 4 + xlen
if xid in xids:
if i != start:
buffer.append(extra[start : i])
start = j
modified = True
i = j
if not modified:
return extra
# keep any data remaining after the last stripped field
buffer.append(extra[start:])
return b''.join(buffer)
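# Illustrative example: strip the ZIP64 extra field (header ID 1) while
# keeping an unrelated field (IDs and payloads are made up):
#
#   >>> zip64 = struct.pack('<HHQ', 1, 8, 0)
#   >>> other = struct.pack('<HH', 0x5455, 0)
#   >>> _strip_extra(zip64 + other, (1,)) == other
#   True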
def _check_zipfile(fp):
try:
if _EndRecData(fp):
return True # file has correct magic number
except IOError:
pass
return False
def is_zipfile(filename):
"""Quickly see if a file is a ZIP file by checking the magic number.
The filename argument may be a file or file-like object too.
"""
result = False
try:
if hasattr(filename, "read"):
result = _check_zipfile(fp=filename)
else:
with open(filename, "rb") as fp:
result = _check_zipfile(fp)
except IOError:
pass
return result
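# Illustrative example (placeholder path); a file object works too:
#
#   >>> is_zipfile("example.zip")
#   True
#   >>> with open("example.zip", "rb") as f:
#   ...     is_zipfile(f)
#   True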
def _EndRecData64(fpin, offset, endrec):
"""
Read the ZIP64 end-of-archive records and use that to update endrec
"""
try:
fpin.seek(offset - sizeEndCentDir64Locator, 2)
except IOError:
# If the seek fails, the file is not large enough to contain a ZIP64
# end-of-archive record, so just return the end record we were given.
return endrec
data = fpin.read(sizeEndCentDir64Locator)
if len(data) != sizeEndCentDir64Locator:
return endrec
sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
if sig != stringEndArchive64Locator:
return endrec
if diskno != 0 or disks != 1:
raise BadZipfile("zipfiles that span multiple disks are not supported")
# Assume no 'zip64 extensible data'
fpin.seek(offset - sizeEndCentDir64Locator - sizeEndCentDir64, 2)
data = fpin.read(sizeEndCentDir64)
if len(data) != sizeEndCentDir64:
return endrec
sig, sz, create_version, read_version, disk_num, disk_dir, \
dircount, dircount2, dirsize, diroffset = \
struct.unpack(structEndArchive64, data)
if sig != stringEndArchive64:
return endrec
# Update the original endrec using data from the ZIP64 record
endrec[_ECD_SIGNATURE] = sig
endrec[_ECD_DISK_NUMBER] = disk_num
endrec[_ECD_DISK_START] = disk_dir
endrec[_ECD_ENTRIES_THIS_DISK] = dircount
endrec[_ECD_ENTRIES_TOTAL] = dircount2
endrec[_ECD_SIZE] = dirsize
endrec[_ECD_OFFSET] = diroffset
return endrec
def _EndRecData(fpin):
"""Return data from the "End of Central Directory" record, or None.
The data is a list of the nine items in the ZIP "End of central dir"
record followed by a tenth item, the file seek offset of this record."""
# Determine file size
fpin.seek(0, 2)
filesize = fpin.tell()
# Check to see if this is ZIP file with no archive comment (the
# "end of central directory" structure should be the last item in the
# file if this is the case).
try:
fpin.seek(-sizeEndCentDir, 2)
except IOError:
return None
data = fpin.read()
if (len(data) == sizeEndCentDir and
data[0:4] == stringEndArchive and
data[-2:] == b"\000\000"):
# the signature is correct and there's no comment, unpack structure
endrec = struct.unpack(structEndArchive, data)
endrec=list(endrec)
# Append a blank comment and record start offset
endrec.append("")
endrec.append(filesize - sizeEndCentDir)
# Try to read the "Zip64 end of central directory" structure
return _EndRecData64(fpin, -sizeEndCentDir, endrec)
# Either this is not a ZIP file, or it is a ZIP file with an archive
# comment. Search the end of the file for the "end of central directory"
# record signature. The comment is the last item in the ZIP file and may be
# up to 64K long. It is assumed that the "end of central directory" magic
# number does not appear in the comment.
maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
fpin.seek(maxCommentStart, 0)
data = fpin.read()
start = data.rfind(stringEndArchive)
if start >= 0:
# found the magic number; attempt to unpack and interpret
recData = data[start:start+sizeEndCentDir]
if len(recData) != sizeEndCentDir:
# Zip file is corrupted.
return None
endrec = list(struct.unpack(structEndArchive, recData))
commentSize = endrec[_ECD_COMMENT_SIZE] #as claimed by the zip file
comment = data[start+sizeEndCentDir:start+sizeEndCentDir+commentSize]
endrec.append(comment)
endrec.append(maxCommentStart + start)
# Try to read the "Zip64 end of central directory" structure
return _EndRecData64(fpin, maxCommentStart + start - filesize,
endrec)
# Unable to find a valid end of central directory structure
return None
class ZipInfo (object):
"""Class with attributes describing each file in the ZIP archive."""
__slots__ = (
'orig_filename',
'filename',
'date_time',
'compress_type',
'comment',
'extra',
'create_system',
'create_version',
'extract_version',
'reserved',
'flag_bits',
'volume',
'internal_attr',
'external_attr',
'header_offset',
'CRC',
'compress_size',
'file_size',
'_raw_time',
)
def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
self.orig_filename = filename # Original file name in archive
# Terminate the file name at the first null byte. Null bytes in file
# names are used as tricks by viruses in archives.
null_byte = filename.find(chr(0))
if null_byte >= 0:
filename = filename[0:null_byte]
# This is used to ensure paths in generated ZIP files always use
# forward slashes as the directory separator, as required by the
# ZIP format specification.
if os.sep != "/" and os.sep in filename:
filename = filename.replace(os.sep, "/")
self.filename = filename # Normalized file name
self.date_time = date_time # year, month, day, hour, min, sec
if date_time[0] < 1980:
raise ValueError('ZIP does not support timestamps before 1980')
# Standard values:
self.compress_type = ZIP_STORED # Type of compression for the file
self.comment = "" # Comment for each file
self.extra = "" # ZIP extra data
if sys.platform == 'win32':
self.create_system = 0 # System which created ZIP archive
else:
# Assume everything else is unix-y
self.create_system = 3 # System which created ZIP archive
self.create_version = 20 # Version which created ZIP archive
self.extract_version = 20 # Version needed to extract archive
self.reserved = 0 # Must be zero
self.flag_bits = 0 # ZIP flag bits
self.volume = 0 # Volume number of file header
self.internal_attr = 0 # Internal attributes
self.external_attr = 0 # External file attributes
# Other attributes are set by class ZipFile:
# header_offset Byte offset to the file header
# CRC CRC-32 of the uncompressed file
# compress_size Size of the compressed file
# file_size Size of the uncompressed file
def FileHeader(self, zip64=None):
"""Return the per-file header as a string."""
dt = self.date_time
dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
if self.flag_bits & 0x08:
# Set these to zero because we write them after the file data
CRC = compress_size = file_size = 0
else:
CRC = self.CRC
compress_size = self.compress_size
file_size = self.file_size
extra = self.extra
if zip64 is None:
zip64 = file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT
if zip64:
fmt = '<HHQQ'
extra = extra + struct.pack(fmt,
1, struct.calcsize(fmt)-4, file_size, compress_size)
if file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT:
if not zip64:
raise LargeZipFile("Filesize would require ZIP64 extensions")
# File is larger than what fits into a 4 byte integer,
# fall back to the ZIP64 extension
file_size = 0xffffffff
compress_size = 0xffffffff
self.extract_version = max(45, self.extract_version)
self.create_version = max(45, self.create_version)
filename, flag_bits = self._encodeFilenameFlags()
header = struct.pack(structFileHeader, stringFileHeader,
self.extract_version, self.reserved, flag_bits,
self.compress_type, dostime, dosdate, CRC,
compress_size, file_size,
len(filename), len(extra))
return header + filename + extra
def _encodeFilenameFlags(self):
if isinstance(self.filename, unicode):
try:
return self.filename.encode('ascii'), self.flag_bits
except UnicodeEncodeError:
return self.filename.encode('utf-8'), self.flag_bits | 0x800
else:
return self.filename, self.flag_bits
def _decodeFilename(self):
if self.flag_bits & 0x800:
return self.filename.decode('utf-8')
else:
return self.filename
def _decodeExtra(self):
# Try to decode the extra field.
extra = self.extra
unpack = struct.unpack
while len(extra) >= 4:
tp, ln = unpack('<HH', extra[:4])
if tp == 1:
if ln >= 24:
counts = unpack('<QQQ', extra[4:28])
elif ln == 16:
counts = unpack('<QQ', extra[4:20])
elif ln == 8:
counts = unpack('<Q', extra[4:12])
elif ln == 0:
counts = ()
else:
raise RuntimeError, "Corrupt extra field %s"%(ln,)
idx = 0
# ZIP64 extension (large files and/or large archives)
if self.file_size in (0xffffffffffffffffL, 0xffffffffL):
self.file_size = counts[idx]
idx += 1
if self.compress_size == 0xFFFFFFFFL:
self.compress_size = counts[idx]
idx += 1
if self.header_offset == 0xffffffffL:
old = self.header_offset
self.header_offset = counts[idx]
idx+=1
extra = extra[ln+4:]
class _ZipDecrypter:
"""Class to handle decryption of files stored within a ZIP archive.
ZIP supports a password-based form of encryption. Even though known
plaintext attacks have been found against it, it is still useful
to be able to get data out of such a file.
Usage:
zd = _ZipDecrypter(mypwd)
plain_char = zd(cypher_char)
plain_text = map(zd, cypher_text)
"""
def _GenerateCRCTable():
"""Generate a CRC-32 table.
ZIP encryption uses the CRC32 one-byte primitive for scrambling some
internal keys. We noticed that a direct implementation is faster than
relying on binascii.crc32().
"""
poly = 0xedb88320
table = [0] * 256
for i in range(256):
crc = i
for j in range(8):
if crc & 1:
crc = ((crc >> 1) & 0x7FFFFFFF) ^ poly
else:
crc = ((crc >> 1) & 0x7FFFFFFF)
table[i] = crc
return table
crctable = _GenerateCRCTable()
def _crc32(self, ch, crc):
"""Compute the CRC32 primitive on one byte."""
return ((crc >> 8) & 0xffffff) ^ self.crctable[(crc ^ ord(ch)) & 0xff]
def __init__(self, pwd):
self.key0 = 305419896
self.key1 = 591751049
self.key2 = 878082192
for p in pwd:
self._UpdateKeys(p)
def _UpdateKeys(self, c):
self.key0 = self._crc32(c, self.key0)
self.key1 = (self.key1 + (self.key0 & 255)) & 4294967295
self.key1 = (self.key1 * 134775813 + 1) & 4294967295
self.key2 = self._crc32(chr((self.key1 >> 24) & 255), self.key2)
def __call__(self, c):
"""Decrypt a single character."""
c = ord(c)
k = self.key2 | 2
c = c ^ (((k * (k^1)) >> 8) & 255)
c = chr(c)
self._UpdateKeys(c)
return c
compressor_names = {
0: 'store',
1: 'shrink',
2: 'reduce',
3: 'reduce',
4: 'reduce',
5: 'reduce',
6: 'implode',
7: 'tokenize',
8: 'deflate',
9: 'deflate64',
10: 'implode',
12: 'bzip2',
14: 'lzma',
18: 'terse',
19: 'lz77',
97: 'wavpack',
98: 'ppmd',
}
class ZipExtFile(io.BufferedIOBase):
"""File-like object for reading an archive member.
Returned by ZipFile.open().
"""
# Max size supported by decompressor. Note that "-" binds tighter than
# "<<", so this is effectively 1 << 30, not (1 << 31) - 1.
MAX_N = 1 << 31 - 1
# Read from compressed files in 4k blocks.
MIN_READ_SIZE = 4096
# Search for universal newlines or line chunks.
PATTERN = re.compile(r'^(?P<chunk>[^\r\n]+)|(?P<newline>\n|\r\n?)')
def __init__(self, fileobj, mode, zipinfo, decrypter=None,
close_fileobj=False):
self._fileobj = fileobj
self._decrypter = decrypter
self._close_fileobj = close_fileobj
self._compress_type = zipinfo.compress_type
self._compress_size = zipinfo.compress_size
self._compress_left = zipinfo.compress_size
if self._compress_type == ZIP_DEFLATED:
self._decompressor = zlib.decompressobj(-15)
elif self._compress_type != ZIP_STORED:
descr = compressor_names.get(self._compress_type)
if descr:
raise NotImplementedError("compression type %d (%s)" % (self._compress_type, descr))
else:
raise NotImplementedError("compression type %d" % (self._compress_type,))
self._unconsumed = ''
self._readbuffer = ''
self._offset = 0
self._universal = 'U' in mode
self.newlines = None
# Adjust read size for encrypted files since the first 12 bytes
# are for the encryption/password information.
if self._decrypter is not None:
self._compress_left -= 12
self.mode = mode
self.name = zipinfo.filename
if hasattr(zipinfo, 'CRC'):
self._expected_crc = zipinfo.CRC
self._running_crc = crc32(b'') & 0xffffffff
else:
self._expected_crc = None
def readline(self, limit=-1):
"""Read and return a line from the stream.
If limit is specified, at most limit bytes will be read.
"""
if not self._universal and limit < 0:
# Shortcut common case - newline found in buffer.
i = self._readbuffer.find('\n', self._offset) + 1
if i > 0:
line = self._readbuffer[self._offset: i]
self._offset = i
return line
if not self._universal:
return io.BufferedIOBase.readline(self, limit)
line = ''
while limit < 0 or len(line) < limit:
readahead = self.peek(2)
if readahead == '':
return line
#
# Search for universal newlines or line chunks.
#
# The pattern returns either a line chunk or a newline, but not
# both. Combined with peek(2), we are assured that the sequence
# '\r\n' is always retrieved completely and never split into
# separate newlines - '\r', '\n' due to coincidental readaheads.
#
match = self.PATTERN.search(readahead)
newline = match.group('newline')
if newline is not None:
if self.newlines is None:
self.newlines = []
if newline not in self.newlines:
self.newlines.append(newline)
self._offset += len(newline)
return line + '\n'
chunk = match.group('chunk')
if limit >= 0:
chunk = chunk[: limit - len(line)]
self._offset += len(chunk)
line += chunk
return line
def peek(self, n=1):
"""Returns buffered bytes without advancing the position."""
if n > len(self._readbuffer) - self._offset:
chunk = self.read(n)
if len(chunk) > self._offset:
self._readbuffer = chunk + self._readbuffer[self._offset:]
self._offset = 0
else:
self._offset -= len(chunk)
# Return up to 512 bytes to reduce allocation overhead for tight loops.
return self._readbuffer[self._offset: self._offset + 512]
def readable(self):
return True
def read(self, n=-1):
"""Read and return up to n bytes.
If the argument is omitted, None, or negative, data is read and returned until EOF is reached.
"""
buf = ''
if n is None:
n = -1
while True:
if n < 0:
data = self.read1(n)
elif n > len(buf):
data = self.read1(n - len(buf))
else:
return buf
if len(data) == 0:
return buf
buf += data
def _update_crc(self, newdata, eof):
# Update the CRC using the given data.
if self._expected_crc is None:
# No need to compute the CRC if we don't have a reference value
return
self._running_crc = crc32(newdata, self._running_crc) & 0xffffffff
# Check the CRC if we're at the end of the file
if eof and self._running_crc != self._expected_crc:
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
def read1(self, n):
"""Read up to n bytes with at most one read() system call."""
# Simplify algorithm (branching) by transforming negative n to large n.
if n < 0 or n is None:
n = self.MAX_N
# Bytes available in read buffer.
len_readbuffer = len(self._readbuffer) - self._offset
# Read from file.
if self._compress_left > 0 and n > len_readbuffer + len(self._unconsumed):
nbytes = n - len_readbuffer - len(self._unconsumed)
nbytes = max(nbytes, self.MIN_READ_SIZE)
nbytes = min(nbytes, self._compress_left)
data = self._fileobj.read(nbytes)
self._compress_left -= len(data)
if data and self._decrypter is not None:
data = ''.join(map(self._decrypter, data))
if self._compress_type == ZIP_STORED:
self._update_crc(data, eof=(self._compress_left==0))
self._readbuffer = self._readbuffer[self._offset:] + data
self._offset = 0
else:
# Prepare deflated bytes for decompression.
self._unconsumed += data
# Handle unconsumed data.
if (len(self._unconsumed) > 0 and n > len_readbuffer and
self._compress_type == ZIP_DEFLATED):
data = self._decompressor.decompress(
self._unconsumed,
max(n - len_readbuffer, self.MIN_READ_SIZE)
)
self._unconsumed = self._decompressor.unconsumed_tail
eof = len(self._unconsumed) == 0 and self._compress_left == 0
if eof:
data += self._decompressor.flush()
self._update_crc(data, eof=eof)
self._readbuffer = self._readbuffer[self._offset:] + data
self._offset = 0
# Read from buffer.
data = self._readbuffer[self._offset: self._offset + n]
self._offset += len(data)
return data
def close(self):
try:
if self._close_fileobj:
self._fileobj.close()
finally:
super(ZipExtFile, self).close()
class ZipFile(object):
""" Class with methods to open, read, write, close, list zip files.
z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
file: Either the path to the file, or a file-like object.
If it is a path, the file will be opened and closed by ZipFile.
mode: The mode can be either read "r", write "w" or append "a".
compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
allowZip64: if True ZipFile will create files with ZIP64 extensions when
needed, otherwise it will raise an exception when this would
be necessary.
"""
fp = None # Set here since __del__ checks it
def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
"""Open the ZIP file with mode read "r", write "w" or append "a"."""
if mode not in ("r", "w", "a"):
raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
if compression == ZIP_STORED:
pass
elif compression == ZIP_DEFLATED:
if not zlib:
raise RuntimeError,\
"Compression requires the (missing) zlib module"
else:
raise RuntimeError, "That compression method is not supported"
self._allowZip64 = allowZip64
self._didModify = False
self.debug = 0 # Level of printing: 0 through 3
self.NameToInfo = {} # Find file info given name
self.filelist = [] # List of ZipInfo instances for archive
self.compression = compression # Method of compression
self.mode = key = mode.replace('b', '')[0]
self.pwd = None
self._comment = ''
# Check if we were passed a file-like object
if isinstance(file, basestring):
self._filePassed = 0
self.filename = file
modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
try:
self.fp = open(file, modeDict[mode])
except IOError:
if mode == 'a':
mode = key = 'w'
self.fp = open(file, modeDict[mode])
else:
raise
else:
self._filePassed = 1
self.fp = file
self.filename = getattr(file, 'name', None)
try:
if key == 'r':
self._RealGetContents()
elif key == 'w':
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
elif key == 'a':
try:
# See if file is a zip file
self._RealGetContents()
# seek to start of directory and overwrite
self.fp.seek(self.start_dir, 0)
except BadZipfile:
# file is not a zip file, just append
self.fp.seek(0, 2)
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
else:
raise RuntimeError('Mode must be "r", "w" or "a"')
except:
fp = self.fp
self.fp = None
if not self._filePassed:
fp.close()
raise
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.close()
def _RealGetContents(self):
"""Read in the table of contents for the ZIP file."""
fp = self.fp
try:
endrec = _EndRecData(fp)
except IOError:
raise BadZipfile("File is not a zip file")
if not endrec:
raise BadZipfile, "File is not a zip file"
if self.debug > 1:
print endrec
size_cd = endrec[_ECD_SIZE] # bytes in central directory
offset_cd = endrec[_ECD_OFFSET] # offset of central directory
self._comment = endrec[_ECD_COMMENT] # archive comment
# "concat" is zero, unless zip was concatenated to another file
concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
if endrec[_ECD_SIGNATURE] == stringEndArchive64:
# If Zip64 extension structures are present, account for them
concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
if self.debug > 2:
inferred = concat + offset_cd
print "given, inferred, offset", offset_cd, inferred, concat
# self.start_dir: Position of start of central directory
self.start_dir = offset_cd + concat
fp.seek(self.start_dir, 0)
data = fp.read(size_cd)
fp = cStringIO.StringIO(data)
total = 0
while total < size_cd:
centdir = fp.read(sizeCentralDir)
if len(centdir) != sizeCentralDir:
raise BadZipfile("Truncated central directory")
centdir = struct.unpack(structCentralDir, centdir)
if centdir[_CD_SIGNATURE] != stringCentralDir:
raise BadZipfile("Bad magic number for central directory")
if self.debug > 2:
print centdir
filename = fp.read(centdir[_CD_FILENAME_LENGTH])
# Create ZipInfo instance to store file information
x = ZipInfo(filename)
x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
(x.create_version, x.create_system, x.extract_version, x.reserved,
x.flag_bits, x.compress_type, t, d,
x.CRC, x.compress_size, x.file_size) = centdir[1:12]
x.volume, x.internal_attr, x.external_attr = centdir[15:18]
# Convert date/time code to (year, month, day, hour, min, sec)
x._raw_time = t
x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
x._decodeExtra()
x.header_offset = x.header_offset + concat
x.filename = x._decodeFilename()
self.filelist.append(x)
self.NameToInfo[x.filename] = x
# update total bytes read from central directory
total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
+ centdir[_CD_EXTRA_FIELD_LENGTH]
+ centdir[_CD_COMMENT_LENGTH])
if self.debug > 2:
print "total", total
def namelist(self):
"""Return a list of file names in the archive."""
l = []
for data in self.filelist:
l.append(data.filename)
return l
def infolist(self):
"""Return a list of class ZipInfo instances for files in the
archive."""
return self.filelist
def printdir(self):
"""Print a table of contents for the zip file."""
print "%-46s %19s %12s" % ("File Name", "Modified ", "Size")
for zinfo in self.filelist:
date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)
def testzip(self):
"""Read all the files and check the CRC."""
chunk_size = 2 ** 20
for zinfo in self.filelist:
try:
# Read by chunks, to avoid an OverflowError or a
# MemoryError with very large embedded files.
with self.open(zinfo.filename, "r") as f:
while f.read(chunk_size): # Check CRC-32
pass
except BadZipfile:
return zinfo.filename
def getinfo(self, name):
"""Return the instance of ZipInfo given 'name'."""
info = self.NameToInfo.get(name)
if info is None:
raise KeyError(
'There is no item named %r in the archive' % name)
return info
def setpassword(self, pwd):
"""Set default password for encrypted files."""
self.pwd = pwd
@property
def comment(self):
"""The comment text associated with the ZIP file."""
return self._comment
@comment.setter
def comment(self, comment):
# check for valid comment length
if len(comment) > ZIP_MAX_COMMENT:
import warnings
warnings.warn('Archive comment is too long; truncating to %d bytes'
% ZIP_MAX_COMMENT, stacklevel=2)
comment = comment[:ZIP_MAX_COMMENT]
self._comment = comment
self._didModify = True
def read(self, name, pwd=None):
"""Return file bytes (as a string) for name."""
return self.open(name, "r", pwd).read()
def open(self, name, mode="r", pwd=None):
"""Return file-like object for 'name'."""
if mode not in ("r", "U", "rU"):
raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
if not self.fp:
raise RuntimeError, \
"Attempt to read ZIP archive that was already closed"
# Only open a new file for instances where we were not
# given a file object in the constructor
if self._filePassed:
zef_file = self.fp
should_close = False
else:
zef_file = open(self.filename, 'rb')
should_close = True
try:
# Make sure we have an info object
if isinstance(name, ZipInfo):
# 'name' is already an info object
zinfo = name
else:
# Get info object for name
zinfo = self.getinfo(name)
zef_file.seek(zinfo.header_offset, 0)
# Skip the file header:
fheader = zef_file.read(sizeFileHeader)
if len(fheader) != sizeFileHeader:
raise BadZipfile("Truncated file header")
fheader = struct.unpack(structFileHeader, fheader)
if fheader[_FH_SIGNATURE] != stringFileHeader:
raise BadZipfile("Bad magic number for file header")
fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
if fheader[_FH_EXTRA_FIELD_LENGTH]:
zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
if fname != zinfo.orig_filename:
raise BadZipfile, \
'File name in directory "%s" and header "%s" differ.' % (
zinfo.orig_filename, fname)
# check for encrypted flag & handle password
is_encrypted = zinfo.flag_bits & 0x1
zd = None
if is_encrypted:
if not pwd:
pwd = self.pwd
if not pwd:
raise RuntimeError, "File %s is encrypted, " \
"password required for extraction" % name
zd = _ZipDecrypter(pwd)
# The first 12 bytes in the cypher stream is an encryption header
# used to strengthen the algorithm. The first 11 bytes are
# completely random, while the 12th contains the MSB of the CRC,
# or the MSB of the file time depending on the header type
# and is used to check the correctness of the password.
bytes = zef_file.read(12)
h = map(zd, bytes[0:12])
if zinfo.flag_bits & 0x8:
# compare against the file type from extended local headers
check_byte = (zinfo._raw_time >> 8) & 0xff
else:
# compare against the CRC otherwise
check_byte = (zinfo.CRC >> 24) & 0xff
if ord(h[11]) != check_byte:
raise RuntimeError("Bad password for file", name)
return ZipExtFile(zef_file, mode, zinfo, zd,
close_fileobj=should_close)
except:
if should_close:
zef_file.close()
raise
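# Illustrative example (placeholder names): read a member outright or
# stream it through the file-like object returned by open():
#
#   >>> zf = ZipFile("example.zip")
#   >>> zf.read("notes.txt")
#   'hello\n'
#   >>> zf.open("notes.txt").readline()
#   'hello\n'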
def extract(self, member, path=None, pwd=None):
"""Extract a member from the archive to the current working directory,
using its full name. Its file information is extracted as accurately
as possible. `member' may be a filename or a ZipInfo object. You can
specify a different directory using `path'.
"""
if not isinstance(member, ZipInfo):
member = self.getinfo(member)
if path is None:
path = os.getcwd()
return self._extract_member(member, path, pwd)
def extractall(self, path=None, members=None, pwd=None):
"""Extract all members from the archive to the current working
directory. `path' specifies a different directory to extract to.
`members' is optional and must be a subset of the list returned
by namelist().
"""
if members is None:
members = self.namelist()
for zipinfo in members:
self.extract(zipinfo, path, pwd)
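# Illustrative example (placeholder paths): extract one member or the
# whole archive; extract() returns the path it created:
#
#   >>> zf = ZipFile("example.zip")
#   >>> zf.extract("notes.txt", path="out")
#   'out/notes.txt'
#   >>> zf.extractall("out")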
def _extract_member(self, member, targetpath, pwd):
"""Extract the ZipInfo object 'member' to a physical
file on the path targetpath.
"""
# build the destination pathname, replacing
# forward slashes to platform specific separators.
arcname = member.filename.replace('/', os.path.sep)
if os.path.altsep:
arcname = arcname.replace(os.path.altsep, os.path.sep)
# interpret absolute pathname as relative, remove drive letter or
# UNC path, redundant separators, "." and ".." components.
arcname = os.path.splitdrive(arcname)[1]
arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
if x not in ('', os.path.curdir, os.path.pardir))
if os.path.sep == '\\':
# filter illegal characters on Windows
illegal = ':<>|"?*'
if isinstance(arcname, unicode):
table = {ord(c): ord('_') for c in illegal}
else:
table = string.maketrans(illegal, '_' * len(illegal))
arcname = arcname.translate(table)
# remove trailing dots
arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
arcname = os.path.sep.join(x for x in arcname if x)
targetpath = os.path.join(targetpath, arcname)
targetpath = os.path.normpath(targetpath)
# Create all upper directories if necessary.
upperdirs = os.path.dirname(targetpath)
if upperdirs and not os.path.exists(upperdirs):
os.makedirs(upperdirs)
if member.filename[-1] == '/':
if not os.path.isdir(targetpath):
os.mkdir(targetpath)
return targetpath
with self.open(member, pwd=pwd) as source, \
file(targetpath, "wb") as target:
shutil.copyfileobj(source, target)
return targetpath
def _writecheck(self, zinfo):
"""Check for errors before writing a file to the archive."""
if zinfo.filename in self.NameToInfo:
import warnings
warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
if self.mode not in ("w", "a"):
raise RuntimeError, 'write() requires mode "w" or "a"'
if not self.fp:
raise RuntimeError, \
"Attempt to write ZIP archive that was already closed"
if zinfo.compress_type == ZIP_DEFLATED and not zlib:
raise RuntimeError, \
"Compression requires the (missing) zlib module"
if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
raise RuntimeError, \
"That compression method is not supported"
if not self._allowZip64:
requires_zip64 = None
if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
requires_zip64 = "Files count"
elif zinfo.file_size > ZIP64_LIMIT:
requires_zip64 = "Filesize"
elif zinfo.header_offset > ZIP64_LIMIT:
requires_zip64 = "Zipfile size"
if requires_zip64:
raise LargeZipFile(requires_zip64 +
" would require ZIP64 extensions")
def write(self, filename, arcname=None, compress_type=None):
"""Put the bytes from filename into the archive under the name
arcname."""
if not self.fp:
raise RuntimeError(
"Attempt to write to ZIP archive that was already closed")
st = os.stat(filename)
isdir = stat.S_ISDIR(st.st_mode)
mtime = time.localtime(st.st_mtime)
date_time = mtime[0:6]
# Create ZipInfo instance to store file information
if arcname is None:
arcname = filename
arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
while arcname[0] in (os.sep, os.altsep):
arcname = arcname[1:]
if isdir:
arcname += '/'
zinfo = ZipInfo(arcname, date_time)
zinfo.external_attr = (st[0] & 0xFFFF) << 16L # Unix attributes
if isdir:
zinfo.compress_type = ZIP_STORED
elif compress_type is None:
zinfo.compress_type = self.compression
else:
zinfo.compress_type = compress_type
zinfo.file_size = st.st_size
zinfo.flag_bits = 0x00
zinfo.header_offset = self.fp.tell() # Start of header bytes
self._writecheck(zinfo)
self._didModify = True
if isdir:
zinfo.file_size = 0
zinfo.compress_size = 0
zinfo.CRC = 0
zinfo.external_attr |= 0x10 # MS-DOS directory flag
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
self.fp.write(zinfo.FileHeader(False))
return
with open(filename, "rb") as fp:
# Must overwrite CRC and sizes with correct data later
zinfo.CRC = CRC = 0
zinfo.compress_size = compress_size = 0
# Compressed size can be larger than uncompressed size
zip64 = self._allowZip64 and \
zinfo.file_size * 1.05 > ZIP64_LIMIT
self.fp.write(zinfo.FileHeader(zip64))
if zinfo.compress_type == ZIP_DEFLATED:
cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
zlib.DEFLATED, -15)
else:
cmpr = None
file_size = 0
while 1:
buf = fp.read(1024 * 8)
if not buf:
break
file_size = file_size + len(buf)
CRC = crc32(buf, CRC) & 0xffffffff
if cmpr:
buf = cmpr.compress(buf)
compress_size = compress_size + len(buf)
self.fp.write(buf)
if cmpr:
buf = cmpr.flush()
compress_size = compress_size + len(buf)
self.fp.write(buf)
zinfo.compress_size = compress_size
else:
zinfo.compress_size = file_size
zinfo.CRC = CRC
zinfo.file_size = file_size
if not zip64 and self._allowZip64:
if file_size > ZIP64_LIMIT:
raise RuntimeError('File size has increased during compressing')
if compress_size > ZIP64_LIMIT:
raise RuntimeError('Compressed size larger than uncompressed size')
# Seek backwards and write file header (which will now include
# correct CRC and file sizes)
position = self.fp.tell() # Preserve current position in file
self.fp.seek(zinfo.header_offset, 0)
self.fp.write(zinfo.FileHeader(zip64))
self.fp.seek(position, 0)
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
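# A minimal usage sketch (illustrative, not part of the module): write()
# emits the local file header first with zeroed CRC/sizes, streams the data
# through the compressor in 8 KiB chunks, then seeks back and rewrites the
# header once the real values are known. "notes.txt" is a hypothetical file.
def _example_write(src="notes.txt"):
    import zipfile
    with zipfile.ZipFile("demo.zip", "w") as zf:
        zf.write(src, arcname="docs/notes.txt",
                 compress_type=zipfile.ZIP_DEFLATED)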
def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
"""Write a file into the archive. The contents is the string
'bytes'. 'zinfo_or_arcname' is either a ZipInfo instance or
the name of the file in the archive."""
if not isinstance(zinfo_or_arcname, ZipInfo):
zinfo = ZipInfo(filename=zinfo_or_arcname,
date_time=time.localtime(time.time())[:6])
zinfo.compress_type = self.compression
if zinfo.filename[-1] == '/':
zinfo.external_attr = 0o40775 << 16 # drwxrwxr-x
zinfo.external_attr |= 0x10 # MS-DOS directory flag
else:
zinfo.external_attr = 0o600 << 16 # ?rw-------
else:
zinfo = zinfo_or_arcname
if not self.fp:
raise RuntimeError(
"Attempt to write to ZIP archive that was already closed")
if compress_type is not None:
zinfo.compress_type = compress_type
zinfo.file_size = len(bytes) # Uncompressed size
zinfo.header_offset = self.fp.tell() # Start of header bytes
self._writecheck(zinfo)
self._didModify = True
zinfo.CRC = crc32(bytes) & 0xffffffff # CRC-32 checksum
if zinfo.compress_type == ZIP_DEFLATED:
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
zlib.DEFLATED, -15)
bytes = co.compress(bytes) + co.flush()
zinfo.compress_size = len(bytes) # Compressed size
else:
zinfo.compress_size = zinfo.file_size
zip64 = zinfo.file_size > ZIP64_LIMIT or \
zinfo.compress_size > ZIP64_LIMIT
if zip64 and not self._allowZip64:
raise LargeZipFile("Filesize would require ZIP64 extensions")
self.fp.write(zinfo.FileHeader(zip64))
self.fp.write(bytes)
if zinfo.flag_bits & 0x08:
# Write CRC and file sizes after the file data
fmt = '<LLQQ' if zip64 else '<LLLL'
self.fp.write(struct.pack(fmt, _DD_SIGNATURE, zinfo.CRC,
zinfo.compress_size, zinfo.file_size))
self.fp.flush()
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
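# A minimal usage sketch (illustrative, not part of the module): writestr()
# takes either a bare archive name or a preconfigured ZipInfo; because the
# data is already in memory, CRC and sizes are known up front and no header
# rewrite is needed.
def _example_writestr():
    import time
    import zipfile
    with zipfile.ZipFile("demo.zip", "w") as zf:
        zf.writestr("greeting.txt", b"hello")  # bare arcname
        zi = zipfile.ZipInfo("data/blob.bin",
                             date_time=time.localtime()[:6])
        zi.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(zi, b"\x00" * 128)  # explicit ZipInfo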
def __del__(self):
"""Call the "close()" method in case the user forgot."""
self.close()
def close(self):
"""Close the file, and for mode "w" and "a" write the ending
records."""
if self.fp is None:
return
try:
if self.mode in ("w", "a") and self._didModify: # write ending records
pos1 = self.fp.tell()
for zinfo in self.filelist: # write central directory
dt = zinfo.date_time
dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
extra = []
if zinfo.file_size > ZIP64_LIMIT \
or zinfo.compress_size > ZIP64_LIMIT:
extra.append(zinfo.file_size)
extra.append(zinfo.compress_size)
file_size = 0xffffffff
compress_size = 0xffffffff
else:
file_size = zinfo.file_size
compress_size = zinfo.compress_size
if zinfo.header_offset > ZIP64_LIMIT:
extra.append(zinfo.header_offset)
header_offset = 0xffffffffL
else:
header_offset = zinfo.header_offset
extra_data = zinfo.extra
if extra:
# Append a ZIP64 field to the extra's
extra_data = _strip_extra(extra_data, (1,))
extra_data = struct.pack(
'<HH' + 'Q'*len(extra),
1, 8*len(extra), *extra) + extra_data
extract_version = max(45, zinfo.extract_version)
create_version = max(45, zinfo.create_version)
else:
extract_version = zinfo.extract_version
create_version = zinfo.create_version
try:
filename, flag_bits = zinfo._encodeFilenameFlags()
centdir = struct.pack(structCentralDir,
stringCentralDir, create_version,
zinfo.create_system, extract_version, zinfo.reserved,
flag_bits, zinfo.compress_type, dostime, dosdate,
zinfo.CRC, compress_size, file_size,
len(filename), len(extra_data), len(zinfo.comment),
0, zinfo.internal_attr, zinfo.external_attr,
header_offset)
except DeprecationWarning:
print >>sys.stderr, (structCentralDir,
stringCentralDir, create_version,
zinfo.create_system, extract_version, zinfo.reserved,
zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
zinfo.CRC, compress_size, file_size,
len(zinfo.filename), len(extra_data), len(zinfo.comment),
0, zinfo.internal_attr, zinfo.external_attr,
header_offset)
raise
self.fp.write(centdir)
self.fp.write(filename)
self.fp.write(extra_data)
self.fp.write(zinfo.comment)
pos2 = self.fp.tell()
# Write end-of-zip-archive record
centDirCount = len(self.filelist)
centDirSize = pos2 - pos1
centDirOffset = pos1
requires_zip64 = None
if centDirCount > ZIP_FILECOUNT_LIMIT:
requires_zip64 = "Files count"
elif centDirOffset > ZIP64_LIMIT:
requires_zip64 = "Central directory offset"
elif centDirSize > ZIP64_LIMIT:
requires_zip64 = "Central directory size"
if requires_zip64:
# Need to write the ZIP64 end-of-archive records
if not self._allowZip64:
raise LargeZipFile(requires_zip64 +
" would require ZIP64 extensions")
zip64endrec = struct.pack(
structEndArchive64, stringEndArchive64,
44, 45, 45, 0, 0, centDirCount, centDirCount,
centDirSize, centDirOffset)
self.fp.write(zip64endrec)
zip64locrec = struct.pack(
structEndArchive64Locator,
stringEndArchive64Locator, 0, pos2, 1)
self.fp.write(zip64locrec)
centDirCount = min(centDirCount, 0xFFFF)
centDirSize = min(centDirSize, 0xFFFFFFFF)
centDirOffset = min(centDirOffset, 0xFFFFFFFF)
endrec = struct.pack(structEndArchive, stringEndArchive,
0, 0, centDirCount, centDirCount,
centDirSize, centDirOffset, len(self._comment))
self.fp.write(endrec)
self.fp.write(self._comment)
self.fp.flush()
finally:
fp = self.fp
self.fp = None
if not self._filePassed:
fp.close()
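# A short sketch (illustrative, not part of the module): the central
# directory and end-of-archive records are only written by close(), so an
# archive created in mode "w" is unreadable until it is closed. The with
# statement guarantees close() runs even if an exception is raised.
def _example_always_closed():
    import zipfile
    with zipfile.ZipFile("demo.zip", "w") as zf:
        zf.writestr("a.txt", b"payload")
    # the central directory was written here, by close() via __exit__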
class PyZipFile(ZipFile):
"""Class to create ZIP archives with Python library files and packages."""
def writepy(self, pathname, basename = ""):
"""Add all files from "pathname" to the ZIP archive.
If pathname is a package directory, search the directory and
all package subdirectories recursively for all *.py files and
add those modules to the archive. If pathname is a plain
directory, add the *.py files found by listdir(). Otherwise,
pathname must be a single Python *.py file and that module
will be put into the archive. Added modules are always stored
as module.pyo or module.pyc. This method will compile
module.py into module.pyc if necessary.
"""
dir, name = os.path.split(pathname)
if os.path.isdir(pathname):
initname = os.path.join(pathname, "__init__.py")
if os.path.isfile(initname):
# This is a package directory, add it
if basename:
basename = "%s/%s" % (basename, name)
else:
basename = name
if self.debug:
print "Adding package in", pathname, "as", basename
fname, arcname = self._get_codename(initname[0:-3], basename)
if self.debug:
print "Adding", arcname
self.write(fname, arcname)
dirlist = os.listdir(pathname)
dirlist.remove("__init__.py")
# Add all *.py files and package subdirectories
for filename in dirlist:
path = os.path.join(pathname, filename)
root, ext = os.path.splitext(filename)
if os.path.isdir(path):
if os.path.isfile(os.path.join(path, "__init__.py")):
# This is a package directory, add it
self.writepy(path, basename) # Recursive call
elif ext == ".py":
fname, arcname = self._get_codename(path[0:-3],
basename)
if self.debug:
print "Adding", arcname
self.write(fname, arcname)
else:
# This is NOT a package directory, add its files at top level
if self.debug:
print "Adding files from directory", pathname
for filename in os.listdir(pathname):
path = os.path.join(pathname, filename)
root, ext = os.path.splitext(filename)
if ext == ".py":
fname, arcname = self._get_codename(path[0:-3],
basename)
if self.debug:
print "Adding", arcname
self.write(fname, arcname)
else:
if pathname[-3:] != ".py":
raise RuntimeError, \
'Files added with writepy() must end with ".py"'
fname, arcname = self._get_codename(pathname[0:-3], basename)
if self.debug:
print "Adding file", arcname
self.write(fname, arcname)
def _get_codename(self, pathname, basename):
"""Return (filename, archivename) for the path.
Given a module name path, return the correct file path and
archive name, compiling if necessary. For example, given
/python/lib/string, return (/python/lib/string.pyc, string).
"""
file_py = pathname + ".py"
file_pyc = pathname + ".pyc"
file_pyo = pathname + ".pyo"
if os.path.isfile(file_pyo) and \
os.stat(file_pyo).st_mtime >= os.stat(file_py).st_mtime:
fname = file_pyo # Use .pyo file
elif not os.path.isfile(file_pyc) or \
os.stat(file_pyc).st_mtime < os.stat(file_py).st_mtime:
import py_compile
if self.debug:
print "Compiling", file_py
try:
py_compile.compile(file_py, file_pyc, None, True)
except py_compile.PyCompileError,err:
print err.msg
fname = file_pyc
else:
fname = file_pyc
archivename = os.path.split(fname)[1]
if basename:
archivename = "%s/%s" % (basename, archivename)
return (fname, archivename)
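# A minimal usage sketch (illustrative, not part of the module): writepy()
# stores compiled .pyc/.pyo files rather than .py sources, producing an
# archive importable through zipimport. "mypkg" is a hypothetical package
# directory containing an __init__.py.
def _example_writepy():
    import zipfile
    pzf = zipfile.PyZipFile("lib.zip", "w")
    try:
        pzf.writepy("mypkg")  # recurses into package subdirectories
    finally:
        pzf.close()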
def main(args = None):
import textwrap
USAGE=textwrap.dedent("""\
Usage:
zipfile.py -l zipfile.zip # Show listing of a zipfile
zipfile.py -t zipfile.zip # Test if a zipfile is valid
zipfile.py -e zipfile.zip target # Extract zipfile into target dir
zipfile.py -c zipfile.zip src ... # Create zipfile from sources
""")
if args is None:
args = sys.argv[1:]
if not args or args[0] not in ('-l', '-c', '-e', '-t'):
print USAGE
sys.exit(1)
if args[0] == '-l':
if len(args) != 2:
print USAGE
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
zf.printdir()
elif args[0] == '-t':
if len(args) != 2:
print USAGE
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
badfile = zf.testzip()
if badfile:
print("The following enclosed file is corrupted: {!r}".format(badfile))
print "Done testing"
elif args[0] == '-e':
if len(args) != 3:
print USAGE
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
zf.extractall(args[2])
elif args[0] == '-c':
if len(args) < 3:
print USAGE
sys.exit(1)
def addToZip(zf, path, zippath):
if os.path.isfile(path):
zf.write(path, zippath, ZIP_DEFLATED)
elif os.path.isdir(path):
if zippath:
zf.write(path, zippath)
for nm in os.listdir(path):
addToZip(zf,
os.path.join(path, nm), os.path.join(zippath, nm))
# else: ignore
with ZipFile(args[1], 'w', allowZip64=True) as zf:
for path in args[2:]:
zippath = os.path.basename(path)
if not zippath:
zippath = os.path.basename(os.path.dirname(path))
if zippath in ('', os.curdir, os.pardir):
zippath = ''
addToZip(zf, path, zippath)
if __name__ == "__main__":
main()
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) for collections, according to PEP 3119.
DON'T USE THIS MODULE DIRECTLY! The classes here should be imported
via collections; they are defined here only to alleviate certain
bootstrapping issues. Unit tests are in test_collections.
"""
from abc import ABCMeta, abstractmethod
import sys
__all__ = ["Hashable", "Iterable", "Iterator",
"Sized", "Container", "Callable",
"Set", "MutableSet",
"Mapping", "MutableMapping",
"MappingView", "KeysView", "ItemsView", "ValuesView",
"Sequence", "MutableSequence",
]
### ONE-TRICK PONIES ###
def _hasattr(C, attr):
try:
return any(attr in B.__dict__ for B in C.__mro__)
except AttributeError:
# Old-style class
return hasattr(C, attr)
class Hashable:
__metaclass__ = ABCMeta
@abstractmethod
def __hash__(self):
return 0
@classmethod
def __subclasshook__(cls, C):
if cls is Hashable:
try:
for B in C.__mro__:
if "__hash__" in B.__dict__:
if B.__dict__["__hash__"]:
return True
break
except AttributeError:
# Old-style class
if getattr(C, "__hash__", None):
return True
return NotImplemented
class Iterable:
__metaclass__ = ABCMeta
@abstractmethod
def __iter__(self):
while False:
yield None
@classmethod
def __subclasshook__(cls, C):
if cls is Iterable:
if _hasattr(C, "__iter__"):
return True
return NotImplemented
Iterable.register(str)
class Iterator(Iterable):
@abstractmethod
def next(self):
'Return the next item from the iterator. When exhausted, raise StopIteration'
raise StopIteration
def __iter__(self):
return self
@classmethod
def __subclasshook__(cls, C):
if cls is Iterator:
if _hasattr(C, "next") and _hasattr(C, "__iter__"):
return True
return NotImplemented
class Sized:
__metaclass__ = ABCMeta
@abstractmethod
def __len__(self):
return 0
@classmethod
def __subclasshook__(cls, C):
if cls is Sized:
if _hasattr(C, "__len__"):
return True
return NotImplemented
class Container:
__metaclass__ = ABCMeta
@abstractmethod
def __contains__(self, x):
return False
@classmethod
def __subclasshook__(cls, C):
if cls is Container:
if _hasattr(C, "__contains__"):
return True
return NotImplemented
class Callable:
__metaclass__ = ABCMeta
@abstractmethod
def __call__(self, *args, **kwds):
return False
@classmethod
def __subclasshook__(cls, C):
if cls is Callable:
if _hasattr(C, "__call__"):
return True
return NotImplemented
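# A minimal sketch (illustrative, not part of the module): the
# __subclasshook__ methods above make these ABCs structural, so a class is
# recognized purely by the methods it defines, with no registration or
# inheritance required. The class below is hypothetical.
class _Shouter(object):
    def __call__(self):
        return "HEY"
assert issubclass(_Shouter, Callable)   # True via __subclasshook__
assert isinstance(_Shouter(), Callable)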
### SETS ###
class Set(Sized, Iterable, Container):
"""A set is a finite, iterable container.
This class provides concrete generic implementations of all
methods except for __contains__, __iter__ and __len__.
To override the comparisons (presumably for speed, as the
semantics are fixed), redefine __le__ and __ge__,
then the other operations will automatically follow suit.
"""
def __le__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) > len(other):
return False
for elem in self:
if elem not in other:
return False
return True
def __lt__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) < len(other) and self.__le__(other)
def __gt__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) > len(other) and self.__ge__(other)
def __ge__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) < len(other):
return False
for elem in other:
if elem not in self:
return False
return True
def __eq__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) == len(other) and self.__le__(other)
def __ne__(self, other):
return not (self == other)
@classmethod
def _from_iterable(cls, it):
'''Construct an instance of the class from any iterable input.
Must override this method if the class constructor signature
does not accept an iterable for an input.
'''
return cls(it)
def __and__(self, other):
if not isinstance(other, Iterable):
return NotImplemented
return self._from_iterable(value for value in other if value in self)
__rand__ = __and__
def isdisjoint(self, other):
'Return True if two sets have a null intersection.'
for value in other:
if value in self:
return False
return True
def __or__(self, other):
if not isinstance(other, Iterable):
return NotImplemented
chain = (e for s in (self, other) for e in s)
return self._from_iterable(chain)
__ror__ = __or__
def __sub__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return self._from_iterable(value for value in self
if value not in other)
def __rsub__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return self._from_iterable(value for value in other
if value not in self)
def __xor__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return (self - other) | (other - self)
__rxor__ = __xor__
# Sets are not hashable by default, but subclasses can change this
__hash__ = None
def _hash(self):
"""Compute the hash value of a set.
Note that we don't define __hash__: not all sets are hashable.
But if you define a hashable set type, its __hash__ should
call this function.
This must be compatible with __eq__.
All sets ought to compare equal if they contain the same
elements, regardless of how they are implemented, and
regardless of the order of the elements; so there's not much
freedom for __eq__ or __hash__. We match the algorithm used
by the built-in frozenset type.
"""
MAX = sys.maxint
MASK = 2 * MAX + 1
n = len(self)
h = 1927868237 * (n + 1)
h &= MASK
for x in self:
hx = hash(x)
h ^= (hx ^ (hx << 16) ^ 89869747) * 3644798167
h &= MASK
h = h * 69069 + 907133923
h &= MASK
if h > MAX:
h -= MASK + 1
if h == -1:
h = 590923713
return h
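# A minimal sketch (illustrative, not part of the module) of the smallest
# concrete Set: supply only __contains__, __iter__ and __len__ and every
# comparison and operator above works. Adapted from the pattern shown in
# the collections documentation.
class _ListBasedSet(Set):
    """A set built on a list, trading speed for memory."""
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)
    def __iter__(self):
        return iter(self.elements)
    def __contains__(self, value):
        return value in self.elements
    def __len__(self):
        return len(self.elements)
# _ListBasedSet('abcdef') & _ListBasedSet('defghi') builds the intersection
# through _from_iterable(), yielding a set containing 'd', 'e' and 'f'.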
Set.register(frozenset)
class MutableSet(Set):
"""A mutable set is a finite, iterable container.
This class provides concrete generic implementations of all
methods except for __contains__, __iter__, __len__,
add(), and discard().
To override the comparisons (presumably for speed, as the
semantics are fixed), all you have to do is redefine __le__ and
then the other operations will automatically follow suit.
"""
@abstractmethod
def add(self, value):
"""Add an element."""
raise NotImplementedError
@abstractmethod
def discard(self, value):
"""Remove an element. Do not raise an exception if absent."""
raise NotImplementedError
def remove(self, value):
"""Remove an element. If not a member, raise a KeyError."""
if value not in self:
raise KeyError(value)
self.discard(value)
def pop(self):
"""Return the popped value. Raise KeyError if empty."""
it = iter(self)
try:
value = next(it)
except StopIteration:
raise KeyError
self.discard(value)
return value
def clear(self):
"""This is slow (creates N new iterators!) but effective."""
try:
while True:
self.pop()
except KeyError:
pass
def __ior__(self, it):
for value in it:
self.add(value)
return self
def __iand__(self, it):
for value in (self - it):
self.discard(value)
return self
def __ixor__(self, it):
if it is self:
self.clear()
else:
if not isinstance(it, Set):
it = self._from_iterable(it)
for value in it:
if value in self:
self.discard(value)
else:
self.add(value)
return self
def __isub__(self, it):
if it is self:
self.clear()
else:
for value in it:
self.discard(value)
return self
MutableSet.register(set)
### MAPPINGS ###
class Mapping(Sized, Iterable, Container):
"""A Mapping is a generic container for associating key/value
pairs.
This class provides concrete generic implementations of all
methods except for __getitem__, __iter__, and __len__.
"""
@abstractmethod
def __getitem__(self, key):
raise KeyError
def get(self, key, default=None):
'D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.'
try:
return self[key]
except KeyError:
return default
def __contains__(self, key):
try:
self[key]
except KeyError:
return False
else:
return True
def iterkeys(self):
'D.iterkeys() -> an iterator over the keys of D'
return iter(self)
def itervalues(self):
'D.itervalues() -> an iterator over the values of D'
for key in self:
yield self[key]
def iteritems(self):
'D.iteritems() -> an iterator over the (key, value) items of D'
for key in self:
yield (key, self[key])
def keys(self):
"D.keys() -> list of D's keys"
return list(self)
def items(self):
"D.items() -> list of D's (key, value) pairs, as 2-tuples"
return [(key, self[key]) for key in self]
def values(self):
"D.values() -> list of D's values"
return [self[key] for key in self]
# Mappings are not hashable by default, but subclasses can change this
__hash__ = None
def __eq__(self, other):
if not isinstance(other, Mapping):
return NotImplemented
return dict(self.items()) == dict(other.items())
def __ne__(self, other):
return not (self == other)
class MappingView(Sized):
def __init__(self, mapping):
self._mapping = mapping
def __len__(self):
return len(self._mapping)
def __repr__(self):
return '{0.__class__.__name__}({0._mapping!r})'.format(self)
class KeysView(MappingView, Set):
@classmethod
def _from_iterable(self, it):
return set(it)
def __contains__(self, key):
return key in self._mapping
def __iter__(self):
for key in self._mapping:
yield key
KeysView.register(type({}.viewkeys()))
class ItemsView(MappingView, Set):
@classmethod
def _from_iterable(self, it):
return set(it)
def __contains__(self, item):
key, value = item
try:
v = self._mapping[key]
except KeyError:
return False
else:
return v == value
def __iter__(self):
for key in self._mapping:
yield (key, self._mapping[key])
ItemsView.register(type({}.viewitems()))
class ValuesView(MappingView):
def __contains__(self, value):
for key in self._mapping:
if value == self._mapping[key]:
return True
return False
def __iter__(self):
for key in self._mapping:
yield self._mapping[key]
ValuesView.register(type({}.viewvalues()))
class MutableMapping(Mapping):
"""A MutableMapping is a generic container for associating
key/value pairs.
This class provides concrete generic implementations of all
methods except for __getitem__, __setitem__, __delitem__,
__iter__, and __len__.
"""
@abstractmethod
def __setitem__(self, key, value):
raise KeyError
@abstractmethod
def __delitem__(self, key):
raise KeyError
__marker = object()
def pop(self, key, default=__marker):
'''D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
'''
try:
value = self[key]
except KeyError:
if default is self.__marker:
raise
return default
else:
del self[key]
return value
def popitem(self):
'''D.popitem() -> (k, v), remove and return some (key, value) pair
as a 2-tuple; but raise KeyError if D is empty.
'''
try:
key = next(iter(self))
except StopIteration:
raise KeyError
value = self[key]
del self[key]
return key, value
def clear(self):
'D.clear() -> None. Remove all items from D.'
try:
while True:
self.popitem()
except KeyError:
pass
def update(*args, **kwds):
''' D.update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
'''
if not args:
raise TypeError("descriptor 'update' of 'MutableMapping' object "
"needs an argument")
self = args[0]
args = args[1:]
if len(args) > 1:
raise TypeError('update expected at most 1 arguments, got %d' %
len(args))
if args:
other = args[0]
if isinstance(other, Mapping):
for key in other:
self[key] = other[key]
elif hasattr(other, "keys"):
for key in other.keys():
self[key] = other[key]
else:
for key, value in other:
self[key] = value
for key, value in kwds.items():
self[key] = value
def setdefault(self, key, default=None):
'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
try:
return self[key]
except KeyError:
self[key] = default
return default
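# A minimal sketch (illustrative, not part of the module): a dict-backed
# MutableMapping implements just the five abstract methods and inherits
# get(), pop(), popitem(), setdefault(), update(), clear() and the
# comparisons from the mixins above.
class _DictBacked(MutableMapping):
    def __init__(self, *args, **kwds):
        self._data = {}
        self.update(*args, **kwds)  # provided by MutableMapping
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)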
MutableMapping.register(dict)
### SEQUENCES ###
class Sequence(Sized, Iterable, Container):
"""All the operations on a read-only sequence.
Concrete subclasses must override __new__ or __init__,
__getitem__, and __len__.
"""
@abstractmethod
def __getitem__(self, index):
raise IndexError
def __iter__(self):
i = 0
try:
while True:
v = self[i]
yield v
i += 1
except IndexError:
return
def __contains__(self, value):
for v in self:
if v == value:
return True
return False
def __reversed__(self):
for i in reversed(range(len(self))):
yield self[i]
def index(self, value):
'''S.index(value) -> integer -- return first index of value.
Raises ValueError if the value is not present.
'''
for i, v in enumerate(self):
if v == value:
return i
raise ValueError
def count(self, value):
'S.count(value) -> integer -- return number of occurrences of value'
return sum(1 for v in self if v == value)
Sequence.register(tuple)
Sequence.register(basestring)
Sequence.register(buffer)
Sequence.register(xrange)
class MutableSequence(Sequence):
"""All the operations on a read-only sequence.
Concrete subclasses must provide __new__ or __init__,
__getitem__, __setitem__, __delitem__, __len__, and insert().
"""
@abstractmethod
def __setitem__(self, index, value):
raise IndexError
@abstractmethod
def __delitem__(self, index):
raise IndexError
@abstractmethod
def insert(self, index, value):
'S.insert(index, object) -- insert object before index'
raise IndexError
def append(self, value):
'S.append(object) -- append object to the end of the sequence'
self.insert(len(self), value)
def reverse(self):
'S.reverse() -- reverse *IN PLACE*'
n = len(self)
for i in range(n//2):
self[i], self[n-i-1] = self[n-i-1], self[i]
def extend(self, values):
'S.extend(iterable) -- extend sequence by appending elements from the iterable'
for v in values:
self.append(v)
def pop(self, index=-1):
'''S.pop([index]) -> item -- remove and return item at index (default last).
Raise IndexError if list is empty or index is out of range.
'''
v = self[index]
del self[index]
return v
def remove(self, value):
'''S.remove(value) -- remove first occurrence of value.
Raise ValueError if the value is not present.
'''
del self[self.index(value)]
def __iadd__(self, values):
self.extend(values)
return self
MutableSequence.register(list)
"""Shared OS X support functions."""
import os
import re
import sys
__all__ = [
'compiler_fixup',
'customize_config_vars',
'customize_compiler',
'get_platform_osx',
]
# configuration variables that may contain universal build flags,
# like "-arch" or "-isdkroot", that may need customization for
# the user environment
_UNIVERSAL_CONFIG_VARS = ('CFLAGS', 'LDFLAGS', 'CPPFLAGS', 'BASECFLAGS',
'BLDSHARED', 'LDSHARED', 'CC', 'CXX',
'PY_CFLAGS', 'PY_LDFLAGS', 'PY_CPPFLAGS',
'PY_CORE_CFLAGS')
# configuration variables that may contain compiler calls
_COMPILER_CONFIG_VARS = ('BLDSHARED', 'LDSHARED', 'CC', 'CXX')
# prefix added to original configuration variable names
_INITPRE = '_OSX_SUPPORT_INITIAL_'
def _find_executable(executable, path=None):
"""Tries to find 'executable' in the directories listed in 'path'.
A string listing directories separated by 'os.pathsep'; defaults to
os.environ['PATH']. Returns the complete filename or None if not found.
"""
if path is None:
path = os.environ['PATH']
paths = path.split(os.pathsep)
base, ext = os.path.splitext(executable)
if (sys.platform == 'win32' or os.name == 'os2') and (ext != '.exe'):
executable = executable + '.exe'
if not os.path.isfile(executable):
for p in paths:
f = os.path.join(p, executable)
if os.path.isfile(f):
# the file exists, we have a shot at spawn working
return f
return None
else:
return executable
def _read_output(commandstring):
"""Output from successful command execution or None"""
# Similar to os.popen(commandstring, "r").read(),
# but without actually using os.popen because that
# function is not usable during python bootstrap.
# tempfile is also not available then.
import contextlib
try:
import tempfile
fp = tempfile.NamedTemporaryFile()
except ImportError:
fp = open("/tmp/_osx_support.%s"%(
os.getpid(),), "w+b")
with contextlib.closing(fp) as fp:
cmd = "%s 2>/dev/null >'%s'" % (commandstring, fp.name)
return fp.read().strip() if not os.system(cmd) else None
def _find_build_tool(toolname):
"""Find a build tool on current path or using xcrun"""
return (_find_executable(toolname)
or _read_output("/usr/bin/xcrun -find %s" % (toolname,))
or ''
)
_SYSTEM_VERSION = None
def _get_system_version():
"""Return the OS X system version as a string"""
# Reading this plist is a documented way to get the system
# version (see the documentation for the Gestalt Manager)
# We avoid using platform.mac_ver to avoid possible bootstrap issues during
# the build of Python itself (distutils is used to build standard library
# extensions).
global _SYSTEM_VERSION
if _SYSTEM_VERSION is None:
_SYSTEM_VERSION = ''
try:
f = open('/System/Library/CoreServices/SystemVersion.plist')
except IOError:
# We're on a plain darwin box, fall back to the default
# behaviour.
pass
else:
try:
m = re.search(r'<key>ProductUserVisibleVersion</key>\s*'
r'<string>(.*?)</string>', f.read())
finally:
f.close()
if m is not None:
_SYSTEM_VERSION = '.'.join(m.group(1).split('.')[:2])
# else: fall back to the default behaviour
return _SYSTEM_VERSION
def _remove_original_values(_config_vars):
"""Remove original unmodified values for testing"""
# This is needed for higher-level cross-platform tests of get_platform.
for k in list(_config_vars):
if k.startswith(_INITPRE):
del _config_vars[k]
def _save_modified_value(_config_vars, cv, newvalue):
"""Save modified and original unmodified value of configuration var"""
oldvalue = _config_vars.get(cv, '')
if (oldvalue != newvalue) and (_INITPRE + cv not in _config_vars):
_config_vars[_INITPRE + cv] = oldvalue
_config_vars[cv] = newvalue
def _supports_universal_builds():
"""Returns True if universal builds are supported on this system"""
# As an approximation, we assume that if we are running on 10.4 or above,
# then we are running with an Xcode environment that supports universal
# builds, in particular -isysroot and -arch arguments to the compiler. This
# is in support of allowing 10.4 universal builds to run on 10.3.x systems.
osx_version = _get_system_version()
if osx_version:
try:
osx_version = tuple(int(i) for i in osx_version.split('.'))
except ValueError:
osx_version = ''
return bool(osx_version >= (10, 4)) if osx_version else False
def _find_appropriate_compiler(_config_vars):
"""Find appropriate C compiler for extension module builds"""
# Issue #13590:
# The OSX location for the compiler varies between OSX
# (or rather Xcode) releases. With older releases (up-to 10.5)
# the compiler is in /usr/bin, with newer releases the compiler
# can only be found inside Xcode.app if the "Command Line Tools"
# are not installed.
#
# Furthermore, the compiler that can be used varies between
# Xcode releases. Up to Xcode 4 it was possible to use 'gcc-4.2'
# as the compiler, after that 'clang' should be used because
# gcc-4.2 is either not present, or a copy of 'llvm-gcc' that
# miscompiles Python.
# skip checks if the compiler was overridden with a CC env variable
if 'CC' in os.environ:
return _config_vars
# The CC config var might contain additional arguments.
# Ignore them while searching.
cc = oldcc = _config_vars['CC'].split()[0]
if not _find_executable(cc):
# Compiler is not found on the shell search PATH.
# Now search for clang, first on PATH (if the Command LIne
# Tools have been installed in / or if the user has provided
# another location via CC). If not found, try using xcrun
# to find an uninstalled clang (within a selected Xcode).
# NOTE: Cannot use subprocess here because of bootstrap
# issues when building Python itself (and os.popen is
# implemented on top of subprocess and is therefore not
# usable as well)
cc = _find_build_tool('clang')
elif os.path.basename(cc).startswith('gcc'):
# Compiler is GCC, check if it is LLVM-GCC
data = _read_output("'%s' --version"
% (cc.replace("'", "'\"'\"'"),))
if data and 'llvm-gcc' in data:
# Found LLVM-GCC, fall back to clang
cc = _find_build_tool('clang')
if not cc:
raise SystemError(
"Cannot locate working compiler")
if cc != oldcc:
# Found a replacement compiler.
# Modify config vars using new compiler, if not already explicitly
# overridden by an env variable, preserving additional arguments.
for cv in _COMPILER_CONFIG_VARS:
if cv in _config_vars and cv not in os.environ:
cv_split = _config_vars[cv].split()
cv_split[0] = cc if cv != 'CXX' else cc + '++'
_save_modified_value(_config_vars, cv, ' '.join(cv_split))
return _config_vars
def _remove_universal_flags(_config_vars):
"""Remove all universal build arguments from config vars"""
for cv in _UNIVERSAL_CONFIG_VARS:
# Do not alter a config var explicitly overridden by env var
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
flags = re.sub('-arch\s+\w+\s', ' ', flags)
flags = re.sub('-isysroot [^ \t]*', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _remove_unsupported_archs(_config_vars):
"""Remove any unsupported archs from config vars"""
# Different Xcode releases support different sets for '-arch'
# flags. In particular, Xcode 4.x no longer supports the
# PPC architectures.
#
# This code automatically removes '-arch ppc' and '-arch ppc64'
# when these are not supported. That makes it possible to
# build extensions on OSX 10.7 and later with the prebuilt
# 32-bit installer on the python.org website.
# skip checks if the compiler was overridden with a CC env variable
if 'CC' in os.environ:
return _config_vars
if re.search('-arch\s+ppc', _config_vars['CFLAGS']) is not None:
# NOTE: Cannot use subprocess here because of bootstrap
# issues when building Python itself
status = os.system(
"""echo 'int main{};' | """
"""'%s' -c -arch ppc -x c -o /dev/null /dev/null 2>/dev/null"""
%(_config_vars['CC'].replace("'", "'\"'\"'"),))
if status:
# The compile failed for some reason. Because of differences
# across Xcode and compiler versions, there is no reliable way
# to be sure why it failed. Assume here it was due to lack of
# PPC support and remove the related '-arch' flags from each
# config variables not explicitly overridden by an environment
# variable. If the error was for some other reason, we hope the
# failure will show up again when trying to compile an extension
# module.
for cv in _UNIVERSAL_CONFIG_VARS:
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
flags = re.sub('-arch\s+ppc\w*\s', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _override_all_archs(_config_vars):
"""Allow override of all archs with ARCHFLAGS env var"""
# NOTE: This name was introduced by Apple in OSX 10.5 and
# is used by several scripting languages distributed with
# that OS release.
if 'ARCHFLAGS' in os.environ:
arch = os.environ['ARCHFLAGS']
for cv in _UNIVERSAL_CONFIG_VARS:
if cv in _config_vars and '-arch' in _config_vars[cv]:
flags = _config_vars[cv]
flags = re.sub('-arch\s+\w+\s', ' ', flags)
flags = flags + ' ' + arch
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _check_for_unavailable_sdk(_config_vars):
"""Remove references to any SDKs not available"""
# If we're on OSX 10.5 or later and the user tries to
# compile an extension using an SDK that is not present
# on the current machine it is better to not use an SDK
# than to fail. This is particularly important with
# the standalone Command Line Tools alternative to a
# full-blown Xcode install since the CLT packages do not
# provide SDKs. If the SDK is not present, it is assumed
# that the header files and dev libs have been installed
# to /usr and /System/Library by either a standalone CLT
# package or the CLT component within Xcode.
cflags = _config_vars.get('CFLAGS', '')
m = re.search(r'-isysroot\s+(\S+)', cflags)
if m is not None:
sdk = m.group(1)
if not os.path.exists(sdk):
for cv in _UNIVERSAL_CONFIG_VARS:
# Do not alter a config var explicitly overridden by env var
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
flags = re.sub(r'-isysroot\s+\S+(?:\s|$)', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def compiler_fixup(compiler_so, cc_args):
"""
This function will strip '-isysroot PATH' and '-arch ARCH' from the
compile flags if the user has specified one of them in extra_compile_flags.
This is needed because '-arch ARCH' adds another architecture to the
build, without a way to remove an architecture. Furthermore GCC will
barf if multiple '-isysroot' arguments are present.
"""
stripArch = stripSysroot = False
compiler_so = list(compiler_so)
if not _supports_universal_builds():
# On OSX before 10.4.0, the tools don't support -arch and -isysroot
# at all.
stripArch = stripSysroot = True
else:
stripArch = '-arch' in cc_args
stripSysroot = '-isysroot' in cc_args
if stripArch or 'ARCHFLAGS' in os.environ:
while True:
try:
index = compiler_so.index('-arch')
# Strip this argument and the next one:
del compiler_so[index:index+2]
except ValueError:
break
if 'ARCHFLAGS' in os.environ and not stripArch:
# User specified different -arch flags in the environ,
# see also distutils.sysconfig
compiler_so = compiler_so + os.environ['ARCHFLAGS'].split()
if stripSysroot:
while True:
try:
index = compiler_so.index('-isysroot')
# Strip this argument and the next one:
del compiler_so[index:index+2]
except ValueError:
break
# Check if the SDK that is used during compilation actually exists,
# the universal build requires the usage of a universal SDK and not all
# users have that installed by default.
sysroot = None
if '-isysroot' in cc_args:
idx = cc_args.index('-isysroot')
sysroot = cc_args[idx+1]
elif '-isysroot' in compiler_so:
idx = compiler_so.index('-isysroot')
sysroot = compiler_so[idx+1]
if sysroot and not os.path.isdir(sysroot):
from distutils import log
log.warn("Compiling with an SDK that doesn't seem to exist: %s",
sysroot)
log.warn("Please check your Xcode installation")
return compiler_so
def customize_config_vars(_config_vars):
"""Customize Python build configuration variables.
Called internally from sysconfig with a mutable mapping
containing name/value pairs parsed from the configured
makefile used to build this interpreter. Returns
the mapping updated as needed to reflect the environment
in which the interpreter is running; in the case of
a Python from a binary installer, the installed
environment may be very different from the build
environment, i.e. different OS levels, different
build tools, different available CPU architectures.
This customization is performed whenever
distutils.sysconfig.get_config_vars() is first
called. It may be used in environments where no
compilers are present, i.e. when installing pure
Python dists. Customization of compiler paths
and detection of unavailable archs is deferred
until the first extension module build is
requested (in distutils.sysconfig.customize_compiler).
Currently called from distutils.sysconfig
"""
if not _supports_universal_builds():
# On Mac OS X before 10.4, check if -arch and -isysroot
# are in CFLAGS or LDFLAGS and remove them if they are.
# This is needed when building extensions on a 10.3 system
# using a universal build of python.
_remove_universal_flags(_config_vars)
# Allow user to override all archs with ARCHFLAGS env var
_override_all_archs(_config_vars)
# Remove references to sdks that are not found
_check_for_unavailable_sdk(_config_vars)
return _config_vars
def customize_compiler(_config_vars):
"""Customize compiler path and configuration variables.
This customization is performed when the first
extension module build is requested
(in distutils.sysconfig.customize_compiler).
"""
# Find a compiler to use for extension module builds
_find_appropriate_compiler(_config_vars)
# Remove ppc arch flags if not supported here
_remove_unsupported_archs(_config_vars)
# Allow user to override all archs with ARCHFLAGS env var
_override_all_archs(_config_vars)
return _config_vars
def get_platform_osx(_config_vars, osname, release, machine):
"""Filter values for get_platform()"""
# called from get_platform() in sysconfig and distutils.util
#
# For our purposes, we'll assume that the system version from
# distutils' perspective is what MACOSX_DEPLOYMENT_TARGET is set
# to. This makes the compatibility story a bit more sane because the
# machine is going to compile and link as if it were
# MACOSX_DEPLOYMENT_TARGET.
macver = _config_vars.get('MACOSX_DEPLOYMENT_TARGET', '')
macrelease = _get_system_version() or macver
macver = macver or macrelease
if macver:
release = macver
osname = "macosx"
# Use the original CFLAGS value, if available, so that we
# return the same machine type for the platform string.
# Otherwise, distutils may consider this a cross-compiling
# case and disallow installs.
cflags = _config_vars.get(_INITPRE+'CFLAGS',
_config_vars.get('CFLAGS', ''))
if macrelease:
try:
macrelease = tuple(int(i) for i in macrelease.split('.')[0:2])
except ValueError:
macrelease = (10, 0)
else:
# assume no universal support
macrelease = (10, 0)
if (macrelease >= (10, 4)) and '-arch' in cflags.strip():
# The universal build will build fat binaries, but not on
# systems before 10.4
machine = 'fat'
archs = re.findall('-arch\s+(\S+)', cflags)
archs = tuple(sorted(set(archs)))
if len(archs) == 1:
machine = archs[0]
elif archs == ('i386', 'ppc'):
machine = 'fat'
elif archs == ('i386', 'x86_64'):
machine = 'intel'
elif archs == ('i386', 'ppc', 'x86_64'):
machine = 'fat3'
elif archs == ('ppc64', 'x86_64'):
machine = 'fat64'
elif archs == ('i386', 'ppc', 'ppc64', 'x86_64'):
machine = 'universal'
else:
raise ValueError(
"Don't know machine value for archs=%r" % (archs,))
elif machine == 'i386':
# On OSX the machine type returned by uname is always the
# 32-bit variant, even if the executable architecture is
# the 64-bit variant
if sys.maxint >= 2**32:
machine = 'x86_64'
elif machine in ('PowerPC', 'Power_Macintosh'):
# Pick a sane name for the PPC architecture.
# See 'i386' case
if sys.maxint >= 2**32:
machine = 'ppc64'
else:
machine = 'ppc'
return (osname, release, machine)
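# A minimal sketch (illustrative, not part of the module): get_platform_osx()
# only reads the mapping it is given, so its arch-folding rules can be
# exercised with a hand-built config; the values below are invented.
def _example_platform():
    fake_vars = {'MACOSX_DEPLOYMENT_TARGET': '10.6',
                 'CFLAGS': '-arch i386 -arch x86_64 -O2'}
    # the two arches fold into the 'intel' machine tag
    return get_platform_osx(fake_vars, 'darwin', '11.4.2', 'i386')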
"""
Python implementation of the io module.
"""
from __future__ import (print_function, unicode_literals)
import os
import abc
import codecs
import sys
import warnings
import errno
# Import thread instead of threading to reduce startup cost
try:
from thread import allocate_lock as Lock
except ImportError:
from dummy_thread import allocate_lock as Lock
import io
from io import (__all__, SEEK_SET, SEEK_CUR, SEEK_END)
from errno import EINTR
__metaclass__ = type
# open() uses st_blksize whenever we can
DEFAULT_BUFFER_SIZE = 8 * 1024 # bytes
# NOTE: Base classes defined here are registered with the "official" ABCs
# defined in io.py. We don't use real inheritance though, because we don't want
# to inherit the C implementations.
class BlockingIOError(IOError):
"""Exception raised when I/O would block on a non-blocking I/O stream."""
def __init__(self, errno, strerror, characters_written=0):
super(IOError, self).__init__(errno, strerror)
if not isinstance(characters_written, (int, long)):
raise TypeError("characters_written must be a integer")
self.characters_written = characters_written
def open(file, mode="r", buffering=-1,
encoding=None, errors=None,
newline=None, closefd=True):
r"""Open file and return a stream. Raise IOError upon failure.
file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed, unless closefd is set to False.)
mode is an optional string that specifies the mode in which the file
is opened. It defaults to 'r' which means open for reading in text
mode. Other common values are 'w' for writing (truncating the file if
it already exists), and 'a' for appending (which on some Unix systems,
means that all writes append to the end of the file regardless of the
current seek position). In text mode, if encoding is not specified the
encoding used is platform dependent. (For reading and writing raw
bytes use binary mode and leave encoding unspecified.) The available
modes are:
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r' open for reading (default)
'w' open for writing, truncating the file first
'a' open for writing, appending to the end of the file if it exists
'b' binary mode
't' text mode (default)
'+' open a disk file for updating (reading and writing)
'U' universal newline mode (for backwards compatibility; unneeded
for new code)
========= ===============================================================
The default mode is 'rt' (open for reading text). For binary random
access, the mode 'w+b' opens and truncates the file to 0 bytes, while
'r+b' opens the file without truncation.
Python distinguishes between files opened in binary and text modes,
even when the underlying operating system doesn't. Files opened in
binary mode (appending 'b' to the mode argument) return contents as
bytes objects without any decoding. In text mode (the default, or when
't' is appended to the mode argument), the contents of the file are
returned as strings, the bytes having been first decoded using a
platform-dependent encoding or using the specified encoding if given.
buffering is an optional integer used to set the buffering policy.
Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
line buffering (only usable in text mode), and an integer > 1 to indicate
the size of a fixed-size chunk buffer. When no buffering argument is
given, the default buffering policy works as follows:
* Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying device's
"block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
On many systems, the buffer will typically be 4096 or 8192 bytes long.
* "Interactive" text files (files for which isatty() returns True)
use line buffering. Other text files use the policy described above
for binary files.
encoding is the name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed. See the codecs module for the list of supported encodings.
errors is an optional string that specifies how encoding errors are to
be handled---this argument should not be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register for a list of the permitted
encoding error strings.
newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'. It works as
follows:
* On input, if newline is None, universal newlines mode is
enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
these are translated into '\n' before being returned to the
caller. If it is '', universal newline mode is enabled, but line
endings are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by the given
string, and the line ending is returned to the caller untranslated.
* On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '', no translation takes place. If newline is any of the
other legal values, any '\n' characters written are translated to
the given string.
If closefd is False, the underlying file descriptor will be kept open
when the file is closed. This does not work when a file name is given
and must be True in that case.
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
It is also possible to use a string or bytearray as a file for both
reading and writing. For strings StringIO can be used like a file
opened in a text mode, and for bytes a BytesIO can be used like a file
opened in a binary mode.
"""
if not isinstance(file, (basestring, int, long)):
raise TypeError("invalid file: %r" % file)
if not isinstance(mode, basestring):
raise TypeError("invalid mode: %r" % mode)
if not isinstance(buffering, (int, long)):
raise TypeError("invalid buffering: %r" % buffering)
if encoding is not None and not isinstance(encoding, basestring):
raise TypeError("invalid encoding: %r" % encoding)
if errors is not None and not isinstance(errors, basestring):
raise TypeError("invalid errors: %r" % errors)
modes = set(mode)
if modes - set("arwb+tU") or len(mode) > len(modes):
raise ValueError("invalid mode: %r" % mode)
reading = "r" in modes
writing = "w" in modes
appending = "a" in modes
updating = "+" in modes
text = "t" in modes
binary = "b" in modes
if "U" in modes:
if writing or appending:
raise ValueError("can't use U and writing mode at once")
reading = True
if text and binary:
raise ValueError("can't have text and binary mode at once")
if reading + writing + appending > 1:
raise ValueError("can't have read/write/append mode at once")
if not (reading or writing or appending):
raise ValueError("must have exactly one of read/write/append mode")
if binary and encoding is not None:
raise ValueError("binary mode doesn't take an encoding argument")
if binary and errors is not None:
raise ValueError("binary mode doesn't take an errors argument")
if binary and newline is not None:
raise ValueError("binary mode doesn't take a newline argument")
raw = FileIO(file,
(reading and "r" or "") +
(writing and "w" or "") +
(appending and "a" or "") +
(updating and "+" or ""),
closefd)
result = raw
try:
line_buffering = False
if buffering == 1 or buffering < 0 and raw.isatty():
buffering = -1
line_buffering = True
if buffering < 0:
buffering = DEFAULT_BUFFER_SIZE
try:
bs = os.fstat(raw.fileno()).st_blksize
except (os.error, AttributeError):
pass
else:
if bs > 1:
buffering = bs
if buffering < 0:
raise ValueError("invalid buffering size")
if buffering == 0:
if binary:
return result
raise ValueError("can't have unbuffered text I/O")
if updating:
buffer = BufferedRandom(raw, buffering)
elif writing or appending:
buffer = BufferedWriter(raw, buffering)
elif reading:
buffer = BufferedReader(raw, buffering)
else:
raise ValueError("unknown mode: %r" % mode)
result = buffer
if binary:
return result
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
result = text
text.mode = mode
return result
except:
result.close()
raise
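# A minimal usage sketch (illustrative, not part of the module) of the
# layering decided above: buffering=0 is only legal in binary mode and
# returns the raw FileIO; a text-mode open wraps the buffered object in a
# TextIOWrapper. The filenames are hypothetical.
def _example_open():
    raw = open("data.bin", "wb", buffering=0)  # unbuffered: raw FileIO
    raw.write(b"\x00\x01")
    raw.close()
    f = open("notes.txt", "w", encoding="utf-8", newline="\n")
    f.write(u"one line\n")  # encoded by the TextIOWrapper layer
    f.close()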
class DocDescriptor:
"""Helper for builtins.open.__doc__
"""
def __get__(self, obj, typ):
return (
"open(file, mode='r', buffering=-1, encoding=None, "
"errors=None, newline=None, closefd=True)\n\n" +
open.__doc__)
class OpenWrapper:
"""Wrapper for builtins.open
Trick so that open won't become a bound method when stored
as a class variable (as dbm.dumb does).
See initstdio() in Python/pythonrun.c.
"""
__doc__ = DocDescriptor()
def __new__(cls, *args, **kwargs):
return open(*args, **kwargs)
class UnsupportedOperation(ValueError, IOError):
pass
class IOBase:
__metaclass__ = abc.ABCMeta
"""The abstract base class for all I/O classes, acting on streams of
bytes. There is no public constructor.
This class provides dummy implementations for many methods that
derived classes can override selectively; the default implementations
represent a file that cannot be read, written or seeked.
Even though IOBase does not declare read, readinto, or write because
their signatures will vary, implementations and clients should
consider those methods part of the interface. Also, implementations
may raise an IOError when operations they do not support are called.
The basic type used for binary data read from or written to a file is
the bytes type. Method arguments may also be bytearray or memoryview of
arrays of bytes. In some cases, such as readinto, a writable object such
as bytearray is required. Text I/O classes work with unicode data.
Note that calling any method (even inquiries) on a closed stream is
undefined. Implementations may raise IOError in this case.
IOBase (and its subclasses) support the iterator protocol, meaning
that an IOBase object can be iterated over yielding the lines in a
stream.
IOBase also supports the :keyword:`with` statement. In this example,
fp is closed after the suite of the with statement is complete:
with open('spam.txt', 'w') as fp:
fp.write('Spam and eggs!')
"""
### Internal ###
def _unsupported(self, name):
"""Internal: raise an exception for unsupported operations."""
raise UnsupportedOperation("%s.%s() not supported" %
(self.__class__.__name__, name))
### Positioning ###
def seek(self, pos, whence=0):
"""Change stream position.
Change the stream position to byte offset pos. Argument pos is
interpreted relative to the position indicated by whence. Values
for whence are:
* 0 -- start of stream (the default); offset should be zero or positive
* 1 -- current stream position; offset may be negative
* 2 -- end of stream; offset is usually negative
Return the new absolute position.
"""
self._unsupported("seek")
def tell(self):
"""Return current stream position."""
return self.seek(0, 1)
def truncate(self, pos=None):
"""Truncate file to size bytes.
Size defaults to the current IO position as reported by tell(). Return
the new size.
"""
self._unsupported("truncate")
### Flush and close ###
def flush(self):
"""Flush write buffers, if applicable.
This is not implemented for read-only and non-blocking streams.
"""
self._checkClosed()
# XXX Should this return the number of bytes written???
__closed = False
def close(self):
"""Flush and close the IO object.
This method has no effect if the file is already closed.
"""
if not self.__closed:
try:
self.flush()
finally:
self.__closed = True
def __del__(self):
"""Destructor. Calls close()."""
# The try/except block is in case this is called at program
# exit time, when it's possible that globals have already been
# deleted, and then the close() call might fail. Since
# there's nothing we can do about such failures and they annoy
# the end users, we suppress the traceback.
try:
self.close()
except:
pass
### Inquiries ###
def seekable(self):
"""Return whether object supports random access.
If False, seek(), tell() and truncate() will raise IOError.
This method may need to do a test seek().
"""
return False
def _checkSeekable(self, msg=None):
"""Internal: raise an IOError if file is not seekable
"""
if not self.seekable():
raise IOError("File or stream is not seekable."
if msg is None else msg)
def readable(self):
"""Return whether object was opened for reading.
If False, read() will raise IOError.
"""
return False
def _checkReadable(self, msg=None):
"""Internal: raise an IOError if file is not readable
"""
if not self.readable():
raise IOError("File or stream is not readable."
if msg is None else msg)
def writable(self):
"""Return whether object was opened for writing.
If False, write() and truncate() will raise IOError.
"""
return False
def _checkWritable(self, msg=None):
"""Internal: raise an IOError if file is not writable
"""
if not self.writable():
raise IOError("File or stream is not writable."
if msg is None else msg)
@property
def closed(self):
"""closed: bool. True iff the file has been closed.
For backwards compatibility, this is a property, not a predicate.
"""
return self.__closed
def _checkClosed(self, msg=None):
"""Internal: raise a ValueError if file is closed
"""
if self.closed:
raise ValueError("I/O operation on closed file."
if msg is None else msg)
### Context manager ###
def __enter__(self):
"""Context management protocol. Returns self."""
self._checkClosed()
return self
def __exit__(self, *args):
"""Context management protocol. Calls close()"""
self.close()
### Lower-level APIs ###
# XXX Should these be present even if unimplemented?
def fileno(self):
"""Returns underlying file descriptor if one exists.
An IOError is raised if the IO object does not use a file descriptor.
"""
self._unsupported("fileno")
def isatty(self):
"""Return whether this is an 'interactive' stream.
Return False if it can't be determined.
"""
self._checkClosed()
return False
### Readline[s] and writelines ###
def readline(self, limit=-1):
r"""Read and return a line from the stream.
If limit is specified, at most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text
files, the newlines argument to open can be used to select the line
terminator(s) recognized.
"""
# For backwards compatibility, a (slowish) readline().
if hasattr(self, "peek"):
def nreadahead():
readahead = self.peek(1)
if not readahead:
return 1
n = (readahead.find(b"\n") + 1) or len(readahead)
if limit >= 0:
n = min(n, limit)
return n
else:
def nreadahead():
return 1
if limit is None:
limit = -1
elif not isinstance(limit, (int, long)):
raise TypeError("limit must be an integer")
res = bytearray()
while limit < 0 or len(res) < limit:
b = self.read(nreadahead())
if not b:
break
res += b
if res.endswith(b"\n"):
break
return bytes(res)
def __iter__(self):
self._checkClosed()
return self
def next(self):
line = self.readline()
if not line:
raise StopIteration
return line
def readlines(self, hint=None):
"""Return a list of lines from the stream.
hint can be specified to control the number of lines read: no more
lines will be read if the total size (in bytes/characters) of all
lines so far exceeds hint.
"""
if hint is not None and not isinstance(hint, (int, long)):
raise TypeError("integer or None expected")
if hint is None or hint <= 0:
return list(self)
n = 0
lines = []
for line in self:
lines.append(line)
n += len(line)
if n >= hint:
break
return lines
def writelines(self, lines):
self._checkClosed()
for line in lines:
self.write(line)
io.IOBase.register(IOBase)
class RawIOBase(IOBase):
"""Base class for raw binary I/O."""
# The read() method is implemented by calling readinto(); derived
# classes that want to support read() only need to implement
# readinto() as a primitive operation. In general, readinto() can be
# more efficient than read().
# (It would be tempting to also provide an implementation of
# readinto() in terms of read(), in case the latter is a more suitable
# primitive operation, but that would lead to nasty recursion in case
# a subclass doesn't implement either.)
def read(self, n=-1):
"""Read and return up to n bytes.
Returns an empty bytes object on EOF, or None if the object is
set not to block and has no data to read.
"""
if n is None:
n = -1
if n < 0:
return self.readall()
b = bytearray(n.__index__())
n = self.readinto(b)
if n is None:
return None
del b[n:]
return bytes(b)
def readall(self):
"""Read until EOF, using multiple read() call."""
res = bytearray()
while True:
data = self.read(DEFAULT_BUFFER_SIZE)
if not data:
break
res += data
if res:
return bytes(res)
else:
# b'' or None
return data
def readinto(self, b):
"""Read up to len(b) bytes into b.
Returns number of bytes read (0 for EOF), or None if the object
is set not to block and has no data to read.
"""
self._unsupported("readinto")
def write(self, b):
"""Write the given buffer to the IO stream.
Returns the number of bytes written, which may be less than len(b).
"""
self._unsupported("write")
io.RawIOBase.register(RawIOBase)
from _io import FileIO
RawIOBase.register(FileIO)
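# --- Illustration (editor's sketch, not part of the original module) ---
# Because RawIOBase.read() is built on top of readinto(), a minimal raw
# stream only has to supply readinto(). The ZeroRaw class below is a
# hypothetical example, not anything defined by this module:
#
#     >>> class ZeroRaw(RawIOBase):
#     ...     def readable(self):
#     ...         return True
#     ...     def readinto(self, b):
#     ...         b[:len(b)] = b"\x00" * len(b)  # fill the caller's buffer
#     ...         return len(b)                  # number of bytes "read"
#     >>> ZeroRaw().read(3)
#     '\x00\x00\x00'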
class BufferedIOBase(IOBase):
"""Base class for buffered IO objects.
The main difference with RawIOBase is that the read() method
supports omitting the size argument, and does not have a default
implementation that defers to readinto().
In addition, read(), readinto() and write() may raise
BlockingIOError if the underlying raw stream is in non-blocking
mode and not ready; unlike their raw counterparts, they will never
return None.
A typical implementation should not inherit from a RawIOBase
implementation, but wrap one.
"""
def read(self, n=None):
"""Read and return up to n bytes.
If the argument is omitted, None, or negative, reads and
returns all data until EOF.
If the argument is positive, and the underlying raw stream is
not 'interactive', multiple raw reads may be issued to satisfy
the byte count (unless EOF is reached first). But for
interactive raw streams (XXX and for pipes?), at most one raw
read will be issued, and a short result does not imply that
EOF is imminent.
Returns an empty bytes array on EOF.
Raises BlockingIOError if the underlying raw stream has no
data at the moment.
"""
self._unsupported("read")
def read1(self, n=None):
"""Read up to n bytes with at most one read() system call."""
self._unsupported("read1")
def readinto(self, b):
"""Read up to len(b) bytes into b.
Like read(), this may issue multiple reads to the underlying raw
stream, unless the latter is 'interactive'.
Returns the number of bytes read (0 for EOF).
Raises BlockingIOError if the underlying raw stream has no
data at the moment.
"""
data = self.read(len(b))
n = len(data)
try:
b[:n] = data
except TypeError as err:
import array
if not isinstance(b, array.array):
raise err
b[:n] = array.array(b'b', data)
return n
def write(self, b):
"""Write the given buffer to the IO stream.
Return the number of bytes written, which is always len(b).
Raises BlockingIOError if the buffer is full and the
underlying raw stream cannot accept more data at the moment.
"""
self._unsupported("write")
def detach(self):
"""
Separate the underlying raw stream from the buffer and return it.
After the raw stream has been detached, the buffer is in an unusable
state.
"""
self._unsupported("detach")
io.BufferedIOBase.register(BufferedIOBase)
class _BufferedIOMixin(BufferedIOBase):
"""A mixin implementation of BufferedIOBase with an underlying raw stream.
This passes most requests on to the underlying raw stream. It
does *not* provide implementations of read(), readinto() or
write().
"""
def __init__(self, raw):
self._raw = raw
### Positioning ###
def seek(self, pos, whence=0):
new_position = self.raw.seek(pos, whence)
if new_position < 0:
raise IOError("seek() returned an invalid position")
return new_position
def tell(self):
pos = self.raw.tell()
if pos < 0:
raise IOError("tell() returned an invalid position")
return pos
def truncate(self, pos=None):
# Flush the stream. We're mixing buffered I/O with lower-level I/O,
# and a flush may be necessary to synch both views of the current
# file state.
self.flush()
if pos is None:
pos = self.tell()
# XXX: Should seek() be used, instead of passing the position
# XXX directly to truncate?
return self.raw.truncate(pos)
### Flush and close ###
def flush(self):
if self.closed:
raise ValueError("flush of closed file")
self.raw.flush()
def close(self):
if self.raw is not None and not self.closed:
try:
# may raise BlockingIOError or BrokenPipeError etc
self.flush()
finally:
self.raw.close()
def detach(self):
if self.raw is None:
raise ValueError("raw stream already detached")
self.flush()
raw = self._raw
self._raw = None
return raw
### Inquiries ###
def seekable(self):
return self.raw.seekable()
def readable(self):
return self.raw.readable()
def writable(self):
return self.raw.writable()
@property
def raw(self):
return self._raw
@property
def closed(self):
return self.raw.closed
@property
def name(self):
return self.raw.name
@property
def mode(self):
return self.raw.mode
def __repr__(self):
clsname = self.__class__.__name__
try:
name = self.name
except Exception:
return "<_pyio.{0}>".format(clsname)
else:
return "<_pyio.{0} name={1!r}>".format(clsname, name)
### Lower-level APIs ###
def fileno(self):
return self.raw.fileno()
def isatty(self):
return self.raw.isatty()
class BytesIO(BufferedIOBase):
"""Buffered I/O implementation using an in-memory bytes buffer."""
def __init__(self, initial_bytes=None):
buf = bytearray()
if initial_bytes is not None:
buf.extend(initial_bytes)
self._buffer = buf
self._pos = 0
def __getstate__(self):
if self.closed:
raise ValueError("__getstate__ on closed file")
return self.__dict__.copy()
def getvalue(self):
"""Return the bytes value (contents) of the buffer
"""
if self.closed:
raise ValueError("getvalue on closed file")
return bytes(self._buffer)
def read(self, n=None):
if self.closed:
raise ValueError("read from closed file")
if n is None:
n = -1
if not isinstance(n, (int, long)):
raise TypeError("integer argument expected, got {0!r}".format(
type(n)))
if n < 0:
n = len(self._buffer)
if len(self._buffer) <= self._pos:
return b""
newpos = min(len(self._buffer), self._pos + n)
b = self._buffer[self._pos : newpos]
self._pos = newpos
return bytes(b)
def read1(self, n):
"""This is the same as read.
"""
return self.read(n)
def write(self, b):
if self.closed:
raise ValueError("write to closed file")
if isinstance(b, unicode):
raise TypeError("can't write unicode to binary stream")
n = len(b)
if n == 0:
return 0
pos = self._pos
if pos > len(self._buffer):
# Inserts null bytes between the current end of the file
# and the new write position.
padding = b'\x00' * (pos - len(self._buffer))
self._buffer += padding
self._buffer[pos:pos + n] = b
self._pos += n
return n
def seek(self, pos, whence=0):
if self.closed:
raise ValueError("seek on closed file")
try:
pos.__index__
except AttributeError:
raise TypeError("an integer is required")
if whence == 0:
if pos < 0:
raise ValueError("negative seek position %r" % (pos,))
self._pos = pos
elif whence == 1:
self._pos = max(0, self._pos + pos)
elif whence == 2:
self._pos = max(0, len(self._buffer) + pos)
else:
raise ValueError("invalid whence value")
return self._pos
def tell(self):
if self.closed:
raise ValueError("tell on closed file")
return self._pos
def truncate(self, pos=None):
if self.closed:
raise ValueError("truncate on closed file")
if pos is None:
pos = self._pos
else:
try:
pos.__index__
except AttributeError:
raise TypeError("an integer is required")
if pos < 0:
raise ValueError("negative truncate position %r" % (pos,))
del self._buffer[pos:]
return pos
def readable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
def writable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
def seekable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
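# --- Illustration (editor's sketch, not part of the original module) ---
# As write() explains above, seeking past the end of a BytesIO and then
# writing pads the gap with null bytes:
#
#     >>> b = BytesIO()
#     >>> b.write(b"ab")
#     2
#     >>> b.seek(4)
#     4
#     >>> b.write(b"cd")
#     2
#     >>> b.getvalue()
#     'ab\x00\x00cd'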
class BufferedReader(_BufferedIOMixin):
"""BufferedReader(raw[, buffer_size])
A buffer for a readable, sequential RawIOBase object.
The constructor creates a BufferedReader for the given readable raw
stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE
is used.
"""
def __init__(self, raw, buffer_size=DEFAULT_BUFFER_SIZE):
"""Create a new buffered reader using the given readable raw IO object.
"""
if not raw.readable():
raise IOError('"raw" argument must be readable.')
_BufferedIOMixin.__init__(self, raw)
if buffer_size <= 0:
raise ValueError("invalid buffer size")
self.buffer_size = buffer_size
self._reset_read_buf()
self._read_lock = Lock()
def _reset_read_buf(self):
self._read_buf = b""
self._read_pos = 0
def read(self, n=None):
"""Read n bytes.
Returns exactly n bytes of data unless the underlying raw IO
stream reaches EOF or if the call would block in non-blocking
mode. If n is negative, read until EOF or until read() would
block.
"""
if n is not None and n < -1:
raise ValueError("invalid number of bytes to read")
with self._read_lock:
return self._read_unlocked(n)
def _read_unlocked(self, n=None):
nodata_val = b""
empty_values = (b"", None)
buf = self._read_buf
pos = self._read_pos
# Special case for when the number of bytes to read is unspecified.
if n is None or n == -1:
self._reset_read_buf()
chunks = [buf[pos:]] # Strip the consumed bytes.
current_size = 0
while True:
# Read until EOF or until read() would block.
try:
chunk = self.raw.read()
except IOError as e:
if e.errno != EINTR:
raise
continue
if chunk in empty_values:
nodata_val = chunk
break
current_size += len(chunk)
chunks.append(chunk)
return b"".join(chunks) or nodata_val
# The number of bytes to read is specified, return at most n bytes.
avail = len(buf) - pos # Length of the available buffered data.
if n <= avail:
# Fast path: the data to read is fully buffered.
self._read_pos += n
return buf[pos:pos+n]
# Slow path: read from the stream until enough bytes are read,
# or until an EOF occurs or until read() would block.
chunks = [buf[pos:]]
wanted = max(self.buffer_size, n)
while avail < n:
try:
chunk = self.raw.read(wanted)
except IOError as e:
if e.errno != EINTR:
raise
continue
if chunk in empty_values:
nodata_val = chunk
break
avail += len(chunk)
chunks.append(chunk)
# n is more than avail only when an EOF occurred or when
# read() would have blocked.
n = min(n, avail)
out = b"".join(chunks)
self._read_buf = out[n:] # Save the extra data in the buffer.
self._read_pos = 0
return out[:n] if out else nodata_val
def peek(self, n=0):
"""Returns buffered bytes without advancing the position.
The argument indicates a desired minimal number of bytes; we
do at most one raw read to satisfy it. We never return more
than self.buffer_size.
"""
with self._read_lock:
return self._peek_unlocked(n)
def _peek_unlocked(self, n=0):
want = min(n, self.buffer_size)
have = len(self._read_buf) - self._read_pos
if have < want or have <= 0:
to_read = self.buffer_size - have
while True:
try:
current = self.raw.read(to_read)
except IOError as e:
if e.errno != EINTR:
raise
continue
break
if current:
self._read_buf = self._read_buf[self._read_pos:] + current
self._read_pos = 0
return self._read_buf[self._read_pos:]
def read1(self, n):
"""Reads up to n bytes, with at most one read() system call."""
# Returns up to n bytes. If at least one byte is buffered, we
# only return buffered bytes. Otherwise, we do one raw read.
if n < 0:
raise ValueError("number of bytes to read must be positive")
if n == 0:
return b""
with self._read_lock:
self._peek_unlocked(1)
return self._read_unlocked(
min(n, len(self._read_buf) - self._read_pos))
def tell(self):
return _BufferedIOMixin.tell(self) - len(self._read_buf) + self._read_pos
def seek(self, pos, whence=0):
if not (0 <= whence <= 2):
raise ValueError("invalid whence value")
with self._read_lock:
if whence == 1:
pos -= len(self._read_buf) - self._read_pos
pos = _BufferedIOMixin.seek(self, pos, whence)
self._reset_read_buf()
return pos
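# --- Illustration (editor's sketch, not part of the original module) ---
# peek() exposes buffered bytes without moving the position, and read1()
# performs at most one raw read; a BytesIO stands in for the raw stream
# here, which is an assumption for illustration only:
#
#     >>> r = BufferedReader(BytesIO(b"abcdef"))
#     >>> r.peek(1)[:3]      # buffer filled, position unchanged
#     'abc'
#     >>> r.read(2)
#     'ab'
#     >>> r.read1(100)       # served from the existing buffer
#     'cdef'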
class BufferedWriter(_BufferedIOMixin):
"""A buffer for a writeable sequential RawIO object.
The constructor creates a BufferedWriter for the given writeable raw
stream. If the buffer_size is not given, it defaults to
DEFAULT_BUFFER_SIZE.
"""
_warning_stack_offset = 2
def __init__(self, raw,
buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None):
if not raw.writable():
raise IOError('"raw" argument must be writable.')
_BufferedIOMixin.__init__(self, raw)
if buffer_size <= 0:
raise ValueError("invalid buffer size")
if max_buffer_size is not None:
warnings.warn("max_buffer_size is deprecated", DeprecationWarning,
self._warning_stack_offset)
self.buffer_size = buffer_size
self._write_buf = bytearray()
self._write_lock = Lock()
def write(self, b):
if self.closed:
raise ValueError("write to closed file")
if isinstance(b, unicode):
raise TypeError("can't write unicode to binary stream")
with self._write_lock:
# XXX we can implement some more tricks to try and avoid
# partial writes
if len(self._write_buf) > self.buffer_size:
# We're full, so let's pre-flush the buffer. (This may
# raise BlockingIOError with characters_written == 0.)
self._flush_unlocked()
before = len(self._write_buf)
self._write_buf.extend(b)
written = len(self._write_buf) - before
if len(self._write_buf) > self.buffer_size:
try:
self._flush_unlocked()
except BlockingIOError as e:
if len(self._write_buf) > self.buffer_size:
# We've hit the buffer_size. We have to accept a partial
# write and cut back our buffer.
overage = len(self._write_buf) - self.buffer_size
written -= overage
self._write_buf = self._write_buf[:self.buffer_size]
raise BlockingIOError(e.errno, e.strerror, written)
return written
def truncate(self, pos=None):
with self._write_lock:
self._flush_unlocked()
if pos is None:
pos = self.raw.tell()
return self.raw.truncate(pos)
def flush(self):
with self._write_lock:
self._flush_unlocked()
def _flush_unlocked(self):
if self.closed:
raise ValueError("flush of closed file")
while self._write_buf:
try:
n = self.raw.write(self._write_buf)
except BlockingIOError:
raise RuntimeError("self.raw should implement RawIOBase: it "
"should not raise BlockingIOError")
except IOError as e:
if e.errno != EINTR:
raise
continue
if n is None:
raise BlockingIOError(
errno.EAGAIN,
"write could not complete without blocking", 0)
if n > len(self._write_buf) or n < 0:
raise IOError("write() returned incorrect number of bytes")
del self._write_buf[:n]
def tell(self):
return _BufferedIOMixin.tell(self) + len(self._write_buf)
def seek(self, pos, whence=0):
if not (0 <= whence <= 2):
raise ValueError("invalid whence")
with self._write_lock:
self._flush_unlocked()
return _BufferedIOMixin.seek(self, pos, whence)
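# --- Illustration (editor's sketch, not part of the original module) ---
# Writes accumulate in the internal buffer and reach the raw stream only
# on flush() or overflow (again using a BytesIO as a stand-in raw stream):
#
#     >>> raw = BytesIO()
#     >>> w = BufferedWriter(raw, buffer_size=8)
#     >>> w.write(b"abc")
#     3
#     >>> raw.getvalue()     # nothing flushed yet
#     ''
#     >>> w.flush()
#     >>> raw.getvalue()
#     'abc'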
class BufferedRWPair(BufferedIOBase):
"""A buffered reader and writer object together.
A buffered reader object and buffered writer object put together to
form a sequential IO object that can read and write. This is typically
used with a socket or two-way pipe.
reader and writer are RawIOBase objects that are readable and
writeable respectively. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
"""
# XXX The usefulness of this (compared to having two separate IO
# objects) is questionable.
def __init__(self, reader, writer,
buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None):
"""Constructor.
The arguments are two RawIO instances.
"""
if max_buffer_size is not None:
warnings.warn("max_buffer_size is deprecated", DeprecationWarning, 2)
if not reader.readable():
raise IOError('"reader" argument must be readable.')
if not writer.writable():
raise IOError('"writer" argument must be writable.')
self.reader = BufferedReader(reader, buffer_size)
self.writer = BufferedWriter(writer, buffer_size)
def read(self, n=None):
if n is None:
n = -1
return self.reader.read(n)
def readinto(self, b):
return self.reader.readinto(b)
def write(self, b):
return self.writer.write(b)
def peek(self, n=0):
return self.reader.peek(n)
def read1(self, n):
return self.reader.read1(n)
def readable(self):
return self.reader.readable()
def writable(self):
return self.writer.writable()
def flush(self):
return self.writer.flush()
def close(self):
try:
self.writer.close()
finally:
self.reader.close()
def isatty(self):
return self.reader.isatty() or self.writer.isatty()
@property
def closed(self):
return self.writer.closed
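# --- Illustration (editor's sketch, not part of the original module) ---
# BufferedRWPair glues a readable stream and a writable stream into one
# object (think of the two halves of a socket); two in-memory streams
# stand in here:
#
#     >>> pair = BufferedRWPair(BytesIO(b"in"), BytesIO())
#     >>> pair.read(2)
#     'in'
#     >>> pair.write(b"out")
#     3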
class BufferedRandom(BufferedWriter, BufferedReader):
"""A buffered interface to random access streams.
The constructor creates a reader and writer for a seekable stream,
raw, given in the first argument. If the buffer_size is omitted it
defaults to DEFAULT_BUFFER_SIZE.
"""
_warning_stack_offset = 3
def __init__(self, raw,
buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None):
raw._checkSeekable()
BufferedReader.__init__(self, raw, buffer_size)
BufferedWriter.__init__(self, raw, buffer_size, max_buffer_size)
def seek(self, pos, whence=0):
if not (0 <= whence <= 2):
raise ValueError("invalid whence")
self.flush()
if self._read_buf:
# Undo read ahead.
with self._read_lock:
self.raw.seek(self._read_pos - len(self._read_buf), 1)
# First do the raw seek, then empty the read buffer, so that
# if the raw seek fails, we don't lose buffered data forever.
pos = self.raw.seek(pos, whence)
with self._read_lock:
self._reset_read_buf()
if pos < 0:
raise IOError("seek() returned invalid position")
return pos
def tell(self):
if self._write_buf:
return BufferedWriter.tell(self)
else:
return BufferedReader.tell(self)
def truncate(self, pos=None):
if pos is None:
pos = self.tell()
# Use seek to flush the read buffer.
return BufferedWriter.truncate(self, pos)
def read(self, n=None):
if n is None:
n = -1
self.flush()
return BufferedReader.read(self, n)
def readinto(self, b):
self.flush()
return BufferedReader.readinto(self, b)
def peek(self, n=0):
self.flush()
return BufferedReader.peek(self, n)
def read1(self, n):
self.flush()
return BufferedReader.read1(self, n)
def write(self, b):
if self._read_buf:
# Undo readahead
with self._read_lock:
self.raw.seek(self._read_pos - len(self._read_buf), 1)
self._reset_read_buf()
return BufferedWriter.write(self, b)
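# --- Illustration (editor's sketch, not part of the original module) ---
# BufferedRandom coordinates the read and write buffers around seek(),
# flushing pending writes and undoing readahead as described above:
#
#     >>> f = BufferedRandom(BytesIO(b"hello"))
#     >>> f.read(2)
#     'he'
#     >>> f.seek(0)
#     0
#     >>> f.write(b"HE")
#     2
#     >>> f.seek(0)
#     0
#     >>> f.read()
#     'HEllo'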
class TextIOBase(IOBase):
"""Base class for text I/O.
This class provides a character and line based interface to stream
I/O. There is no readinto method because Python's character strings
are immutable. There is no public constructor.
"""
def read(self, n=-1):
"""Read at most n characters from stream.
Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
"""
self._unsupported("read")
def write(self, s):
"""Write string s to stream."""
self._unsupported("write")
def truncate(self, pos=None):
"""Truncate size to pos."""
self._unsupported("truncate")
def readline(self):
"""Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
"""
self._unsupported("readline")
def detach(self):
"""
Separate the underlying buffer from the TextIOBase and return it.
After the underlying buffer has been detached, the TextIO is in an
unusable state.
"""
self._unsupported("detach")
@property
def encoding(self):
"""Subclasses should override."""
return None
@property
def newlines(self):
"""Line endings translated so far.
Only line endings translated during reading are considered.
Subclasses should override.
"""
return None
@property
def errors(self):
"""Error setting of the decoder or encoder.
Subclasses should override."""
return None
io.TextIOBase.register(TextIOBase)
class IncrementalNewlineDecoder(codecs.IncrementalDecoder):
r"""Codec used when reading a file in universal newlines mode. It wraps
another incremental decoder, translating \r\n and \r into \n. It also
records the types of newlines encountered. When used with
translate=False, it ensures that the newline sequence is returned in
one piece.
"""
def __init__(self, decoder, translate, errors='strict'):
codecs.IncrementalDecoder.__init__(self, errors=errors)
self.translate = translate
self.decoder = decoder
self.seennl = 0
self.pendingcr = False
def decode(self, input, final=False):
# decode input (with the eventual \r from a previous pass)
if self.decoder is None:
output = input
else:
output = self.decoder.decode(input, final=final)
if self.pendingcr and (output or final):
output = "\r" + output
self.pendingcr = False
# retain last \r even when not translating data:
# then readline() is sure to get \r\n in one pass
if output.endswith("\r") and not final:
output = output[:-1]
self.pendingcr = True
# Record which newlines are read
crlf = output.count('\r\n')
cr = output.count('\r') - crlf
lf = output.count('\n') - crlf
self.seennl |= (lf and self._LF) | (cr and self._CR) \
| (crlf and self._CRLF)
if self.translate:
if crlf:
output = output.replace("\r\n", "\n")
if cr:
output = output.replace("\r", "\n")
return output
def getstate(self):
if self.decoder is None:
buf = b""
flag = 0
else:
buf, flag = self.decoder.getstate()
flag <<= 1
if self.pendingcr:
flag |= 1
return buf, flag
def setstate(self, state):
buf, flag = state
self.pendingcr = bool(flag & 1)
if self.decoder is not None:
self.decoder.setstate((buf, flag >> 1))
def reset(self):
self.seennl = 0
self.pendingcr = False
if self.decoder is not None:
self.decoder.reset()
_LF = 1
_CR = 2
_CRLF = 4
@property
def newlines(self):
return (None,
"\n",
"\r",
("\r", "\n"),
"\r\n",
("\n", "\r\n"),
("\r", "\r\n"),
("\r", "\n", "\r\n")
)[self.seennl]
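# --- Illustration (editor's sketch, not part of the original module) ---
# A '\r' arriving at a chunk boundary is held back (pendingcr) so that
# '\r\n' is never split across two decode() results, and the newline
# types seen so far are reported by the newlines property:
#
#     >>> dec = IncrementalNewlineDecoder(None, translate=True)
#     >>> dec.decode("a\r")          # trailing \r retained
#     'a'
#     >>> dec.decode("\nb", final=True)
#     '\nb'
#     >>> dec.newlines
#     '\r\n'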
class TextIOWrapper(TextIOBase):
r"""Character and line based layer over a BufferedIOBase object, buffer.
encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getpreferredencoding.
errors determines the strictness of encoding and decoding (see the
codecs.register) and defaults to "strict".
newline can be None, '', '\n', '\r', or '\r\n'. It controls the
handling of line endings. If it is None, universal newlines is
enabled. With this enabled, on input, the line endings '\n', '\r',
or '\r\n' are translated to '\n' before being returned to the
caller. Conversely, on output, '\n' is translated to the system
default line separator, os.linesep. If newline is any of its other
legal values, that newline is taken as the line terminator when the file is read
and it is returned untranslated. On output, '\n' is converted to the
newline.
If line_buffering is True, a call to flush is implied when a call to
write contains a newline character.
"""
_CHUNK_SIZE = 2048
def __init__(self, buffer, encoding=None, errors=None, newline=None,
line_buffering=False):
if newline is not None and not isinstance(newline, basestring):
raise TypeError("illegal newline type: %r" % (type(newline),))
if newline not in (None, "", "\n", "\r", "\r\n"):
raise ValueError("illegal newline value: %r" % (newline,))
if encoding is None:
try:
import locale
except ImportError:
# Importing locale may fail if Python is being built
encoding = "ascii"
else:
encoding = locale.getpreferredencoding()
if not isinstance(encoding, basestring):
raise ValueError("invalid encoding: %r" % encoding)
if sys.py3kwarning and not codecs.lookup(encoding)._is_text_encoding:
msg = ("%r is not a text encoding; "
"use codecs.open() to handle arbitrary codecs")
warnings.warnpy3k(msg % encoding, stacklevel=2)
if errors is None:
errors = "strict"
else:
if not isinstance(errors, basestring):
raise ValueError("invalid errors: %r" % errors)
self._buffer = buffer
self._line_buffering = line_buffering
self._encoding = encoding
self._errors = errors
self._readuniversal = not newline
self._readtranslate = newline is None
self._readnl = newline
self._writetranslate = newline != ''
self._writenl = newline or os.linesep
self._encoder = None
self._decoder = None
self._decoded_chars = '' # buffer for text returned from decoder
self._decoded_chars_used = 0 # offset into _decoded_chars for read()
self._snapshot = None # info for reconstructing decoder state
self._seekable = self._telling = self.buffer.seekable()
if self._seekable and self.writable():
position = self.buffer.tell()
if position != 0:
try:
self._get_encoder().setstate(0)
except LookupError:
# Sometimes the encoder doesn't exist
pass
# self._snapshot is either None, or a tuple (dec_flags, next_input)
# where dec_flags is the second (integer) item of the decoder state
# and next_input is the chunk of input bytes that comes next after the
# snapshot point. We use this to reconstruct decoder states in tell().
# Naming convention:
# - "bytes_..." for integer variables that count input bytes
# - "chars_..." for integer variables that count decoded characters
def __repr__(self):
try:
name = self.name
except Exception:
return "<_pyio.TextIOWrapper encoding='{0}'>".format(self.encoding)
else:
return "<_pyio.TextIOWrapper name={0!r} encoding='{1}'>".format(
name, self.encoding)
@property
def encoding(self):
return self._encoding
@property
def errors(self):
return self._errors
@property
def line_buffering(self):
return self._line_buffering
@property
def buffer(self):
return self._buffer
def seekable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return self._seekable
def readable(self):
return self.buffer.readable()
def writable(self):
return self.buffer.writable()
def flush(self):
self.buffer.flush()
self._telling = self._seekable
def close(self):
if self.buffer is not None and not self.closed:
try:
self.flush()
finally:
self.buffer.close()
@property
def closed(self):
return self.buffer.closed
@property
def name(self):
return self.buffer.name
def fileno(self):
return self.buffer.fileno()
def isatty(self):
return self.buffer.isatty()
def write(self, s):
if self.closed:
raise ValueError("write to closed file")
if not isinstance(s, unicode):
raise TypeError("can't write %s to text stream" %
s.__class__.__name__)
length = len(s)
haslf = (self._writetranslate or self._line_buffering) and "\n" in s
if haslf and self._writetranslate and self._writenl != "\n":
s = s.replace("\n", self._writenl)
encoder = self._encoder or self._get_encoder()
# XXX What if we were just reading?
b = encoder.encode(s)
self.buffer.write(b)
if self._line_buffering and (haslf or "\r" in s):
self.flush()
self._set_decoded_chars('')
self._snapshot = None
if self._decoder:
self._decoder.reset()
return length
def _get_encoder(self):
make_encoder = codecs.getincrementalencoder(self._encoding)
self._encoder = make_encoder(self._errors)
return self._encoder
def _get_decoder(self):
make_decoder = codecs.getincrementaldecoder(self._encoding)
decoder = make_decoder(self._errors)
if self._readuniversal:
decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
self._decoder = decoder
return decoder
# The following three methods implement an ADT for _decoded_chars.
# Text returned from the decoder is buffered here until the client
# requests it by calling our read() or readline() method.
def _set_decoded_chars(self, chars):
"""Set the _decoded_chars buffer."""
self._decoded_chars = chars
self._decoded_chars_used = 0
def _get_decoded_chars(self, n=None):
"""Advance into the _decoded_chars buffer."""
offset = self._decoded_chars_used
if n is None:
chars = self._decoded_chars[offset:]
else:
chars = self._decoded_chars[offset:offset + n]
self._decoded_chars_used += len(chars)
return chars
def _rewind_decoded_chars(self, n):
"""Rewind the _decoded_chars buffer."""
if self._decoded_chars_used < n:
raise AssertionError("rewind decoded_chars out of bounds")
self._decoded_chars_used -= n
def _read_chunk(self):
"""
Read and decode the next chunk of data from the BufferedReader.
"""
# The return value is True unless EOF was reached. The decoded
# string is placed in self._decoded_chars (replacing its previous
# value). The entire input chunk is sent to the decoder, though
# some of it may remain buffered in the decoder, yet to be
# converted.
if self._decoder is None:
raise ValueError("no decoder")
if self._telling:
# To prepare for tell(), we need to snapshot a point in the
# file where the decoder's input buffer is empty.
dec_buffer, dec_flags = self._decoder.getstate()
# Given this, we know there was a valid snapshot point
# len(dec_buffer) bytes ago with decoder state (b'', dec_flags).
# Read a chunk, decode it, and put the result in self._decoded_chars.
input_chunk = self.buffer.read1(self._CHUNK_SIZE)
eof = not input_chunk
self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
if self._telling:
# At the snapshot point, len(dec_buffer) bytes before the read,
# the next input to be decoded is dec_buffer + input_chunk.
self._snapshot = (dec_flags, dec_buffer + input_chunk)
return not eof
def _pack_cookie(self, position, dec_flags=0,
bytes_to_feed=0, need_eof=0, chars_to_skip=0):
# The meaning of a tell() cookie is: seek to position, set the
# decoder flags to dec_flags, read bytes_to_feed bytes, feed them
# into the decoder with need_eof as the EOF flag, then skip
# chars_to_skip characters of the decoded result. For most simple
# decoders, tell() will often just give a byte offset in the file.
return (position | (dec_flags<<64) | (bytes_to_feed<<128) |
(chars_to_skip<<192) | bool(need_eof)<<256)
def _unpack_cookie(self, bigint):
rest, position = divmod(bigint, 1<<64)
rest, dec_flags = divmod(rest, 1<<64)
rest, bytes_to_feed = divmod(rest, 1<<64)
need_eof, chars_to_skip = divmod(rest, 1<<64)
return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip
def tell(self):
if not self._seekable:
raise IOError("underlying stream is not seekable")
if not self._telling:
raise IOError("telling position disabled by next() call")
self.flush()
position = self.buffer.tell()
decoder = self._decoder
if decoder is None or self._snapshot is None:
if self._decoded_chars:
# This should never happen.
raise AssertionError("pending decoded text")
return position
# Skip backward to the snapshot point (see _read_chunk).
dec_flags, next_input = self._snapshot
position -= len(next_input)
# How many decoded characters have been used up since the snapshot?
chars_to_skip = self._decoded_chars_used
if chars_to_skip == 0:
# We haven't moved from the snapshot point.
return self._pack_cookie(position, dec_flags)
# Starting from the snapshot position, we will walk the decoder
# forward until it gives us enough decoded characters.
saved_state = decoder.getstate()
try:
# Note our initial start point.
decoder.setstate((b'', dec_flags))
start_pos = position
start_flags, bytes_fed, chars_decoded = dec_flags, 0, 0
need_eof = 0
# Feed the decoder one byte at a time. As we go, note the
# nearest "safe start point" before the current location
# (a point where the decoder has nothing buffered, so seek()
# can safely start from there and advance to this location).
for next_byte in next_input:
bytes_fed += 1
chars_decoded += len(decoder.decode(next_byte))
dec_buffer, dec_flags = decoder.getstate()
if not dec_buffer and chars_decoded <= chars_to_skip:
# Decoder buffer is empty, so this is a safe start point.
start_pos += bytes_fed
chars_to_skip -= chars_decoded
start_flags, bytes_fed, chars_decoded = dec_flags, 0, 0
if chars_decoded >= chars_to_skip:
break
else:
# We didn't get enough decoded data; signal EOF to get more.
chars_decoded += len(decoder.decode(b'', final=True))
need_eof = 1
if chars_decoded < chars_to_skip:
raise IOError("can't reconstruct logical file position")
# The returned cookie corresponds to the last safe start point.
return self._pack_cookie(
start_pos, start_flags, bytes_fed, need_eof, chars_to_skip)
finally:
decoder.setstate(saved_state)
def truncate(self, pos=None):
self.flush()
if pos is None:
pos = self.tell()
return self.buffer.truncate(pos)
def detach(self):
if self.buffer is None:
raise ValueError("buffer is already detached")
self.flush()
buffer = self._buffer
self._buffer = None
return buffer
def seek(self, cookie, whence=0):
if self.closed:
raise ValueError("tell on closed file")
if not self._seekable:
raise IOError("underlying stream is not seekable")
if whence == 1: # seek relative to current position
if cookie != 0:
raise IOError("can't do nonzero cur-relative seeks")
# Seeking to the current position should attempt to
# sync the underlying buffer with the current position.
whence = 0
cookie = self.tell()
if whence == 2: # seek relative to end of file
if cookie != 0:
raise IOError("can't do nonzero end-relative seeks")
self.flush()
position = self.buffer.seek(0, 2)
self._set_decoded_chars('')
self._snapshot = None
if self._decoder:
self._decoder.reset()
return position
if whence != 0:
raise ValueError("invalid whence (%r, should be 0, 1 or 2)" %
(whence,))
if cookie < 0:
raise ValueError("negative seek position %r" % (cookie,))
self.flush()
# The strategy of seek() is to go back to the safe start point
# and replay the effect of read(chars_to_skip) from there.
start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip = \
self._unpack_cookie(cookie)
# Seek back to the safe start point.
self.buffer.seek(start_pos)
self._set_decoded_chars('')
self._snapshot = None
# Restore the decoder to its state from the safe start point.
if cookie == 0 and self._decoder:
self._decoder.reset()
elif self._decoder or dec_flags or chars_to_skip:
self._decoder = self._decoder or self._get_decoder()
self._decoder.setstate((b'', dec_flags))
self._snapshot = (dec_flags, b'')
if chars_to_skip:
# Just like _read_chunk, feed the decoder and save a snapshot.
input_chunk = self.buffer.read(bytes_to_feed)
self._set_decoded_chars(
self._decoder.decode(input_chunk, need_eof))
self._snapshot = (dec_flags, input_chunk)
# Skip chars_to_skip of the decoded characters.
if len(self._decoded_chars) < chars_to_skip:
raise IOError("can't restore logical file position")
self._decoded_chars_used = chars_to_skip
# Finally, reset the encoder (merely useful for proper BOM handling)
try:
encoder = self._encoder or self._get_encoder()
except LookupError:
# Sometimes the encoder doesn't exist
pass
else:
if cookie != 0:
encoder.setstate(0)
else:
encoder.reset()
return cookie
def read(self, n=None):
self._checkReadable()
if n is None:
n = -1
decoder = self._decoder or self._get_decoder()
try:
n.__index__
except AttributeError:
raise TypeError("an integer is required")
if n < 0:
# Read everything.
result = (self._get_decoded_chars() +
decoder.decode(self.buffer.read(), final=True))
self._set_decoded_chars('')
self._snapshot = None
return result
else:
# Keep reading chunks until we have n characters to return.
eof = False
result = self._get_decoded_chars(n)
while len(result) < n and not eof:
eof = not self._read_chunk()
result += self._get_decoded_chars(n - len(result))
return result
def next(self):
self._telling = False
line = self.readline()
if not line:
self._snapshot = None
self._telling = self._seekable
raise StopIteration
return line
def readline(self, limit=None):
if self.closed:
raise ValueError("read from closed file")
if limit is None:
limit = -1
elif not isinstance(limit, (int, long)):
raise TypeError("limit must be an integer")
# Grab all the decoded text (we will rewind any extra bits later).
line = self._get_decoded_chars()
start = 0
# Make the decoder if it doesn't already exist.
if not self._decoder:
self._get_decoder()
pos = endpos = None
while True:
if self._readtranslate:
# Newlines are already translated, only search for \n
pos = line.find('\n', start)
if pos >= 0:
endpos = pos + 1
break
else:
start = len(line)
elif self._readuniversal:
# Universal newline search. Find any of \r, \r\n, \n
# The decoder ensures that \r\n are not split in two pieces
# In C we'd look for these in parallel of course.
nlpos = line.find("\n", start)
crpos = line.find("\r", start)
if crpos == -1:
if nlpos == -1:
# Nothing found
start = len(line)
else:
# Found \n
endpos = nlpos + 1
break
elif nlpos == -1:
# Found lone \r
endpos = crpos + 1
break
elif nlpos < crpos:
# Found \n
endpos = nlpos + 1
break
elif nlpos == crpos + 1:
# Found \r\n
endpos = crpos + 2
break
else:
# Found \r
endpos = crpos + 1
break
else:
# non-universal
pos = line.find(self._readnl)
if pos >= 0:
endpos = pos + len(self._readnl)
break
if limit >= 0 and len(line) >= limit:
endpos = limit # reached length limit
break
# No line ending seen yet - get more data.
while self._read_chunk():
if self._decoded_chars:
break
if self._decoded_chars:
line += self._get_decoded_chars()
else:
# end of file
self._set_decoded_chars('')
self._snapshot = None
return line
if limit >= 0 and endpos > limit:
endpos = limit # don't exceed limit
# Rewind _decoded_chars to just after the line ending we found.
self._rewind_decoded_chars(len(line) - endpos)
return line[:endpos]
@property
def newlines(self):
return self._decoder.newlines if self._decoder else None
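# --- Illustration (editor's sketch, not part of the original module) ---
# tell() returns an opaque cookie encoding the decoder state; seek() can
# replay it exactly, even for stateful encodings:
#
#     >>> t = TextIOWrapper(BytesIO(b"one\ntwo\n"), encoding="ascii")
#     >>> t.readline()
#     u'one\n'
#     >>> cookie = t.tell()
#     >>> t.readline()
#     u'two\n'
#     >>> _ = t.seek(cookie)         # rewind to the recorded point
#     >>> t.readline()
#     u'two\n'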
class StringIO(TextIOWrapper):
"""Text I/O implementation using an in-memory buffer.
The initial_value argument sets the initial value of the object. The
newline argument is like the one of TextIOWrapper's constructor.
"""
def __init__(self, initial_value="", newline="\n"):
super(StringIO, self).__init__(BytesIO(),
encoding="utf-8",
errors="strict",
newline=newline)
# Issue #5645: make universal newlines semantics the same as in the
# C version, even under Windows.
if newline is None:
self._writetranslate = False
if initial_value:
if not isinstance(initial_value, unicode):
initial_value = unicode(initial_value)
self.write(initial_value)
self.seek(0)
def getvalue(self):
self.flush()
decoder = self._decoder or self._get_decoder()
old_state = decoder.getstate()
decoder.reset()
try:
return decoder.decode(self.buffer.getvalue(), final=True)
finally:
decoder.setstate(old_state)
def __repr__(self):
# TextIOWrapper tells the encoding in its repr. In StringIO,
# that's an implementation detail.
return object.__repr__(self)
@property
def errors(self):
return None
@property
def encoding(self):
return None
def detach(self):
# This doesn't make sense on StringIO.
self._unsupported("detach")
"""Strptime-related classes and functions.
CLASSES:
LocaleTime -- Discovers and stores locale-specific time information
TimeRE -- Creates regexes for pattern matching a string of text containing
time information
FUNCTIONS:
_getlang -- Figure out what language is being used for the locale
strptime -- Calculates the time struct represented by the passed-in string
"""
import time
import locale
import calendar
from re import compile as re_compile
from re import IGNORECASE
from re import escape as re_escape
from datetime import date as datetime_date
try:
from thread import allocate_lock as _thread_allocate_lock
except ImportError:
from dummy_thread import allocate_lock as _thread_allocate_lock
__all__ = []
def _getlang():
# Figure out what the current language is set to.
return locale.getlocale(locale.LC_TIME)
class LocaleTime(object):
"""Stores and handles locale-specific information related to time.
ATTRIBUTES:
f_weekday -- full weekday names (7-item list)
a_weekday -- abbreviated weekday names (7-item list)
f_month -- full month names (13-item list; dummy value in [0], which
is added by code)
a_month -- abbreviated month names (13-item list, dummy value in
[0], which is added by code)
am_pm -- AM/PM representation (2-item list)
LC_date_time -- format string for date/time representation (string)
LC_date -- format string for date representation (string)
LC_time -- format string for time representation (string)
timezone -- daylight- and non-daylight-savings timezone representation
(2-item list of sets)
lang -- Language used by instance (2-item tuple)
"""
def __init__(self):
"""Set all attributes.
Order of methods called matters for dependency reasons.
The locale language is set at the onset and then checked again before
exiting. This is to make sure that the attributes were not set with a
mix of information from more than one locale. This would most likely
happen when using threads where one thread calls a locale-dependent
function while another thread changes the locale while the function in
the other thread is still running. Proper coding would call for
locks to prevent changing the locale while locale-dependent code is
running. The check here is done in case someone does not think about
doing this.
The only other possible issue is if someone changed the timezone and did
not call time.tzset(). That is an issue for the programmer, though,
since changing the timezone is worthless without that call.
"""
self.lang = _getlang()
self.__calc_weekday()
self.__calc_month()
self.__calc_am_pm()
self.__calc_timezone()
self.__calc_date_time()
if _getlang() != self.lang:
raise ValueError("locale changed during initialization")
if time.tzname != self.tzname or time.daylight != self.daylight:
raise ValueError("timezone changed during initialization")
def __pad(self, seq, front):
# Add '' to the front of seq if front is true, else to the back.
seq = list(seq)
if front:
seq.insert(0, '')
else:
seq.append('')
return seq
def __calc_weekday(self):
# Set self.a_weekday and self.f_weekday using the calendar
# module.
a_weekday = [calendar.day_abbr[i].lower() for i in range(7)]
f_weekday = [calendar.day_name[i].lower() for i in range(7)]
self.a_weekday = a_weekday
self.f_weekday = f_weekday
def __calc_month(self):
# Set self.f_month and self.a_month using the calendar module.
a_month = [calendar.month_abbr[i].lower() for i in range(13)]
f_month = [calendar.month_name[i].lower() for i in range(13)]
self.a_month = a_month
self.f_month = f_month
def __calc_am_pm(self):
# Set self.am_pm by using time.strftime().
# The magic date (1999,3,17,hour,44,55,2,76,0) is not really that
# magical; it just happens to be used everywhere else where a
# static date was needed.
am_pm = []
for hour in (01,22):
time_tuple = time.struct_time((1999,3,17,hour,44,55,2,76,0))
am_pm.append(time.strftime("%p", time_tuple).lower())
self.am_pm = am_pm
def __calc_date_time(self):
# Set self.date_time, self.date, & self.time by using
# time.strftime().
# Use (1999,3,17,22,44,55,2,76,0) for magic date because the amount of
# overloaded numbers is minimized. The order in which we search for
# values within the format string is very important; it eliminates
# possible ambiguity about what something represents.
time_tuple = time.struct_time((1999,3,17,22,44,55,2,76,0))
date_time = [None, None, None]
date_time[0] = time.strftime("%c", time_tuple).lower()
date_time[1] = time.strftime("%x", time_tuple).lower()
date_time[2] = time.strftime("%X", time_tuple).lower()
replacement_pairs = [('%', '%%'), (self.f_weekday[2], '%A'),
(self.f_month[3], '%B'), (self.a_weekday[2], '%a'),
(self.a_month[3], '%b'), (self.am_pm[1], '%p'),
('1999', '%Y'), ('99', '%y'), ('22', '%H'),
('44', '%M'), ('55', '%S'), ('76', '%j'),
('17', '%d'), ('03', '%m'), ('3', '%m'),
# '3' needed for when no leading zero.
('2', '%w'), ('10', '%I')]
replacement_pairs.extend([(tz, "%Z") for tz_values in self.timezone
for tz in tz_values])
for offset,directive in ((0,'%c'), (1,'%x'), (2,'%X')):
current_format = date_time[offset]
for old, new in replacement_pairs:
# Must deal with possible lack of locale info
# manifesting itself as the empty string (e.g., Swedish's
# lack of AM/PM info) or a platform returning a tuple of empty
# strings (e.g., MacOS 9 having timezone as ('','')).
if old:
current_format = current_format.replace(old, new)
# If %W is used, then Sunday, 2005-01-03 will fall on week 0 since
# 2005-01-03 occurs before the first Monday of the year. Otherwise
# %U is used.
time_tuple = time.struct_time((1999,1,3,1,1,1,6,3,0))
if '00' in time.strftime(directive, time_tuple):
U_W = '%W'
else:
U_W = '%U'
date_time[offset] = current_format.replace('11', U_W)
self.LC_date_time = date_time[0]
self.LC_date = date_time[1]
self.LC_time = date_time[2]
def __calc_timezone(self):
# Set self.timezone by using time.tzname.
# Do not worry about possibility of time.tzname[0] == time.tzname[1]
# and time.daylight; handle that in strptime.
try:
time.tzset()
except AttributeError:
pass
self.tzname = time.tzname
self.daylight = time.daylight
no_saving = frozenset(["utc", "gmt", self.tzname[0].lower()])
if self.daylight:
has_saving = frozenset([self.tzname[1].lower()])
else:
has_saving = frozenset()
self.timezone = (no_saving, has_saving)
class TimeRE(dict):
"""Handle conversion from format directives to regexes."""
def __init__(self, locale_time=None):
"""Create keys/values.
Order of execution is important for dependency reasons.
"""
if locale_time:
self.locale_time = locale_time
else:
self.locale_time = LocaleTime()
base = super(TimeRE, self)
base.__init__({
# The " \d" part of the regex is to make %c from ANSI C work
'd': r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])",
'f': r"(?P<f>[0-9]{1,6})",
'H': r"(?P<H>2[0-3]|[0-1]\d|\d)",
'I': r"(?P<I>1[0-2]|0[1-9]|[1-9])",
'j': r"(?P<j>36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]|[1-9]\d|0[1-9]|[1-9])",
'm': r"(?P<m>1[0-2]|0[1-9]|[1-9])",
'M': r"(?P<M>[0-5]\d|\d)",
'S': r"(?P<S>6[0-1]|[0-5]\d|\d)",
'U': r"(?P<U>5[0-3]|[0-4]\d|\d)",
'w': r"(?P<w>[0-6])",
# W is set below by using 'U'
'y': r"(?P<y>\d\d)",
#XXX: Does 'Y' need to worry about having less or more than
# 4 digits?
'Y': r"(?P<Y>\d\d\d\d)",
'A': self.__seqToRE(self.locale_time.f_weekday, 'A'),
'a': self.__seqToRE(self.locale_time.a_weekday, 'a'),
'B': self.__seqToRE(self.locale_time.f_month[1:], 'B'),
'b': self.__seqToRE(self.locale_time.a_month[1:], 'b'),
'p': self.__seqToRE(self.locale_time.am_pm, 'p'),
'Z': self.__seqToRE((tz for tz_names in self.locale_time.timezone
for tz in tz_names),
'Z'),
'%': '%'})
base.__setitem__('W', base.__getitem__('U').replace('U', 'W'))
base.__setitem__('c', self.pattern(self.locale_time.LC_date_time))
base.__setitem__('x', self.pattern(self.locale_time.LC_date))
base.__setitem__('X', self.pattern(self.locale_time.LC_time))
def __seqToRE(self, to_convert, directive):
"""Convert a list to a regex string for matching a directive.
Want possible matching values to be from longest to shortest. This
prevents the possibility of a match occurring for a value that is also
a substring of a larger value that should have matched (e.g., 'abc'
matching when 'abcdef' should have been the match).
"""
to_convert = sorted(to_convert, key=len, reverse=True)
for value in to_convert:
if value != '':
break
else:
return ''
regex = '|'.join(re_escape(stuff) for stuff in to_convert)
regex = '(?P<%s>%s' % (directive, regex)
return '%s)' % regex
def pattern(self, format):
"""Return regex pattern for the format string.
Need to make sure that any characters that might be interpreted as
regex syntax are escaped.
"""
processed_format = ''
# The sub() call escapes all characters that might be misconstrued
# as regex syntax. Cannot use re.escape since we have to deal with
# format directives (%m, etc.).
regex_chars = re_compile(r"([\\.^$*+?\(\){}\[\]|])")
format = regex_chars.sub(r"\\\1", format)
whitespace_replacement = re_compile(r'\s+')
format = whitespace_replacement.sub(r'\\s+', format)
while '%' in format:
directive_index = format.index('%')+1
processed_format = "%s%s%s" % (processed_format,
format[:directive_index-1],
self[format[directive_index]])
format = format[directive_index+1:]
return "%s%s" % (processed_format, format)
def compile(self, format):
"""Return a compiled re object for the format string."""
return re_compile(self.pattern(format), IGNORECASE)
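# --- Illustration (editor's sketch; _strptime is a private module) ---
# pattern() expands each directive into a named group, so a whole format
# string becomes one regex:
#
#     >>> TimeRE().pattern('%Y-%m')
#     '(?P<Y>\\d\\d\\d\\d)-(?P<m>1[0-2]|0[1-9]|[1-9])'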
_cache_lock = _thread_allocate_lock()
# DO NOT modify _TimeRE_cache or _regex_cache without acquiring the cache lock
# first!
_TimeRE_cache = TimeRE()
_CACHE_MAX_SIZE = 5 # Max number of regexes stored in _regex_cache
_regex_cache = {}
def _calc_julian_from_U_or_W(year, week_of_year, day_of_week, week_starts_Mon):
"""Calculate the Julian day based on the year, week of the year, and day of
the week, with week_starts_Mon representing whether the week of the year
assumes the week starts on Sunday or Monday (6 or 0)."""
first_weekday = datetime_date(year, 1, 1).weekday()
# If we are dealing with the %U directive (week starts on Sunday), it's
# easier to just shift the view to Sunday being the first day of the
# week.
if not week_starts_Mon:
first_weekday = (first_weekday + 1) % 7
day_of_week = (day_of_week + 1) % 7
# Need to watch out for a week 0 (when the first day of the year is not
# the same as that specified by %U or %W).
week_0_length = (7 - first_weekday) % 7
if week_of_year == 0:
return 1 + day_of_week - first_weekday
else:
days_to_week = week_0_length + (7 * (week_of_year - 1))
return 1 + days_to_week + day_of_week
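# Worked example (editor's note): 1999-01-01 was a Friday (weekday() == 4),
# so with %W semantics (weeks start on Monday) week 0 covers Jan 1-3 and
# week 1 starts on Monday, Jan 4. Week 1, Wednesday (day_of_week=2) gives:
#
#     >>> _calc_julian_from_U_or_W(1999, 1, 2, True)
#     6
#
# i.e. the 6th day of the year, January 6th, which was indeed a Wednesday.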
def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
"""Return a time struct based on the input string and the format string."""
global _TimeRE_cache, _regex_cache
with _cache_lock:
locale_time = _TimeRE_cache.locale_time
if (_getlang() != locale_time.lang or
time.tzname != locale_time.tzname or
time.daylight != locale_time.daylight):
_TimeRE_cache = TimeRE()
_regex_cache.clear()
locale_time = _TimeRE_cache.locale_time
if len(_regex_cache) > _CACHE_MAX_SIZE:
_regex_cache.clear()
format_regex = _regex_cache.get(format)
if not format_regex:
try:
format_regex = _TimeRE_cache.compile(format)
# KeyError raised when a bad format is found; can be specified as
# \\, in which case it was a stray % but with a space after it
except KeyError, err:
bad_directive = err.args[0]
if bad_directive == "\\":
bad_directive = "%"
del err
raise ValueError("'%s' is a bad directive in format '%s'" %
(bad_directive, format))
# IndexError only occurs when the format string is "%"
except IndexError:
raise ValueError("stray %% in format '%s'" % format)
_regex_cache[format] = format_regex
found = format_regex.match(data_string)
if not found:
raise ValueError("time data %r does not match format %r" %
(data_string, format))
if len(data_string) != found.end():
raise ValueError("unconverted data remains: %s" %
data_string[found.end():])
year = None
month = day = 1
hour = minute = second = fraction = 0
tz = -1
# Default to -1 to signify that the values are not known; not critical to
# though
week_of_year = -1
week_of_year_start = -1
# weekday and julian defaulted to None so as to signal need to calculate
# values
weekday = julian = None
found_dict = found.groupdict()
for group_key in found_dict.iterkeys():
# Directives not explicitly handled below:
# c, x, X
# handled by being composed out of other directives
# U, W
# worthless without day of the week
if group_key == 'y':
year = int(found_dict['y'])
# Open Group specification for strptime() states that a %y
# value in the range of [00, 68] is in the century 2000, while
# [69, 99] is in the century 1900.
if year <= 68:
year += 2000
else:
year += 1900
elif group_key == 'Y':
year = int(found_dict['Y'])
elif group_key == 'm':
month = int(found_dict['m'])
elif group_key == 'B':
month = locale_time.f_month.index(found_dict['B'].lower())
elif group_key == 'b':
month = locale_time.a_month.index(found_dict['b'].lower())
elif group_key == 'd':
day = int(found_dict['d'])
elif group_key == 'H':
hour = int(found_dict['H'])
elif group_key == 'I':
hour = int(found_dict['I'])
ampm = found_dict.get('p', '').lower()
# If there was no AM/PM indicator, we'll treat this like AM
if ampm in ('', locale_time.am_pm[0]):
# We're in AM so the hour is correct unless we're
# looking at 12 midnight.
# 12 midnight == 12 AM == hour 0
if hour == 12:
hour = 0
elif ampm == locale_time.am_pm[1]:
# We're in PM so we need to add 12 to the hour unless
# we're looking at 12 noon.
# 12 noon == 12 PM == hour 12
if hour != 12:
hour += 12
elif group_key == 'M':
minute = int(found_dict['M'])
elif group_key == 'S':
second = int(found_dict['S'])
elif group_key == 'f':
s = found_dict['f']
# Pad to always return microseconds.
s += "0" * (6 - len(s))
fraction = int(s)
elif group_key == 'A':
weekday = locale_time.f_weekday.index(found_dict['A'].lower())
elif group_key == 'a':
weekday = locale_time.a_weekday.index(found_dict['a'].lower())
elif group_key == 'w':
weekday = int(found_dict['w'])
if weekday == 0:
weekday = 6
else:
weekday -= 1
elif group_key == 'j':
julian = int(found_dict['j'])
elif group_key in ('U', 'W'):
week_of_year = int(found_dict[group_key])
if group_key == 'U':
# U starts week on Sunday.
week_of_year_start = 6
else:
# W starts week on Monday.
week_of_year_start = 0
elif group_key == 'Z':
# Since -1 is default value only need to worry about setting tz if
# it can be something other than -1.
found_zone = found_dict['Z'].lower()
for value, tz_values in enumerate(locale_time.timezone):
if found_zone in tz_values:
# Deal with bad locale setup where timezone names are the
# same and yet time.daylight is true; too ambiguous to
# be able to tell what timezone has daylight savings
if (time.tzname[0] == time.tzname[1] and
time.daylight and found_zone not in ("utc", "gmt")):
break
else:
tz = value
break
leap_year_fix = False
if year is None and month == 2 and day == 29:
year = 1904 # 1904 is first leap year of 20th century
leap_year_fix = True
elif year is None:
year = 1900
# If we know the week of the year and what day of that week, we can figure
# out the Julian day of the year.
if julian is None and week_of_year != -1 and weekday is not None:
week_starts_Mon = True if week_of_year_start == 0 else False
julian = _calc_julian_from_U_or_W(year, week_of_year, weekday,
week_starts_Mon)
if julian <= 0:
year -= 1
yday = 366 if calendar.isleap(year) else 365
julian += yday
# Cannot pre-calculate datetime_date() since it can change in the Julian
# calculation and thus could give a different value for the day-of-the-week
# calculation.
if julian is None:
# Need to add 1 to result since first day of the year is 1, not 0.
julian = datetime_date(year, month, day).toordinal() - \
datetime_date(year, 1, 1).toordinal() + 1
else: # Assume that if they bothered to include Julian day it will
# be accurate.
datetime_result = datetime_date.fromordinal((julian - 1) + datetime_date(year, 1, 1).toordinal())
year = datetime_result.year
month = datetime_result.month
day = datetime_result.day
if weekday is None:
weekday = datetime_date(year, month, day).weekday()
if leap_year_fix:
# the caller didn't supply a year but asked for Feb 29th. We couldn't
# use the default of 1900 for computations. We set it back to ensure
# that February 29th is smaller than March 1st.
year = 1900
return (time.struct_time((year, month, day,
hour, minute, second,
weekday, julian, tz)), fraction)
def _strptime_time(data_string, format="%a %b %d %H:%M:%S %Y"):
return _strptime(data_string, format)[0]
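# --- Illustration (editor's sketch, assuming an English locale) ---
# time.strptime() is the public entry point to the machinery above; note
# the %y pivot handling described earlier ([00, 68] -> 2000s):
#
#     >>> import time
#     >>> time.strptime("30 Nov 00", "%d %b %y")[:6]
#     (2000, 11, 30, 0, 0, 0)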
"""Thread-local objects.
(Note that this module provides a Python version of the threading.local
class. Depending on the version of Python you're using, there may be a
faster one available. You should always import the `local` class from
`threading`.)
Thread-local objects support the management of thread-local data.
If you have data that you want to be local to a thread, simply create
a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
You can also access the local-object's dictionary:
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
What's important about thread-local objects is that their data are
local to a thread. If we access the data in a different thread:
>>> log = []
>>> def f():
... items = mydata.__dict__.items()
... items.sort()
... log.append(items)
... mydata.number = 11
... log.append(mydata.number)
>>> import threading
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[], 11]
we get different data. Furthermore, changes made in the other thread
don't affect data seen in this thread:
>>> mydata.number
42
Of course, values you get from a local object, including a __dict__
attribute, are for whatever thread was current at the time the
attribute was read. For that reason, you generally don't want to save
these values across threads, as they apply only to the thread they
came from.
You can create custom local objects by subclassing the local class:
>>> class MyLocal(local):
... number = 2
... def __init__(self, **kw):
... self.__dict__.update(kw)
... def squared(self):
... return self.number ** 2
This can be useful to support default values, methods and
initialization. Note that if you define an __init__ method, it will be
called each time the local object is used in a separate thread. This
is necessary to initialize each thread's dictionary.
Now if we create a local object:
>>> mydata = MyLocal(color='red')
Now we have a default number:
>>> mydata.number
2
an initial color:
>>> mydata.color
'red'
>>> del mydata.color
And a method that operates on the data:
>>> mydata.squared()
4
As before, we can access the data in a separate thread:
>>> log = []
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[('color', 'red')], 11]
without affecting this thread's data:
>>> mydata.number
2
>>> mydata.color
Traceback (most recent call last):
...
AttributeError: 'MyLocal' object has no attribute 'color'
Note that subclasses can define slots, but they are not thread
local. They are shared across threads:
>>> class MyLocal(local):
... __slots__ = 'number'
>>> mydata = MyLocal()
>>> mydata.number = 42
>>> mydata.color = 'red'
So, the separate thread:
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
affects what we see:
>>> mydata.number
11
>>> del mydata
"""
__all__ = ["local"]
# We need to use objects from the threading module, but the threading
# module may also want to use our `local` class, if support for locals
# isn't compiled in to the `thread` module. This creates potential problems
# with circular imports. For that reason, we don't import `threading`
# until the bottom of this file (a hack sufficient to worm around the
# potential problems). Note that almost all platforms do have support for
# locals in the `thread` module, and there is no circular import problem
# then, so problems introduced by fiddling the order of imports here won't
# manifest on most boxes.
class _localbase(object):
__slots__ = '_local__key', '_local__args', '_local__lock'
def __new__(cls, *args, **kw):
self = object.__new__(cls)
        key = 'thread.local.' + str(id(self))
object.__setattr__(self, '_local__key', key)
object.__setattr__(self, '_local__args', (args, kw))
object.__setattr__(self, '_local__lock', RLock())
if (args or kw) and (cls.__init__ is object.__init__):
raise TypeError("Initialization arguments are not supported")
# We need to create the thread dict in anticipation of
# __init__ being called, to make sure we don't call it
# again ourselves.
dict = object.__getattribute__(self, '__dict__')
current_thread().__dict__[key] = dict
return self
def _patch(self):
key = object.__getattribute__(self, '_local__key')
d = current_thread().__dict__.get(key)
if d is None:
d = {}
current_thread().__dict__[key] = d
object.__setattr__(self, '__dict__', d)
# we have a new instance dict, so call out __init__ if we have
# one
cls = type(self)
if cls.__init__ is not object.__init__:
args, kw = object.__getattribute__(self, '_local__args')
cls.__init__(self, *args, **kw)
else:
object.__setattr__(self, '__dict__', d)
class local(_localbase):
def __getattribute__(self, name):
lock = object.__getattribute__(self, '_local__lock')
lock.acquire()
try:
_patch(self)
return object.__getattribute__(self, name)
finally:
lock.release()
def __setattr__(self, name, value):
if name == '__dict__':
raise AttributeError(
"%r object attribute '__dict__' is read-only"
% self.__class__.__name__)
lock = object.__getattribute__(self, '_local__lock')
lock.acquire()
try:
_patch(self)
return object.__setattr__(self, name, value)
finally:
lock.release()
def __delattr__(self, name):
if name == '__dict__':
raise AttributeError(
"%r object attribute '__dict__' is read-only"
% self.__class__.__name__)
lock = object.__getattribute__(self, '_local__lock')
lock.acquire()
try:
_patch(self)
return object.__delattr__(self, name)
finally:
lock.release()
def __del__(self):
import threading
key = object.__getattribute__(self, '_local__key')
try:
# We use the non-locking API since we might already hold the lock
# (__del__ can be called at any point by the cyclic GC).
threads = threading._enumerate()
except:
# If enumerating the current threads fails, as it seems to do
# during shutdown, we'll skip cleanup under the assumption
# that there is nothing to clean up.
return
for thread in threads:
try:
__dict__ = thread.__dict__
except AttributeError:
# Thread is dying, rest in peace.
continue
if key in __dict__:
try:
del __dict__[key]
except KeyError:
pass # didn't have anything in this thread
from threading import current_thread, RLock
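# Usage sketch for the class defined above (a hedged example, not part
# of the module; it shows only what the docstring promises, and imports
# `local` from `threading` as the docstring recommends). Each thread
# gets an independent attribute namespace on the same object.
import threading
from threading import local

_demo = local()
_demo.number = 42
_seen = []

def _worker():
    # A fresh thread starts with an empty __dict__ for `_demo`.
    _seen.append(hasattr(_demo, 'number'))
    _demo.number = 11
    _seen.append(_demo.number)

_t = threading.Thread(target=_worker)
_t.start()
_t.join()
print _seen          # [False, 11]
print _demo.number   # still 42: the main thread's value is untouched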
# Access WeakSet through the weakref module.
# This code is separated-out because it is needed
# by abc.py to load everything else at startup.
from _weakref import ref
__all__ = ['WeakSet']
class _IterationGuard(object):
# This context manager registers itself in the current iterators of the
# weak container, such as to delay all removals until the context manager
# exits.
# This technique should be relatively thread-safe (since sets are).
def __init__(self, weakcontainer):
# Don't create cycles
self.weakcontainer = ref(weakcontainer)
def __enter__(self):
w = self.weakcontainer()
if w is not None:
w._iterating.add(self)
return self
def __exit__(self, e, t, b):
w = self.weakcontainer()
if w is not None:
s = w._iterating
s.remove(self)
if not s:
w._commit_removals()
class WeakSet(object):
def __init__(self, data=None):
self.data = set()
def _remove(item, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(item)
else:
self.data.discard(item)
self._remove = _remove
# A list of keys to be removed
self._pending_removals = []
self._iterating = set()
if data is not None:
self.update(data)
def _commit_removals(self):
l = self._pending_removals
discard = self.data.discard
while l:
discard(l.pop())
def __iter__(self):
with _IterationGuard(self):
for itemref in self.data:
item = itemref()
if item is not None:
# Caveat: the iterator will keep a strong reference to
# `item` until it is resumed or closed.
yield item
def __len__(self):
return len(self.data) - len(self._pending_removals)
def __contains__(self, item):
try:
wr = ref(item)
# Issue #266 - somehow item was freed before wr hash was calculated
return wr in self.data
except TypeError:
return False
def __reduce__(self):
return (self.__class__, (list(self),),
getattr(self, '__dict__', None))
__hash__ = None
def add(self, item):
if self._pending_removals:
self._commit_removals()
self.data.add(ref(item, self._remove))
def clear(self):
if self._pending_removals:
self._commit_removals()
self.data.clear()
def copy(self):
return self.__class__(self)
def pop(self):
if self._pending_removals:
self._commit_removals()
while True:
try:
itemref = self.data.pop()
except KeyError:
raise KeyError('pop from empty WeakSet')
item = itemref()
if item is not None:
return item
def remove(self, item):
if self._pending_removals:
self._commit_removals()
self.data.remove(ref(item))
def discard(self, item):
if self._pending_removals:
self._commit_removals()
self.data.discard(ref(item))
def update(self, other):
if self._pending_removals:
self._commit_removals()
for element in other:
self.add(element)
def __ior__(self, other):
self.update(other)
return self
def difference(self, other):
newset = self.copy()
newset.difference_update(other)
return newset
__sub__ = difference
def difference_update(self, other):
self.__isub__(other)
def __isub__(self, other):
if self._pending_removals:
self._commit_removals()
if self is other:
self.data.clear()
else:
self.data.difference_update(ref(item) for item in other)
return self
def intersection(self, other):
return self.__class__(item for item in other if item in self)
__and__ = intersection
def intersection_update(self, other):
self.__iand__(other)
def __iand__(self, other):
if self._pending_removals:
self._commit_removals()
self.data.intersection_update(ref(item) for item in other)
return self
def issubset(self, other):
return self.data.issubset(ref(item) for item in other)
__le__ = issubset
def __lt__(self, other):
return self.data < set(ref(item) for item in other)
def issuperset(self, other):
return self.data.issuperset(ref(item) for item in other)
__ge__ = issuperset
def __gt__(self, other):
return self.data > set(ref(item) for item in other)
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return self.data == set(ref(item) for item in other)
def __ne__(self, other):
opposite = self.__eq__(other)
if opposite is NotImplemented:
return NotImplemented
return not opposite
def symmetric_difference(self, other):
newset = self.copy()
newset.symmetric_difference_update(other)
return newset
__xor__ = symmetric_difference
def symmetric_difference_update(self, other):
self.__ixor__(other)
def __ixor__(self, other):
if self._pending_removals:
self._commit_removals()
if self is other:
self.data.clear()
else:
self.data.symmetric_difference_update(ref(item, self._remove) for item in other)
return self
def union(self, other):
return self.__class__(e for s in (self, other) for e in s)
__or__ = union
def isdisjoint(self, other):
return len(self.intersection(other)) == 0
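# Usage sketch for the WeakSet above: entries disappear once their
# referent is collected. Plain object instances are used because ints
# and strings are not weakly referenceable. Note that IronPython's
# garbage collector is non-deterministic, so the removal may be delayed
# even after gc.collect().
import gc

class _Token(object):
    pass

_a, _b = _Token(), _Token()
_s = WeakSet([_a, _b])
print len(_s)    # 2

del _a
gc.collect()     # encourage collection for the demonstration
print len(_s)    # 1 once the dead reference's callback has fired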
"""Record of phased-in incompatible language changes.
Each line is of the form:
FeatureName = "_Feature(" OptionalRelease "," MandatoryRelease ","
CompilerFlag ")"
where, normally, OptionalRelease < MandatoryRelease, and both are 5-tuples
of the same form as sys.version_info:
(PY_MAJOR_VERSION, # the 2 in 2.1.0a3; an int
PY_MINOR_VERSION, # the 1; an int
PY_MICRO_VERSION, # the 0; an int
PY_RELEASE_LEVEL, # "alpha", "beta", "candidate" or "final"; string
PY_RELEASE_SERIAL # the 3; an int
)
OptionalRelease records the first release in which
from __future__ import FeatureName
was accepted.
In the case of MandatoryReleases that have not yet occurred,
MandatoryRelease predicts the release in which the feature will become part
of the language.
Else MandatoryRelease records when the feature became part of the language;
in releases at or after that, modules no longer need
from __future__ import FeatureName
to use the feature in question, but may continue to use such imports.
MandatoryRelease may also be None, meaning that a planned feature got
dropped.
Instances of class _Feature have two corresponding methods,
.getOptionalRelease() and .getMandatoryRelease().
CompilerFlag is the (bitfield) flag that should be passed in the fourth
argument to the builtin function compile() to enable the feature in
dynamically compiled code. This flag is stored in the .compiler_flag
attribute on _Feature instances. These values must match the appropriate
#defines of CO_xxx flags in Include/compile.h.
No feature line is ever to be deleted from this file.
"""
all_feature_names = [
"nested_scopes",
"generators",
"division",
"absolute_import",
"with_statement",
"print_function",
"unicode_literals",
]
__all__ = ["all_feature_names"] + all_feature_names
# The CO_xxx symbols are defined here under the same names used by
# compile.h, so that an editor search will find them here. However,
# they're not exported in __all__, because they don't really belong to
# this module.
CO_NESTED = 0x0010 # nested_scopes
CO_GENERATOR_ALLOWED = 0 # generators (obsolete, was 0x1000)
CO_FUTURE_DIVISION = 0x2000 # division
CO_FUTURE_ABSOLUTE_IMPORT = 0x4000 # perform absolute imports by default
CO_FUTURE_WITH_STATEMENT = 0x8000 # with statement
CO_FUTURE_PRINT_FUNCTION = 0x10000 # print function
CO_FUTURE_UNICODE_LITERALS = 0x20000 # unicode string literals
class _Feature:
def __init__(self, optionalRelease, mandatoryRelease, compiler_flag):
self.optional = optionalRelease
self.mandatory = mandatoryRelease
self.compiler_flag = compiler_flag
def getOptionalRelease(self):
"""Return first release in which this feature was recognized.
This is a 5-tuple, of the same form as sys.version_info.
"""
return self.optional
def getMandatoryRelease(self):
"""Return release in which this feature will become mandatory.
This is a 5-tuple, of the same form as sys.version_info, or, if
the feature was dropped, is None.
"""
return self.mandatory
def __repr__(self):
return "_Feature" + repr((self.optional,
self.mandatory,
self.compiler_flag))
nested_scopes = _Feature((2, 1, 0, "beta", 1),
(2, 2, 0, "alpha", 0),
CO_NESTED)
generators = _Feature((2, 2, 0, "alpha", 1),
(2, 3, 0, "final", 0),
CO_GENERATOR_ALLOWED)
division = _Feature((2, 2, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_DIVISION)
absolute_import = _Feature((2, 5, 0, "alpha", 1),
(3, 0, 0, "alpha", 0),
CO_FUTURE_ABSOLUTE_IMPORT)
with_statement = _Feature((2, 5, 0, "alpha", 1),
(2, 6, 0, "alpha", 0),
CO_FUTURE_WITH_STATEMENT)
print_function = _Feature((2, 6, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_PRINT_FUNCTION)
unicode_literals = _Feature((2, 6, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_UNICODE_LITERALS)
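# Sketch: the compiler_flag values above are meant to be OR-ed into the
# flags argument of the builtin compile(), as the module docstring
# describes. Under Python 2 this enables true division for the
# dynamically compiled code only.
_code = compile("1 / 2", "<future-demo>", "eval", division.compiler_flag)
print eval(_code)    # 0.5 instead of the default Python 2 result 0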
# This file exists as a helper for the test.test_frozen module.
"""Different kinds of SAX Exceptions"""
import sys
if sys.platform[:4] == "java":
from java.lang import Exception
del sys
# ===== SAXEXCEPTION =====
class SAXException(Exception):
"""Encapsulate an XML error or warning. This class can contain
basic error or warning information from either the XML parser or
the application: you can subclass it to provide additional
functionality, or to add localization. Note that although you will
receive a SAXException as the argument to the handlers in the
ErrorHandler interface, you are not actually required to raise
the exception; instead, you can simply read the information in
it."""
def __init__(self, msg, exception=None):
"""Creates an exception. The message is required, but the exception
is optional."""
self._msg = msg
self._exception = exception
Exception.__init__(self, msg)
def getMessage(self):
"Return a message for this exception."
return self._msg
def getException(self):
"Return the embedded exception, or None if there was none."
return self._exception
def __str__(self):
"Create a string representation of the exception."
return self._msg
def __getitem__(self, ix):
"""Avoids weird error messages if someone does exception[ix] by
mistake, since Exception has __getitem__ defined."""
raise AttributeError("__getitem__")
# ===== SAXPARSEEXCEPTION =====
class SAXParseException(SAXException):
"""Encapsulate an XML parse error or warning.
This exception will include information for locating the error in
the original XML document. Note that although the application will
receive a SAXParseException as the argument to the handlers in the
ErrorHandler interface, the application is not actually required
to raise the exception; instead, it can simply read the
information in it and take a different action.
Since this exception is a subclass of SAXException, it inherits
the ability to wrap another exception."""
def __init__(self, msg, exception, locator):
"Creates the exception. The exception parameter is allowed to be None."
SAXException.__init__(self, msg, exception)
self._locator = locator
# We need to cache this stuff at construction time.
# If this exception is raised, the objects through which we must
# traverse to get this information may be deleted by the time
# it gets caught.
self._systemId = self._locator.getSystemId()
self._colnum = self._locator.getColumnNumber()
self._linenum = self._locator.getLineNumber()
def getColumnNumber(self):
"""The column number of the end of the text where the exception
occurred."""
return self._colnum
def getLineNumber(self):
"The line number of the end of the text where the exception occurred."
return self._linenum
def getPublicId(self):
"Get the public identifier of the entity where the exception occurred."
return self._locator.getPublicId()
def getSystemId(self):
"Get the system identifier of the entity where the exception occurred."
return self._systemId
def __str__(self):
"Create a string representation of the exception."
sysid = self.getSystemId()
if sysid is None:
sysid = "<unknown>"
linenum = self.getLineNumber()
if linenum is None:
linenum = "?"
colnum = self.getColumnNumber()
if colnum is None:
colnum = "?"
return "%s:%s:%s: %s" % (sysid, linenum, colnum, self._msg)
# ===== SAXNOTRECOGNIZEDEXCEPTION =====
class SAXNotRecognizedException(SAXException):
"""Exception class for an unrecognized identifier.
An XMLReader will raise this exception when it is confronted with an
unrecognized feature or property. SAX applications and extensions may
use this class for similar purposes."""
# ===== SAXNOTSUPPORTEDEXCEPTION =====
class SAXNotSupportedException(SAXException):
"""Exception class for an unsupported operation.
An XMLReader will raise this exception when a service it cannot
perform is requested (specifically setting a state or value). SAX
applications and extensions may use this class for similar
purposes."""
# ===== SAXREADERNOTAVAILABLE =====
class SAXReaderNotAvailable(SAXNotSupportedException):
"""Exception class for a missing driver.
An XMLReader module (driver) should raise this exception when it
is first imported, e.g. when a support module cannot be imported.
It also may be raised during parsing, e.g. if executing an external
program is not permitted."""
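# Sketch (not part of this module): because SAXParseException caches the
# locator's system id, line and column at construction time, its __str__
# can report "<sysid>:<line>:<column>: <message>" even after the parser
# state is gone. A deliberately malformed document demonstrates this.
import xml.sax

try:
    xml.sax.parseString("<root><unclosed></root>", xml.sax.ContentHandler())
except xml.sax.SAXParseException, exc:
    print exc                      # e.g. <unknown>:1:16: mismatched tag
    print exc.getLineNumber(), exc.getColumnNumber()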
"""An XML Reader is the SAX 2 name for an XML parser. XML Parsers
should be based on this code. """
import handler
from _exceptions import SAXNotSupportedException, SAXNotRecognizedException
# ===== XMLREADER =====
class XMLReader:
"""Interface for reading an XML document using callbacks.
XMLReader is the interface that an XML parser's SAX2 driver must
implement. This interface allows an application to set and query
features and properties in the parser, to register event handlers
for document processing, and to initiate a document parse.
All SAX interfaces are assumed to be synchronous: the parse
methods must not return until parsing is complete, and readers
must wait for an event-handler callback to return before reporting
the next event."""
def __init__(self):
self._cont_handler = handler.ContentHandler()
self._dtd_handler = handler.DTDHandler()
self._ent_handler = handler.EntityResolver()
self._err_handler = handler.ErrorHandler()
def parse(self, source):
"Parse an XML document from a system identifier or an InputSource."
raise NotImplementedError("This method must be implemented!")
def getContentHandler(self):
"Returns the current ContentHandler."
return self._cont_handler
def setContentHandler(self, handler):
"Registers a new object to receive document content events."
self._cont_handler = handler
def getDTDHandler(self):
"Returns the current DTD handler."
return self._dtd_handler
def setDTDHandler(self, handler):
"Register an object to receive basic DTD-related events."
self._dtd_handler = handler
def getEntityResolver(self):
"Returns the current EntityResolver."
return self._ent_handler
def setEntityResolver(self, resolver):
"Register an object to resolve external entities."
self._ent_handler = resolver
def getErrorHandler(self):
"Returns the current ErrorHandler."
return self._err_handler
def setErrorHandler(self, handler):
"Register an object to receive error-message events."
self._err_handler = handler
def setLocale(self, locale):
"""Allow an application to set the locale for errors and warnings.
SAX parsers are not required to provide localization for errors
and warnings; if they cannot support the requested locale,
however, they must raise a SAX exception. Applications may
request a locale change in the middle of a parse."""
raise SAXNotSupportedException("Locale support not implemented")
def getFeature(self, name):
"Looks up and returns the state of a SAX2 feature."
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def setFeature(self, name, state):
"Sets the state of a SAX2 feature."
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def getProperty(self, name):
"Looks up and returns the value of a SAX2 property."
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
def setProperty(self, name, value):
"Sets the value of a SAX2 property."
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
class IncrementalParser(XMLReader):
"""This interface adds three extra methods to the XMLReader
interface that allow XML parsers to support incremental
parsing. Support for this interface is optional, since not all
underlying XML parsers support this functionality.
When the parser is instantiated it is ready to begin accepting
data from the feed method immediately. After parsing has been
finished with a call to close, the reset method must be called to
make the parser ready to accept new data, either from feed or
using the parse method.
Note that these methods must _not_ be called during parsing, that
is, after parse has been called and before it returns.
By default, the class also implements the parse method of the XMLReader
interface using the feed, close and reset methods of the
IncrementalParser interface as a convenience to SAX 2.0 driver
writers."""
def __init__(self, bufsize=2**16):
self._bufsize = bufsize
XMLReader.__init__(self)
def parse(self, source):
import saxutils
source = saxutils.prepare_input_source(source)
self.prepareParser(source)
file = source.getByteStream()
buffer = file.read(self._bufsize)
while buffer != "":
self.feed(buffer)
buffer = file.read(self._bufsize)
self.close()
def feed(self, data):
"""This method gives the raw XML data in the data parameter to
the parser and makes it parse the data, emitting the
corresponding events. It is allowed for XML constructs to be
split across several calls to feed.
feed may raise SAXException."""
raise NotImplementedError("This method must be implemented!")
def prepareParser(self, source):
"""This method is called by the parse implementation to allow
the SAX 2.0 driver to prepare itself for parsing."""
raise NotImplementedError("prepareParser must be overridden!")
def close(self):
"""This method is called when the entire XML document has been
passed to the parser through the feed method, to notify the
parser that there are no more data. This allows the parser to
do the final checks on the document and empty the internal
data buffer.
The parser will not be ready to parse another document until
the reset method has been called.
close may raise SAXException."""
raise NotImplementedError("This method must be implemented!")
def reset(self):
"""This method is called after close has been called to reset
the parser so that it is ready to parse new documents. The
results of calling parse or feed after close without calling
reset are undefined."""
raise NotImplementedError("This method must be implemented!")
# ===== LOCATOR =====
class Locator:
"""Interface for associating a SAX event with a document
location. A locator object will return valid results only during
calls to DocumentHandler methods; at any other time, the
results are unpredictable."""
def getColumnNumber(self):
"Return the column number where the current event ends."
return -1
def getLineNumber(self):
"Return the line number where the current event ends."
return -1
def getPublicId(self):
"Return the public identifier for the current event."
return None
def getSystemId(self):
"Return the system identifier for the current event."
return None
# ===== INPUTSOURCE =====
class InputSource:
"""Encapsulation of the information needed by the XMLReader to
read entities.
This class may include information about the public identifier,
system identifier, byte stream (possibly with character encoding
information) and/or the character stream of an entity.
Applications will create objects of this class for use in the
XMLReader.parse method and for returning from
EntityResolver.resolveEntity.
An InputSource belongs to the application, the XMLReader is not
allowed to modify InputSource objects passed to it from the
application, although it may make copies and modify those."""
def __init__(self, system_id = None):
self.__system_id = system_id
self.__public_id = None
self.__encoding = None
self.__bytefile = None
self.__charfile = None
def setPublicId(self, public_id):
"Sets the public identifier of this InputSource."
self.__public_id = public_id
def getPublicId(self):
"Returns the public identifier of this InputSource."
return self.__public_id
def setSystemId(self, system_id):
"Sets the system identifier of this InputSource."
self.__system_id = system_id
def getSystemId(self):
"Returns the system identifier of this InputSource."
return self.__system_id
def setEncoding(self, encoding):
"""Sets the character encoding of this InputSource.
The encoding must be a string acceptable for an XML encoding
declaration (see section 4.3.3 of the XML recommendation).
The encoding attribute of the InputSource is ignored if the
InputSource also contains a character stream."""
self.__encoding = encoding
def getEncoding(self):
"Get the character encoding of this InputSource."
return self.__encoding
def setByteStream(self, bytefile):
"""Set the byte stream (a Python file-like object which does
not perform byte-to-character conversion) for this input
source.
The SAX parser will ignore this if there is also a character
stream specified, but it will use a byte stream in preference
to opening a URI connection itself.
If the application knows the character encoding of the byte
stream, it should set it with the setEncoding method."""
self.__bytefile = bytefile
def getByteStream(self):
"""Get the byte stream for this input source.
The getEncoding method will return the character encoding for
this byte stream, or None if unknown."""
return self.__bytefile
def setCharacterStream(self, charfile):
"""Set the character stream for this input source. (The stream
must be a Python 2.0 Unicode-wrapped file-like that performs
conversion to Unicode strings.)
If there is a character stream specified, the SAX parser will
ignore any byte stream and will not attempt to open a URI
connection to the system identifier."""
self.__charfile = charfile
def getCharacterStream(self):
"Get the character stream for this input source."
return self.__charfile
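# Sketch: honoring the stream precedence documented above (character
# stream over byte stream over opening the system id), an application
# can hand the reader an already-open byte stream. The system id then
# serves only for error reporting and relative resolution.
import io
import xml.sax

_src = InputSource("inline-document")
_src.setByteStream(io.BytesIO(b"<root/>"))
_src.setEncoding("utf-8")
xml.sax.parse(_src, xml.sax.ContentHandler())
print "parsed", _src.getSystemId()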
# ===== ATTRIBUTESIMPL =====
class AttributesImpl:
def __init__(self, attrs):
"""Non-NS-aware implementation.
attrs should be of the form {name : value}."""
self._attrs = attrs
def getLength(self):
return len(self._attrs)
def getType(self, name):
return "CDATA"
def getValue(self, name):
return self._attrs[name]
def getValueByQName(self, name):
return self._attrs[name]
def getNameByQName(self, name):
if not name in self._attrs:
raise KeyError, name
return name
def getQNameByName(self, name):
if not name in self._attrs:
raise KeyError, name
return name
def getNames(self):
return self._attrs.keys()
def getQNames(self):
return self._attrs.keys()
def __len__(self):
return len(self._attrs)
def __getitem__(self, name):
return self._attrs[name]
def keys(self):
return self._attrs.keys()
def has_key(self, name):
return name in self._attrs
def __contains__(self, name):
return name in self._attrs
def get(self, name, alternative=None):
return self._attrs.get(name, alternative)
def copy(self):
return self.__class__(self._attrs)
def items(self):
return self._attrs.items()
def values(self):
return self._attrs.values()
# ===== ATTRIBUTESNSIMPL =====
class AttributesNSImpl(AttributesImpl):
def __init__(self, attrs, qnames):
"""NS-aware implementation.
attrs should be of the form {(ns_uri, lname): value, ...}.
qnames of the form {(ns_uri, lname): qname, ...}."""
self._attrs = attrs
self._qnames = qnames
def getValueByQName(self, name):
for (nsname, qname) in self._qnames.items():
if qname == name:
return self._attrs[nsname]
raise KeyError, name
def getNameByQName(self, name):
for (nsname, qname) in self._qnames.items():
if qname == name:
return nsname
raise KeyError, name
def getQNameByName(self, name):
return self._qnames[name]
def getQNames(self):
return self._qnames.values()
def copy(self):
return self.__class__(self._attrs, self._qnames)
def _test():
XMLReader()
IncrementalParser()
Locator()
if __name__ == "__main__":
_test()
"""\
A library of useful helper classes to the SAX classes, for the
convenience of application and driver writers.
"""
import os, urlparse, urllib, types
import io
import sys
import handler
import xmlreader
try:
_StringTypes = [types.StringType, types.UnicodeType]
except AttributeError:
_StringTypes = [types.StringType]
def __dict_replace(s, d):
"""Replace substrings of a string using a dictionary."""
for key, value in d.items():
s = s.replace(key, value)
return s
def escape(data, entities={}):
"""Escape &, <, and > in a string of data.
You can escape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
# must do ampersand first
data = data.replace("&", "&")
data = data.replace(">", ">")
data = data.replace("<", "<")
if entities:
data = __dict_replace(data, entities)
return data
def unescape(data, entities={}):
"""Unescape &, <, and > in a string of data.
You can unescape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
data = data.replace("<", "<")
data = data.replace(">", ">")
if entities:
data = __dict_replace(data, entities)
# must do ampersand last
return data.replace("&", "&")
def quoteattr(data, entities={}):
"""Escape and quote an attribute value.
Escape &, <, and > in a string of data, then quote it for use as
an attribute value. The \" character will be escaped as well, if
necessary.
You can escape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
entities = entities.copy()
    entities.update({'\n': '&#10;', '\r': '&#13;', '\t': '&#9;'})
data = escape(data, entities)
if '"' in data:
if "'" in data:
data = '"%s"' % data.replace('"', """)
else:
data = "'%s'" % data
else:
data = '"%s"' % data
return data
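# Quick sanity sketch of the three helpers above (expected values follow
# from the entity substitutions they implement):
print escape('a < b & c')              # a &lt; b &amp; c
print unescape('a &lt; b &amp; c')     # a < b & c
print quoteattr('say "hi"')            # 'say "hi"'   (single-quoted)
print quoteattr("both ' and \"")       # "both ' and &quot;"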
def _gettextwriter(out, encoding):
if out is None:
import sys
out = sys.stdout
if isinstance(out, io.RawIOBase):
buffer = io.BufferedIOBase(out)
# Keep the original file open when the TextIOWrapper is
# destroyed
buffer.close = lambda: None
else:
# This is to handle passed objects that aren't in the
# IOBase hierarchy, but just have a write method
buffer = io.BufferedIOBase()
buffer.writable = lambda: True
buffer.write = out.write
try:
            # TextIOWrapper uses these methods to determine
            # whether a BOM (for UTF-16, etc.) should be added
buffer.seekable = out.seekable
buffer.tell = out.tell
except AttributeError:
pass
# wrap a binary writer with TextIOWrapper
return _UnbufferedTextIOWrapper(buffer, encoding=encoding,
errors='xmlcharrefreplace',
newline='\n')
class _UnbufferedTextIOWrapper(io.TextIOWrapper):
def write(self, s):
super(_UnbufferedTextIOWrapper, self).write(s)
self.flush()
class XMLGenerator(handler.ContentHandler):
def __init__(self, out=None, encoding="iso-8859-1"):
handler.ContentHandler.__init__(self)
out = _gettextwriter(out, encoding)
self._write = out.write
self._flush = out.flush
self._ns_contexts = [{}] # contains uri -> prefix dicts
self._current_context = self._ns_contexts[-1]
self._undeclared_ns_maps = []
self._encoding = encoding
def _qname(self, name):
"""Builds a qualified name from a (ns_url, localname) pair"""
if name[0]:
# Per http://www.w3.org/XML/1998/namespace, The 'xml' prefix is
# bound by definition to http://www.w3.org/XML/1998/namespace. It
# does not need to be declared and will not usually be found in
# self._current_context.
if 'http://www.w3.org/XML/1998/namespace' == name[0]:
return 'xml:' + name[1]
# The name is in a non-empty namespace
prefix = self._current_context[name[0]]
if prefix:
# If it is not the default namespace, prepend the prefix
return prefix + ":" + name[1]
# Return the unqualified name
return name[1]
# ContentHandler methods
def startDocument(self):
self._write(u'<?xml version="1.0" encoding="%s"?>\n' %
self._encoding)
def endDocument(self):
self._flush()
def startPrefixMapping(self, prefix, uri):
self._ns_contexts.append(self._current_context.copy())
self._current_context[uri] = prefix
self._undeclared_ns_maps.append((prefix, uri))
def endPrefixMapping(self, prefix):
self._current_context = self._ns_contexts[-1]
del self._ns_contexts[-1]
def startElement(self, name, attrs):
self._write(u'<' + name)
for (name, value) in attrs.items():
self._write(u' %s=%s' % (name, quoteattr(value)))
self._write(u'>')
def endElement(self, name):
self._write(u'</%s>' % name)
def startElementNS(self, name, qname, attrs):
self._write(u'<' + self._qname(name))
for prefix, uri in self._undeclared_ns_maps:
if prefix:
self._write(u' xmlns:%s="%s"' % (prefix, uri))
else:
self._write(u' xmlns="%s"' % uri)
self._undeclared_ns_maps = []
for (name, value) in attrs.items():
self._write(u' %s=%s' % (self._qname(name), quoteattr(value)))
self._write(u'>')
def endElementNS(self, name, qname):
self._write(u'</%s>' % self._qname(name))
def characters(self, content):
if not isinstance(content, unicode):
content = unicode(content, self._encoding)
self._write(escape(content))
def ignorableWhitespace(self, content):
if not isinstance(content, unicode):
content = unicode(content, self._encoding)
self._write(content)
def processingInstruction(self, target, data):
self._write(u'<?%s %s?>' % (target, data))
class XMLFilterBase(xmlreader.XMLReader):
"""This class is designed to sit between an XMLReader and the
client application's event handlers. By default, it does nothing
but pass requests up to the reader and events on to the handlers
unmodified, but subclasses can override specific methods to modify
the event stream or the configuration requests as they pass
through."""
def __init__(self, parent = None):
xmlreader.XMLReader.__init__(self)
self._parent = parent
# ErrorHandler methods
def error(self, exception):
self._err_handler.error(exception)
def fatalError(self, exception):
self._err_handler.fatalError(exception)
def warning(self, exception):
self._err_handler.warning(exception)
# ContentHandler methods
def setDocumentLocator(self, locator):
self._cont_handler.setDocumentLocator(locator)
def startDocument(self):
self._cont_handler.startDocument()
def endDocument(self):
self._cont_handler.endDocument()
def startPrefixMapping(self, prefix, uri):
self._cont_handler.startPrefixMapping(prefix, uri)
def endPrefixMapping(self, prefix):
self._cont_handler.endPrefixMapping(prefix)
def startElement(self, name, attrs):
self._cont_handler.startElement(name, attrs)
def endElement(self, name):
self._cont_handler.endElement(name)
def startElementNS(self, name, qname, attrs):
self._cont_handler.startElementNS(name, qname, attrs)
def endElementNS(self, name, qname):
self._cont_handler.endElementNS(name, qname)
def characters(self, content):
self._cont_handler.characters(content)
def ignorableWhitespace(self, chars):
self._cont_handler.ignorableWhitespace(chars)
def processingInstruction(self, target, data):
self._cont_handler.processingInstruction(target, data)
def skippedEntity(self, name):
self._cont_handler.skippedEntity(name)
# DTDHandler methods
def notationDecl(self, name, publicId, systemId):
self._dtd_handler.notationDecl(name, publicId, systemId)
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
self._dtd_handler.unparsedEntityDecl(name, publicId, systemId, ndata)
# EntityResolver methods
def resolveEntity(self, publicId, systemId):
return self._ent_handler.resolveEntity(publicId, systemId)
# XMLReader methods
def parse(self, source):
self._parent.setContentHandler(self)
self._parent.setErrorHandler(self)
self._parent.setEntityResolver(self)
self._parent.setDTDHandler(self)
self._parent.parse(source)
def setLocale(self, locale):
self._parent.setLocale(locale)
def getFeature(self, name):
return self._parent.getFeature(name)
def setFeature(self, name, state):
self._parent.setFeature(name, state)
def getProperty(self, name):
return self._parent.getProperty(name)
def setProperty(self, name, value):
self._parent.setProperty(name, value)
# XMLFilter methods
def getParent(self):
return self._parent
def setParent(self, parent):
self._parent = parent
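# Sketch: a minimal filter built on XMLFilterBase above. Only the
# methods being changed are overridden; everything else is forwarded by
# the base class. The class names here are illustrative.
import io
import xml.sax

class _LowerCaseFilter(XMLFilterBase):
    def startElement(self, name, attrs):
        XMLFilterBase.startElement(self, name.lower(), attrs)
    def endElement(self, name):
        XMLFilterBase.endElement(self, name.lower())

class _Printer(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        print "start", name

_f = _LowerCaseFilter(xml.sax.make_parser())
_f.setContentHandler(_Printer())
_f.parse(io.BytesIO(b"<DOC><Item/></DOC>"))   # prints: start doc, start item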
# --- Utility functions
def prepare_input_source(source, base = ""):
"""This function takes an InputSource and an optional base URL and
returns a fully resolved InputSource object ready for reading."""
if type(source) in _StringTypes:
source = xmlreader.InputSource(source)
elif hasattr(source, "read"):
f = source
source = xmlreader.InputSource()
source.setByteStream(f)
if hasattr(f, "name"):
source.setSystemId(f.name)
if source.getByteStream() is None:
try:
sysid = source.getSystemId()
basehead = os.path.dirname(os.path.normpath(base))
encoding = sys.getfilesystemencoding()
if isinstance(sysid, unicode):
if not isinstance(basehead, unicode):
try:
basehead = basehead.decode(encoding)
except UnicodeDecodeError:
sysid = sysid.encode(encoding)
else:
if isinstance(basehead, unicode):
try:
sysid = sysid.decode(encoding)
except UnicodeDecodeError:
basehead = basehead.encode(encoding)
sysidfilename = os.path.join(basehead, sysid)
isfile = os.path.isfile(sysidfilename)
except UnicodeError:
isfile = False
if isfile:
source.setSystemId(sysidfilename)
f = open(sysidfilename, "rb")
else:
source.setSystemId(urlparse.urljoin(base, source.getSystemId()))
f = urllib.urlopen(source.getSystemId())
source.setByteStream(f)
return source
"""
This module contains the core classes of version 2.0 of SAX for Python.
This file provides only default classes with absolutely minimum
functionality, from which drivers and applications can be subclassed.
Many of these classes are empty and are included only as documentation
of the interfaces.
"""
version = '2.0beta'
#============================================================================
#
# HANDLER INTERFACES
#
#============================================================================
# ===== ERRORHANDLER =====
class ErrorHandler:
"""Basic interface for SAX error handlers.
If you create an object that implements this interface, then
register the object with your XMLReader, the parser will call the
methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly)
recoverable errors, and unrecoverable errors. All methods take a
SAXParseException as the only parameter."""
def error(self, exception):
"Handle a recoverable error."
raise exception
def fatalError(self, exception):
"Handle a non-recoverable error."
raise exception
def warning(self, exception):
"Handle a warning."
print exception
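# Sketch: applications often replace the raising defaults above with a
# collector so that recoverable problems do not abort processing. The
# class name is illustrative.
import xml.sax

class _CollectingErrorHandler:
    def __init__(self):
        self.problems = []
    def error(self, exception):
        self.problems.append(("error", str(exception)))
    def fatalError(self, exception):
        # Fatal errors still have to stop the parse; record first.
        self.problems.append(("fatal", str(exception)))
        raise exception
    def warning(self, exception):
        self.problems.append(("warning", str(exception)))

_rdr = xml.sax.make_parser()
_rdr.setErrorHandler(_CollectingErrorHandler())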
# ===== CONTENTHANDLER =====
class ContentHandler:
"""Interface for receiving logical document content events.
This is the main callback interface in SAX, and the one most
important to applications. The order of events in this interface
mirrors the order of the information in the document."""
def __init__(self):
self._locator = None
def setDocumentLocator(self, locator):
"""Called by the parser to give the application a locator for
locating the origin of document events.
SAX parsers are strongly encouraged (though not absolutely
required) to supply a locator: if a parser does so, it must supply
the locator to the application by invoking this method before
invoking any of the other methods in the DocumentHandler
interface.
The locator allows the application to determine the end
position of any document-related event, even if the parser is
not reporting an error. Typically, the application will use
this information for reporting its own errors (such as
character content that does not match an application's
business rules). The information returned by the locator is
probably not sufficient for use with a search engine.
Note that the locator will return correct information only
during the invocation of the events in this interface. The
application should not attempt to use it at any other time."""
self._locator = locator
def startDocument(self):
"""Receive notification of the beginning of a document.
The SAX parser will invoke this method only once, before any
other methods in this interface or in DTDHandler (except for
setDocumentLocator)."""
def endDocument(self):
"""Receive notification of the end of a document.
The SAX parser will invoke this method only once, and it will
be the last method invoked during the parse. The parser shall
not invoke this method until it has either abandoned parsing
(because of an unrecoverable error) or reached the end of
input."""
def startPrefixMapping(self, prefix, uri):
"""Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal
Namespace processing: the SAX XML reader will automatically
replace prefixes for element and attribute names when the
http://xml.org/sax/features/namespaces feature is true (the
default).
There are cases, however, when applications need to use
prefixes in character data or in attribute values, where they
cannot safely be expanded automatically; the
start/endPrefixMapping event supplies the information to the
application to expand prefixes in those contexts itself, if
necessary.
Note that start/endPrefixMapping events are not guaranteed to
be properly nested relative to each other: all
startPrefixMapping events will occur before the corresponding
startElement event, and all endPrefixMapping events will occur
after the corresponding endElement event, but their order is
not guaranteed."""
def endPrefixMapping(self, prefix):
"""End the scope of a prefix-URI mapping.
See startPrefixMapping for details. This event will always
occur after the corresponding endElement event, but the order
of endPrefixMapping events is not otherwise guaranteed."""
def startElement(self, name, attrs):
"""Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the
element type as a string and the attrs parameter holds an
instance of the Attributes class containing the attributes of
the element."""
def endElement(self, name):
"""Signals the end of an element in non-namespace mode.
The name parameter contains the name of the element type, just
as with the startElement event."""
def startElementNS(self, name, qname, attrs):
"""Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a
(uri, localname) tuple, the qname parameter the raw XML 1.0
name used in the source document, and the attrs parameter
holds an instance of the Attributes class containing the
attributes of the element.
The uri part of the name tuple is None for elements which have
no namespace."""
def endElementNS(self, name, qname):
"""Signals the end of an element in namespace mode.
The name parameter contains the name of the element type, just
as with the startElementNS event."""
def characters(self, content):
"""Receive notification of character data.
The Parser will call this method to report each chunk of
character data. SAX parsers may return all contiguous
character data in a single chunk, or they may split it into
several chunks; however, all of the characters in any single
event must come from the same external entity so that the
Locator provides useful information."""
def ignorableWhitespace(self, whitespace):
"""Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk
of ignorable whitespace (see the W3C XML 1.0 recommendation,
section 2.10): non-validating parsers may also use this method
if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single
chunk, or they may split it into several chunks; however, all
of the characters in any single event must come from the same
external entity, so that the Locator provides useful
information."""
def processingInstruction(self, target, data):
"""Receive notification of a processing instruction.
The Parser will invoke this method once for each processing
instruction found: note that processing instructions may occur
before or after the main document element.
A SAX parser should never report an XML declaration (XML 1.0,
section 2.8) or a text declaration (XML 1.0, section 4.3.1)
using this method."""
def skippedEntity(self, name):
"""Receive notification of a skipped entity.
The Parser will invoke this method once for each entity
skipped. Non-validating processors may skip entities if they
have not seen the declarations (because, for example, the
entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
http://xml.org/sax/features/external-general-entities and the
http://xml.org/sax/features/external-parameter-entities
properties."""
# ===== DTDHandler =====
class DTDHandler:
"""Handle DTD events.
This interface specifies only those DTD events required for basic
parsing (unparsed entities and attributes)."""
def notationDecl(self, name, publicId, systemId):
"Handle a notation declaration event."
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
"Handle an unparsed entity declaration event."
# ===== ENTITYRESOLVER =====
class EntityResolver:
"""Basic interface for resolving entities. If you create an object
implementing this interface, then register the object with your
Parser, the parser will call the method in your object to
resolve all external entities. Note that DefaultHandler implements
this interface with the default behaviour."""
def resolveEntity(self, publicId, systemId):
"""Resolve the system identifier of an entity and return either
the system identifier to read from as a string, or an InputSource
to read from."""
return systemId
#============================================================================
#
# CORE FEATURES
#
#============================================================================
feature_namespaces = "http://xml.org/sax/features/namespaces"
# true: Perform Namespace processing (default).
# false: Optionally do not perform Namespace processing
# (implies namespace-prefixes).
# access: (parsing) read-only; (not parsing) read/write
feature_namespace_prefixes = "http://xml.org/sax/features/namespace-prefixes"
# true: Report the original prefixed names and attributes used for Namespace
# declarations.
# false: Do not report attributes used for Namespace declarations, and
# optionally do not report original prefixed names (default).
# access: (parsing) read-only; (not parsing) read/write
feature_string_interning = "http://xml.org/sax/features/string-interning"
# true: All element names, prefixes, attribute names, Namespace URIs, and
# local names are interned using the built-in intern function.
# false: Names are not necessarily interned, although they may be (default).
# access: (parsing) read-only; (not parsing) read/write
feature_validation = "http://xml.org/sax/features/validation"
# true: Report all validation errors (implies external-general-entities and
# external-parameter-entities).
# false: Do not report validation errors.
# access: (parsing) read-only; (not parsing) read/write
feature_external_ges = "http://xml.org/sax/features/external-general-entities"
# true: Include all external general (text) entities.
# false: Do not include external general entities.
# access: (parsing) read-only; (not parsing) read/write
feature_external_pes = "http://xml.org/sax/features/external-parameter-entities"
# true: Include all external parameter entities, including the external
# DTD subset.
# false: Do not include any external parameter entities, even the external
# DTD subset.
# access: (parsing) read-only; (not parsing) read/write
all_features = [feature_namespaces,
feature_namespace_prefixes,
feature_string_interning,
feature_validation,
feature_external_ges,
feature_external_pes]
#============================================================================
#
# CORE PROPERTIES
#
#============================================================================
property_lexical_handler = "http://xml.org/sax/properties/lexical-handler"
# data type: xml.sax.sax2lib.LexicalHandler
# description: An optional extension handler for lexical events like comments.
# access: read/write
property_declaration_handler = "http://xml.org/sax/properties/declaration-handler"
# data type: xml.sax.sax2lib.DeclHandler
# description: An optional extension handler for DTD-related events other
# than notations and unparsed entities.
# access: read/write
property_dom_node = "http://xml.org/sax/properties/dom-node"
# data type: org.w3c.dom.Node
# description: When parsing, the current DOM node being visited if this is
# a DOM iterator; when not parsing, the root DOM node for
# iteration.
# access: (parsing) read-only; (not parsing) read/write
property_xml_string = "http://xml.org/sax/properties/xml-string"
# data type: String
# description: The literal string of characters that was the source for
# the current event.
# access: read-only
property_encoding = "http://www.python.org/sax/properties/encoding"
# data type: String
# description: The name of the encoding to assume for input data.
# access: write: set the encoding, e.g. established by a higher-level
# protocol. May change during parsing (e.g. after
# processing a META tag)
# read: return the current encoding (possibly established through
# auto-detection).
# initial value: UTF-8
#
property_interning_dict = "http://www.python.org/sax/properties/interning-dict"
# data type: Dictionary
# description: The dictionary used to intern common strings in the document
# access: write: Request that the parser uses a specific dictionary, to
# allow interning across different documents
# read: return the current interning dictionary, or None
#
all_properties = [property_lexical_handler,
property_dom_node,
property_declaration_handler,
property_xml_string,
property_encoding,
property_interning_dict]
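# Sketch: the feature and property URIs above are used through the
# XMLReader get/set methods. The expat driver supports only a subset,
# and features may only be set before parsing begins.
import xml.sax

_parser = xml.sax.make_parser()
print _parser.getFeature(feature_namespaces)   # 0: namespace mode off by default
_parser.setFeature(feature_namespaces, 1)      # switch to namespace mode
try:
    _parser.setFeature(feature_validation, 1)
except xml.sax.SAXNotSupportedException, exc:
    print exc                                  # expat does not support validation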
"""
SAX driver for the pyexpat C module. This driver works with
pyexpat.__version__ == '2.22'.
"""
version = "0.20"
from xml.sax._exceptions import *
from xml.sax.handler import feature_validation, feature_namespaces
from xml.sax.handler import feature_namespace_prefixes
from xml.sax.handler import feature_external_ges, feature_external_pes
from xml.sax.handler import feature_string_interning
from xml.sax.handler import property_xml_string, property_interning_dict
# xml.parsers.expat does not raise ImportError in Jython
import sys
if sys.platform[:4] == "java":
raise SAXReaderNotAvailable("expat not available in Java", None)
del sys
try:
from xml.parsers import expat
except ImportError:
raise SAXReaderNotAvailable("expat not supported", None)
else:
if not hasattr(expat, "ParserCreate"):
raise SAXReaderNotAvailable("expat not supported", None)
from xml.sax import xmlreader, saxutils, handler
AttributesImpl = xmlreader.AttributesImpl
AttributesNSImpl = xmlreader.AttributesNSImpl
# If we're using a sufficiently recent version of Python, we can use
# weak references to avoid cycles between the parser and content
# handler, otherwise we'll just have to pretend.
try:
import _weakref
except ImportError:
def _mkproxy(o):
return o
else:
import weakref
_mkproxy = weakref.proxy
del weakref, _weakref
class _ClosedParser:
pass
# --- ExpatLocator
class ExpatLocator(xmlreader.Locator):
"""Locator for use with the ExpatParser class.
This uses a weak reference to the parser object to avoid creating
a circular reference between the parser and the content handler.
"""
def __init__(self, parser):
self._ref = _mkproxy(parser)
def getColumnNumber(self):
parser = self._ref
if parser._parser is None:
return None
return parser._parser.ErrorColumnNumber
def getLineNumber(self):
parser = self._ref
if parser._parser is None:
return 1
return parser._parser.ErrorLineNumber
def getPublicId(self):
parser = self._ref
if parser is None:
return None
return parser._source.getPublicId()
def getSystemId(self):
parser = self._ref
if parser is None:
return None
return parser._source.getSystemId()
# --- ExpatParser
class ExpatParser(xmlreader.IncrementalParser, xmlreader.Locator):
"""SAX driver for the pyexpat C module."""
def __init__(self, namespaceHandling=0, bufsize=2**16-20):
xmlreader.IncrementalParser.__init__(self, bufsize)
self._source = xmlreader.InputSource()
self._parser = None
self._namespaces = namespaceHandling
self._lex_handler_prop = None
self._parsing = 0
self._entity_stack = []
self._external_ges = 1
self._interning = None
# XMLReader methods
def parse(self, source):
"Parse an XML document from a URL or an InputSource."
source = saxutils.prepare_input_source(source)
self._source = source
try:
self.reset()
self._cont_handler.setDocumentLocator(ExpatLocator(self))
xmlreader.IncrementalParser.parse(self, source)
except:
# bpo-30264: Close the source on error to not leak resources:
# xml.sax.parse() doesn't give access to the underlying parser
# to the caller
self._close_source()
raise
def prepareParser(self, source):
if source.getSystemId() is not None:
base = source.getSystemId()
if isinstance(base, unicode):
base = base.encode('utf-8')
self._parser.SetBase(base)
# Redefined setContentHandler to allow changing handlers during parsing
def setContentHandler(self, handler):
xmlreader.IncrementalParser.setContentHandler(self, handler)
if self._parsing:
self._reset_cont_handler()
def getFeature(self, name):
if name == feature_namespaces:
return self._namespaces
elif name == feature_string_interning:
return self._interning is not None
elif name in (feature_validation, feature_external_pes,
feature_namespace_prefixes):
return 0
elif name == feature_external_ges:
return self._external_ges
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def setFeature(self, name, state):
if self._parsing:
raise SAXNotSupportedException("Cannot set features while parsing")
if name == feature_namespaces:
self._namespaces = state
elif name == feature_external_ges:
self._external_ges = state
elif name == feature_string_interning:
if state:
if self._interning is None:
self._interning = {}
else:
self._interning = None
elif name == feature_validation:
if state:
raise SAXNotSupportedException(
"expat does not support validation")
elif name == feature_external_pes:
if state:
raise SAXNotSupportedException(
"expat does not read external parameter entities")
elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
else:
raise SAXNotRecognizedException(
"Feature '%s' not recognized" % name)
def getProperty(self, name):
if name == handler.property_lexical_handler:
return self._lex_handler_prop
elif name == property_interning_dict:
return self._interning
elif name == property_xml_string:
if self._parser:
if hasattr(self._parser, "GetInputContext"):
return self._parser.GetInputContext()
else:
raise SAXNotRecognizedException(
"This version of expat does not support getting"
" the XML string")
else:
raise SAXNotSupportedException(
"XML string cannot be returned when not parsing")
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
def setProperty(self, name, value):
if name == handler.property_lexical_handler:
self._lex_handler_prop = value
if self._parsing:
self._reset_lex_handler_prop()
elif name == property_interning_dict:
self._interning = value
elif name == property_xml_string:
raise SAXNotSupportedException("Property '%s' cannot be set" %
name)
else:
raise SAXNotRecognizedException("Property '%s' not recognized" %
name)
# IncrementalParser methods
def feed(self, data, isFinal = 0):
if not self._parsing:
self.reset()
self._parsing = 1
self._cont_handler.startDocument()
try:
# The isFinal parameter is internal to the expat reader.
# If it is set to true, expat will check validity of the entire
# document. When feeding chunks, they are not normally final -
# except when invoked from close.
self._parser.Parse(data, isFinal)
except expat.error, e:
exc = SAXParseException(expat.ErrorString(e.code), e, self)
# FIXME: when to invoke error()?
self._err_handler.fatalError(exc)
def _close_source(self):
source = self._source
try:
file = source.getCharacterStream()
if file is not None:
file.close()
finally:
file = source.getByteStream()
if file is not None:
file.close()
def close(self):
if (self._entity_stack or self._parser is None or
isinstance(self._parser, _ClosedParser)):
# If we are completing an external entity, do nothing here
return
try:
self.feed("", isFinal = 1)
self._cont_handler.endDocument()
self._parsing = 0
# break cycle created by expat handlers pointing to our methods
self._parser = None
finally:
self._parsing = 0
if self._parser is not None:
# Keep ErrorColumnNumber and ErrorLineNumber after closing.
parser = _ClosedParser()
parser.ErrorColumnNumber = self._parser.ErrorColumnNumber
parser.ErrorLineNumber = self._parser.ErrorLineNumber
self._parser = parser
self._close_source()
def _reset_cont_handler(self):
self._parser.ProcessingInstructionHandler = \
self._cont_handler.processingInstruction
self._parser.CharacterDataHandler = self._cont_handler.characters
def _reset_lex_handler_prop(self):
lex = self._lex_handler_prop
parser = self._parser
if lex is None:
parser.CommentHandler = None
parser.StartCdataSectionHandler = None
parser.EndCdataSectionHandler = None
parser.StartDoctypeDeclHandler = None
parser.EndDoctypeDeclHandler = None
else:
parser.CommentHandler = lex.comment
parser.StartCdataSectionHandler = lex.startCDATA
parser.EndCdataSectionHandler = lex.endCDATA
parser.StartDoctypeDeclHandler = self.start_doctype_decl
parser.EndDoctypeDeclHandler = lex.endDTD
def reset(self):
if self._namespaces:
self._parser = expat.ParserCreate(self._source.getEncoding(), " ",
intern=self._interning)
self._parser.namespace_prefixes = 1
self._parser.StartElementHandler = self.start_element_ns
self._parser.EndElementHandler = self.end_element_ns
else:
self._parser = expat.ParserCreate(self._source.getEncoding(),
intern = self._interning)
self._parser.StartElementHandler = self.start_element
self._parser.EndElementHandler = self.end_element
self._reset_cont_handler()
self._parser.UnparsedEntityDeclHandler = self.unparsed_entity_decl
self._parser.NotationDeclHandler = self.notation_decl
self._parser.StartNamespaceDeclHandler = self.start_namespace_decl
self._parser.EndNamespaceDeclHandler = self.end_namespace_decl
self._decl_handler_prop = None
if self._lex_handler_prop:
self._reset_lex_handler_prop()
# self._parser.DefaultHandler =
# self._parser.DefaultHandlerExpand =
# self._parser.NotStandaloneHandler =
self._parser.ExternalEntityRefHandler = self.external_entity_ref
try:
self._parser.SkippedEntityHandler = self.skipped_entity_handler
except AttributeError:
# This pyexpat does not support SkippedEntity
pass
self._parser.SetParamEntityParsing(
expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
self._parsing = 0
self._entity_stack = []
# Locator methods
def getColumnNumber(self):
if self._parser is None:
return None
return self._parser.ErrorColumnNumber
def getLineNumber(self):
if self._parser is None:
return 1
return self._parser.ErrorLineNumber
def getPublicId(self):
return self._source.getPublicId()
def getSystemId(self):
return self._source.getSystemId()
# event handlers
def start_element(self, name, attrs):
self._cont_handler.startElement(name, AttributesImpl(attrs))
def end_element(self, name):
self._cont_handler.endElement(name)
def start_element_ns(self, name, attrs):
pair = name.split()
if len(pair) == 1:
# no namespace
pair = (None, name)
elif len(pair) == 3:
pair = pair[0], pair[1]
else:
# default namespace
pair = tuple(pair)
newattrs = {}
qnames = {}
for (aname, value) in attrs.items():
parts = aname.split()
length = len(parts)
if length == 1:
# no namespace
qname = aname
apair = (None, aname)
elif length == 3:
qname = "%s:%s" % (parts[2], parts[1])
apair = parts[0], parts[1]
else:
# default namespace
qname = parts[1]
apair = tuple(parts)
newattrs[apair] = value
qnames[apair] = qname
self._cont_handler.startElementNS(pair, None,
AttributesNSImpl(newattrs, qnames))
def end_element_ns(self, name):
pair = name.split()
if len(pair) == 1:
pair = (None, name)
elif len(pair) == 3:
pair = pair[0], pair[1]
else:
pair = tuple(pair)
self._cont_handler.endElementNS(pair, None)
# this is not used (call directly to ContentHandler)
def processing_instruction(self, target, data):
self._cont_handler.processingInstruction(target, data)
# this is not used (call directly to ContentHandler)
def character_data(self, data):
self._cont_handler.characters(data)
def start_namespace_decl(self, prefix, uri):
self._cont_handler.startPrefixMapping(prefix, uri)
def end_namespace_decl(self, prefix):
self._cont_handler.endPrefixMapping(prefix)
def start_doctype_decl(self, name, sysid, pubid, has_internal_subset):
self._lex_handler_prop.startDTD(name, pubid, sysid)
def unparsed_entity_decl(self, name, base, sysid, pubid, notation_name):
self._dtd_handler.unparsedEntityDecl(name, pubid, sysid, notation_name)
def notation_decl(self, name, base, sysid, pubid):
self._dtd_handler.notationDecl(name, pubid, sysid)
def external_entity_ref(self, context, base, sysid, pubid):
if not self._external_ges:
return 1
source = self._ent_handler.resolveEntity(pubid, sysid)
source = saxutils.prepare_input_source(source,
self._source.getSystemId() or
"")
self._entity_stack.append((self._parser, self._source))
self._parser = self._parser.ExternalEntityParserCreate(context)
self._source = source
try:
xmlreader.IncrementalParser.parse(self, source)
except:
return 0 # FIXME: save error info here?
(self._parser, self._source) = self._entity_stack[-1]
del self._entity_stack[-1]
return 1
def skipped_entity_handler(self, name, is_pe):
if is_pe:
# The SAX spec requires to report skipped PEs with a '%'
name = '%'+name
self._cont_handler.skippedEntity(name)
# ---
def create_parser(*args, **kwargs):
return ExpatParser(*args, **kwargs)
# ---
if __name__ == "__main__":
import xml.sax.saxutils
p = create_parser()
p.setContentHandler(xml.sax.saxutils.XMLGenerator())
p.setErrorHandler(xml.sax.ErrorHandler())
p.parse("http://www.ibiblio.org/xml/examples/shakespeare/hamlet.xml")
"""Python interfaces to XML parsers.
This package contains one module:
expat -- Python wrapper for James Clark's Expat parser, with namespace
support.
"""
"""Interface to the Expat non-validating XML parser."""
__version__ = '$Revision$'
from pyexpat import *
# $Id: __init__.py 3375 2008-02-13 08:05:08Z fredrik $
# elementtree package
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
#
# ElementTree
# $Id: SimpleXMLTreeBuilder.py 3225 2007-08-27 21:32:08Z fredrik $
#
# A simple XML tree builder, based on Python's xmllib
#
# Note that due to bugs in xmllib, this builder does not fully support
# namespaces (unqualified attributes are put in the default namespace,
# instead of being left as is). Run this module as a script to find
# out if this affects your Python version.
#
# history:
# 2001-10-20 fl created
# 2002-05-01 fl added namespace support for xmllib
# 2002-08-17 fl added xmllib sanity test
#
# Copyright (c) 1999-2004 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2007 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
##
# Tools to build element trees from XML files, using <b>xmllib</b>.
# This module can be used instead of the standard tree builder, for
# Python versions where "expat" is not available (such as 1.5.2).
# <p>
# Note that due to bugs in <b>xmllib</b>, the namespace support is
# not reliable (you can run the module as a script to find out exactly
# how unreliable it is on your Python version).
##
import xmllib, string
import ElementTree
##
# ElementTree builder for XML source data.
#
# @see elementtree.ElementTree
class TreeBuilder(xmllib.XMLParser):
def __init__(self, html=0, target=None, encoding=None):
self.__builder = ElementTree.TreeBuilder()
if html:
import htmlentitydefs
self.entitydefs.update(htmlentitydefs.entitydefs)
xmllib.XMLParser.__init__(self)
##
# Feeds data to the parser.
#
# @param data Encoded data.
def feed(self, data):
xmllib.XMLParser.feed(self, data)
##
# Finishes feeding data to the parser.
#
# @return An element structure.
# @defreturn Element
def close(self):
xmllib.XMLParser.close(self)
return self.__builder.close()
def handle_data(self, data):
self.__builder.data(data)
handle_cdata = handle_data
def unknown_starttag(self, tag, attrs):
attrib = {}
for key, value in attrs.items():
attrib[fixname(key)] = value
self.__builder.start(fixname(tag), attrib)
def unknown_endtag(self, tag):
self.__builder.end(fixname(tag))
def fixname(name, split=string.split):
# xmllib in 2.0 and later provides limited (and slightly broken)
# support for XML namespaces.
if " " not in name:
return name
return "{%s}%s" % tuple(split(name, " ", 1))
if __name__ == "__main__":
import sys
# sanity check: look for known namespace bugs in xmllib
p = TreeBuilder()
text = """\
<root xmlns='default'>
<tag attribute='value' />
</root>
"""
p.feed(text)
tree = p.close()
status = []
# check for bugs in the xmllib implementation
tag = tree.find("{default}tag")
if tag is None:
status.append("namespaces not supported")
if tag is not None and tag.get("{default}attribute"):
status.append("default namespace applied to unqualified attribute")
# report bugs
if status:
print "xmllib doesn't work properly in this Python version:"
for bug in status:
print "-", bug
else:
print "congratulations; no problems found in xmllib"
#
# ElementTree
# $Id: ElementTree.py 3440 2008-07-18 14:45:01Z fredrik $
#
# light-weight XML support for Python 2.3 and later.
#
# history (since 1.2.6):
# 2005-11-12 fl added tostringlist/fromstringlist helpers
# 2006-07-05 fl merged in selected changes from the 1.3 sandbox
# 2006-07-05 fl removed support for 2.1 and earlier
# 2007-06-21 fl added deprecation/future warnings
# 2007-08-25 fl added doctype hook, added parser version attribute etc
# 2007-08-26 fl added new serializer code (better namespace handling, etc)
# 2007-08-27 fl warn for broken /tag searches on tree level
# 2007-09-02 fl added html/text methods to serializer (experimental)
# 2007-09-05 fl added method argument to tostring/tostringlist
# 2007-09-06 fl improved error handling
# 2007-09-13 fl added itertext, iterfind; assorted cleanups
# 2007-12-15 fl added C14N hooks, copy method (experimental)
#
# Copyright (c) 1999-2008 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
__all__ = [
# public symbols
"Comment",
"dump",
"Element", "ElementTree",
"fromstring", "fromstringlist",
"iselement", "iterparse",
"parse", "ParseError",
"PI", "ProcessingInstruction",
"QName",
"SubElement",
"tostring", "tostringlist",
"TreeBuilder",
"VERSION",
"XML",
"XMLParser", "XMLTreeBuilder",
]
VERSION = "1.3.0"
##
# The <b>Element</b> type is a flexible container object, designed to
# store hierarchical data structures in memory. The type can be
# described as a cross between a list and a dictionary.
# <p>
# Each element has a number of properties associated with it:
# <ul>
# <li>a <i>tag</i>. This is a string identifying what kind of data
# this element represents (the element type, in other words).</li>
# <li>a number of <i>attributes</i>, stored in a Python dictionary.</li>
# <li>a <i>text</i> string.</li>
# <li>an optional <i>tail</i> string.</li>
# <li>a number of <i>child elements</i>, stored in a Python sequence</li>
# </ul>
#
# To create an element instance, use the {@link #Element} constructor
# or the {@link #SubElement} factory function.
# <p>
# The {@link #ElementTree} class can be used to wrap an element
# structure, and convert it from and to XML.
##
import sys
import re
import warnings
class _SimpleElementPath(object):
# emulate pre-1.2 find/findtext/findall behaviour
def find(self, element, tag, namespaces=None):
for elem in element:
if elem.tag == tag:
return elem
return None
def findtext(self, element, tag, default=None, namespaces=None):
elem = self.find(element, tag)
if elem is None:
return default
return elem.text or ""
def iterfind(self, element, tag, namespaces=None):
if tag[:3] == ".//":
for elem in element.iter(tag[3:]):
yield elem
for elem in element:
if elem.tag == tag:
yield elem
def findall(self, element, tag, namespaces=None):
return list(self.iterfind(element, tag, namespaces))
try:
from . import ElementPath
except ImportError:
ElementPath = _SimpleElementPath()
##
# Parser error. This is a subclass of <b>SyntaxError</b>.
# <p>
# In addition to the exception value, an exception instance contains a
# specific exception code in the <b>code</b> attribute, and the line and
# column of the error in the <b>position</b> attribute.
class ParseError(SyntaxError):
pass
# --------------------------------------------------------------------
##
# Checks if an object appears to be a valid element object.
#
# @param element An element instance.
# @return A true value if this is an element object.
# @defreturn flag
def iselement(element):
# FIXME: not sure about this; might be a better idea to look
# for tag/attrib/text attributes
return isinstance(element, Element) or hasattr(element, "tag")
##
# Element class. This class defines the Element interface, and
# provides a reference implementation of this interface.
# <p>
# The element name, attribute names, and attribute values can be
# either ASCII strings (ordinary Python strings containing only 7-bit
# ASCII characters) or Unicode strings.
#
# @param tag The element name.
# @param attrib An optional dictionary, containing element attributes.
# @param **extra Additional attributes, given as keyword arguments.
# @see Element
# @see SubElement
# @see Comment
# @see ProcessingInstruction
class Element(object):
# <tag attrib>text<child/>...</tag>tail
##
# (Attribute) Element tag.
tag = None
##
# (Attribute) Element attribute dictionary. Where possible, use
# {@link #Element.get},
# {@link #Element.set},
# {@link #Element.keys}, and
# {@link #Element.items} to access
# element attributes.
attrib = None
##
# (Attribute) Text before first subelement. This is either a
# string or the value None. Note that if there was no text, this
# attribute may be either None or an empty string, depending on
# the parser.
text = None
##
# (Attribute) Text after this element's end tag, but before the
# next sibling element's start tag. This is either a string or
# the value None. Note that if there was no text, this attribute
# may be either None or an empty string, depending on the parser.
tail = None # text after end tag, if any
# constructor
def __init__(self, tag, attrib={}, **extra):
attrib = attrib.copy()
attrib.update(extra)
self.tag = tag
self.attrib = attrib
self._children = []
def __repr__(self):
return "<Element %s at 0x%x>" % (repr(self.tag), id(self))
##
# Creates a new element object of the same type as this element.
#
# @param tag Element tag.
# @param attrib Element attributes, given as a dictionary.
# @return A new element instance.
def makeelement(self, tag, attrib):
return self.__class__(tag, attrib)
##
# (Experimental) Copies the current element. This creates a
# shallow copy; subelements will be shared with the original tree.
#
# @return A new element instance.
def copy(self):
elem = self.makeelement(self.tag, self.attrib)
elem.text = self.text
elem.tail = self.tail
elem[:] = self
return elem
##
# Returns the number of subelements. Note that this only counts
# full elements; to check if there's any content in an element, you
# have to check both the length and the <b>text</b> attribute.
#
# @return The number of subelements.
def __len__(self):
return len(self._children)
def __nonzero__(self):
warnings.warn(
"The behavior of this method will change in future versions. "
"Use specific 'len(elem)' or 'elem is not None' test instead.",
FutureWarning, stacklevel=2
)
return len(self._children) != 0 # emulate old behaviour, for now
##
# Returns the given subelement, by index.
#
# @param index What subelement to return.
# @return The given subelement.
# @exception IndexError If the given element does not exist.
def __getitem__(self, index):
return self._children[index]
##
# Replaces the given subelement, by index.
#
# @param index What subelement to replace.
# @param element The new element value.
# @exception IndexError If the given element does not exist.
def __setitem__(self, index, element):
# if isinstance(index, slice):
# for elt in element:
# assert iselement(elt)
# else:
# assert iselement(element)
self._children[index] = element
##
# Deletes the given subelement, by index.
#
# @param index What subelement to delete.
# @exception IndexError If the given element does not exist.
def __delitem__(self, index):
del self._children[index]
##
# Adds a subelement to the end of this element. In document order,
# the new element will appear after the last existing subelement (or
# directly after the text, if it's the first subelement), but before
# the end tag for this element.
#
# @param element The element to add.
def append(self, element):
# assert iselement(element)
self._children.append(element)
##
# Appends subelements from a sequence.
#
# @param elements A sequence object with zero or more elements.
# @since 1.3
def extend(self, elements):
# for element in elements:
# assert iselement(element)
self._children.extend(elements)
##
# Inserts a subelement at the given position in this element.
#
# @param index Where to insert the new subelement.
def insert(self, index, element):
# assert iselement(element)
self._children.insert(index, element)
##
# Removes a matching subelement. Unlike the <b>find</b> methods,
# this method compares elements based on identity, not on tag
# value or contents. To remove subelements by other means, the
# easiest way is often to use a list comprehension to select what
# elements to keep, and use slice assignment to update the parent
# element.
#
# @param element What element to remove.
# @exception ValueError If a matching element could not be found.
def remove(self, element):
# assert iselement(element)
self._children.remove(element)
##
# (Deprecated) Returns all subelements. The elements are returned
# in document order.
#
# @return A list of subelements.
# @defreturn list of Element instances
def getchildren(self):
warnings.warn(
"This method will be removed in future versions. "
"Use 'list(elem)' or iteration over elem instead.",
DeprecationWarning, stacklevel=2
)
return self._children
##
# Finds the first matching subelement, by tag name or path.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return The first matching element, or None if no element was found.
# @defreturn Element or None
def find(self, path, namespaces=None):
return ElementPath.find(self, path, namespaces)
##
# Finds text for the first matching subelement, by tag name or path.
#
# @param path What element to look for.
# @param default What to return if the element was not found.
# @keyparam namespaces Optional namespace prefix map.
# @return The text content of the first matching element, or the
# default value if no element was found. Note that if the element
# is found, but has no text content, this method returns an
# empty string.
# @defreturn string
def findtext(self, path, default=None, namespaces=None):
return ElementPath.findtext(self, path, default, namespaces)
##
# Finds all matching subelements, by tag name or path.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return A list or other sequence containing all matching elements,
# in document order.
# @defreturn list of Element instances
def findall(self, path, namespaces=None):
return ElementPath.findall(self, path, namespaces)
##
# Finds all matching subelements, by tag name or path.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return An iterator or sequence containing all matching elements,
# in document order.
# @defreturn a generated sequence of Element instances
def iterfind(self, path, namespaces=None):
return ElementPath.iterfind(self, path, namespaces)
##
# Resets an element. This function removes all subelements, clears
# all attributes, and sets the <b>text</b> and <b>tail</b> attributes
# to None.
def clear(self):
self.attrib.clear()
self._children = []
self.text = self.tail = None
##
# Gets an element attribute. Equivalent to <b>attrib.get</b>, but
# some implementations may handle this a bit more efficiently.
#
# @param key What attribute to look for.
# @param default What to return if the attribute was not found.
# @return The attribute value, or the default value, if the
# attribute was not found.
# @defreturn string or None
def get(self, key, default=None):
return self.attrib.get(key, default)
##
# Sets an element attribute. Equivalent to <b>attrib[key] = value</b>,
# but some implementations may handle this a bit more efficiently.
#
# @param key What attribute to set.
# @param value The attribute value.
def set(self, key, value):
self.attrib[key] = value
##
# Gets a list of attribute names. The names are returned in an
# arbitrary order (just like for an ordinary Python dictionary).
# Equivalent to <b>attrib.keys()</b>.
#
# @return A list of element attribute names.
# @defreturn list of strings
def keys(self):
return self.attrib.keys()
##
# Gets element attributes, as a sequence. The attributes are
# returned in an arbitrary order. Equivalent to <b>attrib.items()</b>.
#
# @return A list of (name, value) tuples for all attributes.
# @defreturn list of (string, string) tuples
def items(self):
return self.attrib.items()
##
# Creates a tree iterator. The iterator loops over this element
# and all subelements, in document order, and returns all elements
# with a matching tag.
# <p>
# If the tree structure is modified during iteration, new or removed
# elements may or may not be included. To get a stable set, use the
# list() function on the iterator, and loop over the resulting list.
#
# @param tag What tags to look for (default is to return all elements).
# @return An iterator containing all the matching elements.
# @defreturn iterator
def iter(self, tag=None):
if tag == "*":
tag = None
if tag is None or self.tag == tag:
yield self
for e in self._children:
for e in e.iter(tag):
yield e
# compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'elem.iter()' or 'list(elem.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
##
# Creates a text iterator. The iterator loops over this element
# and all subelements, in document order, and returns all inner
# text.
#
# @return An iterator containing all inner text.
# @defreturn iterator
def itertext(self):
tag = self.tag
if not isinstance(tag, basestring) and tag is not None:
return
if self.text:
yield self.text
for e in self:
for s in e.itertext():
yield s
if e.tail:
yield e.tail
# compatibility
_Element = _ElementInterface = Element
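# A brief illustration of the Element interface documented above (names are
# hypothetical):
#
#     elem = Element("root", {"id": "1"})
#     elem.text = "body"
#     elem.set("lang", "en")
#     print elem.get("lang")   # -> en
#     print len(elem)          # -> 0 (no subelements yet)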
##
# Subelement factory. This function creates an element instance, and
# appends it to an existing element.
# <p>
# The element name, attribute names, and attribute values can be
# either 8-bit ASCII strings or Unicode strings.
#
# @param parent The parent element.
# @param tag The subelement name.
# @param attrib An optional dictionary, containing element attributes.
# @param **extra Additional attributes, given as keyword arguments.
# @return An element instance.
# @defreturn Element
def SubElement(parent, tag, attrib={}, **extra):
attrib = attrib.copy()
attrib.update(extra)
element = parent.makeelement(tag, attrib)
parent.append(element)
return element
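# Usage sketch: SubElement creates the child and appends it to the parent in
# one step, so the returned element can be configured in place.
#
#     root = Element("config")
#     opt = SubElement(root, "option", name="verbose")
#     opt.text = "true"
#     print len(root)    # -> 1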
##
# Comment element factory. This factory function creates a special
# element that will be serialized as an XML comment by the standard
# serializer.
# <p>
# The comment string can be either an 8-bit ASCII string or a Unicode
# string.
#
# @param text A string containing the comment string.
# @return An element instance, representing a comment.
# @defreturn Element
def Comment(text=None):
element = Element(Comment)
element.text = text
return element
##
# PI element factory. This factory function creates a special element
# that will be serialized as an XML processing instruction by the standard
# serializer.
#
# @param target A string containing the PI target.
# @param text A string containing the PI contents, if any.
# @return An element instance, representing a PI.
# @defreturn Element
def ProcessingInstruction(target, text=None):
element = Element(ProcessingInstruction)
element.text = target
if text:
element.text = element.text + " " + text
return element
PI = ProcessingInstruction
##
# QName wrapper. This can be used to wrap a QName attribute value, in
# order to get proper namespace handling on output.
#
# @param text A string containing the QName value, in the form {uri}local,
# or, if the tag argument is given, the URI part of a QName.
# @param tag Optional tag. If given, the first argument is interpreted as
# a URI, and this argument is interpreted as a local name.
# @return An opaque object, representing the QName.
class QName(object):
def __init__(self, text_or_uri, tag=None):
if tag:
text_or_uri = "{%s}%s" % (text_or_uri, tag)
self.text = text_or_uri
def __str__(self):
return self.text
def __hash__(self):
return hash(self.text)
def __cmp__(self, other):
if isinstance(other, QName):
return cmp(self.text, other.text)
return cmp(self.text, other)
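# Illustration (hypothetical URI): a QName built from a URI and a local name
# collapses to the universal {uri}local form used throughout this module.
#
#     q = QName("urn:example", "item")
#     print str(q)    # -> {urn:example}item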
# --------------------------------------------------------------------
##
# ElementTree wrapper class. This class represents an entire element
# hierarchy, and adds some extra support for serialization to and from
# standard XML.
#
# @param element Optional root element.
# @keyparam file Optional file handle or file name. If given, the
# tree is initialized with the contents of this XML file.
class ElementTree(object):
def __init__(self, element=None, file=None):
# assert element is None or iselement(element)
self._root = element # first node
if file:
self.parse(file)
##
# Gets the root element for this tree.
#
# @return An element instance.
# @defreturn Element
def getroot(self):
return self._root
##
# Replaces the root element for this tree. This discards the
# current contents of the tree, and replaces it with the given
# element. Use with care.
#
# @param element An element instance.
def _setroot(self, element):
# assert iselement(element)
self._root = element
##
# Loads an external XML document into this element tree.
#
# @param source A file name or file object. If a file object is
# given, it only has to implement a <b>read(n)</b> method.
# @keyparam parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return The document root element.
# @defreturn Element
# @exception ParseError If the parser fails to parse the document.
def parse(self, source, parser=None):
close_source = False
if not hasattr(source, "read"):
source = open(source, "rb")
close_source = True
try:
if not parser:
parser = XMLParser(target=TreeBuilder())
while 1:
data = source.read(65536)
if not data:
break
parser.feed(data)
self._root = parser.close()
return self._root
finally:
if close_source:
source.close()
##
# Creates a tree iterator for the root element. The iterator loops
# over all elements in this tree, in document order.
#
# @param tag What tags to look for (default is to return all elements)
# @return An iterator.
# @defreturn iterator
def iter(self, tag=None):
# assert self._root is not None
return self._root.iter(tag)
# compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'tree.iter()' or 'list(tree.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
##
# Same as getroot().find(path), starting at the root of the
# tree.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return The first matching element, or None if no element was found.
# @defreturn Element or None
def find(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.find(path, namespaces)
##
# Same as getroot().findtext(path), starting at the root of the tree.
#
# @param path What element to look for.
# @param default What to return if the element was not found.
# @keyparam namespaces Optional namespace prefix map.
# @return The text content of the first matching element, or the
# default value if no element was found. Note that if the element
# is found, but has no text content, this method returns an
# empty string.
# @defreturn string
def findtext(self, path, default=None, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findtext(path, default, namespaces)
##
# Same as getroot().findall(path), starting at the root of the tree.
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return A list or iterator containing all matching elements,
# in document order.
# @defreturn list of Element instances
def findall(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findall(path, namespaces)
##
# Finds all matching subelements, by tag name or path.
# Same as getroot().iterfind(path).
#
# @param path What element to look for.
# @keyparam namespaces Optional namespace prefix map.
# @return An iterator or sequence containing all matching elements,
# in document order.
# @defreturn a generated sequence of Element instances
def iterfind(self, path, namespaces=None):
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.iterfind(path, namespaces)
##
# Writes the element tree to a file, as XML.
#
# @def write(file, **options)
# @param file A file name, or a file object opened for writing.
# @param **options Options, given as keyword arguments.
# @keyparam encoding Optional output encoding (default is US-ASCII).
# @keyparam xml_declaration Controls if an XML declaration should
# be added to the file. Use False for never, True for always,
# None for only if not US-ASCII or UTF-8. None is default.
# @keyparam default_namespace Sets the default XML namespace (for "xmlns").
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
def write(self, file_or_filename,
# keyword arguments
encoding=None,
xml_declaration=None,
default_namespace=None,
method=None):
# assert self._root is not None
if not method:
method = "xml"
elif method not in _serialize:
# FIXME: raise an ImportError for c14n if ElementC14N is missing?
raise ValueError("unknown method %r" % method)
if hasattr(file_or_filename, "write"):
file = file_or_filename
else:
file = open(file_or_filename, "wb")
write = file.write
if not encoding:
if method == "c14n":
encoding = "utf-8"
else:
encoding = "us-ascii"
elif xml_declaration or (xml_declaration is None and
encoding not in ("utf-8", "us-ascii")):
if method == "xml":
write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
if method == "text":
_serialize_text(write, self._root, encoding)
else:
qnames, namespaces = _namespaces(
self._root, encoding, default_namespace
)
serialize = _serialize[method]
serialize(write, self._root, encoding, qnames, namespaces)
if file_or_filename is not file:
file.close()
def write_c14n(self, file):
# lxml.etree compatibility. use output method instead
return self.write(file, method="c14n")
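# Sketch of the write() options documented above (output file name is
# hypothetical):
#
#     tree = ElementTree(Element("root"))
#     tree.write("out.xml", encoding="utf-8", xml_declaration=True)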
# --------------------------------------------------------------------
# serialization support
def _namespaces(elem, encoding, default_namespace=None):
# identify namespaces used in this tree
# maps qnames to *encoded* prefix:local names
qnames = {None: None}
# maps URIs to prefixes
namespaces = {}
if default_namespace:
namespaces[default_namespace] = ""
def encode(text):
return text.encode(encoding)
def add_qname(qname):
# calculate serialized qname representation
try:
if qname[:1] == "{":
uri, tag = qname[1:].rsplit("}", 1)
prefix = namespaces.get(uri)
if prefix is None:
prefix = _namespace_map.get(uri)
if prefix is None:
prefix = "ns%d" % len(namespaces)
if prefix != "xml":
namespaces[uri] = prefix
if prefix:
qnames[qname] = encode("%s:%s" % (prefix, tag))
else:
qnames[qname] = encode(tag) # default element
else:
if default_namespace:
# FIXME: can this be handled in XML 1.0?
raise ValueError(
"cannot use non-qualified names with "
"default_namespace option"
)
qnames[qname] = encode(qname)
except TypeError:
_raise_serialization_error(qname)
# populate qname and namespaces table
try:
iterate = elem.iter
except AttributeError:
iterate = elem.getiterator # cET compatibility
for elem in iterate():
tag = elem.tag
if isinstance(tag, QName):
if tag.text not in qnames:
add_qname(tag.text)
elif isinstance(tag, basestring):
if tag not in qnames:
add_qname(tag)
elif tag is not None and tag is not Comment and tag is not PI:
_raise_serialization_error(tag)
for key, value in elem.items():
if isinstance(key, QName):
key = key.text
if key not in qnames:
add_qname(key)
if isinstance(value, QName) and value.text not in qnames:
add_qname(value.text)
text = elem.text
if isinstance(text, QName) and text.text not in qnames:
add_qname(text.text)
return qnames, namespaces
def _serialize_xml(write, elem, encoding, qnames, namespaces):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % _encode(text, encoding))
elif tag is ProcessingInstruction:
write("<?%s?>" % _encode(text, encoding))
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text, encoding))
for e in elem:
_serialize_xml(write, e, encoding, qnames, None)
else:
write("<" + tag)
items = elem.items()
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k.encode(encoding),
_escape_attrib(v, encoding)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib(v, encoding)
write(" %s=\"%s\"" % (qnames[k], v))
if text or len(elem):
write(">")
if text:
write(_escape_cdata(text, encoding))
for e in elem:
_serialize_xml(write, e, encoding, qnames, None)
write("</" + tag + ">")
else:
write(" />")
if elem.tail:
write(_escape_cdata(elem.tail, encoding))
HTML_EMPTY = ("area", "base", "basefont", "br", "col", "frame", "hr",
"img", "input", "isindex", "link", "meta", "param")
try:
HTML_EMPTY = set(HTML_EMPTY)
except NameError:
pass
def _serialize_html(write, elem, encoding, qnames, namespaces):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % _escape_cdata(text, encoding))
elif tag is ProcessingInstruction:
write("<?%s?>" % _escape_cdata(text, encoding))
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text, encoding))
for e in elem:
_serialize_html(write, e, encoding, qnames, None)
else:
write("<" + tag)
items = elem.items()
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k.encode(encoding),
_escape_attrib(v, encoding)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib_html(v, encoding)
# FIXME: handle boolean attributes
write(" %s=\"%s\"" % (qnames[k], v))
write(">")
ltag = tag.lower()
if text:
if ltag == "script" or ltag == "style":
write(_encode(text, encoding))
else:
write(_escape_cdata(text, encoding))
for e in elem:
_serialize_html(write, e, encoding, qnames, None)
if ltag not in HTML_EMPTY:
write("</" + tag + ">")
if elem.tail:
write(_escape_cdata(elem.tail, encoding))
def _serialize_text(write, elem, encoding):
for part in elem.itertext():
write(part.encode(encoding))
if elem.tail:
write(elem.tail.encode(encoding))
_serialize = {
"xml": _serialize_xml,
"html": _serialize_html,
"text": _serialize_text,
# this optional method is imported at the end of the module
# "c14n": _serialize_c14n,
}
##
# Registers a namespace prefix. The registry is global, and any
# existing mapping for either the given prefix or the namespace URI
# will be removed.
#
# @param prefix Namespace prefix.
# @param uri Namespace uri. Tags and attributes in this namespace
# will be serialized with the given prefix, if at all possible.
# @exception ValueError If the prefix is reserved, or is otherwise
# invalid.
def register_namespace(prefix, uri):
if re.match("ns\d+$", prefix):
raise ValueError("Prefix format reserved for internal use")
for k, v in _namespace_map.items():
if k == uri or v == prefix:
del _namespace_map[k]
_namespace_map[uri] = prefix
_namespace_map = {
# "well-known" namespace prefixes
"http://www.w3.org/XML/1998/namespace": "xml",
"http://www.w3.org/1999/xhtml": "html",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
"http://schemas.xmlsoap.org/wsdl/": "wsdl",
# xml schema
"http://www.w3.org/2001/XMLSchema": "xs",
"http://www.w3.org/2001/XMLSchema-instance": "xsi",
# dublin core
"http://purl.org/dc/elements/1.1/": "dc",
}
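# Example (hypothetical prefix and URI): registering a prefix changes how
# tags in that namespace are serialized.
#
#     register_namespace("ex", "urn:example")
#     elem = Element("{urn:example}item")
#     print tostring(elem)    # -> <ex:item xmlns:ex="urn:example" />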
def _raise_serialization_error(text):
raise TypeError(
"cannot serialize %r (type %s)" % (text, type(text).__name__)
)
def _encode(text, encoding):
try:
return text.encode(encoding, "xmlcharrefreplace")
except (TypeError, AttributeError):
_raise_serialization_error(text)
def _escape_cdata(text, encoding):
# escape character data
try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 characters, or so.  assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
return text.encode(encoding, "xmlcharrefreplace")
except (TypeError, AttributeError):
_raise_serialization_error(text)
def _escape_attrib(text, encoding):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&")
if "<" in text:
text = text.replace("<", "<")
if ">" in text:
text = text.replace(">", ">")
if "\"" in text:
text = text.replace("\"", """)
if "\n" in text:
text = text.replace("\n", " ")
return text.encode(encoding, "xmlcharrefreplace")
except (TypeError, AttributeError):
_raise_serialization_error(text)
def _escape_attrib_html(text, encoding):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&")
if ">" in text:
text = text.replace(">", ">")
if "\"" in text:
text = text.replace("\"", """)
return text.encode(encoding, "xmlcharrefreplace")
except (TypeError, AttributeError):
_raise_serialization_error(text)
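# Quick sanity examples for the escape helpers above (us-ascii encoding
# assumed):
#
#     _escape_cdata("a < b & c", "us-ascii")    # -> 'a &lt; b &amp; c'
#     _escape_attrib('say "hi"\n', "us-ascii")  # -> 'say &quot;hi&quot;&#10;'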
# --------------------------------------------------------------------
##
# Generates a string representation of an XML element, including all
# subelements.
#
# @param element An Element instance.
# @keyparam encoding Optional output encoding (default is US-ASCII).
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
# @return An encoded string containing the XML data.
# @defreturn string
def tostring(element, encoding=None, method=None):
class dummy:
pass
data = []
file = dummy()
file.write = data.append
ElementTree(element).write(file, encoding, method=method)
return "".join(data)
##
# Generates a string representation of an XML element, including all
# subelements. The string is returned as a sequence of string fragments.
#
# @param element An Element instance.
# @keyparam encoding Optional output encoding (default is US-ASCII).
# @keyparam method Optional output method ("xml", "html", "text" or
# "c14n"; default is "xml").
# @return A sequence object containing the XML data.
# @defreturn sequence
# @since 1.3
def tostringlist(element, encoding=None, method=None):
class dummy:
pass
data = []
file = dummy()
file.write = data.append
ElementTree(element).write(file, encoding, method=method)
# FIXME: merge small fragments into larger parts
return data
##
# Writes an element tree or element structure to sys.stdout. This
# function should be used for debugging only.
# <p>
# The exact output format is implementation dependent. In this
# version, it's written as an ordinary XML file.
#
# @param elem An element tree or an individual element.
def dump(elem):
# debugging
if not isinstance(elem, ElementTree):
elem = ElementTree(elem)
elem.write(sys.stdout)
tail = elem.getroot().tail
if not tail or tail[-1] != "\n":
sys.stdout.write("\n")
# --------------------------------------------------------------------
# parsing
##
# Parses an XML document into an element tree.
#
# @param source A filename or file object containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return An ElementTree instance
def parse(source, parser=None):
tree = ElementTree()
tree.parse(source, parser)
return tree
##
# Parses an XML document into an element tree incrementally, and reports
# what's going on to the user.
#
# @param source A filename or file object containing XML data.
# @param events A list of events to report back. If omitted, only "end"
# events are reported.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return A (event, elem) iterator.
def iterparse(source, events=None, parser=None):
close_source = False
if not hasattr(source, "read"):
source = open(source, "rb")
close_source = True
try:
if not parser:
parser = XMLParser(target=TreeBuilder())
return _IterParseIterator(source, events, parser, close_source)
except:
if close_source:
source.close()
raise
class _IterParseIterator(object):
def __init__(self, source, events, parser, close_source=False):
self._file = source
self._close_file = close_source
self._events = []
self._index = 0
self._error = None
self.root = self._root = None
self._parser = parser
# wire up the parser for event reporting
parser = self._parser._parser
append = self._events.append
if events is None:
events = ["end"]
for event in events:
if event == "start":
try:
parser.ordered_attributes = 1
parser.specified_attributes = 1
def handler(tag, attrib_in, event=event, append=append,
start=self._parser._start_list):
append((event, start(tag, attrib_in)))
parser.StartElementHandler = handler
except AttributeError:
def handler(tag, attrib_in, event=event, append=append,
start=self._parser._start):
append((event, start(tag, attrib_in)))
parser.StartElementHandler = handler
elif event == "end":
def handler(tag, event=event, append=append,
end=self._parser._end):
append((event, end(tag)))
parser.EndElementHandler = handler
elif event == "start-ns":
def handler(prefix, uri, event=event, append=append):
try:
uri = (uri or "").encode("ascii")
except UnicodeError:
pass
append((event, (prefix or "", uri or "")))
parser.StartNamespaceDeclHandler = handler
elif event == "end-ns":
def handler(prefix, event=event, append=append):
append((event, None))
parser.EndNamespaceDeclHandler = handler
else:
raise ValueError("unknown event %r" % event)
def next(self):
try:
while 1:
try:
item = self._events[self._index]
self._index += 1
return item
except IndexError:
pass
if self._error:
e = self._error
self._error = None
raise e
if self._parser is None:
self.root = self._root
break
# load event buffer
del self._events[:]
self._index = 0
data = self._file.read(16384)
if data:
try:
self._parser.feed(data)
except SyntaxError as exc:
self._error = exc
else:
self._root = self._parser.close()
self._parser = None
except:
if self._close_file:
self._file.close()
raise
if self._close_file:
self._file.close()
raise StopIteration
def __iter__(self):
return self
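# Sketch of event-driven parsing with iterparse (file name and tag are
# hypothetical): "start" fires when an element opens, "end" when it closes
# and its subtree is complete.
#
#     for event, elem in iterparse("data.xml", events=("start", "end")):
#         if event == "end" and elem.tag == "record":
#             # handle the fully built element, then release its children
#             elem.clear()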
##
# Parses an XML document from a string constant. This function can
# be used to embed "XML literals" in Python code.
#
# @param source A string containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return An Element instance.
# @defreturn Element
def XML(text, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
return parser.close()
##
# Parses an XML document from a string constant, and also returns
# a dictionary which maps element ids to elements.
#
# @param source A string containing XML data.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return A tuple containing an Element instance and a dictionary.
# @defreturn (Element, dictionary)
def XMLID(text, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
tree = parser.close()
ids = {}
for elem in tree.iter():
id = elem.get("id")
if id:
ids[id] = elem
return tree, ids
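# Example: XMLID maps id attributes to their elements.
#
#     tree, ids = XMLID("<doc><p id='intro'>hi</p></doc>")
#     print ids["intro"].text    # -> hi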
##
# Parses an XML document from a string constant. Same as {@link #XML}.
#
# @def fromstring(text)
# @param source A string containing XML data.
# @return An Element instance.
# @defreturn Element
fromstring = XML
##
# Parses an XML document from a sequence of string fragments.
#
# @param sequence A list or other sequence containing XML data fragments.
# @param parser An optional parser instance. If not given, the
# standard {@link XMLParser} parser is used.
# @return An Element instance.
# @defreturn Element
# @since 1.3
def fromstringlist(sequence, parser=None):
if not parser:
parser = XMLParser(target=TreeBuilder())
for text in sequence:
parser.feed(text)
return parser.close()
# --------------------------------------------------------------------
##
# Generic element structure builder. This builder converts a sequence
# of {@link #TreeBuilder.start}, {@link #TreeBuilder.data}, and {@link
# #TreeBuilder.end} method calls to a well-formed element structure.
# <p>
# You can use this class to build an element structure using a custom XML
# parser, or a parser for some other XML-like format.
#
# @param element_factory Optional element factory. This factory
# is called to create new Element instances, as necessary.
class TreeBuilder(object):
def __init__(self, element_factory=None):
self._data = [] # data collector
self._elem = [] # element stack
self._last = None # last element
self._tail = None # true if we're after an end tag
if element_factory is None:
element_factory = Element
self._factory = element_factory
##
# Flushes the builder buffers, and returns the toplevel document
# element.
#
# @return An Element instance.
# @defreturn Element
def close(self):
assert len(self._elem) == 0, "missing end tags"
assert self._last is not None, "missing toplevel element"
return self._last
def _flush(self):
if self._data:
if self._last is not None:
text = "".join(self._data)
if self._tail:
assert self._last.tail is None, "internal error (tail)"
self._last.tail = text
else:
assert self._last.text is None, "internal error (text)"
self._last.text = text
self._data = []
##
# Adds text to the current element.
#
# @param data A string. This should be either an 8-bit string
# containing ASCII text, or a Unicode string.
def data(self, data):
self._data.append(data)
##
# Opens a new element.
#
# @param tag The element name.
# @param attrib A dictionary containing element attributes.
# @return The opened element.
# @defreturn Element
def start(self, tag, attrs):
self._flush()
self._last = elem = self._factory(tag, attrs)
if self._elem:
self._elem[-1].append(elem)
self._elem.append(elem)
self._tail = 0
return elem
##
# Closes the current element.
#
# @param tag The element name.
# @return The closed element.
# @defreturn Element
def end(self, tag):
self._flush()
self._last = self._elem.pop()
assert self._last.tag == tag,\
"end tag mismatch (expected %s, got %s)" % (
self._last.tag, tag)
self._tail = 1
return self._last
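# Sketch: the builder can be driven directly by any event source through
# start()/data()/end(), as described above.
#
#     builder = TreeBuilder()
#     builder.start("root", {})
#     builder.data("hello")
#     builder.end("root")
#     print builder.close().text    # -> hello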
_sentinel = ['sentinel']
##
# Element structure builder for XML source data, based on the
# <b>expat</b> parser.
#
# @keyparam target Target object. If omitted, the builder uses an
# instance of the standard {@link #TreeBuilder} class.
# @keyparam html Predefine HTML entities. This flag is not supported
# by the current implementation.
# @keyparam encoding Optional encoding. If given, the value overrides
# the encoding specified in the XML file.
# @see #ElementTree
# @see #TreeBuilder
class XMLParser(object):
def __init__(self, html=_sentinel, target=None, encoding=None):
if html is not _sentinel:
warnings.warnpy3k(
"The html argument of XMLParser() is deprecated",
DeprecationWarning, stacklevel=2)
try:
from xml.parsers import expat
except ImportError:
try:
import pyexpat as expat
except ImportError:
raise ImportError(
"No module named expat; use SimpleXMLTreeBuilder instead"
)
parser = expat.ParserCreate(encoding, "}")
if target is None:
target = TreeBuilder()
# underscored names are provided for compatibility only
self.parser = self._parser = parser
self.target = self._target = target
self._error = expat.error
self._names = {} # name memo cache
# callbacks
parser.DefaultHandlerExpand = self._default
parser.StartElementHandler = self._start
parser.EndElementHandler = self._end
parser.CharacterDataHandler = self._data
# optional callbacks
parser.CommentHandler = self._comment
parser.ProcessingInstructionHandler = self._pi
# let expat do the buffering, if supported
try:
self._parser.buffer_text = 1
except AttributeError:
pass
# use new-style attribute handling, if supported
try:
self._parser.ordered_attributes = 1
self._parser.specified_attributes = 1
parser.StartElementHandler = self._start_list
except AttributeError:
pass
self._doctype = None
self.entity = {}
try:
self.version = "Expat %d.%d.%d" % expat.version_info
except AttributeError:
pass # unknown
def _raiseerror(self, value):
err = ParseError(value)
err.code = value.code
err.position = value.lineno, value.offset
raise err
def _fixtext(self, text):
# convert text string to ascii, if possible
try:
return text.encode("ascii")
except UnicodeError:
return text
def _fixname(self, key):
# expand qname, and convert name string to ascii, if possible
try:
name = self._names[key]
except KeyError:
name = key
if "}" in name:
name = "{" + name
self._names[key] = name = self._fixtext(name)
return name
def _start(self, tag, attrib_in):
fixname = self._fixname
fixtext = self._fixtext
tag = fixname(tag)
attrib = {}
for key, value in attrib_in.items():
attrib[fixname(key)] = fixtext(value)
return self.target.start(tag, attrib)
def _start_list(self, tag, attrib_in):
fixname = self._fixname
fixtext = self._fixtext
tag = fixname(tag)
attrib = {}
if attrib_in:
for i in range(0, len(attrib_in), 2):
attrib[fixname(attrib_in[i])] = fixtext(attrib_in[i+1])
return self.target.start(tag, attrib)
def _data(self, text):
return self.target.data(self._fixtext(text))
def _end(self, tag):
return self.target.end(self._fixname(tag))
def _comment(self, data):
try:
comment = self.target.comment
except AttributeError:
pass
else:
return comment(self._fixtext(data))
def _pi(self, target, data):
try:
pi = self.target.pi
except AttributeError:
pass
else:
return pi(self._fixtext(target), self._fixtext(data))
def _default(self, text):
prefix = text[:1]
if prefix == "&":
# deal with undefined entities
try:
self.target.data(self.entity[text[1:-1]])
except KeyError:
from xml.parsers import expat
err = expat.error(
"undefined entity %s: line %d, column %d" %
(text, self._parser.ErrorLineNumber,
self._parser.ErrorColumnNumber)
)
err.code = 11 # XML_ERROR_UNDEFINED_ENTITY
err.lineno = self._parser.ErrorLineNumber
err.offset = self._parser.ErrorColumnNumber
raise err
elif prefix == "<" and text[:9] == "<!DOCTYPE":
self._doctype = [] # inside a doctype declaration
elif self._doctype is not None:
# parse doctype contents
if prefix == ">":
self._doctype = None
return
text = text.strip()
if not text:
return
self._doctype.append(text)
n = len(self._doctype)
if n > 2:
type = self._doctype[1]
if type == "PUBLIC" and n == 4:
name, type, pubid, system = self._doctype
elif type == "SYSTEM" and n == 3:
name, type, system = self._doctype
pubid = None
else:
return
if pubid:
pubid = pubid[1:-1]
if hasattr(self.target, "doctype"):
self.target.doctype(name, pubid, system[1:-1])
elif self.doctype != self._XMLParser__doctype:
# warn about deprecated call
self._XMLParser__doctype(name, pubid, system[1:-1])
self.doctype(name, pubid, system[1:-1])
self._doctype = None
##
# (Deprecated) Handles a doctype declaration.
#
# @param name Doctype name.
# @param pubid Public identifier.
# @param system System identifier.
def doctype(self, name, pubid, system):
"""This method of XMLParser is deprecated."""
warnings.warn(
"This method of XMLParser is deprecated. Define doctype() "
"method on the TreeBuilder target.",
DeprecationWarning,
)
# sentinel, if doctype is redefined in a subclass
__doctype = doctype
##
# Feeds data to the parser.
#
# @param data Encoded data.
def feed(self, data):
try:
self._parser.Parse(data, 0)
except self._error, v:
self._raiseerror(v)
##
# Finishes feeding data to the parser.
#
# @return An element structure.
# @defreturn Element
def close(self):
try:
self._parser.Parse("", 1) # end of data
except self._error, v:
self._raiseerror(v)
tree = self.target.close()
del self.target, self._parser # get rid of circular references
return tree
# compatibility
XMLTreeBuilder = XMLParser
# workaround circular import.
try:
from ElementC14N import _serialize_c14n
_serialize["c14n"] = _serialize_c14n
except ImportError:
pass
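# ----------------------------------------------------------------------
# Editorial usage sketch (not part of the original source): feeding the
# XMLParser defined above incrementally and closing it to obtain the
# root element. Assumes the stock constructor defaults (a TreeBuilder
# target); the XML literal is illustrative only.
def _example_xmlparser_usage():
    parser = XMLParser()
    parser.feed("<root><child>te")    # data may arrive in chunks
    parser.feed("xt</child></root>")
    root = parser.close()             # returns the built element tree
    print root.tag, root[0].text      # prints "root text"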
#
# ElementTree
# $Id: ElementPath.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xpath support for element trees
#
# history:
# 2003-05-23 fl created
# 2003-05-28 fl added support for // etc
# 2003-08-27 fl fixed parsing of periods in element names
# 2007-09-10 fl new selection engine
# 2007-09-12 fl fixed parent selector
# 2007-09-13 fl added iterfind; changed findall to return a list
# 2007-11-30 fl added namespaces support
# 2009-10-30 fl added child element value filter
#
# Copyright (c) 2003-2009 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2009 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
##
# Implementation module for XPath support. There's usually no reason
# to import this module directly; the <b>ElementTree</b> module does
# this for you, if needed.
##
import re
xpath_tokenizer_re = re.compile(
"("
"'[^']*'|\"[^\"]*\"|"
"::|"
"//?|"
"\.\.|"
"\(\)|"
"[/.*:\[\]\(\)@=])|"
"((?:\{[^}]+\})?[^/\[\]\(\)@=\s]+)|"
"\s+"
)
def xpath_tokenizer(pattern, namespaces=None):
for token in xpath_tokenizer_re.findall(pattern):
tag = token[1]
if tag and tag[0] != "{" and ":" in tag:
try:
prefix, uri = tag.split(":", 1)
if not namespaces:
raise KeyError
yield token[0], "{%s}%s" % (namespaces[prefix], uri)
except KeyError:
raise SyntaxError("prefix %r not found in prefix map" % prefix)
else:
yield token
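# Editorial note (added): each token from xpath_tokenizer above is a
# pair of (operator, tag) strings, one of which is empty. A quick
# illustration using only the function defined above:
def _example_tokenizer_output():
    # "a/b[@c]" yields ('', 'a'), ('/', ''), ('', 'b'),
    # ('[', ''), ('@', ''), ('', 'c'), (']', '')
    for op, tag in xpath_tokenizer("a/b[@c]"):
        print repr(op), repr(tag)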
def get_parent_map(context):
parent_map = context.parent_map
if parent_map is None:
context.parent_map = parent_map = {}
for p in context.root.iter():
for e in p:
parent_map[e] = p
return parent_map
def prepare_child(next, token):
tag = token[1]
def select(context, result):
for elem in result:
for e in elem:
if e.tag == tag:
yield e
return select
def prepare_star(next, token):
def select(context, result):
for elem in result:
for e in elem:
yield e
return select
def prepare_self(next, token):
def select(context, result):
for elem in result:
yield elem
return select
def prepare_descendant(next, token):
token = next()
if token[0] == "*":
tag = "*"
elif not token[0]:
tag = token[1]
else:
raise SyntaxError("invalid descendant")
def select(context, result):
for elem in result:
for e in elem.iter(tag):
if e is not elem:
yield e
return select
def prepare_parent(next, token):
def select(context, result):
# FIXME: raise error if .. is applied at toplevel?
parent_map = get_parent_map(context)
result_map = {}
for elem in result:
if elem in parent_map:
parent = parent_map[elem]
if parent not in result_map:
result_map[parent] = None
yield parent
return select
def prepare_predicate(next, token):
# FIXME: replace with real parser!!! refs:
# http://effbot.org/zone/simple-iterator-parser.htm
# http://javascript.crockford.com/tdop/tdop.html
signature = []
predicate = []
while 1:
token = next()
if token[0] == "]":
break
if token[0] and token[0][:1] in "'\"":
token = "'", token[0][1:-1]
signature.append(token[0] or "-")
predicate.append(token[1])
signature = "".join(signature)
# use signature to determine predicate type
if signature == "@-":
# [@attribute] predicate
key = predicate[1]
def select(context, result):
for elem in result:
if elem.get(key) is not None:
yield elem
return select
if signature == "@-='":
# [@attribute='value']
key = predicate[1]
value = predicate[-1]
def select(context, result):
for elem in result:
if elem.get(key) == value:
yield elem
return select
if signature == "-" and not re.match("\d+$", predicate[0]):
# [tag]
tag = predicate[0]
def select(context, result):
for elem in result:
if elem.find(tag) is not None:
yield elem
return select
if signature == "-='" and not re.match("\d+$", predicate[0]):
# [tag='value']
tag = predicate[0]
value = predicate[-1]
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
yield elem
break
return select
if signature == "-" or signature == "-()" or signature == "-()-":
# [index] or [last()] or [last()-index]
if signature == "-":
index = int(predicate[0]) - 1
else:
if predicate[0] != "last":
raise SyntaxError("unsupported function")
if signature == "-()-":
try:
index = int(predicate[2]) - 1
except ValueError:
raise SyntaxError("unsupported expression")
else:
index = -1
def select(context, result):
parent_map = get_parent_map(context)
for elem in result:
try:
parent = parent_map[elem]
# FIXME: what if the selector is "*" ?
elems = list(parent.findall(elem.tag))
if elems[index] is elem:
yield elem
except (IndexError, KeyError):
pass
return select
raise SyntaxError("invalid predicate")
ops = {
"": prepare_child,
"*": prepare_star,
".": prepare_self,
"..": prepare_parent,
"//": prepare_descendant,
"[": prepare_predicate,
}
_cache = {}
class _SelectorContext:
parent_map = None
def __init__(self, root):
self.root = root
# --------------------------------------------------------------------
##
# Generate all matching objects.
def iterfind(elem, path, namespaces=None):
# compile selector pattern
if path[-1:] == "/":
path = path + "*" # implicit all (FIXME: keep this?)
try:
selector = _cache[path]
except KeyError:
if len(_cache) > 100:
_cache.clear()
if path[:1] == "/":
raise SyntaxError("cannot use absolute path on element")
next = iter(xpath_tokenizer(path, namespaces)).next
token = next()
selector = []
while 1:
try:
selector.append(ops[token[0]](next, token))
except StopIteration:
raise SyntaxError("invalid path")
try:
token = next()
if token[0] == "/":
token = next()
except StopIteration:
break
_cache[path] = selector
# execute selector pattern
result = [elem]
context = _SelectorContext(elem)
for select in selector:
result = select(context, result)
return result
##
# Find first matching object.
def find(elem, path, namespaces=None):
try:
return iterfind(elem, path, namespaces).next()
except StopIteration:
return None
##
# Find all matching objects.
def findall(elem, path, namespaces=None):
return list(iterfind(elem, path, namespaces))
##
# Find text for first matching object.
def findtext(elem, path, default=None, namespaces=None):
try:
elem = iterfind(elem, path, namespaces).next()
return elem.text or ""
except StopIteration:
return default
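# Editorial usage sketch (not part of the original source): exercising
# the module-level find/findall/findtext helpers above through the
# public ElementTree API. The XML literal is illustrative only.
def _example_elementpath_usage():
    from xml.etree import ElementTree as ET
    root = ET.fromstring(
        "<doc><item name='a'>x</item><item name='b'>y</item></doc>")
    print len(findall(root, "item"))         # 2 matching child elements
    print findtext(root, "item[@name='b']")  # 'y', via the @-=' predicate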
#
# ElementTree
# $Id: ElementInclude.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xinclude support for element trees
#
# history:
# 2003-08-15 fl created
# 2003-11-14 fl fixed default loader
#
# Copyright (c) 2003-2004 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
##
# Limited XInclude support for the ElementTree package.
##
import copy
from . import ElementTree
XINCLUDE = "{http://www.w3.org/2001/XInclude}"
XINCLUDE_INCLUDE = XINCLUDE + "include"
XINCLUDE_FALLBACK = XINCLUDE + "fallback"
##
# Fatal include error.
class FatalIncludeError(SyntaxError):
pass
##
# Default loader. This loader reads an included resource from disk.
#
# @param href Resource reference.
# @param parse Parse mode. Either "xml" or "text".
# @param encoding Optional text encoding.
# @return The expanded resource. If the parse mode is "xml", this
# is an ElementTree instance. If the parse mode is "text", this
# is a Unicode string. If the loader fails, it can return None
# or raise an IOError exception.
# @throws IOError If the loader fails to load the resource.
def default_loader(href, parse, encoding=None):
with open(href) as file:
if parse == "xml":
data = ElementTree.parse(file).getroot()
else:
data = file.read()
if encoding:
data = data.decode(encoding)
return data
##
# Expand XInclude directives.
#
# @param elem Root element.
# @param loader Optional resource loader. If omitted, it defaults
# to {@link default_loader}. If given, it should be a callable
# that implements the same interface as <b>default_loader</b>.
# @throws FatalIncludeError If the function fails to include a given
# resource, or if the tree contains malformed XInclude elements.
# @throws IOError If the function fails to load a given resource.
def include(elem, loader=None):
if loader is None:
loader = default_loader
# look for xinclude elements
i = 0
while i < len(elem):
e = elem[i]
if e.tag == XINCLUDE_INCLUDE:
# process xinclude directive
href = e.get("href")
parse = e.get("parse", "xml")
if parse == "xml":
node = loader(href, parse)
if node is None:
raise FatalIncludeError(
"cannot load %r as %r" % (href, parse)
)
node = copy.copy(node)
if e.tail:
node.tail = (node.tail or "") + e.tail
elem[i] = node
elif parse == "text":
text = loader(href, parse, e.get("encoding"))
if text is None:
raise FatalIncludeError(
"cannot load %r as %r" % (href, parse)
)
if i:
node = elem[i-1]
node.tail = (node.tail or "") + text + (e.tail or "")
else:
elem.text = (elem.text or "") + text + (e.tail or "")
del elem[i]
continue
else:
raise FatalIncludeError(
"unknown parse type in xi:include tag (%r)" % parse
)
elif e.tag == XINCLUDE_FALLBACK:
raise FatalIncludeError(
"xi:fallback tag must be child of xi:include (%r)" % e.tag
)
else:
include(e, loader)
i = i + 1
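# Editorial usage sketch (not part of the original source): expanding
# xi:include directives in a parsed document with include() above.
# "document.xml" is a hypothetical file containing
# <xi:include href="..."/> elements.
def _example_xinclude_usage():
    tree = ElementTree.parse("document.xml")
    include(tree.getroot())           # replaces xi:include nodes in place
    ElementTree.dump(tree.getroot())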
# Wrapper module for ElementTree
from xml.etree.ElementTree import *
"""W3C Document Object Model implementation for Python.
The Python mapping of the Document Object Model is documented in the
Python Library Reference in the section on the xml.dom package.
This package contains the following modules:
minidom -- A simple implementation of the Level 1 DOM with namespace
support added (based on the Level 2 specification) and other
minor Level 2 functionality.
pulldom -- DOM builder supporting on-demand tree-building for selected
subtrees of the document.
"""
class Node:
"""Class giving the NodeType constants."""
# DOM implementations may use this as a base class for their own
# Node implementations. If they don't, the constants defined here
# should still be used as the canonical definitions as they match
# the values given in the W3C recommendation. Client code can
# safely refer to these values in all tests of Node.nodeType
# values.
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
#ExceptionCode
INDEX_SIZE_ERR = 1
DOMSTRING_SIZE_ERR = 2
HIERARCHY_REQUEST_ERR = 3
WRONG_DOCUMENT_ERR = 4
INVALID_CHARACTER_ERR = 5
NO_DATA_ALLOWED_ERR = 6
NO_MODIFICATION_ALLOWED_ERR = 7
NOT_FOUND_ERR = 8
NOT_SUPPORTED_ERR = 9
INUSE_ATTRIBUTE_ERR = 10
INVALID_STATE_ERR = 11
SYNTAX_ERR = 12
INVALID_MODIFICATION_ERR = 13
NAMESPACE_ERR = 14
INVALID_ACCESS_ERR = 15
VALIDATION_ERR = 16
class DOMException(Exception):
"""Abstract base class for DOM exceptions.
Exceptions with specific codes are specializations of this class."""
def __init__(self, *args, **kw):
if self.__class__ is DOMException:
raise RuntimeError(
"DOMException should not be instantiated directly")
Exception.__init__(self, *args, **kw)
def _get_code(self):
return self.code
class IndexSizeErr(DOMException):
code = INDEX_SIZE_ERR
class DomstringSizeErr(DOMException):
code = DOMSTRING_SIZE_ERR
class HierarchyRequestErr(DOMException):
code = HIERARCHY_REQUEST_ERR
class WrongDocumentErr(DOMException):
code = WRONG_DOCUMENT_ERR
class InvalidCharacterErr(DOMException):
code = INVALID_CHARACTER_ERR
class NoDataAllowedErr(DOMException):
code = NO_DATA_ALLOWED_ERR
class NoModificationAllowedErr(DOMException):
code = NO_MODIFICATION_ALLOWED_ERR
class NotFoundErr(DOMException):
code = NOT_FOUND_ERR
class NotSupportedErr(DOMException):
code = NOT_SUPPORTED_ERR
class InuseAttributeErr(DOMException):
code = INUSE_ATTRIBUTE_ERR
class InvalidStateErr(DOMException):
code = INVALID_STATE_ERR
class SyntaxErr(DOMException):
code = SYNTAX_ERR
class InvalidModificationErr(DOMException):
code = INVALID_MODIFICATION_ERR
class NamespaceErr(DOMException):
code = NAMESPACE_ERR
class InvalidAccessErr(DOMException):
code = INVALID_ACCESS_ERR
class ValidationErr(DOMException):
code = VALIDATION_ERR
class UserDataHandler:
"""Class giving the operation constants for UserDataHandler.handle()."""
# Based on DOM Level 3 (WD 9 April 2002)
NODE_CLONED = 1
NODE_IMPORTED = 2
NODE_DELETED = 3
NODE_RENAMED = 4
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
EMPTY_NAMESPACE = None
EMPTY_PREFIX = None
from domreg import getDOMImplementation, registerDOMImplementation
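# Editorial usage sketch (not part of the original source): dispatching
# on Node.nodeType with the constants defined above, via minidom.
def _example_node_constants_usage():
    import xml.dom.minidom
    doc = xml.dom.minidom.parseString("<a>text<!--note--></a>")
    for child in doc.documentElement.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            print "text:", child.data
        elif child.nodeType == Node.COMMENT_NODE:
            print "comment:", child.data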
"""Implementation of the DOM Level 3 'LS-Load' feature."""
import copy
import xml.dom
from xml.dom.NodeFilter import NodeFilter
__all__ = ["DOMBuilder", "DOMEntityResolver", "DOMInputSource"]
class Options:
"""Features object that has variables set for each DOMBuilder feature.
The DOMBuilder class uses an instance of this class to pass settings to
the ExpatBuilder class.
"""
# Note that the DOMBuilder class in LoadSave constrains which of these
# values can be set using the DOM Level 3 LoadSave feature.
namespaces = 1
namespace_declarations = True
validation = False
external_parameter_entities = True
external_general_entities = True
external_dtd_subset = True
validate_if_schema = False
validate = False
datatype_normalization = False
create_entity_ref_nodes = True
entities = True
whitespace_in_element_content = True
cdata_sections = True
comments = True
charset_overrides_xml_encoding = True
infoset = False
supported_mediatypes_only = False
errorHandler = None
filter = None
class DOMBuilder:
entityResolver = None
errorHandler = None
filter = None
ACTION_REPLACE = 1
ACTION_APPEND_AS_CHILDREN = 2
ACTION_INSERT_AFTER = 3
ACTION_INSERT_BEFORE = 4
_legal_actions = (ACTION_REPLACE, ACTION_APPEND_AS_CHILDREN,
ACTION_INSERT_AFTER, ACTION_INSERT_BEFORE)
def __init__(self):
self._options = Options()
def _get_entityResolver(self):
return self.entityResolver
def _set_entityResolver(self, entityResolver):
self.entityResolver = entityResolver
def _get_errorHandler(self):
return self.errorHandler
def _set_errorHandler(self, errorHandler):
self.errorHandler = errorHandler
def _get_filter(self):
return self.filter
def _set_filter(self, filter):
self.filter = filter
def setFeature(self, name, state):
if self.supportsFeature(name):
state = state and 1 or 0
try:
settings = self._settings[(_name_xform(name), state)]
except KeyError:
raise xml.dom.NotSupportedErr(
"unsupported feature: %r" % (name,))
else:
for name, value in settings:
setattr(self._options, name, value)
else:
raise xml.dom.NotFoundErr("unknown feature: " + repr(name))
def supportsFeature(self, name):
return hasattr(self._options, _name_xform(name))
def canSetFeature(self, name, state):
key = (_name_xform(name), state and 1 or 0)
return key in self._settings
# This dictionary maps from (feature,value) to a list of
# (option,value) pairs that should be set on the Options object.
# If a (feature,value) setting is not in this dictionary, it is
# not supported by the DOMBuilder.
#
_settings = {
("namespace_declarations", 0): [
("namespace_declarations", 0)],
("namespace_declarations", 1): [
("namespace_declarations", 1)],
("validation", 0): [
("validation", 0)],
("external_general_entities", 0): [
("external_general_entities", 0)],
("external_general_entities", 1): [
("external_general_entities", 1)],
("external_parameter_entities", 0): [
("external_parameter_entities", 0)],
("external_parameter_entities", 1): [
("external_parameter_entities", 1)],
("validate_if_schema", 0): [
("validate_if_schema", 0)],
("create_entity_ref_nodes", 0): [
("create_entity_ref_nodes", 0)],
("create_entity_ref_nodes", 1): [
("create_entity_ref_nodes", 1)],
("entities", 0): [
("create_entity_ref_nodes", 0),
("entities", 0)],
("entities", 1): [
("entities", 1)],
("whitespace_in_element_content", 0): [
("whitespace_in_element_content", 0)],
("whitespace_in_element_content", 1): [
("whitespace_in_element_content", 1)],
("cdata_sections", 0): [
("cdata_sections", 0)],
("cdata_sections", 1): [
("cdata_sections", 1)],
("comments", 0): [
("comments", 0)],
("comments", 1): [
("comments", 1)],
("charset_overrides_xml_encoding", 0): [
("charset_overrides_xml_encoding", 0)],
("charset_overrides_xml_encoding", 1): [
("charset_overrides_xml_encoding", 1)],
("infoset", 0): [],
("infoset", 1): [
("namespace_declarations", 0),
("validate_if_schema", 0),
("create_entity_ref_nodes", 0),
("entities", 0),
("cdata_sections", 0),
("datatype_normalization", 1),
("whitespace_in_element_content", 1),
("comments", 1),
("charset_overrides_xml_encoding", 1)],
("supported_mediatypes_only", 0): [
("supported_mediatypes_only", 0)],
("namespaces", 0): [
("namespaces", 0)],
("namespaces", 1): [
("namespaces", 1)],
}
def getFeature(self, name):
xname = _name_xform(name)
try:
return getattr(self._options, xname)
except AttributeError:
if name == "infoset":
options = self._options
return (options.datatype_normalization
and options.whitespace_in_element_content
and options.comments
and options.charset_overrides_xml_encoding
and not (options.namespace_declarations
or options.validate_if_schema
or options.create_entity_ref_nodes
or options.entities
or options.cdata_sections))
raise xml.dom.NotFoundErr("feature %s not known" % repr(name))
def parseURI(self, uri):
if self.entityResolver:
input = self.entityResolver.resolveEntity(None, uri)
else:
input = DOMEntityResolver().resolveEntity(None, uri)
return self.parse(input)
def parse(self, input):
options = copy.copy(self._options)
options.filter = self.filter
options.errorHandler = self.errorHandler
fp = input.byteStream
if fp is None and input.systemId:
import urllib2
fp = urllib2.urlopen(input.systemId)
return self._parse_bytestream(fp, options)
def parseWithContext(self, input, cnode, action):
if action not in self._legal_actions:
raise ValueError("not a legal action")
raise NotImplementedError("Haven't written this yet...")
def _parse_bytestream(self, stream, options):
import xml.dom.expatbuilder
builder = xml.dom.expatbuilder.makeBuilder(options)
return builder.parseFile(stream)
def _name_xform(name):
return name.lower().replace('-', '_')
class DOMEntityResolver(object):
__slots__ = '_opener',
def resolveEntity(self, publicId, systemId):
assert systemId is not None
source = DOMInputSource()
source.publicId = publicId
source.systemId = systemId
source.byteStream = self._get_opener().open(systemId)
# determine the encoding if the transport provided it
source.encoding = self._guess_media_encoding(source)
# determine the base URI if we can
import posixpath, urlparse
parts = urlparse.urlparse(systemId)
scheme, netloc, path, params, query, fragment = parts
# XXX should we check the scheme here as well?
if path and not path.endswith("/"):
path = posixpath.dirname(path) + "/"
parts = scheme, netloc, path, params, query, fragment
source.baseURI = urlparse.urlunparse(parts)
return source
def _get_opener(self):
try:
return self._opener
except AttributeError:
self._opener = self._create_opener()
return self._opener
def _create_opener(self):
import urllib2
return urllib2.build_opener()
def _guess_media_encoding(self, source):
info = source.byteStream.info()
if "Content-Type" in info:
for param in info.getplist():
if param.startswith("charset="):
return param.split("=", 1)[1].lower()
class DOMInputSource(object):
__slots__ = ('byteStream', 'characterStream', 'stringData',
'encoding', 'publicId', 'systemId', 'baseURI')
def __init__(self):
self.byteStream = None
self.characterStream = None
self.stringData = None
self.encoding = None
self.publicId = None
self.systemId = None
self.baseURI = None
def _get_byteStream(self):
return self.byteStream
def _set_byteStream(self, byteStream):
self.byteStream = byteStream
def _get_characterStream(self):
return self.characterStream
def _set_characterStream(self, characterStream):
self.characterStream = characterStream
def _get_stringData(self):
return self.stringData
def _set_stringData(self, data):
self.stringData = data
def _get_encoding(self):
return self.encoding
def _set_encoding(self, encoding):
self.encoding = encoding
def _get_publicId(self):
return self.publicId
def _set_publicId(self, publicId):
self.publicId = publicId
def _get_systemId(self):
return self.systemId
def _set_systemId(self, systemId):
self.systemId = systemId
def _get_baseURI(self):
return self.baseURI
def _set_baseURI(self, uri):
self.baseURI = uri
class DOMBuilderFilter:
"""Element filter which can be used to tailor construction of
a DOM instance.
"""
# There's really no need for this class; concrete implementations
# should just implement the endElement() and startElement()
# methods as appropriate. Using this makes it easy to only
# implement one of them.
FILTER_ACCEPT = 1
FILTER_REJECT = 2
FILTER_SKIP = 3
FILTER_INTERRUPT = 4
whatToShow = NodeFilter.SHOW_ALL
def _get_whatToShow(self):
return self.whatToShow
def acceptNode(self, element):
return self.FILTER_ACCEPT
def startContainer(self, element):
return self.FILTER_ACCEPT
del NodeFilter
class DocumentLS:
"""Mixin to create documents that conform to the load/save spec."""
async = False
def _get_async(self):
return False
def _set_async(self, async):
if async:
raise xml.dom.NotSupportedErr(
"asynchronous document loading is not supported")
def abort(self):
# What does it mean to "clear" a document? Does the
# documentElement disappear?
raise NotImplementedError(
"haven't figured out what this means yet")
def load(self, uri):
raise NotImplementedError("haven't written this yet")
def loadXML(self, source):
raise NotImplementedError("haven't written this yet")
def saveXML(self, snode):
if snode is None:
snode = self
elif snode.ownerDocument is not self:
raise xml.dom.WrongDocumentErr()
return snode.toxml()
class DOMImplementationLS:
MODE_SYNCHRONOUS = 1
MODE_ASYNCHRONOUS = 2
def createDOMBuilder(self, mode, schemaType):
if schemaType is not None:
raise xml.dom.NotSupportedErr(
"schemaType not yet supported")
if mode == self.MODE_SYNCHRONOUS:
return DOMBuilder()
if mode == self.MODE_ASYNCHRONOUS:
raise xml.dom.NotSupportedErr(
"asynchronous builders are not supported")
raise ValueError("unknown value for mode")
def createDOMWriter(self):
raise NotImplementedError(
"the writer interface hasn't been written yet!")
def createDOMInputSource(self):
return DOMInputSource()
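# Editorial usage sketch (not part of the original source): configuring
# and querying DOMBuilder features. Feature names are transformed by
# _name_xform, so "cdata-sections" maps onto Options.cdata_sections.
def _example_dombuilder_features():
    builder = DOMBuilder()
    builder.setFeature("comments", 0)                # drop comment nodes
    print builder.getFeature("comments")             # 0
    print builder.supportsFeature("cdata-sections")  # True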
import xml.sax
import xml.sax.handler
import types
try:
_StringTypes = [types.StringType, types.UnicodeType]
except AttributeError:
_StringTypes = [types.StringType]
START_ELEMENT = "START_ELEMENT"
END_ELEMENT = "END_ELEMENT"
COMMENT = "COMMENT"
START_DOCUMENT = "START_DOCUMENT"
END_DOCUMENT = "END_DOCUMENT"
PROCESSING_INSTRUCTION = "PROCESSING_INSTRUCTION"
IGNORABLE_WHITESPACE = "IGNORABLE_WHITESPACE"
CHARACTERS = "CHARACTERS"
class PullDOM(xml.sax.ContentHandler):
_locator = None
document = None
def __init__(self, documentFactory=None):
from xml.dom import XML_NAMESPACE
self.documentFactory = documentFactory
self.firstEvent = [None, None]
self.lastEvent = self.firstEvent
self.elementStack = []
self.push = self.elementStack.append
try:
self.pop = self.elementStack.pop
except AttributeError:
# use class' pop instead
pass
self._ns_contexts = [{XML_NAMESPACE:'xml'}] # contains uri -> prefix dicts
self._current_context = self._ns_contexts[-1]
self.pending_events = []
def pop(self):
result = self.elementStack[-1]
del self.elementStack[-1]
return result
def setDocumentLocator(self, locator):
self._locator = locator
def startPrefixMapping(self, prefix, uri):
if not hasattr(self, '_xmlns_attrs'):
self._xmlns_attrs = []
self._xmlns_attrs.append((prefix or 'xmlns', uri))
self._ns_contexts.append(self._current_context.copy())
self._current_context[uri] = prefix or None
def endPrefixMapping(self, prefix):
self._current_context = self._ns_contexts.pop()
def startElementNS(self, name, tagName, attrs):
# Retrieve xml namespace declaration attributes.
xmlns_uri = 'http://www.w3.org/2000/xmlns/'
xmlns_attrs = getattr(self, '_xmlns_attrs', None)
if xmlns_attrs is not None:
for aname, value in xmlns_attrs:
attrs._attrs[(xmlns_uri, aname)] = value
self._xmlns_attrs = []
uri, localname = name
if uri:
# When using namespaces, the reader may or may not
# provide us with the original name. If not, create
# *a* valid tagName from the current context.
if tagName is None:
prefix = self._current_context[uri]
if prefix:
tagName = prefix + ":" + localname
else:
tagName = localname
if self.document:
node = self.document.createElementNS(uri, tagName)
else:
node = self.buildDocument(uri, tagName)
else:
# When the tagname is not prefixed, it just appears as
# localname
if self.document:
node = self.document.createElement(localname)
else:
node = self.buildDocument(None, localname)
for aname,value in attrs.items():
a_uri, a_localname = aname
if a_uri == xmlns_uri:
if a_localname == 'xmlns':
qname = a_localname
else:
qname = 'xmlns:' + a_localname
attr = self.document.createAttributeNS(a_uri, qname)
node.setAttributeNodeNS(attr)
elif a_uri:
prefix = self._current_context[a_uri]
if prefix:
qname = prefix + ":" + a_localname
else:
qname = a_localname
attr = self.document.createAttributeNS(a_uri, qname)
node.setAttributeNodeNS(attr)
else:
attr = self.document.createAttribute(a_localname)
node.setAttributeNode(attr)
attr.value = value
self.lastEvent[1] = [(START_ELEMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
def endElementNS(self, name, tagName):
self.lastEvent[1] = [(END_ELEMENT, self.pop()), None]
self.lastEvent = self.lastEvent[1]
def startElement(self, name, attrs):
if self.document:
node = self.document.createElement(name)
else:
node = self.buildDocument(None, name)
for aname,value in attrs.items():
attr = self.document.createAttribute(aname)
attr.value = value
node.setAttributeNode(attr)
self.lastEvent[1] = [(START_ELEMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
def endElement(self, name):
self.lastEvent[1] = [(END_ELEMENT, self.pop()), None]
self.lastEvent = self.lastEvent[1]
def comment(self, s):
if self.document:
node = self.document.createComment(s)
self.lastEvent[1] = [(COMMENT, node), None]
self.lastEvent = self.lastEvent[1]
else:
event = [(COMMENT, s), None]
self.pending_events.append(event)
def processingInstruction(self, target, data):
if self.document:
node = self.document.createProcessingInstruction(target, data)
self.lastEvent[1] = [(PROCESSING_INSTRUCTION, node), None]
self.lastEvent = self.lastEvent[1]
else:
event = [(PROCESSING_INSTRUCTION, target, data), None]
self.pending_events.append(event)
def ignorableWhitespace(self, chars):
node = self.document.createTextNode(chars)
self.lastEvent[1] = [(IGNORABLE_WHITESPACE, node), None]
self.lastEvent = self.lastEvent[1]
def characters(self, chars):
node = self.document.createTextNode(chars)
self.lastEvent[1] = [(CHARACTERS, node), None]
self.lastEvent = self.lastEvent[1]
def startDocument(self):
if self.documentFactory is None:
import xml.dom.minidom
self.documentFactory = xml.dom.minidom.Document.implementation
def buildDocument(self, uri, tagname):
# Can't do that in startDocument, since we need the tagname
# XXX: obtain DocumentType
node = self.documentFactory.createDocument(uri, tagname, None)
self.document = node
self.lastEvent[1] = [(START_DOCUMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
# Put everything we have seen so far into the document
for e in self.pending_events:
if e[0][0] == PROCESSING_INSTRUCTION:
_,target,data = e[0]
n = self.document.createProcessingInstruction(target, data)
e[0] = (PROCESSING_INSTRUCTION, n)
elif e[0][0] == COMMENT:
n = self.document.createComment(e[0][1])
e[0] = (COMMENT, n)
else:
raise AssertionError("Unknown pending event %r" % (e[0][0],))
self.lastEvent[1] = e
self.lastEvent = e
self.pending_events = None
return node.firstChild
def endDocument(self):
self.lastEvent[1] = [(END_DOCUMENT, self.document), None]
self.pop()
def clear(self):
"clear(): Explicitly release parsing structures"
self.document = None
class ErrorHandler:
def warning(self, exception):
print exception
def error(self, exception):
raise exception
def fatalError(self, exception):
raise exception
class DOMEventStream:
def __init__(self, stream, parser, bufsize):
self.stream = stream
self.parser = parser
self.bufsize = bufsize
if not hasattr(self.parser, 'feed'):
self.getEvent = self._slurp
self.reset()
def reset(self):
self.pulldom = PullDOM()
# This content handler relies on namespace support
self.parser.setFeature(xml.sax.handler.feature_namespaces, 1)
self.parser.setContentHandler(self.pulldom)
def __getitem__(self, pos):
rc = self.getEvent()
if rc:
return rc
raise IndexError
def next(self):
rc = self.getEvent()
if rc:
return rc
raise StopIteration
def __iter__(self):
return self
def expandNode(self, node):
event = self.getEvent()
parents = [node]
while event:
token, cur_node = event
if cur_node is node:
return
if token != END_ELEMENT:
parents[-1].appendChild(cur_node)
if token == START_ELEMENT:
parents.append(cur_node)
elif token == END_ELEMENT:
del parents[-1]
event = self.getEvent()
def getEvent(self):
# use IncrementalParser interface, so we get the desired
# pull effect
if not self.pulldom.firstEvent[1]:
self.pulldom.lastEvent = self.pulldom.firstEvent
while not self.pulldom.firstEvent[1]:
buf = self.stream.read(self.bufsize)
if not buf:
self.parser.close()
return None
self.parser.feed(buf)
rc = self.pulldom.firstEvent[1][0]
self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
return rc
def _slurp(self):
""" Fallback replacement for getEvent() using the
standard SAX2 interface, which means we slurp the
SAX events into memory (no performance gain, but
we remain compatible with all SAX parsers).
"""
self.parser.parse(self.stream)
self.getEvent = self._emit
return self._emit()
def _emit(self):
""" Fallback replacement for getEvent() that emits
the events that _slurp() read previously.
"""
rc = self.pulldom.firstEvent[1][0]
self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
return rc
def clear(self):
"""clear(): Explicitly release parsing objects"""
self.pulldom.clear()
del self.pulldom
self.parser = None
self.stream = None
class SAX2DOM(PullDOM):
def startElementNS(self, name, tagName, attrs):
PullDOM.startElementNS(self, name, tagName, attrs)
curNode = self.elementStack[-1]
parentNode = self.elementStack[-2]
parentNode.appendChild(curNode)
def startElement(self, name, attrs):
PullDOM.startElement(self, name, attrs)
curNode = self.elementStack[-1]
parentNode = self.elementStack[-2]
parentNode.appendChild(curNode)
def processingInstruction(self, target, data):
PullDOM.processingInstruction(self, target, data)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
def ignorableWhitespace(self, chars):
PullDOM.ignorableWhitespace(self, chars)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
def characters(self, chars):
PullDOM.characters(self, chars)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
default_bufsize = (2 ** 14) - 20
def parse(stream_or_string, parser=None, bufsize=None):
if bufsize is None:
bufsize = default_bufsize
if type(stream_or_string) in _StringTypes:
stream = open(stream_or_string)
else:
stream = stream_or_string
if not parser:
parser = xml.sax.make_parser()
return DOMEventStream(stream, parser, bufsize)
def parseString(string, parser=None):
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
bufsize = len(string)
buf = StringIO(string)
if not parser:
parser = xml.sax.make_parser()
return DOMEventStream(buf, parser, bufsize)
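# Editorial usage sketch (not part of the original source): streaming a
# document with parse() above and expanding only interesting subtrees
# into full DOM nodes. The XML literal is illustrative only.
def _example_pulldom_usage():
    from StringIO import StringIO
    events = parse(StringIO("<doc><item>a</item><item>b</item></doc>"))
    for event, node in events:
        if event == START_ELEMENT and node.tagName == "item":
            events.expandNode(node)  # pull the subtree into the node
            print node.toxml()       # <item>a</item>, then <item>b</item>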
# This is the Python mapping for interface NodeFilter from
# DOM2-Traversal-Range. It contains only constants.
class NodeFilter:
"""
This is the DOM2 NodeFilter interface. It contains only constants.
"""
FILTER_ACCEPT = 1
FILTER_REJECT = 2
FILTER_SKIP = 3
SHOW_ALL = 0xFFFFFFFFL
SHOW_ELEMENT = 0x00000001
SHOW_ATTRIBUTE = 0x00000002
SHOW_TEXT = 0x00000004
SHOW_CDATA_SECTION = 0x00000008
SHOW_ENTITY_REFERENCE = 0x00000010
SHOW_ENTITY = 0x00000020
SHOW_PROCESSING_INSTRUCTION = 0x00000040
SHOW_COMMENT = 0x00000080
SHOW_DOCUMENT = 0x00000100
SHOW_DOCUMENT_TYPE = 0x00000200
SHOW_DOCUMENT_FRAGMENT = 0x00000400
SHOW_NOTATION = 0x00000800
def acceptNode(self, node):
raise NotImplementedError
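# Editorial note (added): the SHOW_* values are bit flags and may be
# OR-ed together to build a whatToShow mask, e.g.:
def _example_nodefilter_mask():
    mask = NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT
    print hex(mask)  # 0x5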
"""Running tests"""
import sys
import time
from . import result
from .signals import registerResult
__unittest = True
class _WritelnDecorator(object):
"""Used to decorate file-like objects with a handy 'writeln' method"""
def __init__(self,stream):
self.stream = stream
def __getattr__(self, attr):
if attr in ('stream', '__getstate__'):
raise AttributeError(attr)
return getattr(self.stream,attr)
def writeln(self, arg=None):
if arg:
self.write(arg)
self.write('\n') # text-mode streams translate to \r\n if needed
class TextTestResult(result.TestResult):
"""A test result class that can print formatted text results to a stream.
Used by TextTestRunner.
"""
separator1 = '=' * 70
separator2 = '-' * 70
def __init__(self, stream, descriptions, verbosity):
super(TextTestResult, self).__init__(stream, descriptions, verbosity)
self.stream = stream
self.showAll = verbosity > 1
self.dots = verbosity == 1
self.descriptions = descriptions
def getDescription(self, test):
doc_first_line = test.shortDescription()
if self.descriptions and doc_first_line:
return '\n'.join((str(test), doc_first_line))
else:
return str(test)
def startTest(self, test):
super(TextTestResult, self).startTest(test)
if self.showAll:
self.stream.write(self.getDescription(test))
self.stream.write(" ... ")
self.stream.flush()
def addSuccess(self, test):
super(TextTestResult, self).addSuccess(test)
if self.showAll:
self.stream.writeln("ok")
elif self.dots:
self.stream.write('.')
self.stream.flush()
def addError(self, test, err):
super(TextTestResult, self).addError(test, err)
if self.showAll:
self.stream.writeln("ERROR")
elif self.dots:
self.stream.write('E')
self.stream.flush()
def addFailure(self, test, err):
super(TextTestResult, self).addFailure(test, err)
if self.showAll:
self.stream.writeln("FAIL")
elif self.dots:
self.stream.write('F')
self.stream.flush()
def addSkip(self, test, reason):
super(TextTestResult, self).addSkip(test, reason)
if self.showAll:
self.stream.writeln("skipped {0!r}".format(reason))
elif self.dots:
self.stream.write("s")
self.stream.flush()
def addExpectedFailure(self, test, err):
super(TextTestResult, self).addExpectedFailure(test, err)
if self.showAll:
self.stream.writeln("expected failure")
elif self.dots:
self.stream.write("x")
self.stream.flush()
def addUnexpectedSuccess(self, test):
super(TextTestResult, self).addUnexpectedSuccess(test)
if self.showAll:
self.stream.writeln("unexpected success")
elif self.dots:
self.stream.write("u")
self.stream.flush()
def printErrors(self):
if self.dots or self.showAll:
self.stream.writeln()
self.printErrorList('ERROR', self.errors)
self.printErrorList('FAIL', self.failures)
def printErrorList(self, flavour, errors):
for test, err in errors:
self.stream.writeln(self.separator1)
self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
self.stream.writeln(self.separator2)
self.stream.writeln("%s" % err)
class TextTestRunner(object):
"""A test runner class that displays results in textual form.
It prints out the names of tests as they are run, errors as they
occur, and a summary of the results at the end of the test run.
"""
resultclass = TextTestResult
def __init__(self, stream=sys.stderr, descriptions=True, verbosity=1,
failfast=False, buffer=False, resultclass=None):
self.stream = _WritelnDecorator(stream)
self.descriptions = descriptions
self.verbosity = verbosity
self.failfast = failfast
self.buffer = buffer
if resultclass is not None:
self.resultclass = resultclass
def _makeResult(self):
return self.resultclass(self.stream, self.descriptions, self.verbosity)
def run(self, test):
"Run the given test case or test suite."
result = self._makeResult()
registerResult(result)
result.failfast = self.failfast
result.buffer = self.buffer
startTime = time.time()
startTestRun = getattr(result, 'startTestRun', None)
if startTestRun is not None:
startTestRun()
try:
test(result)
finally:
stopTestRun = getattr(result, 'stopTestRun', None)
if stopTestRun is not None:
stopTestRun()
stopTime = time.time()
timeTaken = stopTime - startTime
result.printErrors()
if hasattr(result, 'separator2'):
self.stream.writeln(result.separator2)
run = result.testsRun
self.stream.writeln("Ran %d test%s in %.3fs" %
(run, run != 1 and "s" or "", timeTaken))
self.stream.writeln()
expectedFails = unexpectedSuccesses = skipped = 0
try:
results = map(len, (result.expectedFailures,
result.unexpectedSuccesses,
result.skipped))
except AttributeError:
pass
else:
expectedFails, unexpectedSuccesses, skipped = results
infos = []
if not result.wasSuccessful():
self.stream.write("FAILED")
failed, errored = map(len, (result.failures, result.errors))
if failed:
infos.append("failures=%d" % failed)
if errored:
infos.append("errors=%d" % errored)
else:
self.stream.write("OK")
if skipped:
infos.append("skipped=%d" % skipped)
if expectedFails:
infos.append("expected failures=%d" % expectedFails)
if unexpectedSuccesses:
infos.append("unexpected successes=%d" % unexpectedSuccesses)
if infos:
self.stream.writeln(" (%s)" % (", ".join(infos),))
else:
self.stream.write("\n")
return result
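# Editorial usage sketch (not part of the original source): running a
# throwaway TestCase through the TextTestRunner defined above. The
# _Demo class is illustrative only.
def _example_textrunner_usage():
    import unittest
    class _Demo(unittest.TestCase):
        def test_addition(self):
            self.assertEqual(1 + 1, 2)
    suite = unittest.TestLoader().loadTestsFromTestCase(_Demo)
    TextTestRunner(verbosity=2).run(suite)  # prints "test_addition ... ok"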
"""Test result object"""
import os
import sys
import traceback
from StringIO import StringIO
from . import util
from functools import wraps
__unittest = True
def failfast(method):
@wraps(method)
def inner(self, *args, **kw):
if getattr(self, 'failfast', False):
self.stop()
return method(self, *args, **kw)
return inner
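# Editorial sketch (not part of the original source): any result method
# wrapped with @failfast calls stop() first when result.failfast is set,
# so the runner aborts after the first error or failure. _FakeTest is a
# hypothetical stand-in for a real TestCase.
def _example_failfast_behavior():
    import sys
    class _FakeTest(object):
        failureException = AssertionError
    result = TestResult()
    result.failfast = True
    try:
        raise ValueError("boom")
    except ValueError:
        result.addError(_FakeTest(), sys.exc_info())
    print result.shouldStop  # True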
STDOUT_LINE = '\nStdout:\n%s'
STDERR_LINE = '\nStderr:\n%s'
class TestResult(object):
"""Holder for test result information.
Test results are automatically managed by the TestCase and TestSuite
classes, and do not need to be explicitly manipulated by writers of tests.
Each instance holds the total number of tests run, and collections of
failures and errors that occurred among those test runs. The collections
contain tuples of (testcase, exceptioninfo), where exceptioninfo is the
formatted traceback of the error that occurred.
"""
_previousTestClass = None
_testRunEntered = False
_moduleSetUpFailed = False
def __init__(self, stream=None, descriptions=None, verbosity=None):
self.failfast = False
self.failures = []
self.errors = []
self.testsRun = 0
self.skipped = []
self.expectedFailures = []
self.unexpectedSuccesses = []
self.shouldStop = False
self.buffer = False
self._stdout_buffer = None
self._stderr_buffer = None
self._original_stdout = sys.stdout
self._original_stderr = sys.stderr
self._mirrorOutput = False
def printErrors(self):
"Called by TestRunner after test run"
def startTest(self, test):
"Called when the given test is about to be run"
self.testsRun += 1
self._mirrorOutput = False
self._setupStdout()
def _setupStdout(self):
if self.buffer:
if self._stderr_buffer is None:
self._stderr_buffer = StringIO()
self._stdout_buffer = StringIO()
sys.stdout = self._stdout_buffer
sys.stderr = self._stderr_buffer
def startTestRun(self):
"""Called once before any tests are executed.
See startTest for a method called before each test.
"""
def stopTest(self, test):
"""Called when the given test has been run"""
self._restoreStdout()
self._mirrorOutput = False
def _restoreStdout(self):
if self.buffer:
if self._mirrorOutput:
output = sys.stdout.getvalue()
error = sys.stderr.getvalue()
if output:
if not output.endswith('\n'):
output += '\n'
self._original_stdout.write(STDOUT_LINE % output)
if error:
if not error.endswith('\n'):
error += '\n'
self._original_stderr.write(STDERR_LINE % error)
sys.stdout = self._original_stdout
sys.stderr = self._original_stderr
self._stdout_buffer.seek(0)
self._stdout_buffer.truncate()
self._stderr_buffer.seek(0)
self._stderr_buffer.truncate()
def stopTestRun(self):
"""Called once after all tests are executed.
See stopTest for a method called after each test.
"""
@failfast
def addError(self, test, err):
"""Called when an error has occurred. 'err' is a tuple of values as
returned by sys.exc_info().
"""
self.errors.append((test, self._exc_info_to_string(err, test)))
self._mirrorOutput = True
@failfast
def addFailure(self, test, err):
"""Called when an error has occurred. 'err' is a tuple of values as
returned by sys.exc_info()."""
self.failures.append((test, self._exc_info_to_string(err, test)))
self._mirrorOutput = True
def addSuccess(self, test):
"Called when a test has completed successfully"
pass
def addSkip(self, test, reason):
"""Called when a test is skipped."""
self.skipped.append((test, reason))
def addExpectedFailure(self, test, err):
"""Called when an expected failure/error occurred."""
self.expectedFailures.append(
(test, self._exc_info_to_string(err, test)))
@failfast
def addUnexpectedSuccess(self, test):
"""Called when a test was expected to fail, but succeed."""
self.unexpectedSuccesses.append(test)
def wasSuccessful(self):
"Tells whether or not this result was a success"
return len(self.failures) == len(self.errors) == 0
def stop(self):
"Indicates that the tests should be aborted"
self.shouldStop = True
def _exc_info_to_string(self, err, test):
"""Converts a sys.exc_info()-style tuple of values into a string."""
exctype, value, tb = err
# Skip test runner traceback levels
while tb and self._is_relevant_tb_level(tb):
tb = tb.tb_next
if exctype is test.failureException:
# Skip assert*() traceback levels
length = self._count_relevant_tb_levels(tb)
msgLines = traceback.format_exception(exctype, value, tb, length)
else:
msgLines = traceback.format_exception(exctype, value, tb)
if self.buffer:
output = sys.stdout.getvalue()
error = sys.stderr.getvalue()
if output:
if not output.endswith('\n'):
output += '\n'
msgLines.append(STDOUT_LINE % output)
if error:
if not error.endswith('\n'):
error += '\n'
msgLines.append(STDERR_LINE % error)
return ''.join(msgLines)
def _is_relevant_tb_level(self, tb):
return '__unittest' in tb.tb_frame.f_globals
def _count_relevant_tb_levels(self, tb):
length = 0
while tb and not self._is_relevant_tb_level(tb):
length += 1
tb = tb.tb_next
return length
def __repr__(self):
return ("<%s run=%i errors=%i failures=%i>" %
(util.strclass(self.__class__), self.testsRun, len(self.errors),
len(self.failures)))
"""Unittest main program"""
import sys
import os
import types
from . import loader, runner
from .signals import installHandler
__unittest = True
FAILFAST = " -f, --failfast Stop on first failure\n"
CATCHBREAK = " -c, --catch Catch control-C and display results\n"
BUFFEROUTPUT = " -b, --buffer Buffer stdout and stderr during test runs\n"
USAGE_AS_MAIN = """\
Usage: %(progName)s [options] [tests]
Options:
-h, --help Show this message
-v, --verbose Verbose output
-q, --quiet Minimal output
%(failfast)s%(catchbreak)s%(buffer)s
Examples:
%(progName)s test_module - run tests from test_module
%(progName)s module.TestClass - run tests from module.TestClass
%(progName)s module.Class.test_method - run specified test method
[tests] can be a list of any number of test modules, classes and test
methods.
Alternative Usage: %(progName)s discover [options]
Options:
-v, --verbose Verbose output
%(failfast)s%(catchbreak)s%(buffer)s -s directory Directory to start discovery ('.' default)
-p pattern Pattern to match test files ('test*.py' default)
-t directory Top level directory of project (defaults to
start directory)
For test discovery all test modules must be importable from the top
level directory of the project.
"""
USAGE_FROM_MODULE = """\
Usage: %(progName)s [options] [test] [...]
Options:
-h, --help Show this message
-v, --verbose Verbose output
-q, --quiet Minimal output
%(failfast)s%(catchbreak)s%(buffer)s
Examples:
%(progName)s - run default set of tests
%(progName)s MyTestSuite - run suite 'MyTestSuite'
%(progName)s MyTestCase.testSomething - run MyTestCase.testSomething
%(progName)s MyTestCase - run all 'test*' test methods
in MyTestCase
"""
class TestProgram(object):
"""A command-line program that runs a set of tests; this is primarily
for making test modules conveniently executable.
"""
USAGE = USAGE_FROM_MODULE
# defaults for testing
failfast = catchbreak = buffer = progName = None
def __init__(self, module='__main__', defaultTest=None, argv=None,
testRunner=None, testLoader=loader.defaultTestLoader,
exit=True, verbosity=1, failfast=None, catchbreak=None,
buffer=None):
if isinstance(module, basestring):
self.module = __import__(module)
for part in module.split('.')[1:]:
self.module = getattr(self.module, part)
else:
self.module = module
if argv is None:
argv = sys.argv
self.exit = exit
self.failfast = failfast
self.catchbreak = catchbreak
self.verbosity = verbosity
self.buffer = buffer
self.defaultTest = defaultTest
self.testRunner = testRunner
self.testLoader = testLoader
self.progName = os.path.basename(argv[0])
self.parseArgs(argv)
self.runTests()
def usageExit(self, msg=None):
if msg:
print msg
usage = {'progName': self.progName, 'catchbreak': '', 'failfast': '',
'buffer': ''}
if self.failfast != False:
usage['failfast'] = FAILFAST
if self.catchbreak != False:
usage['catchbreak'] = CATCHBREAK
if self.buffer != False:
usage['buffer'] = BUFFEROUTPUT
print self.USAGE % usage
sys.exit(2)
def parseArgs(self, argv):
if len(argv) > 1 and argv[1].lower() == 'discover':
self._do_discovery(argv[2:])
return
import getopt
long_opts = ['help', 'verbose', 'quiet', 'failfast', 'catch', 'buffer']
try:
options, args = getopt.getopt(argv[1:], 'hHvqfcb', long_opts)
for opt, value in options:
if opt in ('-h','-H','--help'):
self.usageExit()
if opt in ('-q','--quiet'):
self.verbosity = 0
if opt in ('-v','--verbose'):
self.verbosity = 2
if opt in ('-f','--failfast'):
if self.failfast is None:
self.failfast = True
# Should this raise an exception if -f is not valid?
if opt in ('-c','--catch'):
if self.catchbreak is None:
self.catchbreak = True
# Should this raise an exception if -c is not valid?
if opt in ('-b','--buffer'):
if self.buffer is None:
self.buffer = True
# Should this raise an exception if -b is not valid?
if len(args) == 0 and self.defaultTest is None:
# createTests will load tests from self.module
self.testNames = None
elif len(args) > 0:
self.testNames = args
if __name__ == '__main__':
# to support python -m unittest ...
self.module = None
else:
self.testNames = (self.defaultTest,)
self.createTests()
except getopt.error, msg:
self.usageExit(msg)
def createTests(self):
if self.testNames is None:
self.test = self.testLoader.loadTestsFromModule(self.module)
else:
self.test = self.testLoader.loadTestsFromNames(self.testNames,
self.module)
def _do_discovery(self, argv, Loader=None):
if Loader is None:
Loader = lambda: self.testLoader
# handle command line args for test discovery
self.progName = '%s discover' % self.progName
import optparse
parser = optparse.OptionParser()
parser.prog = self.progName
parser.add_option('-v', '--verbose', dest='verbose', default=False,
help='Verbose output', action='store_true')
if self.failfast != False:
parser.add_option('-f', '--failfast', dest='failfast', default=False,
help='Stop on first fail or error',
action='store_true')
if self.catchbreak != False:
parser.add_option('-c', '--catch', dest='catchbreak', default=False,
help='Catch Ctrl-C and display results so far',
action='store_true')
if self.buffer != False:
parser.add_option('-b', '--buffer', dest='buffer', default=False,
help='Buffer stdout and stderr during tests',
action='store_true')
parser.add_option('-s', '--start-directory', dest='start', default='.',
help="Directory to start discovery ('.' default)")
parser.add_option('-p', '--pattern', dest='pattern', default='test*.py',
help="Pattern to match tests ('test*.py' default)")
parser.add_option('-t', '--top-level-directory', dest='top', default=None,
help='Top level directory of project (defaults to start directory)')
options, args = parser.parse_args(argv)
if len(args) > 3:
self.usageExit()
for name, value in zip(('start', 'pattern', 'top'), args):
setattr(options, name, value)
# only set options from the parsing here
# if they weren't set explicitly in the constructor
if self.failfast is None:
self.failfast = options.failfast
if self.catchbreak is None:
self.catchbreak = options.catchbreak
if self.buffer is None:
self.buffer = options.buffer
if options.verbose:
self.verbosity = 2
start_dir = options.start
pattern = options.pattern
top_level_dir = options.top
loader = Loader()
self.test = loader.discover(start_dir, pattern, top_level_dir)
def runTests(self):
if self.catchbreak:
installHandler()
if self.testRunner is None:
self.testRunner = runner.TextTestRunner
if isinstance(self.testRunner, (type, types.ClassType)):
try:
testRunner = self.testRunner(verbosity=self.verbosity,
failfast=self.failfast,
buffer=self.buffer)
except TypeError:
# didn't accept the verbosity, buffer or failfast arguments
testRunner = self.testRunner()
else:
# it is assumed to be a TestRunner instance
testRunner = self.testRunner
self.result = testRunner.run(self.test)
if self.exit:
sys.exit(not self.result.wasSuccessful())
main = TestProgram
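# Editorial usage sketch (not part of the original source): the idiom
# enabled by the alias above, normally written at the bottom of a test
# module as unittest.main(). Passing exit=False returns the TestProgram
# so its .result can be inspected instead of calling sys.exit().
def _example_main_usage():
    prog = TestProgram(module="__main__", argv=["prog"], exit=False)
    print prog.result.wasSuccessful()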
"""Loading unittests."""
import os
import re
import sys
import traceback
import types
from functools import cmp_to_key as _CmpToKey
from fnmatch import fnmatch
from . import case, suite
__unittest = True
# what about .pyc or .pyo (etc)
# we would need to avoid loading the same tests multiple times
# from '.py', '.pyc' *and* '.pyo'
VALID_MODULE_NAME = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)
def _make_failed_import_test(name, suiteClass):
message = 'Failed to import test module: %s\n%s' % (name, traceback.format_exc())
return _make_failed_test('ModuleImportFailure', name, ImportError(message),
suiteClass)
def _make_failed_load_tests(name, exception, suiteClass):
return _make_failed_test('LoadTestsFailure', name, exception, suiteClass)
def _make_failed_test(classname, methodname, exception, suiteClass):
def testFailure(self):
raise exception
attrs = {methodname: testFailure}
TestClass = type(classname, (case.TestCase,), attrs)
return suiteClass((TestClass(methodname),))
class TestLoader(object):
"""
This class is responsible for loading tests according to various criteria
and returning them wrapped in a TestSuite
"""
testMethodPrefix = 'test'
sortTestMethodsUsing = cmp
suiteClass = suite.TestSuite
_top_level_dir = None
def loadTestsFromTestCase(self, testCaseClass):
"""Return a suite of all test cases contained in testCaseClass"""
if issubclass(testCaseClass, suite.TestSuite):
raise TypeError("Test cases should not be derived from TestSuite." \
" Maybe you meant to derive from TestCase?")
testCaseNames = self.getTestCaseNames(testCaseClass)
if not testCaseNames and hasattr(testCaseClass, 'runTest'):
testCaseNames = ['runTest']
loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
return loaded_suite
def loadTestsFromModule(self, module, use_load_tests=True):
"""Return a suite of all test cases contained in the given module"""
tests = []
for name in dir(module):
obj = getattr(module, name)
if isinstance(obj, type) and issubclass(obj, case.TestCase):
tests.append(self.loadTestsFromTestCase(obj))
load_tests = getattr(module, 'load_tests', None)
tests = self.suiteClass(tests)
if use_load_tests and load_tests is not None:
try:
return load_tests(self, tests, None)
except Exception, e:
return _make_failed_load_tests(module.__name__, e,
self.suiteClass)
return tests
def loadTestsFromName(self, name, module=None):
"""Return a suite of all test cases given a string specifier.
The name may resolve either to a module, a test case class, a
test method within a test case class, or a callable object which
returns a TestCase or TestSuite instance.
The method optionally resolves the names relative to a given module.
"""
parts = name.split('.')
if module is None:
parts_copy = parts[:]
while parts_copy:
try:
module = __import__('.'.join(parts_copy))
break
except ImportError:
del parts_copy[-1]
if not parts_copy:
raise
parts = parts[1:]
obj = module
for part in parts:
parent, obj = obj, getattr(obj, part)
if isinstance(obj, types.ModuleType):
return self.loadTestsFromModule(obj)
elif isinstance(obj, type) and issubclass(obj, case.TestCase):
return self.loadTestsFromTestCase(obj)
elif (isinstance(obj, types.UnboundMethodType) and
isinstance(parent, type) and
issubclass(parent, case.TestCase)):
name = parts[-1]
inst = parent(name)
return self.suiteClass([inst])
elif isinstance(obj, suite.TestSuite):
return obj
elif hasattr(obj, '__call__'):
test = obj()
if isinstance(test, suite.TestSuite):
return test
elif isinstance(test, case.TestCase):
return self.suiteClass([test])
else:
raise TypeError("calling %s returned %s, not a test" %
(obj, test))
else:
raise TypeError("don't know how to make test from: %s" % obj)
def loadTestsFromNames(self, names, module=None):
"""Return a suite of all test cases found using the given sequence
of string specifiers. See 'loadTestsFromName()'.
"""
suites = [self.loadTestsFromName(name, module) for name in names]
return self.suiteClass(suites)
def getTestCaseNames(self, testCaseClass):
"""Return a sorted sequence of method names found within testCaseClass
"""
def isTestMethod(attrname, testCaseClass=testCaseClass,
prefix=self.testMethodPrefix):
return attrname.startswith(prefix) and \
hasattr(getattr(testCaseClass, attrname), '__call__')
testFnNames = filter(isTestMethod, dir(testCaseClass))
if self.sortTestMethodsUsing:
testFnNames.sort(key=_CmpToKey(self.sortTestMethodsUsing))
return testFnNames
def discover(self, start_dir, pattern='test*.py', top_level_dir=None):
"""Find and return all test modules from the specified start
directory, recursing into subdirectories to find them. Only test files
that match the pattern will be loaded. (Using shell style pattern
matching.)
All test modules must be importable from the top level of the project.
If the start directory is not the top level directory then the top
level directory must be specified separately.
If a test package name (directory with '__init__.py') matches the
pattern then the package will be checked for a 'load_tests' function. If
this exists then it will be called with loader, tests, pattern.
If load_tests exists then discovery does *not* recurse into the package,
load_tests is responsible for loading all tests in the package.
The pattern is deliberately not stored as a loader attribute so that
packages can continue discovery themselves. top_level_dir is stored so
load_tests does not need to pass this argument in to loader.discover().
"""
set_implicit_top = False
if top_level_dir is None and self._top_level_dir is not None:
# make top_level_dir optional if called from load_tests in a package
top_level_dir = self._top_level_dir
elif top_level_dir is None:
set_implicit_top = True
top_level_dir = start_dir
top_level_dir = os.path.abspath(top_level_dir)
if not top_level_dir in sys.path:
# all test modules must be importable from the top level directory
# should we *unconditionally* put the start directory in first
# in sys.path to minimise likelihood of conflicts between installed
# modules and development versions?
sys.path.insert(0, top_level_dir)
self._top_level_dir = top_level_dir
is_not_importable = False
if os.path.isdir(os.path.abspath(start_dir)):
start_dir = os.path.abspath(start_dir)
if start_dir != top_level_dir:
is_not_importable = not os.path.isfile(os.path.join(start_dir, '__init__.py'))
else:
# support for discovery from dotted module names
try:
__import__(start_dir)
except ImportError:
is_not_importable = True
else:
the_module = sys.modules[start_dir]
top_part = start_dir.split('.')[0]
start_dir = os.path.abspath(os.path.dirname((the_module.__file__)))
if set_implicit_top:
self._top_level_dir = self._get_directory_containing_module(top_part)
sys.path.remove(top_level_dir)
if is_not_importable:
raise ImportError('Start directory is not importable: %r' % start_dir)
tests = list(self._find_tests(start_dir, pattern))
return self.suiteClass(tests)
def _get_directory_containing_module(self, module_name):
module = sys.modules[module_name]
full_path = os.path.abspath(module.__file__)
if os.path.basename(full_path).lower().startswith('__init__.py'):
return os.path.dirname(os.path.dirname(full_path))
else:
# here we have been given a module rather than a package - so
# all we can do is search the *same* directory the module is in
# should an exception be raised instead
return os.path.dirname(full_path)
def _get_name_from_path(self, path):
path = os.path.splitext(os.path.normpath(path))[0]
_relpath = os.path.relpath(path, self._top_level_dir)
assert not os.path.isabs(_relpath), "Path must be within the project"
assert not _relpath.startswith('..'), "Path must be within the project"
name = _relpath.replace(os.path.sep, '.')
return name
def _get_module_from_name(self, name):
__import__(name)
return sys.modules[name]
def _match_path(self, path, full_path, pattern):
# override this method to use alternative matching strategy
return fnmatch(path, pattern)
def _find_tests(self, start_dir, pattern):
"""Used by discovery. Yields test suites it loads."""
paths = os.listdir(start_dir)
for path in paths:
full_path = os.path.join(start_dir, path)
if os.path.isfile(full_path):
if not VALID_MODULE_NAME.match(path):
# valid Python identifiers only
continue
if not self._match_path(path, full_path, pattern):
continue
# if the test file matches, load it
name = self._get_name_from_path(full_path)
try:
module = self._get_module_from_name(name)
except:
yield _make_failed_import_test(name, self.suiteClass)
else:
mod_file = os.path.abspath(getattr(module, '__file__', full_path))
realpath = os.path.splitext(os.path.realpath(mod_file))[0]
fullpath_noext = os.path.splitext(os.path.realpath(full_path))[0]
if realpath.lower() != fullpath_noext.lower():
module_dir = os.path.dirname(realpath)
mod_name = os.path.splitext(os.path.basename(full_path))[0]
expected_dir = os.path.dirname(full_path)
msg = ("%r module incorrectly imported from %r. Expected %r. "
"Is this module globally installed?")
raise ImportError(msg % (mod_name, module_dir, expected_dir))
yield self.loadTestsFromModule(module)
elif os.path.isdir(full_path):
if not os.path.isfile(os.path.join(full_path, '__init__.py')):
continue
load_tests = None
tests = None
if fnmatch(path, pattern):
# only check load_tests if the package directory itself matches the filter
name = self._get_name_from_path(full_path)
package = self._get_module_from_name(name)
load_tests = getattr(package, 'load_tests', None)
tests = self.loadTestsFromModule(package, use_load_tests=False)
if load_tests is None:
if tests is not None:
# tests loaded from package file
yield tests
# recurse into the package
for test in self._find_tests(full_path, pattern):
yield test
else:
try:
yield load_tests(self, tests, pattern)
except Exception, e:
yield _make_failed_load_tests(package.__name__, e,
self.suiteClass)
defaultTestLoader = TestLoader()
def _makeLoader(prefix, sortUsing, suiteClass=None):
loader = TestLoader()
loader.sortTestMethodsUsing = sortUsing
loader.testMethodPrefix = prefix
if suiteClass:
loader.suiteClass = suiteClass
return loader
def getTestCaseNames(testCaseClass, prefix, sortUsing=cmp):
return _makeLoader(prefix, sortUsing).getTestCaseNames(testCaseClass)
def makeSuite(testCaseClass, prefix='test', sortUsing=cmp,
suiteClass=suite.TestSuite):
return _makeLoader(prefix, sortUsing, suiteClass).loadTestsFromTestCase(testCaseClass)
def findTestCases(module, prefix='test', sortUsing=cmp,
suiteClass=suite.TestSuite):
return _makeLoader(prefix, sortUsing, suiteClass).loadTestsFromModule(module)
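# --- Illustrative usage sketch (not part of the stdlib source above);
# --- SampleTest and the 'tests' directory name are made up for the example.
import unittest

class SampleTest(unittest.TestCase):
    def test_upper(self):
        self.assertEqual('abc'.upper(), 'ABC')

loader = unittest.TestLoader()
# Build a suite from every test* method on a TestCase subclass...
suite = loader.loadTestsFromTestCase(SampleTest)
unittest.TextTestRunner(verbosity=2).run(suite)
# ...or walk a directory tree for modules matching a shell-style pattern:
# suite = loader.discover('tests', pattern='test*.py')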
"""Test case implementation"""
import collections
import sys
import functools
import difflib
import pprint
import re
import types
import warnings
from . import result
from .util import (
strclass, safe_repr, unorderable_list_difference,
_count_diff_all_purpose, _count_diff_hashable
)
__unittest = True
DIFF_OMITTED = ('\nDiff is %s characters long. '
'Set self.maxDiff to None to see it.')
class SkipTest(Exception):
"""
Raise this exception in a test to skip it.
Usually you can use TestCase.skipTest() or one of the skipping decorators
instead of raising this directly.
"""
pass
class _ExpectedFailure(Exception):
"""
Raise this when a test is expected to fail.
This is an implementation detail.
"""
def __init__(self, exc_info):
super(_ExpectedFailure, self).__init__()
self.exc_info = exc_info
class _UnexpectedSuccess(Exception):
"""
The test was supposed to fail, but it didn't!
"""
pass
def _id(obj):
return obj
def skip(reason):
"""
Unconditionally skip a test.
"""
def decorator(test_item):
if not isinstance(test_item, (type, types.ClassType)):
@functools.wraps(test_item)
def skip_wrapper(*args, **kwargs):
raise SkipTest(reason)
test_item = skip_wrapper
test_item.__unittest_skip__ = True
test_item.__unittest_skip_why__ = reason
return test_item
return decorator
def skipIf(condition, reason):
"""
Skip a test if the condition is true.
"""
if condition:
return skip(reason)
return _id
def skipUnless(condition, reason):
"""
Skip a test unless the condition is true.
"""
if not condition:
return skip(reason)
return _id
def expectedFailure(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
try:
func(*args, **kwargs)
except Exception:
raise _ExpectedFailure(sys.exc_info())
raise _UnexpectedSuccess
return wrapper
class _AssertRaisesContext(object):
"""A context manager used to implement TestCase.assertRaises* methods."""
def __init__(self, expected, test_case, expected_regexp=None):
self.expected = expected
self.failureException = test_case.failureException
self.expected_regexp = expected_regexp
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, tb):
if exc_type is None:
try:
exc_name = self.expected.__name__
except AttributeError:
exc_name = str(self.expected)
raise self.failureException(
"{0} not raised".format(exc_name))
# Save an expected exception in self.exception.
# Let any other exception propagate through.
if issubclass(exc_type, self.expected):
self.exception = exc_value
elif (hasattr(exc_value, 'clsException') and
isinstance(exc_value.clsException, self.expected)):
self.exception = exc_value.clsException
else:
return False
if self.expected_regexp is None:
return True
expected_regexp = self.expected_regexp
if not expected_regexp.search(str(exc_value)):
raise self.failureException('"%s" does not match "%s"' %
(expected_regexp.pattern, str(exc_value)))
return True
class TestCase(object):
"""A class whose instances are single test cases.
By default, the test code itself should be placed in a method named
'runTest'.
If the fixture may be used for many test cases, create as
many test methods as are needed. When instantiating such a TestCase
subclass, specify in the constructor arguments the name of the test method
that the instance is to execute.
Test authors should subclass TestCase for their own tests. Construction
and deconstruction of the test's environment ('fixture') can be
implemented by overriding the 'setUp' and 'tearDown' methods respectively.
If it is necessary to override the __init__ method, the base class
__init__ method must always be called. It is important that subclasses
should not change the signature of their __init__ method, since instances
of the classes are instantiated automatically by parts of the framework
in order to be run.
When subclassing TestCase, you can set these attributes:
* failureException: determines which exception will be raised when
the instance's assertion methods fail; test methods raising this
exception will be deemed to have 'failed' rather than 'errored'.
* longMessage: determines whether long messages (including repr of
objects used in assert methods) will be printed on failure in *addition*
to any explicit message passed.
* maxDiff: sets the maximum length of a diff in failure messages
by assert methods using difflib. It is looked up as an instance
attribute so can be configured by individual tests if required.
"""
failureException = AssertionError
longMessage = False
maxDiff = 80*8
# If a string is longer than _diffThreshold, use normal comparison instead
# of difflib. See #11763.
_diffThreshold = 2**16
# Attribute used by TestSuite for classSetUp
_classSetupFailed = False
def __init__(self, methodName='runTest'):
"""Create an instance of the class that will use the named test
method when executed. Raises a ValueError if the instance does
not have a method with the specified name.
"""
self._testMethodName = methodName
self._resultForDoCleanups = None
try:
testMethod = getattr(self, methodName)
except AttributeError:
raise ValueError("no such test method in %s: %s" %
(self.__class__, methodName))
self._testMethodDoc = testMethod.__doc__
self._cleanups = []
# Map types to custom assertEqual functions that will compare
# instances of said type in more detail to generate a more useful
# error message.
self._type_equality_funcs = {}
self.addTypeEqualityFunc(dict, 'assertDictEqual')
self.addTypeEqualityFunc(list, 'assertListEqual')
self.addTypeEqualityFunc(tuple, 'assertTupleEqual')
self.addTypeEqualityFunc(set, 'assertSetEqual')
self.addTypeEqualityFunc(frozenset, 'assertSetEqual')
try:
self.addTypeEqualityFunc(unicode, 'assertMultiLineEqual')
except NameError:
# No unicode support in this build
pass
def addTypeEqualityFunc(self, typeobj, function):
"""Add a type specific assertEqual style function to compare a type.
This method is for use by TestCase subclasses that need to register
their own type equality functions to provide nicer error messages.
Args:
typeobj: The data type to call this function on when both values
are of the same type in assertEqual().
function: The callable taking two arguments and an optional
msg= argument that raises self.failureException with a
useful error message when the two arguments are not equal.
"""
self._type_equality_funcs[typeobj] = function
def addCleanup(self, function, *args, **kwargs):
"""Add a function, with arguments, to be called when the test is
completed. Functions added are called on a LIFO basis and are
called after tearDown on test failure or success.
Cleanup items are called even if setUp fails (unlike tearDown)."""
self._cleanups.append((function, args, kwargs))
def setUp(self):
"Hook method for setting up the test fixture before exercising it."
pass
def tearDown(self):
"Hook method for deconstructing the test fixture after testing it."
pass
@classmethod
def setUpClass(cls):
"Hook method for setting up class fixture before running tests in the class."
@classmethod
def tearDownClass(cls):
"Hook method for deconstructing the class fixture after running all tests in the class."
def countTestCases(self):
return 1
def defaultTestResult(self):
return result.TestResult()
def shortDescription(self):
"""Returns a one-line description of the test, or None if no
description has been provided.
The default implementation of this method returns the first line of
the specified test method's docstring.
"""
doc = self._testMethodDoc
return doc and doc.split("\n")[0].strip() or None
def id(self):
return "%s.%s" % (strclass(self.__class__), self._testMethodName)
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self._testMethodName == other._testMethodName
def __ne__(self, other):
return not self == other
def __hash__(self):
return hash((type(self), self._testMethodName))
def __str__(self):
return "%s (%s)" % (self._testMethodName, strclass(self.__class__))
def __repr__(self):
return "<%s testMethod=%s>" % \
(strclass(self.__class__), self._testMethodName)
def _addSkip(self, result, reason):
addSkip = getattr(result, 'addSkip', None)
if addSkip is not None:
addSkip(self, reason)
else:
warnings.warn("TestResult has no addSkip method, skips not reported",
RuntimeWarning, 2)
result.addSuccess(self)
def run(self, result=None):
orig_result = result
if result is None:
result = self.defaultTestResult()
startTestRun = getattr(result, 'startTestRun', None)
if startTestRun is not None:
startTestRun()
self._resultForDoCleanups = result
result.startTest(self)
testMethod = getattr(self, self._testMethodName)
if (getattr(self.__class__, "__unittest_skip__", False) or
getattr(testMethod, "__unittest_skip__", False)):
# If the class or method was skipped.
try:
skip_why = (getattr(self.__class__, '__unittest_skip_why__', '')
or getattr(testMethod, '__unittest_skip_why__', ''))
self._addSkip(result, skip_why)
finally:
result.stopTest(self)
return
try:
success = False
try:
self.setUp()
except SkipTest as e:
self._addSkip(result, str(e))
except KeyboardInterrupt:
raise
except:
result.addError(self, sys.exc_info())
else:
try:
testMethod()
except KeyboardInterrupt:
raise
except self.failureException:
result.addFailure(self, sys.exc_info())
except _ExpectedFailure as e:
addExpectedFailure = getattr(result, 'addExpectedFailure', None)
if addExpectedFailure is not None:
addExpectedFailure(self, e.exc_info)
else:
warnings.warn("TestResult has no addExpectedFailure method, reporting as passes",
RuntimeWarning)
result.addSuccess(self)
except _UnexpectedSuccess:
addUnexpectedSuccess = getattr(result, 'addUnexpectedSuccess', None)
if addUnexpectedSuccess is not None:
addUnexpectedSuccess(self)
else:
warnings.warn("TestResult has no addUnexpectedSuccess method, reporting as failures",
RuntimeWarning)
result.addFailure(self, sys.exc_info())
except SkipTest as e:
self._addSkip(result, str(e))
except:
result.addError(self, sys.exc_info())
else:
success = True
try:
self.tearDown()
except KeyboardInterrupt:
raise
except:
result.addError(self, sys.exc_info())
success = False
cleanUpSuccess = self.doCleanups()
success = success and cleanUpSuccess
if success:
result.addSuccess(self)
finally:
result.stopTest(self)
if orig_result is None:
stopTestRun = getattr(result, 'stopTestRun', None)
if stopTestRun is not None:
stopTestRun()
def doCleanups(self):
"""Execute all cleanup functions. Normally called for you after
tearDown."""
result = self._resultForDoCleanups
ok = True
while self._cleanups:
function, args, kwargs = self._cleanups.pop(-1)
try:
function(*args, **kwargs)
except KeyboardInterrupt:
raise
except:
ok = False
result.addError(self, sys.exc_info())
return ok
def __call__(self, *args, **kwds):
return self.run(*args, **kwds)
def debug(self):
"""Run the test without collecting errors in a TestResult"""
self.setUp()
getattr(self, self._testMethodName)()
self.tearDown()
while self._cleanups:
function, args, kwargs = self._cleanups.pop(-1)
function(*args, **kwargs)
def skipTest(self, reason):
"""Skip this test."""
raise SkipTest(reason)
def fail(self, msg=None):
"""Fail immediately, with the given message."""
raise self.failureException(msg)
def assertFalse(self, expr, msg=None):
"""Check that the expression is false."""
if expr:
msg = self._formatMessage(msg, "%s is not false" % safe_repr(expr))
raise self.failureException(msg)
def assertTrue(self, expr, msg=None):
"""Check that the expression is true."""
if not expr:
msg = self._formatMessage(msg, "%s is not true" % safe_repr(expr))
raise self.failureException(msg)
def _formatMessage(self, msg, standardMsg):
"""Honour the longMessage attribute when generating failure messages.
If longMessage is False this means:
* Use only an explicit message if it is provided
* Otherwise use the standard message for the assert
If longMessage is True:
* Use the standard message
* If an explicit message is provided, plus ' : ' and the explicit message
"""
if not self.longMessage:
return msg or standardMsg
if msg is None:
return standardMsg
try:
# don't switch to '{}' formatting in Python 2.X
# it changes the way unicode input is handled
return '%s : %s' % (standardMsg, msg)
except UnicodeDecodeError:
return '%s : %s' % (safe_repr(standardMsg), safe_repr(msg))
def assertRaises(self, excClass, callableObj=None, *args, **kwargs):
"""Fail unless an exception of class excClass is raised
by callableObj when invoked with arguments args and keyword
arguments kwargs. If a different type of exception is
raised, it will not be caught, and the test case will be
deemed to have suffered an error, exactly as for an
unexpected exception.
If called with callableObj omitted or None, will return a
context object used like this::
with self.assertRaises(SomeException):
do_something()
The context manager keeps a reference to the exception as
the 'exception' attribute. This allows you to inspect the
exception after the assertion::
with self.assertRaises(SomeException) as cm:
do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
"""
context = _AssertRaisesContext(excClass, self)
if callableObj is None:
return context
with context:
callableObj(*args, **kwargs)
def _getAssertEqualityFunc(self, first, second):
"""Get a detailed comparison function for the types of the two args.
Returns: A callable accepting (first, second, msg=None) that will
raise a failure exception if first != second with a useful human
readable error message for those types.
"""
#
# NOTE(gregory.p.smith): I considered isinstance(first, type(second))
# and vice versa. I opted for the conservative approach in case
# subclasses are not intended to be compared in detail to their super
# class instances using a type equality func. This means testing
# subtypes won't automagically use the detailed comparison. Callers
# should use their type specific assertSpamEqual method to compare
# subclasses if the detailed comparison is desired and appropriate.
# See the discussion in http://bugs.python.org/issue2578.
#
if type(first) is type(second):
asserter = self._type_equality_funcs.get(type(first))
if asserter is not None:
if isinstance(asserter, basestring):
asserter = getattr(self, asserter)
return asserter
return self._baseAssertEqual
def _baseAssertEqual(self, first, second, msg=None):
"""The default assertEqual implementation, not type specific."""
if not first == second:
standardMsg = '%s != %s' % (safe_repr(first), safe_repr(second))
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
def assertEqual(self, first, second, msg=None):
"""Fail if the two objects are unequal as determined by the '=='
operator.
"""
assertion_func = self._getAssertEqualityFunc(first, second)
assertion_func(first, second, msg=msg)
def assertNotEqual(self, first, second, msg=None):
"""Fail if the two objects are equal as determined by the '!='
operator.
"""
if not first != second:
msg = self._formatMessage(msg, '%s == %s' % (safe_repr(first),
safe_repr(second)))
raise self.failureException(msg)
def assertAlmostEqual(self, first, second, places=None, msg=None, delta=None):
"""Fail if the two objects are unequal as determined by their
difference rounded to the given number of decimal places
(default 7) and comparing to zero, or by comparing that the
difference between the two objects is more than the given
delta.
Note that decimal places (from zero) are usually not the same
as significant digits (measured from the most significant digit).
If the two objects compare equal then they will automatically
compare almost equal.
"""
if first == second:
# shortcut
return
if delta is not None and places is not None:
raise TypeError("specify delta or places not both")
if delta is not None:
if abs(first - second) <= delta:
return
standardMsg = '%s != %s within %s delta' % (safe_repr(first),
safe_repr(second),
safe_repr(delta))
else:
if places is None:
places = 7
if round(abs(second-first), places) == 0:
return
standardMsg = '%s != %s within %r places' % (safe_repr(first),
safe_repr(second),
places)
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
def assertNotAlmostEqual(self, first, second, places=None, msg=None, delta=None):
"""Fail if the two objects are equal as determined by their
difference rounded to the given number of decimal places
(default 7) and comparing to zero, or by comparing that the
difference between the two objects is less than the given delta.
Note that decimal places (from zero) are usually not the same
as significant digits (measured from the most significant digit).
Objects that are equal automatically fail.
"""
if delta is not None and places is not None:
raise TypeError("specify delta or places not both")
if delta is not None:
if not (first == second) and abs(first - second) > delta:
return
standardMsg = '%s == %s within %s delta' % (safe_repr(first),
safe_repr(second),
safe_repr(delta))
else:
if places is None:
places = 7
if not (first == second) and round(abs(second-first), places) != 0:
return
standardMsg = '%s == %s within %r places' % (safe_repr(first),
safe_repr(second),
places)
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
# Synonyms for assertion methods
# The plurals are undocumented. Keep them that way to discourage use.
# Do not add more. Do not remove.
# Going through a deprecation cycle on these would annoy many people.
assertEquals = assertEqual
assertNotEquals = assertNotEqual
assertAlmostEquals = assertAlmostEqual
assertNotAlmostEquals = assertNotAlmostEqual
assert_ = assertTrue
# These fail* assertion method names are pending deprecation and will
# be a DeprecationWarning in 3.2; http://bugs.python.org/issue2578
def _deprecate(original_func):
def deprecated_func(*args, **kwargs):
warnings.warn(
'Please use {0} instead.'.format(original_func.__name__),
PendingDeprecationWarning, 2)
return original_func(*args, **kwargs)
return deprecated_func
failUnlessEqual = _deprecate(assertEqual)
failIfEqual = _deprecate(assertNotEqual)
failUnlessAlmostEqual = _deprecate(assertAlmostEqual)
failIfAlmostEqual = _deprecate(assertNotAlmostEqual)
failUnless = _deprecate(assertTrue)
failUnlessRaises = _deprecate(assertRaises)
failIf = _deprecate(assertFalse)
def assertSequenceEqual(self, seq1, seq2, msg=None, seq_type=None):
"""An equality assertion for ordered sequences (like lists and tuples).
For the purposes of this function, a valid ordered sequence type is one
which can be indexed, has a length, and has an equality operator.
Args:
seq1: The first sequence to compare.
seq2: The second sequence to compare.
seq_type: The expected datatype of the sequences, or None if no
datatype should be enforced.
msg: Optional message to use on failure instead of a list of
differences.
"""
if seq_type is not None:
seq_type_name = seq_type.__name__
if not isinstance(seq1, seq_type):
raise self.failureException('First sequence is not a %s: %s'
% (seq_type_name, safe_repr(seq1)))
if not isinstance(seq2, seq_type):
raise self.failureException('Second sequence is not a %s: %s'
% (seq_type_name, safe_repr(seq2)))
else:
seq_type_name = "sequence"
differing = None
try:
len1 = len(seq1)
except (TypeError, NotImplementedError):
differing = 'First %s has no length. Non-sequence?' % (
seq_type_name)
if differing is None:
try:
len2 = len(seq2)
except (TypeError, NotImplementedError):
differing = 'Second %s has no length. Non-sequence?' % (
seq_type_name)
if differing is None:
if seq1 == seq2:
return
seq1_repr = safe_repr(seq1)
seq2_repr = safe_repr(seq2)
if len(seq1_repr) > 30:
seq1_repr = seq1_repr[:30] + '...'
if len(seq2_repr) > 30:
seq2_repr = seq2_repr[:30] + '...'
elements = (seq_type_name.capitalize(), seq1_repr, seq2_repr)
differing = '%ss differ: %s != %s\n' % elements
for i in xrange(min(len1, len2)):
try:
item1 = seq1[i]
except (TypeError, IndexError, NotImplementedError):
differing += ('\nUnable to index element %d of first %s\n' %
(i, seq_type_name))
break
try:
item2 = seq2[i]
except (TypeError, IndexError, NotImplementedError):
differing += ('\nUnable to index element %d of second %s\n' %
(i, seq_type_name))
break
if item1 != item2:
differing += ('\nFirst differing element %d:\n%s\n%s\n' %
(i, safe_repr(item1), safe_repr(item2)))
break
else:
if (len1 == len2 and seq_type is None and
type(seq1) != type(seq2)):
# The sequences are the same, but have differing types.
return
if len1 > len2:
differing += ('\nFirst %s contains %d additional '
'elements.\n' % (seq_type_name, len1 - len2))
try:
differing += ('First extra element %d:\n%s\n' %
(len2, safe_repr(seq1[len2])))
except (TypeError, IndexError, NotImplementedError):
differing += ('Unable to index element %d '
'of first %s\n' % (len2, seq_type_name))
elif len1 < len2:
differing += ('\nSecond %s contains %d additional '
'elements.\n' % (seq_type_name, len2 - len1))
try:
differing += ('First extra element %d:\n%s\n' %
(len1, safe_repr(seq2[len1])))
except (TypeError, IndexError, NotImplementedError):
differing += ('Unable to index element %d '
'of second %s\n' % (len1, seq_type_name))
standardMsg = differing
diffMsg = '\n' + '\n'.join(
difflib.ndiff(pprint.pformat(seq1).splitlines(),
pprint.pformat(seq2).splitlines()))
standardMsg = self._truncateMessage(standardMsg, diffMsg)
msg = self._formatMessage(msg, standardMsg)
self.fail(msg)
def _truncateMessage(self, message, diff):
max_diff = self.maxDiff
if max_diff is None or len(diff) <= max_diff:
return message + diff
return message + (DIFF_OMITTED % len(diff))
def assertListEqual(self, list1, list2, msg=None):
"""A list-specific equality assertion.
Args:
list1: The first list to compare.
list2: The second list to compare.
msg: Optional message to use on failure instead of a list of
differences.
"""
self.assertSequenceEqual(list1, list2, msg, seq_type=list)
def assertTupleEqual(self, tuple1, tuple2, msg=None):
"""A tuple-specific equality assertion.
Args:
tuple1: The first tuple to compare.
tuple2: The second tuple to compare.
msg: Optional message to use on failure instead of a list of
differences.
"""
self.assertSequenceEqual(tuple1, tuple2, msg, seq_type=tuple)
def assertSetEqual(self, set1, set2, msg=None):
"""A set-specific equality assertion.
Args:
set1: The first set to compare.
set2: The second set to compare.
msg: Optional message to use on failure instead of a list of
differences.
assertSetEqual uses ducktyping to support different types of sets, and
is optimized for sets specifically (parameters must support a
difference method).
"""
try:
difference1 = set1.difference(set2)
except TypeError, e:
self.fail('invalid type when attempting set difference: %s' % e)
except AttributeError, e:
self.fail('first argument does not support set difference: %s' % e)
try:
difference2 = set2.difference(set1)
except TypeError, e:
self.fail('invalid type when attempting set difference: %s' % e)
except AttributeError, e:
self.fail('second argument does not support set difference: %s' % e)
if not (difference1 or difference2):
return
lines = []
if difference1:
lines.append('Items in the first set but not the second:')
for item in difference1:
lines.append(repr(item))
if difference2:
lines.append('Items in the second set but not the first:')
for item in difference2:
lines.append(repr(item))
standardMsg = '\n'.join(lines)
self.fail(self._formatMessage(msg, standardMsg))
def assertIn(self, member, container, msg=None):
"""Just like self.assertTrue(a in b), but with a nicer default message."""
if member not in container:
standardMsg = '%s not found in %s' % (safe_repr(member),
safe_repr(container))
self.fail(self._formatMessage(msg, standardMsg))
def assertNotIn(self, member, container, msg=None):
"""Just like self.assertTrue(a not in b), but with a nicer default message."""
if member in container:
standardMsg = '%s unexpectedly found in %s' % (safe_repr(member),
safe_repr(container))
self.fail(self._formatMessage(msg, standardMsg))
def assertIs(self, expr1, expr2, msg=None):
"""Just like self.assertTrue(a is b), but with a nicer default message."""
if expr1 is not expr2:
standardMsg = '%s is not %s' % (safe_repr(expr1),
safe_repr(expr2))
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNot(self, expr1, expr2, msg=None):
"""Just like self.assertTrue(a is not b), but with a nicer default message."""
if expr1 is expr2:
standardMsg = 'unexpectedly identical: %s' % (safe_repr(expr1),)
self.fail(self._formatMessage(msg, standardMsg))
def assertDictEqual(self, d1, d2, msg=None):
self.assertIsInstance(d1, dict, 'First argument is not a dictionary')
self.assertIsInstance(d2, dict, 'Second argument is not a dictionary')
if d1 != d2:
standardMsg = '%s != %s' % (safe_repr(d1, True), safe_repr(d2, True))
diff = ('\n' + '\n'.join(difflib.ndiff(
pprint.pformat(d1).splitlines(),
pprint.pformat(d2).splitlines())))
standardMsg = self._truncateMessage(standardMsg, diff)
self.fail(self._formatMessage(msg, standardMsg))
def assertDictContainsSubset(self, expected, actual, msg=None):
"""Checks whether actual is a superset of expected."""
missing = []
mismatched = []
for key, value in expected.iteritems():
if key not in actual:
missing.append(key)
elif value != actual[key]:
mismatched.append('%s, expected: %s, actual: %s' %
(safe_repr(key), safe_repr(value),
safe_repr(actual[key])))
if not (missing or mismatched):
return
standardMsg = ''
if missing:
standardMsg = 'Missing: %s' % ','.join(safe_repr(m) for m in
missing)
if mismatched:
if standardMsg:
standardMsg += '; '
standardMsg += 'Mismatched values: %s' % ','.join(mismatched)
self.fail(self._formatMessage(msg, standardMsg))
def assertItemsEqual(self, expected_seq, actual_seq, msg=None):
"""An unordered sequence specific comparison. It asserts that
actual_seq and expected_seq have the same element counts.
Equivalent to::
self.assertEqual(Counter(iter(actual_seq)),
Counter(iter(expected_seq)))
Asserts that each element has the same count in both sequences.
Example:
- [0, 1, 1] and [1, 0, 1] compare equal.
- [0, 0, 1] and [0, 1] compare unequal.
"""
first_seq, second_seq = list(expected_seq), list(actual_seq)
with warnings.catch_warnings():
if sys.py3kwarning:
# Silence Py3k warning raised during the sorting
for _msg in ["(code|dict|type) inequality comparisons",
"builtin_function_or_method order comparisons",
"comparing unequal types"]:
warnings.filterwarnings("ignore", _msg, DeprecationWarning)
try:
first = collections.Counter(first_seq)
second = collections.Counter(second_seq)
except TypeError:
# Handle case with unhashable elements
differences = _count_diff_all_purpose(first_seq, second_seq)
else:
if first == second:
return
differences = _count_diff_hashable(first_seq, second_seq)
if differences:
standardMsg = 'Element counts were not equal:\n'
lines = ['First has %d, Second has %d: %r' % diff for diff in differences]
diffMsg = '\n'.join(lines)
standardMsg = self._truncateMessage(standardMsg, diffMsg)
msg = self._formatMessage(msg, standardMsg)
self.fail(msg)
def assertMultiLineEqual(self, first, second, msg=None):
"""Assert that two multi-line strings are equal."""
self.assertIsInstance(first, basestring,
'First argument is not a string')
self.assertIsInstance(second, basestring,
'Second argument is not a string')
if first != second:
# don't use difflib if the strings are too long
if (len(first) > self._diffThreshold or
len(second) > self._diffThreshold):
self._baseAssertEqual(first, second, msg)
firstlines = first.splitlines(True)
secondlines = second.splitlines(True)
if len(firstlines) == 1 and first.strip('\r\n') == first:
firstlines = [first + '\n']
secondlines = [second + '\n']
standardMsg = '%s != %s' % (safe_repr(first, True),
safe_repr(second, True))
diff = '\n' + ''.join(difflib.ndiff(firstlines, secondlines))
standardMsg = self._truncateMessage(standardMsg, diff)
self.fail(self._formatMessage(msg, standardMsg))
def assertLess(self, a, b, msg=None):
"""Just like self.assertTrue(a < b), but with a nicer default message."""
if not a < b:
standardMsg = '%s not less than %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertLessEqual(self, a, b, msg=None):
"""Just like self.assertTrue(a <= b), but with a nicer default message."""
if not a <= b:
standardMsg = '%s not less than or equal to %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertGreater(self, a, b, msg=None):
"""Just like self.assertTrue(a > b), but with a nicer default message."""
if not a > b:
standardMsg = '%s not greater than %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertGreaterEqual(self, a, b, msg=None):
"""Just like self.assertTrue(a >= b), but with a nicer default message."""
if not a >= b:
standardMsg = '%s not greater than or equal to %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNone(self, obj, msg=None):
"""Same as self.assertTrue(obj is None), with a nicer default message."""
if obj is not None:
standardMsg = '%s is not None' % (safe_repr(obj),)
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNotNone(self, obj, msg=None):
"""Included for symmetry with assertIsNone."""
if obj is None:
standardMsg = 'unexpectedly None'
self.fail(self._formatMessage(msg, standardMsg))
def assertIsInstance(self, obj, cls, msg=None):
"""Same as self.assertTrue(isinstance(obj, cls)), with a nicer
default message."""
if not isinstance(obj, cls):
standardMsg = '%s is not an instance of %r' % (safe_repr(obj), cls)
self.fail(self._formatMessage(msg, standardMsg))
def assertNotIsInstance(self, obj, cls, msg=None):
"""Included for symmetry with assertIsInstance."""
if isinstance(obj, cls):
standardMsg = '%s is an instance of %r' % (safe_repr(obj), cls)
self.fail(self._formatMessage(msg, standardMsg))
def assertRaisesRegexp(self, expected_exception, expected_regexp,
callable_obj=None, *args, **kwargs):
"""Asserts that the message in a raised exception matches a regexp.
Args:
expected_exception: Exception class expected to be raised.
expected_regexp: Regexp (re pattern object or string) expected
to be found in error message.
callable_obj: Function to be called.
args: Extra args.
kwargs: Extra kwargs.
"""
if expected_regexp is not None:
expected_regexp = re.compile(expected_regexp)
context = _AssertRaisesContext(expected_exception, self, expected_regexp)
if callable_obj is None:
return context
with context:
callable_obj(*args, **kwargs)
def assertRegexpMatches(self, text, expected_regexp, msg=None):
"""Fail the test unless the text matches the regular expression."""
if isinstance(expected_regexp, basestring):
expected_regexp = re.compile(expected_regexp)
if not expected_regexp.search(text):
msg = msg or "Regexp didn't match"
msg = '%s: %r not found in %r' % (msg, expected_regexp.pattern, text)
raise self.failureException(msg)
def assertNotRegexpMatches(self, text, unexpected_regexp, msg=None):
"""Fail the test if the text matches the regular expression."""
if isinstance(unexpected_regexp, basestring):
unexpected_regexp = re.compile(unexpected_regexp)
match = unexpected_regexp.search(text)
if match:
msg = msg or "Regexp matched"
msg = '%s: %r matches %r in %r' % (msg,
text[match.start():match.end()],
unexpected_regexp.pattern,
text)
raise self.failureException(msg)
class FunctionTestCase(TestCase):
"""A test case that wraps a test function.
This is useful for slipping pre-existing test functions into the
unittest framework. Optionally, set-up and tidy-up functions can be
supplied. As with TestCase, the tidy-up ('tearDown') function will
always be called if the set-up ('setUp') function ran successfully.
"""
def __init__(self, testFunc, setUp=None, tearDown=None, description=None):
super(FunctionTestCase, self).__init__()
self._setUpFunc = setUp
self._tearDownFunc = tearDown
self._testFunc = testFunc
self._description = description
def setUp(self):
if self._setUpFunc is not None:
self._setUpFunc()
def tearDown(self):
if self._tearDownFunc is not None:
self._tearDownFunc()
def runTest(self):
self._testFunc()
def id(self):
return self._testFunc.__name__
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return self._setUpFunc == other._setUpFunc and \
self._tearDownFunc == other._tearDownFunc and \
self._testFunc == other._testFunc and \
self._description == other._description
def __ne__(self, other):
return not self == other
def __hash__(self):
return hash((type(self), self._setUpFunc, self._tearDownFunc,
self._testFunc, self._description))
def __str__(self):
return "%s (%s)" % (strclass(self.__class__),
self._testFunc.__name__)
def __repr__(self):
return "<%s tec=%s>" % (strclass(self.__class__),
self._testFunc)
def shortDescription(self):
if self._description is not None:
return self._description
doc = self._testFunc.__doc__
return doc and doc.split("\n")[0].strip() or None
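# --- Illustrative usage sketch (not part of the stdlib source above);
# --- add() and its test function are made up for the example.
import unittest

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5

# FunctionTestCase slips a plain function into the unittest framework,
# with optional setUp/tearDown callables and a description.
case = unittest.FunctionTestCase(test_add, description='add() sanity check')
unittest.TextTestRunner().run(unittest.TestSuite([case]))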
"""Define names for all type symbols known in the standard interpreter.
Types that are part of optional modules (e.g. array) are not listed.
"""
import sys
# Iterators in Python aren't a matter of type but of protocol. A large
# and changing number of builtin types implement *some* flavor of
# iterator. Don't check the type! Use hasattr to check for both
# "__iter__" and "next" attributes instead.
NoneType = type(None)
TypeType = type
ObjectType = object
IntType = int
LongType = long
FloatType = float
BooleanType = bool
try:
ComplexType = complex
except NameError:
pass
StringType = str
# StringTypes is already outdated. Instead of writing "type(x) in
# types.StringTypes", you should use "isinstance(x, basestring)". But
# we keep it around for compatibility with Python 2.2.
try:
UnicodeType = unicode
StringTypes = (StringType, UnicodeType)
except NameError:
StringTypes = (StringType,)
BufferType = buffer
TupleType = tuple
ListType = list
DictType = DictionaryType = dict
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.func_code)
def _g():
yield 1
GeneratorType = type(_g())
class _C:
def _m(self): pass
ClassType = type(_C)
UnboundMethodType = type(_C._m) # Same as MethodType
_x = _C()
InstanceType = type(_x)
MethodType = type(_x._m)
BuiltinFunctionType = type(len)
BuiltinMethodType = type([].append) # Same as BuiltinFunctionType
ModuleType = type(sys)
FileType = file
XRangeType = xrange
try:
raise TypeError
except TypeError:
tb = sys.exc_info()[2]
TracebackType = type(tb)
FrameType = type(tb.tb_frame)
del tb
SliceType = slice
EllipsisType = type(Ellipsis)
DictProxyType = type(TypeType.__dict__)
NotImplementedType = type(NotImplemented)
# For Jython, the following two types are identical
GetSetDescriptorType = type(FunctionType.func_code)
MemberDescriptorType = type(FunctionType.func_globals)
del sys, _f, _g, _C, _x # Not for export
__all__ = list(n for n in globals() if n[:1] != '_')
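# --- Illustrative usage sketch (not part of the stdlib source above).
import types

def greet():
    return 'hello'

# The names defined by this module are ordinary type objects, so they
# work directly with isinstance():
assert isinstance(greet, types.FunctionType)
assert isinstance(types, types.ModuleType)

# As the comment at the top of the module notes, iteration is a protocol,
# not a type: check for the methods rather than an iterator type.
it = iter([1, 2, 3])
assert hasattr(it, '__iter__') and hasattr(it, 'next')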
VERIFICATION

Verification is intended to assist the Chocolatey moderators and community in verifying that this package's contents are trustworthy.
This package is published by the IronPython Project itself. The binaries are identical to other package types published by the project, in particular the IronPython nuget package.
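As a rough way to check that claim locally, a Python sketch along the following lines could hash the installed binaries and compare them with the copies unpacked from the NuGet package (a .nupkg is a zip archive). Both directory paths below are placeholders you would substitute yourself:

```py
import hashlib
import os

def sha256_of(path):
    # Hash the file in chunks so large binaries aren't read into memory at once.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

# Placeholder locations: the Chocolatey install directory and an
# extracted IronPython nuget package (rename .nupkg to .zip and unzip).
choco_dir = r'C:\path\to\chocolatey\ironpython'
nuget_dir = r'C:\path\to\extracted\nupkg\lib'

for name in ('ipy.exe', 'IronPython.dll', 'IronPython.Modules.dll'):
    same = sha256_of(os.path.join(choco_dir, name)) == \
           sha256_of(os.path.join(nuget_dir, name))
    print '%s: %s' % (name, 'identical' if same else 'DIFFERENT')
```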
IronPython
===
IronPython is an open-source implementation of the Python programming language which is tightly integrated with the .NET Framework. IronPython can use the .NET Framework and Python libraries, and other .NET languages can use Python code just as easily.
IronPython can be obtained at [http://ironpython.net/](http://ironpython.net/).
| **What?** | **Where?** |
| --------: | :------------: |
| **Windows/Linux/macOS Builds** | [![Azure build status](https://dotnet.visualstudio.com/IronLanguages/_apis/build/status/ironpython2)](https://dotnet.visualstudio.com/IronLanguages/_build/latest?definitionId=42) [![Github build status](https://github.com/IronLanguages/ironpython2/workflows/CI/badge.svg)](https://github.com/IronLanguages/ironpython2/actions?workflow=CI) |
| **Downloads** | [![NuGet](https://img.shields.io/nuget/v/IronPython.svg)](https://www.nuget.org/packages/IronPython/) [![Release](https://img.shields.io/github/release/IronLanguages/ironpython2.svg)](https://github.com/IronLanguages/ironpython2/releases/latest)|
| **Help** | [![Gitter chat](https://badges.gitter.im/IronLanguages/ironpython.svg)](https://gitter.im/IronLanguages/ironpython) [![StackExchange](https://img.shields.io/stackexchange/stackoverflow/t/ironpython.svg)](http://stackoverflow.com/questions/tagged/ironpython) |
Comparison of IronPython vs. C# for 'Hello World'
C#:
```cs
using System;
class Hello
{
static void Main()
{
Console.WriteLine("Hello World");
}
}
```
IronPython:
```py
print "Hello World"
```
IronPython is a Dynamic Language that runs on the .NET DLR ([Dynamic Language Runtime](http://en.wikipedia.org/wiki/Dynamic_Language_Runtime)) in contrast with VB.NET and C# which are [static languages](http://en.wikipedia.org/wiki/Type_system).
IronPython can also import DLL files compiled in other languages and use functions defined therein. For example:
```py
import clr
clr.AddReference("System.Windows.Forms")
from System.Windows.Forms import *
```
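Once the reference is added, the imported .NET types behave like ordinary Python classes. As a small illustrative sketch (the window and label text are arbitrary, and IronPython's keyword-argument property initialization is used):

```py
import clr
clr.AddReference("System.Windows.Forms")
from System.Windows.Forms import Application, Form, Label

# IronPython lets constructor keyword arguments set .NET properties.
form = Form(Text="Hello from IronPython")
form.Controls.Add(Label(Text="This label is a .NET WinForms control."))
Application.Run(form)  # blocks until the window is closed
```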
# Code of Conduct
This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community.
For more information see the [.NET Foundation Code of Conduct](https://dotnetfoundation.org/code-of-conduct).
# Documentation
Documentation can be found here: http://ironpython.net/documentation/dotnet/
## Additional information
Please see http://wiki.github.com/IronLanguages/main for information on:
- Setting up a development environment with easy access to utility scripts
- [Building](Documentation/building.md)
- Running tests
## Chat/Communication
Join our Gitter chat at: https://gitter.im/IronLanguages/ironpython
Virus scan results for the package contents (per-file detection counts are hidden until you log in):
- IKVM.Reflection.dll (0718c32f2184)
- ipy.exe (152ff4a241fd)
- ipy32.exe (921270d008e7)
- ipyc.exe (727d0963da91)
- ipyw.exe (a1788f9abc1f)
- ipyw32.exe (cfb23f53be70)
- ironpython.2.7.11-candidate1.nupkg (48895d461c0b)
- IronPython.dll (9986be2d54d5)
- IronPython.Modules.dll (1e993d278213)
- Microsoft.Dynamic.dll (05d31077c498)
- Microsoft.Scripting.dll (ec8ebbb0206e)
- IronPython.SQLite.dll (d430cff64770)
- IronPython.Wpf.dll (4f2d443dc901)
In cases where actual malware is found, the packages are subject to removal. Software sometimes has false positives. Moderators do not necessarily validate the safety of the underlying software, only that a package retrieves software from the official distribution point and/or validates embedded software against the official distribution point (where distribution rights allow redistribution).
Chocolatey Pro provides runtime protection from possible malware.
| Version | Downloads | Last Updated | Status |
| --- | --- | --- | --- |
| IronPython 3.4.1 | 2686 | Tuesday, July 18, 2023 | Approved |
| IronPython 3.4.0 | 1134 | Monday, December 12, 2022 | Approved |
| IronPython 2.7.12 | 5455 | Friday, January 21, 2022 | Approved |
| IronPython 2.7.11 | 5080 | Tuesday, November 17, 2020 | Approved |
| IronPython 2.7.11-candidate1 | 144 | Tuesday, September 15, 2020 | Approved |
| IronPython 2.7.10 | 1871 | Thursday, May 28, 2020 | Approved |
| IronPython 2.7.10-alpha0 | 424 | Saturday, February 23, 2019 | Approved |
| IronPython 2.7.9 | 6720 | Tuesday, October 9, 2018 | Approved |
| IronPython 2.7.9-candidate1 | 264 | Monday, August 20, 2018 | Approved |
| IronPython 2.7.8.1 | 9601 | Tuesday, March 20, 2018 | Approved |
| IronPython 2.7.5.20150916 | 1711 | Tuesday, September 15, 2015 | Approved |
| IronPython 2.7.5.20150915 | 400 | Tuesday, September 15, 2015 | Approved |
| IronPython 2.7.4.20140816 | 1444 | Friday, August 15, 2014 | Approved |
| IronPython 2.7.4.20140815 | 393 | Friday, August 15, 2014 | Approved |
This package has no dependencies.
Ground Rules:
- This discussion is only about IronPython and the IronPython package. If you have feedback for Chocolatey, please contact the Google Group.
- This discussion will carry over multiple versions. If you have a comment about a particular version, please note that in your comments.
- The maintainers of this Chocolatey Package will be notified about new comments that are posted to this Disqus thread; however, this is NOT a guarantee that you will get a response. If you do not hear back from the maintainers after posting a message below, please follow up using the link on the left side of this page or follow this link to contact the maintainers. If you still hear nothing back, please follow the package triage process.
- Tell us what you love about the package or IronPython, or tell us what needs improvement.
- Share your experiences with the package, or extra configuration or gotchas that you've found.
- If you use a URL, the comment will be flagged for moderation until you've been whitelisted. Moderated Disqus comments are approved on a weekly schedule, if not sooner. It could take 1 to 5 days for your comment to show up.