KEncodingProber Class Reference
from PyKDE4.kdecore import *
Detailed Description
Provides encoding detection (probing) capabilities.
Probes the encoding of raw data only. If it cannot determine the encoding, it returns the most likely encoding it has guessed.
Unicode probing is always performed, regardless of the ProberType.
Feed data to it several times with feed() until the ProberState changes to FoundIt or NotMe, or until confidence() returns a value you find acceptable.
Intended lifetime of the object: one instance per ProberType.
Typical use:
 from PyQt4.QtCore import QTextCodec
 # data and moredata are QByteArray instances holding the raw bytes
 ...
 prober = KEncodingProber(KEncodingProber.ChineseSimplified)
 prober.feed(data)
 prober.feed(moredata)
 if prober.confidence() > 0.6:
     out = QTextCodec.codecForName(prober.encoding()).toUnicode(data)
At least 256 characters are needed to change the ProberState from Probing to FoundIt. If you have fewer characters to probe, decide for yourself whether to accept the encoding guessed so far, based on the value of confidence().
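The feeding strategy described above can be sketched as a small helper. This is only an illustration, not part of PyKDE4: guess_encoding, chunks, found_it, not_me and min_confidence are hypothetical names, and the prober is passed in so that the helper relies on nothing beyond the feed(), confidence() and encoding() methods documented on this page. The 0.6 default simply mirrors the example above and is not a recommended value.

```python
def guess_encoding(prober, chunks, found_it, not_me, min_confidence=0.6):
    """Feed chunks to a KEncodingProber-like object and return its guess.

    Returns prober.encoding() if the prober reports found_it, or if its
    confidence reaches min_confidence; otherwise returns None.
    """
    state = None
    for chunk in chunks:
        state = prober.feed(chunk)
        if state == found_it or state == not_me:
            # Per the description above, feeding stops at FoundIt or NotMe.
            break
    if state == found_it or prober.confidence() >= min_confidence:
        return prober.encoding()
    return None
```

With PyKDE4, a call would presumably look like guess_encoding(prober, [data, moredata], KEncodingProber.FoundIt, KEncodingProber.NotMe), assuming the enum values are exposed as class attributes in the way the signatures on this page suggest.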
Guess encoding of char array
| Enumerations | |
| ProberState | { FoundIt, NotMe, Probing } | 
| ProberType | { None, Universal, Arabic, Baltic, CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic, Greek, Hebrew, Japanese, Korean, NorthernSaami, Other, SouthEasternEurope, Thai, Turkish, Unicode, WesternEuropean } | 
| Methods | |
| | __init__ (self, KEncodingProber.ProberType proberType=KEncodingProber.Universal) |
| | __init__ (self, KEncodingProber other) |
| float | confidence (self) | 
| QByteArray | encoding (self) | 
| QString | encodingName (self) | 
| KEncodingProber.ProberState | feed (self, QByteArray data) | 
| KEncodingProber.ProberState | feed (self, QString data, int len) | 
| KEncodingProber.ProberType | proberType (self) | 
| | reset (self) |
| | setProberType (self, KEncodingProber.ProberType proberType) |
| KEncodingProber.ProberState | state (self) | 
| Static Methods | |
| QString | nameForProberType (KEncodingProber.ProberType proberType) | 
| KEncodingProber.ProberType | proberTypeForName (QString lang) | 
Method Documentation
__init__ (self, KEncodingProber.ProberType proberType=KEncodingProber.Universal)
The default ProberType is Universal (detect all possible encodings).
__init__ (self, KEncodingProber other)
float confidence (self)
- Returns:
- the confidence (certainty) of the encoding guessed so far, from 0.0 to 0.99; not very reliable for single-byte encodings
QByteArray encoding (self)
- Returns:
- a QByteArray with the name of the best encoding it has guessed so far
- Since:
- 4.2.2
QString encodingName (self)
- Returns:
- the name of the best encoding it has guessed so far
- Warning:
- The returned string is allocated with strdup, so some memory is leaked with every call.
- Deprecated:
- Use encoding() instead, which returns a QByteArray.
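Since encoding() returns the codec name as a byte array, the name can also be handed to Python's own codecs instead of QTextCodec. The following is a minimal sketch under stated assumptions: decode_with_guess and its fallback parameter are hypothetical, the name is assumed to be ASCII, and the names reported by the prober are not guaranteed to be ones Python's codec registry recognises, so an unknown name falls back to a default.

```python
import codecs


def decode_with_guess(data, encoding_name, fallback="utf-8"):
    """Decode bytes using a codec name such as the one encoding() returns."""
    # A QByteArray exposes its raw bytes through data(); plain bytes pass through.
    if hasattr(encoding_name, "data"):
        encoding_name = encoding_name.data()
    if isinstance(encoding_name, bytes):
        encoding_name = encoding_name.decode("ascii", "ignore")
    try:
        codecs.lookup(encoding_name)
    except LookupError:
        # Python has no codec registered under that name.
        encoding_name = fallback
    return data.decode(encoding_name, "replace")
```

Here data is expected to be a Python bytes object; a QByteArray holding the probed content would first need to be converted, for example through its data() method.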
KEncodingProber.ProberState feed (self, QByteArray data)
The main method of the class: feeds data to the prober.
- Returns:
- the ProberState after probing the fed data.
KEncodingProber.ProberState feed (self, QString data, int len)
The main method of the class: feeds data to the prober.
- Returns:
- the ProberState after probing the fed data.
KEncodingProber.ProberType proberType (self)
reset (self)
Resets the prober's internal state and data.
setProberType (self, KEncodingProber.ProberType proberType)
Changes the prober's ProberType and resets the prober.
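Because the intended lifetime is one instance per ProberType, a single prober can be reused for several unrelated buffers by calling reset() between them. A minimal sketch, assuming only the reset(), feed(), encoding() and confidence() methods documented on this page; probe_each and buffers are hypothetical names:

```python
def probe_each(prober, buffers):
    """Probe several independent buffers with one prober instance.

    Returns a list of (encoding, confidence) pairs, one per buffer.
    """
    results = []
    for buf in buffers:
        prober.reset()  # drop state left over from the previous buffer
        prober.feed(buf)
        results.append((prober.encoding(), prober.confidence()))
    return results
```

To switch to a different language group between buffers, setProberType() could be called in place of reset(), since it is documented to reset the prober as well.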
KEncodingProber.ProberState state (self)
- Returns:
- the prober's current ProberState
Static Method Documentation
QString nameForProberType (KEncodingProber.ProberType proberType)
Maps a ProberType to its language string.
KEncodingProber.ProberType proberTypeForName (QString lang)
- Returns:
- the ProberType for lang (e.g. proberTypeForName("Chinese Simplified") returns KEncodingProber.ChineseSimplified)
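The two static methods can be combined to build a lookup table of language strings, for instance to fill a selection list. This is a rough sketch rather than an established pattern: prober_type_names and type_names are hypothetical, and the class is passed in as prober_cls so the helper depends only on nameForProberType() and on the enum values being class attributes.

```python
def prober_type_names(prober_cls, type_names):
    """Map enum member names to the language strings reported by the class."""
    names = {}
    for attr in type_names:
        # getattr() is used so that members are looked up by their string name.
        prober_type = getattr(prober_cls, attr)
        names[attr] = prober_cls.nameForProberType(prober_type)
    return names
```

With PyKDE4 this would presumably be called as prober_type_names(KEncodingProber, ["Universal", "Thai"]); each returned string could then be passed back to proberTypeForName() to recover the ProberType.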
Enumeration Documentation
| ProberState | 
- Enumerator:
- FoundIt, NotMe, Probing
| ProberType | 
- Enumerator:
- None, Universal, Arabic, Baltic, CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic, Greek, Hebrew, Japanese, Korean, NorthernSaami, Other, SouthEasternEurope, Thai, Turkish, Unicode, WesternEuropean
 KDE 4.6 PyKDE API Reference