o EŸhÎGã@sÂdZddlZddlZddlmZdgZe d¡Ze d¡Ze d¡Z e d¡Z e d ¡Z e d ¡Z e d ¡Z e d ¡Ze d ¡Ze d¡Ze dej¡Ze d ¡Ze d¡ZGdd„dejƒZdS)zA parser for HTML and XHTML.éN)ÚunescapeÚ HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]z z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) \s* # possibly followed by a space )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#c@sàeZdZdZdZddœdd„Zdd„Zd d „Zd d „Zd Z dd„Z dd„Z dd„Z dd„Z dd„Zd7dd„Zdd„Zdd„Zdd „Zd!d"„Zd#d$„Zd%d&„Zd'd(„Zd)d*„Zd+d,„Zd-d.„Zd/d0„Zd1d2„Zd3d4„Zd5d6„Zd S)8raEFind tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument. )ÚscriptÚstyleT)Úconvert_charrefscCs||_| ¡dS)zÆInitialize and reset this instance. If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters. N)rÚreset)Úselfr©r ú"/usr/lib/python3.10/html/parser.pyÚ__init__Ws zHTMLParser.__init__cCs(d|_d|_t|_d|_tj |¡dS)z1Reset this instance. Loses all unprocessed data.Úz???N)ÚrawdataÚlasttagÚinteresting_normalÚ interestingÚ cdata_elemÚ _markupbaseÚ ParserBaser©r r r r r`s zHTMLParser.resetcCs|j||_| d¡dS)z‘Feed data to the parser. Call this as often as you want, with as little or as much text as you want (may include '\n'). rN)rÚgoahead©r Údatar r r Úfeedhs zHTMLParser.feedcCs| d¡dS)zHandle any buffered data.éN)rrr r r ÚcloseqszHTMLParser.closeNcCs|jS)z)Return full source of start tag: '<...>'.)Ú_HTMLParser__starttag_textrr r r Úget_starttag_textwszHTMLParser.get_starttag_textcCs$| ¡|_t d|jtj¡|_dS)Nz )ÚlowerrÚreÚcompileÚIr)r Úelemr r r Úset_cdata_mode{s zHTMLParser.set_cdata_modecCst|_d|_dS©N)rrrrr r r Úclear_cdata_modes zHTMLParser.clear_cdata_modec Cs|j}d}t|ƒ}||krU|jr;|js;| d|¡}|dkr:| dt||dƒ¡}|dkr8t d¡  ||¡s8n|}n|j   ||¡}|rI|  ¡}n|jrNn|}||kro|jrf|jsf|  t |||…ƒ¡n |  |||…¡| ||¡}||kr{nÚ|j}|d|ƒrŒt ||¡r| |¡} n@|d|ƒr›| |¡} n5|d|ƒr¦| |¡} n*|d|ƒr±| |¡} n|d |ƒr¼| |¡} n|d |ksÄ|rÎ|  d¡|d } nn…| dkr…|sÙn|t ||¡ràn£|d|ƒr|d |krò|  d¡n‘t ||¡rùnŠ| ||d d…¡n~|d|ƒr0|}d D]} | | |d ¡r"|t| ƒ8}nq| ||d |…¡nS|d|ƒrB| ||dd…¡nA|||d… ¡dkr[| ||d d…¡n(|d |ƒrm| ||d d…¡n|d|ƒr| ||d d…¡ntdƒ‚|} | || ¡}nÅ|d|ƒrÜt ||¡}|r¿|  ¡d d…} | !| ¡| "¡} |d| d ƒs¸| d } | || ¡}q d||d…vrÛ|  |||d …¡| ||d ¡}ny|d|ƒrMt# ||¡}|r |  d ¡} | $| ¡| "¡} |d| d ƒs| d } | || ¡}q t% ||¡}|r7|r6|  ¡||d…kr6| "¡} | |kr.|} | ||d ¡}n|d |krL|  d¡| ||d ¡}nnJdƒ‚||ks|r„||kr„|js„|jru|jsu|  t |||…ƒ¡n |  |||…¡| ||¡}||d…|_dS)Nrú<ú&é"z[\s;]úÚkÚsuffixÚnamer r r r†sè   ÿ€                         þ               …} zHTMLParser.goaheadcCsº|j}|||d…dksJdƒ‚|||d…dkr | |¡S|||d…dkr/| |¡S|||d… ¡d krX| d |d¡}|d krId S| ||d|…¡|d S| |¡S) Nr-r,z+unexpected call to parse_html_declaration()r/r*r0zÚ rc)rÚcheck_for_whole_start_tagrÚtagfind_tolerantr@rPrNrrÚattrfind_tolerantrÚappendÚstripÚgetposÚcountr6r8r<rHÚhandle_startendtagÚhandle_starttagÚCDATA_CONTENT_ELEMENTSr#)r rTÚendposrÚattrsr@rXÚtagÚmÚattrnameÚrestÚ attrvaluerPÚlinenoÚoffsetr r r rA?sX   &( ó   ÿ   ý  zHTMLParser.parse_starttagcCs²|j}t ||¡}|rU| ¡}|||d…}|dkr|dS|dkr?| d|¡r-|dS| d|¡r5dS||kr;|S|dS|dkrEdS|dvrKdS||krQ|S|dStd ƒ‚) Nrrú/rcr-r4r z6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZr3)rÚlocatestarttagend_tolerantr@rPr>rL)r rTrrrrVÚnextr r r rers.   z$HTMLParser.check_for_whole_start_tagcCs*|j}|||d…dksJdƒ‚t ||d¡}|sdS| ¡}t ||¡}|sn|jdur9| |||…¡|St ||d¡}|sV|||d…dkrQ|dS|  |¡S|  d¡  ¡}|  d| ¡¡}|  |¡|dS|  d¡  ¡}|jdurŠ||jkrŠ| |||…¡|S|  |¡| ¡|S) Nr-r)zunexpected call to parse_endtagrr4r0zr)rÚ endendtagr:rPÚ endtagfindr@rr<rfr\rNrr7Ú handle_endtagr%)r rTrr@r]Ú namematchÚtagnamer"r r r rB”s8       zHTMLParser.parse_endtagcCs| ||¡| |¡dSr$)rmr}©r rqrpr r r rl¼s zHTMLParser.handle_startendtagcCódSr$r r€r r r rmÁózHTMLParser.handle_starttagcCrr$r )r rqr r r r}År‚zHTMLParser.handle_endtagcCrr$r ©r rZr r r rOÉr‚zHTMLParser.handle_charrefcCrr$r rƒr r r rRÍr‚zHTMLParser.handle_entityrefcCrr$r rr r r r<Ñr‚zHTMLParser.handle_datacCrr$r rr r r rGÕr‚zHTMLParser.handle_commentcCrr$r )r Údeclr r r rJÙr‚zHTMLParser.handle_declcCrr$r rr r r rKÝr‚zHTMLParser.handle_picCrr$r rr r r rIàr‚zHTMLParser.unknown_decl)r)Ú__name__Ú __module__Ú __qualname__Ú__doc__rnr rrrrrr#r%rrEr\rDrArerBrlrmr}rOrRr<rGrJrKrIr r r r r?s:     3"( )rˆrrÚhtmlrÚ__all__r rrSrQrMr?rFr`Ú commentcloserfrgÚVERBOSEryr{r|rrr r r r Ús.           ÿò