[Python] Beautifulsoup4 Tutorial

Programming language: Python
Package: beautifulsoup4

Official documentation

Purpose: parsing HTML
import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.google.com.tw/")
c = result.content

soup = BeautifulSoup(c, "html.parser")
links = soup.find_all("a")

data = {}
for a in links:
    title = a.text.strip()
    data[title] = a.get("href")  # .get() avoids a KeyError when an <a> has no href

BeautifulSoup

  • BeautifulSoup(markup="", features=None, builder=None, parse_only=None, from_encoding=None, exclude_encodings=None, **kwargs) 
    • markup
      • the HTML to parse
    • features
      • the parser to use
    • parse_only
      • parse only the elements matched by the given SoupStrainer
    • from_encoding
      • the encoding to use; if not given, the encoding is auto-detected
      • BeautifulSoup(markup, from_encoding="iso-8859-8")
    • exclude_encodings
      • encodings to rule out, given as a list
      • BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])
    • Attributes
      • .contains_replacement_characters
        • True: replacement characters were substituted for undecodable bytes while decoding the document
  • SoupStrainer(name=None, attrs={}, text=None, **kwargs)
    • for the parameters, see Filters under "Searching the tree" below
    • Example
      from bs4 import BeautifulSoup
      from bs4 import SoupStrainer
      
      html_doc = """
      <html><head><title>The Dormouse's story</title></head>
          <body>
      <p class="title"><b>The Dormouse's story</b></p>
      
      <p class="story">Once upon a time there were three little sisters; and their names were
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
      <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
      and they lived at the bottom of a well.</p>
      
      <p class="story">...</p>
      """
      
      only_a_tags = SoupStrainer("a")
      print(BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags).prettify())
      # <a class="sister" href="http://example.com/elsie" id="link1">
      #  Elsie
      # </a>
      # <a class="sister" href="http://example.com/lacie" id="link2">
      #  Lacie
      # </a>
      # <a class="sister" href="http://example.com/tillie" id="link3">
      #  Tillie
      # </a>
      
      only_tags_with_id_link2 = SoupStrainer(id="link2")
      print(BeautifulSoup(html_doc, "html.parser", parse_only=only_tags_with_id_link2).prettify())
      # <a class="sister" href="http://example.com/lacie" id="link2">
      #  Lacie
      # </a>
      
      def is_short_string(text):
          return len(text) < 10
      
      only_short_strings = SoupStrainer(text=is_short_string)
      print(BeautifulSoup(html_doc, "html.parser", parse_only=only_short_strings).prettify())
      # Elsie
      # ,
      # Lacie
      # and
      # Tillie
      # ...
      #
      

Parsers

Different parsers may produce different results for the same document

  • Python standard library: BeautifulSoup(markup, "html.parser")
    • Advantages
      • built into the Python standard library
      • decent speed
      • lenient with bad markup
    • Disadvantages
      • much less lenient in versions before Python 2.7.3 / 3.2.2
  • lxml HTML parser: BeautifulSoup(markup, "lxml")
    • Advantages
      • very fast
      • lenient with bad markup
    • Disadvantages
      • requires installing a C library
  • lxml XML parser: BeautifulSoup(markup, ["lxml", "xml"]) or BeautifulSoup(markup, "xml")
    • Advantages
      • very fast
      • the only parser that supports XML
    • Disadvantages
      • requires installing a C library
  • html5lib: BeautifulSoup(markup, "html5lib")
    • Advantages
      • the most lenient parser
      • parses documents the same way a web browser does
      • produces valid HTML5
    • Disadvantages
      • very slow
      • external Python dependency
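A quick way to see the difference is to feed the same invalid fragment to more than one parser. Only html.parser runs unconditionally here; the lxml and html5lib results are noted in comments since those parsers may not be installed:

```python
from bs4 import BeautifulSoup

# The same broken markup comes out differently depending on the parser.
broken = "<a></p>"

# html.parser keeps the <a> and silently drops the stray </p>.
print(BeautifulSoup(broken, "html.parser"))
# <a></a>

# lxml (if installed) also drops the </p>, but wraps the result:
#   <html><body><a></a></body></html>
# html5lib (if installed) instead fills in what a browser would:
#   <html><head></head><body><a><p></p></a></body></html>
```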

Object types

Tag
Corresponds to a tag in the HTML document
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>
  • name
    • the tag's name
    • Usage
      • get: tag.name
      • change: tag.name = "abc"
      • compare structure: tagA == tagB
      • same object: tagA is tagB
      • copy
      • import copy
        markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
        soup = BeautifulSoup(markup, "html.parser")
        a_copy = copy.copy(soup.a)
        print(a_copy)
        # <a href="http://example.com/">I linked to <i>example.com</i></a>
        soup.a == a_copy
        # True
        soup.a is a_copy
        # False
        
  • Attributes
    • the tag's attributes, such as class, id, ...
    • Usage
      • get
        • tag.attrs
        • tag['class'], tag['id']...
        • tag.get('class'), tag.get('id')...
        • multi-valued attributes (such as class) are returned as a list
      • modify
        • tag['class'] = 'verybold'
      • delete
        • del tag['class']
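Attribute access can be sketched as follows; the `<p>` markup here is made up for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout" id="p1">text</p>', "html.parser")
tag = soup.p

print(tag.attrs)
# {'class': ['body', 'strikeout'], 'id': 'p1'}

print(tag["class"])   # multi-valued attribute: returned as a list
# ['body', 'strikeout']

print(tag["id"])      # single-valued attribute: returned as a string
# p1

print(tag.get("missing"))  # .get() returns None instead of raising KeyError
# None
```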

NavigableString
The string wrapped inside a tag; when it is ambiguous which string is meant, .string returns None
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
soup.b.string
# 'Extremely bold'
type(soup.b.string)
# <class 'bs4.element.NavigableString'>

soup = BeautifulSoup('<b class="boldest">Extremely bold<i>abc</i></b>', "html.parser")
soup.b.string
# None
  • Usage
    • get: tag.string
    • change: tag.string.replace_with("abc")
  • To use a string outside Beautiful Soup, convert it with str() (or unicode() on Python 2) first, so it does not keep a reference to the whole parse tree and waste memory
  • Not supported
    • .contents
    • .string
    • find()
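The usage above (tag.string, replace_with(), and the str() conversion) can be sketched end to end:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")

# replace_with() swaps the NavigableString inside <b> for new text.
soup.b.string.replace_with("No longer bold")
print(soup.b)
# <b class="boldest">No longer bold</b>

# Convert to a plain str before using the text outside Beautiful Soup,
# so it stops referencing the whole parse tree.
text = str(soup.b.string)
print(type(text))
# <class 'str'>
```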

BeautifulSoup
Represents the document as a whole
  • similar to a Tag, but has no attributes
  • soup.name # '[document]'

Comments and other special strings
These are simply subclasses of NavigableString
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")
comment = soup.b.string
type(comment)
# <class 'bs4.element.Comment'>
  • Types
    • Comment
    • XML-related
      • CData
      • ProcessingInstruction
      • Declaration
      • Doctype

Worked example

The HTML (stored in html_doc) is as follows:
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
Parse it as follows:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

Navigating the tree

  • .tagName
    • returns the first matching tag among the descendants
    • Example
      • soup.head
        • <head><title>The Dormouse's story</title></head>
      • soup.body.b
        • <b>The Dormouse's story</b>
      • soup.a
        • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
      • body = soup.body
        body.b
        • <b>The Dormouse's story</b>
  • .contents
    • returns all direct children of the current element as a list
    • Example
      • soup.body.contents
        • [
          • '\n', 
          • <p class="title"><b>The Dormouse's story</b></p>, 
          • '\n', 
          • <p class="story">Once upon a time there were three little sisters; and their names were
            <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
            <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
            <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
            and they lived at the bottom of a well.</p>, 
          • '\n', 
          • <p class="story">...</p>, 
          • '\n'
        • ]
      •  soup.contents
        • [
          • '\n', 
          • <html><head><title>The Dormouse's story</title></head>
            <body>
            <p class="title"><b>The Dormouse's story</b></p>
            <p class="story">Once upon a time there were three little sisters; and their names were
            <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
            <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
            <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
            and they lived at the bottom of a well.</p>
            <p class="story">...</p>
            </body></html>
        • ]
  • .children
    • an iterator over the direct children of the current element
    • Example
      • for child in soup.body.children:
            print(child)
  • .descendants
    • a generator over all descendants (children, grandchildren, ...) of the current element
    • Example
      • for child in soup.body.descendants:
            print(child)
  • .string
    • if the current element has exactly one NavigableString child, returns it;
      if it has more than one, returns None
    • Example
      • soup.body.string
        • None
      • soup.body.p.string
        • "The Dormouse's story"
  • .strings
    • a generator over all NavigableString descendants of the current element
    • Example
      • for string in soup.body.strings:
            print(repr(string))
  • .stripped_strings
    • same as .strings, but skips whitespace-only strings and strips leading/trailing whitespace
      • for string in soup.body.stripped_strings:
            print(repr(string))
  • .parent
    • the parent node of the current element
    • Example
      • soup.title.parent
        • <head><title>The Dormouse's story</title></head>
      • soup.parent
        • None
  • .parents
    • a generator over all ancestors of the current element
    • Example
      • for parent in soup.title.parents:
            print(parent.name)
  • .next_sibling
    • the sibling node immediately after the current element
    • Example
      • soup.head.next_sibling
        • '\n'
      • soup.head.next_sibling.next_sibling.name
        • 'body'
      • soup.title.next_sibling
        • None
  • .next_siblings
    • a generator over all siblings after the current element
    • Example
      • for sibling in soup.a.next_siblings:
            print(repr(sibling))
  • .previous_sibling
    • the sibling node immediately before the current element
    • Example
      • soup.body.previous_sibling
        • '\n'
      • soup.body.previous_sibling.previous_sibling
        • <head><title>The Dormouse's story</title></head>
      • soup.head.previous_sibling
        • None
  • .previous_siblings
    • a generator over all siblings before the current element
    • Example
      • for sibling in soup.find(id="link3").previous_siblings:
            print(repr(sibling))
  • .next_element
    • the element parsed immediately after the current element
    • Example
      • soup.head.next_element
        • <title>The Dormouse's story</title>
      • soup.a.next_element
        • 'Elsie'
        • the parser enters the <a> tag first, then the string 'Elsie', and then closes the </a> tag
  • .next_elements
    • a generator over all elements parsed after the current element
    • Example
      • for element in soup.a.next_elements:
            print(repr(element))
  • .previous_element
    • the element parsed immediately before the current element
    • Example
      • soup.body.previous_element
        • '\n'
      • soup.p.string.previous_element
        • <b>The Dormouse's story</b>
  • .previous_elements
    • a generator over all elements parsed before the current element
    • Example
      • for element in soup.p.string.previous_elements:
            print(repr(element))
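The difference between .children and .descendants above can be sketched with a small made-up fragment, where the nested <b> is a descendant of <div> but not a direct child:

```python
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup("<div><p><b>hi</b></p><span>bye</span></div>", "html.parser")

# .children yields only the direct children.
children = [child.name for child in soup.div.children]
print(children)
# ['p', 'span']

# .descendants also yields nested tags and the text nodes, in document order.
descendants = [node.name if isinstance(node, Tag) else str(node)
               for node in soup.div.descendants]
print(descendants)
# ['p', 'b', 'hi', 'span', 'bye']
```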

Searching the tree

  • Filters (the kinds of values that can be passed to search parameters such as name, string, and **kwargs)
    • String (string is the same as the older text argument)
      • Example
        • 'b'
        • soup.find_all('b')
          • [<b>The Dormouse's story</b>]
    • Regular Expression
      • Example
        • for tag in soup.find_all(re.compile("^b")):
              print(tag.name)
          • body
            b
    • List
      • Example
        • soup.find_all(["a", "b"])
          • [
            • <b>The Dormouse's story</b>, 
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
            • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
          • ]
    • True
      • Example
        • soup.find_all(id=True)
          • [
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
            • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
          • ]
    • Function
      • must return True or False
      • Example
        • def not_lacie(href):
              return href and not re.compile("lacie").search(href)

          soup.find_all(href=not_lacie)
          • [
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
          • ]
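A filter function can also receive the whole tag rather than an attribute value, when it is passed as the name argument. A sketch using a shortened version of the same document:

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Passed as the name argument, the function receives each Tag
# and should return True for the tags to keep.
def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")

matches = soup.find_all(has_class_but_no_id)
print(matches)
# [<p class="title"><b>The Dormouse's story</b></p>]
```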
  • find_all(name=None, attrs={}, recursive=True, string=None, limit=None, **kwargs)
    • searches all descendants of the current element
    • Shorthand
      • soup.find_all(...) == soup(...)
    • name
      • search for tags whose name matches
      • string objects are ignored automatically
      • Example
        • soup.find_all('title')
          • [<title>The Dormouse's story</title>]
    • attrs
      • search by attribute values
      • Example
        • soup.find_all(attrs={"class": "sister"})
          • [
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
            • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
          • ]
    • recursive
      • by default all descendants of the current element are searched;
        set recursive=False to search only the direct children
      • Example
        • soup.html.find_all("title", recursive=False)
          • []
    • string
      • search the string content;
        when not mixed with other parameters, only NavigableString objects are returned
      • Example
        • soup.find_all(string="Elsie")
          • ['Elsie']
        • soup.find_all("a", string="Elsie")
          • [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    • limit
      • limits the number of results returned
      • Example
        • soup.find_all("a", limit=2)
          • [
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
          • ]
    • **kwargs
      • a keyword whose name is not a built-in parameter is treated as an attribute name, and its value is matched against that attribute
      • Example
        • soup.find_all(href=re.compile("elsie"), id='link1')
          • [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
        • soup.find_all("a", class_="sister")
          • [
            • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
            • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
            • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
          • ]
      • some attributes cannot be used as keyword arguments, e.g. HTML5 data-* attributes; use attrs instead
      • class is a Python reserved word, so search with class_ (trailing underscore)
        • because class is a multi-valued attribute, searching for several class names in an order different from the document finds nothing
          • Example
          • css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
            • css_soup.find_all("p", class_="body strikeout")
              • [<p class="body strikeout"></p>]
            • css_soup.find_all("p", class_="strikeout body")
              • []
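When the order of the class names is not known in advance, a CSS selector sidesteps the problem, since selectors match each class independently:

```python
from bs4 import BeautifulSoup

css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")

# Keyword search with class_ is order-sensitive for multi-valued classes:
print(css_soup.find_all("p", class_="strikeout body"))
# []

# A CSS selector matches each class on its own, in any order:
by_selector = css_soup.select("p.strikeout.body")
print(by_selector)
# [<p class="body strikeout"></p>]

# Searching for any single class also works regardless of its position:
print(css_soup.find_all("p", class_="strikeout"))
# [<p class="body strikeout"></p>]
```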
  • find(name=None, attrs={}, recursive=True, text=None, **kwargs)
    • like find_all, but returns only the first match, or None if nothing matches
    • Shorthand
      • soup.find(tag) == soup.tag
  • find_parents(name=None, attrs={}, limit=None, **kwargs)
    • searches the ancestors of the current element
    • Example
      • soup.b.find_parents('p')
        • [<p class="title"><b>The Dormouse's story</b></p>]
  • find_parent(name=None, attrs={}, **kwargs)
    • searches for the nearest matching ancestor of the current element
    • Example
      • soup.b.find_parent('p')
        •  <p class="title"><b>The Dormouse's story</b></p>
  • find_next_siblings(name=None, attrs={}, text=None, limit=None, **kwargs)
    • searches the siblings after the current element
    • Example
      •  soup.a.find_next_siblings()
        • [
          • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
          • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
        • ]
  • find_next_sibling(name=None, attrs={}, text=None, **kwargs)
    • searches for the first matching sibling after the current element
    • Example
      • soup.a.find_next_sibling()
        • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
  • find_previous_siblings(name=None, attrs={}, text=None, limit=None, **kwargs)
    • searches the siblings before the current element
    • Example
      • soup.find(id='link3').find_previous_siblings() 
        • [
          • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
          • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
        • ]
  • find_previous_sibling(name=None, attrs={}, text=None, **kwargs)
    • searches for the first matching sibling before the current element
    • Example
      • soup.find(id='link3').find_previous_sibling()
        • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
  • find_all_next(name=None, attrs={}, text=None, limit=None, **kwargs)
    • searches everything parsed after the current element
    • Example
      • soup.a.find_all_next('p')
        • [<p class="story">...</p>]
  • find_next(name=None, attrs={}, text=None, **kwargs)
    • searches for the first match parsed after the current element
    • Example
      • soup.a.find_next('p')
        • <p class="story">...</p>
  • find_all_previous(name=None, attrs={}, text=None, limit=None, **kwargs)
    • searches everything parsed before the current element
    • Example
      • soup.find(class_="story").find_all_previous('b') 
        • [<b>The Dormouse's story</b>]
  • find_previous(name=None, attrs={}, text=None, **kwargs)
    • searches for the first match parsed before the current element
    • Example
      • soup.find(class_="story").find_previous('b')
        • <b>The Dormouse's story</b>
  • select(selector, limit=None)
    • finds all matching elements using CSS selector syntax
    • Example
      • soup.select('a.sister')
        • [
          • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
          • <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
          • <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
        • ]
  • select_one(selector)
    • finds the first matching element using CSS selector syntax
    • Example
      • soup.select_one('a.sister')
        • <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
  • get_text(separator="", strip=False, types=(NavigableString, CData))
    • returns all the text inside the element, joined by separator
    • Example
      • soup.get_text(",")
        • "\n,The Dormouse's story,\n,\n,The Dormouse's story,\n,Once upon a time there were three little sisters; and their names were\n,Elsie,,\n,Lacie, and\n,Tillie,;\nand they lived at the bottom of a well.,\n,...,\n"
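The strip parameter cleans up the whitespace-heavy output above; a sketch with a small made-up fragment so the whitespace is easy to follow:

```python
from bs4 import BeautifulSoup

html = '<p>Once upon a time <a id="link1">Elsie</a>,\n<a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Default: every string is concatenated as-is, whitespace included.
print(repr(soup.get_text()))
# 'Once upon a time Elsie,\nLacie'

# strip=True strips each string and drops those that become empty.
print(soup.get_text("|", strip=True))
# Once upon a time|Elsie|,|Lacie
```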

Modifying the tree

  • Changing a tag's name and attributes
  • soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
    tag = soup.b
    
    tag.name = "blockquote"
    tag['class'] = 'verybold'
    tag['id'] = 1
    tag
    # <blockquote class="verybold" id="1">Extremely bold</blockquote>
    
    del tag['class']
    del tag['id']
    tag
    # <blockquote>Extremely bold</blockquote>
    
  • Replacing .string
  • markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    
    tag = soup.a
    tag.string = "New link text."
    tag
    # <a href="http://example.com/">New link text.</a>
    
  • append(tag)
    • appends content to the end of the element
    soup = BeautifulSoup("<a>Foo</a>", "html.parser")
    soup.a.append("Bar")
    
    soup
    # <html><head></head><body><a>FooBar</a></body></html>
    soup.a.contents
    # ['Foo', 'Bar']
    
  • new_tag(name, namespace=None, nsprefix=None, **attrs)
    • creates a new tag
    soup = BeautifulSoup("<b></b>" , "html.parser")
    original_tag = soup.b
    
    new_tag = soup.new_tag("a", href="http://www.example.com")
    original_tag.append(new_tag)
    original_tag
    # <b><a href="http://www.example.com"></a></b>
    
    new_tag.string = "Link text."
    original_tag
    # <b><a href="http://www.example.com">Link text.</a></b>
    
  • insert(position, new_child)
    • inserts an element at the given position
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.a
    
    tag.insert(1, "but did not endorse ")
    tag
    # <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
    tag.contents
    # ['I linked to ', 'but did not endorse ', <i>example.com</i>]
    
  • insert_before(predecessor)
    • inserts an element immediately before the current element
    soup = BeautifulSoup("<b>stop</b>", "html.parser")
    tag = soup.new_tag("i")
    tag.string = "Don't"
    soup.b.string.insert_before(tag)
    soup.b
    # <b><i>Don't</i>stop</b>
    
  • insert_after(successor)
    • inserts an element immediately after the current element
    soup = BeautifulSoup("<b><i>Don't</i>stop</b>", "html.parser")
    soup.b.i.insert_after(soup.new_string(" ever "))
    soup.b
    # <b><i>Don't</i> ever stop</b>
    soup.b.contents
    # [<i>Don't</i>, ' ever ', 'stop']
    
  • clear(decompose=False)
    • removes the contents of the current element
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.a
    
    tag.clear()
    tag
    # <a href="http://example.com/"></a>
    
  • extract()
    • removes the current element from the tree and returns it
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    a_tag = soup.a
    
    i_tag = soup.i.extract()
    
    a_tag
    # <a href="http://example.com/">I linked to</a>
    
    i_tag
    # <i>example.com</i>
    
    print(i_tag.parent)
    # None
    
  • decompose()
    • removes the current element from the tree and destroys it; returns nothing
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    a_tag = soup.a
    
    soup.i.decompose()
    
    a_tag
    # <a href="http://example.com/">I linked to</a>
    
  • replace_with(replace_with)
    • replaces the current element and returns the element that was replaced
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    a_tag = soup.a
    
    new_tag = soup.new_tag("b")
    new_tag.string = "example.net"
    old_tag = a_tag.i.replace_with(new_tag)
    
    a_tag
    # <a href="http://example.com/">I linked to <b>example.net</b></a>
    old_tag
    # <i>example.com</i>
    
  • wrap(wrap_inside)
    • wraps the current element in the given tag and returns the result
    soup = BeautifulSoup("<p>I wish I was bold.</p>", "html.parser")
    soup.p.string.wrap(soup.new_tag("b"))
    # <b>I wish I was bold.</b>
    
    soup.p.wrap(soup.new_tag("div"))
    # <div><p><b>I wish I was bold.</b></p></div>
    
  • unwrap()
    • replaces the current element with its contents and returns the removed, now empty, tag
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    a_tag = soup.a
    
    a_tag.i.unwrap()
    # <i></i>
    a_tag
    # <a href="http://example.com/">I linked to example.com</a>
    

Output methods

  • prettify(encoding=None, formatter="minimal")
    • returns the document as a nicely indented string
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    soup.prettify()
    # '<a href="http://example.com/">\n I linked to\n <i>\n  example.com\n </i>\n</a>'
    
    print(soup.prettify())
    # <a href="http://example.com/">
    #  I linked to
    #  <i>
    #   example.com
    #  </i>
    # </a>
    
  • str(soup)
    • when you only want the resulting string and do not care about formatting
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, "html.parser")
    str(soup)
    # '<a href="http://example.com/">I linked to <i>example.com</i></a>'
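The formatter argument of prettify() (and of str conversion via decode()) controls how entities are escaped on output; a sketch of the two common settings:

```python
from bs4 import BeautifulSoup

french = "<p>Il a dit &lt;&lt;Sacré bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, "html.parser")

# The default formatter="minimal" re-escapes &, <, and > on output:
print(soup.prettify())
# <p>
#  Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
# </p>

# formatter=None performs no escaping at all:
print(soup.prettify(formatter=None))
# <p>
#  Il a dit <<Sacré bleu!>>
# </p>
```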
    
