3. Web Scraping

<aside> 💡 Web scraping is an automated way of collecting data from the internet by fetching web pages and analyzing their HTML.

</aside>

🧪 Page to scrape

https://showcases.yalco.kr/python/web-scrap-ex/

import requests
from bs4 import BeautifulSoup

# Load the web page
url = "https://showcases.yalco.kr/python/web-scrap-ex/"
response = requests.get(url)
html = response.text

pass

<aside> 💡 Requests is a Python library that makes it easy to send HTTP requests.

</aside>
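
The request above assumes the server answers successfully. A minimal sketch of a slightly more defensive fetch is shown here; the helper name `fetch_html` and the 10-second default timeout are illustrative assumptions, not part of the original lesson.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body of `url` as text.

    `fetch_html` and the default timeout are hypothetical choices
    made for this sketch.
    """
    # timeout stops the call from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx / 5xx status codes
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` with the `url` defined earlier should return the same text as `response.text`, but it fails loudly instead of handing an error page to the parser.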

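<aside> 💡 BeautifulSoup is a Python library that parses the structure of a web page so that you can extract the information you want. It makes it easy to find and access data even in complex pages, so it is widely used for data collection and web scraping.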

</aside>

# Create a BeautifulSoup object
soup = BeautifulSoup(html, 'html.parser')

# Get the page title and the h1 element inside the header
page_title = soup.title.text
header_h1 = soup.header.h1.text

# Parse the div elements directly inside the section whose id is "info"
info_section_divs = soup.find('section', id='info').find_all('div', recursive=False)

info_data = {}

for div in info_section_divs:
    h2_text = div.h2.text  # text of the h2 inside each div
    list_items = div.find('ul').find_all('li')
    items_data = []

    for li in list_items:
        a_tag = li.find('a')
        item_data = {
            'href': a_tag['href'],
            'text': a_tag.text
        }
        items_data.append(item_data)

    info_data[h2_text] = items_data
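
To see the shape of `info_data` without touching the network, the same parsing steps can be run against an inline snippet. This is only a sketch: `SAMPLE_HTML` is made-up markup that imitates the structure the code above assumes (the real page's content will differ), and `parse_info` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that imitates the assumed page structure.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <header><h1>Sample Header</h1></header>
    <section id="info">
      <div>
        <h2>Links A</h2>
        <ul>
          <li><a href="/a1">A one</a></li>
          <li><a href="/a2">A two</a></li>
        </ul>
      </div>
      <div>
        <h2>Links B</h2>
        <ul>
          <li><a href="/b1">B one</a></li>
        </ul>
      </div>
    </section>
  </body>
</html>
"""


def parse_info(html):
    """Map each h2 heading in section#info to its list of links."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", id="info")
    info = {}
    if section is None:  # page has no such section
        return info
    # recursive=False keeps only the section's direct child divs
    for div in section.find_all("div", recursive=False):
        heading = div.h2.get_text(strip=True)
        links = []
        for li in div.find("ul").find_all("li"):
            a_tag = li.find("a")
            if a_tag is None:  # skip list items without a link
                continue
            links.append({"href": a_tag["href"], "text": a_tag.get_text(strip=True)})
        info[heading] = links
    return info


info_data = parse_info(SAMPLE_HTML)
print(info_data)
```

The result is a dictionary keyed by heading text, where each value is a list of `{'href': ..., 'text': ...}` dictionaries, one per link.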

⚠️ The board table is rendered with JavaScript

BeautifulSoup cannot retrieve it, because the rows are not included in the HTML source.
A library that controls a real browser is required.
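
The limitation can be illustrated with a small sketch. `STATIC_SOURCE` below is hypothetical markup, not the real page: it stands in for a page whose table body is empty in the HTML and is only filled in when a browser runs the script. The table id `board` and the helper `count_body_rows` are likewise assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the tbody is empty until a browser runs the script.
STATIC_SOURCE = """
<table id="board">
  <thead><tr><th>Title</th></tr></thead>
  <tbody></tbody>
</table>
<script>
  // In a browser, code like this would append rows to #board tbody.
  const rows = ["First post", "Second post"];
</script>
"""


def count_body_rows(html, table_id):
    """Count the tr elements inside the tbody of the given table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None or table.tbody is None:
        return 0
    return len(table.tbody.find_all("tr"))


row_count = count_body_rows(STATIC_SOURCE, "board")
print(row_count)  # 0: BeautifulSoup parses the source but never runs the script
```

BeautifulSoup only sees what is in the text it is given, so rows that a script would add later never appear in the parsed tree.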