[TIL] Day 12_NAVER_Finance_Daily_Price_Scraping

Molybdenum's Development Log

Molybdenum_j 2023. 3. 6. 11:43

▶ Collecting individual stocks from NAVER Finance

URLs of the pages to collect

NAVER Finance, domestic market: https://finance.naver.com/sise/
Major stocks
- 삼성전자 (Samsung Electronics): https://finance.naver.com/item/main.naver?code=005930
- 현대차 (Hyundai Motor): https://finance.naver.com/item/main.naver?code=005380
- SK하이닉스 (SK hynix): https://finance.naver.com/item/main.naver?code=000660

▶ Loading the libraries

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

▶ Choosing the URL to collect

item_code = "005930"
item_name = "삼성전자"

# stock_item = {"삼성전자" : "005930",
#               "카카오" : "035720"}

page_no = 1

url= f"https://finance.naver.com/item/sise_day.naver?code={item_code}&page={page_no}"
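The commented-out stock_item dictionary suggests collecting more than one stock. A minimal sketch of building one URL per stock and page follows; build_day_url is a hypothetical helper name, not something from the original post.

```python
# Hypothetical helper (not from the original post): builds the daily-price URL
# for one stock code and one page number.
def build_day_url(item_code: str, page_no: int = 1) -> str:
    return f"https://finance.naver.com/item/sise_day.naver?code={item_code}&page={page_no}"


# Stock names and codes taken from the commented-out dictionary above.
stock_item = {"삼성전자": "005930", "카카오": "035720"}

# One URL per (stock name, page number) pair, for the first two pages.
urls = {
    (name, page_no): build_day_url(code, page_no)
    for name, code in stock_item.items()
    for page_no in (1, 2)
}

for key, value in urls.items():
    print(key, value)
```

Keeping the codes as strings matters here: a code such as "005930" would lose its leading zeros if stored as an integer.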

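▶ HTTP request with requests

Set a "user-agent" entry in headers to tell the server that the request comes from a browser.

The "user-agent" value can be looked up in the browser: developer tools -> Network tab -> Headers -> user-agent.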

headers = {"user-agent": "Mozilla/5.0"}
print(headers["user-agent"])
response = requests.get(url, headers=headers)
response.status_code
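The request step above can be wrapped in a small function that also fails loudly on an error status. This is only a sketch: fetch_html and its injectable get parameter are hypothetical names introduced here, and the 10-second timeout is an arbitrary choice.

```python
import requests


def fetch_html(url, get=requests.get):
    """Hypothetical helper: fetch a page while identifying as a browser.

    `get` defaults to requests.get and is a parameter only so that the
    function can be exercised without network access.
    """
    headers = {"user-agent": "Mozilla/5.0"}
    response = get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises an exception on 4xx/5xx responses
    return response.text
```

With the url variable defined earlier, fetch_html(url) would return the same HTML text as response.text.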

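▶ Finding the table tag with BeautifulSoup

With find you can locate a tag by specifying the tag name, id, or class.

html = bs(response.text, "lxml")  # name the parser explicitly to avoid the bs4 parser warning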

html.find("a")

html.find_all('span')[0]

table_day = html.select("table")[0]
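To show how the lookups above differ, here is a sketch that runs them on a small inline HTML fragment. The fragment is made up for illustration and is not the real NAVER Finance markup; "html.parser" is used so that the example needs no extra parser package.

```python
from bs4 import BeautifulSoup as bs

# Made-up HTML fragment for illustration only.
sample = """
<div id="content">
  <a href="/first">first link</a>
  <span class="tah p10">2023.03.06</span>
  <span class="tah p11">100</span>
  <table class="type2"><tr><td>cell</td></tr></table>
</div>
"""
html = bs(sample, "html.parser")

first_link = html.find("a")                     # first <a> tag in the document
by_id = html.find("div", id="content")          # the tag whose id is "content"
by_class = html.find_all("span", class_="p11")  # every <span> whose classes include "p11"
tables = html.select("table.type2")             # CSS selector, always returns a list

print(first_link["href"])
print(by_class[0].text)
print(len(tables))
```

find returns a single tag (or None), while find_all and select return lists, which is why the original code indexes them with [0].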

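▶ Collecting the data with one line of pandas

read_html reads the tables in the page and returns them as DataFrames.

cp949 is an encoding for Korean text. The default encoding is utf-8, but NAVER's daily price page can be read with cp949. If Korean text comes out garbled when loading data, reading it as cp949 fixes it in most cases.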
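The effect of picking the wrong codec can be reproduced offline. The sketch below uses only the standard library and a sample string; it does not touch the NAVER page.

```python
# Encode a Korean string as cp949 bytes, the way a cp949-encoded page would arrive.
raw = "삼성전자".encode("cp949")

# Decoding with the wrong codec garbles the text ...
garbled = raw.decode("utf-8", errors="replace")
# ... while decoding with cp949 recovers it.
restored = raw.decode("cp949")

print(garbled)
print(restored)
```

With requests, one way to apply this is to set response.encoding = "cp949" before reading response.text, if the text comes back garbled.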

table = pd.read_html(response.text)
table[0]

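Checking table[0] and table[1] shows that table[0] holds the data we need. Passing the URL directly to pd.read_html() did not work because the data could not be fetched that way, presumably because that request goes out without the browser user-agent header. Passing the HTML source instead lets pandas parse the tables on its own.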
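Parsing from HTML source can be tried without any network access. The sketch below feeds a made-up table (the rows are sample values, not real quotes) to read_html. It wraps the string in StringIO, since newer pandas versions warn when a literal HTML string is passed directly.

```python
from io import StringIO

import pandas as pd

# Made-up sample rows for illustration; not real quotes.
sample_html = """
<table>
  <tr><th>날짜</th><th>종가</th></tr>
  <tr><td>2023.03.06</td><td>61,500</td></tr>
  <tr><td>2023.03.03</td><td>60,500</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> found.
tables = pd.read_html(StringIO(sample_html))
df = tables[0]

print(len(tables))
print(df)
```

The thousands separator in "61,500" is handled by read_html's default thousands="," setting, so the price column comes back numeric.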

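▶ Removing missing values

dropna removes the rows that contain missing values.

axis decides whether rows or columns are dropped.

axis : {0 or 'index', 1 or 'columns'}, default 0

how is 'all' to drop only when every value is missing, and 'any' to drop when at least one value is missing.

how : {'any', 'all'}, default 'any'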

temp = table[0].dropna()
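The axis and how options described above can be compared on a tiny frame. The data below is made up: row 1 is entirely empty and row 2 is only partly empty.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 1 is all NaN, row 2 has one NaN.
df = pd.DataFrame({
    "날짜": ["2023.03.06", np.nan, "2023.03.03"],
    "종가": [61500, np.nan, np.nan],
})

any_dropped = df.dropna()           # default how="any": drops rows 1 and 2
all_dropped = df.dropna(how="all")  # drops only row 1, which is entirely NaN
cols_dropped = df.dropna(axis=1)    # axis=1: drops every column that contains a NaN

print(any_dropped.shape)
print(all_dropped.shape)
print(cols_dropped.shape)
```

Since the default how="any" also discards partly filled rows, how="all" may be the safer choice when only fully empty rows should go.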

▶ Returning the DataFrame

# last line inside get_day_list(item_code, page_no)
return temp

▶ Checking that the function was created

get_day_list("035720",2)
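The post shows only the final return of get_day_list, not the function body. The sketch below assembles the steps covered so far under the signature implied by the call above. parse_day_table is a hypothetical helper added so that the parsing can be tried without a network request. The timeout and raise_for_status call are also additions.

```python
from io import StringIO

import pandas as pd
import requests


def parse_day_table(html_text):
    """Hypothetical helper: first table in the HTML, rows with NaN removed."""
    tables = pd.read_html(StringIO(html_text))
    return tables[0].dropna()


def get_day_list(item_code, page_no):
    """Sketch: daily prices for one stock code and one page as a DataFrame."""
    url = f"https://finance.naver.com/item/sise_day.naver?code={item_code}&page={page_no}"
    headers = {"user-agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on an error status
    return parse_day_table(response.text)
```

Separating the fetch from the parse keeps the pandas part testable on saved or hand-written HTML.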

Source: 멋쟁이사자처럼 AISCHOOL, instructor 박조은
