<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:media="http://search.yahoo.com/mrss/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>부동산 정보 크롤링 &#8211; 투데이즈.kr</title>
	<atom:link href="https://2days.kr/tag/%eb%b6%80%eb%8f%99%ec%82%b0-%ec%a0%95%eb%b3%b4-%ed%81%ac%eb%a1%a4%eb%a7%81/feed/" rel="self" type="application/rss+xml" />
	<link>https://2days.kr</link>
	<description>투데이즈</description>
	<lastBuildDate>Sun, 16 Nov 2025 13:11:12 +0000</lastBuildDate>
	<language>ko-KR</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8</generator>

<image>
	<url>https://2days.kr/wp-content/uploads/2025/10/cropped-simbol-1-32x32.png</url>
	<title>부동산 정보 크롤링 &#8211; 투데이즈.kr</title>
	<link>https://2days.kr</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings 2</title>
		<link>https://2days.kr/18/09/22/56553/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Wed, 18 Sep 2024 13:35:01 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[네이버 부동산]]></category>
		<category><![CDATA[네이버 부동산 크롤링]]></category>
		<category><![CDATA[네이버 크롤링]]></category>
		<category><![CDATA[부동산]]></category>
		<category><![CDATA[부동산 정보 크롤링]]></category>
		<category><![CDATA[파이썬]]></category>
		<category><![CDATA[파이썬 부동산]]></category>
		<category><![CDATA[파이썬 크롤링]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56553</guid>

					<description><![CDATA[Following up on "[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings", this post tidies the results in a bit more detail. On https://fin.land.naver.com/complexes/106861?tab=complex-info you can see that part of our data does not match: the supply area and exclusive-use area differ from the areas shown on the actual listings. The reason is that Naver [&#8230;]]]></description>
										<content:encoded><![CDATA[<p data-ke-size="size16">Following up on the previous post, "[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings", this post tidies the results in a bit more detail. <a href="https://fin.land.naver.com/complexes/106861?tab=complex-info" target="_blank" rel="noopener noreferrer noopener">https://fin.land.naver.com/complexes/106861?tab=complex-info</a></p>
<p data-ke-size="size16">On that page you can see that part of our data does not match: the supply area (공급면적) and exclusive-use area (전용면적) we collected differ from the areas shown on the actual listings.</p>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><figure style="width: 2544px" class="wp-caption alignnone"><img decoding="async" src="https://blog.kakaocdn.net/dn/kNe8i/btsJFrFyRi2/gcx17BYOkKkh0eWUVLKSK0/img.png" alt="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2" width="2544" height="852" data-origin-width="2544" data-origin-height="852" data-filename="스크린샷 2024-09-18 오전 9.23.47.png" data-is-animation="false" loading="lazy" title="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2"><figcaption class="wp-caption-text">[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings 2</figcaption></figure></figure><div class='code-block code-block-2' style='margin: 8px auto; text-align: center; display: block; clear: both;'>
</div>

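<p data-ke-size="size16">The reason is that in Naver's area-information section you have to click each area type to load the details for that area, whereas the supply area, exclusive-use area, and related values we crawled came from whichever area is shown first. So we need to use each listing's 면적 (area) value as the key, find the matching supply area, and then correct the dependent fields, from exclusive-use area through rooms/bathrooms (전용면적 ~ 방/욕실).</p>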
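<p data-ke-size="size16">The matching step described above could be sketched roughly as follows. This is only an illustration, not the code from this series: the <code>AREA_DETAILS</code> table and its values, the helper names <code>parse_supply_area</code> and <code>fix_listing_area</code>, and the assumption that a listing's 면적 text starts with the supply area in square metres are all hypothetical.</p>

```python
import re

# Hypothetical per-area details keyed by supply area (in square metres).
# In practice these would have to be collected per area tab on the complex page.
AREA_DETAILS = {
    84.9: {"전용면적": "59.9㎡", "방/욕실": "3개/1개"},
    112.4: {"전용면적": "84.9㎡", "방/욕실": "3개/2개"},
}


def parse_supply_area(area_text):
    """Return the first number found in the listing's area text, or None."""
    match = re.search(r"\d+(?:\.\d+)?", area_text)
    return float(match.group()) if match else None


def fix_listing_area(listing, area_details, tolerance=1.0):
    """Return a copy of the listing with area-dependent fields corrected.

    The listing is returned unchanged when its area cannot be parsed or
    no known supply area lies within the tolerance.
    """
    supply = parse_supply_area(listing.get("면적", ""))
    if supply is None or not area_details:
        return listing
    closest = min(area_details, key=lambda known: abs(known - supply))
    if abs(closest - supply) > tolerance:
        return listing
    fixed = dict(listing)
    fixed["공급면적"] = f"{closest}㎡"
    fixed.update(area_details[closest])
    return fixed


# A listing that inherited the first area's details by mistake (made-up values)
sample = {"면적": "112.4㎡/84.9㎡", "공급면적": "84.9㎡", "전용면적": "59.9㎡"}
corrected = fix_listing_area(sample, AREA_DETAILS)
```

<p data-ke-size="size16">Matching on the nearest supply area within a small tolerance, rather than on exact string equality, is a design choice to absorb rounding differences between the listing text and the area table.</p>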
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/b6p1RV/btsJDviA0O1/UmxBmwVcZouZlXZ7GdI1zk/img.png" data-origin-width="1488" data-origin-height="1158" data-filename="스크린샷 2024-09-18 오전 9.25.38.png" data-is-animation="false" loading="lazy" alt="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2" title="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2"><figcaption>[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings 2</figcaption></figure>
<h3 data-ke-size="size23"><b>[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings 2</b></h3>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><figure style="width: 2560px" class="wp-caption alignnone"><img decoding="async" src="https://blog.kakaocdn.net/dn/sr0e1/btsJDwuPIL5/DERzjgIQ1ykgt3RksCoNp1/img.png" alt="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-filename="[고급] 부동산 정보 필터 고도화 - 네이버 매물 정리하기 2.png" data-is-animation="false" loading="lazy" title="[Advanced] Refining Real Estate Filters - Organizing Naver Listings 2"><figcaption class="wp-caption-text">[Advanced] Refining Real Estate Filters &#8211; Organizing Naver Listings 2</figcaption></figure></figure>
<p data-ke-size="size16">Today's post builds on the basics series. If you haven't read it yet, please take a look at the posts below first!</p>
<p data-ke-size="size16"><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-1" target="_blank" rel="noopener">2024.09.15 &#8211; [부동산/자동화 프로젝트] &#8211; 부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공 #1</a></p>
<p data-ke-size="size16"><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-2" target="_blank" rel="noopener">2024.09.15 &#8211; [부동산/자동화 프로젝트] &#8211; 부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공 #2</a></p>
<p data-ke-size="size16"><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-3" target="_blank" rel="noopener">2024.09.15 &#8211; [부동산/자동화 프로젝트] &#8211; 부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공 #3</a></p>
<p data-ke-size="size16"><a href="https://aboda.kr/entry/%EA%B3%A0%EA%B8%89-%EB%B6%80%EB%8F%99%EC%82%B0-%EC%A0%95%EB%B3%B4-%ED%95%84%ED%84%B0-%EA%B3%A0%EB%8F%84%ED%99%94-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%A6%AC%ED%95%98%EA%B8%B0" target="_blank" rel="noopener">2024.09.17 &#8211; [부동산/자동화 프로젝트] &#8211; [고급] 부동산 정보 필터 고도화 &#8211; 네이버 매물 정리하기</a></p>
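<p data-ke-size="size16">Fetching the per-area details requires handling dynamically loaded network requests, so for now I'll tidy things up using only the data we already have. The full code as it currently stands is below.</p>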

<pre id="code_1726623550100" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">from google.colab import drive
import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

<span class="hljs-comment"># Mount Google Drive</span>
drive.mount(<span class="hljs-string">'/content/drive'</span>)

<span class="hljs-comment"># Look up the sigungu and legal-dong (법정동) codes for a city</span>
def get_dong_codes_for_city(city_name, sigungu_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None

    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">'전체'</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], [
                            {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]} <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
            <span class="hljs-keyword">else</span>:  <span class="hljs-comment"># '전체' (all): every sigungu in the city</span>
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
                dong_codes = [
                    {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]}
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetch the list of apartment complexes for a legal dong</span>
def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }

    <span class="hljs-comment"># Define the expected columns before the try block so the else/except branches can use them too</span>
    required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]

    try:
        r = requests.get(down_url, headers=header)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()

        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])

            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None

            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

<span class="hljs-comment"># Fetch complex details by apartment code (sale listings included)</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    article_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=article&amp;tradeTypes=A1'</span>
    
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }
    
    try:
        <span class="hljs-comment"># Fetch the basic complex info</span>
        r_details = requests.get(details_url, headers=header)
        r_details.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup_details = BeautifulSoup(r_details.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract the apartment name</span>
        apt_name_tag = soup_details.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexSummary_name__vX3IN'</span>)
        apt_name = apt_name_tag.text.strip() <span class="hljs-keyword">if</span> apt_name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>

        <span class="hljs-comment"># Base info dictionary</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code, <span class="hljs-string">'complexName'</span>: apt_name}
        
        <span class="hljs-comment"># Extract the basic details (supply area, exclusive-use area, rooms/bathrooms, etc.)</span>
        detail_items = soup_details.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-comment"># Skip rows missing either cell instead of raising and losing the whole complex</span>
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-built_in">continue</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]:
                detail_dict[term] = definition
        
        <span class="hljs-comment"># Fetch the sale listings</span>
        r_article = requests.get(article_url, headers=header)
        r_article.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup_article = BeautifulSoup(r_article.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># List of listings</span>
        listings = []
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> soup_article.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'ComplexArticleItem_item__L5o7k'</span>):
            listing = {}
            
            <span class="hljs-comment"># Listing name</span>
            name_tag = item.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexArticleItem_name__4h3AA'</span>)
            listing[<span class="hljs-string">'매물명'</span>] = name_tag.text.strip() <span class="hljs-keyword">if</span> name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            
            <span class="hljs-comment"># Sale price</span>
            price_tag = item.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexArticleItem_price__DFeIb'</span>)
            listing[<span class="hljs-string">'매매가'</span>] = price_tag.text.strip() <span class="hljs-keyword">if</span> price_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            
            <span class="hljs-comment"># Area, floor, direction (default to 'Unknown' when the summary is incomplete)</span>
            summary_items = item.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'ComplexArticleItem_item-summary__oHSwl'</span>)
            has_summary = len(summary_items) &gt;= 4
            listing[<span class="hljs-string">'면적'</span>] = summary_items[1].text.strip() <span class="hljs-keyword">if</span> has_summary <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            listing[<span class="hljs-string">'층수'</span>] = summary_items[2].text.strip() <span class="hljs-keyword">if</span> has_summary <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            listing[<span class="hljs-string">'방향'</span>] = summary_items[3].text.strip() <span class="hljs-keyword">if</span> has_summary <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            
            <span class="hljs-comment"># Image</span>
            image_tag = item.find(<span class="hljs-string">'img'</span>)
            listing[<span class="hljs-string">'이미지'</span>] = image_tag[<span class="hljs-string">'src'</span>] <span class="hljs-keyword">if</span> image_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'No image'</span>
            
            <span class="hljs-comment"># Comment</span>
            comment_tag = item.find(<span class="hljs-string">'p'</span>, class_=<span class="hljs-string">'ComplexArticleItem_comment__zN_dK'</span>)
            listing[<span class="hljs-string">'코멘트'</span>] = comment_tag.text.strip() <span class="hljs-keyword">if</span> comment_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'No comment'</span>
            
            <span class="hljs-comment"># Add the basic details (supply area, rooms/bathrooms, etc.) to each listing</span>
            combined_listing = {**detail_dict, **listing}
            listings.append(combined_listing)
        
        <span class="hljs-built_in">return</span> listings
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

<span class="hljs-comment"># Collect apartment info (optionally limited to one legal dong)</span>
def collect_apt_info_for_city(city_name, sigungu_name, dong_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, sigungu_name, json_path)

    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None

    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-comment"># Restrict to the selected legal dong</span>
    <span class="hljs-keyword">if</span> dong_name and dong_name != <span class="hljs-string">'전체'</span>:
        dong_code_name_map = {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> dong_code_name_map.items() <span class="hljs-keyword">if</span> v == dong_name}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_list(dong_code)

        <span class="hljs-keyword">if</span> not apt_codes.empty:
            <span class="hljs-keyword">for</span> _, apt_info <span class="hljs-keyword">in</span> apt_codes.iterrows():
                apt_code = apt_info[<span class="hljs-string">'complexNo'</span>]
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                listings = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> listings:
                    <span class="hljs-keyword">for</span> listing <span class="hljs-keyword">in</span> listings:
                        <span class="hljs-comment"># Attach the dong info to every listing</span>
                        listing[<span class="hljs-string">'dong_code'</span>] = dong_code
                        listing[<span class="hljs-string">'dong_name'</span>] = dong_name
                        all_apt_data.append(listing)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)

    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = sigungu_name
        <span class="hljs-comment"># dong_name is already set per listing in the loop above; reassigning it here</span>
        <span class="hljs-comment"># would overwrite every row with the last dong visited by the loop</span>
        
        <span class="hljs-comment"># Save as an Excel file</span>
        file_path = f<span class="hljs-string">'/content/drive/MyDrive/{city_name}_{sigungu_name}_apartments.xlsx'</span>
        final_df.to_excel(file_path, index=False)
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data to save."</span>)

<span class="hljs-comment"># Example call</span>
collect_apt_info_for_city(<span class="hljs-string">"서울특별시"</span>, <span class="hljs-string">"강남구"</span>, <span class="hljs-string">"개포동"</span>)</pre>
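<p data-ke-size="size16">With the listings collected, filtering by price is a natural next step, and the 매매가 field is stored as text. The sketch below is an assumption-heavy illustration rather than part of the code above: the helper names <code>price_to_manwon</code> and <code>within_budget</code> are hypothetical, the sample listings are made up, and it assumes prices look like "12억 5,000" (억 followed by a remainder in 만원).</p>

```python
import re


def price_to_manwon(price_text):
    """Convert text such as '12억 5,000' to an integer in units of 만원.

    Returns None when the text does not follow the assumed format.
    """
    match = re.fullmatch(r"\s*(?:(\d+)억)?\s*(\d[\d,]*)?\s*", price_text)
    if not match or not (match.group(1) or match.group(2)):
        return None
    eok = int(match.group(1)) if match.group(1) else 0
    rest = int(match.group(2).replace(",", "")) if match.group(2) else 0
    return eok * 10000 + rest  # 1억 = 10,000만원


def within_budget(listing, budget_manwon):
    """True when the listing has a parseable price at or below the budget."""
    price = price_to_manwon(listing.get("매매가", ""))
    return price is not None and budget_manwon >= price


# Made-up listings using the same keys the crawler stores
sample_listings = [
    {"매물명": "101동", "매매가": "12억 5,000"},
    {"매물명": "102동", "매매가": "9억"},
    {"매물명": "103동", "매매가": "8,500"},
    {"매물명": "104동", "매매가": "Unknown"},
]

# Keep listings priced at 10억 (100,000만원) or less
affordable = [item for item in sample_listings if within_budget(item, 100000)]
```

<p data-ke-size="size16">Listings whose price cannot be parsed (such as the 'Unknown' default) are excluded rather than treated as zero, so they never slip through a budget filter by accident.</p>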
<p>In the next post, I'll hook this service up to Streamlit so that users can conveniently pick options and use it on the web.</p>
]]></content:encoded>
					
		
		
		<media:content url="https://2days.kr/wp-content/uploads/2024/09/고급-부동산-정보-필터-고도화-네이버-매물-정리하기-2.png" medium="image"></media:content>
            	</item>
		<item>
		<title>Refining Real Estate Filters: Organizing Naver Listings [Advanced]</title>
		<link>https://2days.kr/17/09/07/56548/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Mon, 16 Sep 2024 22:06:37 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[네이버 부동산]]></category>
		<category><![CDATA[네이버부동산]]></category>
		<category><![CDATA[네이버부동산크롤링]]></category>
		<category><![CDATA[네이버크롤링]]></category>
		<category><![CDATA[부동산]]></category>
		<category><![CDATA[부동산 정보 크롤링]]></category>
		<category><![CDATA[부동산 크롤링]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56548</guid>

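					<description><![CDATA[Refining Real Estate Filters: Organizing Naver Listings [Advanced]. In this post I'll pull together the real estate crawling code written so far and refine it further. First, when &#8220;서울특별시&#8221; (Seoul) is given as input, we look up every legal dong (법정동) and collect the apartment complexes that belong to each one. From the collected apartment information we build raw data that can be analyzed, then organize the listings according to my own criteria. [&#8230;]]]></description>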
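										<content:encoded><![CDATA[<p data-ke-size="size16">In this post I'll pull together the real estate crawling code written so far and refine it further. First, when &#8220;서울특별시&#8221; (Seoul) is given as input, we look up every legal dong (법정동) and collect the apartment complexes that belong to each one. From the collected apartment information we build raw data that can be analyzed, then organize the listings according to my own criteria.</p>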
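<p data-ke-size="size16">To make the lookup concrete, here is a small sketch of the district JSON layout implied by the keys the code reads (<code>si_do_name</code>, <code>sigungu</code>, <code>sigungu_name</code>, <code>sigungu_code</code>, <code>eup_myeon_dong</code>, <code>code</code>, <code>name</code>). The sample entries and the helper name <code>list_dongs</code> are illustrative assumptions; the real file is not shown in this post and may contain more fields.</p>

```python
import json

# Illustrative sample only: a tiny slice shaped like the JSON the crawler expects
SAMPLE_DISTRICTS = json.loads("""
[
  {
    "si_do_name": "서울특별시",
    "sigungu": [
      {
        "sigungu_name": "강남구",
        "sigungu_code": "11680",
        "eup_myeon_dong": [
          {"code": "1168010100", "name": "역삼동"},
          {"code": "1168010300", "name": "개포동"}
        ]
      }
    ]
  }
]
""")


def list_dongs(districts, city_name):
    """Return (code, name) pairs for every legal dong under the given city."""
    return [
        (dong["code"], dong["name"])
        for si_do in districts
        if si_do["si_do_name"] == city_name
        for sigungu in si_do["sigungu"]
        for dong in sigungu["eup_myeon_dong"]
    ]


seoul_dongs = list_dongs(SAMPLE_DISTRICTS, "서울특별시")
```

<p data-ke-size="size16">Each dong code found this way is what gets passed as <code>cortarNo</code> when requesting the complex list for that dong.</p>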
<h3 data-ke-size="size23">Refining Real Estate Filters: Organizing Naver Listings [Advanced]</h3>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><figure style="width: 2560px" class="wp-caption alignnone"><img alt="Refining Real Estate Filters: Organizing Naver Listings [Advanced]" title="Refining Real Estate Filters: Organizing Naver Listings [Advanced]" post-id="56548" fifu-featured="1" decoding="async" src="https://blog.kakaocdn.net/dn/b7YJpZ/btsJD8fFRcQ/pB1RQqe0MeWqbrelTlqcd1/img.png" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-is-animation="false" data-filename="[고급] 부동산 정보 필터 고도화 - 네이버 매물 정리하기.png" loading="lazy"><figcaption class="wp-caption-text">Refining Real Estate Filters: Organizing Naver Listings [Advanced]</figcaption></figure></figure><div class='code-block code-block-2' style='margin: 8px auto; text-align: center; display: block; clear: both;'>
</div>

<p data-ke-size="size16"><a href="https://aboda.kr/entry/%ED%8C%8C%EC%9D%B4%EC%8D%AC-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%A7%A4%EA%B0%80-%EC%A1%B0%ED%9A%8C-%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%A8-%EB%A7%8C%EB%93%A4%EA%B8%B0-2%ED%8E%B8-%EC%A7%80%EC%97%AD%EC%BD%94%EB%93%9C" target="_blank" rel="noopener">2024.09.14 &#8211; [부동산/자동화 프로젝트] &#8211; 파이썬 부동산 매매가 조회 프로그램 만들기 2편 (지역코드)</a></p>
<p data-ke-size="size16"><a style="background-color: #e6f5ff; color: #0070d1; text-align: start;" href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-2" target="_blank" rel="noopener">2024.09.15 &#8211; [부동산/자동화 프로젝트] &#8211; 부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공 #2</a></p>
<p data-ke-size="size16"><a href="https://2days.kr/14/09/08/56521/coding/data/">파이썬 부동산 매매가 조회 프로그램 만들기 2편 (지역코드)</a></p>
<p data-ke-size="size16">If you've gone through the sale-price lookup post and the listing-collection post, you should have no trouble modifying the code below.</p>
<pre id="code_1726453526136" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime

def get_dong_codes_for_city(city_name, json_path):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
            dong_codes = [
                {
                    <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                    <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                }
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
            ]
            <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(down_url, headers=header)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])
            
            <span class="hljs-comment"># Include additional info (e.g. build year, household count, area size)</span>
            required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]
            
            <span class="hljs-comment"># Keep only the required columns (missing ones are filled with None)</span>
            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None
            
            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])

def collect_apt_info_for_city(city_name, json_path):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting data for {dong_code} ({dong_name})"</span>)
        apt_data = get_apt_list(dong_code)
        
        <span class="hljs-keyword">if</span> not apt_data.empty:
            apt_data = apt_data.copy()  <span class="hljs-comment"># work on a copy of the DataFrame</span>
            apt_data[<span class="hljs-string">'dong_code'</span>] = dong_code
            apt_data[<span class="hljs-string">'dong_name'</span>] = dong_name
            all_apt_data.append(apt_data)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.concat(all_apt_data, ignore_index=True)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 characters of the dong code: the sigungu code prefix, not a readable name</span>
        final_df = final_df[[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]]
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])

def save_to_excel(df, city_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Get user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># Change this to the actual path of your JSON file.</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path)

<span class="hljs-keyword">if</span> apt_data is not None:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
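<p data-ke-size="size16">Let's run this code and see what it produces. As shown below, it correctly prints the list of apartment complexes in each gu (district) of 서울특별시 (Seoul). Next, we will add detailed information for each complex alongside the list.</p>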
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/x35fg/btsJFp8zmNg/kHPsQ9pn01dCcs20FQyzLK/img.png" data-origin-width="732" data-origin-height="1006" data-filename="스크린샷 2024-09-16 오후 8.16.33.png" data-is-animation="false" loading="lazy" alt="img" title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급] 9"><figcaption>[Advanced] Refining the real estate information filter</figcaption></figure>
<p data-ke-size="size16">Adding per-complex details to this data gives the code below.</p>
<pre id="code_1726520774035" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

def get_dong_codes_for_city(city_name, json_path):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
            dong_codes = [
                {
                    <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                    <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                }
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
            ]
            <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

def get_apt_codes(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(down_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            apt_codes = [complex_info[<span class="hljs-string">'complexNo'</span>] <span class="hljs-keyword">for</span> complex_info <span class="hljs-keyword">in</span> data[<span class="hljs-string">'complexList'</span>]]
            <span class="hljs-built_in">return</span> apt_codes
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> []
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching apartment codes for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(details_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract complex details</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code}
        
        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-keyword">continue</span>  <span class="hljs-comment"># skip a malformed item instead of discarding the whole page</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>]:
                detail_dict[term] = definition
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

def collect_apt_info_for_city(city_name, json_path):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_codes(dong_code)
        
        <span class="hljs-keyword">if</span> apt_codes:
            <span class="hljs-keyword">for</span> apt_code <span class="hljs-keyword">in</span> apt_codes:
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 characters of the dong code: the sigungu code prefix, not a readable name</span>
        <span class="hljs-comment"># reindex (rather than [[...]]) keeps a field that no page supplied as an empty column instead of raising KeyError</span>
        final_df = final_df.reindex(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])

def save_to_excel(df, city_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Get user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># Change this to the actual path of your JSON file.</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path)

<span class="hljs-keyword">if</span> apt_data is not None:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/bbB4a6/btsJE2Z8mel/Sn6gf0ppHMSkNKQ6h8bdek/img.png" data-is-animation="false" data-origin-width="2142" data-origin-height="828" data-filename="스크린샷 2024-09-17 오전 6.06.24.png" loading="lazy" alt="img" title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급] 10"></figure>
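<p>One detail worth noting: the <code>sigungu_name</code> column in the code above holds the first five characters of the dong code, not a readable name. As a minimal sketch, assuming <code>district.json</code> has the nested shape the loader iterates over and that each gu entry also carries a <code>sigungu_name</code> key, a lookup from dong code to gu name could look like the following. The helper name <code>build_dong_to_sigungu_name</code> and the sample codes are hypothetical placeholders.</p>

```python
# Sketch only: maps each dong code to the name of the gu that contains it.
# The nested keys mirror what get_dong_codes_for_city reads; 'sigungu_name'
# is ASSUMED to exist on each gu entry. Codes like 'D1' are placeholders.

def build_dong_to_sigungu_name(data, city_name):
    """Return {dong_code: sigungu_name} for one si/do, or {} if not found."""
    lookup = {}
    for si_do in data:
        if si_do['si_do_name'] != city_name:
            continue
        for sigungu in si_do['sigungu']:
            for dong in sigungu['eup_myeon_dong']:
                lookup[dong['code']] = sigungu['sigungu_name']
    return lookup


sample = [
    {
        'si_do_name': '서울특별시',
        'sigungu': [
            {
                'sigungu_code': 'G1',
                'sigungu_name': '마포구',
                'eup_myeon_dong': [
                    {'code': 'D1', 'name': '아현동'},
                    {'code': 'D2', 'name': '공덕동'},
                ],
            },
            {
                'sigungu_code': 'G2',
                'sigungu_name': '종로구',
                'eup_myeon_dong': [{'code': 'D3', 'name': '청운동'}],
            },
        ],
    }
]

lookup = build_dong_to_sigungu_name(sample, '서울특별시')
print(lookup['D1'])
```

<p>With a lookup like this, something along the lines of <code>final_df['dong_code'].map(lookup)</code> could fill the column with names instead of code prefixes, provided it runs before the <code>dong_code</code> column is dropped.</p>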
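<p>Collecting data for every dong in 서울특별시 takes quite a long time. So instead of always scanning the whole city, let's change the code to take a specific gu as input as well. If you do want every gu, enter 전체 ("all"), and the script surveys the legal dongs (법정동) of all of them.</p>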
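<p>This selection rule can be tried offline before touching the network. The sketch below is a simplified stand-in rather than the code that follows: <code>select_dongs</code> and the sample data are hypothetical, and the nested keys are assumed to match <code>district.json</code>. Treating empty input the same as 전체 is also an assumption of this sketch.</p>

```python
ALL = '전체'  # sentinel meaning "every gu"


def select_dongs(data, city_name, sigungu_name=None):
    """Return the dong dicts to survey, or None if nothing matches."""
    for si_do in data:
        if si_do['si_do_name'] != city_name:
            continue
        # Empty input or the sentinel selects every gu in the city.
        if not sigungu_name or sigungu_name == ALL:
            return [
                dong
                for sigungu in si_do['sigungu']
                for dong in sigungu['eup_myeon_dong']
            ]
        for sigungu in si_do['sigungu']:
            if sigungu['sigungu_name'] == sigungu_name:
                return list(sigungu['eup_myeon_dong'])
        return None  # city found, but no gu with that name
    return None  # city not found


# Placeholder data; real dong codes would come from district.json.
sample_data = [
    {
        'si_do_name': '서울특별시',
        'sigungu': [
            {
                'sigungu_name': '마포구',
                'eup_myeon_dong': [
                    {'code': 'D1', 'name': '아현동'},
                    {'code': 'D2', 'name': '공덕동'},
                ],
            },
            {
                'sigungu_name': '종로구',
                'eup_myeon_dong': [{'code': 'D3', 'name': '청운동'}],
            },
        ],
    }
]

one_gu = select_dongs(sample_data, '서울특별시', '마포구')
every_gu = select_dongs(sample_data, '서울특별시', ALL)
print([dong['name'] for dong in one_gu])
print(len(every_gu))
```

<p>Returning <code>None</code> for an unknown city or gu lets the caller tell "nothing matched" apart from "matched, but the gu has no dongs".</p>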

<pre id="code_1726521405336" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

<span class="hljs-comment"># Loads si/do and gu information from the JSON file and returns the matching dong codes</span>
def get_dong_codes_for_city(city_name, json_path, sigungu_name=None):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            all_sigungu = si_do[<span class="hljs-string">'sigungu'</span>]
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">"전체"</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        dong_codes = [
                            {
                                <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                                <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                            }
                            <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], dong_codes
            <span class="hljs-keyword">else</span>:  <span class="hljs-comment"># every gu selected ('전체' or empty input)</span>
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu]
                dong_codes = [
                    {
                        <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                        <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                    }
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetches apartment complex codes for a legal dong code</span>
def get_apt_codes(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(down_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            apt_codes = [complex_info[<span class="hljs-string">'complexNo'</span>] <span class="hljs-keyword">for</span> complex_info <span class="hljs-keyword">in</span> data[<span class="hljs-string">'complexList'</span>]]
            <span class="hljs-built_in">return</span> apt_codes
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> []
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching apartment codes for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

<span class="hljs-comment"># Fetches detailed information for an apartment complex code</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(details_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract complex details</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code}
        
        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-keyword">continue</span>  <span class="hljs-comment"># skip a malformed item instead of discarding the whole page</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>]:
                detail_dict[term] = definition
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

<span class="hljs-comment"># Collects apartment information by city and gu</span>
def collect_apt_info_for_city(city_name, json_path, sigungu_name=None):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path, sigungu_name)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} or {sigungu_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_codes(dong_code)
        
        <span class="hljs-keyword">if</span> apt_codes:
            <span class="hljs-keyword">for</span> apt_code <span class="hljs-keyword">in</span> apt_codes:
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 characters of the dong code: the sigungu code prefix, not a readable name</span>
        <span class="hljs-comment"># reindex (rather than [[...]]) keeps a field that no page supplied as an empty column instead of raising KeyError</span>
        final_df = final_df.reindex(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])

<span class="hljs-comment"># Saves the result to an Excel file</span>
def save_to_excel(df, city_name, sigungu_name=None):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">"전체"</span>:
        file_name = f<span class="hljs-string">"{city_name}_{sigungu_name}_{now}.xlsx"</span>
    <span class="hljs-keyword">else</span>:
        file_name = f<span class="hljs-string">"{city_name}_전체_{now}.xlsx"</span>
    
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Get user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name (e.g., 서울특별시): "</span>)
sigungu_name = input(<span class="hljs-string">"Enter the district (gu) name or type '전체' for all districts: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># Change this to the actual path of your JSON file.</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path, sigungu_name)

<span class="hljs-keyword">if</span> apt_data is not None:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name, sigungu_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
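<p>It works, but a single gu still contains a lot of apartment complexes, so a run takes a long time. Next, let's revise the code so that you can narrow the survey down to a specific city, gu, and dong.</p>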
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/cTfBPZ/btsJEiWKiLx/LeR3JldMbs1VHhVTjD8aIK/img.png" data-is-animation="false" data-origin-width="1288" data-origin-height="808" data-filename="스크린샷 2024-09-17 오전 6.16.14.png" loading="lazy" alt="img" title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급] 11"></figure>
<p>While we are at it, the apartment name is missing from the output, so we will include it as well. <span style="color: #0d0d0d; text-align: start;">We will also make the legal dong (법정동) selectable.</span></p>
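<p><span style="color: #0d0d0d; text-align: start;">For example, if you select all the way down to a legal dong, as in 서울특별시, 마포구, 아현동, only that dong is surveyed; if you enter 전체 (all), everything is surveyed. 전체 can be entered separately for the district (si/gun/gu) and for the legal dong. If the district is 전체, the script does not ask for a dong and surveys every dong automatically. The apartment name, which is missing from the current output, will go in the column right next to the apartment code.</span></p>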
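<p>The selection rule described above can be sketched on its own, without any network access. This is only a minimal illustration: SAMPLE is a hypothetical in-memory stand-in for district.json (same key names as the script, but placeholder codes rather than real legal-dong codes), and select_dongs is a helper name made up for this example.</p>
<pre class="python hljs" contenteditable="false" data-ke-language="python" data-ke-type="codeblock"># Hypothetical sample that mirrors the district.json layout used in this post.
# The codes are placeholders, not real legal-dong codes.
SAMPLE = [
    {
        "si_do_name": "서울특별시",
        "sigungu": [
            {
                "sigungu_name": "마포구",
                "sigungu_code": "A1",
                "eup_myeon_dong": [
                    {"code": "A101", "name": "아현동"},
                    {"code": "A102", "name": "공덕동"},
                ],
            },
            {
                "sigungu_name": "종로구",
                "sigungu_code": "B1",
                "eup_myeon_dong": [{"code": "B101", "name": "청운동"}],
            },
        ],
    }
]

ALL = "전체"


def select_dongs(data, city_name, sigungu_name=ALL, dong_name=ALL):
    """Return (code, name) pairs for every dong that should be surveyed."""
    selected = []
    for si_do in data:
        if si_do["si_do_name"] != city_name:
            continue
        for sigungu in si_do["sigungu"]:
            # A specific district narrows the search; '전체' keeps every district.
            if sigungu_name != ALL and sigungu["sigungu_name"] != sigungu_name:
                continue
            for dong in sigungu["eup_myeon_dong"]:
                # The dong filter only applies when a specific district was chosen.
                if sigungu_name != ALL and dong_name != ALL and dong["name"] != dong_name:
                    continue
                selected.append((dong["code"], dong["name"]))
    return selected


print(select_dongs(SAMPLE, "서울특별시", "마포구", "아현동"))  # one dong
print(select_dongs(SAMPLE, "서울특별시", "마포구"))            # every dong in 마포구
print(select_dongs(SAMPLE, "서울특별시"))                      # every dong in the city
</pre>
<p>Because the dong filter is ignored whenever the district is 전체, the sketch behaves the same way as the prompt flow in the script, which never asks for a dong in that case.</p>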
<pre id="code_1726522604485" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">from google.colab import drive
import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

<span class="hljs-comment"># Mount Google Drive</span>
drive.mount(<span class="hljs-string">'/content/drive'</span>)

<span class="hljs-comment"># Return the sigungu codes and legal-dong (법정동) codes for a city</span>
def get_dong_codes_for_city(city_name, sigungu_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None

    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">'전체'</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], [
                            {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]} <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
            <span class="hljs-keyword">else</span>:  <span class="hljs-comment"># district is '전체' (all districts)</span>
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
                dong_codes = [
                    {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]}
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetch the list of apartment complexes in a legal dong</span>
def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }
    <span class="hljs-comment"># Defined before the try block so the fallback returns below can always use it</span>
    required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]

    try:
        r = requests.get(down_url, headers=header)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()

        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])

            <span class="hljs-comment"># Add any missing column so the column selection below cannot raise KeyError</span>
            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None

            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

<span class="hljs-comment"># Fetch detailed information for an apartment complex code</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(details_url, headers=header)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract the apartment name</span>
        apt_name_tag = soup.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexSummary_name__vX3IN'</span>)
        apt_name = apt_name_tag.text.strip() <span class="hljs-keyword">if</span> apt_name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>

        <span class="hljs-comment"># Extract the detail fields</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code, <span class="hljs-string">'complexName'</span>: apt_name}

        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-comment"># Skip a malformed item instead of letting one missing tag discard the whole complex</span>
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-built_in">continue</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]:
                detail_dict[term] = definition
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

<span class="hljs-comment"># Collect apartment information (a legal dong can optionally be selected)</span>
def collect_apt_info_for_city(city_name, sigungu_name, dong_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, sigungu_name, json_path)

    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} / {sigungu_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None

    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-comment"># Keep only the selected legal dong, if one was given</span>
    <span class="hljs-keyword">if</span> dong_name and dong_name != <span class="hljs-string">'전체'</span>:
        dong_code_name_map = {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> dong_code_name_map.items() <span class="hljs-keyword">if</span> v == dong_name}
        <span class="hljs-keyword">if</span> not dong_code_name_map:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Warning: {dong_name} not found in {sigungu_name}."</span>)

    <span class="hljs-comment"># Use a separate loop variable so the dong_name argument is not overwritten</span>
    <span class="hljs-keyword">for</span> dong_code, current_dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({current_dong_name})"</span>)
        apt_codes = get_apt_list(dong_code)

        <span class="hljs-keyword">if</span> not apt_codes.empty:
            <span class="hljs-keyword">for</span> _, apt_info <span class="hljs-keyword">in</span> apt_codes.iterrows():
                apt_code = apt_info[<span class="hljs-string">'complexNo'</span>]
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)

                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = current_dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)

    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = sigungu_name
        
        <span class="hljs-comment"># Make sure every expected column is present</span>
        columns = [<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]
        <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> columns:
            <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> final_df.columns:
                final_df[col] = None
        
        final_df = final_df[columns]
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>])

<span class="hljs-comment"># Save the result to an Excel file</span>
def save_to_excel(df, city_name, sigungu_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{sigungu_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Get user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
sigungu_name = input(f<span class="hljs-string">"Enter the district name in {city_name} (or '전체' for all districts): "</span>)
dong_name = None

<span class="hljs-keyword">if</span> sigungu_name != <span class="hljs-string">'전체'</span>:
    dong_name = input(f<span class="hljs-string">"Enter the dong name in {sigungu_name} (or '전체' for all dongs): "</span>)

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, sigungu_name, dong_name)

<span class="hljs-keyword">if</span> apt_data is not None and not apt_data.empty:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name, sigungu_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)
</pre>
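<p>One thing to note about the script above: save_to_excel builds the file name from the city and district only, so runs for different dongs in the same district differ only by their timestamp. Below is a minimal sketch of a file-name helper that also records the dong. build_file_name and its now parameter are hypothetical names for this example and are not part of the script above.</p>
<pre class="python hljs" contenteditable="false" data-ke-language="python" data-ke-type="codeblock">from datetime import datetime

ALL = "전체"


def build_file_name(city_name, sigungu_name=None, dong_name=None, now=None):
    """Build an .xlsx file name from the city, district, optional dong, and a timestamp."""
    now = now or datetime.now()
    # An empty string or None for the district is treated the same as '전체'
    parts = [city_name, sigungu_name or ALL]
    # Only add the dong when a specific district and a specific dong were chosen
    if parts[1] != ALL and dong_name and dong_name != ALL:
        parts.append(dong_name)
    parts.append(now.strftime("%Y%m%d_%H%M%S"))
    return "_".join(parts) + ".xlsx"


# A fixed timestamp keeps the example output reproducible
fixed = datetime(2024, 9, 18, 13, 35, 1)
print(build_file_name("서울특별시", "마포구", "아현동", now=fixed))
print(build_file_name("서울특별시", "전체", now=fixed))
</pre>
<p>The returned name could then be joined with the Drive folder path the same way save_to_excel does, for example f'/content/drive/MyDrive/{file_name}'.</p>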
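<p>With today's additions, you can see that the script now crawls the full complex information for each apartment. The next steps are to add per-floor-plan (pyeong type) information for each complex and to look up apartment prices by floor-plan size. I'll cover those in the next post. Thank you!</p>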
<!-- CONTENT END 4 -->
]]></content:encoded>
					
		
		
		<media:content url="https://blog.kakaocdn.net/dn/b7YJpZ/btsJD8fFRcQ/pB1RQqe0MeWqbrelTlqcd1/img.png" medium="image"></media:content>
            	</item>
	</channel>
</rss>
