<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>부동산 크롤링 &#8211; 투데이즈.kr</title>
	<atom:link href="https://2days.kr/tag/%eb%b6%80%eb%8f%99%ec%82%b0-%ed%81%ac%eb%a1%a4%eb%a7%81/feed/" rel="self" type="application/rss+xml" />
	<link>https://2days.kr</link>
	<description>투데이즈</description>
	<lastBuildDate>Sun, 16 Nov 2025 13:11:46 +0000</lastBuildDate>
	<language>ko-KR</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8</generator>

<image>
	<url>https://2days.kr/wp-content/uploads/2025/10/cropped-simbol-1-32x32.png</url>
	<title>부동산 크롤링 &#8211; 투데이즈.kr</title>
	<link>https://2days.kr</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>[In-Depth] Serving Collected Real-Estate Asking-Price Data with Streamlit</title>
		<link>https://2days.kr/18/09/23/56558/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Wed, 18 Sep 2024 14:10:59 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[부동산 정보]]></category>
		<category><![CDATA[부동산 크롤링]]></category>
		<category><![CDATA[부동산 호가]]></category>
		<category><![CDATA[파이썬]]></category>
		<category><![CDATA[파이썬 부동산]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56558</guid>

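					<description><![CDATA[[In-Depth] Serving Collected Real-Estate Asking-Price Data with Streamlit: this post is a tutorial on serving the code built in the earlier posts through Streamlit. Because the service extracts and displays information according to the values each user enters, I expect it to be quite useful. [In-Depth] Serving Collected Real-Estate Asking-Price Data with Streamlit. Reading the earlier posts before this one should help you understand [&#8230;]]]></description>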
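										<content:encoded><![CDATA[<p>This post, [In-Depth] Serving Collected Real-Estate Asking-Price Data with Streamlit, is a tutorial on serving the code built in the earlier posts through Streamlit. Because the service extracts and displays information according to the values each user enters, I expect it to be quite useful.</p>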
<h3 data-ke-size="size23">[In-Depth] Serving Collected Real-Estate Asking-Price Data with Streamlit</h3>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><figure style="width: 2560px" class="wp-caption alignnone"><img fetchpriority="high" decoding="async" src="https://blog.kakaocdn.net/dn/cwTDfk/btsJEO9eXUo/SnOyebR9pUL7V4Hyv6LFT1/img.png" alt="[심화] streamlit 부동산 호가 수집 정보 서비스 하기" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-is-animation="false" data-filename="[심화] streamlit 에 부동산 호가 수집 정보 서비스 하기.png" data-origin- title="[심화] streamlit 부동산 호가 수집 정보 서비스 하기 3"><figcaption class="wp-caption-text">[심화] streamlit 부동산 호가 수집 정보 서비스 하기</figcaption></figure></figure>

<p>Reading the earlier posts before this one should make it easier to follow.</p>
<p><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-1" target="_blank" rel="noopener">2024.09.15 &#8211; [Real Estate/Automation Project] &#8211; Collecting Real-Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #1</a></p>
<p><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-2" target="_blank" rel="noopener">2024.09.15 &#8211; [Real Estate/Automation Project] &#8211; Collecting Real-Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2</a></p>
<p><a href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-3" target="_blank" rel="noopener">2024.09.15 &#8211; [Real Estate/Automation Project] &#8211; Collecting Real-Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #3</a></p>
<p><a href="https://aboda.kr/entry/%EA%B3%A0%EA%B8%89-%EB%B6%80%EB%8F%99%EC%82%B0-%EC%A0%95%EB%B3%B4-%ED%95%84%ED%84%B0-%EA%B3%A0%EB%8F%84%ED%99%94-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%A6%AC%ED%95%98%EA%B8%B0" target="_blank" rel="noopener">2024.09.17 &#8211; [Real Estate/Automation Project] &#8211; [Advanced] Refining Real-Estate Data Filters &#8211; Organizing Naver Listings</a></p>
<p><a href="https://aboda.kr/entry/%EA%B3%A0%EA%B8%89-%EB%B6%80%EB%8F%99%EC%82%B0-%EC%A0%95%EB%B3%B4-%ED%95%84%ED%84%B0-%EA%B3%A0%EB%8F%84%ED%99%94-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%A6%AC%ED%95%98%EA%B8%B0-2" target="_blank" rel="noopener">2024.09.18 &#8211; [Real Estate/Automation Project] &#8211; [Advanced] Refining Real-Estate Data Filters &#8211; Organizing Naver Listings 2</a></p>
<p>All that remains is to upload the code to Streamlit and run it. First, sign up for Streamlit.</p>
<p><a href="https://streamlit.io/" target="_blank" rel="noopener noreferrer noopener">https://streamlit.io/</a></p>
<figure id="og_1726667422533" contenteditable="false" data-ke-type="opengraph" data-ke-align="alignCenter" data-og-type="website" data-og-title="Streamlit • A faster way to build and share data apps" data-og-description="Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver interactive data apps – in only a few lines of code." data-og-host="streamlit.io" data-og-source-url="https://streamlit.io/" data-og-url="https://streamlit.io/" data-og-image="https://scrap.kakaocdn.net/dn/o4UxG/hyW2Rekb3a/geKZPpVvbg35ryKVWsWRq0/img.jpg?width=1200&amp;height=630&amp;face=0_0_1200_630,https://scrap.kakaocdn.net/dn/TNMNX/hyW20h1mM3/uXf4grFoSmlFKDHgmQKzk0/img.jpg?width=1200&amp;height=630&amp;face=0_0_1200_630">
<div class="og-image"></div>
<div class="og-text">
<p class="og-title">Streamlit • A faster way to build and share data apps</p>
<p class="og-desc">Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver interactive data apps – in only a few lines of code.</p>
<p class="og-host">streamlit.io</p>
</div>
</figure>
<p>After signing up, add the code below. I pasted the code in directly.</p>
<pre id="code_1726667476308" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import streamlit as st
import pandas as pd
from io import BytesIO
import requests
import json
from bs4 import BeautifulSoup

<span class="hljs-comment"># Load legal-dong codes from the JSON file</span>
def get_dong_codes_for_city(city_name, sigungu_name=None, json_path=<span class="hljs-string">'district.json'</span>):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        st.error(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None

    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">'전체'</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], [
                            {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]} <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
            <span class="hljs-keyword">else</span>:
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
                dong_codes = [
                    {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]}
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetch the list of apartment complex codes for a dong</span>
def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }

    <span class="hljs-comment"># Define the columns before the try block so the else/except branches can use them</span>
    required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]

    try:
        <span class="hljs-comment"># A timeout keeps the app from hanging on an unresponsive request</span>
        r = requests.get(down_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()

        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])

            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None

            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            st.warning(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

    except Exception as e:
        st.error(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

<span class="hljs-comment"># Fetch detailed information for an apartment complex code</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    article_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=article&amp;tradeTypes=A1'</span>
    
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }
    
    try:
        <span class="hljs-comment"># Fetch the complex's basic information</span>
        r_details = requests.get(details_url, headers=header, timeout=10)
        r_details.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup_details = BeautifulSoup(r_details.content, <span class="hljs-string">'html.parser'</span>)
        
        apt_name_tag = soup_details.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexSummary_name__vX3IN'</span>)
        apt_name = apt_name_tag.text.strip() <span class="hljs-keyword">if</span> apt_name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code, <span class="hljs-string">'complexName'</span>: apt_name}
        
        detail_items = soup_details.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
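            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-comment"># Skip malformed items instead of aborting the whole complex</span>
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-built_in">continue</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()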
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]:
                detail_dict[term] = definition

        <span class="hljs-comment"># Fetch the listings for sale</span>
        r_article = requests.get(article_url, headers=header, timeout=10)
        r_article.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup_article = BeautifulSoup(r_article.content, <span class="hljs-string">'html.parser'</span>)
        
        listings = []
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> soup_article.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'ComplexArticleItem_item__L5o7k'</span>):
            listing = {}
            name_tag = item.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexArticleItem_name__4h3AA'</span>)
            listing[<span class="hljs-string">'매물명'</span>] = name_tag.text.strip() <span class="hljs-keyword">if</span> name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            price_tag = item.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexArticleItem_price__DFeIb'</span>)
            listing[<span class="hljs-string">'매매가'</span>] = price_tag.text.strip() <span class="hljs-keyword">if</span> price_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>
            
            summary_items = item.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'ComplexArticleItem_item-summary__oHSwl'</span>)
            <span class="hljs-keyword">if</span> len(summary_items) &gt;= 4:
                listing[<span class="hljs-string">'면적'</span>] = summary_items[1].text.strip()
                listing[<span class="hljs-string">'층수'</span>] = summary_items[2].text.strip()
                listing[<span class="hljs-string">'방향'</span>] = summary_items[3].text.strip()
            
            image_tag = item.find(<span class="hljs-string">'img'</span>)
            listing[<span class="hljs-string">'이미지'</span>] = image_tag[<span class="hljs-string">'src'</span>] <span class="hljs-keyword">if</span> image_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'No image'</span>
            comment_tag = item.find(<span class="hljs-string">'p'</span>, class_=<span class="hljs-string">'ComplexArticleItem_comment__zN_dK'</span>)
            listing[<span class="hljs-string">'코멘트'</span>] = comment_tag.text.strip() <span class="hljs-keyword">if</span> comment_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'No comment'</span>
            
            combined_listing = {**detail_dict, **listing}
            listings.append(combined_listing)
        
        <span class="hljs-built_in">return</span> listings
    
    except Exception as e:
        st.error(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

<span class="hljs-comment"># Collect apartment information for the selected region</span>
def collect_apt_info_for_city(city_name, sigungu_name, dong_name=None, json_path=<span class="hljs-string">'district.json'</span>):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, sigungu_name, json_path)

    <span class="hljs-keyword">if</span> dong_list is None:
        st.error(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None

    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}
    
    <span class="hljs-comment"># Placeholder for the in-progress status message</span>
    placeholder = st.empty()

    <span class="hljs-keyword">if</span> dong_name and dong_name != <span class="hljs-string">'전체'</span>:
        dong_code_name_map = {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> dong_code_name_map.items() <span class="hljs-keyword">if</span> v == dong_name}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        placeholder.write(f<span class="hljs-string">"{dong_name} ({dong_code}) - 수집중입니다."</span>)
        apt_codes = get_apt_list(dong_code)

        <span class="hljs-keyword">if</span> not apt_codes.empty:
            <span class="hljs-keyword">for</span> _, apt_info <span class="hljs-keyword">in</span> apt_codes.iterrows():
                apt_code = apt_info[<span class="hljs-string">'complexNo'</span>]
                apt_name = apt_info[<span class="hljs-string">'complexName'</span>]
                placeholder.write(f<span class="hljs-string">"{apt_name} ({apt_code}) - 수집중입니다."</span>)
                listings = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> listings:
                    <span class="hljs-keyword">for</span> listing <span class="hljs-keyword">in</span> listings:
                        listing[<span class="hljs-string">'dong_code'</span>] = dong_code
                        listing[<span class="hljs-string">'dong_name'</span>] = dong_name
                        all_apt_data.append(listing)
        <span class="hljs-keyword">else</span>:
            st.warning(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)

    <span class="hljs-comment"># Clear the in-progress message once collection finishes</span>
    placeholder.empty()

    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = sigungu_name
        <span class="hljs-comment"># dong_name is already set per listing in the loop above; assigning it here</span>
        <span class="hljs-comment"># would overwrite every row with the last dong the loop visited</span>
        
        <span class="hljs-comment"># Display the resulting DataFrame</span>
        st.write(<span class="hljs-string">"아파트 정보 수집 완료:"</span>)
        st.dataframe(final_df)

        <span class="hljs-comment"># Write the data to an in-memory Excel file</span>
        output = BytesIO()
        with pd.ExcelWriter(output, engine=<span class="hljs-string">'xlsxwriter'</span>) as writer:
            final_df.to_excel(writer, index=False)
        output.seek(0)

        <span class="hljs-comment"># Excel download button</span>
        st.download_button(
            label=<span class="hljs-string">"Download Excel"</span>,
            data=output,
            file_name=f<span class="hljs-string">"{city_name}_{sigungu_name}_apartments.xlsx"</span>,
            mime=<span class="hljs-string">"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"</span>
        )

        <span class="hljs-comment"># CSV download button</span>
        csv = final_df.to_csv(index=False).encode(<span class="hljs-string">'utf-8'</span>)
        st.download_button(
            label=<span class="hljs-string">"Download CSV"</span>,
            data=csv,
            file_name=f<span class="hljs-string">"{city_name}_{sigungu_name}_apartments.csv"</span>,
            mime=<span class="hljs-string">"text/csv"</span>
        )
    <span class="hljs-keyword">else</span>:
        st.write(<span class="hljs-string">"No data to save."</span>)

<span class="hljs-comment"># Streamlit app entry point</span>
st.title(<span class="hljs-string">"아파트 정보 수집기"</span>)

<span class="hljs-comment"># Read user input</span>
city_name = st.text_input(<span class="hljs-string">"시/도 이름 입력"</span>, <span class="hljs-string">"서울특별시"</span>)
sigungu_name = st.text_input(<span class="hljs-string">"시/군/구 이름 입력"</span>, <span class="hljs-string">"강남구"</span>)
dong_name = st.text_input(<span class="hljs-string">"동 이름 입력 (선택사항)"</span>, <span class="hljs-string">"전체"</span>)

<span class="hljs-keyword">if</span> st.button(<span class="hljs-string">"정보 수집 시작"</span>):
    collect_apt_info_for_city(city_name, sigungu_name, dong_name)</pre>
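<p>The listing above reads district.json, which this post does not show. As a minimal sketch, assuming the file is a list of si/do entries shaped the way the lookup's keys imply, the structure and the lookup could look like the following. The codes are placeholders rather than real legal-dong codes, and find_dongs is a hypothetical stand-in for get_dong_codes_for_city that works on in-memory data.</p>

```python
import json

# Hypothetical sample mirroring the keys the app reads from district.json.
# The codes below are illustrative placeholders, not real legal-dong codes.
SAMPLE_DISTRICTS = json.loads("""
[
  {
    "si_do_name": "서울특별시",
    "sigungu": [
      {
        "sigungu_name": "강남구",
        "sigungu_code": "0000000001",
        "eup_myeon_dong": [
          {"code": "0000000101", "name": "역삼동"},
          {"code": "0000000102", "name": "삼성동"}
        ]
      },
      {
        "sigungu_name": "서초구",
        "sigungu_code": "0000000002",
        "eup_myeon_dong": [
          {"code": "0000000201", "name": "반포동"}
        ]
      }
    ]
  }
]
""")


def find_dongs(data, city_name, sigungu_name=None):
    """Return (sigungu_codes, dong_list), or (None, None) when nothing matches."""
    for si_do in data:
        if si_do["si_do_name"] != city_name:
            continue
        districts = si_do["sigungu"]
        # "전체" means "all", so only narrow the list for a specific district.
        if sigungu_name and sigungu_name != "전체":
            districts = [s for s in districts if s["sigungu_name"] == sigungu_name]
            if not districts:
                return None, None
        codes = [s["sigungu_code"] for s in districts]
        dongs = [
            {"code": d["code"], "name": d["name"]}
            for s in districts
            for d in s["eup_myeon_dong"]
        ]
        return codes, dongs
    return None, None


codes, dongs = find_dongs(SAMPLE_DISTRICTS, "서울특별시", "강남구")
print(codes)
print([d["name"] for d in dongs])
```

<p>Passing "전체" (or no district) returns every dong in the city, which is the case the app relies on when the user leaves the district unrestricted.</p>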
<p>Add the code and run it in Streamlit.</p>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/IvHwB/btsJDIoGrsF/OsGisWkNIERb9zF4NeL3ak/img.png" data-is-animation="false" data-origin-width="2026" data-origin-height="1126" data-filename="스크린샷 2024-09-18 오후 10.52.00.png" alt="img" title="[심화] streamlit 부동산 호가 수집 정보 서비스 하기 4"><figcaption>[심화] streamlit 에 부동산 호가 수집 정보 서비스 하기</figcaption></figure>
<p>The information is coming through nicely. So what does the result look like once all the apartment data has been collected?</p>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/oNjcB/btsJDYq0I2n/zpRfOMTYgUyj8kgvjJMkIk/img.png" data-is-animation="false" data-origin-width="1778" data-origin-height="1278" data-filename="스크린샷 2024-09-18 오후 10.53.02.png" alt="img" title="[심화] streamlit 부동산 호가 수집 정보 서비스 하기 5"></figure>
<p>Once collection finishes, the app shows the results as a table and offers them for download as an Excel or CSV file, so the code works as intended.</p>
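<p>The in-memory CSV export can also be sketched with the standard library alone. This is an illustration and not the code the app uses: rows_to_csv_bytes and the sample rows are hypothetical, and the "utf-8-sig" encoding is a swapped-in choice (the app encodes as plain "utf-8") that adds a byte-order mark, which spreadsheet programs commonly rely on to detect UTF-8 in files containing Korean text.</p>

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize a list of dicts to CSV bytes, keeping every key as a column."""
    # Collect column names in first-seen order, since listings may lack some keys.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a row is missing a column.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends a byte-order mark to the encoded output.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical rows shaped like the collected listings.
sample_rows = [
    {"complexName": "Sample A", "매매가": "10억", "dong_name": "역삼동"},
    {"complexName": "Sample B", "매매가": "12억", "층수": "5/20층"},
]
csv_bytes = rows_to_csv_bytes(sample_rows)
print(csv_bytes.decode("utf-8-sig"))
```

<p>The resulting bytes can be passed as the data argument of st.download_button in the same way as the csv variable in the listing.</p>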
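<p><a href="https://2days.kr/14/09/12/56525/coding/data/">Building a Python Real-Estate Sale-Price Lookup Program, Part 3 (Cleaning Up Seoul Apartment Columns)</a></p>
<p><a href="https://2days.kr/14/09/14/56529/coding/data/">Building a Python Real-Estate Sale-Price Lookup Program, Part 4 (Nationwide Data)</a></p>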
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Refining Real-Estate Data Filters: Organizing Naver Listings [Advanced]</title>
		<link>https://2days.kr/17/09/07/56548/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Mon, 16 Sep 2024 22:06:37 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[네이버 부동산]]></category>
		<category><![CDATA[네이버부동산]]></category>
		<category><![CDATA[네이버부동산크롤링]]></category>
		<category><![CDATA[네이버크롤링]]></category>
		<category><![CDATA[부동산]]></category>
		<category><![CDATA[부동산 정보 크롤링]]></category>
		<category><![CDATA[부동산 크롤링]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56548</guid>

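					<description><![CDATA[Refining Real-Estate Data Filters: Organizing Naver Listings [Advanced]. In this post I bring together the real-estate crawling code written so far and refine it. First, when &#8220;서울특별시&#8221; (Seoul) is entered, the code looks up every legal dong (법정동) and collects the apartment complexes that belong to each one. It then gathers the apartment data for those dongs into raw data suitable for analysis, and I organize the listings according to my own criteria. [&#8230;]]]></description>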
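										<content:encoded><![CDATA[<p data-ke-size="size16">Refining Real-Estate Data Filters: Organizing Naver Listings [Advanced]. In this post I bring together the real-estate crawling code written so far and refine it. First, when &#8220;서울특별시&#8221; (Seoul) is entered, the code looks up every legal dong (법정동) and collects the apartment complexes that belong to each one. It then gathers the apartment data for those dongs into raw data suitable for analysis, and I organize the listings according to my own criteria.</p>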
<h3 data-ke-size="size23">Refining Real-Estate Data Filters: Organizing Naver Listings [Advanced]</h3>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><figure style="width: 2560px" class="wp-caption alignnone"><img alt="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급]" title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급]" post-id="56548" fifu-featured="1" decoding="async" src="https://blog.kakaocdn.net/dn/b7YJpZ/btsJD8fFRcQ/pB1RQqe0MeWqbrelTlqcd1/img.png" alt="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급]" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-is-animation="false" data-filename="[고급] 부동산 정보 필터 고도화 - 네이버 매물 정리하기.png" data-origin- title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급] 9"><figcaption class="wp-caption-text">부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급]</figcaption></figure><figcaption>[고급] 부동산 정보 필터 고도화 &#8211; 네이버 매물 정리하기</figcaption></figure>
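<p data-ke-size="size16">The last step in the introduction, organizing the results by personal criteria, could be sketched as below. The criteria themselves are not given in this post, so the thresholds and the meets_criteria helper are hypothetical; only the field names buildYear and totalHouseholdCount come from the columns the collector requests, and whether the API actually populates them is an assumption.</p>

```python
def meets_criteria(complex_info, min_households, min_build_year):
    """Return True when a complex satisfies both (hypothetical) thresholds."""
    households = complex_info.get("totalHouseholdCount")
    build_year = complex_info.get("buildYear")
    # Missing values fail the check, because the collector fills absent columns with None.
    if households is None or build_year is None:
        return False
    return int(households) >= min_households and int(build_year) >= min_build_year


# Hypothetical rows shaped like the collector's output columns.
sample = [
    {"complexNo": "1", "complexName": "Sample A", "buildYear": 2015, "totalHouseholdCount": 1200},
    {"complexNo": "2", "complexName": "Sample B", "buildYear": 1995, "totalHouseholdCount": 300},
    {"complexNo": "3", "complexName": "Sample C", "buildYear": None, "totalHouseholdCount": 900},
]
selected = [c for c in sample if meets_criteria(c, min_households=500, min_build_year=2000)]
print([c["complexName"] for c in selected])
```

<p data-ke-size="size16">Keeping the rule in one small function makes it easy to swap in different thresholds without touching the collection code.</p>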

<p data-ke-size="size16"><a href="https://aboda.kr/entry/%ED%8C%8C%EC%9D%B4%EC%8D%AC-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%A7%A4%EA%B0%80-%EC%A1%B0%ED%9A%8C-%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%A8-%EB%A7%8C%EB%93%A4%EA%B8%B0-2%ED%8E%B8-%EC%A7%80%EC%97%AD%EC%BD%94%EB%93%9C" target="_blank" rel="noopener">2024.09.14 &#8211; [Real Estate/Automation Project] &#8211; Building a Python Real-Estate Sale-Price Lookup Program, Part 2 (Region Codes)</a></p>
<p data-ke-size="size16"><a style="background-color: #e6f5ff; color: #0070d1; text-align: start;" href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-2" target="_blank" rel="noopener">2024.09.15 &#8211; [Real Estate/Automation Project] &#8211; Collecting Real-Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2</a></p>
<p data-ke-size="size16"><a href="https://2days.kr/14/09/08/56521/coding/data/">Building a Python Real-Estate Sale-Price Lookup Program, Part 2 (Region Codes)</a></p>
<p data-ke-size="size16">If you have gone through the sale-price lookup posts and the listing-collection posts, you should have no trouble modifying the code below.</p>
<pre id="code_1726453526136" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime

def get_dong_codes_for_city(city_name, json_path):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
            dong_codes = [
                {
                    <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                    <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                }
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
            ]
            <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
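        r = requests.get(down_url, headers=header, timeout=10)  <span class="hljs-comment"># time out instead of hanging on a stalled connection</span>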
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])
            
            <span class="hljs-comment"># Columns to keep, including extra info (e.g. build year, household count, area)</span>
            required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]
            
            <span class="hljs-comment"># Keep only the required columns (fill any missing column with None)</span>
            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None
            
            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])

def collect_apt_info_for_city(city_name, json_path):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting data for {dong_code} ({dong_name})"</span>)
        apt_data = get_apt_list(dong_code)
        
        <span class="hljs-keyword">if</span> not apt_data.empty:
            apt_data = apt_data.copy()  <span class="hljs-comment"># work on a copy of the DataFrame</span>
            apt_data[<span class="hljs-string">'dong_code'</span>] = dong_code
            apt_data[<span class="hljs-string">'dong_name'</span>] = dong_name
            all_apt_data.append(apt_data)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.concat(all_apt_data, ignore_index=True)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 digits of the dong code = sigungu code (a code, not a name)</span>
        final_df = final_df[[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]]
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>])

def save_to_excel(df, city_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Read user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># change this to the actual path of your JSON file</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path)

<span class="hljs-keyword">if</span> apt_data is not None:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
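<p data-ke-size="size16">Let's run this code and check the result. As shown below, it correctly prints the list of apartment complexes in each district (gu) of 서울특별시 (Seoul). Next, we will add detailed information for each complex alongside it.</p>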
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/x35fg/btsJFp8zmNg/kHPsQ9pn01dCcs20FQyzLK/img.png" data-origin-width="732" data-origin-height="1006" data-filename="스크린샷 2024-09-16 오후 8.16.33.png" data-is-animation="false" alt="img" title="Refining the real-estate information filter: organizing Naver listings [Advanced] 10"><figcaption>[Advanced] Refining the real-estate information filter</figcaption></figure>
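<p>Note that the sigungu_name column produced above holds the first five digits of the legal-dong code, not a readable district name. If a name is wanted, one option is to build a lookup from the same JSON data. The sketch below is only an illustration: build_sigungu_lookup and the sample data are hypothetical, the codes are for demonstration only, and it assumes each sigungu entry in district.json carries a sigungu_name key.</p>

```python
def build_sigungu_lookup(district_data, city_name):
    """Map each legal-dong code to the name of the sigungu that contains it."""
    lookup = {}
    for si_do in district_data:
        if si_do['si_do_name'] != city_name:
            continue
        for sigungu in si_do['sigungu']:
            for dong in sigungu['eup_myeon_dong']:
                lookup[dong['code']] = sigungu['sigungu_name']
    return lookup


# Made-up sample in the same shape as district.json (illustrative codes only)
sample_data = [
    {
        'si_do_name': '서울특별시',
        'sigungu': [
            {
                'sigungu_code': '11440',
                'sigungu_name': '마포구',
                'eup_myeon_dong': [
                    {'code': '1144010100', 'name': '아현동'},
                    {'code': '1144010200', 'name': '공덕동'},
                ],
            },
        ],
    },
]

sigungu_by_dong = build_sigungu_lookup(sample_data, '서울특별시')
print(sigungu_by_dong['1144010100'])  # 마포구
```

<p>With a DataFrame, final_df['dong_code'].map(sigungu_by_dong) would then fill the column with district names instead of code prefixes.</p>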
<p data-ke-size="size16">Adding per-complex details to this data gives the code below.</p>
<pre id="code_1726520774035" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

def get_dong_codes_for_city(city_name, json_path):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
            dong_codes = [
                {
                    <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                    <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                }
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
            ]
            <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

def get_apt_codes(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
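        r = requests.get(down_url, headers=header, timeout=10)  <span class="hljs-comment"># time out instead of hanging on a stalled connection</span>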
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            apt_codes = [complex_info[<span class="hljs-string">'complexNo'</span>] <span class="hljs-keyword">for</span> complex_info <span class="hljs-keyword">in</span> data[<span class="hljs-string">'complexList'</span>]]
            <span class="hljs-built_in">return</span> apt_codes
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> []
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching apartment codes for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
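        r = requests.get(details_url, headers=header, timeout=10)  <span class="hljs-comment"># time out instead of hanging on a stalled connection</span>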
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract complex details</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code}
        
        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-keyword">continue</span>  <span class="hljs-comment"># skip a malformed item instead of discarding the whole complex</span>
            term = term_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>]:
                detail_dict[term] = definition_tag.text.strip()
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

def collect_apt_info_for_city(city_name, json_path):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_codes(dong_code)
        
        <span class="hljs-keyword">if</span> apt_codes:
            <span class="hljs-keyword">for</span> apt_code <span class="hljs-keyword">in</span> apt_codes:
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 digits of the dong code = sigungu code (a code, not a name)</span>
        <span class="hljs-comment"># reindex fills a detail column that was never scraped with NaN instead of raising KeyError</span>
        final_df = final_df.reindex(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])

def save_to_excel(df, city_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Read user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># change this to the actual path of your JSON file</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path)

<span class="hljs-keyword">if</span> apt_data is not None:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/bbB4a6/btsJE2Z8mel/Sn6gf0ppHMSkNKQ6h8bdek/img.png" data-is-animation="false" data-origin-width="2142" data-origin-height="828" data-filename="스크린샷 2024-09-17 오전 6.06.24.png" alt="img" title="Refining the real-estate information filter: organizing Naver listings [Advanced] 11"></figure>
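<p>Scraped records do not always contain every field, so it can help to normalize each record to a fixed set of keys before building the DataFrame. The following is a minimal sketch under stated assumptions: normalize_records and DETAIL_COLUMNS are hypothetical names introduced here, and the sample records are made up.</p>

```python
DETAIL_COLUMNS = ['complexNo', '공급면적', '전용면적', '해당면적 세대수', '현관구조', '방/욕실']


def normalize_records(records, columns=DETAIL_COLUMNS):
    """Return copies of the records that contain exactly the given keys.

    Missing keys are filled with None; keys outside `columns` are dropped.
    """
    return [{col: record.get(col) for col in columns} for record in records]


# Made-up records: the second one is missing most of the detail fields
records = [
    {'complexNo': '1001', '공급면적': '84㎡', '방/욕실': '3/2'},
    {'complexNo': '1002'},
]

for row in normalize_records(records):
    print(row)
```

<p>Passing the normalized list to pd.DataFrame then always yields the same columns. To keep dong_code and dong_name as well, they would need to be added to the column list.</p>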
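<p>Collecting information for every dong in Seoul takes quite a long time. Instead of crawling all of Seoul, let's modify the code so that it also takes a specific gu as input. If you do want every gu, entering 전체 ("all") crawls the legal dongs of all districts.</p>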

<pre id="code_1726521405336" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

<span class="hljs-comment"># Load the dong codes for a si/do (optionally limited to one gu) from the JSON file</span>
def get_dong_codes_for_city(city_name, json_path, sigungu_name=None):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None
    
    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            all_sigungu = si_do[<span class="hljs-string">'sigungu'</span>]
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">"전체"</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        dong_codes = [
                            {
                                <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                                <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                            }
                            <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], dong_codes
            <span class="hljs-keyword">else</span>:  <span class="hljs-comment"># all districts selected ("전체" or no gu given)</span>
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu]
                dong_codes = [
                    {
                        <span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>],
                        <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]
                    }
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> all_sigungu
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetch the apartment complex codes for a legal-dong code</span>
def get_apt_codes(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378?ms=37.5018495,127.0438028,16&amp;a=APT&amp;b=A1&amp;e=RETAIL"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
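        r = requests.get(down_url, headers=header, timeout=10)  <span class="hljs-comment"># time out instead of hanging on a stalled connection</span>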
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()
        
        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            apt_codes = [complex_info[<span class="hljs-string">'complexNo'</span>] <span class="hljs-keyword">for</span> complex_info <span class="hljs-keyword">in</span> data[<span class="hljs-string">'complexList'</span>]]
            <span class="hljs-built_in">return</span> apt_codes
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> []
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching apartment codes for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> []

<span class="hljs-comment"># Fetch detailed information for an apartment complex code</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
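        r = requests.get(details_url, headers=header, timeout=10)  <span class="hljs-comment"># time out instead of hanging on a stalled connection</span>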
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract complex details</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code}
        
        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                <span class="hljs-keyword">continue</span>  <span class="hljs-comment"># skip a malformed item instead of discarding the whole complex</span>
            term = term_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>]:
                detail_dict[term] = definition_tag.text.strip()
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

<span class="hljs-comment"># Collect apartment information for a city, optionally limited to one gu</span>
def collect_apt_info_for_city(city_name, json_path, sigungu_name=None):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, json_path, sigungu_name)
    
    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} or {sigungu_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None
    
    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_codes(dong_code)
        
        <span class="hljs-keyword">if</span> apt_codes:
            <span class="hljs-keyword">for</span> apt_code <span class="hljs-keyword">in</span> apt_codes:
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)
    
    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = final_df[<span class="hljs-string">'dong_code'</span>].apply(lambda x: x[:5])  <span class="hljs-comment"># first 5 digits of the dong code = sigungu code (a code, not a name)</span>
        <span class="hljs-comment"># reindex fills a detail column that was never scraped with NaN instead of raising KeyError</span>
        final_df = final_df.reindex(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>])

<span class="hljs-comment"># Save the result to an Excel file</span>
def save_to_excel(df, city_name, sigungu_name=None):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">"전체"</span>:
        file_name = f<span class="hljs-string">"{city_name}_{sigungu_name}_{now}.xlsx"</span>
    <span class="hljs-keyword">else</span>:
        file_name = f<span class="hljs-string">"{city_name}_전체_{now}.xlsx"</span>
    
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Read user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name (e.g., 서울특별시): "</span>)
sigungu_name = input(<span class="hljs-string">"Enter the district (gu) name or type '전체' for all districts: "</span>)
json_path = <span class="hljs-string">'/content/drive/MyDrive/district.json'</span>  <span class="hljs-comment"># change this to the actual path of your JSON file</span>

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, json_path, sigungu_name)

<span class="hljs-keyword">if</span> apt_data is not None and not apt_data.empty:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name, sigungu_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
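<p>This works, but a single district (gu) still contains a lot of apartment complexes, so a run takes a long time. Next, the code is extended so that you can narrow the search to a specific city, district, and dong.</p>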
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter"><img decoding="async" src="https://blog.kakaocdn.net/dn/cTfBPZ/btsJEiWKiLx/LeR3JldMbs1VHhVTjD8aIK/img.png" data-is-animation="false" data-origin-width="1288" data-origin-height="808" data-filename="스크린샷 2024-09-17 오전 6.16.14.png" alt="img" title="부동산정보 필터 고도화 : 네이버 매물 정리하기 [고급] 12"></figure>
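<p>While making this change, the apartment name, which was missing from the output, is added as well. <span style="color: #0d0d0d; text-align: start;">The legal dong (법정동) also becomes selectable.</span></p>
<p><span style="color: #0d0d0d; text-align: start;">For example, if you select down to the legal dong, as in 서울특별시, 마포구, 아현동, only that dong is surveyed; if you enter 전체, everything in scope is surveyed. Both the district (시군구) and the legal dong accept 전체. If the district is entered as 전체, the dong is not asked for and the whole city is surveyed. The collected data was also missing the apartment name, so it is placed in the column next to the apartment code.</span></p>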
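<p>The selection rule described above can be tried offline before running the full crawler. The sketch below is only an illustration: <code>select_dongs</code> and <code>SAMPLE_DISTRICTS</code> are hypothetical names, the sample mirrors the key layout that the code in this post reads from district.json, and the codes in it are placeholders rather than real legal-dong codes.</p>

```python
# Hypothetical sample mirroring the district.json layout used in this post.
# The codes below are placeholders, not real legal-dong codes.
SAMPLE_DISTRICTS = [
    {
        "si_do_name": "서울특별시",
        "sigungu": [
            {
                "sigungu_name": "마포구",
                "sigungu_code": "SGG-A",
                "eup_myeon_dong": [
                    {"code": "DONG-A1", "name": "아현동"},
                    {"code": "DONG-A2", "name": "공덕동"},
                ],
            },
            {
                "sigungu_name": "용산구",
                "sigungu_code": "SGG-B",
                "eup_myeon_dong": [{"code": "DONG-B1", "name": "한남동"}],
            },
        ],
    }
]

ALL = "전체"


def is_all(value):
    """Treat None, an empty string, and '전체' as 'no filter'."""
    return value in (None, "", ALL)


def select_dongs(data, city_name, sigungu_name=ALL, dong_name=ALL):
    """Return the {'code', 'name'} dicts that match the selection."""
    selected = []
    for si_do in data:
        if si_do["si_do_name"] != city_name:
            continue
        for sigungu in si_do["sigungu"]:
            if not is_all(sigungu_name) and sigungu["sigungu_name"] != sigungu_name:
                continue
            for dong in sigungu["eup_myeon_dong"]:
                # The dong filter only applies when a single district was chosen.
                dong_filter_active = not is_all(sigungu_name) and not is_all(dong_name)
                if dong_filter_active and dong["name"] != dong_name:
                    continue
                selected.append({"code": dong["code"], "name": dong["name"]})
    return selected


print(select_dongs(SAMPLE_DISTRICTS, "서울특별시", "마포구", "아현동"))
print(len(select_dongs(SAMPLE_DISTRICTS, "서울특별시", "마포구")))
print(len(select_dongs(SAMPLE_DISTRICTS, "서울특별시")))
```

<p>With this sample, the first call returns only 아현동, the second returns both dongs of 마포구, and the third returns every dong in the city.</p>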
<pre id="code_1726522604485" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">from google.colab import drive
import requests
import json
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

<span class="hljs-comment"># Mount Google Drive</span>
drive.mount(<span class="hljs-string">'/content/drive'</span>)

<span class="hljs-comment"># Look up the district and legal-dong codes for a city</span>
def get_dong_codes_for_city(city_name, sigungu_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    try:
        with open(json_path, <span class="hljs-string">'r'</span>, encoding=<span class="hljs-string">'utf-8'</span>) as file:
            data = json.load(file)
    except FileNotFoundError:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: The file at {json_path} was not found."</span>)
        <span class="hljs-built_in">return</span> None, None

    <span class="hljs-keyword">for</span> si_do <span class="hljs-keyword">in</span> data:
        <span class="hljs-keyword">if</span> si_do[<span class="hljs-string">'si_do_name'</span>] == city_name:
            <span class="hljs-keyword">if</span> sigungu_name and sigungu_name != <span class="hljs-string">'전체'</span>:
                <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]:
                    <span class="hljs-keyword">if</span> sigungu[<span class="hljs-string">'sigungu_name'</span>] == sigungu_name:
                        <span class="hljs-built_in">return</span> [sigungu[<span class="hljs-string">'sigungu_code'</span>]], [
                            {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]} <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                        ]
            <span class="hljs-keyword">else</span>:  <span class="hljs-comment"># all districts ('전체')</span>
                sigungu_codes = [sigungu[<span class="hljs-string">'sigungu_code'</span>] <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]]
                dong_codes = [
                    {<span class="hljs-string">'code'</span>: dong[<span class="hljs-string">'code'</span>], <span class="hljs-string">'name'</span>: dong[<span class="hljs-string">'name'</span>]}
                    <span class="hljs-keyword">for</span> sigungu <span class="hljs-keyword">in</span> si_do[<span class="hljs-string">'sigungu'</span>]
                    <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> sigungu[<span class="hljs-string">'eup_myeon_dong'</span>]
                ]
                <span class="hljs-built_in">return</span> sigungu_codes, dong_codes
    <span class="hljs-built_in">return</span> None, None

<span class="hljs-comment"># Fetch the list of apartment complexes for a legal-dong code</span>
def get_apt_list(dong_code):
    down_url = f<span class="hljs-string">'https://new.land.naver.com/api/regions/complexes?cortarNo={dong_code}&amp;realEstateType=APT&amp;order='</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"new.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://new.land.naver.com/complexes/102378"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0"</span>
    }

    <span class="hljs-comment"># Defined before the try block so the fallback returns below can always use it</span>
    required_columns = [<span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'buildYear'</span>, <span class="hljs-string">'totalHouseholdCount'</span>, <span class="hljs-string">'areaSize'</span>, <span class="hljs-string">'price'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'floor'</span>]

    try:
        r = requests.get(down_url, headers=header, timeout=10)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        data = r.json()

        <span class="hljs-keyword">if</span> <span class="hljs-string">'complexList'</span> <span class="hljs-keyword">in</span> data and isinstance(data[<span class="hljs-string">'complexList'</span>], list):
            df = pd.DataFrame(data[<span class="hljs-string">'complexList'</span>])

            <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> required_columns:
                <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> df.columns:
                    df[col] = None

            <span class="hljs-built_in">return</span> df[required_columns]
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No data found for {dong_code}."</span>)
            <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching data for {dong_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=required_columns)

<span class="hljs-comment"># Fetch detailed information for an apartment code</span>
def get_apt_details(apt_code):
    details_url = f<span class="hljs-string">'https://fin.land.naver.com/complexes/{apt_code}?tab=complex-info'</span>
    header = {
        <span class="hljs-string">"Accept-Encoding"</span>: <span class="hljs-string">"gzip"</span>,
        <span class="hljs-string">"Host"</span>: <span class="hljs-string">"fin.land.naver.com"</span>,
        <span class="hljs-string">"Referer"</span>: <span class="hljs-string">"https://fin.land.naver.com/"</span>,
        <span class="hljs-string">"Sec-Fetch-Dest"</span>: <span class="hljs-string">"empty"</span>,
        <span class="hljs-string">"Sec-Fetch-Mode"</span>: <span class="hljs-string">"cors"</span>,
        <span class="hljs-string">"Sec-Fetch-Site"</span>: <span class="hljs-string">"same-origin"</span>,
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"</span>
    }
    
    try:
        r = requests.get(details_url, headers=header)
        r.encoding = <span class="hljs-string">"utf-8-sig"</span>
        soup = BeautifulSoup(r.content, <span class="hljs-string">'html.parser'</span>)
        
        <span class="hljs-comment"># Extract the apartment name</span>
        apt_name_tag = soup.find(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'ComplexSummary_name__vX3IN'</span>)
        apt_name = apt_name_tag.text.strip() <span class="hljs-keyword">if</span> apt_name_tag <span class="hljs-keyword">else</span> <span class="hljs-string">'Unknown'</span>

        <span class="hljs-comment"># Extract the detail fields</span>
        detail_dict = {<span class="hljs-string">'complexNo'</span>: apt_code, <span class="hljs-string">'complexName'</span>: apt_name}

        detail_items = soup.find_all(<span class="hljs-string">'li'</span>, class_=<span class="hljs-string">'DataList_item__T1hMR'</span>)
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> detail_items:
            term_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_term__Tks7l'</span>)
            definition_tag = item.find(<span class="hljs-string">'div'</span>, class_=<span class="hljs-string">'DataList_definition__d9KY1'</span>)
            <span class="hljs-keyword">if</span> term_tag is None or definition_tag is None:
                continue  <span class="hljs-comment"># skip items that lack a label or a value instead of discarding the whole complex</span>
            term = term_tag.text.strip()
            definition = definition_tag.text.strip()
            <span class="hljs-keyword">if</span> term <span class="hljs-keyword">in</span> [<span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]:
                detail_dict[term] = definition
        
        <span class="hljs-built_in">return</span> detail_dict
    
    except Exception as e:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error fetching details for {apt_code}: {e}"</span>)
        <span class="hljs-built_in">return</span> {}

<span class="hljs-comment"># Collect apartment information (with optional legal-dong selection)</span>
def collect_apt_info_for_city(city_name, sigungu_name, dong_name=None, json_path=<span class="hljs-string">'/content/drive/MyDrive/district.json'</span>):
    sigungu_codes, dong_list = get_dong_codes_for_city(city_name, sigungu_name, json_path)

    <span class="hljs-keyword">if</span> dong_list is None:
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error: {city_name} not found in JSON."</span>)
        <span class="hljs-built_in">return</span> None

    all_apt_data = []
    dong_code_name_map = {dong[<span class="hljs-string">'code'</span>]: dong[<span class="hljs-string">'name'</span>] <span class="hljs-keyword">for</span> dong <span class="hljs-keyword">in</span> dong_list}

    <span class="hljs-comment"># Filter by legal dong if one was selected</span>
    <span class="hljs-keyword">if</span> dong_name and dong_name != <span class="hljs-string">'전체'</span>:
        dong_code_name_map = {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> dong_code_name_map.items() <span class="hljs-keyword">if</span> v == dong_name}

    <span class="hljs-keyword">for</span> dong_code, dong_name <span class="hljs-keyword">in</span> dong_code_name_map.items():
        <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting apartment codes for {dong_code} ({dong_name})"</span>)
        apt_codes = get_apt_list(dong_code)

        <span class="hljs-keyword">if</span> not apt_codes.empty:
            <span class="hljs-keyword">for</span> _, apt_info <span class="hljs-keyword">in</span> apt_codes.iterrows():
                apt_code = apt_info[<span class="hljs-string">'complexNo'</span>]
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Collecting details for {apt_code}"</span>)
                apt_details = get_apt_details(apt_code)
                
                <span class="hljs-keyword">if</span> apt_details:
                    apt_details[<span class="hljs-string">'dong_code'</span>] = dong_code
                    apt_details[<span class="hljs-string">'dong_name'</span>] = dong_name
                    all_apt_data.append(apt_details)
        <span class="hljs-keyword">else</span>:
            <span class="hljs-built_in">print</span>(f<span class="hljs-string">"No apartment codes found for {dong_code}"</span>)

    <span class="hljs-keyword">if</span> all_apt_data:
        final_df = pd.DataFrame(all_apt_data)
        final_df[<span class="hljs-string">'si_do_name'</span>] = city_name
        final_df[<span class="hljs-string">'sigungu_name'</span>] = sigungu_name
        
        <span class="hljs-comment"># Make sure every expected column is present</span>
        columns = [<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>]
        <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> columns:
            <span class="hljs-keyword">if</span> col not <span class="hljs-keyword">in</span> final_df.columns:
                final_df[col] = None
        
        final_df = final_df[columns]
        <span class="hljs-built_in">return</span> final_df
    <span class="hljs-keyword">else</span>:
        <span class="hljs-built_in">return</span> pd.DataFrame(columns=[<span class="hljs-string">'si_do_name'</span>, <span class="hljs-string">'sigungu_name'</span>, <span class="hljs-string">'dong_name'</span>, <span class="hljs-string">'complexNo'</span>, <span class="hljs-string">'complexName'</span>, <span class="hljs-string">'공급면적'</span>, <span class="hljs-string">'전용면적'</span>, <span class="hljs-string">'해당면적 세대수'</span>, <span class="hljs-string">'현관구조'</span>, <span class="hljs-string">'방/욕실'</span>, <span class="hljs-string">'위치'</span>, <span class="hljs-string">'사용승인일'</span>, <span class="hljs-string">'세대수'</span>, <span class="hljs-string">'난방'</span>, <span class="hljs-string">'주차'</span>, <span class="hljs-string">'전기차 충전시설'</span>, <span class="hljs-string">'용적률/건폐율'</span>, <span class="hljs-string">'관리사무소 전화'</span>, <span class="hljs-string">'건설사'</span>])

<span class="hljs-comment"># Save the collected data to an Excel file</span>
def save_to_excel(df, city_name, sigungu_name):
    now = datetime.now().strftime(<span class="hljs-string">"%Y%m%d_%H%M%S"</span>)
    file_name = f<span class="hljs-string">"{city_name}_{sigungu_name}_{now}.xlsx"</span>
    file_path = f<span class="hljs-string">'/content/drive/MyDrive/{file_name}'</span>
    
    df.to_excel(file_path, index=False)
    <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Data saved to {file_path}"</span>)

<span class="hljs-comment"># Read user input</span>
city_name = input(<span class="hljs-string">"Enter the city or province name: "</span>)
sigungu_name = input(f<span class="hljs-string">"Enter the district name in {city_name} (or '전체' for all districts): "</span>)
dong_name = None

<span class="hljs-keyword">if</span> sigungu_name != <span class="hljs-string">'전체'</span>:
    dong_name = input(f<span class="hljs-string">"Enter the dong name in {sigungu_name} (or '전체' for all dongs): "</span>)

<span class="hljs-comment"># Collect apartment information</span>
apt_data = collect_apt_info_for_city(city_name, sigungu_name, dong_name)

<span class="hljs-keyword">if</span> apt_data is not None and not apt_data.empty:
    <span class="hljs-built_in">print</span>(apt_data)
    save_to_excel(apt_data, city_name, sigungu_name)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"No data collected."</span>)</pre>
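<p>With today's additions, the crawler now collects the detailed information for each apartment complex as well. The remaining steps are to add per-floor-plan (pyeong type) information for each complex and to look up prices by pyeong range. I'll cover those in the next post. Thanks for reading!</p>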
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2</title>
		<link>https://2days.kr/15/09/08/56538/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Sat, 14 Sep 2024 23:09:46 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[부동산]]></category>
		<category><![CDATA[부동산 크롤링]]></category>
		<category><![CDATA[크롤링]]></category>
		<category><![CDATA[파이썬]]></category>
		<category><![CDATA[파이썬 크롤링]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56538</guid>

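					<description><![CDATA[Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2 | In modern society, data is a very important asset. In the real estate market in particular, information about apartment complexes strongly influences investment decisions, so an efficient way to collect it is needed. This post walks through how to crawl apartment complex information with Python. This process uses Naver's real estate API [&#8230;]]]></description>
										<content:encoded><![CDATA[<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2 | In modern society, data is a very important asset. In the real estate market in particular, information about apartment complexes strongly influences investment decisions, so an efficient way to collect it is needed. This post walks through how to crawl apartment complex information with Python. This process uses Naver's real estate API to collect the number of households, the use-approval date, area information per floor plan, and more.</span></p>
<h3 style="background-color: #ffffff; color: #000000; text-align: start;" data-ke-size="size23"><span style="color: #000000;"><b>Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #2</b></span></h3>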
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter">
<p><figure style="width: 2560px" class="wp-caption alignnone"><img decoding="async" src="https://blog.kakaocdn.net/dn/bHLFUO/btsJDRrpjuV/2IEGJMM9ZuuOlAuuSAgdK1/img.png" alt="부동산 매물 정보 수집하기 - 부동산 데이터 네이버 부동산 크롤링 및 가공 #2" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-is-animation="false" data-filename="부동산 매물 정보 수집하기 - 부동산 데이터 네이버 부동산 크롤링 및 가공.png" data-origin- title="부동산 매물 정보 수집하기 - 부동산 데이터 네이버 부동산 크롤링 및 가공 #2 15"><figcaption class="wp-caption-text">부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공 #2</figcaption></figure><figcaption>부동산 매물 정보 수집하기 &#8211; 부동산 데이터 네이버 부동산 크롤링 및 가공</figcaption></figure>

<p>&nbsp;</p>
<p><span style="background-color: #ffffff; color: #000000; text-align: start;">Today's installment of the series is Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data. If you have not read part 1 yet, reading it first will help.</span></p>
<p><span style="color: #000000;"><a style="color: #000000;" href="https://aboda.kr/entry/%EB%B6%80%EB%8F%99%EC%82%B0-%EB%A7%A4%EB%AC%BC-%EC%A0%95%EB%B3%B4-%EC%88%98%EC%A7%91%ED%95%98%EA%B8%B0-%EB%B6%80%EB%8F%99%EC%82%B0-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%84%A4%EC%9D%B4%EB%B2%84-%EB%B6%80%EB%8F%99%EC%82%B0-%ED%81%AC%EB%A1%A4%EB%A7%81-%EB%B0%8F-%EA%B0%80%EA%B3%B5-1" target="_blank" rel="noopener">2024.09.15 &#8211; [Real Estate/Automation Project] &#8211; Collecting Real Estate Listing Data &#8211; Crawling and Processing Naver Real Estate Data #1</a></span></p>
<p>&nbsp;</p>
<h2 style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">1. Why crawling is needed</span></h2>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">The real estate market changes constantly, and information about apartment complexes is updated along with it. Investors, buyers, landlords, and other stakeholders need to pick up these changes quickly. Collecting the information by hand takes a lot of time and effort, so an automated approach is needed. Python is well suited to this kind of work, and its libraries make web crawling easy to implement.</span></p>
<h2 style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">2. Installing the required libraries</span></h2>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">To crawl the web with Python, a few libraries are needed. This post uses requests, BeautifulSoup, and json, plus pandas for later data processing. json ships with Python's standard library, so only the other three need to be installed with the command below.</span></p>
<div>
<pre class="hljs mipsasm" style="background-color: #1e1e1e; color: #dcdcdc;" contenteditable="false">pip <span class="hljs-keyword">install </span>requests <span class="hljs-keyword">beautifulsoup4 </span>pandas
</pre>
</div>
<h2 style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">3. Basic setup for the API request</span></h2>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">To request apartment complex information from Naver's real estate API, you need to set the API URL and the request headers. The basic setup code is shown below.</span></p>
<div>
<pre class="hljs pgsql" style="background-color: #1e1e1e; color: #dcdcdc;" contenteditable="false"><span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> <span class="hljs-type">json</span>
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup

url = "https://new.land.naver.com/api/complexes/overview/"
param = {
    <span class="hljs-string">'complexNo'</span>: <span class="hljs-string">'23620'</span>  # number of the apartment complex to look up
}
<span class="hljs-keyword">header</span> = {
    <span class="hljs-string">'User-Agent'</span>: <span class="hljs-string">'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.220 Whale/1.3.51.7 Safari/537.36'</span>,
    <span class="hljs-string">'Referer'</span>: <span class="hljs-string">'https://m.land.naver.com/'</span>
}
</pre>
</div>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">In the code above, complexNo is the unique number of the apartment complex you want to look up. You can find this number in the URL of each complex on the Naver real estate site.</span></p>
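<p>If you copy a complex URL from the browser, the number can be pulled out with a few lines of code. This is a minimal sketch that assumes the URL follows the <code>/complexes/&lt;number&gt;</code> pattern seen in the Referer header used elsewhere in this series; <code>extract_complex_no</code> is a hypothetical helper name, not part of any library.</p>

```python
import re
from urllib.parse import urlparse


def extract_complex_no(url):
    """Return the digits that follow '/complexes/' in the URL path, or None."""
    path = urlparse(url).path
    match = re.search(r"/complexes/(\d+)", path)
    return match.group(1) if match else None


# Assumed URL shape: https://new.land.naver.com/complexes/<complexNo>
print(extract_complex_no("https://new.land.naver.com/complexes/23620"))
print(extract_complex_no("https://new.land.naver.com/"))
```

<p>The helper returns the number as a string, which matches how complexNo is written in the param dictionary, and returns None when the URL does not contain a complex number.</p>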
<h2 style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">4. Requesting apartment complex information</span></h2>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">Now send a GET request to the API with the URL and headers set above, then extract the apartment complex information from the response.</span></p>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;">First, take a look at the <a href="https://2days.kr/12/12/08/70405/english/">request</a> address that returns the complex information.</span></p>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000; text-align: left;">The screenshot below shows the link that returns the complex information being requested in Postman, along with the result. The red box is where you enter the link, and the blue box is where Postman generates the Python request code. The green box shows the result of the request in a readable form. Note that you need to add header values to the generated request code; otherwise the Naver site does not return the expected result. The header values are included in the code below.</span></p>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000; text-align: left;">First, open Postman and look up the address.</span></p>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000; text-align: left;"><a style="color: #000000;" href="https://www.postman.com/" target="_blank" rel="noopener noreferrer noopener">https://www.postman.com/</a></span></p>
<p style="background-color: #ffffff; color: #000000; text-align: start;"><span style="color: #000000;"><a style="color: #000000;" href="https://new.land.naver.com/api/complexes/overview/23620?complexNo=23620" target="_blank" rel="noopener">https://new.land.naver.com/api/complexes/overview/23620?complexNo=23620</a></span></p>
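<p>The address above is the base URL, the complex number, and a complexNo query parameter joined together. As a small illustration, the sketch below rebuilds it with the standard library only; <code>build_overview_url</code> is a hypothetical helper name. When the same base URL and parameters are passed to requests, it should produce an equivalent URL.</p>

```python
from urllib.parse import urlencode

BASE_URL = "https://new.land.naver.com/api/complexes/overview/"


def build_overview_url(complex_no):
    """Join the base URL, the complex number, and the complexNo query string."""
    query = urlencode({"complexNo": complex_no})
    return f"{BASE_URL}{complex_no}?{query}"


print(build_overview_url("23620"))
```

<p>Printing the result makes it easy to compare the URL your code will call with the one entered in Postman.</p>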
<div id="SE-737c04ad-8844-4600-9ed2-b5c89df5c568" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/bmdd7n/btsJEo3g7M8/VDP8LBLhLaf3HemQ9k37dk/img.png" data-height="736" data-width="886" data-lazy-src="" data-origin-width="966" data-origin-height="804" data-is-animation="false" alt="img" title="부동산 매물 정보 수집하기 - 부동산 데이터 네이버 부동산 크롤링 및 가공 #2 16"></figure>
</div>
</div>
</div>
</div>
<p><span style="background-color: #ffffff; color: #000000; text-align: left;">These are the keys and values of the complex information as shown in Postman.</span></p>
<figure data-ke-type="image" data-ke-style="alignLeft" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/bCQ9LM/btsJDWMPD4W/a0ojq1Eta45UlWpihBsRwK/img.jpg" data-height="640" data-width="298" data-lazy-src="" data-origin-width="298" data-origin-height="640" data-is-animation="false" alt="img" title="부동산 매물 정보 수집하기 - 부동산 데이터 네이버 부동산 크롤링 및 가공 #2 17"></figure>
<p><span style="color: #000000;">Based on the request information, write the code as follows.</span></p>
<pre id="code_1726353123879" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from bs4 import BeautifulSoup

url = <span class="hljs-string">"https://new.land.naver.com/api/complexes/overview/"</span>

param = {
    <span class="hljs-string">'complexNo'</span>: <span class="hljs-string">'23620'</span>
}
header = {
    <span class="hljs-string">'User-Agent'</span>: <span class="hljs-string">'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.220 Whale/1.3.51.7 Safari/537.36'</span>,
    <span class="hljs-string">'Referer'</span>: <span class="hljs-string">'https://m.land.naver.com/'</span>
}
payload = {}

response = requests.request(<span class="hljs-string">"GET"</span>, url+param[<span class="hljs-string">'complexNo'</span>], params=param, headers=header, data=payload)
u = response.url
temp = json.loads(response.text)
<span class="hljs-built_in">print</span>(<span class="hljs-string">"\n\n단지명: %s 사용승인일: %s  세대수: %s \n"</span> \
      %(temp[<span class="hljs-string">'complexName'</span>], temp[<span class="hljs-string">'useApproveYmd'</span>], temp[<span class="hljs-string">'totalHouseHoldCount'</span>]))

<span class="hljs-comment"># Look up additional information</span>
url2 = <span class="hljs-string">"https://m.land.naver.com/complex/info/"</span>+ param[<span class="hljs-string">'complexNo'</span>] + <span class="hljs-string">"?ptpNo=1"</span>
response2 = requests.request(<span class="hljs-string">"GET"</span>, url2, headers=header, data=payload)
doc = BeautifulSoup(response2.text, <span class="hljs-string">'html.parser'</span>)

titles = doc.find_all(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'tit'</span>)
datas = doc.find_all(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'data'</span>)
tmp = dict()
<span class="hljs-keyword">for</span> title, data <span class="hljs-keyword">in</span> zip(titles, datas):
    tmp.setdefault(title.text, data.text.replace(<span class="hljs-string">"\n"</span>, <span class="hljs-string">""</span>).strip())
<span class="hljs-comment"># .get avoids a KeyError if the page does not list one of these fields</span>
<span class="hljs-built_in">print</span>(<span class="hljs-string">"용적률: "</span> + tmp.get(<span class="hljs-string">'용적률'</span>, <span class="hljs-string">'N/A'</span>) + <span class="hljs-string">" 건폐율: "</span> + tmp.get(<span class="hljs-string">'건폐율'</span>, <span class="hljs-string">'N/A'</span>))

<span class="hljs-comment"># Look up information for each floor plan (pyeong type)</span>
temp2 = temp[<span class="hljs-string">'pyeongs'</span>]
<span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> temp2:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"분양: %6s m^2 [ %-5s] 전용: %5s m^2(%5s 평)"</span> \
          %(item[<span class="hljs-string">'supplyArea'</span>], item[<span class="hljs-string">'pyeongName2'</span>], item[<span class="hljs-string">'exclusiveArea'</span>], item[<span class="hljs-string">'exclusivePyeong'</span>]))</pre>
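<p>The per-floor-plan loop at the end of the script can be tried without calling the API. The sketch below uses made-up entries that only reuse the key names read from <code>temp['pyeongs']</code> above; the values are placeholders, and <code>format_pyeong</code> is a hypothetical helper that does the same alignment with f-strings.</p>

```python
# Made-up entries that reuse the key names read from temp['pyeongs'] in the post.
# The values are placeholders, not data from any real complex.
SAMPLE_PYEONGS = [
    {"supplyArea": "100.00", "pyeongName2": "30A", "exclusiveArea": "80.00", "exclusivePyeong": "24.2"},
    {"supplyArea": "130.00", "pyeongName2": "39B", "exclusiveArea": "105.00", "exclusivePyeong": "31.76"},
]


def format_pyeong(item):
    """Format one floor-plan entry as an aligned, single-line summary."""
    return (
        f"supply: {item['supplyArea']:>6} m^2 [{item['pyeongName2']:<5}] "
        f"exclusive: {item['exclusiveArea']:>6} m^2 ({item['exclusivePyeong']:>5} pyeong)"
    )


for row in SAMPLE_PYEONGS:
    print(format_pyeong(row))
```

<p>Keeping the formatting in its own function makes it straightforward to switch from printing to collecting the rows in a list later.</p>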
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Importing the required libraries</span></h3>
<pre id="code_1726353680744" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd
from bs4 import BeautifulSoup</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">requests: used to send HTTP requests and receive the responses.</span></li>
<li><span style="color: #000000;">json: used to process the JSON data returned by the API.</span></li>
<li><span style="color: #000000;">pandas: a library that makes data processing easier. It is not used in the current code, but it will be useful when processing the data later.</span></li>
<li><span style="color: #000000;">BeautifulSoup: used to parse HTML and extract the information you need.</span></li>
</ul>
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Setting the API URL and request parameters</span></h3>
<pre id="code_1726353700028" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">url = <span class="hljs-string">"https://new.land.naver.com/api/complexes/overview/"</span>

param = {
    <span class="hljs-string">'complexNo'</span>: <span class="hljs-string">'23620'</span>
}</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">url: the URL for accessing complex information in the Naver real estate API.</span></li>
<li><span style="color: #000000;">param: the parameters holding the unique number of the apartment complex to look up; &#8216;23620&#8217; refers to &#8220;상암월드컵파크 4단지&#8221;.</span></li>
</ul>
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Setting the request headers</span></h3>
<pre id="code_1726353742027" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">header = {
    <span class="hljs-string">'User-Agent'</span>: <span class="hljs-string">'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.220 Whale/1.3.51.7 Safari/537.36'</span>,
    <span class="hljs-string">'Referer'</span>: <span class="hljs-string">'https://m.land.naver.com/'</span>
}</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">User-Agent: the value a website uses to identify which browser a request comes from. Some web servers block access that looks abnormal, so this value is set to make the request look like it comes from an ordinary browser.</span></li>
<li><span style="color: #000000;">Referer: makes the <a href="https://2days.kr/30/11/12/70250/aboda/">API</a> request appear to come from the Naver mobile real estate site.</span></li>
</ul>
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Requesting and printing the complex information</span></h3>
<pre id="code_1726353760546" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">response = requests.request(<span class="hljs-string">"GET"</span>, url + param[<span class="hljs-string">'complexNo'</span>], params=param, headers=header, data={})
temp = json.loads(response.text)
<span class="hljs-built_in">print</span>(<span class="hljs-string">"\n\nComplex name: %s Use approval date: %s  Households: %s \n"</span> \
      %(temp[<span class="hljs-string">'complexName'</span>], temp[<span class="hljs-string">'useApproveYmd'</span>], temp[<span class="hljs-string">'totalHouseHoldCount'</span>]))</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">requests.request(&#8220;GET&#8221;, &#8230;): fetches the data from the API with a GET request.</span></li>
<li><span style="color: #000000;">json.loads(response.text): parses the JSON text of the API response into a Python dictionary.</span></li>
<li><span style="color: #000000;">print: prints the complex name, use approval date, and number of households, each taken from temp.</span></li>
</ul>
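<p>The parse-and-print step can be tried offline with a hand-written payload. The sketch below is only an illustration: the JSON text is a made-up placeholder rather than a real API response, and <code>summarize_complex</code> is a hypothetical helper that uses <code>dict.get</code> so that a missing field does not raise a <code>KeyError</code>.</p>

```python
import json

# Made-up payload for illustration only; real responses contain many more
# fields and the values below are placeholders, not actual data.
SAMPLE_TEXT = (
    '{"complexName": "Example Complex", '
    '"useApproveYmd": "20000101", '
    '"totalHouseHoldCount": 100}'
)


def summarize_complex(response_text: str) -> str:
    """Parse the JSON text and build a one-line summary."""
    temp = json.loads(response_text)  # JSON text to a Python dict
    # dict.get falls back to "n/a" when a field is missing from the response.
    return "Complex name: %s, use approval date: %s, households: %s" % (
        temp.get("complexName", "n/a"),
        temp.get("useApproveYmd", "n/a"),
        temp.get("totalHouseHoldCount", "n/a"),
    )


print(summarize_complex(SAMPLE_TEXT))
```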
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Requesting additional complex information</span></h3>
<pre id="code_1726353790947" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">url2 = <span class="hljs-string">"https://m.land.naver.com/complex/info/"</span>+ param[<span class="hljs-string">'complexNo'</span>] + <span class="hljs-string">"?ptpNo=1"</span>
response2 = requests.request(<span class="hljs-string">"GET"</span>, url2, headers=header, data={})
doc = BeautifulSoup(response2.text, <span class="hljs-string">'html.parser'</span>)</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">url2: the URL of the Naver mobile real estate web page that holds additional information about the complex.</span></li>
<li><span style="color: #000000;">BeautifulSoup: parses the HTML response so the needed information can be extracted easily.</span></li>
</ul>
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Extracting the floor area ratio (용적률) and building coverage ratio (건폐율)</span></h3>
<pre id="code_1726353811914" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">titles = doc.find_all(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'tit'</span>)
datas = doc.find_all(<span class="hljs-string">'span'</span>, class_=<span class="hljs-string">'data'</span>)
tmp = dict()
<span class="hljs-keyword">for</span> title, data <span class="hljs-keyword">in</span> zip(titles, datas):
    tmp.setdefault(title.text, data.text.replace(<span class="hljs-string">"\n"</span>, <span class="hljs-string">""</span>).strip())
<span class="hljs-built_in">print</span>(<span class="hljs-string">"Floor area ratio: "</span> + tmp[<span class="hljs-string">'용적률'</span>] + <span class="hljs-string">" Building coverage ratio: "</span> + tmp[<span class="hljs-string">'건폐율'</span>])</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
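<li><span style="color: #000000;">doc.find_all(&#8216;span&#8217;, class_=&#8216;tit&#8217;): finds every &#8216;span&#8217; tag in the HTML whose class is &#8216;tit&#8217;. These are the labels (용적률 for floor area ratio, 건폐율 for building coverage ratio, and so on).</span></li>
<li><span style="color: #000000;">zip: pairs each entry of titles with the matching entry of datas; the loop then stores every pair in the dictionary.</span></li>
<li><span style="color: #000000;">print: prints the floor area ratio and the building coverage ratio.</span></li>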
</ul>
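<p>The pairing step can be tried without any HTML. In the rough sketch below, plain strings stand in for <code>title.text</code> and <code>data.text</code>; the values are placeholders, and <code>pair_titles_with_data</code> is a hypothetical helper name. It also shows a property of <code>setdefault</code> that is easy to miss: a repeated label keeps its first value.</p>

```python
def pair_titles_with_data(titles, datas):
    """Pair label strings with value strings, keeping the first value per label."""
    tmp = {}
    for title, data in zip(titles, datas):
        # setdefault only stores the value if the label is not present yet,
        # so a repeated label does not overwrite the earlier entry.
        tmp.setdefault(title, data.replace("\n", "").strip())
    return tmp


# Placeholder strings standing in for title.text / data.text.
titles = ["용적률", "건폐율", "용적률"]
datas = ["\n 200% ", " 20%\n", "999%"]
info = pair_titles_with_data(titles, datas)
print(info)
```

<p>Note that <code>zip</code> stops at the shorter of the two lists, so if the page ever has more labels than values (or the reverse), the extra entries are silently dropped.</p>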
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;"><span style="color: #000000;">Looking up information by floor-plan type</span></h3>
<pre id="code_1726353824243" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">temp2 = temp[<span class="hljs-string">'pyeongs'</span>]
<span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> temp2:
     <span class="hljs-built_in">print</span>(<span class="hljs-string">"Supply: %6s m^2 [ %-5s] Exclusive: %5s m^2(%5s pyeong)"</span> \
           %(item[<span class="hljs-string">'supplyArea'</span>], item[<span class="hljs-string">'pyeongName2'</span>], item[<span class="hljs-string">'exclusiveArea'</span>], item[<span class="hljs-string">'exclusivePyeong'</span>]))</pre>
<ul style="list-style-type: disc; background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-list-type="disc">
<li><span style="color: #000000;">temp2 = temp[&#8216;pyeongs&#8217;]: extracts the per-floor-plan (pyeong type) data from the complex information.</span></li>
<li><span style="color: #000000;">for item in temp2: iterates over each floor-plan type and prints its supply area and exclusive area.</span></li>
</ul>
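<p>For readers unfamiliar with the pyeong unit, here is a small sketch of the usual conversion from square meters, using the fact that one pyeong is 400/121 (about 3.3058) square meters. Whether the API's <code>exclusivePyeong</code> field is computed exactly this way is an assumption; the post does not say. The helper name <code>sqm_to_pyeong</code> is hypothetical.</p>

```python
SQM_PER_PYEONG = 400 / 121  # one pyeong is about 3.3058 square meters


def sqm_to_pyeong(area_sqm: float) -> float:
    """Convert an area in square meters to pyeong, rounded to 2 decimals."""
    return round(area_sqm / SQM_PER_PYEONG, 2)


# 84 square meters is a placeholder input, not a value from the API.
print(sqm_to_pyeong(84.0))
```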
<p><a href="https://2days.kr/15/09/07/56533/coding/data/">Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1</title>
		<link>https://2days.kr/15/09/07/56533/it/program/</link>
		
		<dc:creator><![CDATA[urjent]]></dc:creator>
		<pubDate>Sat, 14 Sep 2024 22:19:38 +0000</pubDate>
				<category><![CDATA[program]]></category>
		<category><![CDATA[네이버]]></category>
		<category><![CDATA[네이버 부동산]]></category>
		<category><![CDATA[네이버 크롤링]]></category>
		<category><![CDATA[네이버 파이썬]]></category>
		<category><![CDATA[부동산 크롤링]]></category>
		<category><![CDATA[부동산 파이썬]]></category>
		<category><![CDATA[크롤링]]></category>
		<category><![CDATA[파이썬]]></category>
		<guid isPermaLink="false">https://2days.kr/?p=56533</guid>

					<description><![CDATA[Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1 | Naver Real Estate data can be put to very good use, but getting it in the form you want is quite difficult. Converting the data with Excel VBA is often suggested, but here I am studying a concrete way to crawl live listing information with Python. Now, from the vast [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1 | Naver Real Estate data can be put to very good use, but getting it in the form you want is quite difficult. Converting the data with Excel VBA is often suggested, but here I am studying a concrete way to crawl live listing information with Python.</p>
<p style="background-color: #ffffff; color: #0d0d0d; text-align: start;">Let's now use Python to collect just the information we need from the large volume of data on Naver Real Estate.</p>
<h3 style="background-color: #ffffff; color: #0d0d0d; text-align: start;" data-ke-size="size23">Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1</h3>
<figure data-ke-type="image" data-ke-mobilestyle="widthOrigin" data-ke-style="alignCenter">
<figure style="width: 2560px" class="wp-caption alignnone"><img loading="lazy" decoding="async" src="https://blog.kakaocdn.net/dn/YBEvR/btsJEUU24W5/T4wrGkk09jkMFBxAF1jitK/img.png" alt="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1" width="2560" height="2560" data-origin-width="2560" data-origin-height="2560" data-is-animation="false" data-filename="부동산 매물 정보 수집하기 - 부동산 데이터 크롤링 및 가공 #1.png" title="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1 22"><figcaption class="wp-caption-text">Collecting Real Estate Listing Information &#8211; Crawling and Processing Naver Real Estate Data #1</figcaption></figure></figure>

<div id="SE-8f4a9d36-5aac-471b-9959-953aee3f8c53" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<p id="SE-D957264E-4CFD-40F9-A4BD-D27C04709E52" style="color: #000000; text-align: var(--se-text-default-value-text-align);">Naver Real Estate is served as separate PC and mobile sites, and each is crawled in a different way. Of the two, the mobile site is the easier target because it displays less information.</p>
</div>
</div>
</div>
</div>
<div id="SE-d184a622-947d-4abe-99fd-b636b3c39475" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/piDAn/btsJEjVgagn/4I4snJQCRLpWL10yi1OBXk/img.jpg" data-height="563" data-width="886" data-lazy-src="" data-origin-width="966" data-origin-height="615" data-is-animation="false" alt="img" title="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1 23"></figure>
</div>
<div>
<p id="SE-F5057863-6C9B-467E-A914-7954491FE61E" style="color: #000000; text-align: var(--se-image-default-caption-text-align);"><span style="color: #555555;">PC environment</span></p>
</div>
</div>
</div>
</div>
<div id="SE-a55f1b6b-e89f-465c-a428-6bafdab2e902" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/b9T632/btsJDcb3WhG/Gvbudr4UH8bpHxiYGufkJ0/img.jpg" data-height="632" data-width="886" data-lazy-src="" data-origin-width="966" data-origin-height="691" data-is-animation="false" alt="img" title="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1 24"></figure>
</div>
<div>
<p id="SE-86964DA9-18AE-417C-940E-E5A8420DD101" style="color: #000000; text-align: var(--se-image-default-caption-text-align);"><span style="color: #555555;">Mobile environment</span></p>
</div>
</div>
</div>
</div>
<p>Fetching the listings for a specific apartment complex</p>
<p style="background-color: #ffffff; color: #0d0d0d; text-align: start;">The image below shows the listing information for &#8216;상암월드컵파크4단지&#8217; (Sangam World Cup Park Complex 4) on <a href="http://m.land.naver.com" target="_blank">http://m.land.naver.com</a>. The important elements here are the complex's unique identifier, &#8216;23620&#8217;, and the trade-type code, &#8216;A1:B1:B2&#8217;. In these codes, A1 means sale, B1 jeonse (lump-sum deposit lease), B2 monthly rent, and B3 short-term rental.</p>
<p style="background-color: #ffffff; color: #0d0d0d; text-align: start;">For example, the listings can be viewed at the following address: <a href="https://m.land.naver.com/complex/info/23620?tradTpCd=A1:B1:B2:B3&amp;ptpNo=1&amp;bildNo=&amp;articleListYN=Y" target="_blank" rel="noopener">https://m.land.naver.com/complex/info/23620?tradTpCd=A1:B1:B2:B3&amp;ptpNo=1&amp;bildNo=&amp;articleListYN=Y</a></p>
<p>Rewritten, the code that fetches the listings for a specific complex looks like this:</p>
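<p>The trade-type codes can be decoded from such an address with the standard library alone. This is a minimal sketch: the <code>TRADE_TYPES</code> table simply restates the four codes listed above, <code>decode_trade_types</code> is a hypothetical helper, and the example address is shortened to the one parameter of interest.</p>

```python
from urllib.parse import parse_qs, urlsplit

# Code table restating the four trade types named in the text.
TRADE_TYPES = {
    "A1": "sale",
    "B1": "jeonse",
    "B2": "monthly rent",
    "B3": "short-term rental",
}


def decode_trade_types(url: str) -> list:
    """Return the trade types named in a URL's tradTpCd parameter."""
    query = parse_qs(urlsplit(url).query)
    # Codes are joined with ":" inside a single query value.
    codes = query.get("tradTpCd", [""])[0].split(":")
    return [TRADE_TYPES.get(code, "unknown") for code in codes if code]


example = "https://m.land.naver.com/complex/info/23620?tradTpCd=A1:B1:B2:B3"
print(decode_trade_types(example))
```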
<pre id="code_1726351819860" class="bash hljs" contenteditable="false" data-ke-language="bash" data-ke-type="codeblock">import requests
import json
import pandas as pd

URL = <span class="hljs-string">"https://m.land.naver.com/complex/getComplexArticleList"</span>

parameter = {
    <span class="hljs-string">'hscpNo'</span>: <span class="hljs-string">'23620'</span>, <span class="hljs-comment"># unique number of 상암월드컵파크4단지 (Sangam World Cup Park Complex 4)</span>
    <span class="hljs-string">'tradTpCd'</span>: <span class="hljs-string">'A1:B1:B2'</span>, <span class="hljs-comment"># three trade types: sale, jeonse, monthly rent</span>
    <span class="hljs-string">'order'</span>: <span class="hljs-string">'spc_'</span>, <span class="hljs-comment"># sort by area</span>
}

header = {
    <span class="hljs-string">'User-Agent'</span>: <span class="hljs-string">'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.39'</span>,
    <span class="hljs-string">'Referer'</span>: <span class="hljs-string">'https://m.land.naver.com/'</span>
}

page = 0
lands = []

<span class="hljs-keyword">while</span> True:
    page = page + 1
    parameter[<span class="hljs-string">'page'</span>] = page

    response = requests.get(URL, params=parameter, headers=header)
    <span class="hljs-keyword">if</span> response.status_code != 200:
        <span class="hljs-built_in">print</span>(<span class="hljs-string">'invalid status: %d'</span> % response.status_code)
        <span class="hljs-built_in">break</span>

    data = json.loads(response.text)
    result = data[<span class="hljs-string">'result'</span>]
    <span class="hljs-keyword">if</span> result is None:
        <span class="hljs-built_in">print</span>(<span class="hljs-string">'no result'</span>)
        <span class="hljs-built_in">break</span>
    
    <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> result[<span class="hljs-string">'list'</span>]:
        lands.append([item[<span class="hljs-string">'tradTpNm'</span>], item[<span class="hljs-string">'bildNm'</span>], item[<span class="hljs-string">'flrInfo'</span>], item[<span class="hljs-string">'prcInfo'</span>], item[<span class="hljs-string">'spc1'</span>]])
    
    <span class="hljs-keyword">if</span> result[<span class="hljs-string">'moreDataYn'</span>] == <span class="hljs-string">'N'</span>:
        <span class="hljs-built_in">break</span>
<span class="hljs-built_in">print</span>(pd.DataFrame(lands))</pre>
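<p>The paging logic of the loop above can be separated from the HTTP call, which makes it possible to try it without touching the network. The sketch below is one way to do that, not the author's code: <code>collect_all_pages</code> is a hypothetical helper that takes any function returning a page's <code>result</code> dictionary, and <code>FAKE_PAGES</code> holds placeholder rows in place of real listings.</p>

```python
def collect_all_pages(fetch_page):
    """Call fetch_page(page) from page 1 until the API reports no more data."""
    rows = []
    page = 0
    while True:
        page += 1
        result = fetch_page(page)
        if result is None:
            # Same stop condition as the 'no result' branch in the loop above.
            break
        rows.extend(result["list"])
        if result["moreDataYn"] == "N":
            # The last page announces that nothing follows it.
            break
    return rows


# Stand-in for the HTTP call: two fake pages with placeholder rows.
FAKE_PAGES = {
    1: {"list": ["row-1", "row-2"], "moreDataYn": "Y"},
    2: {"list": ["row-3"], "moreDataYn": "N"},
}

rows = collect_all_pages(FAKE_PAGES.get)
print(rows)
```

<p>In real use, the function passed in would wrap the <code>requests.get</code> call and return <code>data['result']</code> for the requested page.</p>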
<div id="SE-75bc509e-e497-4245-9e4c-e9b932f1248d" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/nhjI8/btsJDv92amU/Y1GwjB0SlVeUmAAkjCmBB0/img.jpg" data-height="1001" data-width="886" data-lazy-src="" data-origin-width="792" data-origin-height="895" data-is-animation="false" alt="img" title="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1 25"></figure>
</div>
</div>
</div>
</div>
<div id="SE-71890684-4379-48e9-ac90-9757a8b0f2d5" style="background-color: #f7f7f7; color: #777777; text-align: left;">
<div>
<div>
<div>
<figure data-ke-type="image" data-ke-style="alignCenter" data-ke-mobilestyle="widthOrigin"><img decoding="async" src="https://blog.kakaocdn.net/dn/2QTjm/btsJDopKzg8/fpaGyMkpsBo2FhV7ueKNvk/img.jpg" data-height="820" data-width="641" data-lazy-src="" data-origin-width="641" data-origin-height="820" data-is-animation="false" alt="img" title="Collecting Real Estate Listing Information - Crawling and Processing Naver Real Estate Data #1 26"></figure>
</div>
</div>
</div>
</div>
<p>The next post covers Python code for retrieving information about each apartment complex.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
