<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://sungminoh.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sungminoh.github.io/" rel="alternate" type="text/html" /><updated>2026-06-05T00:19:48+00:00</updated><id>https://sungminoh.github.io/feed.xml</id><title type="html">Sungmin’s journey</title><subtitle>Developer who is interested in recommendation system, personalization, etc.</subtitle><author><name>Sungmin</name></author><entry><title type="html">Running a command when a Mac wakes from sleep</title><link href="https://sungminoh.github.io/posts/productivity/mac-sleepwatcher/" rel="alternate" type="text/html" title="Running a command when a Mac wakes from sleep" /><published>2020-12-15T00:00:00+00:00</published><updated>2020-12-15T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/productivity/mac-sleepwatcher</id><content type="html" xml:base="https://sungminoh.github.io/posts/productivity/mac-sleepwatcher/"><![CDATA[<h1 id="mac에서-자동으로-명령어-실행하기">Running commands automatically on a Mac</h1>

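<p>Sometimes you want to run a particular command periodically on a Mac. <del>(For example, automatically reissuing a token that expires after a certain time…)</del></p>

<p>In most cases cron is enough: open the cron configuration in an editor with <code class="language-plaintext highlighter-rouge">crontab -e</code> and add an entry like the one below.</p>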

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> <span class="o">[</span>command to run] <span class="o">&gt;</span> /tmp/cron.log 2&gt;/tmp/cron.error.log
</code></pre></div></div>

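<p>However, cron only runs while the Mac is awake, so right after the Mac boots or wakes from sleep the command may not have run yet.</p>

<p>In that case you could run the command by hand or wait for the next cron schedule, but having the machine in a different state each time is a real nuisance, so let's automate this as well.</p>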

<h1 id="sleepwatcher"><a href="https://www.bernhard-baehr.de/">SleepWatcher</a></h1>

<p>SleepWatcher is a daemon that monitors state changes when the Mac goes to sleep or wakes up. With it, sleep and wake-up events can be used as triggers for specific tasks.</p>

<h2 id="install">Install</h2>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>sleepwatcher
</code></pre></div></div>

<p>After installation, a plist file such as <code class="language-plaintext highlighter-rouge">de.bernhard-baehr.sleepwatcher-20compatibility-localuser.plist</code> appears under <code class="language-plaintext highlighter-rouge">/usr/local/Cellar/sleepwatcher/&lt;version&gt;/</code>.</p>

<p>Copy it into <code class="language-plaintext highlighter-rouge">~/Library/LaunchAgents/</code> under a suitable name.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cp</span> /usr/local/Cellar/sleepwatcher/2.2.1/de.bernhard-baehr.sleepwatcher-20compatibility-localuser.plist ~/Library/LaunchAgents/my.sleepwatcher.plist
</code></pre></div></div>

<h2 id="configure">Configure</h2>

<p>The plist file probably looks like this:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="cp">&lt;!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;</span>
<span class="nt">&lt;plist</span> <span class="na">version=</span><span class="s">"1.0"</span><span class="nt">&gt;</span>
<span class="nt">&lt;dict&gt;</span>
	<span class="nt">&lt;key&gt;</span>Label<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;string&gt;</span>de.bernhard-baehr.sleepwatcher<span class="nt">&lt;/string&gt;</span>
	<span class="nt">&lt;key&gt;</span>ProgramArguments<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;array&gt;</span>
		<span class="nt">&lt;string&gt;</span>/usr/local/sbin/sleepwatcher<span class="nt">&lt;/string&gt;</span>
		<span class="nt">&lt;string&gt;</span>-V<span class="nt">&lt;/string&gt;</span>
		<span class="nt">&lt;string&gt;</span>-s ~/.sleep<span class="nt">&lt;/string&gt;</span>
		<span class="nt">&lt;string&gt;</span>-w ~/.wakeup<span class="nt">&lt;/string&gt;</span>
	<span class="nt">&lt;/array&gt;</span>
	<span class="nt">&lt;key&gt;</span>RunAtLoad<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;true/&gt;</span>
	<span class="nt">&lt;key&gt;</span>KeepAlive<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;true/&gt;</span>
<span class="nt">&lt;/dict&gt;</span>
<span class="nt">&lt;/plist&gt;</span>
</code></pre></div></div>

<p>Matching that configuration, create <code class="language-plaintext highlighter-rouge">.sleep</code> and <code class="language-plaintext highlighter-rouge">.wakeup</code> files in your home directory and make them executable.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">touch</span> ~/.wakeup<span class="p">;</span> <span class="nb">chmod</span> +x ~/.wakeup<span class="p">;</span>
<span class="nb">touch</span> ~/.sleep<span class="p">;</span> <span class="nb">chmod</span> +x ~/.sleep
</code></pre></div></div>

<p>Then put the code to run on sleep or wake-up into the corresponding file.</p>

<h2 id="run">Run</h2>

<p>Now start the daemon by loading the launch agent.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>launchctl load ~/Library/LaunchAgents/my.sleepwatcher.plist
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="productivity" /><category term="mac" /><summary type="html"><![CDATA[Running commands automatically on a Mac]]></summary></entry><entry><title type="html">Python singleton pattern</title><link href="https://sungminoh.github.io/posts/development/python-singleton/" rel="alternate" type="text/html" title="Python singleton pattern" /><published>2020-09-07T00:00:00+00:00</published><updated>2020-09-07T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/python-singleton</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/python-singleton/"><![CDATA[<h1 id="python-thread-safe-singleton">Python thread-safe singleton</h1>

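<p>While implementing a monitoring service class, I thought about how best to implement the singleton pattern in Python.</p>

<p>Python is flexible enough that there are several ways to implement the singleton pattern. The post at https://stackoverflow.com/questions/6760685/creating-a-singleton-in-python compares the approaches well, so I briefly translate and summarize it here and then present my own implementation.</p>

<h2 id="stackoverflow-글">The Stack Overflow post</h2>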

<h3 id="1-decorator">1. Decorator</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">singleton</span><span class="p">(</span><span class="n">class_</span><span class="p">):</span>
    <span class="n">instances</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">def</span> <span class="nf">getinstance</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">class_</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">instances</span><span class="p">:</span>
            <span class="n">instances</span><span class="p">[</span><span class="n">class_</span><span class="p">]</span> <span class="o">=</span> <span class="n">class_</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">instances</span><span class="p">[</span><span class="n">class_</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">getinstance</span>

<span class="o">@</span><span class="n">singleton</span>
<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="n">BaseClass</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<ul>
  <li>Pro: more intuitive than something like inheritance.</li>
  <li>Con: wrapping the class in a decorator turns it into a function, so its class methods can no longer be accessed.</li>
</ul>
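
<p>The downside above can be observed directly. The following is a minimal sketch that reuses the decorator from the post; the <code class="language-plaintext highlighter-rouge">Config</code> class and its <code class="language-plaintext highlighter-rouge">default</code> class method are hypothetical names added only for illustration.</p>

```python
def singleton(class_):
    """Decorator from the post: caches one instance per decorated class."""
    instances = {}

    def getinstance(*args, **kwargs):
        if class_ not in instances:
            instances[class_] = class_(*args, **kwargs)
        return instances[class_]

    return getinstance


@singleton
class Config:  # hypothetical example class
    @classmethod
    def default(cls):
        return cls()


# The name Config is now bound to the wrapper function, not to a class.
is_class = isinstance(Config, type)
# Class methods are therefore not reachable through the name Config.
has_classmethod = hasattr(Config, "default")
# Calling the wrapper still yields a single shared instance.
same_instance = Config() is Config()
```

<p>Because the name now refers to a plain function, <code class="language-plaintext highlighter-rouge">isinstance(obj, Config)</code> and subclassing <code class="language-plaintext highlighter-rouge">Config</code> stop working as well.</p>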

<h3 id="2-base-class">2. Base class</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Singleton</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="n">_instance</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">cls</span><span class="p">.</span><span class="n">_instance</span><span class="p">,</span> <span class="n">cls</span><span class="p">):</span>
            <span class="n">cls</span><span class="p">.</span><span class="n">_instance</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instance</span>

<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="n">Singleton</span><span class="p">,</span> <span class="n">BaseClass</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<ul>
  <li>Pro: it is a real class.</li>
  <li>Con: <code class="language-plaintext highlighter-rouge">MyClass</code>'s <code class="language-plaintext highlighter-rouge">__new__</code> is still called on every instantiation.</li>
</ul>
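
<p>A small sketch of what that repeated call means in practice. The <code class="language-plaintext highlighter-rouge">Counter</code> class and the <code class="language-plaintext highlighter-rouge">new_calls</code> counter are hypothetical additions for illustration, and <code class="language-plaintext highlighter-rouge">__new__</code> here does not forward its arguments to <code class="language-plaintext highlighter-rouge">object.__new__</code>, which is an assumption made to keep the example runnable.</p>

```python
class Singleton:
    _instance = None
    new_calls = 0  # counts how often __new__ runs (illustration only)

    def __new__(cls, *args, **kwargs):
        Singleton.new_calls += 1
        if not isinstance(cls._instance, cls):
            cls._instance = super().__new__(cls)
        return cls._instance


class Counter(Singleton):  # hypothetical example class
    def __init__(self, start=0):
        # Runs on every Counter(...) call, so it resets the shared state.
        self.value = start


first = Counter(10)
first.value += 1
second = Counter()  # same object, but __init__ ran again with start=0
```

<p>Both names refer to the same object, yet the second call silently resets <code class="language-plaintext highlighter-rouge">value</code>, so with this approach <code class="language-plaintext highlighter-rouge">__init__</code> has to be written to tolerate repeated calls.</p>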

<h3 id="3-meta-class">3. Meta class</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Singleton</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
    <span class="n">_instances</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">cls</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">:</span>
            <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">]</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__call__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>  <span class="c1"># if you want __init__ to be called every time
</span>            <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">].</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="n">BaseClass</span><span class="p">,</span> <span class="n">metaclass</span><span class="o">=</span><span class="n">Singleton</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<ul>
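  <li>Pro: it is a real class, there is no need to worry about inheritance, and it fits what a metaclass is meant for. <code class="language-plaintext highlighter-rouge">MyClass</code>'s <code class="language-plaintext highlighter-rouge">__new__</code> is called only once, when the first instance is created.</li>
  <li>Con: it cannot inherit from a class that uses a different metaclass, such as ABC.</li>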
</ul>
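
<p>The metaclass limitation can be reproduced in a few lines. This is only a sketch: the class names <code class="language-plaintext highlighter-rouge">Broken</code> and <code class="language-plaintext highlighter-rouge">Service</code> are hypothetical, and combining <code class="language-plaintext highlighter-rouge">ABCMeta</code> with <code class="language-plaintext highlighter-rouge">Singleton</code> through multiple inheritance is one possible workaround assumed here, not the approach taken from the Stack Overflow post.</p>

```python
from abc import ABC, ABCMeta


class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


# ABC uses the metaclass ABCMeta. Neither ABCMeta nor Singleton derives
# from the other, so Python refuses to create the class.
try:
    class Broken(ABC, metaclass=Singleton):
        pass
    conflict = None
except TypeError as exc:
    conflict = str(exc)


# A metaclass that derives from ABCMeta is compatible with ABC again.
class SingletonABCMeta(ABCMeta, Singleton):
    pass


class Service(ABC, metaclass=SingletonABCMeta):  # hypothetical example class
    pass


same_service = Service() is Service()
```

<p>The class statement for <code class="language-plaintext highlighter-rouge">Broken</code> raises a <code class="language-plaintext highlighter-rouge">TypeError</code>, while <code class="language-plaintext highlighter-rouge">Service</code> is created normally and still behaves as a singleton.</p>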

<h3 id="4-그냥-module-사용">4. Just use a module</h3>

<ul>
  <li>Pro: the singleton itself is an anti-pattern. It is no different from a global, so just use a module!</li>
  <li>Con: what about inheritance? Lazy evaluation? What if you later want to refactor to a non-singleton?</li>
</ul>

<h2 id="그래서">So..?</h2>

<p>Of these, approach 3 seems the best, but it needs additional work to behave correctly in a multi-threaded environment. It also needs some replacement for ABC.</p>

<p><code class="language-plaintext highlighter-rouge">ABC</code> is a thin helper class that does little itself and whose metaclass is <code class="language-plaintext highlighter-rouge">ABCMeta</code>. So the singleton metaclass also inherits from <code class="language-plaintext highlighter-rouge">ABCMeta</code>, which allows its subclasses to be abstract classes.</p>

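<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">threading</span>
<span class="kn">from</span> <span class="nn">abc</span> <span class="kn">import</span> <span class="n">ABCMeta</span>


<span class="k">class</span> <span class="nc">SingletonABCMeta</span><span class="p">(</span><span class="n">ABCMeta</span><span class="p">):</span>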
    <span class="n">_instances</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">_locks</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">mcls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">class_dict</span><span class="p">):</span>
        <span class="n">module</span> <span class="o">=</span> <span class="n">class_dict</span><span class="p">[</span><span class="s">'__module__'</span><span class="p">]</span>
        <span class="n">classname</span> <span class="o">=</span> <span class="n">class_dict</span><span class="p">[</span><span class="s">'__qualname__'</span><span class="p">]</span>
        <span class="n">mcls</span><span class="p">.</span><span class="n">_locks</span><span class="p">[</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">module</span><span class="si">}</span><span class="s">.</span><span class="si">{</span><span class="n">classname</span><span class="si">}</span><span class="s">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">threading</span><span class="p">.</span><span class="n">Lock</span><span class="p">()</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">class_dict</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">module</span> <span class="o">=</span> <span class="n">cls</span><span class="p">.</span><span class="n">__module__</span>
        <span class="n">classname</span> <span class="o">=</span> <span class="n">cls</span><span class="p">.</span><span class="n">__qualname__</span>
        <span class="n">name</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">module</span><span class="si">}</span><span class="s">.</span><span class="si">{</span><span class="n">classname</span><span class="si">}</span><span class="s">'</span>
        <span class="n">lock</span> <span class="o">=</span> <span class="n">cls</span><span class="p">.</span><span class="n">_locks</span><span class="p">[</span><span class="n">name</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">cls</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">lock</span><span class="p">.</span><span class="n">acquire</span><span class="p">()</span>
                <span class="k">if</span> <span class="n">cls</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">:</span>
                    <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">]</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__call__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
                    <span class="k">print</span><span class="p">(</span><span class="s">'Singleton class %r is instantiated'</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span>
                <span class="k">else</span><span class="p">:</span>  <span class="c1"># To call __init__ every time.
</span>                    <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">].</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
            <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="k">raise</span> <span class="nb">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s">'Fail to instantiate </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">'</span><span class="p">)</span> <span class="k">from</span> <span class="n">e</span>
            <span class="k">finally</span><span class="p">:</span>
                <span class="n">lock</span><span class="p">.</span><span class="n">release</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">cls</span><span class="p">.</span><span class="n">_instances</span><span class="p">[</span><span class="n">cls</span><span class="p">]</span>


<span class="k">class</span> <span class="nc">SingletonABC</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">SingletonABCMeta</span><span class="p">):</span>
    <span class="k">pass</span>


<span class="k">class</span> <span class="nc">Singleton</span><span class="p">(</span><span class="n">SingletonABC</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>
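
<p>To check the thread-safety claim, the idea can be exercised from several threads. The sketch below is a condensed variant, not the implementation above: it assumes a single lock shared by all classes instead of one lock per class, and the <code class="language-plaintext highlighter-rouge">Monitor</code> class, the <code class="language-plaintext highlighter-rouge">worker</code> function, and the thread count are hypothetical.</p>

```python
import threading
from abc import ABCMeta


class SingletonABCMeta(ABCMeta):
    """Condensed variant of the metaclass above, using one shared lock."""
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path without the lock
            with cls._lock:
                if cls not in cls._instances:  # re-check while holding the lock
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Monitor(metaclass=SingletonABCMeta):  # hypothetical example class
    init_calls = 0

    def __init__(self):
        Monitor.init_calls += 1


results = []


def worker():
    # Every thread asks for an instance; all should receive the same one.
    results.append(Monitor())


threads = [threading.Thread(target=worker) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct = {id(obj) for obj in results}
```

<p>All threads end up with the same object and <code class="language-plaintext highlighter-rouge">__init__</code> runs once. One trade-off of the shared non-reentrant lock in this sketch: if a singleton's <code class="language-plaintext highlighter-rouge">__init__</code> creates another singleton, it would deadlock, which the per-class locks in the original implementation avoid.</p>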

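<h2 id="근데-싱글톤-패턴-진짜-써야해">But do we really need the singleton pattern?</h2>

<p>With a global variable it is explicit that state is shared, whereas a singleton shares state implicitly by reusing an already-created instance, which is a problem.</p>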

<p>That said, I think it is fine when the values are, by their very meaning, state that should be shared globally (such as a Constant or Context), or when the object only passes data along (such as a Logger or Monitoring Service), because then it is less a matter of sharing state than of using the same configuration.</p>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="python" /><summary type="html"><![CDATA[Python thread-safe singleton]]></summary></entry><entry><title type="html">Reading snappy in local Spark (hadoop build)</title><link href="https://sungminoh.github.io/posts/development/build-hadoop-with-snappy-support/" rel="alternate" type="text/html" title="Reading snappy in local Spark (hadoop build)" /><published>2019-04-26T00:00:00+00:00</published><updated>2019-04-26T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/build-hadoop-with-snappy-support</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/build-hadoop-with-snappy-support/"><![CDATA[<p>When storing data in Hadoop, files are often compressed to reduce storage space and network transfer time during distributed processing.</p>

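<p>One commonly used compression codec is <a href="https://github.com/google/snappy">snappy</a>, developed by Google. Its compression ratio is lower than that of gzip (GNU zip), but it uses less CPU and compresses and decompresses quickly, so it reportedly performs better in real MapReduce workloads.</p>

<p>For that reason, many distributed systems install snappy and store data compressed with it. The problem arises when you try to read such a file in an environment that does not have snappy.</p>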

<p><br /></p>

<p><br /></p>

<p><br /></p>

<p>Suppose you install Spark on a MacBook with <code class="language-plaintext highlighter-rouge">brew</code> and try to read a snappy-compressed text file.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sc</span><span class="p">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">'text-data.snappy'</span><span class="p">)</span>
<span class="c1">## ERROR
## native snappy library not available: this version of libhadoop was built without snappy support.
</span></code></pre></div></div>

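<p>Normally there is no need to install Hadoop separately just to use Spark locally, but handling formats that Hadoop does not support out of the box requires adding a library.</p>

<p>To get that library, let's do what the error message suggests and build Hadoop ourselves with snappy support.</p>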

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="install-prerequisites">Install prerequisites</h2>

<p>Install the programs needed to build Hadoop.</p>

<h3 id="make-make-snappy">make, cmake, snappy</h3>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install make and cmake</span>
brew <span class="nb">install </span>make
brew <span class="nb">install </span>cmake

<span class="c"># Install snappy</span>
brew <span class="nb">install </span>snappy
</code></pre></div></div>

<h3 id="openssl">OpenSSL</h3>

<p>I am not sure whether this applies only to Hadoop 2.7.7, but building against OpenSSL 1.1 produces the following error.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variable has incomplete <span class="nb">type</span> <span class="s1">'HMAC_CTX'</span> <span class="o">(</span>aka <span class="s1">'hmac_ctx_st'</span><span class="o">)</span>
</code></pre></div></div>

<p>So install OpenSSL 1.0 and temporarily make it the default version.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install openssl@1.0</span>
<span class="c"># brew deprecated 1.0</span>
<span class="c"># brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/64555220bfbf4a25598523c2e4d3a232560eaad7/Formula/openssl.rb -f</span>
brew <span class="nb">install </span>rbenv/tap/openssl@1.0
<span class="nb">rm</span> <span class="nt">-rf</span> /usr/local/opt/openssl
<span class="nb">ln</span> <span class="nt">-s</span> /usr/local/Cellar/openssl@1.0/1.0.2t /usr/local/opt/openssl
</code></pre></div></div>

<h3 id="protobuf">protobuf</h3>

<p>Building Hadoop requires protobuf 2.5.0, so if you are using another version such as 3.5, temporarily unlink protoc or move it out of the way.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Download protobuf-2.5.0 must be 2.5.0</span>
wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
<span class="nb">tar </span>xzf protobuf-2.5.0.tar.gz
<span class="c"># Install protobuf</span>
<span class="nb">cd </span>protobuf-2.5.0
./configure
make
make <span class="nb">install</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">protoc</code> binary is now installed in <code class="language-plaintext highlighter-rouge">/usr/local/bin</code>, and the Hadoop build uses it.</p>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="build-hadoop-native-from-the-source-with-snappy-support">Build hadoop native from the source with snappy support</h2>

<p>First, download Hadoop. (2.7.10 does not work.)</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7-src.tar.gz
<span class="nb">tar </span>xzf hadoop-2.7.7-src.tar.gz
<span class="nb">cd </span>hadoop-2.7.7-src
</code></pre></div></div>

<p>Set the required environment variables as shown below and build.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Exports some env variables</span>
<span class="nb">export </span><span class="nv">JAVA_HOME</span><span class="o">=</span><span class="si">$(</span>/usr/libexec/java_home<span class="si">)</span>
<span class="nb">export </span><span class="nv">OPENSSL_ROOT_DIR</span><span class="o">=</span>/usr/local/opt/openssl
<span class="nb">export </span><span class="nv">OPENSSL_LIBRARIES</span><span class="o">=</span>/usr/local/opt/openssl/lib
<span class="c"># Build</span>
mvn package <span class="nt">-Pdist</span>,native <span class="nt">-DskipTests</span> <span class="nt">-Dtar</span> <span class="nt">-e</span>
<span class="c"># Move</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> /usr/local/Cellar/hadoop/hadoop-2.7.7
<span class="nb">cp</span> <span class="nt">-R</span> hadoop-dist/target/hadoop-2.7.7/lib /usr/local/Cellar/hadoop/hadoop-2.7.7
</code></pre></div></div>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="add-extra-driver-library-for-spark-to-read-snappy-file">Add extra driver library for spark to read snappy file</h2>

<p>Add the <code class="language-plaintext highlighter-rouge">extraLibraryPath</code> setting to <code class="language-plaintext highlighter-rouge">$SPARK_HOME/conf/spark-defaults.conf</code> as follows.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s1">'spark.driver.extraLibraryPath    /usr/local/Cellar/hadoop/hadoop-2.7.7/lib/native'</span> <span class="o">&gt;&gt;</span> <span class="nv">$SPARK_HOME</span>/conf/spark-defaults.conf
</code></pre></div></div>

<p>After these steps, reading the snappy-compressed file again should succeed.</p>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="spark" /><category term="hadoop" /><summary type="html"><![CDATA[When storing data in Hadoop, files are often compressed to reduce storage space and network transfer time during distributed processing.]]></summary></entry><entry><title type="html">Testing static classes</title><link href="https://sungminoh.github.io/posts/development/java-mock-static-method/" rel="alternate" type="text/html" title="Testing static classes" /><published>2019-04-26T00:00:00+00:00</published><updated>2019-04-26T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/java-mock-static-method</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/java-mock-static-method/"><![CDATA[<p>A test should ideally exercise a single method, but when that method calls other methods it is often unclear how far the test should reach. Consider the following example.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
  <span class="kd">public</span> <span class="kd">static</span> <span class="nc">Double</span> <span class="nf">getMax</span><span class="o">(</span><span class="nc">List</span><span class="o">&lt;</span><span class="nc">Double</span><span class="o">&gt;</span> <span class="n">numbers</span><span class="o">)</span> <span class="o">{}</span>
  <span class="kd">public</span> <span class="kd">static</span> <span class="nc">Double</span> <span class="nf">getMin</span><span class="o">(</span><span class="nc">List</span><span class="o">&lt;</span><span class="nc">Double</span><span class="o">&gt;</span> <span class="n">numbers</span><span class="o">)</span> <span class="o">{}</span>
  <span class="kd">public</span> <span class="kd">static</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Double</span><span class="o">&gt;</span> <span class="nf">normalize</span><span class="o">(</span><span class="nc">List</span><span class="o">&lt;</span><span class="nc">Double</span><span class="o">&gt;</span> <span class="n">numbers</span><span class="o">)</span> <span class="o">{</span>
    <span class="kt">double</span> <span class="n">max</span> <span class="o">=</span> <span class="n">getMax</span><span class="o">(</span><span class="n">numbers</span><span class="o">);</span>
    <span class="kt">double</span> <span class="n">min</span> <span class="o">=</span> <span class="n">getMin</span><span class="o">(</span><span class="n">numbers</span><span class="o">);</span>
    <span class="k">return</span> <span class="n">numbers</span><span class="o">.</span><span class="na">stream</span><span class="o">()</span>
      <span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">n</span> <span class="o">-&gt;</span> <span class="n">n</span> <span class="o">/</span> <span class="o">(</span><span class="n">max</span> <span class="o">-</span> <span class="n">min</span><span class="o">))</span>
      <span class="o">.</span><span class="na">collect</span><span class="o">(</span><span class="nc">Collectors</span><span class="o">.</span><span class="na">toList</span><span class="o">());</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>If you want to test the <code class="language-plaintext highlighter-rouge">normalize</code> method in the code above, should the test also account for what <code class="language-plaintext highlighter-rouge">getMax</code> and <code class="language-plaintext highlighter-rouge">getMin</code> return? The answer is actually simple. If testing <code class="language-plaintext highlighter-rouge">normalize</code> alone also meant checking that the other methods produce correct values, the <code class="language-plaintext highlighter-rouge">normalize</code> tests might have to change every time the logic of those methods changes.</p>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<p>A framework commonly used to remove dependencies between methods in test code is <a href="https://site.mockito.org/">mockito</a>. Mockito's <code class="language-plaintext highlighter-rouge">mock</code> method gives you an instance, and the <code class="language-plaintext highlighter-rouge">when</code> method lets you override the return values of method calls on that instance (this is called a method stub). Such a fake instance can be injected into the class under test (for testing DB connections, file systems, and so on), or used to test other methods of the same class (methods that call subroutines) more easily.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="kd">static</span> <span class="n">org</span><span class="o">.</span><span class="na">mockito</span><span class="o">.</span><span class="na">Mockito</span><span class="o">.*;</span>
<span class="c1">// mock creation</span>
<span class="nc">List</span> <span class="n">mockedList</span> <span class="o">=</span> <span class="n">mock</span><span class="o">(</span><span class="nc">List</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="n">when</span><span class="o">(</span><span class="n">mockedList</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">anyInt</span><span class="o">())).</span><span class="na">thenReturn</span><span class="o">(</span><span class="s">"first"</span><span class="o">);</span>
<span class="c1">// using mock object - it does not throw any "unexpected interaction" exception</span>
<span class="n">mockedList</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">"one"</span><span class="o">);</span>
<span class="n">mockedList</span><span class="o">.</span><span class="na">clear</span><span class="o">();</span>
<span class="c1">// selective, explicit, highly readable verification</span>
<span class="n">verify</span><span class="o">(</span><span class="n">mockedList</span><span class="o">).</span><span class="na">add</span><span class="o">(</span><span class="s">"one"</span><span class="o">);</span>
<span class="n">verify</span><span class="o">(</span><span class="n">mockedList</span><span class="o">).</span><span class="na">clear</span><span class="o">();</span>
<span class="c1">// mock method acts as we defined</span>
<span class="n">assertEquals</span><span class="o">(</span><span class="s">"first"</span><span class="o">,</span> <span class="n">mockedList</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="mi">0</span><span class="o">));</span>
</code></pre></div></div>

<p><br /></p>

<p>In Spring and in Java generally, the singleton pattern is used more often than static classes, so in most cases mockito is enough to write test code without trouble. Sometimes, though, a static class is unavoidable. How would you test a method of the <code class="language-plaintext highlighter-rouge">Foo</code> class from the earlier example?</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Foo</span> <span class="n">foo</span> <span class="o">=</span> <span class="n">mock</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>  <span class="c1">// ???</span>
</code></pre></div></div>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<p>None of mockito's APIs are much help here. You could implement a mock class by hand, as below, and test against it, but you cannot write such a class for every combination of methods you want to manipulate.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">MockFoo</span> <span class="kd">extends</span> <span class="nc">Foo</span> <span class="o">{</span>
  <span class="kd">private</span> <span class="kd">static</span> <span class="nc">Double</span> <span class="nf">getMax</span><span class="o">()</span> <span class="o">{</span>
    <span class="k">return</span> <span class="mf">3.0</span><span class="o">;</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>For cases like this there is a tool that can be used much like mockito: PowerMockito. Adding the dependency below makes it available.</p>

<div class="language-groovy highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">testCompile</span> <span class="nl">group:</span> <span class="s1">'org.powermock'</span><span class="o">,</span> <span class="nl">name:</span> <span class="s1">'powermock-api-mockito'</span><span class="o">,</span> <span class="nl">version:</span> <span class="s1">'1.7.3'</span>
</code></pre></div></div>

<p>Unlike mockito, though, it cannot be used quite so simply. The test class must be annotated with <code class="language-plaintext highlighter-rouge">@RunWith(PowerMockRunner.class)</code> and with <code class="language-plaintext highlighter-rouge">@PrepareForTest({Foo.class, AnotherClassToMock.class})</code>, which takes the classes to be stubbed as its argument. To ignore certain errors, <code class="language-plaintext highlighter-rouge">@PowerMockIgnore({})</code> can be used.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RunWith</span><span class="o">(</span><span class="nc">PowerMockRunner</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>  <span class="c1">// These two lines are</span>
<span class="nd">@PrepareForTest</span><span class="o">({</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="nc">UserGroupInformation</span><span class="o">.</span><span class="na">class</span><span class="o">})</span>  <span class="c1">// what is necessary</span>
<span class="nd">@PowerMockIgnore</span><span class="o">({</span><span class="s">"javax.management.*"</span><span class="o">,</span> <span class="s">"javax.xml."</span><span class="o">,</span> <span class="s">"org.w3c."</span><span class="o">,</span> <span class="s">"org.apache.apache._"</span><span class="o">,</span> <span class="s">"com.sun.*"</span><span class="o">})</span>  <span class="c1">// This isn't. This is to ignore some errors</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">FooTest</span> <span class="o">{</span>
  <span class="nd">@Test</span>
  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">normalizeTest</span><span class="o">()</span> <span class="o">{</span>
    <span class="c1">// Stub all methods. The others will return null or a default value.</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">mockStatic</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
    <span class="nc">Mockito</span><span class="o">.</span><span class="na">when</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">getMax</span><span class="o">()).</span><span class="na">thenReturn</span><span class="o">(</span><span class="mf">3.0</span><span class="o">);</span>

    <span class="c1">// Mock a method and leave the others.</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">spy</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">doReturn</span><span class="o">(</span><span class="mi">2</span><span class="o">.).</span><span class="na">when</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="s">"getMax"</span><span class="o">);</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">doReturn</span><span class="o">(</span><span class="mi">1</span><span class="o">.).</span><span class="na">when</span><span class="o">(</span><span class="nc">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="s">"getMin"</span><span class="o">);</span>

    <span class="c1">// To test the result</span>
    <span class="n">assertEquals</span><span class="o">(</span><span class="nc">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">.,</span> <span class="mi">3</span><span class="o">.,</span> <span class="mi">5</span><span class="o">.),</span> <span class="nc">Foo</span><span class="o">.</span><span class="na">normalize</span><span class="o">(</span><span class="nc">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">.,</span> <span class="mi">3</span><span class="o">.,</span> <span class="mi">5</span><span class="o">.)));</span>
    <span class="c1">// To count the number of method calls (each verifyStatic covers the next call only)</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">verifyStatic</span><span class="o">(</span><span class="nc">Mockito</span><span class="o">.</span><span class="na">times</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span>
    <span class="nc">Foo</span><span class="o">.</span><span class="na">getMax</span><span class="o">(</span><span class="nc">Mockito</span><span class="o">.</span><span class="na">any</span><span class="o">());</span>
    <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">verifyStatic</span><span class="o">(</span><span class="nc">Mockito</span><span class="o">.</span><span class="na">times</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span>
    <span class="nc">Foo</span><span class="o">.</span><span class="na">getMin</span><span class="o">(</span><span class="nc">Mockito</span><span class="o">.</span><span class="na">any</span><span class="o">());</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">mockStatic</code> replaces every method of the class, so methods not specified through <code class="language-plaintext highlighter-rouge">when</code> return <code class="language-plaintext highlighter-rouge">null</code> or a default value. To stub only certain methods and keep the rest as they are, create a <code class="language-plaintext highlighter-rouge">spy</code> and change the return values with <code class="language-plaintext highlighter-rouge">doReturn</code>.</p>

<p>To verify how many times a subroutine is called, use <code class="language-plaintext highlighter-rouge">verifyStatic</code>. This method takes only the expected count, not the method itself, and verifies the static method call that follows it, so each call to be verified needs its own <code class="language-plaintext highlighter-rouge">verifyStatic</code>.</p>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<p>There is one problem with PowerMockito, however: if the test code uses Spark, creating a <code class="language-plaintext highlighter-rouge">SparkContext</code> fails with an error like the one below.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java.io.IOException: failure to login
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:796)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2162)
    at org.apache.spark.SparkContext.&lt;init&gt;(SparkContext.scala:301)
</code></pre></div></div>

<p>This is caused by the <code class="language-plaintext highlighter-rouge">@RunWith</code> annotation. Since that annotation is required, it is better to additionally mock the part that causes the problem.</p>

<p>Following the stack trace shows that <code class="language-plaintext highlighter-rouge">SparkContext</code> calls <code class="language-plaintext highlighter-rouge">UserGroupInformation.getCurrentUser</code> to get the user, and that is where the error occurs. Stubbing this method as below lets <code class="language-plaintext highlighter-rouge">SparkContext</code> be created normally.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">UserGroupInformation</span> <span class="n">mock</span> <span class="o">=</span> <span class="nc">PowerMockito</span><span class="o">.</span><span class="na">mock</span><span class="o">(</span><span class="nc">UserGroupInformation</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="nc">PowerMockito</span><span class="o">.</span><span class="na">when</span><span class="o">(</span><span class="n">mock</span><span class="o">.</span><span class="na">getUserName</span><span class="o">()).</span><span class="na">thenReturn</span><span class="o">(</span><span class="s">"tester"</span><span class="o">);</span>
<span class="nc">PowerMockito</span><span class="o">.</span><span class="na">spy</span><span class="o">(</span><span class="nc">UserGroupInformation</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="nc">PowerMockito</span><span class="o">.</span><span class="na">doReturn</span><span class="o">(</span><span class="n">mock</span><span class="o">).</span><span class="na">when</span><span class="o">(</span><span class="nc">UserGroupInformation</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="s">"getCurrentUser"</span><span class="o">);</span>
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="java" /><summary type="html"><![CDATA[It is best for test code to test a single method, but when that method uses other methods it is often unclear how far the test should reach. Consider the following example.]]></summary></entry><entry><title type="html">MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings</title><link href="https://sungminoh.github.io/posts/development/MRNet-Product2Vec/" rel="alternate" type="text/html" title="MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings" /><published>2018-07-29T00:00:00+00:00</published><updated>2018-07-29T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/MRNet-Product2Vec</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/MRNet-Product2Vec/"><![CDATA[<p>This is an Amazon paper, published in September 2017, on how to build general-purpose, content-based product embeddings.</p>

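<p>Rather than using data such as users' view sequences or purchase sequences, the paper covers how to build embeddings from information intrinsic to the product alone, such as its title, description, and category.</p>

<p>Because this approach does not depend on user behavior, even a newly listed product can have an embedding, so it can be used to address the cold-start problem.</p>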

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="introduction">Introduction</h2>

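<p>Suppose we simply parse product titles, one-hot encode the words, and use the result as the product vector, that is, the embedding.</p>

<p>With product embeddings built this way, each vector is as long as the vocabulary (high dimension) and most of its elements are 0 (sparse). Such features could be used for problems like category classification or recommendation, but high-dimensional features are computationally inefficient and likely to cause overfitting. Nearest-neighbor search is not very meaningful either, because products whose words differ lie in entirely different dimensions, and using these features in deep learning leads to too many parameters. This approach is therefore limited as a way to obtain an embedding that represents a product and is broadly usable for other problems.</p>

<p>A useful product embedding for e-commerce should <strong>1. capture general-purpose signals about the product and 2. be usable in other product-related ML models</strong>.</p>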
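<p>As a minimal sketch of the sparsity problem described above, the snippet below builds 0/1 word vectors for two product titles. The titles are hypothetical and chosen only for illustration; they describe similar items but share no words.</p>

```python
# Hypothetical product titles, used only for illustration.
titles = ["red cotton shirt", "crimson linen blouse"]

# Vocabulary: one dimension per distinct word.
vocab = sorted({word for title in titles for word in title.split()})
index = {word: i for i, word in enumerate(vocab)}


def bag_of_words(title):
    """Return a 0/1 vector with a 1 for every word present in the title."""
    vec = [0] * len(vocab)
    for word in title.split():
        vec[index[word]] = 1
    return vec


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


a, b = (bag_of_words(t) for t in titles)
print(len(vocab))  # the vector length equals the vocabulary size
print(dot(a, b))   # 0: no shared words, so the vectors are orthogonal
```

<p>With a real catalog the vocabulary grows to tens of thousands of words, while each title still sets only a handful of entries to 1.</p>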

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="proposed-approach">Proposed Approach</h2>

<p>The paper proposes a method named MRNet-Product2Vec (<strong>M</strong>ulti-task <strong>R</strong>ecurrent Neural <strong>Net</strong>work <strong>Product to Vec</strong>tor).</p>

<p>The architecture can be summarized as follows.</p>

<ol>
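  <li>The words in the product title are <strong>vectorized with word2vec</strong>. (Words that appear at least 10 times are turned into 128-dimensional vectors with <a href="https://radimrehurek.com/gensim/">Gensim</a>.)</li>
  <li>These vectors are fed into a bidirectional RNN, and the two vectors <strong>produced by the RNN</strong> (one from each direction, since it is bidirectional) are concatenated into <strong>an embedding of length \(d\).</strong></li>
  <li>This embedding is the input to models that solve several classification and regression problems. (15 tasks are used in total.)</li>
  <li>Such a model is trained separately for each PG (product group). (There are 23 PGs in total. The embedding from step 2 is therefore a PG-specific embedding.)</li>
  <li>The embedding is expanded by setting the parts that correspond to other PGs to 0, and a <strong>sparse autoencoder</strong> with a fully connected hidden layer of size \(2d\) <strong>is trained</strong>. <strong>The resulting length-\(2d\) vector is used as the PG-agnostic embedding.</strong></li>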
</ol>

<p><br /></p>

<p><br /></p>

<h4 id="bidirectional-recurrent-neural-network">Bidirectional Recurrent Neural Network</h4>

<p>The bidirectional RNN uses <a href="https://en.wikipedia.org/wiki/Long_short-term_memory">LSTM</a> and is trained with <a href="https://en.wikipedia.org/wiki/Backpropagation_through_time">BPTT</a>. As in the figure below, the word vectors are fed to the RNN in both forward and backward order, and the results are concatenated to form the embedding.</p>

<p><br /></p>

<p><img src="/assets/img/2018-07-29-MRNet-Product2Vec/MRNet-Product2Vec-for-PG-specific.png" alt="MRNet-Product2Vec for PG-specific" /></p>

<p>Fig.1: (a) MRNet-Product2Vec for PG-specific</p>

<p><br /></p>

<p>In more detail, let \(h^f_t\) be the output of the \(t\)-th forward hidden layer and \(h^b_t\) the output of the backward hidden layer. The two can be written as follows.</p>

\[h^f_t = \phi(W^f x_t + U^f h^f_{t-1}) \\
h^b_t = \phi(W^b x_t + U^b h^b_{t-1}) \\
\text{where }U^f \text{ and } U^b \text{ are the recursive weight matrices.}\\
\phi \text{ is a nonlinearity such as tanh or ReLU.}\]

<p>The final embedding we use is \(h_T = [h^f_T,h^b_T]\), the concatenation of the last state from each direction.</p>
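<p>The recurrence above can be sketched in plain Python. This is only an illustration under simplifying assumptions: it uses a plain tanh cell as in the equations rather than the LSTM the paper trains, the weights are toy values, and the function names are made up for this example. The backward direction is obtained by running the same recurrence over the reversed sequence.</p>

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_last_state(xs, w, u):
    """Run h = tanh(W x + U h_prev) over xs and return the last state."""
    h = [0.0] * len(w)
    for x in xs:
        pre = [p + q for p, q in zip(matvec(w, x), matvec(u, h))]
        h = [math.tanh(z) for z in pre]
    return h


def bidirectional_embedding(xs, w_f, u_f, w_b, u_b):
    """Concatenate the last forward state and the last backward state."""
    forward = rnn_last_state(xs, w_f, u_f)
    backward = rnn_last_state(list(reversed(xs)), w_b, u_b)
    return forward + backward


# Toy weights (assumed): identity input weights, zero recurrent weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
words = [[1.0, 0.0], [0.0, 1.0]]  # two 2-dimensional word vectors
embedding = bidirectional_embedding(words, identity, zeros, identity, zeros)
print(len(embedding))  # 4: twice the hidden size of one direction
```

<p>Because each direction contributes half of the vector, a hidden size of \(d/2\) per direction gives the length-\(d\) embedding mentioned earlier.</p>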

<p><br /></p>

<p><br /></p>

<h4 id="sub-tasks">Sub tasks</h4>

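<p>Product attributes include ones that do not change over time, such as category, size, and material, and ones that do, such as price, view count, and reviews. The training tasks are designed to predict all of these attributes, but they cannot capture everything about a product, so the embedding may not perform well on a new task that was not used in training. The tasks used for training are listed below.</p>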

<p><br /></p>

<p>Table 1: Tasks used to train MRNet-Product2Vec.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Static</th>
      <th>Dynamic</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Classification</td>
      <td>Color, Size, Material, Subcategory, Item Type, Hazardous, Batteries, High Value, Target Gender</td>
      <td>Offer, Review</td>
    </tr>
    <tr>
      <td>Regression</td>
      <td>Weight</td>
      <td>Price, View Count</td>
    </tr>
    <tr>
      <td>Decoding</td>
      <td>TF-IDF representation (5000 dim.)</td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p><br /></p>

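<p>The loss is defined as the sum of the losses of the subtasks. For classification, a softmax is applied on top of the embedding and cross-entropy loss is computed; for regression, the embedding is mapped to a scalar and squared loss is computed. The decoding task likewise maps the embedding to 5000 dimensions (\(o_n = W_nh_T + b_n\)) and computes squared loss against the TF-IDF representation.</p>

<p>However, instead of computing the losses of all 15 tasks for every input (Joint Optimization), training <strong>picks one task at random each time (Alternating Optimization)</strong>, because a label may be missing for some tasks. When doing this, the tasks should be used evenly so that training is not biased toward any of them.</p>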
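<p>The task selection can be sketched as follows. This is an assumption about one reasonable way to do it, not the paper's code: the task names, the label dictionary, and the uniform choice among available tasks are all hypothetical.</p>

```python
import random


def pick_task(labels, rng):
    """Pick uniformly among the tasks that have a label for this product."""
    available = sorted(task for task, value in labels.items() if value is not None)
    return rng.choice(available)


# Hypothetical labels for one product; None marks a missing label.
labels = {"color": "red", "weight": None, "price": 12.5}
rng = random.Random(0)
counts = {"color": 0, "price": 0}
for _ in range(1000):
    counts[pick_task(labels, rng)] += 1
print(counts)  # "weight" is never chosen because it has no label
```

<p>Only the loss of the chosen task would be backpropagated for that input, and the roughly even counts reflect the requirement that tasks be used evenly.</p>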

<p><br /></p>

<p><br /></p>

<h4 id="pgproduct-group-agnostic-embeddings">PG(Product Group) Agnostic Embeddings</h4>

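<p>The model could have been built without regard to PG from the start. But attributes such as price distribution, material, and size differ so much between PGs that such a model would not learn the attributes within each PG in fine detail. Therefore <strong>a PG-specific embedding is trained for each of the 23 PGs, and a separate PG-agnostic embedding is built with a sparse autoencoder.</strong></p>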

<p><br /></p>

<p><img src="/assets/img/2018-07-29-MRNet-Product2Vec/Sparse-Autoencoder-for-PG-agnostic-embeddings.png" alt="Sparse Autoencoder for PG-agnostic embeddings" /></p>

<p>Fig.1: (b) Sparse Autoencoder for PG-agnostic embeddings</p>

<p><br /></p>

<p>The autoencoder takes as input a vector obtained by expanding the PG-specific embedding.</p>

\[\text{Input} = [0,...,0,e^p_1, ...,e^p_d,0,...,0] \text{ (length of } P*d \text{)}\\
P: \text{the number of PGs}\\
d: \text{the length of PG-specific embedding}\]

<p>In the middle sits a layer twice as long as the PG-specific embedding, and this length-\(2d\) vector is used as the PG-agnostic embedding.</p>
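<p>The input expansion can be sketched as below. The function name and the toy sizes are assumptions made for illustration; the layout follows the input definition above, with the embedding placed in the slot of its own PG and zeros everywhere else.</p>

```python
def expand_embedding(embedding, pg_index, num_pgs):
    """Place a length-d embedding in slot pg_index of a zero vector of length P*d."""
    d = len(embedding)
    expanded = [0.0] * (num_pgs * d)
    start = pg_index * d
    expanded[start:start + d] = embedding
    return expanded


# Toy sizes (assumed): P = 3 product groups, d = 2.
print(expand_embedding([0.5, -1.0], pg_index=1, num_pgs=3))
# [0.0, 0.0, 0.5, -1.0, 0.0, 0.0]
```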

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="experimental-results">Experimental Results</h2>

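<p>Each PG-specific model is trained on up to 1M training examples per PG. On a single Grid K520 GPU, one epoch takes about 30 minutes.</p>

<p>To build the PG-agnostic embedding, 500K examples are sampled at random from each PG (11.5M in total) and used as training data. One epoch takes about 20 minutes.</p>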

<p><br /></p>

<p>Validation uses five classification problems that differ from the tasks used in training.</p>

<ul>
  <li>Plugs: whether the product has a plug</li>
  <li>SIOC (<strong>S</strong>hip <strong>I</strong>n its <strong>O</strong>wn <strong>C</strong>ontainer): whether the product ships in its own box</li>
  <li>Ingestible: whether the product is edible</li>
  <li>Browse Categories: classification into subcategories within a PG</li>
</ul>

<p>In addition, one more validation checks whether similar products are captured well even when their titles do not overlap.</p>

<ul>
  <li>SIOC (unseen): the SIOC task is run with the training and validation data split so that the number of words shared between product titles stays below a certain threshold.</li>
</ul>

<p>For SIOC (unseen) the AUC is computed, and for the four classification problems above the average AUC of five-fold validation is computed.</p>

<p><br /></p>

<p>Table 2: Results on five classification tasks. RF: Random Forest, LR: Logistic Regression. TF-IDF dim.: &gt;10K, MRNet-Product2Vec dim.: 256 and 128. All numbers are relative w.r.t TF-IDF-LR.</p>

<table>
  <thead>
    <tr>
      <th>Task</th>
      <th>MRNet-RF</th>
      <th>MRNet-LR</th>
      <th>TF-IDF-RF</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Plugs</td>
      <td>-2.8%</td>
      <td>-9.72%</td>
      <td>-2.8%</td>
    </tr>
    <tr>
      <td>SIOC</td>
      <td>-5.81%</td>
      <td>-18.60%</td>
      <td>-9.3%</td>
    </tr>
    <tr>
      <td>Browse Categories</td>
      <td>-16.67%</td>
      <td>-26.38%</td>
      <td>-25.0%</td>
    </tr>
    <tr>
      <td>Ingestible</td>
      <td>0%</td>
      <td><strong>+2.15%</strong></td>
      <td>-11.8%</td>
    </tr>
    <tr>
      <td>SIOC (unseen)</td>
      <td><strong>+10%</strong></td>
      <td>0%</td>
      <td>-3.33%</td>
    </tr>
  </tbody>
</table>

<p><br /></p>

<p>The results may look poor, but they are quite good considering that the embedding has only about 3% of the dimensions of TF-IDF.</p>

<p>The nearest-neighbor results are not bad either.</p>

<p><br /></p>

<p><img src="/assets/img/2018-07-29-MRNet-Product2Vec/Nearest-neighbors-computed-using-MRNet-Product2Vec.png" alt="Nearest neighbors computed using MRNet-Product2Vec" /></p>

<p>Fig. 2: Nearest neighbors computed using MRNet-Product2Vec for each query product (first column) (best viewed in electronic copy).</p>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="pg-agnostic-mrnet-product2vec">Language Agnostic MRNet-Product2Vec</h2>

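<p>In global e-commerce, product information sometimes has to be compared and searched regardless of language. For example, a seller listing a product may want to know which similar products are sold in a neighboring country and how, and a customer may want to find products in other countries that resemble the ones they like. Also, when training a new ML model, label data may not exist for some languages.</p>

<p>Product embeddings should therefore be independent of language. Here a variant of the multimodal autoencoder is used to map product embeddings trained in specific languages into a common space.</p>

<p>Let \(x^{UK}_p\) be the length-256 embedding of the \(p\)-th product trained on English, and \(x^{FR}_p\) the length-256 embedding of the same product trained on French. The autoencoder is trained on the following three datasets.</p>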

<ol>
  <li>
\[x_p = [x^{UK}_p, 0], \space y_p = [0,x^{FR}_p]\]
  </li>
  <li>
\[x_p = [0,x^{FR}_p], \space y_p = [x^{UK}_p, 0]\]
  </li>
  <li>
\[x_p = [x^{UK}_p, x^{FR}_p], \space y_p = [x^{UK}_p, x^{FR}_p]\]
  </li>
</ol>
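<p>The three datasets listed above can be sketched as a small helper that builds the (input, target) pairs for one product. The function name and the toy vectors are assumptions for illustration; real embeddings would have length 256 each.</p>

```python
def make_pairs(x_uk, x_fr):
    """Return the three (input, target) pairs for one product."""
    zeros_uk = [0.0] * len(x_uk)
    zeros_fr = [0.0] * len(x_fr)
    return [
        (x_uk + zeros_fr, zeros_uk + x_fr),  # English in, French out
        (zeros_uk + x_fr, x_uk + zeros_fr),  # French in, English out
        (x_uk + x_fr, x_uk + x_fr),          # both in, both out
    ]


# Toy length-2 embeddings standing in for the length-256 ones.
pairs = make_pairs([1.0, 2.0], [3.0, 4.0])
for x, y in pairs:
    print(x, y)
```

<p>Training on all three forces the hidden layer to reconstruct either language from the other, which is what places both in a common space.</p>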

<p><br /></p>

<p><img src="/assets/img/2018-07-29-MRNet-Product2Vec/Architecture-of-Multimodal-Autoencoder.png" alt="Architecture of Multimodal Autoencoder" /></p>

<p>Fig. 3: Language Agnostic MRNet-Product2Vec (a) Architecture of Multimodal Autoencoder</p>

<p><br /></p>

<p><img src="/assets/img/2018-07-29-MRNet-Product2Vec/Nearest-neighbors-from-UK-products.png" alt="Nearest neighbors from UK products" /></p>

<p>Fig. 3: Language Agnostic MRNet-Product2Vec (b) Nearest neighbors from UK products</p>

<p><br /></p>

<p><br /></p>

<p><br /></p>

<h2 id="discussion-and-future-work">Discussion and Future Work</h2>

<p>The model has been trained on 2B products and is being used successfully.</p>

<p>Because only product titles are used, it can address the cold-start problem.</p>

<p>Another advantage is that more tasks can be added whenever needed.</p>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="paper" /><category term="ml" /><summary type="html"><![CDATA[This is an Amazon paper, published in September 2017, on how to build general-purpose, content-based product embeddings.]]></summary></entry><entry><title type="html">Setting up an Ubuntu EC2 instance</title><link href="https://sungminoh.github.io/posts/development/setup-new-instance/" rel="alternate" type="text/html" title="Setting up an Ubuntu EC2 instance" /><published>2018-07-18T00:00:00+00:00</published><updated>2018-07-18T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/setup-new-instance</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/setup-new-instance/"><![CDATA[<h3 id="password">Password</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Set a password for the root account
passwd root
sudo vi /etc/ssh/sshd_config

# Set the two options below to yes, then save and close
PermitRootLogin yes
PasswordAuthentication yes

# Restart sshd
sudo service ssh restart
</code></pre></div></div>

<h3 id="jenkins"><a href="https://wiki.jenkins.io/display/JENKINS/Installing+Jenkins+on+Ubuntu">jenkins</a></h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget -q -O - https://pkg.jenkins.io/debian/jenkins-ci.org.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins.io/debian-stable binary/ &gt; /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins
</code></pre></div></div>

<h3 id="hostname">hostname</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /etc/hosts
127.0.1.1 www.hostname.com

# /etc/hostname
www.hostname.com

sudo hostnamectl set-hostname www.hostname.com
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="linux" /><summary type="html"><![CDATA[Password ``` Set a password for the root account passwd root sudo vi /etc/ssh/sshd_config]]></summary></entry><entry><title type="html">Installing Python on Ubuntu</title><link href="https://sungminoh.github.io/posts/development/install-python-in-ubuntu/" rel="alternate" type="text/html" title="Installing Python on Ubuntu" /><published>2018-07-08T00:00:00+00:00</published><updated>2018-07-08T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/install-python-in-ubuntu</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/install-python-in-ubuntu/"><![CDATA[<h2 id="python-27--pip2-설치">Installing python 2.7 &amp; pip2</h2>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>python-software-properties <span class="nt">-y</span>
</code></pre></div></div>

<p>Install pip, which is used to download packages.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>python-pip <span class="nt">-y</span>
pip list
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
You are using pip version 8.1.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
</code></pre></div>  </div>
</blockquote>

<p><br /></p>

<p>Upgrading as the message suggests leads to the error <code class="language-plaintext highlighter-rouge">cannot import name main</code>.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">--upgrade</span> pip
pip list
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in &lt;module&gt;
    from pip import main
ImportError: cannot import name main
</code></pre></div>  </div>
</blockquote>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python <span class="nt">-m</span> pip <span class="nt">--version</span>
pip 10.0.1 from /home/ubuntu/.local/lib/python2.7/site-packages/pip <span class="o">(</span>python 2.7<span class="o">)</span>
</code></pre></div></div>

<p><br /></p>

<p>This happens because <code class="language-plaintext highlighter-rouge">pip install --upgrade pip</code> installs the new pip under <code class="language-plaintext highlighter-rouge">~/.local</code>, while the <code class="language-plaintext highlighter-rouge">pip</code> command still runs the old pip script in <code class="language-plaintext highlighter-rouge">/usr/bin</code>.</p>

<p>Let's check.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls</span> <span class="nt">-l</span> /usr/bin | <span class="nb">grep </span>pip
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rwxr-xr-x  1 root   root     292 Nov 10  2016 pip
-rwxr-xr-x  1 root   root     283 Nov 10  2016 pip2
</code></pre></div>  </div>
</blockquote>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">head</span> /usr/bin/pip2
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/python
# EASY-INSTALL-ENTRY-SCRIPT: 'pip==8.1.1','console_scripts','pip2'
__requires__ = 'pip==8.1.1'
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.exit(
        load_entry_point('pip==8.1.1', 'console_scripts', 'pip2')()
    )
</code></pre></div>  </div>
</blockquote>

<p><br /></p>

<p>So upgrade with <code class="language-plaintext highlighter-rouge">sudo pip install --upgrade pip</code> instead. The pip command then bypasses the pip scripts in <code class="language-plaintext highlighter-rouge">/usr/bin/</code> and runs the new executable created in <code class="language-plaintext highlighter-rouge">/usr/local/bin/</code>.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>which pip
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/local/bin/pip
</code></pre></div>  </div>
</blockquote>

<p><br />
<br /></p>

<h2 id="python-36--pip3-설치">Installing python 3.6 &amp; pip3</h2>

<p>On Ubuntu versions earlier than 16.10, python3 is installed as version 3.5.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls</span> <span class="nt">-l</span> /usr/bin | <span class="nb">grep </span>python
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
lrwxrwxrwx 1 root   root           9 Mar 23  2016 python3 -&gt; python3.5
-rwxr-xr-x 2 root   root     4464400 Nov 28  2017 python3.5
...
</code></pre></div>  </div>
</blockquote>

<p>Install pip3 as well and upgrade it.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>python3-pip <span class="nt">-y</span>
<span class="nb">sudo </span>pip3 <span class="nb">install</span> <span class="nt">--upgrade</span> pip  <span class="c"># upgrade with sudo</span>
</code></pre></div></div>

<p><br /></p>

<p>Let's install python 3.6 to use nice features like the f-string below.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">f</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">kwargs</span><span class="si">}</span><span class="s">) is called'</span><span class="p">)</span>
</code></pre></div></div>

<p>Just add the PPA and install.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>add-apt-repository ppa:jonathonf/python-3.6
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get <span class="nb">install </span>python3.6 <span class="nt">-y</span>
</code></pre></div></div>

<p><br /></p>

<p>Switch the default python3 to python3.6:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo rm</span> <span class="nt">-f</span> /usr/bin/python3
<span class="nb">sudo ln</span> <span class="nt">-s</span> /usr/bin/python3.6 /usr/bin/python3
</code></pre></div></div>

<p>Then switch pip to python3.6 as well.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://bootstrap.pypa.io/get-pip.py | <span class="nb">sudo </span>python3.6
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="python" /><category term="linux" /><summary type="html"><![CDATA[Installing python 2.7 &amp; pip2]]></summary></entry><entry><title type="html">Using per-module loggers</title><link href="https://sungminoh.github.io/posts/development/python-module-logger/" rel="alternate" type="text/html" title="Using per-module loggers" /><published>2018-07-08T00:00:00+00:00</published><updated>2018-07-08T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/python-module-logger</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/python-module-logger/"><![CDATA[<p>Write a separate function that creates the logger.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">logging</span>


<span class="k">def</span> <span class="nf">get_logger</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">logfile</span><span class="o">=</span><span class="s">'/dev/null'</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">stream</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
    <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="n">getLogger</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">level</span><span class="p">)</span>
    <span class="n">formatter</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="n">Formatter</span><span class="p">(</span><span class="s">'%(asctime)s %(levelname)-5s %(lineno)4s:%(filename)-20s - %(message)s'</span><span class="p">)</span>
    <span class="c1"># file handler
</span>    <span class="n">file_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="n">FileHandler</span><span class="p">(</span><span class="n">logfile</span><span class="p">)</span>
    <span class="n">file_handler</span><span class="p">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">level</span><span class="p">)</span>
    <span class="n">file_handler</span><span class="p">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
    <span class="n">logger</span><span class="p">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">file_handler</span><span class="p">)</span>
    <span class="c1"># stream handler
</span>    <span class="k">if</span> <span class="n">stream</span><span class="p">:</span>
        <span class="n">stream_handler</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="n">StreamHandler</span><span class="p">()</span>
        <span class="n">stream_handler</span><span class="p">.</span><span class="n">setLevel</span><span class="p">(</span><span class="n">level</span><span class="p">)</span>
        <span class="n">stream_handler</span><span class="p">.</span><span class="n">setFormatter</span><span class="p">(</span><span class="n">formatter</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="n">addHandler</span><span class="p">(</span><span class="n">stream_handler</span><span class="p">)</span>
    <span class="c1"># binding
</span>    <span class="k">return</span> <span class="n">logger</span>
</code></pre></div></div>
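<p>One thing to watch for: <code class="language-plaintext highlighter-rouge">logging.getLogger</code> returns the same object for the same name, so calling a factory like this twice with one name stacks a second handler and every message is written twice. Below is a minimal sketch of a guard against that. <code class="language-plaintext highlighter-rouge">get_logger_once</code>, its shortened format string, and the <code class="language-plaintext highlighter-rouge">demo.module</code> name are hypothetical and only serve the illustration.</p>

```python
import logging


def get_logger_once(name, level=logging.DEBUG):
    """Hypothetical variant of get_logger that attaches its handler only once."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # getLogger hands back the same object for the same name, so an
    # unguarded addHandler call would add one more handler on every call.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter('%(levelname)s %(name)s - %(message)s'))
        logger.addHandler(handler)
    return logger


first = get_logger_once('demo.module')
second = get_logger_once('demo.module')
print(first is second, len(first.handlers))
```

<p>Both calls return the same logger, and it still carries a single handler after the second call.</p>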

<p><br /></p>

<p>At the top of each module, create the logger as shown below and use it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span> <span class="o">=</span> <span class="n">get_logger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="python" /><summary type="html"><![CDATA[Write a separate function that builds the logger.]]></summary></entry><entry><title type="html">Recording function execution time with a decorator</title><link href="https://sungminoh.github.io/posts/development/python-timer-decorator/" rel="alternate" type="text/html" title="Recording function execution time with a decorator" /><published>2018-07-08T00:00:00+00:00</published><updated>2018-07-08T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/python-timer-decorator</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/python-timer-decorator/"><![CDATA[<h2 id="decorator로-함수-실행시간-로깅하고-이메일-알람보내기">Logging function execution time and sending email alerts with a decorator</h2>

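<p>I often kick off a data preprocessing or training job and check the result a few days later, only to find that it failed partway through or finished much earlier than expected.</p>

<p>If I could tell right away whether the function has finished, and whether it succeeded or failed, I could move on to the next task or debug and rerun it without delay.</p>

<p>This post builds a decorator that records a function's execution time and then extends it to send an email notification.</p>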

<h3 id="1단계-함수-실행시간을-기록해주는-아주-간단한-데코레이터를-만들어보자">Step 1: build a very simple decorator that records a function's execution time.</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">wraps</span>

<span class="k">def</span> <span class="nf">timed</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
    <span class="o">@</span><span class="n">wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'Success. </span><span class="si">{</span><span class="n">end</span><span class="o">-</span><span class="n">start</span><span class="si">}</span><span class="s"> taken for </span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">return</span> <span class="n">wrapper</span>

<span class="o">@</span><span class="n">timed</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Hello World'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; foo()
Hello World
Success. 0:00:00.000023 taken for foo
</code></pre></div></div>
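<p><code class="language-plaintext highlighter-rouge">datetime.now()</code> reads the wall clock, which can jump if the system time is adjusted during a long job. As a sketch, assuming only the elapsed duration matters, the same decorator can use <code class="language-plaintext highlighter-rouge">time.perf_counter</code> and record the duration in a <code class="language-plaintext highlighter-rouge">finally</code> block. The names <code class="language-plaintext highlighter-rouge">timed_monotonic</code>, <code class="language-plaintext highlighter-rouge">last_elapsed</code> and <code class="language-plaintext highlighter-rouge">add</code> are made up for this example.</p>

```python
import time
from functools import wraps


def timed_monotonic(func):
    """Hypothetical variant of timed that measures with a monotonic clock."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on failure, so the duration is always recorded.
            wrapper.last_elapsed = time.perf_counter() - start
            print(f'{wrapper.last_elapsed:.6f}s taken for {func.__name__}')
    wrapper.last_elapsed = None
    return wrapper


@timed_monotonic
def add(a, b):
    return a + b


result = add(1, 2)
```

<p>Keeping the last duration on the wrapper is just one convenient choice here; it makes the measurement available to the caller without parsing the printed line.</p>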

<p><br /></p>

<p>When writing a decorator, keep in mind that the function's metadata is lost unless the wrapper is wrapped with <code class="language-plaintext highlighter-rouge">functools.wraps</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">nothing</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">func_name_will_be_overwritten</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">func_name_will_be_overwritten</span>

<span class="o">@</span><span class="n">nothing</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Hello World'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; print(foo.__name__)
func_name_will_be_overwritten
</code></pre></div></div>
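<p>For comparison, here is the same do-nothing decorator with <code class="language-plaintext highlighter-rouge">functools.wraps</code> applied. The names <code class="language-plaintext highlighter-rouge">keeps_metadata</code> and <code class="language-plaintext highlighter-rouge">greet</code> are only for this sketch.</p>

```python
from functools import wraps


def keeps_metadata(func):
    @wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@keeps_metadata
def greet():
    """Print a greeting."""
    print('Hello World')


print(greet.__name__)              # name of the decorated function, not 'wrapper'
print(greet.__doc__)               # docstring is preserved as well
print(greet.__wrapped__ is greet)  # False: __wrapped__ points at the original function
```

<p><code class="language-plaintext highlighter-rouge">wraps</code> also sets <code class="language-plaintext highlighter-rouge">__wrapped__</code>, so the undecorated function stays reachable when needed.</p>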

<p><br /></p>

<p><br /></p>

<h3 id="2단계-별도의-로거를-파라메터로-받아-로깅한다">Step 2: accept a separate logger as a parameter and log with it.</h3>

<p>To write a decorator that takes its own parameters, wrap it in one more layer.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">logging</span>

<span class="k">def</span> <span class="nf">timed</span><span class="p">(</span><span class="n">logger</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">log</span> <span class="o">=</span> <span class="k">print</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">logging</span><span class="p">.</span><span class="n">getLoggerClass</span><span class="p">()):</span>
        <span class="n">log</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">m</span><span class="p">:</span> <span class="n">logger</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">level</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="o">@</span><span class="n">wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
            <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
            <span class="n">end</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'Success. </span><span class="si">{</span><span class="n">end</span><span class="o">-</span><span class="n">start</span><span class="si">}</span><span class="s"> taken for </span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">result</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">return</span> <span class="n">decorator</span>

<span class="o">@</span><span class="n">timed</span><span class="p">(</span><span class="n">logger</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Hello World'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; foo()
Hello World
2018-07-08 16:02:23,926 DEBUG    4:test.py - Success. 0:00:00.000025 taken for foo
</code></pre></div></div>
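<p>The extra layer is easier to follow when the <code class="language-plaintext highlighter-rouge">@</code> syntax is written out by hand. The sketch below uses a made-up <code class="language-plaintext highlighter-rouge">repeat</code> decorator rather than <code class="language-plaintext highlighter-rouge">timed</code>, so it runs without a logger.</p>

```python
from functools import wraps


def repeat(times=2):
    """Outer layer: receives the decorator's own parameters."""
    def decorator(func):
        """Middle layer: receives the function being decorated."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            """Inner layer: receives the call arguments."""
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


def hello():
    return 'Hello'


# Writing @repeat(times=3) above a def is equivalent to these two explicit steps.
decorator = repeat(times=3)
hello3 = decorator(hello)
print(hello3())
```

<p>The first call consumes the parameters and returns the real decorator, and only the second call sees the function.</p>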

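<p>Unfortunately, a decorator that takes its own parameters like this always needs <code class="language-plaintext highlighter-rouge">()</code>. If you want to drop the parentheses when using the default parameters, the only option is a workaround like the one below.</p>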

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">decorator_with_optional_parameter</span><span class="p">(</span><span class="n">param</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="o">@</span><span class="n">wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
            <span class="k">print</span><span class="p">(</span><span class="n">param</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">if</span> <span class="nb">callable</span><span class="p">(</span><span class="n">param</span><span class="p">):</span>  <span class="c1"># If param is callable, assume the decorator was used without () and received the function directly.
</span>        <span class="k">return</span> <span class="n">decorator</span><span class="p">(</span><span class="n">param</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">decorator</span>

<span class="o">@</span><span class="n">decorator_with_optional_parameter</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'Hello'</span><span class="p">)</span>

<span class="o">@</span><span class="n">decorator_with_optional_parameter</span>
<span class="k">def</span> <span class="nf">bar</span><span class="p">():</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'World'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; foo()
None
Hello
&gt;&gt;&gt; bar()
&lt;function bar at 0x10778ea60&gt;
World
</code></pre></div></div>
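<p>The <code class="language-plaintext highlighter-rouge">callable</code> check breaks down when the parameter itself may legitimately be a function. A common alternative, sketched here with a hypothetical <code class="language-plaintext highlighter-rouge">tagged</code> decorator, is to make every option keyword-only and reserve the single positional slot for the decorated function.</p>

```python
from functools import partial, wraps


def tagged(func=None, *, tag='default'):
    """Hypothetical decorator usable as @tagged, @tagged() or @tagged(tag=...)."""
    if func is None:
        # Used with parentheses: return a decorator that remembers the options.
        return partial(tagged, tag=tag)

    @wraps(func)
    def wrapper(*args, **kwargs):
        return (tag, func(*args, **kwargs))
    return wrapper


@tagged
def foo():
    return 'Hello'


@tagged(tag='custom')
def bar():
    return 'World'


@tagged()
def baz():
    return '!'


print(foo(), bar(), baz())
```

<p>Because options can only be passed by keyword, a positional argument is always the function, and no guessing is needed.</p>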

<p><br /></p>

<p><br /></p>

<h3 id="3단계-caller-parameter와-같은-좀-더-많은-정보를-기록한다">Step 3: record more information, such as the caller and the parameters.</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">inspect</span> <span class="kn">import</span> <span class="n">getframeinfo</span><span class="p">,</span> <span class="n">currentframe</span>

<span class="k">def</span> <span class="nf">timed</span><span class="p">(</span><span class="n">logger</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">log</span> <span class="o">=</span> <span class="k">print</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">logging</span><span class="p">.</span><span class="n">getLoggerClass</span><span class="p">()):</span>
        <span class="n">log</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">m</span><span class="p">:</span> <span class="n">logger</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">level</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="o">@</span><span class="n">wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
            <span class="c1"># Build the parameter description used in the log messages.
</span>            <span class="n">parameters</span> <span class="o">=</span> <span class="p">[</span><span class="nb">repr</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">kwargs</span><span class="p">:</span>
                <span class="n">parameters</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">kwargs</span><span class="p">))</span>
            <span class="n">funcname</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">parameters</span><span class="p">)</span><span class="si">}</span><span class="s">)"</span>

            <span class="c1"># Log the caller information.
</span>            <span class="n">caller</span> <span class="o">=</span> <span class="n">getframeinfo</span><span class="p">(</span><span class="n">currentframe</span><span class="p">().</span><span class="n">f_back</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">caller</span><span class="p">:</span>
                <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s"> is called by </span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">function</span><span class="si">}</span><span class="s"> in </span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">lineno</span><span class="si">}</span><span class="s">:</span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">filename</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s"> is called'</span><span class="p">)</span>

            <span class="c1"># Record the time and run the function.
</span>            <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
            <span class="n">end</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'Success. </span><span class="si">{</span><span class="n">end</span><span class="o">-</span><span class="n">start</span><span class="si">}</span><span class="s"> taken for </span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">result</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">return</span> <span class="n">decorator</span>

<span class="o">@</span><span class="n">timed</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'Hello World </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; foo('John')
foo('John') is called by &lt;module&gt; in 40:test.py
Hello World John
Success. 0:00:00.000013 taken for foo('John')
</code></pre></div></div>
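<p>The description above joins the <code class="language-plaintext highlighter-rouge">repr</code> of each positional argument and appends the whole <code class="language-plaintext highlighter-rouge">kwargs</code> dict. If parameter names and default values should appear in the log too, one option is <code class="language-plaintext highlighter-rouge">inspect.signature</code>. This is only a sketch; <code class="language-plaintext highlighter-rouge">describe_call</code> and the <code class="language-plaintext highlighter-rouge">greeting</code> parameter are hypothetical.</p>

```python
import inspect


def describe_call(func, args, kwargs):
    """Render a call as name(param=value, ...) using the function's signature."""
    # bind raises TypeError when the arguments do not match the signature.
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters the caller left at their defaults
    rendered = ', '.join(f'{key}={value!r}' for key, value in bound.arguments.items())
    return f'{func.__name__}({rendered})'


def foo(name, greeting='Hello'):
    return f'{greeting} {name}'


print(describe_call(foo, ('John',), {}))
```

<p>Inside a decorator, <code class="language-plaintext highlighter-rouge">bind</code> would fail before the wrapped function is called if the arguments are wrong, so the call may need its own <code class="language-plaintext highlighter-rouge">try</code> block.</p>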

<p><br /></p>

<p><br /></p>

<h3 id="4단계-함수-성공실패시-이메일을-보내도록-한다">Step 4: send an email when the function succeeds or fails.</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">traceback</span>

<span class="k">def</span> <span class="nf">timed</span><span class="p">(</span><span class="n">logger</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">level</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">notify</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
    <span class="n">log</span> <span class="o">=</span> <span class="k">print</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">logger</span><span class="p">,</span> <span class="n">logging</span><span class="p">.</span><span class="n">getLoggerClass</span><span class="p">()):</span>
        <span class="n">log</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">m</span><span class="p">:</span> <span class="n">logger</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">level</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="o">@</span><span class="n">wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
            <span class="c1"># Build the parameter description used in the log messages.
</span>            <span class="n">parameters</span> <span class="o">=</span> <span class="p">[</span><span class="nb">repr</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">kwargs</span><span class="p">:</span>
                <span class="n">parameters</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">kwargs</span><span class="p">))</span>
            <span class="n">funcname</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">func</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">parameters</span><span class="p">)</span><span class="si">}</span><span class="s">)"</span>

            <span class="c1"># Log the caller information.
</span>            <span class="n">caller</span> <span class="o">=</span> <span class="n">getframeinfo</span><span class="p">(</span><span class="n">currentframe</span><span class="p">().</span><span class="n">f_back</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">caller</span><span class="p">:</span>
                <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s"> is called by </span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">function</span><span class="si">}</span><span class="s"> in </span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">lineno</span><span class="si">}</span><span class="s">:</span><span class="si">{</span><span class="n">caller</span><span class="p">.</span><span class="n">filename</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s"> is called'</span><span class="p">)</span>

            <span class="c1"># Record the time and run the function.
</span>            <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="c1"># Send the appropriate email depending on whether the function succeeds or fails.
</span>            <span class="k">try</span><span class="p">:</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
            <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="n">end</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
                <span class="n">msg</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'Fail. </span><span class="si">{</span><span class="n">end</span><span class="o">-</span><span class="n">start</span><span class="si">}</span><span class="s"> taken for </span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s">: '</span>
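                <span class="n">log</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">msg</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>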
                <span class="n">send_email</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="n">traceback</span><span class="p">.</span><span class="n">format_exc</span><span class="p">())</span>
                <span class="k">raise</span> <span class="n">e</span>
            <span class="n">end</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
            <span class="n">msg</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'Success. </span><span class="si">{</span><span class="n">end</span><span class="o">-</span><span class="n">start</span><span class="si">}</span><span class="s"> taken for </span><span class="si">{</span><span class="n">funcname</span><span class="si">}</span><span class="s">'</span>
            <span class="n">log</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
            <span class="n">send_email</span><span class="p">(</span><span class="n">msg</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
            <span class="k">return</span> <span class="n">result</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">return</span> <span class="n">decorator</span>

<span class="o">@</span><span class="n">timed</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'Hello World </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
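<p>Because the decorator calls <code class="language-plaintext highlighter-rouge">send_email</code> directly, it cannot be tried out without a mail server. As a sketch, assuming the notification function is passed in instead, the success and failure paths can be exercised with a plain list standing in for the mailbox. <code class="language-plaintext highlighter-rouge">timed_with_notifier</code>, <code class="language-plaintext highlighter-rouge">notifier</code>, <code class="language-plaintext highlighter-rouge">sent</code> and <code class="language-plaintext highlighter-rouge">divide</code> are all hypothetical names.</p>

```python
from datetime import datetime
from functools import wraps


def timed_with_notifier(notifier=None):
    """Hypothetical variant: notifier(subject, body) is called after each run."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = datetime.now()
            try:
                result = func(*args, **kwargs)
            except Exception as e:
                if notifier:
                    notifier(f'Fail. {datetime.now() - start} taken for {func.__name__}', repr(e))
                raise  # a bare raise re-raises the same exception for the caller
            if notifier:
                notifier(f'Success. {datetime.now() - start} taken for {func.__name__}', str(result))
            return result
        return wrapper
    return decorator


sent = []  # stands in for a mailbox


@timed_with_notifier(notifier=lambda subject, body: sent.append((subject, body)))
def divide(a, b):
    return a / b


divide(4, 2)
try:
    divide(1, 0)
except ZeroDivisionError:
    pass

for subject, body in sent:
    print(subject, '|', body)
```

<p>In real use the notifier would be a function that actually sends the email; the decorator itself does not need to know how the message is delivered.</p>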

<p><br /></p>

<p>The function that sends the email can be written roughly like this.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">smtplib</span>
<span class="kn">from</span> <span class="nn">email.mime.text</span> <span class="kn">import</span> <span class="n">MIMEText</span>

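<span class="c1"># subject and content come first to match the calls in the decorator; sender and receivers are placeholder defaults.
</span><span class="k">def</span> <span class="nf">send_email</span><span class="p">(</span><span class="n">subject</span><span class="p">,</span> <span class="n">content</span><span class="p">,</span> <span class="n">sender</span><span class="o">=</span><span class="s">'sender'</span><span class="p">,</span> <span class="n">receivers</span><span class="o">=</span><span class="p">(</span><span class="s">'receiver1'</span><span class="p">,</span> <span class="s">'receiver2'</span><span class="p">)):</span>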
    <span class="k">try</span><span class="p">:</span>
        <span class="n">msg</span> <span class="o">=</span> <span class="n">MIMEText</span><span class="p">(</span><span class="n">content</span><span class="p">.</span><span class="n">encode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">),</span> <span class="s">'html'</span><span class="p">,</span> <span class="n">_charset</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>
        <span class="n">msg</span><span class="p">[</span><span class="s">'Subject'</span><span class="p">]</span> <span class="o">=</span> <span class="n">subject</span>
        <span class="n">msg</span><span class="p">[</span><span class="s">'From'</span><span class="p">]</span> <span class="o">=</span> <span class="n">sender</span>
        <span class="n">msg</span><span class="p">[</span><span class="s">'To'</span><span class="p">]</span> <span class="o">=</span> <span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">receivers</span><span class="p">)</span>
        <span class="n">s</span> <span class="o">=</span> <span class="n">smtplib</span><span class="p">.</span><span class="n">SMTP</span><span class="p">(</span><span class="s">'localhost'</span><span class="p">)</span>
        <span class="n">s</span><span class="p">.</span><span class="n">sendmail</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="n">receivers</span><span class="p">,</span> <span class="n">msg</span><span class="p">.</span><span class="n">as_string</span><span class="p">())</span>
        <span class="n">s</span><span class="p">.</span><span class="n">quit</span><span class="p">()</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'Successfully sent the mail to %s'</span> <span class="o">%</span> <span class="n">msg</span><span class="p">[</span><span class="s">'To'</span><span class="p">])</span>
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'Fail to send the mail.'</span><span class="p">,</span> <span class="n">e</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; foo()
foo() is called by &lt;module&gt; in 71:test.py
Fail. 0:00:00.000012 taken for foo():  foo() missing 1 required positional argument: 'name'
Successfully sent the mail to receiver1, receiver2
Traceback (most recent call last):
  File "test.py", line 71, in &lt;module&gt;
    foo()
  File "test.py", line 57, in wrapper
    raise e
  File "test.py", line 51, in wrapper
    result = func(*args, **kwargs)
TypeError: foo() missing 1 required positional argument: 'name'
</code></pre></div></div>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="python" /><summary type="html"><![CDATA[Logging function execution time and sending email alerts with a decorator]]></summary></entry><entry><title type="html">Using S3 with Spark</title><link href="https://sungminoh.github.io/posts/development/use-s3-for-spark/" rel="alternate" type="text/html" title="Using S3 with Spark" /><published>2018-07-08T00:00:00+00:00</published><updated>2018-07-08T00:00:00+00:00</updated><id>https://sungminoh.github.io/posts/development/use-s3-for-spark</id><content type="html" xml:base="https://sungminoh.github.io/posts/development/use-s3-for-spark/"><![CDATA[<h2 id="스파크-설치">Installing Spark</h2>

<p>Download Spark (https://spark.apache.org/downloads.html).</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://mirror.apache-kr.org/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz | <span class="nb">tar </span>xzf -
</code></pre></div></div>
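<p>The same download-and-unpack step can be sketched in Python. This is only a minimal sketch and not part of the original setup: the names <code class="language-plaintext highlighter-rouge">extract_tgz_stream</code> and <code class="language-plaintext highlighter-rouge">make_sample_tgz</code> are hypothetical, and a tiny in-memory archive stands in for the real Spark tarball so that the example runs without network access.</p>

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_tgz_stream(fileobj, dest):
    """Unpack a gzip-compressed tar stream into dest without seeking."""
    # Mode "r|gz" reads the archive sequentially, so it also works on
    # non-seekable sources such as an HTTP response body.
    # Only use this on archives you trust.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        archive.extractall(dest)


def make_sample_tgz():
    """Build a tiny in-memory .tgz that stands in for the Spark tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        data = b"sample\n"
        info = tarfile.TarInfo("spark-sample/README.md")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# Demo: unpack the stand-in archive into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    extract_tgz_stream(make_sample_tgz(), tmp)
    extracted = (Path(tmp) / "spark-sample" / "README.md").read_text()
    print(extracted.strip())
```

<p>For a real download, the response object returned by <code class="language-plaintext highlighter-rouge">urllib.request.urlopen</code> could be passed as <code class="language-plaintext highlighter-rouge">fileobj</code>, which would mirror piping <code class="language-plaintext highlighter-rouge">curl</code> into <code class="language-plaintext highlighter-rouge">tar xzf -</code>.</p>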

<p><br /></p>

<p>Move it to the path you want and create a suitable symlink. (<code class="language-plaintext highlighter-rouge">/opt/spark</code>, <code class="language-plaintext highlighter-rouge">/usr/lib/spark</code>, or wherever you prefer.)</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mv </span>spark-2.3.0-bin-hadoop2.7 /usr/local/
<span class="nb">sudo ln</span> <span class="nt">-s</span> /usr/local/spark-2.3.0-bin-hadoop2.7 /usr/local/spark
</code></pre></div></div>

<p><br /></p>

<p>Register the Spark executables so that they can be used as commands.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find /usr/local/spark/bin/ <span class="nt">-executable</span> <span class="nt">-type</span> f <span class="nt">-exec</span> <span class="nb">sudo ln</span> <span class="nt">-s</span> <span class="s1">'{}'</span> /usr/local/bin/ <span class="se">\;</span>
find /usr/local/bin/ <span class="nt">-type</span> l <span class="nt">-printf</span> <span class="s1">'%f -&gt; %l\n'</span>
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>beeline -&gt; /usr/local/spark/bin/beeline
docker-image-tool.sh -&gt; /usr/local/spark/bin/docker-image-tool.sh
find-spark-home -&gt; /usr/local/spark/bin/find-spark-home
pyspark -&gt; /usr/local/spark/bin/pyspark
run-example -&gt; /usr/local/spark/bin/run-example
spark-class -&gt; /usr/local/spark/bin/spark-class
sparkR -&gt; /usr/local/spark/bin/sparkR
spark-shell -&gt; /usr/local/spark/bin/spark-shell
spark-sql -&gt; /usr/local/spark/bin/spark-sql
spark-submit -&gt; /usr/local/spark/bin/spark-submit
</code></pre></div>  </div>
</blockquote>
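<p>The listing above can also be produced from Python, which may be handy in a setup script. A minimal sketch, assuming a Linux file system; <code class="language-plaintext highlighter-rouge">list_symlinks</code> is a hypothetical helper, and the demo runs against a temporary directory that only mimics <code class="language-plaintext highlighter-rouge">/usr/local/spark/bin</code> and <code class="language-plaintext highlighter-rouge">/usr/local/bin</code>.</p>

```python
import os
import tempfile
from pathlib import Path


def list_symlinks(directory):
    """Return sorted (name, target) pairs for every symlink in directory."""
    pairs = []
    for entry in os.scandir(directory):
        if entry.is_symlink():
            pairs.append((entry.name, os.readlink(entry.path)))
    return sorted(pairs)


# Demo on a throwaway directory tree (hypothetical stand-in paths).
with tempfile.TemporaryDirectory() as tmp:
    spark_bin = Path(tmp) / "spark" / "bin"
    local_bin = Path(tmp) / "bin"
    spark_bin.mkdir(parents=True)
    local_bin.mkdir()
    for name in ("pyspark", "spark-shell"):
        script = spark_bin / name
        script.write_text("#!/bin/sh\n")
        (local_bin / name).symlink_to(script)
    for name, target in list_symlinks(local_bin):
        print(f"{name} -> {target}")
```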

<p><br /></p>

<p>Running <code class="language-plaintext highlighter-rouge">spark-shell</code> at this point fails, because Java has not been installed yet.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>spark-shell
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/local/bin/spark-class: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
JAVA_HOME is not set
</code></pre></div>  </div>
</blockquote>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The program 'java' can be found in the following packages:
 * default-jre
 * gcj-5-jre-headless
 * openjdk-8-jre-headless
 * gcj-4.8-jre-headless
 * gcj-4.9-jre-headless
 * openjdk-9-jre-headless
Try: sudo apt install &lt;selected package&gt;
</code></pre></div>  </div>
</blockquote>

<p><br /></p>

<p>Even after installing Java, however, <code class="language-plaintext highlighter-rouge">spark-shell</code> still does not run.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>default-jre <span class="nt">-y</span>
spark-shell
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/local/bin/spark-class: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
Failed to find Spark jars directory (/usr/local/assembly/target/scala-/jars).
You need to build Spark with the target "package" before running this program.
</code></pre></div>  </div>
</blockquote>

<p><br /></p>

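<p>The errors above show the launcher looking under <code class="language-plaintext highlighter-rouge">/usr/local</code>, where the symlinks live, rather than under the Spark directory, so set <code class="language-plaintext highlighter-rouge">SPARK_HOME</code> explicitly.</p>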

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"export SPARK_HOME=/usr/local/spark"</span> <span class="o">&gt;&gt;</span> ~/.bashrc<span class="p">;</span> <span class="nb">source</span> ~/.bashrc
spark-shell
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2018-06-06 02:45:29 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ip-12-345-67-890.ap-northeast-2.compute.internal:4040
Spark context available as 'sc' (master = local[*], app id = local-1528253140612).
Spark session available as 'spark'.
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
     /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala&gt;
</code></pre></div>  </div>

</blockquote>
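<p>Why setting <code class="language-plaintext highlighter-rouge">SPARK_HOME</code> helps can be modeled in a few lines. This is a simplified sketch inferred from the error messages above, not a copy of Spark's launcher scripts: it assumes the launcher uses <code class="language-plaintext highlighter-rouge">SPARK_HOME</code> when it is set and otherwise falls back to the parent of the directory it was invoked from, without following the symlink. The function name <code class="language-plaintext highlighter-rouge">resolve_spark_home</code> is hypothetical.</p>

```python
import posixpath


def resolve_spark_home(script_path, env):
    """Model how a launcher script picks its Spark home (simplified)."""
    explicit = env.get("SPARK_HOME")
    if explicit:
        return explicit
    # Fallback: parent of the directory containing the invoked script.
    # The symlink is deliberately not resolved, which is what sends the
    # lookup to /usr/local instead of /usr/local/spark.
    script_dir = posixpath.dirname(script_path)
    return posixpath.dirname(script_dir)


# Without SPARK_HOME the symlink location decides the result.
print(resolve_spark_home("/usr/local/bin/spark-class", {}))
# prints /usr/local

# With SPARK_HOME set, the explicit value wins.
print(resolve_spark_home("/usr/local/bin/spark-class",
                         {"SPARK_HOME": "/usr/local/spark"}))
# prints /usr/local/spark
```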

<p><br /></p>

<p>Now run the following code, whether from pyspark, a Python console, IPython, or a script.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span><span class="p">,</span> <span class="n">SQLContext</span>
<span class="n">sqlc</span> <span class="o">=</span> <span class="n">SQLContext</span><span class="p">(</span><span class="n">SparkContext</span><span class="p">())</span>
<span class="n">sqlc</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">orc</span><span class="p">(</span><span class="s">'s3a://bucket/filepath'</span><span class="p">)</span>
</code></pre></div></div>

<blockquote>
  <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Py4JJavaError: An error occurred while calling o62.orc.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
...
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
...
</code></pre></div>  </div>
</blockquote>

<p>The S3A file system class cannot be found, so the data in S3 cannot be read.</p>
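<p>A <code class="language-plaintext highlighter-rouge">ClassNotFoundException</code> like this means that no jar on the classpath contains the class. Since a jar is a zip archive, a quick check is possible with the standard library. The sketch below is only an illustration: <code class="language-plaintext highlighter-rouge">class_entry</code> and <code class="language-plaintext highlighter-rouge">jars_providing</code> are hypothetical helpers, and the demo uses an empty fake jar rather than a real Hadoop one.</p>

```python
import tempfile
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Convert a dotted Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(class_name, jar_paths):
    """Return the jars (as strings) that contain the given class."""
    wanted = class_entry(class_name)
    found = []
    for jar in jar_paths:
        # A jar is a zip archive, so zipfile can list its entries.
        with zipfile.ZipFile(jar) as archive:
            if wanted in archive.namelist():
                found.append(str(jar))
    return found


# Demo with a fake jar that only contains an empty class entry.
with tempfile.TemporaryDirectory() as tmp:
    fake_jar = Path(tmp) / "fake-hadoop-aws.jar"
    with zipfile.ZipFile(fake_jar, "w") as archive:
        archive.writestr(
            class_entry("org.apache.hadoop.fs.s3a.S3AFileSystem"), b"")
    print(jars_providing("org.apache.hadoop.fs.s3a.S3AFileSystem", [fake_jar]))
```

<p>Pointed at the jars of an actual installation, for example everything matching <code class="language-plaintext highlighter-rouge">/usr/local/spark/jars/*.jar</code>, an empty result would be consistent with the error above.</p>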

<p><br />
<br /></p>

<h2 id="spark-에서-s3-사용하기">Using S3 from Spark</h2>

<p>Download the <a href="https://aws.amazon.com/ko/sdk-for-java/">AWS SDK for Java</a>.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>unzip
wget https://sdk-for-java.amazonwebservices.com/latest/aws-java-sdk.zip
unzip aws-java-sdk.zip
</code></pre></div></div>

<p><br /></p>

<p>Move the required jar to a suitable path. (The file name varies with the SDK version.)</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mv </span>aws-java-sdk-1.11.342/lib/aws-java-sdk-1.11.342.jar /usr/local/spark
</code></pre></div></div>

<p><br /></p>

<p>Also download the <a href="https://talend-update.talend.com/nexus/content/repositories/libraries/org/talend/libraries/hadoop-aws-2.7.3-amzn-2/6.0.0/">Hadoop AWS jar</a>.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://talend-update.talend.com/nexus/content/repositories/libraries/org/talend/libraries/hadoop-aws-2.7.3-amzn-2/6.0.0/hadoop-aws-2.7.3-amzn-2-6.0.0.jar
</code></pre></div></div>

<p><br /></p>

<p>Move it to a suitable path as well.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mv </span>hadoop-aws-2.7.3-amzn-2-6.0.0.jar /usr/local/spark/
</code></pre></div></div>

<p><br /></p>

<p>Now, with the following configuration, Spark can read data from S3.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SQLContext</span><span class="p">,</span> <span class="n">SparkContext</span><span class="p">,</span> <span class="n">SparkConf</span>
<span class="n">conf</span> <span class="o">=</span> <span class="n">SparkConf</span><span class="p">().</span><span class="n">setAll</span><span class="p">([(</span><span class="s">'spark.driver.extraClassPath'</span><span class="p">,</span> <span class="s">'/usr/local/spark/hadoop-aws-2.7.3-amzn-2-6.0.0.jar:/usr/local/spark/aws-java-sdk-1.11.342.jar'</span><span class="p">)])</span>
<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="n">conf</span><span class="o">=</span><span class="n">conf</span><span class="p">)</span>
<span class="n">sqlc</span> <span class="o">=</span> <span class="n">SQLContext</span><span class="p">(</span><span class="n">sc</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">sqlc</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">text</span><span class="p">(</span><span class="s">'s3a://my-bucket/path/to/my/data'</span><span class="p">)</span>
</code></pre></div></div>
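<p>Because the jar file names change with each version, the classpath string can be assembled from glob patterns instead of being hard-coded. A minimal sketch under that assumption; <code class="language-plaintext highlighter-rouge">build_classpath</code> is a hypothetical helper, and the demo creates empty placeholder files in a temporary directory instead of touching <code class="language-plaintext highlighter-rouge">/usr/local/spark</code>.</p>

```python
import glob
import os
import tempfile


def build_classpath(jar_dir, patterns):
    """Join every jar in jar_dir that matches the patterns with ':'."""
    jars = []
    for pattern in patterns:
        matches = sorted(glob.glob(os.path.join(jar_dir, pattern)))
        if not matches:
            raise FileNotFoundError(f"no jar matches {pattern!r} in {jar_dir}")
        jars.extend(matches)
    # ':' is the classpath separator on Linux and macOS.
    return ":".join(jars)


# Demo with empty placeholder files named like the jars in this post.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("hadoop-aws-2.7.3-amzn-2-6.0.0.jar", "aws-java-sdk-1.11.342.jar"):
        open(os.path.join(tmp, name), "w").close()
    print(build_classpath(tmp, ["hadoop-aws-*.jar", "aws-java-sdk-*.jar"]))
```

<p>The returned string could then be used as the value of <code class="language-plaintext highlighter-rouge">spark.driver.extraClassPath</code>. Raising an error on a missing jar is a design choice that surfaces the problem before Spark starts rather than as a <code class="language-plaintext highlighter-rouge">ClassNotFoundException</code> later.</p>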

<p><br /></p>

<p>If passing the configuration every time is tedious, create <code class="language-plaintext highlighter-rouge">$SPARK_HOME/conf/spark-defaults.conf</code> instead.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>spark.master                     local
spark.driver.extraClassPath      /usr/local/spark/hadoop-aws-2.7.3-amzn-2-6.0.0.jar:/usr/local/spark/aws-java-sdk-1.11.342.jar
</code></pre></div></div>
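<p>The file is a list of lines, each holding a key and a value separated by whitespace. If it needs to be generated, for instance from a provisioning script, something like the following sketch could work. The helper names <code class="language-plaintext highlighter-rouge">render_spark_defaults</code> and <code class="language-plaintext highlighter-rouge">parse_spark_defaults</code> are hypothetical, and the parser handles only this simple key and value form.</p>

```python
def render_spark_defaults(properties):
    """Render properties as aligned 'key value' lines, one per property."""
    width = max(len(key) for key in properties) + 2
    lines = [f"{key.ljust(width)}{value}" for key, value in properties.items()]
    return "\n".join(lines) + "\n"


def parse_spark_defaults(text):
    """Read the text back, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only, so values keep ':'.
        key, value = line.split(None, 1)
        result[key] = value
    return result


# Demo: the two properties used in this post.
properties = {
    "spark.master": "local",
    "spark.driver.extraClassPath": (
        "/usr/local/spark/hadoop-aws-2.7.3-amzn-2-6.0.0.jar:"
        "/usr/local/spark/aws-java-sdk-1.11.342.jar"
    ),
}
print(render_spark_defaults(properties), end="")
```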

<p><br />
<br /></p>

<ul>
  <li>
    <p>On EC2, there is no need to set AWS_ACCESS_KEY_ID, AWS_SECRET_KEY, and so on separately, as long as the instance has an IAM role that can access the bucket.</p>
  </li>
  <li>On a single machine, Hadoop itself is not required, and neither is any Hadoop or YARN configuration.</li>
  <li>Only three things need to be prepared: Spark, the AWS SDK for Java, and the Hadoop AWS jar.</li>
</ul>]]></content><author><name>Sungmin</name></author><category term="posts" /><category term="development" /><category term="spark" /><category term="aws" /><summary type="html"><![CDATA[Installing Spark]]></summary></entry></feed>