<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Jeremy Cole</title>
	<atom:link href="http://blog.jcole.us/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jcole.us</link>
	<description>Geek, electronics nerd, database nerd, father of three.</description>
	<lastBuildDate>Mon, 20 Feb 2012 19:08:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.jcole.us' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Jeremy Cole</title>
		<link>http://blog.jcole.us</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.jcole.us/osd.xml" title="Jeremy Cole" />
	<atom:link rel='hub' href='http://blog.jcole.us/?pushpress=hub'/>
		<item>
		<title>Kiva: How to make a difference for 1,000 people with only $100 per month</title>
		<link>http://blog.jcole.us/2012/02/18/kiva-how-to-make-a-difference-for-1000-people-with-only-100-per-month/</link>
		<comments>http://blog.jcole.us/2012/02/18/kiva-how-to-make-a-difference-for-1000-people-with-only-100-per-month/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 03:17:30 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://jcoledotus.wordpress.com/?p=743</guid>
		<description><![CDATA[Kiva (also see my lender page) is an amazing and deceptively simple idea: People, mostly in third world countries, need loans to buy food, crops, cows, equipment, education, etc. so they get a loan from a local Kiva partner, and those loans are backed by Kiva users in $25 increments. There is no interest paid [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.jcole.us&amp;blog=30683698&amp;post=743&amp;subd=jcoledotus&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.kiva.org/">Kiva</a> (also see my <a href="http://www.kiva.org/lender/jeremycole">lender page</a>) is an amazing and deceptively simple idea: People, mostly in third world countries, need loans to buy food, crops, cows, equipment, education, etc. so they get a loan from a local Kiva partner, and those loans are backed by Kiva users in $25 increments. There is no interest paid to Kiva users, (although the local partners do charge some interest), so it&#8217;s not really an investment per se. </p>
<p>I&#8217;ve been a user of Kiva for more than five years now, and have made 350 loans so far for a total of $9,400 loaned. In the first few years I only sporadically made some loans and let repaid money sit around for a long time. In the past couple of years I&#8217;ve been using Kiva more consistently, every month re-investing the full repayment amounts as soon as they come in, and usually adding 4-6 loans ($100-$150) of new money.</p>
<p>As I&#8217;ve been doing this I noticed an effect that makes perfect sense but that I hadn&#8217;t considered before: since the loans run anywhere from 9 to 24 months but are typically repaid in monthly installments, re-investing the repayments immediately stacks the original loan amounts on top of each other, allowing the same money to be invested several times over simultaneously.</p>
<p>Recently I&#8217;ve been thinking about actually quantifying that effect and figuring out what impact it could have. Since it&#8217;s not a very simple calculation, I put together a spreadsheet to calculate the full picture for me.</p>
<h2>Assumptions</h2>
<p>The following assumptions are made:</p>
<ul>
<li><strong>Amount per loan: $25.00</strong> &mdash; This is the standard loan amount on Kiva, so this just assumes you never double up on a single loan (which is not a good idea as it spreads the loss risk poorly).</li>
<li><strong>Investment per month: 4 loans, or $100</strong> (and reinvest all repayments) &mdash; This is approximately what I&#8217;ve been doing, although frequently it&#8217;s a bit more than 4.</li>
<li><strong>Average loan duration: 15 months</strong> &mdash; This is the average loan duration for my loan portfolio, and seems about average for Kiva.</li>
<li><strong>Loss rate: 2.07%</strong> &mdash; This is the actual loss rate of my portfolio, which is a bit higher than the 1.09% of the average Kiva user because I tend to loan to war-torn and riskier areas.</li>
</ul>
<h2>Results</h2>
<p>After 5 years (60 months) of consistent and prompt investment, the results could be:</p>
<ul>
<li><strong>Total investment: $6,000</strong> &mdash; This is the actual amount you&#8217;ve paid out of pocket.</li>
<li><strong>Total loans made: 1,004</strong> &mdash; The number of individuals or groups helped. This is the most amazing thing, watching all of these individuals succeed due to your help.</li>
<li><strong>Total amount loaned: $25,100</strong> &mdash; The amount your $6,000 turns into after re-investment through immediate re-loaning.</li>
<li><strong>Total amount lost: $519.57</strong> &mdash; Due to a combination of loan defaults and currency exchange loss, not all of your money will be returned.</li>
<li><strong>Total amount returned: $5,480.43</strong> &mdash; If you stopped making loans after the 60 months and started to withdraw your money from Kiva, at the end of it you&#8217;d get this much back (investment minus loss).</li>
</ul>
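<p>The stacking effect can be illustrated with a small simulation. This is only a rough sketch and not the model used in the spreadsheet: it assumes every loan is repaid in equal monthly installments starting the month after it is made, that losses are spread evenly across those installments, and that leftover cash below $25 simply waits for the next month. The function name and parameters are hypothetical, and the totals it produces will not necessarily match the figures above.</p>

```python
def simulate_reinvestment(months=60, new_money=100.0, loan_size=25.0,
                          duration=15, loss_rate=0.0207):
    """Hypothetical model: equal monthly installments, losses spread evenly."""
    # repayments[m] is the cash expected back at the start of month m.
    repayments = [0.0] * (months + duration + 1)
    balance = 0.0
    loans_made = 0
    for month in range(months):
        # New money plus whatever earlier loans repaid this month.
        balance += new_money + repayments[month]
        # Lend out as many whole loans as the balance allows.
        new_loans = int(balance // loan_size)
        balance -= new_loans * loan_size
        loans_made += new_loans
        # Schedule the (loss-adjusted) repayments of the new loans.
        installment = new_loans * loan_size * (1.0 - loss_rate) / duration
        for k in range(1, duration + 1):
            repayments[month + k] += installment
    return {
        "out_of_pocket": months * new_money,
        "loans_made": loans_made,
        "total_loaned": loans_made * loan_size,
    }


if __name__ == "__main__":
    print(simulate_reinvestment())
```

<p>Changing <tt>duration</tt> or <tt>loss_rate</tt> shows how sensitive the number of loans is to each assumption: shorter loans return money sooner, so it can be re-loaned more often.</p>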
<p>Check out the <a href="https://docs.google.com/spreadsheet/ccc?key=0Ak69OGWTdlptdDhmYnFUbXk5Zmx6QnJScXFRV2lHZ3c">full calculator on Google Docs</a> for all the details and per-month amounts.</p>
<h2>Conclusion</h2>
<p>The really amazing thing with following this plan is that the Kiva borrowers themselves end up&mdash;through prompt repayment of their loans&mdash;funding each other. For me, the amount being invested each month is quite modest, and through reinvestment of the repayments, the monthly impact is huge. This month, I received almost $400 in repayments, added an additional $100, and made $500 in new loans.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2012/02/18/kiva-how-to-make-a-difference-for-1000-people-with-only-100-per-month/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>Vector: A new way to think about replication performance and slave lag</title>
		<link>http://blog.jcole.us/2011/12/29/vector-a-new-way-to-think-about-replication-performance-and-slave-lag/</link>
		<comments>http://blog.jcole.us/2011/12/29/vector-a-new-way-to-think-about-replication-performance-and-slave-lag/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 03:35:10 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">https://jcoledotus.wordpress.com/?p=692</guid>
		<description><![CDATA[Background I have been considering a new way to think about, measure, graph, and monitor replication lag for a while. Previously I&#8217;ve primarily been using replication delay (lag), in seconds. Initially this came from SHOW SLAVE STATUS&#8217;s Seconds_Behind_Master field, later replaced by mk-heartbeat&#8217;s delay information. (These are equivalent, but mk-heartbeat is less buggy [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.jcole.us&amp;blog=30683698&amp;post=692&amp;subd=jcoledotus&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>I have been considering a new way to think about, measure, graph, and monitor replication lag for a while. Previously I&#8217;ve primarily been using replication delay (lag), in seconds. Initially this came from <tt>SHOW SLAVE STATUS</tt>&#8217;s <tt>Seconds_Behind_Master</tt> field, later replaced by <tt>mk-heartbeat</tt>&#8217;s delay information. (These are equivalent, but <tt>mk-heartbeat</tt> is less buggy and correctly represents true lag when there are relay slaves in the replication topology.) Using delay in seconds to monitor replication has a few problems, though:</p>
<ul>
<li>The current number of seconds delayed provides no information about how long it might take to catch up.</li>
<li>It&#8217;s very difficult to determine whether a slave is catching up at all, due to the scale. If a slave is a few thousand seconds behind it&#8217;s hard to tell whether it&#8217;s catching up, falling behind, or neither, at any particular moment.</li>
<li>If multiple slaves are graphed together, they may have widely different absolute delay values, and thus scales, and it can be very difficult to compare them, or to see that a slave is having problems, until it&#8217;s too late. Two slaves may be falling behind at the same rate, but if one is 300 seconds behind, and one is 20,000 seconds behind, the graphs are difficult to interpret.</li>
</ul>
<p>Given these problems, I determined that while we need the absolute <em>delay</em> information available, it&#8217;s not good enough by itself. I started to think about what it is that I&#8217;m really trying to determine when looking at the delay graphs.</p>
<h2>Vector: Velocity <em>and</em> direction</h2>
<p>The key bits of information missing from the delay information seem to be:</p>
<ul>
<li>Is the slave falling behind, catching up, or neither? We need a measure of the <strong>direction</strong> of replication&#8217;s delay.</li>
<li>How <em>fast</em> is the slave falling behind or catching up? We need a measure of the <strong>velocity</strong> of replication&#8217;s performance.</li>
</ul>
<p>Fortunately these two things can be combined together into a single number representing the <strong>vector</strong> of replication. This can then be presented to the user (likely a DBA) in an easy to consume format. The graphs can be read as follows:</p>
<ul>
<li><strong><em>Y = 0</em></strong> means the slave is neither catching up nor falling behind. It is replicating <em>in real time</em>. I chose zero for this state in order to make the other two states a bit more meaningful and to make the graph symmetric by default.</li>
<li><strong><em>Y &gt; 0</em></strong> means the slave is catching up at Y <em>seconds per second</em>. There is no maximum rate at which a slave can catch up, but in practice seeing velocities &gt;1 for extended periods is relatively uncommon in already busy systems.</li>
<li><strong><em>Y &lt; 0</em></strong> means the slave is falling behind at |Y| <em>seconds per second</em>. As a special case, <em>Y = -1</em> means that the slave is completely stopped and playing no events. Lagging is a function of the passage of time, so it is not possible to lag faster than one second per second.</li>
</ul>
<p>I like the symmetry of having the zero line be the center point, and having healthy hosts idle with a flat line at zero. Lag appears in the form of a meander away from zero into the negative, matched by an always equal-area<sup>1</sup> (but not necessarily similarly shaped) correction into the positive. In practice the Y-scale of graphs is <strong>fixed</strong> at <tt>[-1, +1]</tt> and the graphs are very easy and quick to interpret.</p>
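<p>One practical consequence of this reading: combined with the absolute delay, a positive vector gives a rough estimate of how long catching up will take, since a slave catching up at Y seconds per second reduces its delay by Y seconds every second. The following is a minimal sketch, assuming the vector stays constant (which it generally will not); the function name is made up for illustration.</p>

```python
def seconds_to_catch_up(delay_seconds, vector):
    """Estimate the real time needed to reach zero delay.

    Returns None when the slave is not catching up (vector of zero or
    below), because in that case the delay never shrinks.
    """
    if vector <= 0:
        return None
    # Each real second removes `vector` seconds of delay.
    return delay_seconds / vector


if __name__ == "__main__":
    # A slave 3000 seconds behind, catching up at 0.5 seconds per second.
    print(seconds_to_catch_up(3000, 0.5))
```

<p>A slave that is 3,000 seconds behind with a vector of 0.5 would need about 6,000 seconds at that rate, while a slave at or below zero never catches up.</p>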
<h3>Example 1</h3>
<p>Most slaves are replicating in real time; one slave fell behind for some time before catching up again.</p>
<p><img src="http://jcole.us/blog/files/vector_example_1_vector.png"><br />
<em><strong>Vector</strong> &#8211; A few small perturbations can be seen, and one slave replicated at less than real time for many hours, before finally crossing over zero and catching up at an increasing rate until current time was reached.</em></p>
<p><img src="http://jcole.us/blog/files/vector_example_1_delay.png"><br />
<em><strong>Delay</strong> &#8211; The small perturbations are difficult to see due to the scale imposed by the one very delayed slave. Although it&#8217;s easy to see on a day view that the slave did catch up quickly, that is less obvious when monitoring in real time.</em></p>
<h3>Example 2</h3>
<p>Many slaves with different replication rates, and a lot of trouble.</p>
<p><img src="http://jcole.us/blog/files/vector_example_2_vector.png"><br />
<em><strong>Vector</strong> &#8211; Overall replication performance is quite poor, and shows evidence of being unlikely to catch up to current time or maintain real time replication in the future.</em></p>
<p><img src="http://jcole.us/blog/files/vector_example_2_delay.png"><br />
<em><strong>Delay</strong> &#8211; It&#8217;s difficult to know if things are getting better or worse. The replication performance of each host is almost impossible to compare.</em></p>
<h2>Implementation</h2>
<p>In basic terms, the number of seconds of replication stream applied per second of real time should be measured frequently and with reasonably good precision. I have <tt>mk-heartbeat</tt> writing heartbeat events into a <tt>heartbeat</tt> table once a second on the master (which has an NTP-synchronized clock), providing a ready source of the progression of &#8220;replicated heartbeat time&#8221;<sup>2</sup> (<tt>htime</tt>) to the slaves. The slaves of course have their own NTP-synchronized clocks providing a source of local &#8220;clock time&#8221; (<tt>ctime</tt>). Both of these are collected on each slave once a minute, as integers (Unix epoch timestamps). Both the current sample (subscript <tt>c</tt>) and the previous successful sample (subscript <tt>p</tt>) are available to the processing program. The vector is calculated, stored, and sent off to be graphed once per minute.</p>
<p><!-- vector = \begin{pmatrix}\frac{htime_{c} - htime_{p}}{ctime_{c} - ctime_{p}}\end{pmatrix} - 1 --><img src="http://jcole.us/blog/files/vector_formula.png"></p>
<p>The implementation is actually quite simple, and tolerant of almost any sampling interval. In the future it could be extended to use millisecond resolution (although its resolution can never be finer than the interval at which the heartbeat is updated).</p>
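<p>The formula above translates almost directly into code. This is a minimal sketch rather than the actual collection program: the function and argument names are hypothetical, and it assumes the four values are the Unix timestamps described above (heartbeat time and clock time for the previous and current samples).</p>

```python
def replication_vector(htime_prev, htime_cur, ctime_prev, ctime_cur):
    """Seconds of replication stream applied per real second, minus one.

    0 means real time, above 0 means catching up, -1 means stopped.
    """
    elapsed = ctime_cur - ctime_prev
    if elapsed <= 0:
        # Without elapsed clock time there is nothing to measure.
        raise ValueError("current sample must be later than previous sample")
    return (htime_cur - htime_prev) / elapsed - 1.0


if __name__ == "__main__":
    # One minute of clock time, 90 seconds of heartbeat time applied.
    print(replication_vector(1000, 1090, 2000, 2060))
```

<p>Because the result is a ratio of two differences, the sampling interval does not need to be exactly one minute; a late or skipped sample only changes the period being averaged.</p>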
<p><sup>1</sup> This is kind of an interesting point. Since the graph is nicely centered on zero, negative numbers represent the exact same scale as positive numbers, on the same dimensions.</p>
<p><sup>2</sup> <tt>SELECT UNIX_TIMESTAMP(ts) AS ts FROM heartbeat WHERE id = 1</tt></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2011/12/29/vector-a-new-way-to-think-about-replication-performance-and-slave-lag/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>

		<media:content url="http://jcole.us/blog/files/vector_example_1_vector.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/vector_example_1_delay.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/vector_example_2_vector.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/vector_example_2_delay.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/vector_formula.png" medium="image" />
	</item>
		<item>
		<title>Migrating to WordPress.com hosting</title>
		<link>http://blog.jcole.us/2011/12/29/migrating-to-wordpress-com-hosting/</link>
		<comments>http://blog.jcole.us/2011/12/29/migrating-to-wordpress-com-hosting/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 02:42:07 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blog.jcole.us/?p=686</guid>
		<description><![CDATA[As I get older I realize how much I don&#8217;t want my personal life to feel like my work life. I&#8217;ve been running my own WordPress.org installation for years, and it&#8217;s been easy and not really a problem. However it does also mean maintaining Linux, Apache, PHP, and other supporting infrastructure, and keeping things updated [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.jcole.us&amp;blog=30683698&amp;post=686&amp;subd=jcoledotus&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>As I get older I realize how much I don&#8217;t want my personal life to feel like my work life. I&#8217;ve been running my own <a href="http://wordpress.org/">WordPress.org</a> installation for years, and it&#8217;s been easy and not really a problem. However it does also mean maintaining Linux, Apache, PHP, and other supporting infrastructure, and keeping things updated and upgraded all the time. My usage of WordPress is remarkably simplistic, so I really don&#8217;t need to do all of that. I decided to migrate to a paid <a href="http://wordpress.com/">WordPress.com</a> hosted account instead.</p>
<p>As part of this migration I also changed the URL structure a bit, to match WordPress.com&#8217;s directory structure and make things simpler (and easier to move in the future if I want to). There is a mod_rewrite redirector in place to keep the old URLs working forever. (I hope.) The changes are:</p>
<ul>
<li>Moving from <tt>jcole.us/blog</tt> to <tt>blog.jcole.us</tt>.</li>
<li>Removing <tt>archives/</tt> from the permalink path structure.</li>
<li>Simplifying the theme a bit. It&#8217;s pretty generic right now, but I&#8217;ll fix it up some more in the future.</li>
</ul>
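<p>The two URL changes amount to a simple mapping from old permalinks to new ones. The actual mod_rewrite rules are not shown here, so the following is only an illustrative sketch of that mapping; the function name and the example paths are hypothetical, and a real redirector would likely need exceptions (for static files, for example).</p>

```python
from urllib.parse import urlsplit, urlunsplit


def new_permalink(old_url):
    """Map an old jcole.us/blog URL to its blog.jcole.us equivalent."""
    parts = urlsplit(old_url)
    path = parts.path
    under_blog = path == "/blog" or path.startswith("/blog/")
    if parts.netloc.lower() != "jcole.us" or not under_blog:
        return old_url  # not an old blog URL; leave it alone
    # jcole.us/blog/... becomes blog.jcole.us/...
    path = path[len("/blog"):] or "/"
    # Drop the archives/ prefix from the permalink path.
    if path.startswith("/archives/"):
        path = path[len("/archives"):]
    return urlunsplit((parts.scheme, "blog.jcole.us", path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(new_permalink("http://jcole.us/blog/archives/2011/01/05/some-post/"))
```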
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2011/12/29/migrating-to-wordpress-com-hosting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>January 2011 MySQL Meetup: Clustrix</title>
		<link>http://blog.jcole.us/2011/01/08/january-2011-mysql-meetup-clustrix/</link>
		<comments>http://blog.jcole.us/2011/01/08/january-2011-mysql-meetup-clustrix/#comments</comments>
		<pubDate>Sun, 09 Jan 2011 02:16:22 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[MySQL Meetup]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=683</guid>
		<description><![CDATA[Clustrix will be presenting at this month&#8217;s Silicon Valley MySQL Meetup in Sunnyvale, CA. Stop by if you can! Paul Mikesell (CEO and VP Engineering) and Aaron Passey (CTO) will be discussing the unique architecture behind Clustrix’s massively scalable, distributed, MySQL-compatible database solution. They will talk about how the company has addressed the common challenges [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.jcole.us&amp;blog=30683698&amp;post=683&amp;subd=jcoledotus&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.clustrix.com/">Clustrix</a> will be presenting at this <a href="http://www.meetup.com/mysql-silicon-valley/calendar/15276732/">month&#8217;s Silicon Valley MySQL Meetup</a> in Sunnyvale, CA.  Stop by if you can!</p>
<p><a href="http://www.clustrix.com/"><img src="http://jcoledotus.files.wordpress.com/2011/12/meetup-clustrix-logo.jpg?w=450" /></a></p>
<p>Paul Mikesell (CEO and VP Engineering) and Aaron Passey (CTO) will be discussing the unique architecture behind Clustrix’s massively scalable, distributed, MySQL-compatible database solution. They will talk about how the company has addressed the common challenges associated with achieving massive scale for transactional (OLTP) relational workloads.</p>
<p>Prior to developing the Clustrix solution over the past four years, Paul was co-founder of Isilon Systems, which was acquired by EMC in December 2010 for $2.25B and still has the largest clustering capability (&gt;120 nodes) of any NAS solution.</p>
<p><a href="http://www.meetup.com/mysql-silicon-valley/calendar/15276732/">Read more and RSVP on Meetup.com!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2011/01/08/january-2011-mysql-meetup-clustrix/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>

		<media:content url="http://jcoledotus.files.wordpress.com/2011/12/meetup-clustrix-logo.jpg" medium="image" />
	</item>
		<item>
		<title>InnoDB online index add and &#8220;The table &#8216;t&#8217; is full&#8221; error</title>
		<link>http://blog.jcole.us/2011/01/05/innodb-online-index-add-and-the-table-t-is-full-error/</link>
		<comments>http://blog.jcole.us/2011/01/05/innodb-online-index-add-and-the-table-t-is-full-error/#comments</comments>
		<pubDate>Wed, 05 Jan 2011 18:46:24 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=677</guid>
		<description><![CDATA[While trying to add an index to a fairly large table today, on a server1 I&#8217;d not worked on previously, I got the following error after some time (and while I was away from the computer): mysql&#62; ALTER TABLE t -&#62; ADD KEY `index_a` (`a`), -&#62; ADD KEY `index_b` (`b`), -&#62; ADD KEY `index_c` (`c`); [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.jcole.us&amp;blog=30683698&amp;post=677&amp;subd=jcoledotus&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>While trying to add an index to a fairly large table today, on a server<sup>1</sup> I&#8217;d not worked on previously, I got the following error after some time (and while I was away from the computer):</p>
<blockquote><pre>
mysql&gt; ALTER TABLE t
    -&gt;   ADD KEY `index_a` (`a`),
    -&gt;   ADD KEY `index_b` (`b`),
    -&gt;   ADD KEY `index_c` (`c`);
ERROR 1114 (HY000): The table 't' is full
</pre>
</blockquote>
<p>The error log did not bring particular enlightenment (as usual, InnoDB is extremely verbose with the logs, without saying anything useful):</p>
<blockquote><pre>
110105 16:22:30  InnoDB: Error: Write to file (merge) failed at offset 4 1387266048.
InnoDB: 1048576 bytes should have been written, only 888832 were written.
InnoDB: Operating system error number 0.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 0 means 'Success'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
110105 16:23:00 [ERROR] /usr/sbin/mysqld: The table 't' is full
</pre>
</blockquote>
<p>I had to re-run the command while keeping a close eye on things, and I discovered that it was writing significant amounts of data to the root file system (which isn&#8217;t very big, as usual). I looked in all the usual places, and didn&#8217;t see any files of note.  However, on a hunch, I checked out <tt>/proc/&lt;pid&gt;/fd</tt> (which can be a lifesaver).  I found these:</p>
<blockquote><pre>
# ls -l /proc/`pidof mysqld`/fd | grep deleted
lrwx------ 1 root root 64 Jan  5 17:33 14 -&gt; /var/tmp/ibEnEaSj (deleted)
lrwx------ 1 root root 64 Jan  5 17:33 5 -&gt; /var/tmp/ibEoZHQc (deleted)
lrwx------ 1 root root 64 Jan  5 17:33 6 -&gt; /var/tmp/ibHlWZb3 (deleted)
lrwx------ 1 root root 64 Jan  5 17:33 7 -&gt; /var/tmp/ibUtVhxT (deleted)
lrwx------ 1 root root 64 Jan  5 17:33 8 -&gt; /var/tmp/ibt1daDR (deleted)
</pre>
</blockquote>
<p>I can only assume it&#8217;s one of these files that&#8217;s growing.  Changing the setting of <tt>tmpdir</tt> fixed things up, and it&#8217;s writing its large data files to a place with significant space now (and on a much bigger and faster RAID array, to boot).  However, this brings with it a couple of questions:</p>
<ol>
<li>Why does InnoDB need significant space in <tt>tmpdir</tt>? This is a new requirement with InnoDB plugin (due to online index addition only?), but I don&#8217;t see it documented anywhere.<sup>2</sup></li>
<li>Why are the files deleted while in use? This makes it very painful for a DBA to manage it and see what&#8217;s using space. I know it&#8217;s a typical Unix paradigm, but cleanup-on-start and leaving the files linked is much easier to manage.</li>
<li>Why are the error messages useless? How else is a DBA supposed to track this down?</li>
</ol>
<p>I could also note that using online index add makes useless the only previous way of getting some sort of a status update while adding indexes: watching the temporary file grow. Perhaps it&#8217;s time to bring back my <a href="http://jcole.us/patches/mysql/5.0/progress_rows_percent.patch">patch to show progress</a> (<a href="http://bugs.mysql.com/bug.php?id=26182">MySQL Bug #26182</a>)?</p>
<p><sup>1</sup> Running <tt>MySQL-server-percona-5.1.42-9.rhel5</tt> with <tt>innodb_version</tt> = 1.0.6-9.</p>
<p><sup>2</sup> Perhaps it should go on <a href="http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-create-index-limitations.html">Fast Index Creation in the InnoDB Storage Engine: Limitations</a> at the very least?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2011/01/05/innodb-online-index-add-and-the-table-t-is-full-error/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>Are we really living in the 1700s?</title>
		<link>http://blog.jcole.us/2011/01/02/are-we-really-living-in-the-1700s/</link>
		<comments>http://blog.jcole.us/2011/01/02/are-we-really-living-in-the-1700s/#comments</comments>
		<pubDate>Mon, 03 Jan 2011 03:29:28 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=667</guid>
		<description><![CDATA[I stumbled upon an interesting theory today, which I hadn&#8217;t heard of or researched before: a supposition that the Dark Ages were not dark merely because there were so few political, cultural, archeological, scientific, etc. advancements, but rather because a period of approximately 300 years (AD 614-911) of the Dark Ages didn&#8217;t exist [...]]]></description>
			<content:encoded><![CDATA[<p>I stumbled upon an interesting theory today, which I hadn&#8217;t heard of or researched before: a supposition that the <a href="http://en.wikipedia.org/wiki/Dark_Ages">Dark Ages</a> were not dark merely because there were so few political, cultural, archeological, scientific, etc. advancements, but rather because a period of approximately 300 years (AD 614-911) of the Dark Ages <em>didn&#8217;t exist at all</em>: a theory titled the <a href="http://en.wikipedia.org/wiki/Phantom_time_hypothesis">Phantom time hypothesis</a>.</p>
<p>A paper, <a href="http://www.cl.cam.ac.uk/~mgk25/volatile/Niemitz-1997.pdf">Did the Early Middle Ages Really Exist?</a> by Dr. Hans-Ulrich Niemitz, lays out the theory (much of it originated with Heribert Illig in the early 1990s). An interesting <a href="http://www.korthweb.de/PhZT/FAQ_E.html">FAQ</a> written by Jan Beaufort rounds out some of the questions.  A summary and a few more bits of insight are <a href="http://lelarge.de/wamse.html">provided by Gunnar Ries and Ruth Lelarge</a>.  The topic is also covered by a <a href="http://www.damninteresting.com/the-phantom-time-hypothesis">DamnInteresting post</a> and its thread of comments, some comical and some insightful. Of course, any such theory draws a lot of criticism and counterpoints, as it should, and Phil John Kneis <a href="http://www.philjohn.com/papers/pjkd_h04.html">counters a lot of things nicely</a>. I see the story of <a href="http://en.wikipedia.org/wiki/Seven_Sleepers">The Seven Sleepers of Ephesus</a> as a potential link to the possible truth of this theory; not that the Seven Sleepers story was at all true, but in that time stories (fictions) were quite frequently made up to relate truths or partial truths.</p>
<p>Some of the claims made, often based at least tangentially on well-known reasons for having called them the &#8220;Dark Ages&#8221; in the first place (and in many cases refuted by others) are:</p>
<ul>
<li>Overall lack of reputable, accurately dated literature and historical documents from the time period.</li>
<li>Dependence of much of the evidence supporting the time period on Carbon-14 dating, which has had adjustments made in order to line up with perceived history, and which has been calibrated against dendrochronology (tree-ring dating), which itself has had known flaws.</li>
<li>Architectural development discontinuities; the seemingly complete stoppage of forward progress in architectural style for 300 years, with buildings known to be constructed after the Dark Ages (based on calibration backwards from modern times) compared to buildings known to be constructed before them (based on calibration forwards from ancient times) showing little or no difference.</li>
<li>Farming, war, and scientific knowledge making practically no advancement during the period.</li>
<li>A huge spate of forgeries of official documents and religious texts before, during, and after the time period.</li>
</ul>
<p>It&#8217;s not easy to believe one way or the other, but I personally find it not that hard to believe that three hundred years may have been accidentally or intentionally inserted into the calendar for whatever reason.  Remember that calendars are man-invented tools, not scientific absolutes, and particularly the points of reference used in them are almost entirely arbitrary.  It wouldn&#8217;t change our daily lives<sup>1</sup> in any way if the actual number of years having passed since the Roman empire was closer to 1700 than 2000.  We&#8217;ve made a lot of adjustments to the calendar, and even switched points of reference and entire ways of counting several times during man&#8217;s history.  Even today, not everyone uses the same calendar or agrees on the calendar.<sup>2</sup></p>
<p>It&#8217;s easy to misinterpret the meaning of &#8220;missing years&#8221;, and it seems like many comment authors on the various articles have made the mistake of thinking that Illig and Niemitz are suggesting that the <em>years themselves didn&#8217;t exist</em>, which of course is nonsense. The only claim being made is that three centuries of already-sketchy history may have been fabricated entirely, that an offset to the calendar was thereby introduced (intentionally or not), and that as a result we&#8217;ve been mis-numbering years for the 1100 years since.</p>
<p>If true, there would be no need to correct the current calendar date; it would be enough to note the gap in history books, in elapsed-time calculations that span those years, and anywhere else it matters.  There are already other gaps in the Gregorian calendar; this would just be the largest.</p>
<p>What do you think?  Do you find it plausible? Should I be signing this post January 2, 1714? If true, I guess that gives us an extra 298 years before we reach the <a href="http://en.wikipedia.org/wiki/2012_phenomenon">Mayan-predicted end-times in 2012</a>, so that&#8217;s a bonus!</p>
<p><sup>1</sup> With the possible exception that some churches, and some acts of various churches, may be on slightly shakier footings, especially if it could be proven that they had been maliciously involved in the fabrication of the calendar.</p>
<p><sup>2</sup> See Wikipedia&#8217;s <a href="http://en.wikipedia.org/wiki/List_of_calendars">List of calendars</a> and <a href="http://en.wikipedia.org/wiki/Calendar">Calendar</a> pages for a few points of study. In addition to our <a href="http://en.wikipedia.org/wiki/Gregorian_calendar">Gregorian</a> calendar, at least the <a href="http://en.wikipedia.org/wiki/Hebrew_calendar">Hebrew</a> and <a href="http://en.wikipedia.org/wiki/Chinese_calendar">Chinese</a> calendars are still in widespread use.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2011/01/02/are-we-really-living-in-the-1700s/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>Now a Database Architect at Twitter</title>
		<link>http://blog.jcole.us/2010/11/15/now-a-database-architect-at-twitter/</link>
		<comments>http://blog.jcole.us/2010/11/15/now-a-database-architect-at-twitter/#comments</comments>
		<pubDate>Tue, 16 Nov 2010 04:44:46 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[Commuting]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=647</guid>
		<description><![CDATA[Just a small announcement: Starting today, I am now &#8220;MySQL Database Architect&#8221; at Twitter, where I am joining some old friends on the small but hard-working DBA and operations teams there. I&#8217;ll be working to help debug, support, and scale the MySQL databases, of course, and who knows what else. I&#8217;m looking forward to the [...]]]></description>
			<content:encoded><![CDATA[<p>Just a small announcement:</p>
<p><a href="http://twitter.com/"><img src="http://jcole.us/blog/files/twitter-logo.png" width="333" height="62" /></a></p>
<p>Starting today, I am now &#8220;MySQL Database Architect&#8221; at <a href="http://twitter.com/">Twitter</a>, where I am joining some old friends on the small but hard-working DBA and operations teams there. I&#8217;ll be working to help debug, support, and scale the MySQL databases, of course, and who knows what else. I&#8217;m looking forward to the challenges and fast-paced operations again. I&#8217;m also looking forward to writing a lot more on this blog about MySQL. I&#8217;ve had a Twitter account for a long time, but I suppose I&#8217;ll write a lot more on it now:</p>
<p><a href="http://twitter.com/jeremycole"><img src="http://jcole.us/blog/files/twitter-follow-me.png" /></a></p>
<p>Since I will now be commuting from <a href="http://maps.google.com/maps?q=monterey%20terrace,%20sunnyvale,%20ca%20to%20795%20folsom%20st.,%20san%20francisco,%20ca">Sunnyvale to San Francisco</a> nearly daily, I think my old &#8220;<a href="http://jcole.us/blog/archives/category/commuting/">Commuting</a>&#8221; category will get more of a work-out too.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2010/11/15/now-a-database-architect-at-twitter/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>

		<media:content url="http://jcole.us/blog/files/twitter-logo.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/twitter-follow-me.png" medium="image" />
	</item>
		<item>
		<title>On early MySQL development hostnames</title>
		<link>http://blog.jcole.us/2010/10/12/on-early-mysql-development-hostnames/</link>
		<comments>http://blog.jcole.us/2010/10/12/on-early-mysql-development-hostnames/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 22:03:24 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=637</guid>
		<description><![CDATA[While reading through the manual I ran across something I had totally forgotten about from the early MySQL days. Early on, Monty (or was it Jani?) decided to name many development servers variants of &#8220;bitch&#8221; in different languages. I have no idea what the back-story was, but maybe Monty or Jani can fill it in. [...]]]></description>
			<content:encoded><![CDATA[<p>While reading through the manual I ran across something I had totally forgotten about from the early MySQL days. Early on, Monty (or was it Jani?) decided to name many development servers after variants of &#8220;bitch&#8221; in different languages.  I have no idea what the back-story was, but maybe Monty or Jani can fill it in.  All of these names live on all over the place, such as in the MySQL and InnoDB documentation, bug reports, and mailing list messages.  See:</p>
<ul>
<li><strong>bitch.mysql.fi</strong> &mdash; English, of course.</li>
<li><strong>hundin.mysql.fi</strong> &mdash; <a href="http://translate.google.com/#de|en|h%C3%BCndin">German</a></li>
<li><strong>hynda.mysql.fi</strong> &mdash; <a href="http://translate.google.com/#sv|en|hynda">Swedish</a></li>
<li><strong>narttu.mysql.fi</strong> &mdash; <a href="http://translate.google.com/#fi|en|narttu">Finnish</a></li>
<li><strong>tik.mysql.fi</strong> &mdash; <a href="http://translate.google.com/#sv|en|tik">Swedish</a></li>
<li><strong>tramp.mysql.fi</strong> &mdash; English, probably.  Similar in meaning to some slang uses of &#8220;bitch&#8221;.</li>
</ul>
<p>There are a few honorable mentions, which I&#8217;m not sure are variants of &#8220;bitch&#8221;, but very well could be:</p>
<ul>
<li><strong>donna.mysql.fi</strong></li>
<li><strong>mashka.mysql.fi</strong></li>
<li><strong>mishka.mysql.fi</strong></li>
</ul>
<p>It&#8217;s funny to see how those names live on in &#8220;infamy&#8221; on Google. Try searching for any of them like &#8220;<a href="http://www.google.com/search?q=%2Bmysql+%2Bhundin">+mysql +hundin</a>&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2010/10/12/on-early-mysql-development-hostnames/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>How to find an errant MySQL client</title>
		<link>http://blog.jcole.us/2010/10/02/how-to-find-an-errant-mysql-client/</link>
		<comments>http://blog.jcole.us/2010/10/02/how-to-find-an-errant-mysql-client/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 20:14:59 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=623</guid>
		<description><![CDATA[A common story: You&#8217;ve got some connection; either it&#8217;s busy running something it shouldn&#8217;t be, it&#8217;s in Sleep but holding some important lock, or you just don&#8217;t know why it&#8217;s connected to your database server in the first place. You see it in your SHOW PROCESSLIST like so: mysql&#62; show processlist \G *************************** 1. row [...]]]></description>
			<content:encoded><![CDATA[<p>A common story: You&#8217;ve got some connection; either it&#8217;s busy running something it shouldn&#8217;t be, it&#8217;s in <tt>Sleep</tt> but holding some important lock, or you just don&#8217;t know why it&#8217;s connected to your database server in the first place.  You see it in your <tt>SHOW PROCESSLIST</tt> like so:</p>
<blockquote><pre>
mysql&gt; show processlist \G
*************************** 1. row ***************************
     Id: 5979887
   User: root
   Host: localhost:55997
     db: NULL
Command: Sleep
   Time: 475
  State:
   Info: NULL
</pre>
</blockquote>
<p>How do you find that client, especially if it&#8217;s on another host?  MySQL gives you all the information you need above: <tt>localhost:55997</tt>.  Of course <tt>localhost</tt> is the host or IP address, and <tt>55997</tt> is the source port of the socket: the port number (usually randomly assigned) on the far end of the socket, from the MySQL server&#8217;s perspective.  You can turn that number into something useful, namely the PID and user, by running the following command on the host that made the connection:</p>
<blockquote><pre>
# lsof -nPi :55997
COMMAND  PID  USER   FD   TYPE    DEVICE SIZE NODE NAME
mysqld  5026 mysql 1654u  IPv4 303329996       TCP 127.0.0.1:3306-&gt;127.0.0.1:55997 (ESTABLISHED)
mysql   9146 jcole    3u  IPv4 303329995       TCP 127.0.0.1:55997-&gt;127.0.0.1:3306 (ESTABLISHED)
</pre>
</blockquote>
<p>(Note that here you can see both sides of the socket, since I&#8217;m running these commands on localhost.  Disregard the first entry, as it&#8217;s the half of the connection owned by the MySQL server.)</p>
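<p>If you do this often, the port extraction is easy to script. A minimal Python sketch, where <tt>split_host_port</tt> and <tt>lsof_command</tt> are hypothetical helper names and the only external command assumed is the <tt>lsof</tt> invocation shown above:</p>
<blockquote><pre>
```python
def split_host_port(host_field):
    """Split a processlist Host value such as 'localhost:55997' into (host, port)."""
    host, _, port = host_field.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("no client port in %r" % host_field)
    return host, int(port)


def lsof_command(host_field):
    """Build the lsof argument list for the client port found in host_field."""
    _, port = split_host_port(host_field)
    return ["lsof", "-nPi", ":%d" % port]
```
</pre>
</blockquote>
<p>The resulting list can be handed to <tt>subprocess.call</tt> on the client host. A <tt>Host</tt> value with no port (as is typically shown for connections over the Unix socket) raises <tt>ValueError</tt>, since there is no TCP port to look up.</p>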
<p>You can then find out the full command-line of the process with <tt>ps</tt>:</p>
<blockquote><pre>
# ps -fp 9146
UID        PID  PPID  C STIME TTY          TIME CMD
jcole     9146  8740  0 12:53 pts/3    00:00:00 mysql -h 127.0.0.1 -u root -p
</pre>
</blockquote>
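<p>The same information is available without <tt>ps</tt> by reading <tt>/proc</tt> directly. A minimal Python sketch, assuming Linux; <tt>command_line</tt> is a hypothetical helper name:</p>
<blockquote><pre>
```python
def command_line(pid):
    """Return the argument list of a process, read from /proc/PID/cmdline."""
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        raw = f.read()
    # Arguments are separated (and terminated) by NUL bytes; empty
    # pieces are dropped, so empty-string arguments are not preserved.
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]
```
</pre>
</blockquote>
<p>For the client above, <tt>command_line(9146)</tt> would be expected to return the same words that <tt>ps</tt> prints in its <tt>CMD</tt> column.</p>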
<p>And see what it&#8217;s doing with <tt>strace</tt> (waiting on a read of <tt>stdin</tt> in this dumb test):</p>
<blockquote><pre>
# strace -p 9146
Process 9146 attached - interrupt to quit
read(0,  &lt;unfinished ...&gt;
Process 9146 detached
</pre>
</blockquote>
<p>I hope that&#8217;s helpful!  It&#8217;s a pretty common debugging trick for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2010/10/02/how-to-find-an-errant-mysql-client/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>
	</item>
		<item>
		<title>The MySQL &#8220;swap insanity&#8221; problem and the effects of the NUMA architecture</title>
		<link>http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/</link>
		<comments>http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 01:10:10 +0000</pubDate>
		<dc:creator>jeremycole</dc:creator>
				<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://jcole.us/blog/?p=522</guid>
		<description><![CDATA[The &#8220;swap insanity&#8221; problem, in brief: When running MySQL on a large system (e.g., 64GB RAM and dual quad-core CPUs) with a large InnoDB buffer pool (e.g., 48GB), over time, Linux decides to swap out potentially large amounts of memory, despite appearing to be under no real memory pressure. Monitoring reveals that at no [...]]]></description>
			<content:encoded><![CDATA[<h2>The &#8220;swap insanity&#8221; problem, in brief</h2>
<p>When running MySQL on a large system (e.g., 64GB RAM and dual quad-core CPUs) with a large InnoDB buffer pool (e.g., 48GB), over time, Linux decides to swap out potentially large amounts of memory, despite appearing<sup>1</sup> to be under no real memory pressure.  Monitoring reveals that at no time is the system in actual need of more memory than it has available, and memory isn&#8217;t leaking: <tt>mysqld</tt>&#8217;s <a href="http://en.wikipedia.org/wiki/Resident_set_size">RSS</a> is normal and stable.</p>
<p>Normally a tiny bit of swap usage could be OK (we&#8217;re really concerned about activity&mdash;swaps in and out), but in many cases, &#8220;real&#8221; useful memory is being swapped: primarily parts of InnoDB&#8217;s buffer pool. When it&#8217;s needed once again, a big performance hit is taken to swap it back in, causing random delays in random queries.  This can cause overall unpredictable performance on production systems, and often once swapping starts, the system may enter a performance death-spiral.</p>
<p>While not every system and not every workload experiences this problem, it&#8217;s common enough that it&#8217;s well known, and for those who know it well it can be a major headache.</p>
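<p>Since it is swap activity rather than swap usage that matters, one way to watch it is to sample the cumulative <tt>pswpin</tt> and <tt>pswpout</tt> counters in <tt>/proc/vmstat</tt> and look at the difference. A minimal Python sketch, assuming Linux; the function names are hypothetical:</p>
<blockquote><pre>
```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("pswpin", "pswpout"):
            counters[fields[0]] = int(fields[1])
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two samples from swap_counters()."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}
```
</pre>
</blockquote>
<p>Feed it the contents of <tt>/proc/vmstat</tt> read a few seconds apart; the counters are in pages, not bytes, and non-zero differences on a system that should have plenty of free memory are the symptom described above.</p>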
<h2>The history of &#8220;swap insanity&#8221;</h2>
<p>Over the past two to four years, there has been an off-and-on discussion about Linux swapping and MySQL, often titled &#8220;swap insanity&#8221; (I think coined by Kevin Burton).  I have followed it closely, but I haven&#8217;t contributed much because I didn&#8217;t have anything new to add.  The major contributors to the discussion over the past years have been:</p>
<ul>
<li><a href="http://feedblog.org/2006/09/27/stupid-linux-swap-tricks-with-swappiness/">Kevin Burton</a> &mdash; Discussion of swappiness and MySQL on Linux.</li>
<li><a href="http://feedblog.org/2007/09/29/using-o_direct-on-linux-and-innodb-to-fix-swap-insanity/">Kevin Burton</a> &mdash; Proposed O_DIRECT as a solution (doesn&#8217;t work) and discussed memlock (may help, but not a full solution).</li>
<li><a href="http://www.mysqlperformanceblog.com/2008/04/06/should-you-have-your-swap-file-enabled-while-running-mysql/">Peter Zaitsev</a> &mdash; Discussed swappiness, memlock, and fielded a lot of discussion in the comments.</li>
<li><a href="http://don.blogs.smugmug.com/2008/05/01/mysql-and-the-linux-swap-problem/">Don MacAskill</a> &mdash; Proposed an innovative (albeit hacky) solution using swap on ramdisk, and a lot more interesting discussion in the comments.</li>
<li><a href="http://mysqldba.blogspot.com/2008/05/linux-64-bit-mysql-swap-and-memory.html">Dathan Pattishall</a> &mdash; Describes how Linux behavior can be even worse with swap disabled, and proposes using <tt>swapoff</tt> to clear it, but no real solution.</li>
<li><a href="http://marc.info/?t=120997175700001">Rik van Riel on the LKML</a> &mdash; A few answers and proposal of the Split-LRU patch.</li>
<li><a href="http://feedblog.org/2008/09/03/linux-split-lru-patch-improves-mysql-swap-performance/">Kevin Burton</a> &mdash; Discussion of Linux Split-LRU patch with some success.</li>
<li><a href="http://mysqlha.blogspot.com/2008/11/linux-mysql-vmstat.html">Mark Callaghan</a> &mdash; Discussion of vmstat and monitoring things, and a recap of a few possible solutions.</li>
<li><a href="http://feedblog.org/2009/01/25/splitlru-patch-in-kernel-2628-must-have-for-mysql-and-innodb/">Kevin Burton</a> &mdash; More discussion that Linux Split-LRU is essential.</li>
<li><a href="http://feedblog.org/2009/02/14/the-middle-path-and-the-solution-to-linux-swap/">Kevin Burton</a> &mdash; Choosing the middle road by enabling swap, but with a small amount of space, and giving up the battle.</li>
<li><a href="http://www.mysqlperformanceblog.com/2010/01/18/why-swapping-is-bad-for-mysql-performance/">Peter Zaitsev</a> &mdash; More discussion about why swapping is bad, but no solution.</li>
</ul>
<p>Despite all the discussion, not much has changed.  There are some hacky solutions to get MySQL to stop swapping, but nothing definite.  I&#8217;ve known these solutions and hacks now for a while, but the core question was never really settled: &#8220;<em>Why</em> does this happen?&#8221; and it&#8217;s never sat well with me.  I was recently tasked with trying to sort this mess out once and for all, so I&#8217;ve now done quite a bit of research and testing related to the problem.  I&#8217;ve learned a lot, and decided a big blog entry might be the best way to share it.  Enjoy.</p>
<p>A lot of discussion and some work went into adding the relatively new <tt>swappiness</tt> tunable a few years ago, and I think that may have solved some of the original problems. At around the same time, though, the basic architecture of the machine changed to NUMA, which I think introduced some new problems with the very same symptoms, masking the original fix.</p>
<h2>Contrasting the SMP/UMA and NUMA architectures</h2>
<h3>The SMP/UMA architecture</h3>
<p><img src="http://jcole.us/blog/files/uma-architecture.png" /><br />
<em>The SMP, or UMA architecture, simplified</em></p>
<p>When the PC world first got multiple processors, they were all arranged with equal access to all of the memory in the system.  This is called <a href="http://en.wikipedia.org/wiki/Symmetric_Multi-Processing">Symmetric Multi-processing (SMP)</a>, or sometimes Uniform Memory Access (UMA, especially in contrast to NUMA).  In the past few years this architecture has been largely phased out between physical socketed processors, but is still alive and well today within a single processor with multiple cores: all cores have equal access to the memory bank.</p>
<h3>The NUMA architecture</h3>
<p><img src="http://jcole.us/blog/files/numa-architecture.png" /><br />
<em>The NUMA architecture, simplified</em></p>
<p>The new architecture for multiple processors, starting with <a href="http://en.wikipedia.org/wiki/Opteron">AMD&#8217;s Opteron</a> and <a href="http://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29">Intel&#8217;s Nehalem</a><sup>2</sup> processors (we&#8217;ll call these &#8220;modern PC CPUs&#8221;), is a <a href="http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access">Non-Uniform Memory Access (NUMA)</a> architecture, or more correctly <a href="http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access#Cache_coherent_NUMA_.28ccNUMA.29">Cache-Coherent NUMA (ccNUMA)</a>.  In this architecture, each processor has a &#8220;local&#8221; bank of memory, to which it has much closer (lower latency) access.  The whole system may still operate as one unit, and all memory is basically accessible from everywhere, but at a potentially higher latency and lower performance.</p>
<p>Fundamentally, some memory locations (&#8220;local&#8221; ones) are faster, that is, cost less to access, than other locations (&#8220;remote&#8221; ones attached to other processors).  For a more detailed discussion of NUMA implementation and its support in Linux, see <a href="http://lwn.net/Articles/254445/">Ulrich Drepper&#8217;s article on LWN.net</a>.</p>
<h2>How Linux handles a NUMA system</h2>
<p>Linux automatically understands when it&#8217;s running on a NUMA architecture system and does a few things:</p>
<ol>
<li>Enumerates the hardware to understand the physical layout.</li>
<li>Divides the processors (not cores) into &#8220;nodes&#8221;.  With modern PC processors, this means one node per physical processor, regardless of the number of cores present.</li>
<li>Attaches each memory module in the system to the node for the processor it is local to.</li>
<li>Collects cost information about inter-node communication (&#8220;distance&#8221; between nodes).</li>
</ol>
<p>You can see how Linux enumerated your system&#8217;s NUMA layout using the <tt>numactl --hardware</tt> command:</p>
<blockquote><pre>
# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 32276 MB
node 0 free: 26856 MB
node 1 size: 32320 MB
node 1 free: 26897 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
</pre>
</blockquote>
<p>This tells you a few important things:</p>
<ul>
<li><strong>The number of nodes, and their node numbers</strong>  &mdash; In this case there are two nodes numbered &#8220;0&#8221; and &#8220;1&#8221;.</li>
<li><strong>The amount of memory available within each node</strong>  &mdash; This machine has 64GB of memory total, and two physical (quad core) CPUs, so it has 32GB in each node.  Note that the sizes aren&#8217;t exactly half of 64GB, and aren&#8217;t exactly equal, due to some memory being stolen from each node for whatever internal purposes the kernel has in mind.</li>
<li><strong>The &#8220;distance&#8221; between nodes</strong>  &mdash; This is a representation of the cost of accessing memory located in (for example) Node 0 from Node 1.  In this case, Linux claims a distance of &#8220;10&#8221; for local memory and &#8220;21&#8221; for non-local memory.</li>
</ul>
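<p>If you want these numbers in a script rather than on screen, the output is easy to parse.  The following is a minimal sketch in Python, assuming the output format shown above; the function name <tt>parse_numactl_hardware</tt> and the <tt>SAMPLE</tt> constant are my own, not part of any tool, and other <tt>numactl</tt> versions may print additional lines that this simply ignores:</p>

```python
import re

# Sample text copied from the numactl --hardware output shown above.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 size: 32276 MB
node 0 free: 26856 MB
node 1 size: 32320 MB
node 1 free: 26897 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""


def parse_numactl_hardware(text):
    """Return (nodes, distances) parsed from numactl --hardware output.

    nodes maps node number to {"size": MB, "free": MB};
    distances maps (from_node, to_node) to the reported distance.
    """
    nodes = {}
    distances = {}
    columns = []
    for line in text.splitlines():
        # Lines such as "node 0 size: 32276 MB".
        m = re.match(r"node (\d+) (size|free): (\d+) MB", line)
        if m:
            node = int(m.group(1))
            nodes.setdefault(node, {})[m.group(2)] = int(m.group(3))
            continue
        # Header row of the distance matrix: "node   0   1".
        m = re.match(r"node((?:\s+\d+)+)\s*$", line)
        if m:
            columns = [int(n) for n in m.group(1).split()]
            continue
        # One row of the distance matrix: "  0:  10  21".
        m = re.match(r"\s*(\d+):((?:\s+\d+)+)\s*$", line)
        if m:
            row = int(m.group(1))
            for col, dist in zip(columns, m.group(2).split()):
                distances[(row, col)] = int(dist)
    return nodes, distances


nodes, distances = parse_numactl_hardware(SAMPLE)
total_mb = sum(info["size"] for info in nodes.values())
```

<p>For the sample above, <tt>total_mb</tt> comes to 64596 MB, which is the &#8220;not exactly 64GB&#8221; mentioned in the list, and <tt>distances</tt> holds the same matrix keyed by node pair.</p>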
<h2>How NUMA changes things for Linux</h2>
<p>Technically, as long as everything runs just fine, there&#8217;s no reason that being UMA or NUMA should change <em>how</em> things work at the OS level.  However, if you&#8217;re to get the best possible performance (and indeed in some cases with extreme performance differences for non-local NUMA access, any performance at all) some additional work has to be done, directly dealing with the internals of NUMA.  Linux does the following things which might be unexpected if you think of CPUs and memory as black boxes:</p>
<ul>
<li>Each process and thread inherits, from its parent, a NUMA policy.  The inherited policy can be modified on a per-thread basis, and it defines the CPUs and even individual cores the process is allowed to be scheduled on, where it should be allocated memory from, and how strict to be about those two decisions.</li>
<li>Each thread is initially allocated a &#8220;preferred&#8221; node to run on.  The thread <em>can</em> be run elsewhere (if policy allows), but the scheduler attempts to ensure that it is always run on the preferred node.</li>
<li>Memory allocated for the process is allocated on a particular node, by default &#8220;current&#8221;, which means the same node as the thread is preferred to run on.  On UMA/SMP architectures all memory was treated equally, and had the same cost, but now the system has to think a bit about where it comes from, because accessing non-local memory has implications on performance and may cause cache coherency delays.</li>
<li>Memory allocations made on one node will <strong>not</strong> be moved to another node, regardless of system needs.  Once memory is allocated on a node, it will stay there.</li>
</ul>
<p>The NUMA policy of any process can be changed, with broad-reaching effects, very simply using <a href="http://linux.die.net/man/8/numactl"><tt>numactl</tt></a> as a wrapper for the program.  With a bit of additional work, it can be fine-tuned in detail by linking in <a href="http://linux.die.net/man/3/numa"><tt>libnuma</tt></a> and writing some code yourself to manage the policy.  Some interesting things that can be done simply with the <tt>numactl</tt> wrapper are:</p>
<ul>
<li>Allocate memory with a particular policy:
<ul>
<li>locally on the &#8220;current&#8221; node &mdash; using <tt>--localalloc</tt>, and also the default mode</li>
<li>preferably on a particular node, but elsewhere if necessary &mdash; using <tt>--preferred=<em>node</em></tt></li>
<li>always on a particular node or set of nodes &mdash; using <tt>--membind=<em>nodes</em></tt></li>
<li><em>interleaved</em>, that is, spread evenly round-robin across all or a set of nodes &mdash; using <tt>--interleave=all</tt> or <tt>--interleave=<em>nodes</em></tt></li>
</ul>
</li>
<li>Run the program on a particular node or set of nodes, in this case that means physical CPUs (<tt>--cpunodebind=<em>nodes</em></tt>) or on a particular core or set of cores (<tt>--physcpubind=<em>cpus</em></tt>).</li>
</ul>
<h2>What NUMA means for MySQL and InnoDB</h2>
<p>InnoDB, and really, nearly all database servers (<a href="http://kevinclosson.wordpress.com/2009/05/14/you-buy-a-numa-system-oracle-says-disable-numa-what-gives-part-ii/">such as Oracle</a>), present an atypical workload (from the point of view of the majority of installations) to Linux: a single large multi-threaded process which consumes nearly all of the system&#8217;s memory and should be expected to consume as much of the rest of the system resources as possible.</p>
<p>In a NUMA-based system, where the memory is divided into multiple nodes, how the system should handle this is not necessarily straightforward.  The default behavior of the system is to allocate memory in the same node as a thread is scheduled to run on, and this works well for small amounts of memory, but when you want to allocate more than half of the system memory it&#8217;s no longer physically possible to even do it in a single NUMA node: In a two-node system, only 50% of the memory is in each node.  Additionally, since many different queries will be running at the same time, on both processors, neither individual processor necessarily has preferential access to any particular part of memory needed by a particular query.</p>
<p>It turns out that this seems to matter in one very important way. Using <a href="http://linux.die.net/man/5/numa_maps"><tt>/proc/<em>pid</em>/numa_maps</tt></a> we can see all of the allocations made by <tt>mysqld</tt>, and some interesting information about them.  If you look for a really big number in the <tt>anon=<em>size</em></tt> field, you can pretty easily find the buffer pool (which will consume <a href="http://mysqlha.blogspot.com/2008/11/innodb-memory-overhead.html">more than 51GB of memory</a> for the 48GB that it has been configured to use) [line-wrapped for clarity]:</p>
<blockquote><pre>
2aaaaad3e000 default anon=13240527 dirty=13223315
  swapcache=3440324 active=13202235 N0=7865429 N1=5375098
</pre>
</blockquote>
<p>The fields being shown here are:</p>
<ul>
<li><tt>2aaaaad3e000</tt> &mdash; The virtual address of the memory region.  Ignore this other than the fact that it&#8217;s a unique ID for this piece of memory.</li>
<li><tt>default</tt> &mdash; The NUMA policy in use for this region.</li>
<li><tt>anon=<em>number</em></tt> &mdash; The number of anonymous pages mapped.</li>
<li><tt>dirty=<em>number</em></tt> &mdash; The number of pages that are dirty because they have been modified.  Generally memory allocated only within a single process is always going to be used, and thus dirty, but if a process forks it may have many copy-on-write pages mapped that are not dirty.</li>
<li><tt>swapcache=<em>number</em></tt> &mdash; The number of pages swapped out but unmodified since they were swapped out, and thus they are ready to be freed if needed, but are still in memory at the moment.</li>
<li><tt>active=<em>number</em></tt> &mdash; The number of pages on the &#8220;active list&#8221;; if this field is shown, some memory is inactive (<tt>anon</tt> minus <tt>active</tt>) which means it may be paged out by the swapper soon.</li>
<li><tt>N0=<em>number</em></tt> and <tt>N1=<em>number</em></tt> &mdash; The number of pages allocated on Node 0 and Node 1, respectively.</li>
</ul>
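<p>To make the field list concrete, here is a small Python sketch that decodes the line shown above.  It assumes 4 KiB pages (typical on x86-64, and consistent with the GB figures in the summary below); the names <tt>parse_numa_maps_line</tt> and <tt>pages_to_gb</tt> are mine, chosen for illustration:</p>

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, typical on x86-64


def parse_numa_maps_line(line):
    """Split one numa_maps line into address, policy, and numeric counters."""
    parts = line.split()
    address, policy = parts[0], parts[1]
    counters = {}
    for field in parts[2:]:
        key, sep, value = field.partition("=")
        # Keep only key=number fields; skip flags and key=path fields.
        if sep and value.isdigit():
            counters[key] = int(value)
    return address, policy, counters


def pages_to_gb(pages):
    """Convert a page count to GB (2**30 bytes) under the page size assumption."""
    return pages * PAGE_SIZE_BYTES / 1024 ** 3


# The buffer pool line from above, joined back into a single line.
LINE = ("2aaaaad3e000 default anon=13240527 dirty=13223315 "
        "swapcache=3440324 active=13202235 N0=7865429 N1=5375098")

address, policy, counters = parse_numa_maps_line(LINE)
inactive = counters["anon"] - counters["active"]
print("policy=%s anon=%.2f GB inactive pages=%d"
      % (policy, pages_to_gb(counters["anon"]), inactive))
```

<p>Two things are worth checking by hand: <tt>N0</tt> plus <tt>N1</tt> adds up exactly to <tt>anon</tt>, and <tt>anon</tt> minus <tt>active</tt> leaves 38292 inactive pages, which are the ones at risk of being paged out.</p>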
<p>The entire <tt>numa_maps</tt> can be quickly summarized by a simple script, <a href="http://jcole.us/blog/files/numa-maps-summary.pl">numa-maps-summary.pl</a>, which I wrote while analyzing this problem:</p>
<blockquote><pre>
N0        :      7983584 ( 30.45 GB)
N1        :      5440464 ( 20.75 GB)
active    :     13406601 ( 51.14 GB)
anon      :     13422697 ( 51.20 GB)
dirty     :     13407242 ( 51.14 GB)
mapmax    :          977 (  0.00 GB)
mapped    :         1377 (  0.01 GB)
swapcache :      3619780 ( 13.81 GB)
</pre>
</blockquote>
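<p>The idea behind such a summary is simply to add up each counter across every line of <tt>numa_maps</tt> and convert pages to GB.  The sketch below shows that idea in Python; it is not a port of the Perl script linked above, the function names are hypothetical, and it again assumes 4 KiB pages:</p>

```python
from collections import Counter

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages


def summarize_numa_maps(lines):
    """Sum every key=number field across all numa_maps lines."""
    totals = Counter()
    for line in lines:
        # Skip the address and policy columns, then add up the counters.
        for field in line.split()[2:]:
            key, sep, value = field.partition("=")
            if sep and value.isdigit():
                totals[key] += int(value)
    return totals


def format_summary(totals):
    """Render totals as 'name : pages ( GB)' rows, sorted by name."""
    rows = []
    for key in sorted(totals):
        gb = totals[key] * PAGE_SIZE_BYTES / 1024 ** 3
        rows.append("%-10s: %12d (%6.2f GB)" % (key, totals[key], gb))
    return rows
```

<p>Since a file object yields lines, something like <tt>summarize_numa_maps(open("/proc/%d/numa_maps" % pid))</tt> should work for a process you are allowed to inspect, where <tt>pid</tt> is whatever process ID you are interested in.</p>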
<p>A couple of interesting and somewhat unexpected things pop out to me:</p>
<ol>
<li>The sheer imbalance in how much memory is allocated in Node 0 versus Node 1.  This is actually absolutely normal per the default policy.  Using the default NUMA policy, memory was preferentially allocated in Node 0, but Node 1 was used as a last resort.</li>
<li>The sheer <em>amount</em> of memory allocated in Node 0.  This is absolutely critical &mdash; Node 0 is out of free memory!  It only contains about 32GB of memory in total, and it has allocated a single large chunk of more than 30GB to InnoDB&#8217;s buffer pool.  A few other smaller allocations to other processes finish it off, and suddenly it has no memory free and isn&#8217;t even caching anything.</li>
</ol>
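<p>The arithmetic behind the second point can be worked through quickly.  This Python sketch combines the per-node page counts from the summary with the node sizes reported by <tt>numactl --hardware</tt>, assuming 4 KiB pages and that both outputs describe the same machine; <tt>node_fill</tt> is a name I made up for the example:</p>

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages


def node_fill(pages_on_node, node_size_mb):
    """Return (used_mb, remaining_mb, fraction_used) for one process on one node."""
    used_mb = pages_on_node * PAGE_SIZE_BYTES / 1024 ** 2
    remaining_mb = node_size_mb - used_mb
    return used_mb, remaining_mb, used_mb / node_size_mb


# Page counts from the numa_maps summary, node sizes from numactl --hardware.
used0, left0, frac0 = node_fill(7983584, 32276)
used1, left1, frac1 = node_fill(5440464, 32320)

print("node 0: %.0f MB used by mysqld, %.0f MB left, %.1f%% of the node"
      % (used0, left0, frac0 * 100))
print("node 1: %.0f MB used by mysqld, %.0f MB left, %.1f%% of the node"
      % (used1, left1, frac1 * 100))
```

<p>Under those assumptions <tt>mysqld</tt> alone occupies roughly 96.6% of Node 0 but only about 65.8% of Node 1, so it takes very little additional demand on Node 0 to run it dry.</p>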
<p>The memory allocated by MySQL looks something like this:</p>
<p><img src="http://jcole.us/blog/files/numa-imbalanced-allocation.png" /><br />
<em>Allocating memory severely imbalanced, preferring Node 0</em></p>
<p>Node 0 is completely exhausted of free memory: the system has plenty of free memory overall (over 10GB has been used for caches), but it is <em>entirely</em> on Node 1.  If any process scheduled on Node 0 needs local memory for anything, it will cause some of the already-allocated memory to be swapped out in order to free up some Node 0 pages.  Even though there is free memory on Node 1, the Linux kernel in many circumstances (which admittedly I don&#8217;t totally understand<sup>3</sup>) prefers to page out Node 0 memory rather than free some of the cache on Node 1 and use that memory.  Of course the paging is far more expensive than non-local memory access ever would be.</p>
<h2>A small change, to big effect</h2>
<p>An easy solution to this is to interleave the allocated memory.  It is possible to do this using <tt>numactl</tt> as described above:</p>
<blockquote><pre>
# numactl --interleave all <em>command</em>
</pre>
</blockquote>
<p>We can use this with MySQL by making a <a href="http://jcole.us/patches/mysql/5.1/numa_interleave_simple.patch">one-line change to <tt>mysqld_safe</tt></a>, adding the following line (after <tt>cmd="$NOHUP_NICENESS"</tt>), which prefixes the command to start <tt>mysqld</tt> with a call to <tt>numactl</tt>:</p>
<blockquote><pre>
cmd="/usr/bin/numactl --interleave all $cmd"
</pre>
</blockquote>
<p>Now, when MySQL needs memory it will allocate it interleaved across all nodes, effectively balancing the amount of memory allocated in each node.  This will leave some free memory in each node, allowing the Linux kernel to cache data on both nodes, thus allowing memory to be easily freed on either node just by freeing caches (as it&#8217;s supposed to work) rather than paging.</p>
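<p>The difference between the two policies can be illustrated with a toy model.  The Python sketch below is not how the kernel allocates memory; it only assumes that the default policy fills the first node before spilling over and that interleaving deals pages out round-robin, and it uses whole GB as the unit to keep the numbers readable.  Both function names are hypothetical:</p>

```python
def allocate_default(units, capacities):
    """Toy model of local-first allocation: fill the first node, then spill over."""
    placed = [0] * len(capacities)
    remaining = units
    for node, capacity in enumerate(capacities):
        take = min(remaining, capacity)
        placed[node] = take
        remaining -= take
    return placed


def allocate_interleaved(units, capacities):
    """Toy model of round-robin interleaving: unit i goes to node i mod n.

    Capacity is ignored here; the model assumes every node has room.
    """
    n = len(capacities)
    placed = [units // n] * n
    for node in range(units % n):
        placed[node] += 1
    return placed


CAPACITIES = [32, 32]   # two nodes of about 32 GB each, as on the test machine
REQUEST = 52            # roughly the 52 GB mysqld ends up using

default_layout = allocate_default(REQUEST, CAPACITIES)
interleaved_layout = allocate_interleaved(REQUEST, CAPACITIES)
default_free = [c - p for c, p in zip(CAPACITIES, default_layout)]
interleaved_free = [c - p for c, p in zip(CAPACITIES, interleaved_layout)]
```

<p>In this model the default policy leaves Node 0 with nothing free and Node 1 with 12 GB, while interleaving leaves 6 GB free on each node, which is the headroom the kernel needs for caches on both sides.</p>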
<p>Performance regression testing comparing the two scenarios (default local plus spillover allocation versus interleaved allocation) using the DBT2 benchmark found that performance in the nominal case is identical.  This is expected.  The breakthrough is this: in all cases where swap use could be triggered in a repeatable fashion, the system <em>no longer swaps</em>!</p>
<p>You can now see from the <tt>numa_maps</tt> that all allocated memory has been spread evenly across Node 0 and Node 1:</p>
<blockquote><pre>
2aaaaad3e000 interleave=0-1 anon=13359067 dirty=13359067
  N0=6679535 N1=6679532
</pre>
</blockquote>
<p>And the summary looks like this:</p>
<blockquote><pre>
N0        :      6814756 ( 26.00 GB)
N1        :      6816444 ( 26.00 GB)
anon      :     13629853 ( 51.99 GB)
dirty     :     13629853 ( 51.99 GB)
mapmax    :          296 (  0.00 GB)
mapped    :         1384 (  0.01 GB)
</pre>
</blockquote>
<p>In graphical terms, the allocation of all memory within <tt>mysqld</tt> has been made in a balanced way:</p>
<p><img src="http://jcole.us/blog/files/numa-balanced-allocation.png" /><br />
<em>Allocating memory balanced (interleaved) across nodes</em></p>
<h2>An aside on <tt>zone_reclaim_mode</tt></h2>
<p>The <a href="http://www.kernel.org/doc/Documentation/sysctl/vm.txt"><tt>zone_reclaim_mode</tt></a> tunable in <tt>/proc/sys/vm</tt> can be used to fine-tune memory reclamation policies in a NUMA system.  Subject to <a href="http://marc.info/?l=linux-mm&amp;m=128563913214216&amp;w=2">some clarifications</a> from the <tt>linux-mm</tt> mailing list, it doesn&#8217;t seem to help in this case.</p>
<h2>An even better solution?</h2>
<p>It occurred to me (and was backed up by the <tt>linux-mm</tt> mailing list) that there is probably further room for optimization, although I haven&#8217;t done any testing so far.  Interleaving <em>all</em> allocations is a pretty big hammer, and while it does solve this problem, I wonder if an even better solution would be to <em>intelligently</em> manage the fact that this is a NUMA architecture, using the <tt>libnuma</tt> library.  Some thoughts that come to mind are:</p>
<ul>
<li>Spread the buffer pool across all nodes intelligently in large chunks, or by index, rather than round-robin per page.</li>
<li>Keep the allocation policy for normal query threads to &#8220;local&#8221; so their memory isn&#8217;t interleaved across both nodes.  I think interleaved allocation could cause slightly worse performance for some queries which would use a substantial amount of local memory (such as for large queries, temporary tables, or sorts), but I haven&#8217;t tested this.</li>
<li>Manage I/O in and out of the buffer pool using threads that will only be scheduled on the same node that the memory they will use is allocated on (this is a rather complex optimization).</li>
<li>Re-schedule simpler query threads (many PK lookups, etc.) on nodes with local access to the data they need.  Move them actively when necessary, rather than keeping them on the same node. (I don&#8217;t know if the cost of the switch makes up for this, but it could be trivial if the buffer pool were organized by index onto separate nodes.)</li>
</ul>
<p>I have no idea if any of the above would really show practical benefits in a real-world system, but I&#8217;d love to hear any comments or ideas.</p>
<p><strong>Update 1</strong>: Changed the link for &#8220;Rik van Riel on the LKML &mdash; A few answers and proposal of the Split-LRU patch.&#8221; to be a bit closer to my intention.  The old link points to the message that started the thread, the new link points to the index of the messages in the thread.</p>
<p><strong>Update 2</strong>: Added a <a href="http://kevinclosson.wordpress.com/2009/05/14/you-buy-a-numa-system-oracle-says-disable-numa-what-gives-part-ii/">link</a> above <a href="http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/#comment-1578970">provided by Kevin Closson</a> about Oracle on NUMA systems.</p>
<p>- &#8211; -</p>
<p><sup>1</sup> Using <tt>free</tt> shows some memory free and lots of cache in use, and totalling up the resident set sizes from <tt>ps</tt> or <tt>top</tt> shows that the running processes don&#8217;t need more memory than is available.</p>
<p><sup>2</sup> An article in Dr. Dobb&#8217;s Journal titled <a href="http://www.drdobbs.com/go-parallel/article/printableArticle.jhtml?articleID=222301437">A Deeper Look Inside Intel QuickPath Interconnect</a> gives pretty good high level coverage.  Intel published a paper entitled <a href="http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf">Performance Analysis Guide for Intel&#174; Core<sup>TM</sup> i7 Processor and Intel&#174; Xeon<sup>TM</sup> 5500 processors</a> which is quite good for understanding the internals of NUMA and QPI on Intel&#8217;s Nehalem series of processors.</p>
<p><sup>3</sup> I started a thread on the <tt>linux-mm</tt> mailing list related to <a href="http://marc.info/?t=128528110500005">MySQL on NUMA</a>, and there are two other threads related on <a href="http://marc.info/?t=128434966300001">zone_reclaim_mode</a> and on <a href="http://marc.info/?t=128075346800003">swapping</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/feed/</wfw:commentRss>
		<slash:comments>64</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/37cd63a9fd0344804fd6a991a55c283a?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=R" medium="image">
			<media:title type="html">jeremycole</media:title>
		</media:content>

		<media:content url="http://jcole.us/blog/files/uma-architecture.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/numa-architecture.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/numa-imbalanced-allocation.png" medium="image" />

		<media:content url="http://jcole.us/blog/files/numa-balanced-allocation.png" medium="image" />
	</item>
	</channel>
</rss>
