Dropbox announced that as of March 15, 2017, the Public folder in your Dropbox account has been converted into a standard folder. Sadly, all my previous images were hosted on Dropbox, so their links have all become invalid URLs.
I figured I would have to use another image hosting service. After some searching online, Cloudinary seemed like a good option. One design decision for my hosted images is that they are organized in subfolders under a 'blog' folder, which means that if I want to convert the links seamlessly, I need to preserve the folder structure too.
Cloudinary seems to suggest they support auto-creating folders. Unfortunately, that does not quite work for me.
So I wrote a small script to do this.
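A minimal Python sketch of the idea (the `blog` root folder matches my layout; the Cloudinary credentials are assumed to be configured separately via their Python SDK): walk the folder and upload each file with a `public_id` that mirrors its relative path.

```python
import os

def public_id_for(path, root="blog"):
    """Map a local file path to a Cloudinary public_id that preserves folders."""
    rel = os.path.relpath(path, root)
    stem = os.path.splitext(rel)[0]          # drop the extension
    return f"{root}/{stem}".replace(os.sep, "/")

def upload_all(root="blog"):
    # Requires `pip install cloudinary` and cloudinary.config(...) with your keys.
    import cloudinary.uploader
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Cloudinary derives folders from the slashes in public_id,
            # which should recreate the folder hierarchy.
            cloudinary.uploader.upload(path, public_id=public_id_for(path, root))
```
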

With that, a simple sed run over all my post sources rewrote the old image URLs to the new ones.
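The same substitution can be sketched in Python with `re.sub` (both URL patterns here are hypothetical; the real Dropbox user ID and Cloudinary cloud name differ):

```python
import re

# Hypothetical patterns: adjust the user ID and the cloud name to your accounts.
DROPBOX_URL = re.compile(
    r"https://dl\.dropboxusercontent\.com/u/\d+/blog/([^\s)]+)")
CLOUDINARY_URL = r"https://res.cloudinary.com/demo/image/upload/blog/\1"

def rewrite(text):
    """Rewrite every Dropbox public link to its Cloudinary counterpart."""
    return DROPBOX_URL.sub(CLOUDINARY_URL, text)
```
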


Then rake preview. Boom! All the images show up again!
My car is an MBC300. Only after buying it did I discover that its tires are so-called "run-flats". The advantage is that even after a blowout, the sidewall^{1} is stiff enough to support the deflated tire for another 50 miles at up to 50 mph. The downsides: they are almost impossible to repair, so once one is damaged you basically have to head straight to a tire shop, and since this is a relatively new technology, a replacement may not be in stock; they are also generally more expensive. The stiff sidewall also tends to make the ride very bumpy (I can vouch for this: the car rides rough and seems to vibrate constantly, though the upside is a sensitive, grippy, sporty feel at speed). Finally, cars that come with run-flats usually carry no spare tire, so after a flat all you can do is go to the shop; on the plus side, skipping the spare frees up a lot of trunk space. On top of all this, from watching my own tires I found that in winter the pressure drops especially fast, needing a top-up roughly every two months.
I never experienced any of this in five years with my old car, because I had it serviced every six months and tire pressure and the like were presumably taken care of then. That also left me quite clueless about tires: the old car's tires seemed extremely wear-resistant, I replaced them once after buying the car, and in the following five years I genuinely never had a single tire problem.
Speaking of run-flats, I have plenty to complain about. Since I bought the sport trim, the car came with 19-inch wheels and Pirelli P Zero Summer run-flats. In a year and a half of driving, my front-left tire has already failed twice[^flat tire]. Fortunately both failures were within the warranty period, so each new tire cost only a few tens of dollars^{2}. But the back-and-forth cost a lot of time, during which I couldn't drive to work.
Recently, I noticed that the outer edge of my front-right tire is almost completely worn down. I was shocked at first, since the car had fewer than 15k miles on it and the tire was already nearly done. After plenty of research online, it turns out this model of tire simply has a short life! Pirelli is an Italian maker of performance tires for sports cars, so it is rather prestigious, and this particular model is especially thin; many people online say it can need replacing before even 10k miles.
After doing a lot of homework online, I found that a car originally fitted with run-flats can run non-run-flats just fine. Remembering all the earlier trouble, I decided without hesitation to go with non-run-flats. Doing my research on tirerack, I worked through a few basic considerations.
tirerack's interface is very well done. With its help, I quickly narrowed things down to a handful of models.
The trade-offs were clear, and I quickly decided on the PS3+. Credit to tirerack again here: their offline service is also excellent. After choosing the tires, you can have them shipped to a designated tire shop for installation, at a guaranteed price. Following its recommendation, I soon found Nielsen Automotive, a shop with 100+ five-star reviews on yelp. As it later turned out, their service really is reliable! (See the next section.)
tirerack's prices are also very good: a full set of tires cost me less than $800. costco is actually cheaper, but after digging in I found that their service has a very poor reputation, which is ultimately why I chose tirerack.
Once everything was ready, I placed the order online and had the tires shipped to Nielsen Automotive. Two small episodes along the way:
The tires were shipped from NV: sent Wednesday, arrived Thursday. After Nielsen Automotive called me Thursday afternoon, we scheduled the installation for Friday afternoon. Their quote was $100, which is $20 cheaper than what tirerack suggested!
Arriving at the shop on Friday, I found a very old-looking storefront run by elderly mechanics. But the old gentlemen were all extremely nice and gentle, which felt great: none of that duplicitous vibe ordinary repair shops give off. I didn't quite believe it at first, but by the time the car was done I knew they were genuinely nice and hadn't taken advantage of me at all. The promised one hour ended up taking two, and the technician, Chris, apologized on his own initiative, explaining that this series of car is tricky and one spot kept defeating him, which cost extra time. He was so sincere about it that I could hardly say anything. After a quick drive around, I found the tire pressures weren't quite symmetric; I asked Chris to balance them again, and he fixed it without a word of complaint. When I tried to give him a small tip, he politely declined. A shop that refuses tips: that is a first for me in America.
I wholeheartedly recommend this tire shop. The address is 888 El Camino Real, San Carlos, CA 94070.
This experience gave me a much deeper understanding of tire maintenance. tirerack's prices are reasonable and its service is good; more importantly, it hosts a large collection of educational articles that are well worth reading.
At almost thirty, I have finally grown my last wisdom tooth.
I used to think I would get through life without having my wisdom teeth pulled: before 25 I had already grown three of them, all aligned very well, none impacted. My dentist back in China also told me there was no need to extract them; since they come in late, they will also be lost later than the other teeth, so after the others are gone they can still serve as anchors for fixed dentures. Of course, I have never heard a doctor in the US say anything like that; the American habit is to pull all four at once and be done with it! That sounds a bit scary, but their reasoning is not unfounded: wisdom teeth usually grow in crooked, and even when they don't, they can cause various problems. Sitting far back, they are hard to clean and thus prone to decay; worse, a decayed wisdom tooth can cause the neighboring normal teeth to decay too! That is exactly what happened to me: both of my lower wisdom teeth had cavities, which then caused cavities in the adjacent teeth! I have had fillings done twice because of this, and it became the biggest reason I finally had them pulled.
My last wisdom tooth was the upper left one. It grew in crooked, though fortunately not pressing against another tooth: it grew sideways, toward the tongue, so my tongue kept brushing against it uncomfortably. I had it examined when it first emerged about five months ago, including a panoramic X-ray (the school's Delta Dental insurance covers one only every three years, so this one was out of pocket, about $70). Since it had only just come in, the trend wasn't clear yet, so I took my chances and ignored it (after all, the other three had all come in nice and straight!). Having recently wrapped up the job search, TA duties, and my thesis, I felt it could not wait any longer and contacted a dentist. Unfortunately, the dental clinic I had been going to cannot do oral surgery, so the dentist referred me to the biggest local hospital: Carle. The overall timeline went roughly like this:
December: saw the dentist and had X-rays taken. The dentist's diagnosis was that the wisdom teeth had to come out, and he wrote a referral for me to get oral surgery at Carle. I procrastinated and didn't go right away.
Early May: called Carle and scheduled an appointment. The nurse told me to have the clinic send them my X-rays. I called the clinic, and they said they would send them before my appointment (they never did!).
A week later I went in for an examination, which was surprisingly free. The nurse checked and found the X-rays had never arrived, so I immediately called the clinic to send them. The nurse and doctor were very kind and said that if the clinic wouldn't send them, they would take a new one for me for free. After a long wait, the images finally came through. The doctor went over my situation and asked whether I wanted sedation, i.e. general anesthesia (Dental IV), or just local anesthesia. Being a newbie, I naturally asked for his advice; he said everyone does the IV: you sleep through it and wake up with the teeth out. So I said I wanted that too (not knowing the anesthesia would turn out to cost $375). Because I thought my insurance was about to expire, I tearfully begged to have the surgery that very week, and they were remarkably accommodating and squeezed me in for the next day. Wisdom tooth extraction counts as minor surgery in the US, so I had to sign a disclosure-like form. Before leaving, I took the procedure list to patient accounts to check the price against my insurance, but the front desk said the result wouldn't be immediate and she would call me the next day. Back home I made a rough estimate from the bill: 4 teeth at $285 each = $1,140, plus $375 for anesthesia = $1,515, while my insurance's upper limit is $1,000; even with 100% coverage I would still pay $515. That felt too expensive, so I called to ask whether I could cancel. The answer: I had to discuss it with the doctor, who stayed busy and never got back to me. Only just before closing did a nurse call to say I could simply talk to the doctor the next day. The whole visit took over two hours, plus $3 for parking.
The next morning the hospital called with the concrete insurance information and bill. I ran the numbers again: 4 teeth at $285 = $1,140 plus $375 anesthesia = $1,515, but the insurance caps out at $1,000, and it does NOT cover anesthesia! The quote for me came to (285 x 4 - 45) x 30% + 375 + 45 = $748.50, where $45 is the insurance deductible, and because Carle is not in Delta Dental's network, the insurance covers only 70%, which means I pay 30%! Too expensive. So I promptly chose local anesthesia only, since it was free, which brought my out-of-pocket cost to $373.50, which is not too bad. The poor do suffer more... though looked at another way, general anesthesia is somewhat hard on the body anyway, so doing without was no loss. Still, just in case, I prepared as if I would get general anesthesia. According to the materials they gave me, you must not eat for 8 hours before surgery: no water, no candy, no gum (supposedly to guard against incontinence?). With the surgery starting at 1:45 pm, my last chance to eat would have been around 5 am, and getting up that early to prepare and eat wasn't worth it. I had eaten dinner at 8 the night before, so I ended up going nearly 18 hours without food or water. T_T And because general anesthesia leaves you groggy afterwards, someone must come along and be responsible for getting you home: very serious, and of course very inconvenient.
In the afternoon I drove to the hospital in Champaign (Carle has one location in Urbana and one in Champaign; the doctors rotate). After checking in and dutifully handing over my $373.50, I waited half an hour before entering the operating room. Overall the nurses were very caring, chatting with me the whole time and explaining the post-op instructions in detail. Soon the doctor's assistant came in to administer the anesthetic. Honestly, this was the most painful part of the entire procedure: they inject into the gums around the wisdom teeth, and the feeling of the needle going in is half scary, half painful. I tensed up involuntarily, hands pressed hard against my thighs. Then again, local anesthesia is nothing new to me; I had it for my fillings, so I was somewhat mentally prepared. Before long my whole mouth was numb, and even swallowing felt a little difficult.
The surgery officially began when the doctor came in. The newly erupted tooth (upper left) came out in, I estimate, 5 seconds, with no sensation at all; the doctor counted: "this is one." I secretly rejoiced: so it's this easy! I had celebrated too soon. When he pulled the lower one, it felt wedged dead in its socket and would not come out no matter what; there was a dull ache at the root, but what hurt more was my jaw. The force was so great I felt my whole jaw was about to be wrenched off. Perhaps because it sat so firmly, or perhaps because it had a filling, the doctor decided to drill it apart before extracting it, so I got to hear the enormous sound of a drill inside my mouth. After the drilling, I believe he took it out piece by piece; I couldn't feel the exact technique, only that they kept working away in there. The doctor would also give warnings: "you're going to hear some cracking sounds..." The upper-right wisdom tooth, though fully erupted, was much easier than the lower ones: not as easy as the upper left, but it came out without drilling. The lower right was the most hard-core: it needed drilling, and even then took a lot of effort to finish. The lower sockets were presumably opened up badly, because the doctor stitched both of them. I didn't feel the needle go in, but I could feel the tug as he pulled the thread and knotted the gum closed. The actual extraction probably took less than 10 minutes in total. Honestly, it was nowhere near as terrifying as the accounts online; maybe I'm just in good shape :)
Afterwards I sat up, rested a moment, and left. (I have to admire myself here: no allergies, no smoking or drinking, no hereditary diseases, no major medical history, all indicators normal, and no fuss during the extraction. I suspect doctors like patients like me... thanks to heaven and to my parents for this sturdy body~) Back home, I had to keep biting down on gauze to stop the bleeding. The anesthetic wore off after about 3-4 hours. According to the nurse, I could take some ibuprofen (a common American painkiller) for the pain. For about half an hour after taking out the gauze, it really hurt: that childhood toothache feeling all over again. Fortunately the pain mostly subsided after half an hour, whether because the painkiller kicked in or because the wounds were healing well. All in all, no major problems.
What remains is to see how well it heals. Right after the extraction I naturally didn't dare eat anything hard, so I obediently cooked myself congee. I hope my body recovers quickly: in a week I still have a trip to DC and NY~
Two weeks on, the pain has finally faded. The stitches have started coming out, and I can eat normally again. I think I have recovered quite well: I never needed the prescription painkillers the doctor gave me, and OTC ibuprofen was enough. It only hurts a bit in the mornings when I wake up, since lying flat leaves the area engorged with blood.
PS. If any friends at UIUC also want their wisdom teeth out, you can first get checked at Creative Smiles Dental (over by the marketplace). They can directly extract wisdom teeth that are not impacted. Mention my name (say I referred you) and we each get $50 off :)
I've been looking at entry-level watches for some time. Recently I answered a question on Zhihu (the Chinese Quora), and I felt like transcribing that collection onto my blog.
This watch was discussed in my previous blog post. It is a bit debatable, as some people think it completely rips off the NOMOS Tangente. However, considering there are so many Rolex Submariner homages, this should be considered a homage too. It uses a Seagull ST1701 movement, a sapphire crystal, and a display case back. At this price point, it is really a good deal. Get it on Amazon.
I've always liked the ORIENT logo, much more than SEIKO's. This watch features Roman numerals, blue hands and a classic white dial. Amazon
This one has been quite popular in China, thanks to a successful marketing campaign. It is a (Japanese) quartz watch with multiple variations, and it looks really good. Get the best-looking ones here and here.
Now we move to my favorite category. Rolex designed the true classic look of today's diver's watch. However, the retail price of a Submariner is probably too high for lots of people. Fortunately, the Japanese brands SEIKO and ORIENT provide some affordable options here.
A very popular one, considered a poor man's Rolex. You can get it for around $180 at Amazon.
Also a very popular one. As I said, I really like Orient's logo: I find it far more interesting to look at the lions and shields than at just the word "SEIKO". $130 at Amazon.
Ray's brother, Mako. The watch face is slightly different: thinner hands, and Arabic numerals at the 6, 9 and 12 o'clock positions. Also around $130 at Amazon.
One of my favorites. Its lume is fantastic and gets countless compliments. Only $100 at Amazon.
This is my favorite style. Look at the beautifully designed digits! Its lume is also awesome: every numeral glows! When it's on sale, you can get it for $125.
A bestseller on Amazon. It comes in various colors (black, green, blue, beige). It's so cheap ($50) that people buy all the colors and rotate them daily. And since it's so affordable, it will become your workhorse; you won't treat it like a million-dollar watch that only sits in your locker.
Quite popular on the watchuseek forum. A very nice Seagull movement with a complete see-through design. That said, it might be too busy to read the hands. $135 at Amazon.
This skeleton is cleaner than the Seagull's. Nevertheless, Seagull has a better reputation in watchmaking than Fossil. Amazon.
So there you have it. These watches are very affordable, with the great quality that earned them their reputations. You will see them in nearly every watch forum. If you have just started collecting watches, consider getting one of them. Let me know if you have any other suggestions.
So NOMOS makes this awesome-looking Bauhaus-style watch: the Tangente. Apparently everyone wants one, but not everyone wants to break the bank ($2,330 USD).
Luckily, we have an alternative: the Rodina series watches (or the version with a date).
Some people consider it a copycat, others a homage. The truth is, at only $139.99 it is really affordable, and around Valentine's Day it went on sale for $99.99 shipped; the website has sales frequently. Without further delay, I bought it while it was on sale. It shipped from Tianjin, China via EMS, handed off to USPS after arriving in the US. After 10 days of waiting, I finally received it!
Funnily, as others have commented, the soft bag is supposedly meant to hold the watch, but upon opening the case, the watch was not wrapped in it.
However, the watch itself is fine and intact. It is wrapped in plastic strips and well protected.
These are all the contents of the box. I can understand that: Rodina is not a famous big brand, so don't expect full service inside the box.
This is really how it should come:
It has a transparent case back, a perfect show-off for automatic watches. As you can see, it is rated 5 ATM, slightly better than the 3 ATM that a lot of cheap watches have. It is said to use a Seagull ST1713 movement (with a date function).
The crown is a real bummer. If you look for pictures of it online, the 'R' has a blue coating. The blue coating on mine, however, was completely coming off, so I wiped it clean instead; you can still see blue flecks on the cleaning cloth. Nevertheless, I like the letter 'R' on the crown.
The dial is unbeatable. The crystal is sapphire, which is astonishing at this price. The hands are a dark blue, which looks really cool. The only problem, as someone else pointed out, is that the fonts do not match: the numerals are sans-serif, while the Rodina lettering is serif. Furthermore, 'CHINA MADE' sounds odd, although I can understand that they may feel proud to say it.
The strap: the leather is of cheap quality, as expected, and feels like plastic. The construction, however, is not bad: the stitching and the holes are definitely satisfactory. I picked the brown one, as I think black is too dressy for everyday wear. I have a skinny wrist, so I use almost the last hole.
This is how it looks on my wrist; 39mm is a bit small for me. The winding rotor, however, is not stable enough: it makes noise, and you can feel it moving when you move your arm and trigger the winding action. The ticking is pretty quiet, though, compared to the notorious Timex Weekender.
Despite its many imperfections, this watch is still worth buying. I really like its style. I wish I could get one with Roman numerals and a date (the Roman-numeral version does not have a date), and I wish the leather were of better quality. But at this price point one cannot expect too much, and the quality of the watch itself is really good. Here are the pros and cons:
Pros:
Cons:
In general, my first impression of the watch is good. It might take some more time on the wrist to be sure. What do you think?
[Update] After several days of wear, I have to say that I really like the style. I have also observed that the accuracy is excellent: only 2~3 seconds slow.
Prompted by the recent pro-democracy movement in Hong Kong demanding universal suffrage, I have some rough thoughts.
Given my limited understanding of politics and limited time, and to avoid pointless arguments, I don't want to post these thoughts on social media. After all, they are only personal opinions, or more strictly, a kind of belief. Many ideologies have no absolute right or wrong; all we can do is weigh them against our own values and worldview. Yet so many people in this world love to argue until the other side concedes, and I simply have neither the time nor the energy for that. This also illustrates why I lean pro-elitist: I know my understanding of these matters is insufficient, so I feel I need more rigorous study before I have the standing to debate others. I am sure my views will change in the years ahead, and looking back on these posts then, I will surely find myself naive. Still, let me record what I think.
My feelings about this affair are complicated: I partly support the protesters' demands, with reservations. Fighting for the vote and for one's ideals is admirable, and I support that. But does universal suffrage amount to democracy? Two larger questions arise:
Does universal suffrage guarantee that those elected represent the popular will? This amounts to the tyranny of the majority, which has been discussed at length elsewhere; it is not my main topic, so I will not pursue it here.
How do we ensure the quality of voters' choices? Clearly, many people, constrained by their level of knowledge, can only see the most immediate effects on themselves, and cannot grasp the deeper impact a candidate's platform would have on society, the country, and the world; conversely, those impacts may circle back to harm them in ways that are not so direct. Moreover, voters who have not thought things through independently often form impressions of candidates purely from hearsay, political propaganda, even money, so those votes are compromised. People who truly "know what they are doing" are far too few, a point that has become ever more vivid through my years of study. I have seen too many cases where things are not as simple as surface intuition suggests; only after peeling back the layers does one discover that the truth is often deeply counterintuitive. For the bold moves of people operating on an entirely different level of knowledge, often all one can say is "too simple, sometimes naive."
So I believe these votes must be weighted.
That second point is the core of why I lean pro-elitist; by contrast, supporters of universal suffrage lean populist. Each has its pros and cons. These two articles cover them in detail:
On the question of the franchise, the two ideologies sit at opposite extremes: one holds that the vote should rest with elites, the other that everyone should have it. My hope is that the vote should ultimately be constrained, but that this constraint be minimized subject to fairness and maximal social benefit.
One possible solution is a voter's test. Like the gaokao: granted, everyone should enjoy the right to education, but under real resource constraints we must use examinations, a relatively fair method, to ensure resources go to the right people. For the test-taker it also represents an investment, an opportunity cost, and a recognition that this right must be earned. The test would also guarantee that voters have a baseline of knowledge. Another analogy: everyone may enjoy the right to the road (the right to vote), but to actually drive on it (exercise that right) you must be responsible to others and pass a test proving you are qualified to use the road without endangering anyone (making rational choices at the ballot box). Here is an article with a view similar to mine.
This is not a new idea; in fact, the US once administered similar tests precisely to keep black citizens from voting. That does not mean the system itself cannot work, only that it was abused at the time. The test must be fair: for instance, no age or education restrictions. This article gives some background and mentions some supporters of a voter's test. Again, I recognize this is only my subjective take, formed without extensive background reading, so the system may well have deeper problems. I will return to this topic in the future.
Essentially, we want to find the most likely value of \(\theta\) given \(\data\), that is, \(\argmax_\theta P(\theta \given \data)\). According to Bayes' rule, we have
\[ P(\theta \given \data) = \frac{P(\data \given \theta)P(\theta)}{P(\data)} \]
and the terms have the following meanings:
An easy way out is the MLE method. We want to find a \(\theta\) that best explains the data; that is, we maximize \(P(\data \given \theta)\). Denote such a value by \(\hat{\theta}_{ML}\). We have
\[ \hat{\theta}_{ML} = \argmax_\theta P(\data \given \theta) = \argmax_\theta P(\mathbf{x}_1, \ldots, \mathbf{x}_N \given \theta ) \]
Note that the above \(P\) is a joint distribution over the data. We usually assume the observations are independent. Thus, we have
\[ P(\mathbf{x}_1, \ldots, \mathbf{x}_N \given \theta ) = \prod_{i=1}^{N} P(\mathbf{x}_i \given \theta ) \]
We usually take the logarithm to simplify the computation; this is valid because the logarithm is monotonically increasing. Thus, we write:
\[ \mathcal{L}(\data \given \theta) = \sum_{i=1}^N \log P(\mathbf{x}_i \given \theta ) \]
Finally, we seek the ML solution:
\[ \hat{\theta}_{ML} = \argmax_\theta \mathcal{L}(\data \given \theta) \]
If we know the distribution \(P\), we can usually solve the above by setting the derivative with respect to \(\theta\) to 0 and solving for \(\theta\), that is,
\[ \frac{\partial \mathcal{L}}{\partial \theta} = 0 \]
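As a quick numerical sanity check (a Python sketch with made-up data): for a Gaussian likelihood with known variance, setting the derivative to zero yields the sample mean, and a grid search over the log-likelihood agrees.

```python
import numpy as np

# Toy observations (assumed for illustration).
data = np.array([2.1, 1.9, 2.4, 2.0, 1.6])

def log_likelihood(theta, x, sigma=1.0):
    """Sum of log N(x_i | theta, sigma^2), dropping constant terms."""
    return -np.sum((x - theta) ** 2) / (2 * sigma ** 2)

# Closed form: setting the derivative to zero gives theta_ML = mean(x).
theta_ml = data.mean()

# Numerical check: no theta on a fine grid beats the sample mean.
grid = np.linspace(0, 4, 4001)
best = grid[np.argmax([log_likelihood(t, data) for t in grid])]
```
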
In MAP, we maximize \(P(\theta \given \data)\) directly. Denoting the MAP hypothesis by \(\hat{\theta}_{MAP}\), we have:
\[\begin{array}{rl} \hat{\theta}_{MAP} = & \argmax_\theta P(\theta \given \data) \\\\ = & \argmax_\theta \frac{P(\data \given \theta)P(\theta)}{P(\data)} \\\\ = & \argmax_\theta P(\data \given \theta)P(\theta) \end{array}\]
Note that the last step holds because the evidence \(P(\data)\) is constant with respect to \(\theta\), and can thus be omitted inside the \(\argmax\).
At this step, we notice that the only difference between \(\hat{\theta}_{ML}\) and \(\hat{\theta}_{MAP}\) is the prior term \(P(\theta)\). Another way to interpret this is that MAP is more general than MLE: if we assume all possible values of \(\theta\) are equally probable a priori, i.e., they share the same prior probability (a uniform prior), then \(P(\theta)\) can effectively be removed from the MAP formula, which then looks exactly like MLE.
Finally, if the observations are independent, we can again take the logarithm and expand \(\hat{\theta}_{MAP}\) as:
\[ \begin{array}{rl} \hat{\theta}_{MAP} = & \argmax_\theta \mathcal{L}(\data \given \theta) + \log P(\theta) \\\\ = & \argmax_\theta \sum_{i=1}^{N} \log P(\mathbf{x}_i \given \theta ) + \log P(\theta) \end{array} \]
The extra prior term effectively 'pulls' the estimate of \(\theta\) toward the prior. This makes sense: we encode our domain knowledge in the prior, so intuitively the estimate is biased toward it.
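For example (a Python sketch with an assumed Beta prior): for a Bernoulli parameter with a Beta(a, b) prior, the MAP estimate has a closed form, and the prior visibly pulls the estimate from the MLE toward the prior mean.

```python
# Coin-flip data: 7 heads out of 10 flips (toy numbers).
h, N = 7, 10

# Beta(a, b) prior centered at 0.5 (an assumed choice of domain knowledge).
a, b = 5.0, 5.0

theta_mle = h / N                          # 0.7
theta_map = (h + a - 1) / (N + a + b - 2)  # closed-form Beta-Bernoulli MAP

# theta_map sits between the prior mean (0.5) and the MLE (0.7):
# the prior 'pulls' the estimate toward 0.5.
```
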
Assume we are given a set of data \(\data\), where each example \(\mathbf{x}_j=(a_1, a_2, \ldots, a_n)\) can be viewed as a conjunction of attribute values, and \(v_j \in V\) is the corresponding class value. Using MAP, we can classify an example \(\mathbf{x}\) as:
\[v_{MAP}=\argmax_{v_j\in V} P(v_j \given a_1, \ldots, a_n)\]
The problem is that it is hard to estimate the joint distribution \(P(a_1, \ldots, a_n \given v_j)\). If we use the data to estimate it, we typically don't have enough data for each combination of attribute values; in other words, the data we have is very sparse compared to the whole distribution space.
Naive Bayes makes the assumption that each attribute is conditionally independent given the target class \(v_j\), that is,
\[P(a_1, \ldots, a_n \given v_j) = \prod_{i=1}^n P(a_i \given v_j)\]
which can easily be estimated from the data. Thus, we have the following naive Bayes classifier:
\[v_{NB} = \argmax_{v_j \in V} P(v_j) \prod_{i=1}^n P(a_i \given v_j)\]
Note that learning a naive Bayes model simply involves estimating \(P(a_i \given v_j)\) and \(P(v_j)\) from frequencies in the training data.
Normally the conditional independence assumption does not hold, but naive Bayes performs well even so. More importantly, when conditional independence is satisfied, naive Bayes corresponds to MAP classification.
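The frequency-counting view can be sketched in a few lines of Python (the weather-style data is made up, and Laplace smoothing is added to avoid zero probabilities, a detail not discussed above):

```python
import math
from collections import Counter, defaultdict

def nb_train(examples, labels):
    """Collect the counts needed to estimate P(v) and P(a_i | v)."""
    class_counts = Counter(labels)
    cond_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    attr_values = defaultdict(set)       # attribute index -> set of seen values
    for xs, v in zip(examples, labels):
        for i, a in enumerate(xs):
            cond_counts[(v, i)][a] += 1
            attr_values[i].add(a)
    return class_counts, cond_counts, attr_values

def nb_predict(class_counts, cond_counts, attr_values, xs):
    """argmax_v P(v) * prod_i P(a_i | v), in log space with Laplace smoothing."""
    total = sum(class_counts.values())
    best_v, best_lp = None, -math.inf
    for v, n_v in class_counts.items():
        lp = math.log(n_v / total)
        for i, a in enumerate(xs):
            lp += math.log((cond_counts[(v, i)][a] + 1) / (n_v + len(attr_values[i])))
        if lp > best_lp:
            best_v, best_lp = v, lp
    return best_v

# Toy weather data (made up): attributes are (outlook, temperature).
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
model = nb_train(X, y)
```

Here `nb_predict(*model, ("rain", "mild"))` returns `"yes"`: the class frequencies and per-attribute counts are all the learning there is.
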
MLE, MAP and naive Bayes are all connected. While MLE and MAP are parameter estimation methods that return a single value of the parameter being estimated, NB is a classifier that predicts the class an example most likely belongs to. We also have the following insights:
After reading this article, I have the following interpretation:
A few days ago, I encountered an issue that seems common among mid-2009 MBPs: one of the RAM sticks (in slot 1) is no longer recognized. Or sometimes it is recognized, but after sleeping and waking, the computer freezes and nothing short of a forced power-off will recover it.
It turns out this is a common issue with this model; see the discussions in this thread and this thread. In the following, I present my temporary fix for the problem. For those of you who still want to stick with the old MBP, the fix should last for a while, but I do recommend you back up all your files and prepare to migrate some day soon.
As I suggested in my reply, this may be due to RAM slot degradation. My guess is that the RAM slot can no longer hold the RAM in the correct contact position. Specifically, see the post image above. Notice that the two clips on the left and right are meant to hold the RAM sticks horizontal; otherwise they bend upwards. I took a close look at those clips and found that the plastic had worn out and could no longer hold them as designed. I didn't have a great way to fix the clips themselves, so I cut some pieces of paper and inserted them between the RAM sticks and the back edge of the body, hoping they would help tuck the RAM in.
While fixing it, I accidentally broke the left holder, so I had to fashion a custom plastic holder and attach it to the logic board to hold the inner RAM stick (slot 1).
To verify my theory, I then taped a padding at the corresponding RAM position on the back cover:
And this is how it looks before closing up:
Note that you have to screw it down really tight to create the pressure that keeps the RAM aligned. That said, there will be a 'bump' at that position, which easily causes scratches, so I use an Apple sticker to cover my ass:
This method has worked for me, well, 99% of the time. Sometimes after sleep the MBP still won't wake up; I notice this usually happens after the MBP has been running for a long time and it's hot inside. Nevertheless, this is the best solution I have come up with so far. If you have another cheap solution that does not require replacing the logic board, please let me know in the comments.
Finally, Zach Clawson has created a dedicated page for this issue, listing lots of references and providing an explanation. Make sure you check it out if you have encountered a similar issue.
There are other common issues for this model, and they can be easily fixed. See my following posts:
If you have similar experience, do not hesitate to let me know. If you find my instruction helpful, leave a comment and share it!
Recently, my mid-2009 MBP (Model A1278) failed to recognize its hard drive. My first bet was another disk dying on me, but that was not the case: I took the hard drive out, put it into an external drive enclosure, and it read smoothly.
It turns out the culprit was the SATA cable, a notorious problem for the mid-2009 MBP model. See the threads and discussions here, here, here and here.
Luckily, the solution is simple: just purchase a replacement cable and swap it in. iFixit has a very detailed, illustrated document on the procedure. However, Amazon has a cheaper option, and it works fine for me. I didn't test the infrared, though, since I never use it and don't plan to.
There are other common issues for this model, and they can be easily fixed. See my following posts:
If you have similar experience, do not hesitate to let me know. If you find my instruction helpful, leave a comment and share it!
After blogging with Octopress for a while, I have gained some insight into it, and my publishing flow has become smoother. I think it is the right time to share my flow as a reference.
The following sections outline the flow; the last section contains assorted tips and tricks. For the basic configuration of Octopress, please refer to the official website. I also recommend installing Alfred.app.
Artem Yakimenko created an awesome Alfred workflow for publishing and generating Octopress websites. Use `blog publish [title]` to create a new post; it then opens the post in your specified text editor with a template.
I use Sublime Text (ST) as my text editor, because it provides VIM key bindings and has a huge repository of plugins. While editing, I use Marked to render the markdown file instantly and preview the result. In fact, the title image shows my editing and previewing in action.
To make the process sweeter, there is an ST plugin called Marked App Menu that allows you to open the current file in Marked. Just search ST Package Control to install it.
To preview the generated website, simply install pow and execute `rake watch` under the octopress directory to monitor changes. The official Octopress website provides some explanation. After installation, you can view your website locally at http://octopress.dev.
You should also install the pow Alfred workflow, which helps you open pow websites in a breeze.
Since I am an EECS student, I write a lot about Optimization, Machine Learning and Computer Vision, which rely heavily on mathematics, so writing math formulas is a must for me. MathJax is a JavaScript library that renders math written as LaTeX. To enable it, configure the site to load the library by adding two script tags to `<octopress>/source/_includes/custom/head.html`.
The first script tag loads MathJax itself, and the second loads a custom configuration from `source/javascripts/MathJaxLocal.js`, which is a good place to define your own macros.
Now you can write math!
\[e^{i \pi} + 1 = 0\]
There are a couple of good articles for reference:
I host my images in my Dropbox Public folder: you can simply copy the public link and paste it into the post source. The image above was produced by a timed capture from Skitch; for advanced vector graphics, I use OmniGraffle.
Pandoc is a Swiss-army-knife tool that converts documents between dozens of formats. I mainly use it as the markdown converter for Octopress; a plugin can help you with that.
After installation, I update the `markdown` section of `_config.yml`, telling Octopress to use pandoc, pass the options `smart` and `mathjax`, and use the style file `ieee.csl` to format the bibliography `blog.bib`. For example, `refer to [@xiao2013optimally]` generates "refer to [[1]]" (scroll down to see the References section).
Dash.app is an API Documentation Browser and Code Snippet Manager. It provides a convenient Alfred workflow that searches the documentation.
This collection of tips is continuously growing…
In Liquid templates, `site.url` returns the URL of the site, which is http://blog.ivansiu.com in my case. In fact, anything in `_config.yml` is available as a variable under `site`.
In conclusion, Octopress is a revolutionary blogging framework. It provides a robust static-site building stack (jekyll, bootstrap, scss, etc.) and allows complete control over the source, which is perfect for users with basic coding and source-control skills. It gives me a feeling similar to first getting in touch with a Mac: compared to Windows, which is too closed and provides no built-in programming-friendly environment (console, UNIX tools, etc.), and compared to Linux, which is very open but comes in too many variations and needs too much customization, it combines their advantages by presenting a user-friendly interface on top of all the usual UNIX tools. I am very satisfied with it, and my motivation to write posts has revived. Some basic configuration is still needed, however; in particular, MathJax rendering and better image support definitely need to be integrated in the next release.
What are your thoughts? Do you have any neat tricks for publishing with Octopress? Please leave a comment.
[1] Z. Xiao, Y. Du, H. Tian, and M. D. Wong, "Optimally minimizing overlay violation in self-aligned double patterning decomposition for row-based standard cell layout in polynomial time," in Computer-Aided Design (ICCAD), 2013 IEEE/ACM International Conference on, 2013, pp. 32–39.
This is a follow-up to the previous two posts in the L1-minimization series.
We have explored using the L1-minimization technique to recover a sparse signal in a 1D example. This post demonstrates a 2D example, where the image is viewed as a signal. This makes sense, as we can perform a 2D Fourier transform of the image, where the basis functions are combinations of horizontal and vertical waves. For a complete introduction to the FFT on images, refer to this tutorial. Notice that, as with a 1D signal, we do not measure the image directly in the spatial domain, but in the frequency domain. Concretely, say \(x\) is the 2D image collapsed to 1D, \(A \in \reals^{k\times n}\) is the measurement matrix, and \(b\) is the observation; we then have \(Ax=b\). Usually we require \(k = n\) to obtain an exact solution for \(x\) given \(A\) and \(b\). Now, if we use the FFT and obtain the frequency coefficients \(\hat{x}\), we can perform similar measurements \(\hat{A} \hat{x} = \hat{b}\), and the requirement \(k = n\) is the same; in other words, the required number of samples (the information) is the same. Using the inverse Fourier transform, we can convert \(\hat{x}\) back to \(x\). The only difference is that the measurement \(\hat{A}\) is taken in the frequency (Fourier) domain. As we will see, we can exploit sparsity to reduce \(k\).
We first introduce the concept of image gradients. For any 2D real image `I`, if we think of each row as a signal, we can view the 'difference' between adjacent pixels as the (horizontal) gradient `Gx(I)`; this makes sense since a sharp change denotes an edge. Similarly, we can define the vertical gradient `Gy(I)` for columns. Thus, we have
\[Gx(I)_{ij} = \begin{cases} I_{i+1, j} - I_{ij} & i < n \\\\ 0 & i = n \end{cases} \qquad Gy(I)_{ij} = \begin{cases} I_{i, j+1} - I_{ij} & j < n \\\\ 0 & j = n \end{cases}\]
where the image size is \(n\times n\).
Collectively, the image gradient `G(I)` is defined as the magnitude (2-norm) of both components:
\[G(I)_{ij} = \sqrt{(Gx(I)_{ij})^2 + (Gy(I)_{ij})^2}\]
The following figure shows `Gx(I)`, `Gy(I)` and `G(I)` of the phantom image.
The total variation `TV(I)` of an image is just the sum of this discrete gradient at every point:
\[TV(I)= \norm{G(I)}_1 = \sum_{i,j} G(I)_{ij}\]
We notice that \(TV(I)\) is just the L1-norm of \(G(I)\), which leads to the following idea: if an image is sparse in its gradients, we can exploit that with our L1-minimization trick.
The ratios of nonzero elements in `Gx`, `Gy` and `G` of the phantom image are 0.0711, 0.0634 and 0.0769, respectively. These ratios are really small, so we consider the gradient sparse.
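These quantities are easy to compute. Here is a Python/numpy sketch of the definitions above, run on a synthetic piecewise-constant image standing in for the phantom (the ratios quoted above are from the actual phantom image):

```python
import numpy as np

def gradients(I):
    """Forward-difference gradients Gx, Gy and magnitude G, zero at the far edge."""
    Gx = np.zeros_like(I, dtype=float)
    Gy = np.zeros_like(I, dtype=float)
    Gx[:-1, :] = I[1:, :] - I[:-1, :]   # I_{i+1,j} - I_{i,j}
    Gy[:, :-1] = I[:, 1:] - I[:, :-1]   # I_{i,j+1} - I_{i,j}
    return Gx, Gy, np.hypot(Gx, Gy)

# A piecewise-constant test image: a bright square on a dark background.
I = np.zeros((64, 64))
I[16:48, 16:48] = 1.0

Gx, Gy, G = gradients(I)
tv = G.sum()                                  # total variation: the L1 norm of G
nonzero_ratio = np.count_nonzero(G) / G.size  # how sparse the gradient is
```

Only the pixels on the square's boundary have a nonzero gradient, so `nonzero_ratio` is small, exactly the kind of sparsity the L1 trick exploits.
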
Let \(F: \reals^{n\times n} \to \complex^{n\times n}\) be the FFT operator, and \(FI\) the Fourier transform of image \(I\). Define a set \(\Omega\) as \(k\) two-dimensional frequencies chosen according to some sampling pattern from the \(n \times n\) grid. We further define \(F_\Omega I: \reals^{n \times n} \to \complex^k\) as the \(k\) observations taken from the Fourier transform of image \(I\). We can then solve the following optimization problem to recover \(I\):
\[\min_I \norm{F_\Omega I - b}^2_2\]
where \(F_\Omega\) can be viewed as the measurement matrix and \(b\) is the observation; we want to find the \(I\) that minimizes the reconstruction cost (energy).
However, the above does not quite work. As we can see in the following images, L2-minimization does a poor job, whether the measurements are random or follow a radial pattern [[1]] in the Fourier domain.
The figures compare, for both a random and a radial measurement pattern, the L2-minimization and L1-minimization reconstructions.
To utilize the sparsity, we add an L1-regularization term to the objective, which yields:
\[(TV_1) \quad \min_I \norm{F_\Omega I - b}^2_2 + \lambda TV(I)\]
Unsurprisingly, optimizing the above gives a perfect reconstruction of the original image. It has been shown that if there exists a piecewise-constant \(I\) with sufficiently few edges (i.e., \(G(I)_{ij}\) is nonzero only for a small number of indices \(i, j\)), then \((TV_1)\) recovers \(I\) exactly.
A heavily commented code example is available in my github repository. Leave a comment if you have any questions.
Now, take a look at another example, `cameraman`, which has the following gradients (intensity rescaled using matlab's `imagesc`); the figure shows the image alongside its gradient.
The following shows the reconstructions; from left to right: random measurement (L2), random measurement (L1), radial measurement (L2), radial measurement (L1).
As we can see, the results are not as good. In fact, the nonzero ratio of its gradient is 0.9928, which is not sparse at all. However, if we plot the histogram of gradients, we find that most of the gradient magnitudes are small:
In particular, most of them are smaller than 200, which means the number of 'changes' larger than 200 is small. In fact, the ratio of gradients > 200 is only 0.0964! Thus, there are two possible ways to discard this information and obtain a 'compressed' image that is sparse in gradients:
I'll leave these conjectures for future implementation. For those interested, please try them yourself and let me know your results. If you have any thoughts, do not hesitate to leave a comment.
For interested readers, the following references will be helpful.
[1] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[2] E. Candès and J. Romberg, "L1-magic: Recovery of sparse signals via convex programming," vol. 4, 2005.
[3] J. S. Hesthaven, K. Chowdhary, E. Walsh, and others, "Sparse gradient image reconstruction from incomplete Fourier measurements and prior edge information," IEEE Transactions on Image Processing, 2012.
[4] J. K. Pant, W.-S. Lu, and A. Antoniou, "A new algorithm for compressive sensing based on total-variation norm," in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, 2013, pp. 1352–1355.
This is a follow-up of the previous post on applications of L1 minimization.
As we know, any signal can be decomposed into a linear combination of basis functions, and the most famous such decomposition is the Fourier transform. For simplicity, let’s assume that we have a signal that is a superposition of a few sinusoids. For example, the following:
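(The original snippet is not preserved in this copy; the following is a stand-in sketch in Python/NumPy rather than the original MATLAB, with frequencies and amplitudes of my own choosing.)

```python
import numpy as np

# A toy signal: superposition of two sinusoids.
n = 1024
t = np.linspace(0, 1, n, endpoint=False)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
```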
With the discrete cosine transform (DCT), we can easily find the coefficients of the corresponding sinusoid components. The above example’s coefficients (in the frequency domain) and the signal (in the time domain) are shown in the post figure.
Now, let’s assume we do not know the signal and want to reconstruct it by sampling. Theoretically, the sampling rate required is at least twice the signal’s highest frequency, according to the famous Nyquist–Shannon sampling theorem.
However, this assumes zero knowledge about the signal. If we know some structure of the signal, e.g., that the DCT coefficients are sparse as in our case, we can further reduce the number of samples required.^{1}
The following code snippet demonstrates how this works. We generate the original signal in time domain and then perform a DCT to obtain the coefficients.
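(The original MATLAB code is not shown in this copy; as a sketch in Python/SciPy, I build the signal directly from a few DCT coefficients so that sparsity in the frequency domain is exact. Indices and amplitudes are illustrative.)

```python
import numpy as np
from scipy.fft import dct, idct

# Choose a few nonzero DCT coefficients, inverse-transform to get the
# time-domain signal, then take the DCT to recover the coefficients.
n = 256
c_true = np.zeros(n)
c_true[[20, 50, 111]] = [1.0, 0.7, 0.5]
x = idct(c_true, norm='ortho')   # time-domain signal
c = dct(x, norm='ortho')         # frequency-domain coefficients
```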
Let’s assume that we have a device that can sample from the frequency domain. To simulate this, we create a random measurement matrix to obtain the samples. We use 80 samples here. Note that we normalize the measurement matrix to have an orthonormal basis, i.e., the norm of each row is 1 and the dot product of different rows is 0.
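(A sketch of this step in Python/NumPy; the QR factorization is one way to orthonormalize the rows, which may differ from how the original code did it.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 80
# Random m x n measurement matrix with orthonormal rows: QR of the
# transpose gives orthonormal columns, which we transpose back.
Q, _ = np.linalg.qr(rng.standard_normal((m, n)).T)
A = Q.T
```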
We first try a least-squares approach, which boils down to inverting the matrix to obtain \(\hat{x}=A^{-1} b\). Note that as \(A\) is not square, we are using its pseudoinverse here. Furthermore, as \(A\) has orthonormal rows, its transpose is the same as its pseudoinverse.
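(A sketch of the least-squares attempt, self-contained for clarity; the sparse coefficients and measurement matrix are illustrative stand-ins for the original MATLAB setup.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 80
c_true = np.zeros(n)
c_true[[20, 50, 111]] = [1.0, 0.7, 0.5]
Q, _ = np.linalg.qr(rng.standard_normal((m, n)).T)
A = Q.T
b = A @ c_true
# Least squares: for a matrix with orthonormal rows, pinv(A) == A.T,
# so the minimum-norm solution is simply A.T @ b.
c_ls = A.T @ b
```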
As we can see, there are lots of nonzeros in the coefficients, and the recovered signal is very different from the original signal.
Finally, we use L1-minimization for the reconstruction. I used lasso to perform an L1-regularized minimization. Another package that performs various kinds of L1-minimization is l1-magic.
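(The original used MATLAB's lasso; here is an equivalent sketch using scikit-learn's Lasso instead. The alpha value is hand-picked, and the setup mirrors the illustrative stand-in above, not the original code.)

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m = 256, 80
c_true = np.zeros(n)
c_true[[20, 50, 111]] = [1.0, 0.7, 0.5]
Q, _ = np.linalg.qr(rng.standard_normal((m, n)).T)
A = Q.T
b = A @ c_true

# L1-regularized least squares; a small alpha keeps the shrinkage
# bias low in this noiseless setting.
model = Lasso(alpha=1e-5, fit_intercept=False, max_iter=100000)
model.fit(A, b)
c_l1 = model.coef_
```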
The above shows that L1minimization successfully recovered the original signal. A complete code snippet can be found here.
Ordinary Least Squares (OLS), L2-regularization, and L1-regularization are all techniques for finding solutions to a linear system, but they serve different purposes. Recently, L1-regularization has gained much attention due to its ability to find sparse solutions. This post demonstrates this ability by comparing OLS, L2-regularization, and L1-regularization.
Consider the following linear system:
\[Ax = y\]
where \(A \in \reals^{m \times n}\), \(m\) is the number of rows (observations) and \(n\) is the number of columns (variable dimension), \(x\) is the vector of variable coefficients, and \(y\) is the response. There are three cases to consider:
In the following, we show how they perform by solving a simple case.
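(The original 60-line script is not preserved here; the following is a condensed sketch with scikit-learn, keeping the sizes described in the text, 200 variables with 10 nonzeros, while the alpha values, observation count, and noise level are my own choices, and the plotting is omitted.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Underdetermined system with a sparse ground truth.
rng = np.random.default_rng(0)
n_obs, n_var, n_nonzero = 100, 200, 10
A = rng.standard_normal((n_obs, n_var))
x_true = np.zeros(n_var)
idx = rng.choice(n_var, n_nonzero, replace=False)
x_true[idx] = rng.standard_normal(n_nonzero)
y = A @ x_true + 0.01 * rng.standard_normal(n_obs)  # noisy responses

ols = LinearRegression(fit_intercept=False).fit(A, y)
l2 = Ridge(alpha=1.0, fit_intercept=False).fit(A, y)
l1 = Lasso(alpha=0.05, fit_intercept=False, max_iter=100000).fit(A, y)

for name, coef in [("OLS", ols.coef_), ("L2", l2.coef_), ("L1", l1.coef_)]:
    print(name, "nonzeros:", np.count_nonzero(np.abs(coef) > 1e-3))
```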
The above code snippet generates an underdetermined matrix \(A\) and a sparse coefficient vector that has 200 variables, only 10 of which are nonzero. Noise is added to the responses. We then run the three methods to try to recover the coefficients. The script then generates two plots:
Scikit-learn has some excellent examples on regularization (1, 2). Quora has an excellent discussion on L2 vs. L1 regularization. I found the top three answers very useful for a deeper understanding, especially the Bayesian perspective, which views regularization as MAP (Maximum A Posteriori) estimation that places a Laplacian (L1) or Gaussian (L2) prior on the coefficients.
]]>log to print out anything in the console. However, after you compile it into an app, this no longer works.
I found several ways to do it in this thread. The two approaches that work best for me are:
Use logger to log to the syslog. E.g.,
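(The original one-liner is not preserved here; it is presumably along these lines, with the message text being illustrative.)

```shell
logger "MyApp: entering main handler"
```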
However, I don’t know why this sometimes fails to get logged, so I use the following instead:
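(The original two lines are not preserved here; the idea, as a sketch, is to append timestamped messages to a file you control. The path is illustrative.)

```shell
# Append timestamped log messages to a file of your own.
echo "$(date '+%F %T') MyApp: something happened" >> "$HOME/myapp.log"
```

You can then tail -f that file while the app runs.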
One of the neat things you can do in OS X is to reveal a file in Finder.app from another application. It turns out that a lot of the time, we also want to do that from the terminal. The following script helps you with that:
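(The original nine-line script is not preserved in this copy; the following is my own sketch of the same idea, using osascript to drive Finder, which is one common way to do it on OS X.)

```shell
#!/bin/sh
# reveal: select the given file in a Finder window.
# Usage: reveal <path>
[ -n "$1" ] || { echo "usage: reveal <path>" >&2; exit 1; }
abs=$(realpath "$1") || exit 1
osascript \
  -e "tell application \"Finder\" to reveal POSIX file \"$abs\"" \
  -e 'tell application "Finder" to activate'
```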
Note: put the script somewhere like ~/bin (and make sure that directory is on your PATH). Also remember to chmod +x it. The script relies on realpath, which returns the full path of a file. You can find it here.
realpath is a command-line utility that is included in most UNIX distributions, but not in Mac OS X. Thanks to Stuart Campbell, a minimal implementation is provided here, along with my fork.
If you use homebrew, you can tap my repo and install it from there:
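(The actual tap name is not preserved in this copy; the placeholders below are hypothetical and should be replaced with the real GitHub user and tap repo.)

```shell
brew tap <github-user>/<tap-repo>   # hypothetical names
brew install realpath
```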
Voila! Now you can get the full path of a file in the console. A nice trick I often use is to chain it with pbcopy to copy the full path to the OS X clipboard.
While tmux provides much better functionality than screen, most of us who work with tmux used screen for a long time, and it is more comfortable for us to use ctrl-a than the default ctrl-b, which is finger-stretchy. Thus the first thing I do after installing tmux is to rebind the prefix to ctrl-a. That gives us the handiest way of swapping the last two windows: typing ctrl-a ctrl-a.
However, this comes at a price: in a shell environment integrated with readline, ctrl-a jumps to the beginning of the line. Now that it is mapped as the prefix, we can no longer do that.
Surprisingly, the solution is pretty simple. Just use ctrl-a a to send the prefix itself, and use ctrl-a ctrl-a to go to the last window. Specifically, add these two lines to ~/.tmux.conf:
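(The two lines are not preserved in this copy; they are presumably the standard idiom, assuming the prefix itself has already been rebound with set -g prefix C-a:)

```
bind-key a send-prefix
bind-key C-a last-window
```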
via.
]]>I recently worked with Matlab a lot. In the console, I sometimes want to use less to quickly examine a file’s content, and I have already set it up to use source-highlight to output colorful escape sequences to the console. However, source-highlight does not come with syntax support for Matlab by default. Luckily, this post and this one (in Chinese) provide a solution.
First of all, install source-highlight using homebrew.
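(The command is presumably:)

```shell
brew install source-highlight
```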
Note that source-highlight depends on boost, and as of the date of this post, brew provides a precompiled library (bottle) for boost. However, its python support is compiled against the system python, so if you have installed a custom python (say, via homebrew) and use it by default, brew will compile boost from source instead, which takes an extremely long time. To prevent this, we need to unlink the custom python first and link it back afterwards. That is,
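(The two lines are presumably:)

```shell
brew unlink python
brew link python    # run this after source-highlight is installed
```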
Go to the following folder and create two files:
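(The exact path in the original post is not preserved; the folder is source-highlight's language-definition directory, which for a homebrew install can be located like this:)

```shell
cd "$(brew --prefix source-highlight)/share/source-highlight"
```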
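(The contents of the two files are not preserved in this copy. As a rough sketch, and not the original, a minimal matlab.lang in source-highlight's language-definition syntax might look like:)

```
include "number.lang"
comment start "%"
string delim "'" "'"
keyword = "function|end|if|elseif|else|for|while|switch|case|otherwise|return|break|continue"
```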
Finally, edit lang.map to create a mapping for Matlab files.
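(lang.map maps file extensions to language-definition files, one entry per line; the entry is presumably:)

```
m = matlab.lang
```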
There are several ways to manipulate it, but really not enough:
The workflow I adopted is a two-step approach: mark those I want to keep on the iPhone, delete everything using the third method, then upload them back.
If you have some videos that occupy lots of space, you can convert them to a lower quality. For this, HandBrake, which is open-source, is an excellent tool. However, it does not support batch conversion. Luckily, HandBrakeBatch by OSOMAC can achieve this.
To upload things back, SimpleTransfer can help, especially for videos, which cannot be put back into the camera roll easily (I have no idea why Apple restricts this). The lite version is free, but you can only upload one item at a time, and there are some other restrictions.
]]>proxy.pac at all. It turns out that, because of sandboxing, it will not allow reading the file from the local disk. A traditional solution is to turn on Web Sharing and serve the pac file over HTTP, such as http://localhost/proxy.pac.
However, this is not so simple anymore, since Apple removed Web Sharing from the normal (non-Server) version of Mavericks. To turn on the web service (Apache), do this:
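(The command is presumably the standard one for starting Apache on OS X:)

```shell
sudo apachectl start
```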
Also, place the pac file under /Library/WebServer/Documents, which is Apache’s default DocumentRoot.
Tom Fischer proposed another workaround; however, I don’t think it is a good idea to mess with the system files.
]]>